ollama

Author	SHA1	Message	Date
Jeffrey Morgan	38255d2af1	Use flash attention flag for now (#4580) * put flash attention behind flag for now * add test * remove print * up timeout for sheduler tests	2024-05-22 21:52:09 -07:00
Sang Park	4434d7f447	Correct typo in error message (#4535) The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.	2024-05-21 13:39:01 -07:00
Michael Yang	807d092761	fix quantize file types	2024-05-20 15:22:11 -07:00
Michael Yang	f36f1d6be9	tidy intermediate blobs	2024-05-20 15:15:06 -07:00
Michael Yang	3520c0e4d5	cache and reuse intermediate blobs particularly useful for zipfiles and f16s	2024-05-20 13:25:10 -07:00
Patrick Devine	ccdf0b2a44	Move the parser back + handle utf16 files (#4533)	2024-05-20 11:26:45 -07:00
Daniel Hiltgen	02b31c9dc8	Don't return error on signal exit	2024-05-16 16:25:38 -07:00
Michael Yang	84ed77cbd8	Merge pull request #4436 from ollama/mxyng/done-part return on part done	2024-05-15 17:16:24 -07:00
Patrick Devine	d1692fd3e0	fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461)	2024-05-15 15:43:16 -07:00
Patrick Devine	f2cf97d6f1	fix typo in modelfile generation (#4439)	2024-05-14 15:34:29 -07:00
Michael Yang	ac145f75ca	return on part done	2024-05-14 13:04:30 -07:00
Ryo Machida	798b107f19	Fixed the API endpoint /api/tags when the model list is empty. (#4424) * Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty. * Update server/routes.go --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-14 11:18:10 -07:00
Daniel Hiltgen	ec231a7923	Remove VRAM convergence check for windows The APIs we query are optimistic on free space, and windows pages VRAM, so we don't have to wait to see reported usage recover on unload	2024-05-14 09:53:46 -07:00
Patrick Devine	7ca71a6b0f	don't abort when an invalid model name is used in /save (#4416)	2024-05-13 18:48:28 -07:00
Patrick Devine	6845988807	Ollama `ps` command for showing currently loaded models (#4327)	2024-05-13 17:17:36 -07:00
jmorganca	4ec7445a6f	Revert "use post token" This reverts commit `0fec3525ad`.	2024-05-11 22:19:14 -07:00
Michael Yang	0fec3525ad	use post token	2024-05-11 19:13:16 -07:00
Daniel Hiltgen	824ee5446f	Fix envconfig unit test	2024-05-10 16:49:48 -07:00
Daniel Hiltgen	4142c3ef7c	Always use the sorted list of GPUs Make sure the first GPU has the most free space	2024-05-10 13:53:21 -07:00
Jeffrey Morgan	6602e793c0	Use `--quantize` flag and `quantize` api parameter (#4321) * rename `--quantization` to `--quantize` * backwards * Update api/types.go Co-authored-by: Michael Yang <mxyng@pm.me> --------- Co-authored-by: Michael Yang <mxyng@pm.me>	2024-05-10 13:06:13 -07:00
Jeffrey Morgan	bb6fd02298	Don't clamp ctx size in `PredictServerFit` (#4317) * dont clamp ctx size in `PredictServerFit` * minimum 4 context * remove context warning	2024-05-10 10:17:12 -07:00
Michael Yang	e03637176d	fix(routes): skip bad manifests	2024-05-10 08:46:11 -07:00
Jeffrey Morgan	302d7fdbf3	prune partial downloads (#4272)	2024-05-09 16:35:20 -07:00
Daniel Hiltgen	3ae2f441e0	Fix race in shutdown logic Ensure the runners are terminated	2024-05-09 15:54:02 -07:00
Daniel Hiltgen	354ad9254e	Wait for GPU free memory reporting to converge The GPU drivers take a while to update their free memory reporting, so we need to wait until the values converge with what we're expecting before proceeding to start another runner in order to get an accurate picture.	2024-05-09 14:56:01 -07:00
Daniel Hiltgen	8727a9c140	Record more GPU information This cleans up the logging for GPU discovery a bit, and can serve as a foundation to report GPU information in a future UX.	2024-05-09 14:18:14 -07:00
Bruce MacDonald	cfa84b8470	add done_reason to the api (#4235)	2024-05-09 13:30:14 -07:00
Michael Yang	a7ee84fc31	routes: skip invalid filepaths	2024-05-09 11:23:22 -07:00
Jeffrey Morgan	d5eec16d23	use model defaults for `num_gqa`, `rope_frequency_base` and `rope_frequency_scale` (#1983)	2024-05-09 09:06:13 -07:00
Bruce MacDonald	cef45feaa4	Add preflight OPTIONS handling and update CORS config (#4086) * Add preflight OPTIONS handling and update CORS config - Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling. - Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set. * allow auth, content-type, and user-agent headers * Update routes.go	2024-05-08 13:14:00 -07:00
Michael Yang	b25976aeb8	routes: fix show llava models	2024-05-08 12:43:36 -07:00
Michael Yang	88cf154483	Merge pull request #4244 from ollama/mxyng/skip-if-same skip if same quantization	2024-05-07 19:03:37 -07:00
Bruce MacDonald	8cbd3e7510	skip hidden files in list models handler (#4247)	2024-05-07 19:01:45 -07:00
Michael Yang	eeb695261f	skip if same quantization	2024-05-07 17:44:19 -07:00
Bruce MacDonald	dc9b1111e0	fix invalid destination error message	2024-05-07 17:35:52 -07:00
Michael Yang	ffbd3d173f	Merge pull request #3715 from ollama/mxyng/modelname-2 update list handler to use model.Name	2024-05-07 15:21:39 -07:00
Michael Yang	1e0a669f75	Merge pull request #3682 from ollama/mxyng/quantize-all-the-things quantize any fp16/fp32 model	2024-05-07 15:20:49 -07:00
Michael Yang	548a7df014	update list handler to use model.Name	2024-05-07 09:38:45 -07:00
Jeffrey Morgan	39d9d22ca3	close server on receiving signal (#4213)	2024-05-06 16:01:37 -07:00
Michael Yang	b2f00aa977	close zip files	2024-05-06 15:27:19 -07:00
Michael Yang	f5e8b207fb	s/DisplayLongest/String/	2024-05-06 15:24:01 -07:00
Michael Yang	d245460362	only quantize language models	2024-05-06 15:24:01 -07:00
Michael Yang	4d0d0fa383	no iterator	2024-05-06 15:24:01 -07:00
Michael Yang	7ffe45734d	rebase	2024-05-06 15:24:01 -07:00
Michael Yang	01811c176a	comments	2024-05-06 15:24:01 -07:00
Michael Yang	a7248f6ea8	update tests	2024-05-06 15:24:01 -07:00
Michael Yang	9685c34509	quantize any fp16/fp32 model - FROM /path/to/{safetensors,pytorch} - FROM /path/to/fp{16,32}.bin - FROM model:fp{16,32}	2024-05-06 15:24:01 -07:00
Daniel Hiltgen	0963c65027	Merge pull request #4208 from dhiltgen/fix_sched_test Fix stale test logic	2024-05-06 14:23:12 -07:00
Jeffrey Morgan	c9f98622b1	Skip scheduling cancelled requests, always reload unloaded runners (#4189)	2024-05-06 14:22:24 -07:00
Daniel Hiltgen	0a954e5066	Fix stale test logic The model processing was recently changed to be deferred but this test scenario hadn't been adjusted for that change in behavior.	2024-05-06 14:15:37 -07:00

554 commits