Commit graph

2702 commits

Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
hook to the call site, after the check for OLLAMA_MODELS, so the
helpful error message is printed before the server heartbeat check.
This also arguably makes the code more readable by removing the
magic/superfluous "pre" function caller.
2024-04-02 22:11:13 -07:00
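As an aside on the pattern this commit describes, here is a minimal Cobra sketch of doing the OLLAMA_MODELS check at the call site, ahead of the heartbeat check. The command, message text, and checkServerHeartbeat stub are illustrative assumptions, not ollama's actual code:

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// checkServerHeartbeat stands in for a helper that pings the local
// server; stubbed here for illustration.
func checkServerHeartbeat() error {
	// ... dial the server and return an error if it is unreachable ...
	return nil
}

var listCmd = &cobra.Command{
	Use: "list",
	RunE: func(cmd *cobra.Command, args []string) error {
		// Check OLLAMA_MODELS before the heartbeat so the helpful
		// message is not masked by a connection error.
		if os.Getenv("OLLAMA_MODELS") != "" {
			return fmt.Errorf("OLLAMA_MODELS only takes effect for 'ollama serve'; unset it to run this command")
		}
		if err := checkServerHeartbeat(); err != nil {
			return err
		}
		// ... run the command proper ...
		return nil
	},
}
```

Doing the check inline, rather than in a shared "pre" hook, keeps the ordering explicit in each command.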
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message (#3461)
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658 default head_kv to 1 2024-04-02 16:37:59 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5 fix metal gpu 2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Daniel Hiltgen
841adda157 Fix windows lint CI flakiness 2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8 Bump to b2581 2024-04-02 11:53:07 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Daniel Hiltgen
1f11b52511 Refined min memory from testing 2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204 Release gpu discovery library after use
Leaving the cudart library loaded kept ~30 MB of memory
pinned in the GPU by the main process. This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
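A sketch of the unload-after-discovery idea using dlopen/dlclose via cgo. The package name, wrapper, and DeviceCount helper are assumptions for illustration, not ollama's gpu discovery code:

```go
package discover

/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>

// Load libcudart just long enough to count devices, then dlclose it so
// the runtime does not keep memory pinned in our process afterwards.
static int countCudaDevices(void) {
	void *lib = dlopen("libcudart.so", RTLD_LAZY);
	if (!lib) return -1;
	typedef int (*getCount_t)(int *);
	getCount_t getCount = (getCount_t)dlsym(lib, "cudaGetDeviceCount");
	int n = 0;
	if (!getCount || getCount(&n) != 0) n = -1;
	dlclose(lib); // release the library once discovery is done
	return n;
}
*/
import "C"

// DeviceCount probes for CUDA GPUs without leaving libcudart resident,
// returning -1 if the runtime is unavailable or fails to initialize.
func DeviceCount() int {
	return int(C.countCudaDevices())
}
```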
Daniel Hiltgen
0a74cb31d5 Safeguard for noexec
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292 Detect too-old cuda driver
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
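For reference, cudart status 35 is cudaErrorInsufficientDriver, i.e. the installed NVIDIA driver is older than the CUDA runtime requires. A sketch of translating the raw code into an actionable log message; the function and the cudaErrorNoDevice case are illustrative additions:

```go
package discover

import "fmt"

// Known cudart initialization status codes worth translating.
const (
	cudaSuccess                 = 0
	cudaErrorInsufficientDriver = 35  // driver too old for this runtime
	cudaErrorNoDevice           = 100 // no CUDA-capable GPU found
)

// cudartInitError turns a raw cudart status into a message a user can
// act on, instead of logging only the bare number.
func cudartInitError(status int) error {
	switch status {
	case cudaSuccess:
		return nil
	case cudaErrorInsufficientDriver:
		return fmt.Errorf("cudart init failure: %d: CUDA driver is too old; please upgrade the NVIDIA driver", status)
	case cudaErrorNoDevice:
		return fmt.Errorf("cudart init failure: %d: no CUDA-capable GPU detected", status)
	default:
		return fmt.Errorf("cudart init failure: %d", status)
	}
}
```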
Daniel Hiltgen
4fec5816d6 Integration test improvements
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f Apply 01-cache.diff 2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step
toward running multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
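A simplified sketch of the lifecycle described above: start llama.cpp as a child process, restart it if it crashes, and kill it after an idle timeout so its memory is reclaimed. The runner type and channel-based idle signal are assumptions, not ollama's implementation:

```go
package llm

import (
	"os/exec"
	"time"
)

// runner supervises a llama.cpp server subprocess: start on demand,
// restart on crash, shut down when idle.
type runner struct {
	bin  string
	args []string
	used chan struct{} // signaled on each request to reset the idle timer
}

func (r *runner) run(idleTimeout time.Duration) {
	for {
		cmd := exec.Command(r.bin, r.args...)
		if err := cmd.Start(); err != nil {
			return
		}
		exited := make(chan error, 1)
		go func() { exited <- cmd.Wait() }()

		idle := time.NewTimer(idleTimeout)
	alive:
		for {
			select {
			case <-exited: // crashed: loop around and restart it
				idle.Stop()
				break alive
			case <-r.used: // activity: push the idle deadline out
				idle.Reset(idleTimeout)
			case <-idle.C: // idle: shut down and reclaim its memory
				cmd.Process.Kill()
				<-exited
				return
			}
		}
	}
}
```

A request handler would signal r.used on each call; production code would also rate-limit restarts rather than respawning a crashing binary in a tight loop.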
Patrick Devine
3b6a9154dd
Simplify model conversion (#3422) 2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069 fix generate output 2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282 update memory calculations
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
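The change amounts to summing per-layer sizes against the free VRAM budget rather than spreading one aggregate estimate over the layer count. A sketch with hypothetical inputs (layerSizes, graphSize, freeVRAM), not ollama's actual estimator:

```go
package llm

// estimateGPULayers counts how many layers fit in the available VRAM
// by walking each layer's own memory requirement independently.
func estimateGPULayers(layerSizes []uint64, graphSize, freeVRAM uint64) int {
	used := graphSize // compute-graph overhead is resident regardless
	offloaded := 0
	for _, size := range layerSizes {
		if used+size > freeVRAM {
			break
		}
		used += size
		offloaded++
	}
	return offloaded
}
```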
Michael Yang
d338d70492 refactor model parsing 2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations (#3437) 2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md (#3436) 2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat (#3423)
Corrective Retrieval Augmented Generation Demo, powered by LangGraph and Streamlit 🤗

Supports:
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Yaroslav
e6fb39c182
Update README.md (#3378)
Plugins list updated
2024-03-31 13:10:05 -04:00
sugarforever
e1f1c374ea
Community Integration: ChatOllama (#3400)
* Community Integration: ChatOllama

* fixed typo
2024-03-30 22:46:50 -04:00
Jeffrey Morgan
06a1508bfe
Update 90_bug_report.yml 2024-03-29 10:11:17 -04:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Daniel Hiltgen
97ae517fbf
Merge pull request #3398 from dhiltgen/release_latest
CI automation for tagging latest images
2024-03-28 16:25:54 -07:00
Daniel Hiltgen
44b813e459
Merge pull request #3377 from dhiltgen/rocm_v6_bump
Bump ROCm to 6.0.2 patch release
2024-03-28 16:07:54 -07:00
Daniel Hiltgen
539043f5e0 CI automation for tagging latest images 2024-03-28 16:07:37 -07:00
Daniel Hiltgen
dbcace6847
Merge pull request #3392 from dhiltgen/ci_build_win_cuda
CI windows gpu builds
2024-03-28 16:03:52 -07:00
Daniel Hiltgen
c91a4ebcff Bump ROCm to 6.0.2 patch release 2024-03-28 15:58:57 -07:00
Daniel Hiltgen
b79c7e4528 CI windows gpu builds
If we're doing generate, test windows cuda and rocm as well
2024-03-28 14:39:10 -07:00
Michael Yang
035b274b70
Merge pull request #3379 from ollama/mxyng/origins
fix: trim quotes on OLLAMA_ORIGINS
2024-03-28 14:14:18 -07:00
Michael Yang
9c6a254945
Merge pull request #3391 from ollama/mxyng-patch-1 2024-03-28 13:15:56 -07:00
Michael Yang
f31f2bedf4
Update troubleshooting link 2024-03-28 12:05:26 -07:00
Michael Yang
756c257553
Merge pull request #3380 from ollama/mxyng/conditional-generate
fix: workflows
2024-03-28 00:35:27 +01:00
Michael Yang
5255d0af8a fix: workflows 2024-03-27 16:30:01 -07:00
Michael Yang
af8a8a6b59 fix: trim quotes on OLLAMA_ORIGINS 2024-03-27 15:24:29 -07:00
Michael Yang
461ad25015
Merge pull request #3376 from ollama/mxyng/conditional-generate
only generate on changes to llm subdirectory
2024-03-27 22:12:53 +01:00
Michael Yang
8838ae787d stub stub 2024-03-27 13:59:12 -07:00
Michael Yang
db75402ade mangle arch 2024-03-27 13:44:50 -07:00
Michael Yang
1e85a140a3 only generate on changes to llm subdirectory 2024-03-27 12:45:35 -07:00
Michael Yang
c363282fdc
Merge pull request #3375 from ollama/mxyng/conditional-generate
only generate cuda/rocm when changes to llm detected
2024-03-27 20:40:55 +01:00
Michael Yang
5b0c48d29e only generate cuda/rocm when changes to llm detected 2024-03-27 12:23:09 -07:00