ollama

Author	SHA1	Message	Date
Michael Yang	be517e491c	no rope parameters	2024-04-05 18:05:27 -07:00
Michael Yang	fc8e108642	Merge pull request #3496 from ollama/mxyng/cmd-r-graph add command-r graph estimate	2024-04-05 12:26:21 -07:00
Daniel Hiltgen	c5d5c4a96c	Merge pull request #3491 from dhiltgen/context_bust_test Add test case for context exhaustion	2024-04-04 16:20:20 -07:00
Daniel Hiltgen	dfe330fa1c	Merge pull request #3488 from mofanke/fix-windows-dll-compress fix dll compress in windows building	2024-04-04 16:12:13 -07:00
Michael Yang	01f77ae25d	add command-r graph estimate	2024-04-04 14:07:24 -07:00
Daniel Hiltgen	483b81a863	Merge pull request #3494 from dhiltgen/ci_release Fail fast if mingw missing on windows	2024-04-04 10:15:40 -07:00
Daniel Hiltgen	36bd967722	Fail fast if mingw missing on windows	2024-04-04 09:51:26 -07:00
Jeffrey Morgan	b0e7d35db8	use an older version of the mac os sdk in release (#3484)	2024-04-04 09:48:54 -07:00
Daniel Hiltgen	aeb1fb5192	Add test case for context exhaustion Confirmed this fails on 0.1.30 with known regression but passes on main	2024-04-04 07:42:17 -07:00
Daniel Hiltgen	a2e60ebcaf	Merge pull request #3490 from dhiltgen/ci_fixes CI missing archive	2024-04-04 07:24:24 -07:00
Daniel Hiltgen	883ec4d1ef	CI missing archive	2024-04-04 07:23:27 -07:00
mofanke	4de0126719	fix dll compress in windows building	2024-04-04 21:27:33 +08:00
Daniel Hiltgen	9768e2dc75	Merge pull request #3481 from dhiltgen/ci_fixes CI subprocess path fix	2024-04-03 19:29:09 -07:00
Daniel Hiltgen	08600d5bec	CI subprocess path fix	2024-04-03 19:12:53 -07:00
Daniel Hiltgen	a624e672d2	Merge pull request #3479 from dhiltgen/ci_fixes Fix CI release glitches	2024-04-03 18:42:27 -07:00
Daniel Hiltgen	e4a7e5b2ca	Fix CI release glitches The subprocess change moved the build directory arm64 builds weren't setting cross-compilation flags when building on x86	2024-04-03 16:41:40 -07:00
Michael Yang	a0a15cfd5b	Merge pull request #3463 from ollama/mxyng/graph-estimate update graph size estimate	2024-04-03 14:27:30 -07:00
Michael Yang	12e923e158	update graph size estimate	2024-04-03 13:34:12 -07:00
Jeffrey Morgan	cd135317d2	Fix macOS builds on older SDKs (#3467)	2024-04-03 10:45:54 -07:00
Michael Yang	4f895d633f	Merge pull request #3466 from ollama/mxyng/head-kv default head_kv to 1	2024-04-03 10:41:00 -07:00
Blake Mizerany	7d05a6ee8f	cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470) This also moves the checkServerHeartbeat call out of the "RunE" Cobra stuff (that's the only word I have for that) to on-site where it's after the check for OLLAMA_MODELS, which allows the helpful error message to be printed before the server heartbeat check. This also arguably makes the code more readable without the magic/superfluous "pre" function caller.	2024-04-02 22:11:13 -07:00
Daniel Hiltgen	464d817824	Merge pull request #3464 from dhiltgen/subprocess Fix numgpu opt miscomparison	2024-04-02 20:10:17 -07:00
Pier Francesco Contino	531324a9be	feat: add OLLAMA_DEBUG in ollama server help message (#3461) Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>	2024-04-02 18:20:03 -07:00
Daniel Hiltgen	6589eb8a8c	Revert options as a ref in the server	2024-04-02 16:44:10 -07:00
Michael Yang	90f071c658	default head_kv to 1	2024-04-02 16:37:59 -07:00
Michael Yang	a039e383cd	Merge pull request #3465 from ollama/mxyng/fix-metal fix metal gpu	2024-04-02 16:29:58 -07:00
Michael Yang	80163ebcb5	fix metal gpu	2024-04-02 16:06:45 -07:00
Daniel Hiltgen	a57818d93e	Merge pull request #3343 from dhiltgen/bump_more2 Bump llama.cpp to b2581	2024-04-02 15:08:26 -07:00
Daniel Hiltgen	841adda157	Fix windows lint CI flakiness	2024-04-02 12:22:16 -07:00
Daniel Hiltgen	0035e31af8	Bump to b2581	2024-04-02 11:53:07 -07:00
Daniel Hiltgen	c863c6a96d	Merge pull request #3218 from dhiltgen/subprocess Switch back to subprocessing for llama.cpp	2024-04-02 10:49:44 -07:00
Daniel Hiltgen	1f11b52511	Refined min memory from testing	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	526d4eb204	Release gpu discovery library after use Leaving the cudart library loaded kept ~30m of memory pinned in the GPU in the main process. This change ensures we don't hold GPU resources when idle.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	0a74cb31d5	Safeguard for noexec We may have users that run into problems with our current payload model, so this gives us an escape valve.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	10ed1b6292	Detect too-old cuda driver "cudart init failure: 35" isn't particularly helpful in the logs.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	4fec5816d6	Integration test improvements Cleaner shutdown logic, a bit of response hardening	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	0a0e9f3e0f	Apply 01-cache.diff	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Patrick Devine	3b6a9154dd	Simplify model conversion (#3422)	2024-04-01 16:14:53 -07:00
Michael Yang	d6dd2ff839	Merge pull request #3241 from ollama/mxyng/mem update memory estimations for gpu offloading	2024-04-01 13:59:14 -07:00
Michael Yang	e57a6ba89f	Merge pull request #2926 from ollama/mxyng/decode-ggml-v2 refactor model parsing	2024-04-01 13:58:13 -07:00
Michael Yang	12ec2346ef	Merge pull request #3442 from ollama/mxyng/generate-output fix generate output	2024-04-01 13:56:09 -07:00
Michael Yang	1ec0df1069	fix generate output	2024-04-01 13:47:34 -07:00
Michael Yang	91b3e4d282	update memory calculations count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Michael Yang	d338d70492	refactor model parsing	2024-04-01 13:16:15 -07:00
Philipp Gillé	011bb67351	Add chromem-go to community integrations (#3437)	2024-04-01 11:17:37 -04:00
Saifeddine ALOUI	d124627202	Update README.md (#3436)	2024-04-01 11:16:31 -04:00
Jesse Zhang	b0a8246a69	Community Integration: CRAG Ollama Chat (#3423) Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗 Support: - Ollama - OpenAI APIs	2024-04-01 11:16:14 -04:00
Yaroslav	e6fb39c182	Update README.md (#3378) Plugins list updated	2024-03-31 13:10:05 -04:00
sugarforever	e1f1c374ea	Community Integration: ChatOllama (#3400) * Community Integration: ChatOllama * fixed typo	2024-03-30 22:46:50 -04:00

2321 commits