ollama

Author	SHA1	Message	Date
Jeffrey Morgan	5ec12cec6c	update llama.cpp submodule to `1b67731` (#3561)	2024-04-09 15:10:17 -04:00
Michael Yang	d9578d2bad	Merge pull request #3559 from ollama/mxyng/ci ci: use go-version-file	2024-04-09 11:03:18 -07:00
Michael Yang	cb8352d6b4	ci: use go-version-file	2024-04-09 09:50:12 -07:00
Alex Mavrogiannis	fc6558f47f	Correct directory reference in macapp/README (#3555)	2024-04-09 09:48:46 -04:00
Michael Yang	9502e5661f	cgo quantize	2024-04-08 15:31:08 -07:00
Michael Yang	e1c9a2a00f	no blob create if already exists	2024-04-08 15:09:48 -07:00
writinwaters	1341ee1b56	Update README.md (#3539) RAGFlow now supports integration with Ollama.	2024-04-08 10:58:14 -04:00
Jeffrey Morgan	63efa075a0	update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)	2024-04-07 19:29:51 -04:00
Thomas Vitale	cb03fc9571	Docs: Remove wrong parameter for Chat Completion (#3515) Fixes gh-3514 Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>	2024-04-06 09:08:35 -07:00
Michael Yang	a5ec9cfc0f	Merge pull request #3508 from ollama/mxyng/rope	2024-04-05 18:46:06 -07:00
Michael Yang	be517e491c	no rope parameters	2024-04-05 18:05:27 -07:00
Michael Yang	fc8e108642	Merge pull request #3496 from ollama/mxyng/cmd-r-graph add command-r graph estimate	2024-04-05 12:26:21 -07:00
Daniel Hiltgen	c5d5c4a96c	Merge pull request #3491 from dhiltgen/context_bust_test Add test case for context exhaustion	2024-04-04 16:20:20 -07:00
Daniel Hiltgen	dfe330fa1c	Merge pull request #3488 from mofanke/fix-windows-dll-compress fix dll compress in windows building	2024-04-04 16:12:13 -07:00
Michael Yang	01f77ae25d	add command-r graph estimate	2024-04-04 14:07:24 -07:00
Daniel Hiltgen	483b81a863	Merge pull request #3494 from dhiltgen/ci_release Fail fast if mingw missing on windows	2024-04-04 10:15:40 -07:00
Daniel Hiltgen	36bd967722	Fail fast if mingw missing on windows	2024-04-04 09:51:26 -07:00
Jeffrey Morgan	b0e7d35db8	use an older version of the mac os sdk in release (#3484)	2024-04-04 09:48:54 -07:00
Daniel Hiltgen	aeb1fb5192	Add test case for context exhaustion Confirmed this fails on 0.1.30 with known regression but passes on main	2024-04-04 07:42:17 -07:00
Daniel Hiltgen	a2e60ebcaf	Merge pull request #3490 from dhiltgen/ci_fixes CI missing archive	2024-04-04 07:24:24 -07:00
Daniel Hiltgen	883ec4d1ef	CI missing archive	2024-04-04 07:23:27 -07:00
mofanke	4de0126719	fix dll compress in windows building	2024-04-04 21:27:33 +08:00
Daniel Hiltgen	9768e2dc75	Merge pull request #3481 from dhiltgen/ci_fixes CI subprocess path fix	2024-04-03 19:29:09 -07:00
Daniel Hiltgen	08600d5bec	CI subprocess path fix	2024-04-03 19:12:53 -07:00
Daniel Hiltgen	a624e672d2	Merge pull request #3479 from dhiltgen/ci_fixes Fix CI release glitches	2024-04-03 18:42:27 -07:00
Daniel Hiltgen	e4a7e5b2ca	Fix CI release glitches The subprocess change moved the build directory arm64 builds weren't setting cross-compilation flags when building on x86	2024-04-03 16:41:40 -07:00
Michael Yang	a0a15cfd5b	Merge pull request #3463 from ollama/mxyng/graph-estimate update graph size estimate	2024-04-03 14:27:30 -07:00
Michael Yang	12e923e158	update graph size estimate	2024-04-03 13:34:12 -07:00
Jeffrey Morgan	cd135317d2	Fix macOS builds on older SDKs (#3467)	2024-04-03 10:45:54 -07:00
Michael Yang	4f895d633f	Merge pull request #3466 from ollama/mxyng/head-kv default head_kv to 1	2024-04-03 10:41:00 -07:00
Blake Mizerany	7d05a6ee8f	cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470) This also moves the checkServerHeartbeat call out of the "RunE" Cobra stuff (that's the only word I have for that) to on-site where it's after the check for OLLAMA_MODELS, which allows the helpful error message to be printed before the server heartbeat check. This also arguably makes the code more readable without the magic/superfluous "pre" function caller.	2024-04-02 22:11:13 -07:00
Daniel Hiltgen	464d817824	Merge pull request #3464 from dhiltgen/subprocess Fix numgpu opt miscomparison	2024-04-02 20:10:17 -07:00
Pier Francesco Contino	531324a9be	feat: add OLLAMA_DEBUG in ollama server help message (#3461) Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>	2024-04-02 18:20:03 -07:00
Daniel Hiltgen	6589eb8a8c	Revert options as a ref in the server	2024-04-02 16:44:10 -07:00
Michael Yang	90f071c658	default head_kv to 1	2024-04-02 16:37:59 -07:00
Michael Yang	a039e383cd	Merge pull request #3465 from ollama/mxyng/fix-metal fix metal gpu	2024-04-02 16:29:58 -07:00
Michael Yang	80163ebcb5	fix metal gpu	2024-04-02 16:06:45 -07:00
Daniel Hiltgen	a57818d93e	Merge pull request #3343 from dhiltgen/bump_more2 Bump llama.cpp to b2581	2024-04-02 15:08:26 -07:00
Daniel Hiltgen	841adda157	Fix windows lint CI flakiness	2024-04-02 12:22:16 -07:00
Daniel Hiltgen	0035e31af8	Bump to b2581	2024-04-02 11:53:07 -07:00
Daniel Hiltgen	c863c6a96d	Merge pull request #3218 from dhiltgen/subprocess Switch back to subprocessing for llama.cpp	2024-04-02 10:49:44 -07:00
Daniel Hiltgen	1f11b52511	Refined min memory from testing	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	526d4eb204	Release gpu discovery library after use Leaving the cudart library loaded kept ~30m of memory pinned in the GPU in the main process. This change ensures we don't hold GPU resources when idle.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	0a74cb31d5	Safeguard for noexec We may have users that run into problems with our current payload model, so this gives us an escape valve.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	10ed1b6292	Detect too-old cuda driver "cudart init failure: 35" isn't particularly helpful in the logs.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	4fec5816d6	Integration test improvements Cleaner shutdown logic, a bit of response hardening	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	0a0e9f3e0f	Apply 01-cache.diff	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Patrick Devine	3b6a9154dd	Simplify model conversion (#3422)	2024-04-01 16:14:53 -07:00
Michael Yang	d6dd2ff839	Merge pull request #3241 from ollama/mxyng/mem update memory estimations for gpu offloading	2024-04-01 13:59:14 -07:00

2481 commits