ollama

Author	SHA1	Message	Date
Jeffrey Morgan	5ec12cec6c	update llama.cpp submodule to `1b67731` (#3561)	2024-04-09 15:10:17 -04:00
Jeffrey Morgan	63efa075a0	update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)	2024-04-07 19:29:51 -04:00
Michael Yang	be517e491c	no rope parameters	2024-04-05 18:05:27 -07:00
Michael Yang	fc8e108642	Merge pull request #3496 from ollama/mxyng/cmd-r-graph: add command-r graph estimate	2024-04-05 12:26:21 -07:00
Daniel Hiltgen	dfe330fa1c	Merge pull request #3488 from mofanke/fix-windows-dll-compress: fix dll compress in windows building	2024-04-04 16:12:13 -07:00
Michael Yang	01f77ae25d	add command-r graph estimate	2024-04-04 14:07:24 -07:00
Daniel Hiltgen	36bd967722	Fail fast if mingw missing on windows	2024-04-04 09:51:26 -07:00
mofanke	4de0126719	fix dll compress in windows building	2024-04-04 21:27:33 +08:00
Daniel Hiltgen	e4a7e5b2ca	Fix CI release glitches. The subprocess change moved the build directory. arm64 builds weren't setting cross-compilation flags when building on x86.	2024-04-03 16:41:40 -07:00
Michael Yang	12e923e158	update graph size estimate	2024-04-03 13:34:12 -07:00
Jeffrey Morgan	cd135317d2	Fix macOS builds on older SDKs (#3467)	2024-04-03 10:45:54 -07:00
Michael Yang	4f895d633f	Merge pull request #3466 from ollama/mxyng/head-kv: default head_kv to 1	2024-04-03 10:41:00 -07:00
Daniel Hiltgen	464d817824	Merge pull request #3464 from dhiltgen/subprocess: Fix numgpu opt miscomparison	2024-04-02 20:10:17 -07:00
Daniel Hiltgen	6589eb8a8c	Revert options as a ref in the server	2024-04-02 16:44:10 -07:00
Michael Yang	90f071c658	default head_kv to 1	2024-04-02 16:37:59 -07:00
Michael Yang	80163ebcb5	fix metal gpu	2024-04-02 16:06:45 -07:00
Daniel Hiltgen	0035e31af8	Bump to b2581	2024-04-02 11:53:07 -07:00
Daniel Hiltgen	0a0e9f3e0f	Apply 01-cache.diff	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp. This should resolve a number of memory leak and stability defects by isolating llama.cpp in a separate process that can be shut down when idle and gracefully restarted if it has problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Michael Yang	91b3e4d282	update memory calculations: count each layer independently when deciding GPU offloading	2024-04-01 13:16:32 -07:00
Michael Yang	d338d70492	refactor model parsing	2024-04-01 13:16:15 -07:00
Patrick Devine	5a5efee46b	Add gemma safetensors conversion (#3250). Co-authored-by: Michael Yang <mxyng@pm.me>	2024-03-28 18:54:01 -07:00
Jeffrey Morgan	f5ca7f8c8e	add license in file header for vendored llama.cpp code (#3351)	2024-03-26 16:23:23 -04:00
Jeffrey Morgan	856b8ec131	remove need for `$VSINSTALLDIR` since build will fail if `ninja` cannot be found (#3350)	2024-03-26 16:23:16 -04:00
Patrick Devine	1b272d5bcd	change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)	2024-03-26 13:04:17 -07:00
Daniel Hiltgen	8091ef2eeb	Bump llama.cpp to b2527	2024-03-25 13:47:44 -07:00
Daniel Hiltgen	560be5e0b6	Merge pull request #3308 from dhiltgen/bump_more: Bump llama.cpp to b2510	2024-03-25 12:56:12 -07:00
Jeremy	dfc6721b20	add support for libcudart.so for CUDA devices (adds Jetson support)	2024-03-25 11:07:44 -04:00
Blake Mizerany	acfa2b9422	llm: prevent race appending to slice (#3320)	2024-03-24 11:35:54 -07:00
Daniel Hiltgen	3e30c75f3e	Bump llama.cpp to b2510	2024-03-23 19:55:56 +01:00
Daniel Hiltgen	43799532c1	Bump llama.cpp to b2474, the release just before the ggml-cuda.cu refactoring	2024-03-23 09:54:56 +01:00
Daniel Hiltgen	74788b487c	Better tmpdir cleanup. If expanding the runners fails, don't leave a corrupt/incomplete payloads dir. We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs and remove them as long as no process is still running.	2024-03-20 16:03:19 +01:00
Michael Yang	3c4ad0ecab	dyn global	2024-03-18 09:45:45 +01:00
Michael Yang	22f326464e	Merge pull request #3083 from ollama/mxyng/refactor-readseeker: refactor readseeker	2024-03-16 12:08:56 -07:00
Jeffrey Morgan	e95ffc7448	llama: remove server static assets (#3174)	2024-03-15 19:24:12 -07:00
Daniel Hiltgen	ab3456207b	Merge pull request #3028 from ollama/ci_release: CI release process	2024-03-15 16:40:54 -07:00
Daniel Hiltgen	6ad414f31e	Merge pull request #3086 from dhiltgen/import_server: Import server.cpp to retain llava support	2024-03-15 16:10:35 -07:00
Daniel Hiltgen	d4c10df2b0	Add Radeon gfx940-942 GPU support	2024-03-15 15:34:58 -07:00
Daniel Hiltgen	540f4af45f	Wire up more complete CI for releases. Flesh out our GitHub Actions CI so we can build official releases.	2024-03-15 12:37:36 -07:00
Blake Mizerany	6ce37e4d96	llm,readline: use errors.Is instead of simple == check (#3161). This fixes some brittle, simple equality checks to use errors.Is. Since go1.13, errors.Is is the idiomatic way to check for errors. Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-03-15 07:14:12 -07:00
Michael Yang	291c663865	fix: clip memory leak	2024-03-14 13:12:42 -07:00
Jeffrey Morgan	e72c567cfd	restore locale patch (#3091)	2024-03-12 22:08:13 -07:00
Bruce MacDonald	3e22611200	token repeat limit for prediction requests (#3080)	2024-03-12 22:08:25 -04:00
Bruce MacDonald	2f804068bd	warn when json format is expected but not mentioned in prompt (#3081)	2024-03-12 19:07:11 -04:00
Daniel Hiltgen	85129d3a32	Adapt our build for imported server.cpp	2024-03-12 14:57:15 -07:00
Daniel Hiltgen	9ac6440da3	Import server.cpp as of b2356	2024-03-12 13:58:06 -07:00
Michael Yang	0085297928	refactor readseeker	2024-03-12 12:54:18 -07:00
racerole	53c107e20e	chore: fix typo (#3073). Signed-off-by: racerole <jiangyifeng@outlook.com>	2024-03-12 14:09:22 -04:00
Bruce MacDonald	b80661e8c7	relay load model errors to the client (#3065)	2024-03-11 16:48:27 -04:00
Jeffrey Morgan	369eda65f5	update llama.cpp submodule to `ceca1ae` (#3064)	2024-03-11 12:57:48 -07:00

352 commits