Jeffrey Morgan
5ec12cec6c
update llama.cpp submodule to 1b67731
( #3561 )
2024-04-09 15:10:17 -04:00
Jeffrey Morgan
63efa075a0
update generate scripts with new LLAMA_CUDA
variable, set HIP_PLATFORM
to avoid compiler errors ( #3528 )
2024-04-07 19:29:51 -04:00
Michael Yang
be517e491c
no rope parameters
2024-04-05 18:05:27 -07:00
Michael Yang
fc8e108642
Merge pull request #3496 from ollama/mxyng/cmd-r-graph
...
add command-r graph estimate
2024-04-05 12:26:21 -07:00
Daniel Hiltgen
dfe330fa1c
Merge pull request #3488 from mofanke/fix-windows-dll-compress
...
fix dll compress in windows building
2024-04-04 16:12:13 -07:00
Michael Yang
01f77ae25d
add command-r graph estimate
2024-04-04 14:07:24 -07:00
Daniel Hiltgen
36bd967722
Fail fast if mingw missing on windows
2024-04-04 09:51:26 -07:00
mofanke
4de0126719
fix dll compress in windows building
2024-04-04 21:27:33 +08:00
Daniel Hiltgen
e4a7e5b2ca
Fix CI release glitches
...
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
2024-04-03 16:41:40 -07:00
Michael Yang
12e923e158
update graph size estimate
2024-04-03 13:34:12 -07:00
Jeffrey Morgan
cd135317d2
Fix macOS builds on older SDKs ( #3467 )
2024-04-03 10:45:54 -07:00
Michael Yang
4f895d633f
Merge pull request #3466 from ollama/mxyng/head-kv
...
default head_kv to 1
2024-04-03 10:41:00 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
...
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658
default head_kv to 1
2024-04-02 16:37:59 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Michael Yang
91b3e4d282
update memory calcualtions
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion ( #3250 )
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Jeffrey Morgan
f5ca7f8c8e
add license in file header for vendored llama.cpp code ( #3351 )
2024-03-26 16:23:23 -04:00
Jeffrey Morgan
856b8ec131
remove need for $VSINSTALLDIR
since build will fail if ninja
cannot be found ( #3350 )
2024-03-26 16:23:16 -04:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama
to github.com/ollama/ollama
( #3347 )
2024-03-26 13:04:17 -07:00
Daniel Hiltgen
8091ef2eeb
Bump llama.cpp to b2527
2024-03-25 13:47:44 -07:00
Daniel Hiltgen
560be5e0b6
Merge pull request #3308 from dhiltgen/bump_more
...
Bump llama.cpp to b2510
2024-03-25 12:56:12 -07:00
Jeremy
dfc6721b20
add support for libcudart.so for CUDA devices (adds Jetson support)
2024-03-25 11:07:44 -04:00
Blake Mizerany
acfa2b9422
llm: prevent race appending to slice ( #3320 )
2024-03-24 11:35:54 -07:00
Daniel Hiltgen
3e30c75f3e
Bump llama.cpp to b2510
2024-03-23 19:55:56 +01:00
Daniel Hiltgen
43799532c1
Bump llama.cpp to b2474
...
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Daniel Hiltgen
74788b487c
Better tmpdir cleanup
...
If expanding the runners fails, don't leave a corrupt/incomplete payloads dir
We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs
and remove this as long as there isn't still a process running.
2024-03-20 16:03:19 +01:00
Michael Yang
3c4ad0ecab
dyn global
2024-03-18 09:45:45 +01:00
Michael Yang
22f326464e
Merge pull request #3083 from ollama/mxyng/refactor-readseeker
...
refactor readseeker
2024-03-16 12:08:56 -07:00
Jeffrey Morgan
e95ffc7448
llama: remove server static assets ( #3174 )
2024-03-15 19:24:12 -07:00
Daniel Hiltgen
ab3456207b
Merge pull request #3028 from ollama/ci_release
...
CI release process
2024-03-15 16:40:54 -07:00
Daniel Hiltgen
6ad414f31e
Merge pull request #3086 from dhiltgen/import_server
...
Import server.cpp to retain llava support
2024-03-15 16:10:35 -07:00
Daniel Hiltgen
d4c10df2b0
Add Radeon gfx940-942 GPU support
2024-03-15 15:34:58 -07:00
Daniel Hiltgen
540f4af45f
Wire up more complete CI for releases
...
Flesh out our github actions CI so we can build official releaes.
2024-03-15 12:37:36 -07:00
Blake Mizerany
6ce37e4d96
llm,readline: use errors.Is instead of simple == check ( #3161 )
...
This fixes some brittle, simple equality checks to use errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-03-15 07:14:12 -07:00
Michael Yang
291c663865
fix: clip memory leak
2024-03-14 13:12:42 -07:00
Jeffrey Morgan
e72c567cfd
restore locale patch ( #3091 )
2024-03-12 22:08:13 -07:00
Bruce MacDonald
3e22611200
token repeat limit for prediction requests ( #3080 )
2024-03-12 22:08:25 -04:00
Bruce MacDonald
2f804068bd
warn when json format is expected but not mentioned in prompt ( #3081 )
2024-03-12 19:07:11 -04:00
Daniel Hiltgen
85129d3a32
Adapt our build for imported server.cpp
2024-03-12 14:57:15 -07:00
Daniel Hiltgen
9ac6440da3
Import server.cpp as of b2356
2024-03-12 13:58:06 -07:00
Michael Yang
0085297928
refactor readseeker
2024-03-12 12:54:18 -07:00
racerole
53c107e20e
chore: fix typo ( #3073 )
...
Signed-off-by: racerole <jiangyifeng@outlook.com>
2024-03-12 14:09:22 -04:00
Bruce MacDonald
b80661e8c7
relay load model errors to the client ( #3065 )
2024-03-11 16:48:27 -04:00
Jeffrey Morgan
369eda65f5
update llama.cpp submodule to ceca1ae
( #3064 )
2024-03-11 12:57:48 -07:00