Commit graph

357 commits

Author SHA1 Message Date
Daniel Hiltgen
c5ff443b9f Handle very slow model loads
During testing, we're seeing some models take over 3 minutes.
2024-04-09 16:35:10 -07:00
Blake Mizerany
1524f323a3
Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) 2024-04-09 15:57:45 -07:00
Blake Mizerany
fccf3eecaa
build.go: introduce a friendlier way to build Ollama (#3548)
This commit introduces a more friendly way to build Ollama dependencies
and the binary without abusing `go generate` and removing the
unnecessary extra steps it brings with it.

This script also provides nicer feedback to the user about what is
happening during the build process.

At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
2024-04-09 14:18:47 -07:00
Michael Yang
c77d45d836
Merge pull request #3506 from ollama/mxyng/quantize-redux
cgo quantize
2024-04-09 12:32:53 -07:00
Jeffrey Morgan
5ec12cec6c
update llama.cpp submodule to 1b67731 (#3561) 2024-04-09 15:10:17 -04:00
Michael Yang
9502e5661f cgo quantize 2024-04-08 15:31:08 -07:00
Jeffrey Morgan
63efa075a0
update generate scripts with new LLAMA_CUDA variable, set HIP_PLATFORM to avoid compiler errors (#3528) 2024-04-07 19:29:51 -04:00
Michael Yang
be517e491c no rope parameters 2024-04-05 18:05:27 -07:00
Michael Yang
fc8e108642
Merge pull request #3496 from ollama/mxyng/cmd-r-graph
add command-r graph estimate
2024-04-05 12:26:21 -07:00
Daniel Hiltgen
dfe330fa1c
Merge pull request #3488 from mofanke/fix-windows-dll-compress
fix dll compress in windows building
2024-04-04 16:12:13 -07:00
Michael Yang
01f77ae25d add command-r graph estimate 2024-04-04 14:07:24 -07:00
Daniel Hiltgen
36bd967722 Fail fast if mingw missing on windows 2024-04-04 09:51:26 -07:00
mofanke
4de0126719 fix dll compress in windows building 2024-04-04 21:27:33 +08:00
Daniel Hiltgen
e4a7e5b2ca Fix CI release glitches
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
2024-04-03 16:41:40 -07:00
Michael Yang
12e923e158 update graph size estimate 2024-04-03 13:34:12 -07:00
Jeffrey Morgan
cd135317d2
Fix macOS builds on older SDKs (#3467) 2024-04-03 10:45:54 -07:00
Michael Yang
4f895d633f
Merge pull request #3466 from ollama/mxyng/head-kv
default head_kv to 1
2024-04-03 10:41:00 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Daniel Hiltgen
6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658 default head_kv to 1 2024-04-02 16:37:59 -07:00
Michael Yang
80163ebcb5 fix metal gpu 2024-04-02 16:06:45 -07:00
Daniel Hiltgen
0035e31af8 Bump to b2581 2024-04-02 11:53:07 -07:00
Daniel Hiltgen
0a0e9f3e0f Apply 01-cache.diff 2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Michael Yang
91b3e4d282 update memory calcualtions
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492 refactor model parsing 2024-04-01 13:16:15 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Jeffrey Morgan
f5ca7f8c8e
add license in file header for vendored llama.cpp code (#3351) 2024-03-26 16:23:23 -04:00
Jeffrey Morgan
856b8ec131
remove need for $VSINSTALLDIR since build will fail if ninja cannot be found (#3350) 2024-03-26 16:23:16 -04:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Daniel Hiltgen
8091ef2eeb Bump llama.cpp to b2527 2024-03-25 13:47:44 -07:00
Daniel Hiltgen
560be5e0b6
Merge pull request #3308 from dhiltgen/bump_more
Bump llama.cpp to b2510
2024-03-25 12:56:12 -07:00
Jeremy
dfc6721b20 add support for libcudart.so for CUDA devices (adds Jetson support) 2024-03-25 11:07:44 -04:00
Blake Mizerany
acfa2b9422
llm: prevent race appending to slice (#3320) 2024-03-24 11:35:54 -07:00
Daniel Hiltgen
3e30c75f3e Bump llama.cpp to b2510 2024-03-23 19:55:56 +01:00
Daniel Hiltgen
43799532c1 Bump llama.cpp to b2474
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Daniel Hiltgen
74788b487c Better tmpdir cleanup
If expanding the runners fails, don't leave a corrupt/incomplete payloads dir
We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs
and remove this as long as there isn't still a process running.
2024-03-20 16:03:19 +01:00
Michael Yang
3c4ad0ecab dyn global 2024-03-18 09:45:45 +01:00
Michael Yang
22f326464e
Merge pull request #3083 from ollama/mxyng/refactor-readseeker
refactor readseeker
2024-03-16 12:08:56 -07:00
Jeffrey Morgan
e95ffc7448
llama: remove server static assets (#3174) 2024-03-15 19:24:12 -07:00
Daniel Hiltgen
ab3456207b
Merge pull request #3028 from ollama/ci_release
CI release process
2024-03-15 16:40:54 -07:00
Daniel Hiltgen
6ad414f31e
Merge pull request #3086 from dhiltgen/import_server
Import server.cpp to retain llava support
2024-03-15 16:10:35 -07:00
Daniel Hiltgen
d4c10df2b0 Add Radeon gfx940-942 GPU support 2024-03-15 15:34:58 -07:00
Daniel Hiltgen
540f4af45f Wire up more complete CI for releases
Flesh out our github actions CI so we can build official releaes.
2024-03-15 12:37:36 -07:00
Blake Mizerany
6ce37e4d96
llm,readline: use errors.Is instead of simple == check (#3161)
This fixes some brittle, simple equality checks to use errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-03-15 07:14:12 -07:00
Michael Yang
291c663865 fix: clip memory leak 2024-03-14 13:12:42 -07:00
Jeffrey Morgan
e72c567cfd
restore locale patch (#3091) 2024-03-12 22:08:13 -07:00
Bruce MacDonald
3e22611200
token repeat limit for prediction requests (#3080) 2024-03-12 22:08:25 -04:00
Bruce MacDonald
2f804068bd
warn when json format is expected but not mentioned in prompt (#3081) 2024-03-12 19:07:11 -04:00
Daniel Hiltgen
85129d3a32 Adapt our build for imported server.cpp 2024-03-12 14:57:15 -07:00