Daniel Hiltgen
05cd82ef94
Rename gpu package to discover ( #7143 )
Cleaning up go package naming
2024-10-16 17:45:00 -07:00
Daniel Hiltgen
69be940bf6
gpu: Group GPU Library sets by variant ( #6483 )
The recent CUDA variant changes uncovered a bug in ByLibrary,
which failed to group GPU types by their common variant.
2024-08-23 15:11:56 -07:00
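To illustrate the grouping this fix describes, here is a minimal sketch in Go. The GpuInfo/GpuInfoList types and field names are simplified stand-ins for the package's real ones; keying on library plus variant is the behavior the commit restores, not a copy of the actual implementation.

```go
package main

import "fmt"

// Simplified stand-ins for the package's richer types.
type GpuInfo struct {
	Library string // e.g. "cuda", "rocm"
	Variant string // e.g. "v11", "v12"
	ID      string
}

type GpuInfoList []GpuInfo

// ByLibrary groups devices so that GPUs sharing the same library *and*
// variant land in the same set; keying on the library name alone mixes
// incompatible variants, which is the failure mode described above.
func (l GpuInfoList) ByLibrary() []GpuInfoList {
	index := map[string]int{}
	var sets []GpuInfoList
	for _, g := range l {
		key := g.Library
		if g.Variant != "" {
			key += "/" + g.Variant
		}
		i, ok := index[key]
		if !ok {
			i = len(sets)
			index[key] = i
			sets = append(sets, GpuInfoList{})
		}
		sets[i] = append(sets[i], g)
	}
	return sets
}

func main() {
	gpus := GpuInfoList{
		{Library: "cuda", Variant: "v12", ID: "0"},
		{Library: "cuda", Variant: "v12", ID: "1"},
		{Library: "cuda", Variant: "v11", ID: "2"},
	}
	fmt.Println(len(gpus.ByLibrary())) // 2 sets: cuda/v12 and cuda/v11
}
```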
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Daniel Hiltgen
34b9db5afc
Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The defaults are
currently 1 concurrent request per model and only 1 loaded model at a
time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and
OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
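A minimal sketch of how those two environment variables might be read, assuming they are parsed as positive integers with a default of 1; the envInt helper is illustrative, and the real server applies its own validation and scheduling on top.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads an integer environment variable, falling back to a default
// when it is unset, malformed, or non-positive.
func envInt(name string, fallback int) int {
	if v := os.Getenv(name); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return fallback
}

func main() {
	// Defaults of 1 match the behavior described in the commit message.
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)
	fmt.Printf("parallel requests per model: %d, loaded models: %d\n", numParallel, maxLoaded)
}
```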
Daniel Hiltgen
de2fbdec99
Merge pull request #1819 from dhiltgen/multi_variant
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
2024-01-11 14:00:48 -08:00
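Choosing among CPU builds generally comes down to runtime feature detection; a hedged sketch using golang.org/x/sys/cpu, with illustrative variant names rather than the project's actual ones.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// pickCPUVariant chooses which CPU build of the LLM library to load,
// falling back to the plain build when neither AVX level is available.
// The variant names are illustrative, not the project's real artifact names.
func pickCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println("selected variant:", pickCPUVariant())
}
```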
Daniel Hiltgen
39928a42e8
Always dynamically load the llm server library
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform.
2024-01-11 08:42:47 -08:00
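Dynamic loading here means opening the shared library at runtime rather than linking it into the binary at build time; a minimal cgo sketch of the idea, with an illustrative library name and no claim about the project's actual loader.

```go
package main

/*
#cgo linux LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// loadServerLibrary opens a shared library at runtime instead of linking it
// at build time; the caller would then dlsym the entry points it needs.
func loadServerLibrary(path string) (unsafe.Pointer, error) {
	cPath := C.CString(path)
	defer C.free(unsafe.Pointer(cPath))

	handle := C.dlopen(cPath, C.RTLD_NOW)
	if handle == nil {
		return nil, fmt.Errorf("dlopen %s: %s", path, C.GoString(C.dlerror()))
	}
	return handle, nil
}

func main() {
	// Illustrative library name, not the project's real artifact.
	if _, err := loadServerLibrary("./libext_server.dylib"); err != nil {
		fmt.Println(err)
	}
}
```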
Fabian Preiß
3bc8b9832b
fix gpu_test.go Error (same type) uint64->uint32 ( #1921 )
2024-01-11 08:22:23 -05:00
Jeffrey Morgan
c336693f07
calculate overhead based on number of gpu devices ( #1875 )
2024-01-09 15:53:33 -05:00
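The idea is simply to scale reserved headroom with the device count rather than using one fixed figure; a back-of-the-envelope sketch with an illustrative constant, not the values the server actually reserves.

```go
package main

import "fmt"

// Illustrative figure only; the real overhead constant lives in the gpu package.
const perDeviceOverheadBytes uint64 = 512 * 1024 * 1024

// totalOverhead scales the reserved VRAM headroom with the number of visible
// devices, rather than reserving a single fixed amount regardless of GPU count.
func totalOverhead(numDevices int) uint64 {
	return uint64(numDevices) * perDeviceOverheadBytes
}

func main() {
	fmt.Println(totalOverhead(2)) // overhead for a two-GPU system
}
```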
Daniel Hiltgen
a2ad952440
Fix windows system memory lookup
This refines the gpu package error handling and fixes a bug with the
system memory lookup on windows.
2024-01-03 08:50:01 -08:00
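For reference, total and available physical memory on Windows can be queried through the Win32 call GlobalMemoryStatusEx; a minimal sketch that surfaces the error instead of silently returning zeros, structured differently from the gpu package's actual code.

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// memoryStatusEx mirrors the Win32 MEMORYSTATUSEX structure.
type memoryStatusEx struct {
	Length               uint32
	MemoryLoad           uint32
	TotalPhys            uint64
	AvailPhys            uint64
	TotalPageFile        uint64
	AvailPageFile        uint64
	TotalVirtual         uint64
	AvailVirtual         uint64
	AvailExtendedVirtual uint64
}

var (
	kernel32             = syscall.NewLazyDLL("kernel32.dll")
	globalMemoryStatusEx = kernel32.NewProc("GlobalMemoryStatusEx")
)

// systemMemory returns total and available physical memory in bytes,
// propagating an error instead of returning zeros on failure.
func systemMemory() (total, free uint64, err error) {
	var m memoryStatusEx
	m.Length = uint32(unsafe.Sizeof(m))
	r, _, callErr := globalMemoryStatusEx.Call(uintptr(unsafe.Pointer(&m)))
	if r == 0 {
		return 0, 0, fmt.Errorf("GlobalMemoryStatusEx failed: %w", callErr)
	}
	return m.TotalPhys, m.AvailPhys, nil
}

func main() {
	if total, free, err := systemMemory(); err != nil {
		fmt.Println(err)
	} else {
		fmt.Printf("total: %d bytes, free: %d bytes\n", total, free)
	}
}
```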
Daniel Hiltgen
d966b730ac
Switch windows build to fully dynamic
Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies and thus
doesn't require a special PATH.
2024-01-02 15:36:16 -08:00
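Avoiding a special PATH usually means resolving libraries relative to the executable at load time; a hedged sketch with an illustrative directory layout and DLL name.

```go
//go:build windows

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// loadRunnerDLL resolves the library relative to the executable instead of
// relying on PATH, so the base binary carries no special loader requirements.
// The "runners" directory and DLL name are illustrative.
func loadRunnerDLL(name string) (*syscall.DLL, error) {
	exe, err := os.Executable()
	if err != nil {
		return nil, err
	}
	full := filepath.Join(filepath.Dir(exe), "runners", name)
	return syscall.LoadDLL(full)
}

func main() {
	if _, err := loadRunnerDLL("ext_server.dll"); err != nil {
		fmt.Println(err)
	}
}
```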
Daniel Hiltgen
35934b2e05
Adapted ROCm support to cgo-based llama.cpp
2023-12-19 09:05:46 -08:00