Daniel Hiltgen
5e8ff556cb
Support forced spreading for multi GPU
...
Our default behavior today is to try to fit the model into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even when the model can fit into one. This exposes that behavior as a tunable.
2024-06-14 14:51:40 -07:00
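A minimal sketch of how such a tunable might gate the placement decision. The env var name `OLLAMA_SCHED_SPREAD` and all type/function names here are illustrative assumptions, not the actual implementation:

```go
package main

import (
	"fmt"
	"os"
)

// GpuInfo is a simplified stand-in for a discovered GPU (hypothetical shape).
type GpuInfo struct {
	ID        string
	FreeBytes uint64
}

// pickGPUs decides whether to place the whole model on one GPU or spread it.
// forceSpread mirrors the new tunable: when set, always use every GPU even
// if the model would fit on a single one.
func pickGPUs(gpus []GpuInfo, modelBytes uint64, forceSpread bool) []GpuInfo {
	if !forceSpread {
		// Default behavior: prefer the first single GPU with enough free space.
		for _, g := range gpus {
			if g.FreeBytes >= modelBytes {
				return []GpuInfo{g}
			}
		}
	}
	// Spread across all GPUs (forced, or no single GPU fits).
	return gpus
}

func main() {
	gpus := []GpuInfo{{"gpu0", 16 << 30}, {"gpu1", 16 << 30}}
	// Illustrative env var name; the real tunable may differ.
	force := os.Getenv("OLLAMA_SCHED_SPREAD") != ""
	fmt.Println(len(pickGPUs(gpus, 8<<30, force)))
}
```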
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
...
Still not complete; our prediction needs refinement to understand each discrete GPU's available space so we can see how many layers fit in each one. Since we can't split a single layer across multiple GPUs, we can't treat free space as one logical block.
2024-06-14 14:51:40 -07:00
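The point about not pooling free space can be shown with a small sketch. The fixed per-layer size and function names are hypothetical; real layer sizes vary per model:

```go
package main

import "fmt"

// layersThatFit counts how many whole layers fit on each GPU individually.
// Because a single layer cannot be split across GPUs, each GPU's free space
// must be considered separately rather than summed into one logical block.
func layersThatFit(freeBytes []uint64, layerBytes uint64) int {
	total := 0
	for _, free := range freeBytes {
		total += int(free / layerBytes) // whole layers only
	}
	return total
}

func main() {
	layer := uint64(3 << 30) // assume 3 GiB per layer (illustrative)
	// Two GPUs with 5 GiB free each: a pooled view says 10/3 = 3 layers,
	// but only 1 whole layer fits per GPU, so the true answer is 2.
	fmt.Println(layersThatFit([]uint64{5 << 30, 5 << 30}, layer)) // 2
}
```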
Daniel Hiltgen
206797bda4
Fix concurrency integration test to work locally
...
This worked remotely but wound up trying to spawn multiple servers locally, which doesn't work.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
43ed358f9a
Refine GPU discovery to bootstrap once
...
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
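A common Go pattern for this split is to bootstrap device enumeration exactly once and keep only the cheap free-memory refresh on the repeated path. A sketch under that assumption (not the actual ollama code):

```go
package main

import (
	"fmt"
	"sync"
)

type gpu struct {
	id   string
	free uint64
}

var (
	bootstrapOnce sync.Once
	devices       []gpu
	discoverCalls int
)

// discover performs the expensive one-time enumeration of GPUs.
func discover() []gpu {
	discoverCalls++
	return []gpu{{id: "gpu0"}, {id: "gpu1"}} // placeholder devices
}

// refreshFree updates only the free-memory figures (cheap, safe to call often).
func refreshFree(devs []gpu) {
	for i := range devs {
		devs[i].free = 8 << 30 // placeholder query
	}
}

// getGPUs bootstraps once, then refreshes free memory on every call.
func getGPUs() []gpu {
	bootstrapOnce.Do(func() { devices = discover() })
	refreshFree(devices)
	return devices
}

func main() {
	fmt.Println(len(getGPUs())) // discovery runs here
	fmt.Println(len(getGPUs())) // only the free-memory refresh repeats
}
```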
Daniel Hiltgen
b32ebb4f29
Use DRM driver for VRAM info for amd
...
The amdgpu driver's free-VRAM reporting omits usage by some other apps, so leverage the upstream DRM driver, which keeps better tabs on things.
2024-06-14 14:51:40 -07:00
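On Linux, the amdgpu DRM driver exposes VRAM counters via sysfs files such as `mem_info_vram_total` and `mem_info_vram_used`. A sketch of reading them (the function names are illustrative, and the exact files available can vary by kernel and card):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSysfsUint parses the decimal byte count found in sysfs files like
// mem_info_vram_total / mem_info_vram_used (a trailing newline is typical).
func parseSysfsUint(raw []byte) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
}

// drmFreeVRAM reads total and used VRAM for one card from the DRM sysfs
// interface, which accounts for allocations by all processes, and returns
// the free byte count.
func drmFreeVRAM(card string) (uint64, error) {
	base := "/sys/class/drm/" + card + "/device/"
	var vals [2]uint64
	for i, f := range []string{"mem_info_vram_total", "mem_info_vram_used"} {
		raw, err := os.ReadFile(base + f)
		if err != nil {
			return 0, err
		}
		v, err := parseSysfsUint(raw)
		if err != nil {
			return 0, err
		}
		vals[i] = v
	}
	return vals[0] - vals[1], nil
}

func main() {
	if free, err := drmFreeVRAM("card0"); err == nil {
		fmt.Printf("card0 free VRAM: %d bytes\n", free)
	}
}
```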
Daniel Hiltgen
fb9cdfa723
Fix server.cpp for the new cuda build macros
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675
Revert "Limit GPU lib search for now ( #4777 )"
...
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
Jeffrey Morgan
6b800aa7b7
openai: do not set temperature to 0 when setting seed ( #5045 )
2024-06-14 13:43:56 -07:00
Jeffrey Morgan
dd7c9ebeaf
server: longer timeout in TestRequests ( #5046 )
2024-06-14 09:48:25 -07:00
Patrick Devine
4dc7fb9525
update 40xx gpu compat matrix ( #5036 )
2024-06-13 17:10:33 -07:00
Daniel Hiltgen
c39761c552
Merge pull request #5032 from dhiltgen/actually_skip
...
Actually skip PhysX on windows
2024-06-13 13:26:09 -07:00
Daniel Hiltgen
aac367636d
Actually skip PhysX on windows
2024-06-13 13:17:19 -07:00
Michael Yang
15a687ae4b
Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
...
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
Michael Yang
d528e1af75
fix utf16 for multibyte runes
2024-06-13 13:07:42 -07:00
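The underlying issue can be demonstrated with the standard library (a sketch, not the parser code itself): runes outside the Basic Multilingual Plane occupy two UTF-16 code units, so code assuming one unit per rune corrupts them.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeRunes converts a string to UTF-16 code units. Runes above U+FFFF
// become a surrogate pair (two code units), which is exactly the multibyte
// case that needs special handling.
func encodeRunes(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	fmt.Println(len(encodeRunes("é")))  // 1 code unit: inside the BMP
	fmt.Println(len(encodeRunes("😀"))) // 2 code units: surrogate pair
}
```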
Michael Yang
cd234ce22c
parser: add test for multibyte runes
2024-06-13 13:07:42 -07:00
Patrick Devine
94618b2365
add OLLAMA_MODELS to envconfig ( #5029 )
2024-06-13 12:52:03 -07:00
Jeffrey Morgan
1fd236d177
server: remove jwt decoding error ( #5027 )
2024-06-13 11:21:15 -07:00
Michael Yang
e87fc7200d
Merge pull request #5025 from ollama/mxyng/revert-parser-scan
...
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
Michael Yang
20b9f8e6f4
Revert "proper utf16 support"
...
This reverts commit 66ab48772f, which broke UTF-8 scanning of multi-byte runes.
2024-06-13 10:22:16 -07:00
Patrick Devine
c69bc19e46
move OLLAMA_HOST to envconfig ( #5009 )
2024-06-12 18:48:16 -04:00
Michael Yang
bba5d177aa
Merge pull request #5004 from ollama/mxyng/fix-templates
...
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
...
Multiple templates may appear in a model if it is created from another model that 1) has an autodetected template and 2) defines a custom template.
2024-06-12 13:35:49 -07:00
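One way to resolve that conflict is to keep a single template and let the explicitly defined one win. A minimal sketch with hypothetical names, not the ollama API:

```go
package main

import "fmt"

// resolveTemplate keeps one template when a model carries both an
// autodetected template and an explicitly defined custom one: the custom
// template takes precedence.
func resolveTemplate(autodetected, custom string) string {
	if custom != "" {
		return custom
	}
	return autodetected
}

func main() {
	fmt.Println(resolveTemplate("{{ .Prompt }}", "[INST] {{ .Prompt }} [/INST]"))
}
```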
Michael Yang
217f60c3d9
Merge pull request #4987 from ollama/mxyng/revert-byte-order
...
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
2024-06-11 16:04:20 -07:00
Michael Yang
7bdcd1da94
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
...
This reverts commit f5f245cc15, reversing changes made to 94d37fdcae. The reverted change broke GGUF v2, which was incorrectly detected as big endian.
2024-06-11 15:56:17 -07:00
Jeffrey Morgan
ead259d877
llm: fix seed value not being applied to requests ( #4986 )
2024-06-11 14:24:41 -07:00
James Montgomery
2ff45d571d
Add Ollama-hpp to Community Libraries in README. ( #4983 )
2024-06-11 11:15:05 -07:00
jayson-cloude
157f09acdf
fix: "Skip searching for network devices"
...
On an Ubuntu 24.04 computer with VMware installed, the sudo lshw command gets stuck: "Network interfaces" is displayed indefinitely.
2024-06-11 16:11:35 +08:00
Michael Yang
0f3cf1d42e
Merge pull request #4715 from ollama/mxyng/utf16-parser
...
proper utf16 support
2024-06-10 11:41:29 -07:00
Michael Yang
5bc029c529
Merge pull request #4921 from ollama/mxyng/import-md
...
update import.md
2024-06-10 11:41:09 -07:00
Michael Yang
e9a9c6a8e8
Merge pull request #4965 from ollama/mxyng/skip-layer-remove
...
fix: skip removing layers that no longer exist
2024-06-10 11:40:03 -07:00
Michael Yang
515f497e6d
fix: skip removing layers that no longer exist
2024-06-10 11:32:19 -07:00
Michael Yang
b27268aaef
add test
2024-06-10 11:32:15 -07:00
Michael Yang
f5f245cc15
Merge pull request #4938 from ollama/mxyng/fix-byte-order
...
fix parsing big endian gguf
2024-06-10 09:38:12 -07:00
Jim Scardelis
94d37fdcae
fix: examples/langchain-python-rag-privategpt/requirements.txt ( #3382 )
2024-06-09 10:58:09 -07:00
Craig Hughes
b84aea1685
Critical fix from the llama.cpp JSON grammar to forbid unescaped escape characters inside strings, which break parsing. ( #3782 )
2024-06-09 10:57:09 -07:00
Napuh
896495de7b
Add instructions to easily install specific versions on faq.md ( #4084 )
...
* Added instructions to easily install specific versions on faq.md
* Small typo
* Moved instructions on how to install specific version to linux.md
* Update docs/linux.md
* Update docs/linux.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-09 10:49:03 -07:00
dcasota
5528dd9d11
Error handling load_single_document() in ingest.py ( #4852 )
...
load_single_document() handles:
- corrupt files
- empty (zero byte) files
- unsupported file extensions
2024-06-09 10:41:07 -07:00
Jeffrey Morgan
943172cbf4
Update api.md
2024-06-08 23:04:32 -07:00
Nischal Jain
85169e8d6f
Added headless-ollama ( #4612 )
2024-06-08 18:51:16 -07:00
Jeffrey Morgan
34f142797a
llm: always add bos token to prompt ( #4941 )
...
* fix embedding by adding fixes from llama.cpp upstream
* remove assert
---------
Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
2024-06-08 18:47:10 -07:00
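The invariant this enforces can be sketched in a few lines. Token IDs and the function name here are illustrative; the real BOS ID comes from the model vocabulary:

```go
package main

import "fmt"

// ensureBOS prepends the BOS (beginning-of-sequence) token if it is not
// already the first token, so the prompt always starts with it.
func ensureBOS(tokens []int, bos int) []int {
	if len(tokens) == 0 || tokens[0] != bos {
		return append([]int{bos}, tokens...)
	}
	return tokens
}

func main() {
	fmt.Println(ensureBOS([]int{5, 6}, 1)) // BOS added
	fmt.Println(ensureBOS([]int{1, 5}, 1)) // already present, unchanged
}
```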
Erhan
46a7f1e74a
Update README.md with LangChainRust ( #4854 )
2024-06-08 17:29:36 -07:00
Michael Yang
620d5c569e
fix parsing big endian gguf
2024-06-08 12:35:26 -07:00
Michael Yang
b9ce7bf75e
update import.md
2024-06-07 16:45:15 -07:00
Daniel Hiltgen
cddc63381c
Merge pull request #4909 from dhiltgen/oneapi_disable
...
Add ability to skip oneapi generate
2024-06-07 14:07:15 -07:00
Michael Yang
385a32ecb5
Merge pull request #4910 from ollama/mxyng/detect-chat-template
...
fix create model when template detection errors
2024-06-07 11:07:39 -07:00
Michael Yang
030e765e76
fix create model when template detection errors
2024-06-07 10:51:35 -07:00
Daniel Hiltgen
ab8c929e20
Add ability to skip oneapi generate
...
This follows the same pattern as cuda and rocm, allowing the build to be disabled even when we detect the dependent libraries.
2024-06-07 08:32:49 -07:00
Jeffrey Morgan
ce0dc33cb8
llm: patch to fix qwen 2 temporarily on nvidia ( #4897 )
2024-06-06 23:14:33 -07:00
Michael Yang
78f81fc0e5
Merge pull request #4800 from ollama/mxyng/detect-chat-template
...
detect chat template from KV
2024-06-06 16:17:18 -07:00
Michael Yang
9b6c2e6eb6
detect chat template from KV
2024-06-06 16:03:47 -07:00