Blake Mizerany
c8af3c2d96
server: reuse original download URL for images ( #5962 )
...
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Jeffrey Morgan
455e61170d
Update openai.md
2024-07-25 18:34:47 -04:00
royjhan
4de1370a9d
openai tools doc ( #5617 )
2024-07-25 18:34:06 -04:00
Jeffrey Morgan
bbf8f102ee
Revert "llm(llama): pass rope factors ( #5924 )" ( #5963 )
...
This reverts commit bb46bbcf5e.
2024-07-25 18:24:55 -04:00
Daniel Hiltgen
ce3c93b08f
Report better error on cuda unsupported os/arch
...
If we detect an NVIDIA GPU, but NVIDIA doesn't support the os/arch,
this will report a better error for the user and point them to docs
to self-install the drivers if possible.
2024-07-24 17:09:20 -07:00
Daniel Hiltgen
6c2129d5d0
Explain font problems on windows 10
2024-07-24 15:22:00 -07:00
Daniel Hiltgen
7c2a157ca4
Ensure amd gpu nodes are numerically sorted
...
For systems that enumerate over 10 CPUs the default lexicographical
sort order interleaves CPUs and GPUs.
2024-07-24 13:43:26 -07:00
Michael Yang
bb46bbcf5e
llm(llama): pass rope factors ( #5924 )
2024-07-24 16:05:59 -04:00
royjhan
ac33aa7d37
Fix Embed Test Flakes ( #5893 )
...
* float cmp
* increase tolerance
2024-07-24 11:15:46 -07:00
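
The two bullets above point at a common cause of flaky numeric tests: comparing floats for exact equality. A minimal sketch of a tolerance-based comparison follows. The helper name `approxEqual` and the sample values are assumptions for illustration, not the code from the actual test.

```go
package main

import (
	"fmt"
	"math"
)

// approxEqual reports whether two vectors have the same length and every pair
// of elements differs by at most tol.
func approxEqual(a, b []float32, tol float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Abs(float64(a[i])-float64(b[i])) > tol {
			return false
		}
	}
	return true
}

func main() {
	want := []float32{0.5, 0.25}
	got := []float32{0.5001, 0.25}
	fmt.Println(approxEqual(want, got, 1e-3)) // loose tolerance
	fmt.Println(approxEqual(want, got, 1e-5)) // tight tolerance
}
```

Raising the tolerance trades strictness for stability, so the value should stay well below any difference that would indicate a real regression.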
Daniel Hiltgen
830fdd2715
Better explain multi-gpu behavior
2024-07-23 15:16:38 -07:00
Ajay Chintala
a6cd8f6169
Update README.md to add LLMStack integration ( #5799 )
2024-07-23 14:40:23 -04:00
Daniel Hiltgen
c78089263a
Merge pull request #5864 from dhiltgen/bump_go
...
Bump Go patch version
2024-07-22 16:34:18 -07:00
Daniel Hiltgen
3e5ea035d5
Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities
...
bump go version to 1.22.5 to fix security vulnerabilities in docker
2024-07-22 16:32:43 -07:00
Daniel Hiltgen
5d604eec5b
Bump Go patch version
2024-07-22 16:16:28 -07:00
Josh
db0968f30c
fix dupe err message ( #5857 )
2024-07-22 15:48:15 -07:00
Daniel Hiltgen
e12fff8810
Enable windows error dialog for subprocess startup
...
Make sure if something goes wrong spawning the process, the user gets
enough info to be able to try to self correct, or at least file a bug
with details so we can fix it. Once the process starts, we immediately
change back to the recommended setting to prevent the blocking dialog.
This ensures if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr
of the subprocess for the reason to report via API.
2024-07-22 14:07:27 -07:00
Michael Yang
9b60a038e5
update api.md
2024-07-22 13:49:51 -07:00
Michael Yang
83a0cb8d88
docs
2024-07-22 13:38:09 -07:00
royjhan
c0648233f2
api embed docs ( #5282 )
2024-07-22 13:37:08 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim for mistral ( #5818 )
2024-07-22 16:16:22 -04:00
Michael Yang
85d9d73a72
comments
2024-07-22 11:49:03 -07:00
Michael Yang
78140a712c
cleanup tests
2024-07-22 11:49:03 -07:00
Michael Yang
1954ec5917
uint64
2024-07-22 11:49:02 -07:00
Michael Yang
0f1910129f
int
2024-07-22 11:30:07 -07:00
Michael Yang
e2c3f6b3e2
string
2024-07-22 11:27:52 -07:00
Michael Yang
8570c1c0ef
keepalive
2024-07-22 11:27:22 -07:00
Michael Yang
55cd3ddcca
bool
2024-07-22 11:27:21 -07:00
Michael Yang
66fe77f084
models
2024-07-22 11:26:12 -07:00
Michael Yang
d1a5227cad
origins
2024-07-22 11:25:30 -07:00
Michael Yang
4f1afd575d
host
2024-07-22 11:25:30 -07:00
Michael Yang
35b89b2eab
rfc: dynamic environ lookup
2024-07-22 11:25:30 -07:00
Daniel Hiltgen
5784c05397
Merge pull request #5854 from dhiltgen/win_exit_status
...
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Daniel Hiltgen
f14aa5435d
Merge pull request #5855 from dhiltgen/remove_max_vram
...
Remove no longer supported max vram var
2024-07-22 10:35:29 -07:00
Jeffrey Morgan
f8fedbda20
Update llama.cpp submodule commit to d94c6e0c ( #5805 )
2024-07-22 12:42:00 -04:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing ( #5824 )
2024-07-22 12:38:03 -04:00
Daniel Hiltgen
cc269ba094
Remove no longer supported max vram var
...
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios. With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups. Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
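
As a rough illustration of the `num_gpu` alternative mentioned above, the sketch below builds a JSON request body that passes `num_gpu` under `options`. It assumes the request shape of Ollama's generate endpoint (`model`, `prompt`, `options`). The struct names, the model name, and the value 20 are hypothetical, and the right value depends on the model and hardware.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// options carries per-request runtime settings; only num_gpu is shown here.
type options struct {
	NumGPU int `json:"num_gpu"`
}

// generateRequest is a trimmed-down, hypothetical view of a generate request.
type generateRequest struct {
	Model   string  `json:"model"`
	Prompt  string  `json:"prompt"`
	Options options `json:"options"`
}

// buildRequest marshals a request that caps GPU offload via num_gpu.
func buildRequest(model, prompt string, numGPU int) ([]byte, error) {
	return json.Marshal(generateRequest{
		Model:   model,
		Prompt:  prompt,
		Options: options{NumGPU: numGPU},
	})
}

func main() {
	body, err := buildRequest("llama3", "Why is the sky blue?", 20)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```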
Daniel Hiltgen
a3c20e3f18
Refine error reporting for subprocess crash
...
On windows, the exit status winds up being the search term many
users search for and end up piling in on issues that are unrelated.
This refines the reporting so that if we have a more detailed message
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
Jeffrey Morgan
80ee9b5e47
Remove out of space test temporarily ( #5825 )
2024-07-21 00:22:11 -04:00
Jeffrey Morgan
5534f2cc6a
llm: consider head_dim in llama arch ( #5817 )
2024-07-20 21:48:12 -04:00
Daniel Hiltgen
d321297d8a
Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
...
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
...
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen
5d707e6fd5
Merge pull request #5583 from dhiltgen/integration_improvements
...
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00
Daniel Hiltgen
283948c83b
Adjust windows ROCm discovery
...
The v5 hip library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery. The
gfx906 cards are no longer supported, so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
Jeffrey Morgan
1475eab95f
add patch for tekken ( #5807 )
2024-07-20 13:41:21 -04:00
Jeffrey Morgan
20090f3172
preserve last assistant message ( #5802 )
2024-07-19 20:19:26 -07:00
Jeffrey Morgan
69a2d4ccff
Fix generate test flakyness ( #5804 )
2024-07-19 19:11:25 -07:00
Josh
e8b954c646
server: validate template ( #5734 )
...
add template validation to modelfile
2024-07-19 15:24:29 -07:00
royjhan
c57317cbf0
OpenAI: Function Based Testing ( #5752 )
...
* distinguish error forwarding
* more coverage
* rm comment
2024-07-19 11:37:12 -07:00
royjhan
51b2fd299c
adjust openai chat msg processing ( #5729 )
2024-07-19 11:19:20 -07:00
Michael Yang
d0634b1596
Merge pull request #5780 from ollama/mxyng/tools
...
fix parsing tool calls: break on unexpected eofs
2024-07-18 12:14:10 -07:00