Jeffrey Morgan
f8fedbda20
Update llama.cpp submodule commit to d94c6e0c
( #5805 )
2024-07-22 12:42:00 -04:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing ( #5824 )
2024-07-22 12:38:03 -04:00
Daniel Hiltgen
cc269ba094
Remove no longer supported max vram var
...
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios. With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups. Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Daniel Hiltgen
a3c20e3f18
Refine error reporting for subprocess crash
...
On windows, the exit status winds up being the search term many
users search for and end up piling in on issues that are unrelated.
This refines the reporting so that if we have a more detailed message
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
Jeffrey Morgan
80ee9b5e47
Remove out of space test temporarily ( #5825 )
2024-07-21 00:22:11 -04:00
Jeffrey Morgan
5534f2cc6a
llm: consider head_dim
in llama arch ( #5817 )
2024-07-20 21:48:12 -04:00
Daniel Hiltgen
d321297d8a
Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
...
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
...
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen
5d707e6fd5
Merge pull request #5583 from dhiltgen/integration_improvements
...
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00
Daniel Hiltgen
283948c83b
Adjust windows ROCm discovery
...
The v5 hip library returns unsupported GPUs which wont enumerate at
inference time in the runner so this makes sure we align discovery. The
gfx906 cards are no longer supported so we shouldn't compile with that
GPU type as it wont enumerate at runtime.
2024-07-20 15:17:50 -07:00
Jeffrey Morgan
1475eab95f
add patch for tekken ( #5807 )
2024-07-20 13:41:21 -04:00
Jeffrey Morgan
20090f3172
preserve last assistant message ( #5802 )
2024-07-19 20:19:26 -07:00
Jeffrey Morgan
69a2d4ccff
Fix generate test flakyness ( #5804 )
2024-07-19 19:11:25 -07:00
Josh
e8b954c646
server: validate template ( #5734 )
...
add template validation to modelfile
2024-07-19 15:24:29 -07:00
royjhan
c57317cbf0
OpenAI: Function Based Testing ( #5752 )
...
* distinguish error forwarding
* more coverage
* rm comment
2024-07-19 11:37:12 -07:00
royjhan
51b2fd299c
adjust openai chat msg processing ( #5729 )
2024-07-19 11:19:20 -07:00
Michael Yang
d0634b1596
Merge pull request #5780 from ollama/mxyng/tools
...
fix parsing tool calls: break on unexpected eofs
2024-07-18 12:14:10 -07:00
Michael Yang
43606d6d6a
fix parsing tool calls
2024-07-18 12:08:11 -07:00
Jeffrey Morgan
70b1010fa5
server: check for empty tools array too ( #5779 )
2024-07-18 11:44:57 -07:00
Jeffrey Morgan
84e5721f3a
always provide content even if empty ( #5778 )
2024-07-18 11:28:19 -07:00
Jeffrey Morgan
319fb1ce03
server: only parse tool calls if tools are provided ( #5771 )
...
* server: only parse tool calls if tools are provided
* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang
b255445557
marshal json automatically for some template values ( #5758 )
2024-07-17 15:35:11 -07:00
lreed
f02f83660c
bump go version to 1.22.5 to fix security vulnerabilities
2024-07-17 21:44:19 +00:00
Michael Yang
b23424bb3c
Merge pull request #5753 from ollama/mxyng/parse-tool-call
...
parse tool call as individual objects
2024-07-17 11:47:53 -07:00
Michael Yang
5fd6988126
parse tool call as individual objects
2024-07-17 11:19:04 -07:00
Michael Yang
5b82960df8
stub response ( #5750 )
2024-07-17 10:39:22 -07:00
Michael Yang
cc9a252d8c
Merge pull request #5732 from ollama/mxyng/cleanup
...
remove ToolCall from GenerateResponse
2024-07-17 10:26:54 -07:00
Pákozdi György
d281a6e603
add sidellama link ( #5702 )
2024-07-17 10:24:44 -07:00
royjhan
154f6f45d4
OpenAI: Support Tools ( #5614 )
...
* reopen pr
* tools
* remove tc from stream for now
* ID and Function
* openai expects arguments to be a string (#5739 )
* mutually exclusive content and tool calls
* clean up
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-16 20:52:59 -07:00
royjhan
0d41623b52
OpenAI: Add Suffix to v1/completions
( #5611 )
...
* add suffix
* remove todo
* remove TODO
* add to test
* rm outdated prompt tokens info md
* fix test
* fix test
2024-07-16 20:50:14 -07:00
Michael Yang
c279f96371
remove ToolCall from GenerateResponse
2024-07-16 15:22:49 -07:00
Michael Yang
499e87c9ba
Merge pull request #5730 from ollama/mxyng/cleanup
...
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang
cd0853f2d5
Merge pull request #5207 from ollama/mxyng/suffix
...
add insert support to generate endpoint
2024-07-16 14:37:32 -07:00
Michael Yang
d290e87513
add suffix support to generate endpoint
...
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Thorsten Sommer
97c20ede33
README: Added AI Studio to the list of UIs ( #5721 )
...
* Added AI Studio to the list of UIs
2024-07-16 14:24:27 -07:00
Michael Yang
5a83f79afd
remove unneeded tool calls
2024-07-16 13:48:45 -07:00
royjhan
987dbab0b0
OpenAI: /v1/embeddings compatibility ( #5285 )
...
* OpenAI v1 models
* Empty List Testing
* Add back envconfig
* v1/models docs
* Remove Docs
* OpenAI batch embed compatibility
* merge conflicts
* integrate with api/embed
* ep
* merge conflicts
* request tests
* rm resp test
* merge conflict
* merge conflict
* test fixes
* test fn renaming
* input validation for empty string
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Michael Yang
a8388beb94
Merge pull request #5726 from ollama/mxyng/tools-templates
...
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang
5afbb60fc4
fix unmarshal type errors
2024-07-16 11:39:34 -07:00
Jeffrey Morgan
4cb5d7decc
server: omit model system prompt if empty ( #5717 )
2024-07-16 11:09:00 -07:00
Michael Yang
8eac50dd4f
Merge pull request #5684 from ollama/mxyng/tests
...
add chat and generate tests with mock runner
2024-07-16 09:44:45 -07:00
Michael Yang
4a565cbf94
add chat and generate tests with mock runner
2024-07-16 09:39:31 -07:00
Michael Yang
64039df6d7
Merge pull request #5284 from ollama/mxyng/tools
...
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan
7ac6d462ec
server: return empty slice on empty /api/embed
request ( #5713 )
...
* server: return empty slice on empty `/api/embed` request
* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang
ef5136a745
tools test
2024-07-15 17:18:21 -07:00
Daniel Hiltgen
8288ec8824
Merge pull request #5710 from dhiltgen/rocm_bump
...
Bump linux ROCm to 6.1.2
2024-07-15 15:32:18 -07:00
Michael Yang
d02bbebb11
tools
2024-07-15 15:26:16 -07:00
Daniel Hiltgen
224337b32f
Bump linux ROCm to 6.1.2
2024-07-15 15:10:22 -07:00
Jeffrey Morgan
9e35d9bbee
server: lowercase roles for compatibility with clients ( #5695 )
2024-07-15 13:55:57 -07:00
royjhan
b9f5e16c80
Introduce /api/embed
endpoint supporting batch embedding ( #5127 )
...
* Initial Batch Embedding
* Revert "Initial Batch Embedding"
This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"
This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
2024-07-15 12:14:24 -07:00