Commit graph

665 commits

Author SHA1 Message Date
royjhan
1b44d873e7
Add Metrics to api\embed response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen
345420998e Prevent partial loading on mixed GPU brands
In mult-brand GPU setups, if we couldn't fully load the model we
would fall through the scheduler and mistakenly try to load across
a mix of brands.  This makes sure we find the set of GPU(s) that
best fit for the partial load.
2024-07-30 11:00:55 -07:00
Michael Yang
079b2c3b03
Merge pull request #5999 from ollama/mxyng/fix-push
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany
750c1c55f7
server: fix race conditions during download (#5994)
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.

This commit is mostly a hot-fix and will be replaced by a new client one
day soon.

Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang
a622c47bd3 fix nil deref in auth.go 2024-07-26 14:14:48 -07:00
Michael Yang
ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Blake Mizerany
c8af3c2d96
server: reuse original download URL for images (#5962)
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Josh
db0968f30c
fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Jeffrey Morgan
80ee9b5e47
Remove out of space test temporarily (#5825) 2024-07-21 00:22:11 -04:00
Daniel Hiltgen
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Jeffrey Morgan
69a2d4ccff
Fix generate test flakyness (#5804) 2024-07-19 19:11:25 -07:00
Josh
e8b954c646
server: validate template (#5734)
add template validation to modelfile
2024-07-19 15:24:29 -07:00
Michael Yang
43606d6d6a fix parsing tool calls 2024-07-18 12:08:11 -07:00
Jeffrey Morgan
70b1010fa5
server: check for empty tools array too (#5779) 2024-07-18 11:44:57 -07:00
Jeffrey Morgan
319fb1ce03
server: only parse tool calls if tools are provided (#5771)
* server: only parse tool calls if tools are provided

* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang
b255445557
marshal json automatically for some template values (#5758) 2024-07-17 15:35:11 -07:00
Michael Yang
5fd6988126 parse tool call as individual objects 2024-07-17 11:19:04 -07:00
Michael Yang
c279f96371 remove ToolCall from GenerateResponse 2024-07-16 15:22:49 -07:00
Michael Yang
499e87c9ba
Merge pull request #5730 from ollama/mxyng/cleanup
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang
d290e87513 add suffix support to generate endpoint
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Michael Yang
5a83f79afd remove unneeded tool calls 2024-07-16 13:48:45 -07:00
royjhan
987dbab0b0
OpenAI: /v1/embeddings compatibility (#5285)
* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Michael Yang
a8388beb94
Merge pull request #5726 from ollama/mxyng/tools-templates
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang
5afbb60fc4 fix unmarshal type errors 2024-07-16 11:39:34 -07:00
Jeffrey Morgan
4cb5d7decc
server: omit model system prompt if empty (#5717) 2024-07-16 11:09:00 -07:00
Michael Yang
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
Michael Yang
64039df6d7
Merge pull request #5284 from ollama/mxyng/tools
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan
7ac6d462ec
server: return empty slice on empty /api/embed request (#5713)
* server: return empty slice on empty `/api/embed` request

* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang
ef5136a745 tools test 2024-07-15 17:18:21 -07:00
Michael Yang
d02bbebb11 tools 2024-07-15 15:26:16 -07:00
royjhan
b9f5e16c80
Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
Patrick Devine
057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
jmorganca
f7ee012300 server: prepend system message in chat handler 2024-07-13 15:08:00 -07:00
Jeffrey Morgan
1ed0aa8fea
server: fix context, load_duration and total_duration fields (#5676)
* server: fix `contet`, `load_duration` and `total_duration` fields

* Update server/routes.go
2024-07-13 09:25:31 -07:00
Michael Yang
22c5451fc2
fix system prompt (#5662)
* fix system prompt

* execute template when hitting previous roles

* fix tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
Michael Yang
ebc529cbb3 autodetect stop parameters from template 2024-07-12 16:01:23 -07:00
Michael Yang
57ec6901eb revert embedded templates to use prompt/response
This reverts commit 19753c18c0.

for compat. messages will be added at a later date
2024-07-11 14:49:35 -07:00
Jeffrey Morgan
791650ddef
sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
Michael Yang
41be28096a add system prompt to first legacy template 2024-07-10 17:03:08 -07:00
Daniel Hiltgen
f4408219e9 Refine scheduler unit tests for reliability
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
Michael Yang
6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan
e4ff73297d
server: fix model reloads when setting OLLAMA_NUM_PARALLEL (#5560)
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`

* remove whitespace change

* undo some changes
2024-07-08 22:32:15 -07:00
Jeffrey Morgan
0ee87615c7
sched: don't error if paging to disk on Windows and macOS (#5523) 2024-07-06 22:01:52 -04:00
Michael Yang
fb6cbc02fb update named templates 2024-07-05 16:29:32 -07:00
Michael Yang
ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang
2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Daniel Hiltgen
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00