Bruce MacDonald
42998d797d
subprocess llama.cpp server ( #401 )
...
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Quinn Slack
f4432e1dba
treat stop as stop sequences, not exact tokens ( #442 )
...
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.
Fixes https://github.com/jmorganca/ollama/issues/295 .
2023-08-30 11:53:42 -04:00
Michael Yang
982c535428
Merge pull request #428 from jmorganca/mxyng/upload-chunks
...
update upload chunks
2023-08-30 07:47:17 -07:00
Michael Yang
7df342a6ea
Merge pull request #421 from jmorganca/mxyng/f16-metal
...
allow F16 to use metal
2023-08-29 06:32:59 -07:00
Patrick Devine
8bbff2df98
add model IDs ( #439 )
2023-08-28 20:50:24 -07:00
Michael Yang
16b06699fd
remove unused parameter
2023-08-28 18:35:18 -04:00
Michael Yang
246dc65417
loosen http status code checks
2023-08-28 18:34:53 -04:00
Michael Yang
865fceb73c
chunked pipe
2023-08-28 18:34:53 -04:00
Michael Yang
72266c7684
bump chunk size to 95MB
2023-08-28 18:34:53 -04:00
Jeffrey Morgan
d3b838ce60
update orca
to orca-mini
2023-08-27 13:26:30 -04:00
Michael Yang
e639a12fa1
Merge pull request #412 from jmorganca/mxyng/update-readme
...
update README.md
2023-08-26 21:26:34 -07:00
Michael Yang
e82fcf30c6
Merge pull request #420 from jmorganca/mxyng/34b-mem-check
...
add 34b to mem check
2023-08-26 14:15:52 -07:00
Michael Yang
495e8b0a6a
Merge pull request #426 from jmorganca/default-template
...
set default template
2023-08-26 14:15:38 -07:00
Michael Yang
59734ca24d
set default template
2023-08-26 12:20:48 -07:00
Jeffrey Morgan
22ab7f5f88
default host to 127.0.0.1
, fixes #424
2023-08-26 11:59:28 -07:00
Michael Yang
b25dd1795d
allow F16 to use metal
...
warning F16 uses significantly more memory than quantized model so the
standard requires don't apply.
2023-08-26 08:38:48 -07:00
Michael Yang
304f2b6c96
add 34b to mem check
2023-08-26 08:29:21 -07:00
Quinn Slack
2ecc3a33c3
delete all models (not just 1st) in ollama rm
( #415 )
...
Previously, `ollama rm model1 model2 modelN` would only delete `model1`. The other model command-line arguments would be silently ignored. Now, all models mentioned are deleted.
2023-08-26 00:47:56 -07:00
Jeffrey Morgan
ee6e1df118
add codellama
to model list in readme
2023-08-25 20:44:26 -07:00
Jeffrey Morgan
177b69a211
add missing entries for 34B
2023-08-25 18:35:35 -07:00
Michael Yang
dad63f0821
Merge pull request #411 from jmorganca/mxyng/34b
...
patch llama.cpp for 34B
2023-08-25 11:59:05 -07:00
Michael Yang
041f9ad1a1
update README.md
2023-08-25 11:44:25 -07:00
Michael Yang
7a378f8b66
patch llama.cpp for 34B
2023-08-25 10:06:55 -07:00
Michael Yang
de0bdd7f29
Merge pull request #405 from jmorganca/mxyng/34b
...
add 34b model type
2023-08-24 10:37:22 -07:00
Michael Yang
b1cececb8e
add 34b model type
2023-08-24 10:35:44 -07:00
Michael Yang
e0d39fa3bf
Merge pull request #398 from jmorganca/mxyng/cleanup
...
Mxyng/cleanup
2023-08-22 15:51:41 -07:00
Michael Yang
968ced2e71
Merge pull request #393 from jmorganca/mxyng/net-url
...
use url.URL
2023-08-22 15:51:33 -07:00
Michael Yang
32d1a00017
remove unused requestContextKey
2023-08-22 10:49:54 -07:00
Michael Yang
04e2128273
move upload funcs to upload.go
2023-08-22 10:49:53 -07:00
Michael Yang
2cc634689b
use url.URL
2023-08-22 10:49:07 -07:00
Michael Yang
8f827641b0
Merge pull request #397 from jmorganca/mxyng/release-mode
...
build release mode
2023-08-22 10:48:44 -07:00
Michael Yang
95187d7e1e
build release mode
2023-08-22 09:52:43 -07:00
Michael Yang
9ec7e37534
Merge pull request #392 from jmorganca/mxyng/version
...
add version
2023-08-22 09:50:25 -07:00
Michael Yang
2c7f956b38
add version
2023-08-22 09:40:58 -07:00
Jeffrey Morgan
a9f6c56652
fix FROM
instruction erroring when referring to a file
2023-08-22 09:39:42 -07:00
Ryan Baker
0a892419ad
Strip protocol from model path ( #377 )
2023-08-21 21:56:56 -07:00
Jeffrey Morgan
e3054fc74e
add .env
to .dockerignore
2023-08-21 09:32:02 -07:00
Michael Yang
23c2485044
Merge pull request #381 from jmorganca/mxyng/fix-push-chunks
...
retry on unauthorized chunk push
2023-08-18 13:49:25 -07:00
Michael Yang
386c66f285
Merge pull request #378 from jmorganca/mxyng/copy-metadata-from-source
...
copy metadata from source
2023-08-18 13:49:09 -07:00
Michael Yang
3b49315f97
retry on unauthorized chunk push
...
The token printed for authorized requests has a lifetime of 1h. If an
upload exceeds 1h, a chunk push will fail since the token is created on
a "start upload" request.
This replaces the Pipe with SectionReader which is simpler and
implements Seek, a requirement for makeRequestWithRetry. This is
slightly worse than using a Pipe since the progress update is directly
tied to the chunk size instead of controlled separately.
2023-08-18 11:23:47 -07:00
Michael Yang
5ca05c2e88
fix ModelType()
2023-08-18 11:23:38 -07:00
Michael Yang
7eda70f23b
copy metadata from source
2023-08-17 21:55:25 -07:00
Jeffrey Morgan
3d79b414d3
app: package ggml-metal.metal
from correct directory
2023-08-17 23:55:45 -04:00
Michael Yang
c84bbf1dd6
Merge pull request #376 from jmorganca/mxyng/from-map-ignore-nil
...
ignore nil map values
2023-08-17 15:57:12 -07:00
Michael Yang
f723bf0879
ignore nil map values
2023-08-17 15:50:46 -07:00
Michael Yang
cbf725a9ba
Merge pull request #375 from jmorganca/mxyng/fix-push
...
fix push manifest
2023-08-17 15:33:31 -07:00
Michael Yang
086449b6c7
fmt
2023-08-17 15:32:31 -07:00
Michael Yang
3cbc6a5c01
fix push manifest
2023-08-17 15:28:12 -07:00
Jeffrey Morgan
54bb49a502
parse protocol for OLLAMA_HOST
2023-08-17 18:20:44 -04:00
Michael Yang
cabaada956
Merge pull request #372 from jmorganca/mxyng/string-types
...
model and file type as strings
2023-08-17 15:10:59 -07:00