Quinn Slack
f4432e1dba
treat stop as stop sequences, not exact tokens ( #442 )
...
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.
Fixes https://github.com/jmorganca/ollama/issues/295 .
2023-08-30 11:53:42 -04:00
Michael Yang
7df342a6ea
Merge pull request #421 from jmorganca/mxyng/f16-metal
...
allow F16 to use metal
2023-08-29 06:32:59 -07:00
Michael Yang
e82fcf30c6
Merge pull request #420 from jmorganca/mxyng/34b-mem-check
...
add 34b to mem check
2023-08-26 14:15:52 -07:00
Michael Yang
b25dd1795d
allow F16 to use metal
...
warning F16 uses significantly more memory than quantized model so the
standard requires don't apply.
2023-08-26 08:38:48 -07:00
Michael Yang
304f2b6c96
add 34b to mem check
2023-08-26 08:29:21 -07:00
Jeffrey Morgan
177b69a211
add missing entries for 34B
2023-08-25 18:35:35 -07:00
Michael Yang
7a378f8b66
patch llama.cpp for 34B
2023-08-25 10:06:55 -07:00
Michael Yang
b1cececb8e
add 34b model type
2023-08-24 10:35:44 -07:00
Michael Yang
5ca05c2e88
fix ModelType()
2023-08-18 11:23:38 -07:00
Michael Yang
a894cc792d
model and file type as strings
2023-08-17 12:08:04 -07:00
Michael Yang
4dcf5c3e0b
Merge pull request #349 from jmorganca/close-files
...
close open files
2023-08-14 16:15:58 -07:00
Michael Yang
e26085b921
close open files
2023-08-14 16:08:06 -07:00
Michael Yang
f7b613332c
update llama.cpp
2023-08-14 15:47:00 -07:00
Bruce MacDonald
4b2d366c37
Update llama.go
2023-08-14 12:55:50 -03:00
Bruce MacDonald
56fd4e4ef2
log embedding eval timing
2023-08-14 12:51:31 -03:00
Jeffrey Morgan
22885aeaee
update llama.cpp
to f64d44a
2023-08-12 22:47:15 -04:00
Michael Yang
6ed991c8e2
ggml: fix off by one error
...
remove used Unknown FileType
2023-08-11 10:45:22 -07:00
Michael Yang
6de5d032e1
implement loading ggml lora adapters through the modelfile
2023-08-10 09:23:39 -07:00
Michael Yang
d791df75dd
check memory requirements before loading
2023-08-10 09:23:11 -07:00
Michael Yang
020a3b3530
disable gpu for q5_0, q5_1, q8_0 quants
2023-08-10 09:23:11 -07:00
Michael Yang
fccf8d179f
partial decode ggml bin for more info
2023-08-10 09:23:10 -07:00