ollama/llm
Jesse Gross 93ac3760cb runner: Flush pending responses before returning
If there are any pending responses (such as from potential stop
tokens), then we should send them back before ending the sequence.
Otherwise, we may be missing tokens at the end of a response.

Fixes #6707
2024-09-11 16:39:32 -07:00
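The fix described above amounts to draining any withheld pieces before the sequence is closed: pieces held back while checking for stop tokens must still reach the client if generation ends for another reason. A minimal sketch of the idea in Go, using hypothetical names (sequence, pendingResponses, flushPending) rather than the runner's actual types:

```go
package main

import "fmt"

// sequence models a generation stream that may hold back recently decoded
// pieces while checking whether they form part of a stop token.
// All names here are illustrative, not the runner's real types.
type sequence struct {
	pendingResponses []string    // pieces withheld during stop-token matching
	responses        chan string // pieces delivered to the client
}

// flushPending sends any withheld pieces before the sequence ends, so the
// final tokens of a response are not silently dropped.
func (s *sequence) flushPending() {
	for _, p := range s.pendingResponses {
		s.responses <- p
	}
	s.pendingResponses = s.pendingResponses[:0]
}

func main() {
	s := &sequence{
		pendingResponses: []string{" world", "!"},
		responses:        make(chan string, 8),
	}

	// Returning here without flushing would lose " world" and "!".
	s.flushPending()
	close(s.responses)

	for piece := range s.responses {
		fmt.Print(piece)
	}
	fmt.Println()
}
```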
ext_server runner: Flush pending responses before returning 2024-09-11 16:39:32 -07:00
generate llm: update llama.cpp commit to 8962422 (#6618) 2024-09-03 21:12:39 -04:00
llama.cpp@8962422b1c llm: update llama.cpp commit to 8962422 (#6618) 2024-09-03 21:12:39 -04:00
patches llm: update llama.cpp commit to 8962422 (#6618) 2024-09-03 21:12:39 -04:00
filetype.go Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL (#4322) 2024-05-23 13:21:49 -07:00
ggla.go update convert test to check result data 2024-07-31 10:59:38 -07:00
ggml.go Merge pull request #6260 from ollama/mxyng/mem 2024-09-05 13:22:08 -07:00
ggml_test.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
gguf.go add conversion for microsoft phi 3 mini/medium 4k, 128 2024-08-12 15:13:29 -07:00
llm.go lint 2024-08-01 17:06:06 -07:00
llm_darwin_amd64.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
llm_darwin_arm64.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
llm_linux.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
llm_windows.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
memory.go Improve logging on GPU too small (#6666) 2024-09-06 08:29:36 -07:00
memory_test.go llama3.1 2024-08-21 11:49:31 -07:00
payload.go Add Jetson cuda variants for arm 2024-08-19 09:38:53 -07:00
server.go Quiet down dockers new lint warnings (#6716) 2024-09-09 17:22:20 -07:00
status.go Catch one more error log 2024-08-05 09:28:07 -07:00