ollama/server
Blake Mizerany cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
Previously, a few costly operations made loading GGUF files, along with
their metadata and tensor information, VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for every read of every key and value, resulting in an
    excessive number of syscalls and amount of disk I/O.

The show API now takes 33ms, down from 800ms+, for llama3 on a MacBook Pro
M3.

This commit also makes it possible to skip collecting large arrays of
values when decoding GGUFs. When such keys are encountered, their values
are null and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
auth.go Revert "use post token" 2024-05-11 22:19:14 -07:00
download.go server: skip blob verification for already verified blobs 2024-06-05 16:39:11 -07:00
fixblobs.go server: replace blob prefix separator from ':' to '-' (#3146) 2024-03-14 20:18:06 -07:00
fixblobs_test.go server: replace blob prefix separator from ':' to '-' (#3146) 2024-03-14 20:18:06 -07:00
images.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
layer.go Merge pull request #3718 from ollama/mxyng/modelname-3 2024-05-29 12:02:07 -07:00
manifest.go fix: skip removing layers that no longer exist 2024-06-10 11:32:19 -07:00
manifest_test.go add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
model.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
modelpath.go add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
modelpath_test.go add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
prompt.go change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
prompt_test.go change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
routes.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
routes_create_test.go add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
routes_delete_test.go add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
routes_list_test.go add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
routes_test.go Extend api/show and ollama show to return more model info (#4881) 2024-06-19 14:19:02 -07:00
sched.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
sched_test.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
upload.go lint 2024-06-04 11:13:30 -07:00