Extern C Server

This directory contains a thin facade layered on top of the llama.cpp server. It exposes extern "C" interfaces so the server's functionality can be accessed through direct in-process API calls. The llama.cpp code uses compile-time macros to configure the GPU type along with other settings. For that reason, the go generate ./... step builds one or more copies of the llama.cpp extern "C" server, depending on which GPU libraries are detected, so that multiple GPU types as well as CPU-only execution are supported. The Ollama Go build then embeds these variants so that the one matching the available GPU and settings can be used at runtime.
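The facade pattern described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not Ollama's actual ext_server.h API: every name here (DemoServer, demo_server_init, demo_server_complete, demo_resp_t, and the DEMO_VARIANT_* macros) is hypothetical. It shows three ingredients: a C++ object hidden behind plain extern "C" functions, C-only types at the boundary, and a compile-time macro that determines which variant a given build of the library represents.

```cpp
// Hypothetical sketch of an extern "C" facade over a C++ server object.
// None of these names come from Ollama or llama.cpp.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <string>

namespace {

// Hypothetical C++ implementation hidden behind the C interface.
class DemoServer {
public:
    explicit DemoServer(int n_threads) : n_threads_(n_threads) {}
    // Placeholder for real inference: just echoes the prompt.
    std::string complete(const std::string& prompt) const { return prompt + " ..."; }
    int threads() const { return n_threads_; }
private:
    int n_threads_;
};

DemoServer* g_server = nullptr;  // single in-process instance

}  // namespace

// Compile-time variant selection: each copy of the library would be built
// with a different (hypothetical) macro defined, e.g. -DDEMO_VARIANT_CUDA.
#if defined(DEMO_VARIANT_CUDA)
#define DEMO_VARIANT "cuda"
#elif defined(DEMO_VARIANT_ROCM)
#define DEMO_VARIANT "rocm"
#else
#define DEMO_VARIANT "cpu"
#endif

extern "C" {

// Plain C error struct so callers (e.g. Go via cgo) need no C++ types.
typedef struct {
    int id;         // 0 on success, non-zero on error
    char msg[128];  // human-readable error message
} demo_resp_t;

// Reports which compile-time variant this build is.
const char* demo_server_variant(void) { return DEMO_VARIANT; }

void demo_server_init(int n_threads, demo_resp_t* err) {
    assert(err != nullptr);
    err->id = 0;
    err->msg[0] = '\0';
    if (g_server != nullptr) {
        err->id = 1;
        std::snprintf(err->msg, sizeof err->msg, "server already initialized");
        return;
    }
    try {
        g_server = new DemoServer(n_threads);
    } catch (const std::exception& e) {
        // C++ exceptions must not propagate across the C boundary.
        err->id = 2;
        std::snprintf(err->msg, sizeof err->msg, "%s", e.what());
    }
}

// Copies the (possibly truncated) completion into a caller-owned buffer and
// returns its full length, or -1 if the server has not been initialized.
int demo_server_complete(const char* prompt, char* out, std::size_t out_len) {
    if (g_server == nullptr) return -1;
    std::string result = g_server->complete(prompt);
    if (out != nullptr && out_len > 0) {
        std::snprintf(out, out_len, "%s", result.c_str());
    }
    return static_cast<int>(result.size());
}

// Destroys the instance; safe to call more than once.
void demo_server_stop(void) {
    delete g_server;
    g_server = nullptr;
}

}  // extern "C"
```

Because the exported symbols use C linkage and only C types, a Go program could declare them in a cgo preamble and call them directly in-process (for example C.demo_server_init). Building the same source several times with different macros defined yields the per-GPU variants that a host program can then choose between at runtime.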

If you are making changes to the code in this directory, make sure to force a full rebuild when you run go build (the -a flag rebuilds packages even if they appear to be up to date) so that your changes are picked up. A typical iteration cycle, run from the top of the source tree, looks like:

go generate ./... && go build -a .