This commit introduces a more friendly way to build Ollama dependencies
and the binary without abusing `go generate` and removing the
unnecessary extra steps it brings with it.
This script also provides nicer feedback to the user about what is
happening during the build process.
At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCms tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on windows. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
After executing the `userdel ollama` command, I saw this message:
```sh
$ sudo userdel ollama
userdel: group ollama not removed because it has other members.
```
Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too.
Thanks!
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
* Clean up documentation
Will probably need to update with PRs for new release.
Signed-off-by: Matt Williams <m@technovangelist.com>
* Correcting to fit in 0.1.15 changes
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* addressing comments
Signed-off-by: Matt Williams <m@technovangelist.com>
* more api cleanup
Signed-off-by: Matt Williams <m@technovangelist.com>
* its llava not llama
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Updated hosting to server and documented all env vars
Signed-off-by: Matt Williams <m@technovangelist.com>
* remove last of the cli descriptions
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* update further per conversation with jeff earlier today
Signed-off-by: Matt Williams <m@technovangelist.com>
* cleanup the doc readme
Signed-off-by: Matt Williams <m@technovangelist.com>
* move upgrade to faq
Signed-off-by: Matt Williams <m@technovangelist.com>
* first change
Signed-off-by: Matt Williams <m@technovangelist.com>
* updated
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* examples in parent
Signed-off-by: Matt Williams <m@technovangelist.com>
* add exapmle for create model.
Signed-off-by: Matt Williams <m@technovangelist.com>
* update faq
Signed-off-by: Matt Williams <m@technovangelist.com>
* update create model api
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* update the readme in docs
Signed-off-by: Matt Williams <m@technovangelist.com>
* update a few more things
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/modelfile.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
---------
Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
* faq: does ollama share my prompts
Signed-off-by: Matt Williams <m@technovangelist.com>
* faq: ollama and openai
Signed-off-by: Matt Williams <m@technovangelist.com>
* faq: vscode plugins
Signed-off-by: Matt Williams <m@technovangelist.com>
* faq: send a doc to Ollama
Signed-off-by: Matt Williams <m@technovangelist.com>
* extra spacing
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update faq.md
* Update faq.md
---------
Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Michael <mchiang0610@users.noreply.github.com>
* allow for a configurable ollama models directory
- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>