Compare commits

22 commits

Author SHA1 Message Date
7f1565721c
Merge https://github.com/ollama/ollama
Signed-off-by: baalajimaestro <baalajimaestro@ptr.moe>
2024-09-15 23:49:24 +05:30
Edward Cui
d889c6fd07
readme: add Obsidian Quiz Generator plugin to community integrations (#6789) 2024-09-14 23:52:37 -04:00
Daniel Hiltgen
56b9af336a
Fix incremental builds on linux (#6780)
scripts: fix incremental builds on linux or similar
2024-09-13 08:24:08 -07:00
Daniel Hiltgen
fda0d3be52
Use GOARCH for build dirs (#6779)
Corrects x86_64 vs amd64 discrepancy
2024-09-12 16:38:05 -07:00
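
The x86_64 vs amd64 discrepancy arises because `uname -m` reports `x86_64`/`aarch64` while Go uses `amd64`/`arm64`. A minimal sketch of the normalization, assuming a hypothetical helper `goarchFor` and an illustrative `build/linux/<arch>` layout (neither is taken from the actual scripts):

```go
package main

import "fmt"

// goarchFor maps a `uname -m` style machine name to the matching GOARCH value.
// Hypothetical helper for illustration; not part of the ollama build scripts.
func goarchFor(machine string) (string, error) {
	switch machine {
	case "x86_64", "amd64":
		return "amd64", nil
	case "aarch64", "arm64":
		return "arm64", nil
	default:
		return "", fmt.Errorf("unsupported machine type %q", machine)
	}
}

func main() {
	for _, m := range []string{"x86_64", "aarch64"} {
		arch, err := goarchFor(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// The directory layout below is an assumption for illustration only.
		fmt.Printf("%s -> build/linux/%s\n", m, arch)
	}
}
```

Keying directories on GOARCH means the Go toolchain and the build scripts agree on one name per architecture.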
Daniel Hiltgen
cd5c8f6471
Optimize container images for startup (#6547)
* Optimize container images for startup

This change adjusts how to handle runner payloads to support
container builds where we keep them extracted in the filesystem.
This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
size, and should result in faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx based helper scripts

* Use docker buildx action for release
2024-09-12 12:10:30 -07:00
dcasota
fef257c5c5
examples: updated requirements.txt for privategpt example 2024-09-11 18:56:56 -07:00
Adrian Cole
d066d9b8e0
examples: polish loganalyzer example (#6744) 2024-09-11 18:37:37 -07:00
RAPID ARCHITECT
5a00dc9fc9
readme: add ollama_moe to community integrations (#6752) 2024-09-11 18:36:26 -07:00
Jesse Gross
c354e87809
Merge pull request #6767 from ollama/jessegross/bug_6707
runner: Flush pending responses before returning
2024-09-11 17:20:22 -07:00
Jesse Gross
93ac3760cb
runner: Flush pending responses before returning
If there are any pending responses (such as from potential stop
tokens) then we should send them back before ending the sequence.
Otherwise, we can be missing tokens at the end of a response.

Fixes #6707
2024-09-11 16:39:32 -07:00
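
The idea behind this fix can be sketched as follows. This is a simplified illustration, not the runner's actual code: the `sequence` type and its `hold`, `flush`, and `finish` methods are hypothetical names. Pieces that might be the start of a stop string are held back, and anything still held is sent before the sequence ends.

```go
package main

import (
	"fmt"
	"strings"
)

// sequence is a hypothetical stand-in for a runner sequence. It holds back
// pieces that might be the start of a stop string until that is ruled out.
type sequence struct {
	pending []string // pieces withheld so far
	out     []string // pieces already sent to the caller
}

// hold buffers a piece that cannot be emitted yet.
func (s *sequence) hold(piece string) {
	s.pending = append(s.pending, piece)
}

// flush sends everything still buffered and clears the buffer.
func (s *sequence) flush() {
	if len(s.pending) == 0 {
		return
	}
	s.out = append(s.out, strings.Join(s.pending, ""))
	s.pending = nil
}

// finish ends the sequence, flushing first so trailing tokens are not lost.
func (s *sequence) finish() string {
	s.flush()
	return strings.Join(s.out, "")
}

func main() {
	s := &sequence{}
	s.out = append(s.out, "Hello")
	s.hold(", wor")
	s.hold("ld")
	fmt.Println(s.finish())
}
```

Without the `flush` call in `finish`, the two held pieces would be dropped and the response would end at "Hello".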
Patrick Devine
abed273de3
add "stop" command (#6739) 2024-09-11 16:36:21 -07:00
Michael Yang
034392624c
Merge pull request #6762 from ollama/mxyng/show-output
refactor show output
2024-09-11 14:58:40 -07:00
Michael Yang
ecab6f1cc5
refactor show output
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Petr Mironychev
7d6900827d
readme: add QodeAssist to community integrations (#6754) 2024-09-11 13:19:49 -07:00
Daniel Hiltgen
9246e6dd15
Verify permissions for AMD GPU (#6736)
This adds back a check, lost many releases back, that verifies /dev/kfd permissions.
When those permissions are lacking, the result can be confusing failure modes such as:
  "rocBLAS error: Could not initialize Tensile host: No devices found"

This implementation does not hard fail the serve command but instead will fall back to CPU
with an error log.  In the future we can include this in the GPU discovery UX to show
detected but unsupported devices we discovered.
2024-09-11 11:38:25 -07:00
Michael Yang
735a0ca2e4
Merge pull request #6732 from ollama/mxyng/debug-proxy
add *_proxy to env map for debugging
2024-09-10 16:13:25 -07:00
Michael Yang
dddb72e084
add *_proxy for debugging 2024-09-10 09:43:35 -07:00
Jeffrey Morgan
83a9b5271a
docs: update examples to use llama3.1 (#6718) 2024-09-09 22:47:16 -07:00
Daniel Hiltgen
4a8069f9c4
Quiet down Docker's new lint warnings (#6716)
* Quiet down Docker's new lint warnings

Docker has recently added lint warnings to build.  This cleans up those warnings.

* Fix go lint regression
2024-09-09 17:22:20 -07:00
Patrick Devine
84b84ce2db
catch when model vocab size is set correctly (#6714) 2024-09-09 17:18:54 -07:00
Jeffrey Morgan
bb6a086d63
readme: add crewAI to community integrations (#6699) 2024-09-08 00:36:24 -07:00
RAPID ARCHITECT
30c8f201cc
readme: add crewAI with mesop to community integrations 2024-09-08 00:35:59 -07:00
51 changed files with 1383 additions and 868 deletions

View file

@@ -7,3 +7,5 @@ llm/llama.cpp
.env
.cache
test_data
llm/build
llama/build

View file

@@ -102,8 +102,8 @@ jobs:
with:
name: generate-windows-cpu
path: |
llm/build/**/bin/*
llm/build/**/*.a
build/**/*
build/**/*.a
dist/windows-amd64/**
# ROCm generation step
@@ -176,7 +176,7 @@ jobs:
with:
name: generate-windows-rocm
path: |
llm/build/**/bin/*
build/**/*
dist/windows-amd64/**
- uses: actions/upload-artifact@v4
with:
@@ -265,7 +265,7 @@ jobs:
with:
name: generate-windows-cuda-${{ matrix.cuda.version }}
path: |
llm/build/**/bin/*
build/**/*
dist/windows-amd64/**
- uses: actions/upload-artifact@v4
with:
@@ -338,7 +338,7 @@ jobs:
- uses: actions/download-artifact@v4
with:
name: generate-windows-rocm
- run: dir llm/build
- run: dir build
- run: |
$gopath=(get-command go).source | split-path -parent
& "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
@@ -359,9 +359,7 @@ jobs:
environment: release
runs-on: linux
env:
OLLAMA_SKIP_MANIFEST_CREATE: '1'
BUILD_ARCH: amd64
PUSH: '1'
steps:
- uses: actions/checkout@v4
with:
@@ -369,14 +367,8 @@ jobs:
- name: Set Version
shell: bash
run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- run: |
./scripts/build_linux.sh
./scripts/build_docker.sh
- uses: actions/upload-artifact@v4
with:
name: dist-linux-amd64
@@ -390,9 +382,7 @@ jobs:
environment: release
runs-on: linux-arm64
env:
OLLAMA_SKIP_MANIFEST_CREATE: '1'
BUILD_ARCH: arm64
PUSH: '1'
steps:
- uses: actions/checkout@v4
with:
@@ -421,14 +411,8 @@ jobs:
sudo usermod -aG docker $USER
sudo apt-get install acl
sudo setfacl --modify user:$USER:rw /var/run/docker.sock
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- run: |
./scripts/build_linux.sh
./scripts/build_docker.sh
- uses: actions/upload-artifact@v4
with:
name: dist-linux-arm64
@@ -436,6 +420,181 @@ jobs:
dist/*linux*
!dist/*-cov
# Container image build
build-linux:
environment: release
strategy:
matrix:
runner:
- linux
- linux-arm64
runs-on: ${{ matrix.runner }}
env:
FINAL_IMAGE_REPO: ollama/ollama
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: 'Install Docker'
if: ${{ startsWith(matrix.runner, 'linux-arm64') }}
run: |
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo usermod -aG docker $USER
sudo apt-get install acl
sudo setfacl --modify user:$USER:rw /var/run/docker.sock
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.FINAL_IMAGE_REPO }}
flavor: |
latest=false
tags: |
type=ref,event=tag
type=ref,enable=true,priority=600,prefix=0.0.0-pr,suffix=,event=pr
type=semver,pattern={{version}}
- name: Set Version
shell: bash
run: |
machine=$(uname -m)
case ${machine} in
x86_64) echo ARCH=amd64; echo PLATFORM_PAIR=linux-amd64 ;;
aarch64) echo ARCH=arm64; echo PLATFORM_PAIR=linux-arm64 ;;
esac >>$GITHUB_ENV
echo GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=${{ env.DOCKER_METADATA_OUTPUT_VERSION }}\" \"-X=github.com/ollama/ollama/server.mode=release\"'" >>$GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- name: Build and push by digest
id: build
uses: docker/build-push-action@v6
with:
context: "."
platforms: linux/${{ env.ARCH }}
build-args: |
GOFLAGS
outputs: type=image,name=${{ env.FINAL_IMAGE_REPO }},push-by-digest=true,name-canonical=true,push=true
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-${{ env.PLATFORM_PAIR }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
merge:
environment: release
runs-on: linux
needs:
- build-linux
env:
FINAL_IMAGE_REPO: ollama/ollama
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Download digests
uses: actions/download-artifact@v4
with:
path: /tmp/digests
pattern: digests-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.FINAL_IMAGE_REPO }}
flavor: |
latest=false
tags: |
type=ref,event=tag
type=ref,enable=true,priority=600,prefix=0.0.0-pr,suffix=,event=pr
type=semver,pattern={{version}}
- name: Set Version
shell: bash
run: |
machine=$(uname -m)
case ${machine} in
x86_64) echo ARCH=amd64; echo PLATFORM_PAIR=linux-amd64 ;;
aarch64) echo ARCH=arm64; echo PLATFORM_PAIR=linux-arm64 ;;
esac >>$GITHUB_ENV
echo GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=${{ env.DOCKER_METADATA_OUTPUT_VERSION }}\" \"-X=github.com/ollama/ollama/server.mode=release\"'" >>$GITHUB_ENV
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
$(printf '${{ env.FINAL_IMAGE_REPO }}@sha256:%s ' *)
- name: Inspect image
run: |
docker buildx imagetools inspect ${{ env.FINAL_IMAGE_REPO }}:${{ steps.meta.outputs.version }}
build-linux-rocm:
environment: release
runs-on: linux
env:
FINAL_IMAGE_REPO: ollama/ollama
ARCH: amd64
PLATFORM_PAIR: linux-amd64
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.FINAL_IMAGE_REPO }}
flavor: |
latest=false
tags: |
type=ref,event=tag
type=ref,enable=true,priority=600,prefix=0.0.0-pr,suffix=,event=pr
type=semver,pattern={{version}}
- name: Set Version
shell: bash
run: |
echo GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=${{ env.DOCKER_METADATA_OUTPUT_VERSION }}\" \"-X=github.com/ollama/ollama/server.mode=release\"'" >>$GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- name: Build and push by digest
id: build
uses: docker/build-push-action@v6
with:
context: "."
target: runtime-rocm
build-args: |
GOFLAGS
tags: ${{ env.FINAL_IMAGE_REPO }}:${{ env.DOCKER_METADATA_OUTPUT_VERSION}}-rocm,${{ env.FINAL_IMAGE_REPO }}:rocm
push: true
# Aggregate all the assets and ship a release
release:
needs:
@@ -448,8 +607,6 @@ jobs:
permissions:
contents: write
env:
OLLAMA_SKIP_IMAGE_BUILD: '1'
PUSH: '1'
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@v4
@@ -458,12 +615,6 @@ jobs:
run: |
echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
echo "RELEASE_VERSION=$(echo ${GITHUB_REF_NAME} | cut -f1 -d-)" >> $GITHUB_ENV
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- run: ./scripts/build_docker.sh
- name: Retrieve built artifact
uses: actions/download-artifact@v4
with:

View file

@@ -81,12 +81,6 @@ jobs:
if: ${{ ! startsWith(matrix.os, 'windows-') }}
name: 'Unix Go Generate'
- run: go build .
- uses: actions/upload-artifact@v4
with:
name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
path: |
llm/build/**/bin/*
llm/build/**/*.a
generate-cuda:
needs: [changes]
if: ${{ needs.changes.outputs.GENERATE_CUDA == 'True' }}
@@ -114,12 +108,6 @@ jobs:
go generate -x ./...
env:
OLLAMA_SKIP_CPU_GENERATE: '1'
- uses: actions/upload-artifact@v4
with:
name: cuda-${{ matrix.cuda-version }}-libraries
path: |
llm/build/**/bin/*
dist/windows-amd64/**
generate-rocm:
needs: [changes]
if: ${{ needs.changes.outputs.GENERATE_ROCM == 'True' }}
@@ -147,12 +135,6 @@ jobs:
go generate -x ./...
env:
OLLAMA_SKIP_CPU_GENERATE: '1'
- uses: actions/upload-artifact@v4
with:
name: rocm-${{ matrix.rocm-version }}-libraries
path: |
llm/build/**/bin/*
dist/windows-amd64/**
# ROCm generation step
generate-windows-rocm:
@@ -189,7 +171,6 @@ jobs:
name: go generate
env:
OLLAMA_SKIP_CPU_GENERATE: '1'
# TODO - do we need any artifacts?
# CUDA generation step
generate-windows-cuda:
@@ -231,7 +212,6 @@ jobs:
go generate -x ./...
env:
OLLAMA_SKIP_CPU_GENERATE: '1'
# TODO - do we need any artifacts?
lint:
strategy:
@@ -263,14 +243,6 @@ jobs:
arm64) echo ARCH=arm64 ;;
esac >>$GITHUB_ENV
shell: bash
- run: |
mkdir -p llm/build/linux/$ARCH/stub/bin
touch llm/build/linux/$ARCH/stub/bin/ollama_llama_server
if: ${{ startsWith(matrix.os, 'ubuntu-') }}
- run: |
mkdir -p llm/build/darwin/$ARCH/stub/bin
touch llm/build/darwin/$ARCH/stub/bin/ollama_llama_server
if: ${{ startsWith(matrix.os, 'macos-') }}
- uses: golangci/golangci-lint-action@v6
with:
args: --timeout 8m0s -v
@@ -301,23 +273,10 @@ jobs:
cache: true
- run: |
case ${{ matrix.arch }} in
amd64) echo ARCH=x86_64 ;;
amd64) echo ARCH=amd64 ;;
arm64) echo ARCH=arm64 ;;
esac >>$GITHUB_ENV
shell: bash
- run: |
mkdir -p llm/build/linux/$ARCH/stub/bin
touch llm/build/linux/$ARCH/stub/bin/ollama_llama_server
if: ${{ startsWith(matrix.os, 'ubuntu-') }}
- run: |
mkdir -p llm/build/darwin/$ARCH/stub/bin
touch llm/build/darwin/$ARCH/stub/bin/ollama_llama_server
if: ${{ startsWith(matrix.os, 'macos-') }}
shell: bash
- run: go generate ./...
- run: go build
- run: go test -v ./...
- uses: actions/upload-artifact@v4
with:
name: ${{ matrix.os }}-binaries
path: ollama

3
.gitignore vendored
View file

@@ -12,4 +12,7 @@ ggml-metal.metal
test_data
*.crt
llm/build
build/*/*/*
!build/**/placeholder
llama/build
__debug_bin*

View file

@@ -312,6 +312,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [Cherry Studio](https://github.com/kangfenmao/cherry-studio) (Desktop client with Ollama support)
- [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy focused LLM chat interface with optional encryption)
- [Archyve](https://github.com/nickthecook/archyve) (RAG-enabling document library)
- [crewAI with Mesop](https://github.com/rapidarchitect/ollama-crew-mesop) (Mesop Web Interface to run crewAI with Ollama)
### Terminal
@@ -336,6 +337,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [podman-ollama](https://github.com/ericcurtin/podman-ollama)
- [gollama](https://github.com/sammcj/gollama)
- [Ollama eBook Summary](https://github.com/cognitivetech/ollama-ebook-summary/)
- [Ollama Mixture of Experts (MOE) in 50 lines of code](https://github.com/rapidarchitect/ollama_moe)
### Apple Vision Pro
- [Enchanted](https://github.com/AugustDev/enchanted)
@@ -358,6 +360,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa)
- [Firebase Genkit](https://firebase.google.com/docs/genkit/plugins/ollama)
- [crewAI](https://github.com/crewAIInc/crewAI)
- [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example)
- [LangChain4j](https://github.com/langchain4j/langchain4j) with [example](https://github.com/langchain4j/langchain4j-examples/tree/main/ollama-examples/src/main/java)
- [LangChainRust](https://github.com/Abraxas-365/langchain-rust) with [example](https://github.com/Abraxas-365/langchain-rust/blob/main/examples/llm_ollama.rs)
@@ -427,6 +430,8 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [Headless Ollama](https://github.com/nischalj10/headless-ollama) (Scripts to automatically install ollama client & models on any OS for apps that depends on ollama server)
- [vnc-lm](https://github.com/jk011ru/vnc-lm) (A containerized Discord bot with support for attachments and web links)
- [LSP-AI](https://github.com/SilasMarvin/lsp-ai) (Open-source language server for AI-powered functionality)
- [QodeAssist](https://github.com/Palm1r/QodeAssist) (AI-powered coding assistant plugin for Qt Creator)
- [Obsidian Quiz Generator plugin](https://github.com/ECuiDev/obsidian-quiz-generator)
### Supported backends

View file

@@ -0,0 +1 @@
This is here to make sure the build/ directory exists for the go:embed command

View file

@@ -0,0 +1 @@
This is here to make sure the build/ directory exists for the go:embed command

View file

@@ -0,0 +1,8 @@
package build
import "embed"
// Darwin payloads separated by architecture to avoid duplicate payloads when cross compiling
//go:embed darwin/amd64/*
var EmbedFS embed.FS

View file

@@ -0,0 +1,8 @@
package build
import "embed"
// Darwin payloads separated by architecture to avoid duplicate payloads when cross compiling
//go:embed darwin/arm64/*
var EmbedFS embed.FS

6
build/embed_linux.go Normal file
View file

@@ -0,0 +1,6 @@
package build
import "embed"
//go:embed linux/*
var EmbedFS embed.FS

8
build/embed_unused.go Normal file
View file

@@ -0,0 +1,8 @@
//go:build !linux && !darwin
package build
import "embed"
// unused on windows
var EmbedFS embed.FS

View file

@@ -0,0 +1 @@
This is here to make sure the build/ directory exists for the go:embed command

View file

@@ -0,0 +1 @@
This is here to make sure the build/ directory exists for the go:embed command

View file

@@ -2,6 +2,7 @@ package cmd
import (
"archive/zip"
"bufio"
"bytes"
"context"
"crypto/ed25519"
@@ -21,6 +22,7 @@ import (
"regexp"
"runtime"
"slices"
"strconv"
"strings"
"sync/atomic"
"syscall"
@@ -344,6 +346,39 @@ func (w *progressWriter) Write(p []byte) (n int, err error) {
return len(p), nil
}
func loadOrUnloadModel(cmd *cobra.Command, opts *runOptions) error {
p := progress.NewProgress(os.Stderr)
defer p.StopAndClear()
spinner := progress.NewSpinner("")
p.Add("", spinner)
client, err := api.ClientFromEnvironment()
if err != nil {
return err
}
req := &api.GenerateRequest{
Model: opts.Model,
KeepAlive: opts.KeepAlive,
}
return client.Generate(cmd.Context(), req, func(api.GenerateResponse) error { return nil })
}
func StopHandler(cmd *cobra.Command, args []string) error {
opts := &runOptions{
Model: args[0],
KeepAlive: &api.Duration{Duration: 0},
}
if err := loadOrUnloadModel(cmd, opts); err != nil {
if strings.Contains(err.Error(), "not found") {
return fmt.Errorf("couldn't find model \"%s\" to stop", args[0])
}
}
return nil
}
func RunHandler(cmd *cobra.Command, args []string) error {
interactive := true
@@ -422,7 +457,7 @@ func RunHandler(cmd *cobra.Command, args []string) error {
opts.ParentModel = info.Details.ParentModel
if interactive {
if err := loadModel(cmd, &opts); err != nil {
if err := loadOrUnloadModel(cmd, &opts); err != nil {
return err
}
@@ -578,7 +613,7 @@ func ListHandler(cmd *cobra.Command, args []string) error {
table.SetHeaderLine(false)
table.SetBorder(false)
table.SetNoWhiteSpace(true)
table.SetTablePadding("\t")
table.SetTablePadding(" ")
table.AppendBulk(data)
table.Render()
@@ -613,7 +648,15 @@ func ListRunningHandler(cmd *cobra.Command, args []string) error {
cpuPercent := math.Round(float64(sizeCPU) / float64(m.Size) * 100)
procStr = fmt.Sprintf("%d%%/%d%% CPU/GPU", int(cpuPercent), int(100-cpuPercent))
}
data = append(data, []string{m.Name, m.Digest[:12], format.HumanBytes(m.Size), procStr, format.HumanTime(m.ExpiresAt, "Never")})
var until string
delta := time.Since(m.ExpiresAt)
if delta > 0 {
until = "Stopping..."
} else {
until = format.HumanTime(m.ExpiresAt, "Never")
}
data = append(data, []string{m.Name, m.Digest[:12], format.HumanBytes(m.Size), procStr, until})
}
}
@@ -624,7 +667,7 @@ func ListRunningHandler(cmd *cobra.Command, args []string) error {
table.SetHeaderLine(false)
table.SetBorder(false)
table.SetNoWhiteSpace(true)
table.SetTablePadding("\t")
table.SetTablePadding(" ")
table.AppendBulk(data)
table.Render()
@@ -720,125 +763,89 @@ func ShowHandler(cmd *cobra.Command, args []string) error {
return nil
}
showInfo(resp)
return nil
return showInfo(resp, os.Stdout)
}
func showInfo(resp *api.ShowResponse) {
modelData := [][]string{
{"parameters", resp.Details.ParameterSize},
{"quantization", resp.Details.QuantizationLevel},
}
if resp.ModelInfo != nil {
arch := resp.ModelInfo["general.architecture"].(string)
modelData = append(modelData,
[]string{"arch", arch},
[]string{"context length", fmt.Sprintf("%v", resp.ModelInfo[fmt.Sprintf("%s.context_length", arch)].(float64))},
[]string{"embedding length", fmt.Sprintf("%v", resp.ModelInfo[fmt.Sprintf("%s.embedding_length", arch)].(float64))},
)
func showInfo(resp *api.ShowResponse, w io.Writer) error {
tableRender := func(header string, rows func() [][]string) {
fmt.Fprintln(w, " ", header)
table := tablewriter.NewWriter(w)
table.SetAlignment(tablewriter.ALIGN_LEFT)
table.SetBorder(false)
table.SetNoWhiteSpace(true)
table.SetTablePadding(" ")
switch header {
case "Template", "System", "License":
table.SetColWidth(100)
}
table.AppendBulk(rows())
table.Render()
fmt.Fprintln(w)
}
mainTableData := [][]string{
{"Model"},
{renderSubTable(modelData, false)},
}
tableRender("Model", func() (rows [][]string) {
if resp.ModelInfo != nil {
arch := resp.ModelInfo["general.architecture"].(string)
rows = append(rows, []string{"", "architecture", arch})
rows = append(rows, []string{"", "parameters", format.HumanNumber(uint64(resp.ModelInfo["general.parameter_count"].(float64)))})
rows = append(rows, []string{"", "context length", strconv.FormatFloat(resp.ModelInfo[fmt.Sprintf("%s.context_length", arch)].(float64), 'f', -1, 64)})
rows = append(rows, []string{"", "embedding length", strconv.FormatFloat(resp.ModelInfo[fmt.Sprintf("%s.embedding_length", arch)].(float64), 'f', -1, 64)})
} else {
rows = append(rows, []string{"", "architecture", resp.Details.Family})
rows = append(rows, []string{"", "parameters", resp.Details.ParameterSize})
}
rows = append(rows, []string{"", "quantization", resp.Details.QuantizationLevel})
return
})
if resp.ProjectorInfo != nil {
projectorData := [][]string{
{"arch", "clip"},
{"parameters", format.HumanNumber(uint64(resp.ProjectorInfo["general.parameter_count"].(float64)))},
}
if projectorType, ok := resp.ProjectorInfo["clip.projector_type"]; ok {
projectorData = append(projectorData, []string{"projector type", projectorType.(string)})
}
projectorData = append(projectorData,
[]string{"embedding length", fmt.Sprintf("%v", resp.ProjectorInfo["clip.vision.embedding_length"].(float64))},
[]string{"projection dimensionality", fmt.Sprintf("%v", resp.ProjectorInfo["clip.vision.projection_dim"].(float64))},
)
mainTableData = append(mainTableData,
[]string{"Projector"},
[]string{renderSubTable(projectorData, false)},
)
tableRender("Projector", func() (rows [][]string) {
arch := resp.ProjectorInfo["general.architecture"].(string)
rows = append(rows, []string{"", "architecture", arch})
rows = append(rows, []string{"", "parameters", format.HumanNumber(uint64(resp.ProjectorInfo["general.parameter_count"].(float64)))})
rows = append(rows, []string{"", "embedding length", strconv.FormatFloat(resp.ProjectorInfo[fmt.Sprintf("%s.vision.embedding_length", arch)].(float64), 'f', -1, 64)})
rows = append(rows, []string{"", "dimensions", strconv.FormatFloat(resp.ProjectorInfo[fmt.Sprintf("%s.vision.projection_dim", arch)].(float64), 'f', -1, 64)})
return
})
}
if resp.Parameters != "" {
mainTableData = append(mainTableData, []string{"Parameters"}, []string{formatParams(resp.Parameters)})
tableRender("Parameters", func() (rows [][]string) {
scanner := bufio.NewScanner(strings.NewReader(resp.Parameters))
for scanner.Scan() {
if text := scanner.Text(); text != "" {
rows = append(rows, append([]string{""}, strings.Fields(text)...))
}
}
return
})
}
head := func(s string, n int) (rows [][]string) {
scanner := bufio.NewScanner(strings.NewReader(s))
for scanner.Scan() && (len(rows) < n || n < 0) {
if text := scanner.Text(); text != "" {
rows = append(rows, []string{"", strings.TrimSpace(text)})
}
}
return
}
if resp.System != "" {
mainTableData = append(mainTableData, []string{"System"}, []string{renderSubTable(twoLines(resp.System), true)})
tableRender("System", func() [][]string {
return head(resp.System, 2)
})
}
if resp.License != "" {
mainTableData = append(mainTableData, []string{"License"}, []string{renderSubTable(twoLines(resp.License), true)})
tableRender("License", func() [][]string {
return head(resp.License, 2)
})
}
table := tablewriter.NewWriter(os.Stdout)
table.SetAutoWrapText(false)
table.SetBorder(false)
table.SetAlignment(tablewriter.ALIGN_LEFT)
for _, v := range mainTableData {
table.Append(v)
}
table.Render()
}
func renderSubTable(data [][]string, file bool) string {
var buf bytes.Buffer
table := tablewriter.NewWriter(&buf)
table.SetAutoWrapText(!file)
table.SetBorder(false)
table.SetNoWhiteSpace(true)
table.SetTablePadding("\t")
table.SetAlignment(tablewriter.ALIGN_LEFT)
for _, v := range data {
table.Append(v)
}
table.Render()
renderedTable := buf.String()
lines := strings.Split(renderedTable, "\n")
for i, line := range lines {
lines[i] = "\t" + line
}
return strings.Join(lines, "\n")
}
func twoLines(s string) [][]string {
lines := strings.Split(s, "\n")
res := [][]string{}
count := 0
for _, line := range lines {
line = strings.TrimSpace(line)
if line != "" {
count++
res = append(res, []string{line})
if count == 2 {
return res
}
}
}
return res
}
func formatParams(s string) string {
lines := strings.Split(s, "\n")
table := [][]string{}
for _, line := range lines {
table = append(table, strings.Fields(line))
}
return renderSubTable(table, false)
return nil
}
func CopyHandler(cmd *cobra.Command, args []string) error {
@@ -1328,6 +1335,15 @@ func NewCLI() *cobra.Command {
runCmd.Flags().Bool("insecure", false, "Use an insecure registry")
runCmd.Flags().Bool("nowordwrap", false, "Don't wrap words to the next line automatically")
runCmd.Flags().String("format", "", "Response format (e.g. json)")
stopCmd := &cobra.Command{
Use: "stop MODEL",
Short: "Stop a running model",
Args: cobra.ExactArgs(1),
PreRunE: checkServerHeartbeat,
RunE: StopHandler,
}
serveCmd := &cobra.Command{
Use: "serve",
Aliases: []string{"start"},
@@ -1395,6 +1411,7 @@ func NewCLI() *cobra.Command {
createCmd,
showCmd,
runCmd,
stopCmd,
pullCmd,
pushCmd,
listCmd,
@@ -1434,6 +1451,7 @@ func NewCLI() *cobra.Command {
createCmd,
showCmd,
runCmd,
stopCmd,
pullCmd,
pushCmd,
listCmd,

206
cmd/cmd_test.go Normal file
View file

@@ -0,0 +1,206 @@
package cmd
import (
"bytes"
"os"
"path/filepath"
"testing"
"github.com/google/go-cmp/cmp"
"github.com/ollama/ollama/api"
)
func TestShowInfo(t *testing.T) {
t.Run("bare details", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
}, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("bare model info", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
ModelInfo: map[string]any{
"general.architecture": "test",
"general.parameter_count": float64(7_000_000_000),
"test.context_length": float64(0),
"test.embedding_length": float64(0),
},
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
}, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
context length 0
embedding length 0
quantization FP16
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("parameters", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
Parameters: `
stop never
stop gonna
stop give
stop you
stop up
temperature 99`,
}, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
Parameters
stop never
stop gonna
stop give
stop you
stop up
temperature 99
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("project info", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
ProjectorInfo: map[string]any{
"general.architecture": "clip",
"general.parameter_count": float64(133_700_000),
"clip.vision.embedding_length": float64(0),
"clip.vision.projection_dim": float64(0),
},
}, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
Projector
architecture clip
parameters 133.70M
embedding length 0
dimensions 0
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("system", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
System: `You are a pirate!
Ahoy, matey!
Weigh anchor!
`,
}, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
System
You are a pirate!
Ahoy, matey!
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("license", func(t *testing.T) {
var b bytes.Buffer
license, err := os.ReadFile(filepath.Join("..", "LICENSE"))
if err != nil {
t.Fatal(err)
}
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
License: string(license),
}, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
License
MIT License
Copyright (c) Ollama
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
}

View file

@@ -18,7 +18,6 @@ import (
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/parser"
"github.com/ollama/ollama/progress"
"github.com/ollama/ollama/readline"
"github.com/ollama/ollama/types/errtypes"
)
@@ -31,26 +30,6 @@ const (
MultilineSystem
)
func loadModel(cmd *cobra.Command, opts *runOptions) error {
p := progress.NewProgress(os.Stderr)
defer p.StopAndClear()
spinner := progress.NewSpinner("")
p.Add("", spinner)
client, err := api.ClientFromEnvironment()
if err != nil {
return err
}
chatReq := &api.ChatRequest{
Model: opts.Model,
KeepAlive: opts.KeepAlive,
}
return client.Chat(cmd.Context(), chatReq, func(api.ChatResponse) error { return nil })
}
func generateInteractive(cmd *cobra.Command, opts runOptions) error {
usage := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
@@ -217,7 +196,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
opts.Model = args[1]
opts.Messages = []api.Message{}
fmt.Printf("Loading model '%s'\n", opts.Model)
if err := loadModel(cmd, &opts); err != nil {
if err := loadOrUnloadModel(cmd, &opts); err != nil {
return err
}
continue
@@ -371,7 +350,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
switch args[1] {
case "info":
showInfo(resp)
_ = showInfo(resp, os.Stderr)
case "license":
if resp.License == "" {
fmt.Println("No license was specified for this model.")

View file

@@ -208,14 +208,18 @@ func ConvertModel(fsys fs.FS, ws io.WriteSeeker) error {
return err
}
- if vocabSize := int(p.VocabSize); vocabSize > len(t.Vocabulary.Tokens) {
- slog.Warn("vocabulary is smaller than expected, padding with dummy tokens", "expect", p.VocabSize, "actual", len(t.Vocabulary.Tokens))
+ vocabSize := int(p.VocabSize)
+ switch {
+ case vocabSize > len(t.Vocabulary.Tokens):
+ slog.Warn("vocabulary is smaller than expected, padding with dummy tokens", "expect", vocabSize, "actual", len(t.Vocabulary.Tokens))
for i := range vocabSize - len(t.Vocabulary.Tokens) {
t.Vocabulary.Tokens = append(t.Vocabulary.Tokens, fmt.Sprintf("[PAD%d]", i))
t.Vocabulary.Scores = append(t.Vocabulary.Scores, -1)
t.Vocabulary.Types = append(t.Vocabulary.Types, tokenTypeUserDefined)
}
- } else {
+ case vocabSize < len(t.Vocabulary.Tokens):
+ return fmt.Errorf("vocabulary is larger than expected '%d' instead of '%d'", len(t.Vocabulary.Tokens), vocabSize)
+ default:
slog.Debug("vocabulary", "size", len(t.Vocabulary.Tokens))
}
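
Note on the hunk above: the change turns a two-way check into a three-way one. It pads when the vocabulary is too small, fails when it is too large, and only logs when the sizes match. Below is a minimal standalone sketch of that control flow. The helper name `padVocabulary` and its signature are hypothetical and are not part of the converter; the sketch tracks only tokens and leaves out the scores and types slices.

```go
package main

import "fmt"

// padVocabulary is a hypothetical helper that mirrors the three-way check
// in the hunk above. It is an illustration, not the converter's code.
func padVocabulary(tokens []string, want int) ([]string, error) {
	switch {
	case want > len(tokens):
		// Compute the gap once, because len(tokens) grows while appending.
		missing := want - len(tokens)
		for i := 0; i < missing; i++ {
			tokens = append(tokens, fmt.Sprintf("[PAD%d]", i))
		}
		return tokens, nil
	case want < len(tokens):
		// More tokens than expected is treated as an error.
		return nil, fmt.Errorf("vocabulary is larger than expected '%d' instead of '%d'", len(tokens), want)
	default:
		// Sizes already match, so there is nothing to do.
		return tokens, nil
	}
}

func main() {
	padded, err := padVocabulary([]string{"a", "b"}, 4)
	fmt.Println(padded, err)

	_, err = padVocabulary([]string{"a", "b", "c"}, 2)
	fmt.Println(err)
}
```

The sketch computes the gap before the loop so that it does not depend on the range-over-integer form used in the hunk.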

@ -69,7 +69,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur
```shell
curl http://localhost:11434/api/generate -d '{
- "model": "llama3",
+ "model": "llama3.1",
"prompt": "Why is the sky blue?"
}'
```
@ -80,7 +80,7 @@ A stream of JSON objects is returned:
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
@ -102,7 +102,7 @@ To calculate how fast the response is generated in tokens per second (token/s),
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
@ -124,7 +124,7 @@ A response can be received in one reply when streaming is off.
```shell
curl http://localhost:11434/api/generate -d '{
- "model": "llama3",
+ "model": "llama3.1",
"prompt": "Why is the sky blue?",
"stream": false
}'
@ -136,7 +136,7 @@ If `stream` is set to `false`, the response will be a single JSON object:
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
@ -194,7 +194,7 @@ curl http://localhost:11434/api/generate -d '{
```shell
curl http://localhost:11434/api/generate -d '{
- "model": "llama3",
+ "model": "llama3.1",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
@ -205,7 +205,7 @@ curl http://localhost:11434/api/generate -d '{
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
@ -327,7 +327,7 @@ If you want to set custom options for the model at runtime rather than in the Mo
```shell
curl http://localhost:11434/api/generate -d '{
- "model": "llama3",
+ "model": "llama3.1",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
@ -368,7 +368,7 @@ curl http://localhost:11434/api/generate -d '{
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
@ -390,7 +390,7 @@ If an empty prompt is provided, the model will be loaded into memory.
```shell
curl http://localhost:11434/api/generate -d '{
- "model": "llama3"
+ "model": "llama3.1"
}'
```
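
For reference, the same load-only request can be sketched in Go. This is a minimal illustration, assuming the server listens on the `localhost:11434` address used in the curl example. The names `loadRequest`, `preloadBody`, and `preload` are hypothetical and are not part of the Ollama client library.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// loadRequest carries only the "model" field, matching the curl body above.
// Hypothetical type for illustration.
type loadRequest struct {
	Model string `json:"model"`
}

// preloadBody builds the JSON body for a generate request with no prompt.
func preloadBody(model string) ([]byte, error) {
	return json.Marshal(loadRequest{Model: model})
}

// preload posts the prompt-less body to baseURL + "/api/generate".
func preload(baseURL, model string) error {
	body, err := preloadBody(model)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	// Assumes a local server; otherwise the error is printed.
	if err := preload("http://localhost:11434", "llama3.1"); err != nil {
		fmt.Println("preload failed:", err)
	}
}
```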
@ -400,7 +400,7 @@ A single JSON object is returned:
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
@ -445,7 +445,7 @@ Send a chat message with a streaming response.
```shell
curl http://localhost:11434/api/chat -d '{
- "model": "llama3",
+ "model": "llama3.1",
"messages": [
{
"role": "user",
@ -461,7 +461,7 @@ A stream of JSON objects is returned:
```json
{
- "model": "llama3",
+ "model": "llama3.1",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
@ -476,7 +476,7 @@ Final response:
```json
{
- "model": "llama3",
+ "model": "llama3.1",