From 83a0cb8d88561b4302baa8b6ea0623c426483e5d Mon Sep 17 00:00:00 2001
From: Michael Yang
Date: Tue, 2 Jul 2024 14:52:18 -0700
Subject: [PATCH 1/3] docs

---
 docs/template.md | 170 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)
 create mode 100644 docs/template.md

diff --git a/docs/template.md b/docs/template.md
new file mode 100644
index 00000000..8f41e8fb
--- /dev/null
+++ b/docs/template.md
@@ -0,0 +1,170 @@
+# Template
+
+Ollama provides a powerful templating engine backed by Go's built-in templating engine to construct prompts for your large language model. This feature is a valuable tool to get the most out of your models.
+
+## Basic Template Structure
+
+A basic Go template consists of three main parts:
+
+* **Layout**: The overall structure of the template.
+* **Variables**: Placeholders for dynamic data that will be replaced with actual values when the template is rendered.
+* **Functions**: Custom functions or logic that can be used to manipulate the template's content.
+
+Here's an example of a simple chat template:
+
+```gotmpl
+{{- range .Messages }}
+{{ .Role }}: {{ .Content }}
+{{- end }}
+```
+
+In this example, we have:
+
+* A basic messages structure (layout)
+* Three variables: `Messages`, `Role`, and `Content` (variables)
+* A custom function (action) that iterates over an array of items (`range .Messages`) and displays each item
+
+## Adding Templates to Your Model
+
+By default, models imported into Ollama have a default template of `{{ .Prompt }}`, i.e. user inputs are sent verbatim to the LLM. This is appropriate for text or code completion models but lacks essential markers for chat or instruction models.
+
+Omitting a template in these models puts the responsibility of correctly templating input onto the user. Adding a template allows users to easily get the best results from the model.
+
+To add a template to your model, add a `TEMPLATE` command to the Modelfile. Here's an example using Meta's Llama 3.
+
+```dockerfile
+FROM llama3
+
+TEMPLATE """{{- if .System }}<|start_header_id|>system<|end_header_id|>
+
+{{ .System }}<|eot_id|>
+{{- end }}
+{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
+
+{{ .Content }}<|eot_id|>
+{{- end }}<|start_header_id|>assistant<|end_header_id|>
+
+"""
+```
+
+## Variables
+
+`System` (string): system prompt
+
+`Prompt` (string): user prompt
+
+`Response` (string): assistant response
+
+`Suffix` (string): text inserted after the assistant's response
+
+`Messages` (list): list of messages
+
+`Messages[].Role` (string): role which can be one of `system`, `user`, `assistant`, or `tool`
+
+`Messages[].Content` (string): message content
+
+`Messages[].ToolCalls` (list): list of tools the model wants to call
+
+`Messages[].ToolCalls[].Function` (object): function to call
+
+`Messages[].ToolCalls[].Function.Name` (string): function name
+
+`Messages[].ToolCalls[].Function.Arguments` (map): mapping of argument name to argument value
+
+`Tools` (list): list of tools the model can access
+
+`Tools[].Type` (string): schema type. `type` is always `function`
+
+`Tools[].Function` (object): function definition
+
+`Tools[].Function.Name` (string): function name
+
+`Tools[].Function.Description` (string): function description
+
+`Tools[].Function.Parameters` (object): function parameters
+
+`Tools[].Function.Parameters.Type` (string): schema type. `type` is always `object`
+
+`Tools[].Function.Parameters.Required` (list): list of required properties
+
+`Tools[].Function.Parameters.Properties` (map): mapping of property name to property definition
+
+`Tools[].Function.Parameters.Properties[].Type` (string): property type
+
+`Tools[].Function.Parameters.Properties[].Description` (string): property description
+
+`Tools[].Function.Parameters.Properties[].Enum` (list): list of valid values
+
+## Tips and Best Practices
+
+Keep the following tips and best practices in mind when working with Go templates:
+
+* **Be mindful of dot**: Control flow structures like `range` and `with` change the value of `.`
+* **Out-of-scope variables**: Use `$.` to reference variables not currently in scope, starting from the root
+* **Whitespace control**: Use `-` to trim leading (`{{-`) and trailing (`-}}`) whitespace
+
+## Examples
+
+### Example Messages
+
+#### ChatML
+
+ChatML is a popular template format. It can be used for models such as Databricks' DBRX, Intel's Neural Chat, and Microsoft's Orca 2.
+
+```gotmpl
+{{- if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}
+{{- range .Messages }}<|im_start|>{{ .Role }}
+{{ .Content }}<|im_end|>
+{{ end }}<|im_start|>assistant
+```
+
+### Example Tools
+
+Tools support can be added to a model by adding a `{{ .Tools }}` node to the template. This feature is useful for models trained to call external tools and can be a powerful tool for retrieving real-time data or performing complex tasks.
+
+#### Mistral
+
+Mistral v0.3 and Mixtral 8x22B support tool calling.
+
+```gotmpl
+{{- range $index, $_ := .Messages }}
+{{- if eq .Role "user" }}
+{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS] {{ json $.Tools }}[/AVAILABLE_TOOLS]
+{{- end }}[INST] {{ if and (eq (len (slice $.Messages $index)) 1) $.System }}{{ $.System }}
+
+{{ end }}{{ .Content }}[/INST]
+{{- else if eq .Role "assistant" }}
+{{- if .Content }} {{ .Content }}
+{{- else if .ToolCalls }}[TOOL_CALLS] [
+{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ json .Function.Arguments }}}
+{{- end }}]
+{{- end }}
+{{- else if eq .Role "tool" }}[TOOL_RESULTS] {"content": {{ .Content }}}[/TOOL_RESULTS]
+{{- end }}
+{{- end }}
+```
+
+### Example Fill-in-Middle
+
+Fill-in-middle support can be added to a model by adding a `{{ .Suffix }}` node to the template. This feature is useful for models that are trained to generate text in the middle of user input, such as code completion models.
+
+#### CodeLlama
+
+CodeLlama [7B](https://ollama.com/library/codellama:7b-code) and [13B](https://ollama.com/library/codellama:13b-code) code completion models support fill-in-middle.
+
+```gotmpl
+<PRE> {{ .Prompt }} <SUF>{{ .Suffix }} <MID>
+```
+
+> [!NOTE]
+> CodeLlama 34B and 70B code completion and all instruct and Python fine-tuned models do not support fill-in-middle.
+
+#### Codestral
+
+Codestral [22B](https://ollama.com/library/codestral:22b) supports fill-in-middle.
+
+```gotmpl
+[SUFFIX]{{ .Suffix }}[PREFIX] {{ .Prompt }}
+```

From 9b60a038e5169c4a69bc513ae6e7ea1816f9fc11 Mon Sep 17 00:00:00 2001
From: Michael Yang 
Date: Mon, 22 Jul 2024 13:34:56 -0700
Subject: [PATCH 2/3] update api.md

---
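
The `tool_calls` field this patch documents for `/api/chat` can be consumed roughly as follows. This is a minimal Go sketch, assuming the response shape shown in the "Chat request (with tools)" example; the `ChatResponse` and `ToolCall` types and the `parseToolCalls` helper are hypothetical names introduced for illustration and are not Ollama's own client types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a hypothetical type mirroring one entry of "tool_calls"
// in the documented response.
type ToolCall struct {
	Function struct {
		Name      string         `json:"name"`
		Arguments map[string]any `json:"arguments"`
	} `json:"function"`
}

// ChatResponse is a hypothetical type covering only the fields used here.
type ChatResponse struct {
	Model   string `json:"model"`
	Message struct {
		Role      string     `json:"role"`
		Content   string     `json:"content"`
		ToolCalls []ToolCall `json:"tool_calls"`
	} `json:"message"`
	Done bool `json:"done"`
}

// parseToolCalls decodes a non-streaming chat response body and returns
// the tool calls requested by the model (possibly none).
func parseToolCalls(body []byte) ([]ToolCall, error) {
	var resp ChatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return resp.Message.ToolCalls, nil
}

// sampleResponse is trimmed from the response example in docs/api.md.
const sampleResponse = `{
  "model": "mistral:7b-instruct-v0.3-q4_K_M",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {"format": "celsius", "location": "Paris, FR"}
        }
      }
    ]
  },
  "done": true
}`

func main() {
	calls, err := parseToolCalls([]byte(sampleResponse))
	if err != nil {
		panic(err)
	}
	for _, c := range calls {
		fmt.Println(c.Function.Name, c.Function.Arguments["location"])
	}
}
```

Decoding `arguments` into a `map[string]any` keeps the sketch independent of any particular tool schema; a real caller would likely validate the arguments against the tool's declared `parameters` before dispatching.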
 README.md         |   3 +-
 docs/api.md       | 117 +++++++++++++++++++++++++++++++++++++++++++++-
 docs/modelfile.md |   3 +-
 3 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index b96f4c16..02ab7051 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,8 @@ Here are some example models that can be downloaded:
 | LLaVA              | 7B         | 4.5GB | `ollama run llava`             |
 | Solar              | 10.7B      | 6.1GB | `ollama run solar`             |
 
-> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
+> [!NOTE]
+> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
 
 ## Customize a model
 
diff --git a/docs/api.md b/docs/api.md
index c577bb1a..bf4c8ce8 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -40,6 +40,7 @@ Generate a response for a given prompt with a provided model. This is a streamin
 
 - `model`: (required) the [model name](#model-names)
 - `prompt`: the prompt to generate a response for
+- `suffix`: the text after the model response
 - `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
 
 Advanced parameters (optional):
@@ -57,7 +58,8 @@ Advanced parameters (optional):
 
 Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode [example](#request-json-mode) below.
 
-> Note: it's important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts whitespace.
+> [!IMPORTANT]
+> It's important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts of whitespace.
 
 ### Examples
 
@@ -148,8 +150,44 @@ If `stream` is set to `false`, the response will be a single JSON object:
 }
 ```
 
+#### Request (with suffix)
+
+##### Request
+
+```shell
+curl http://localhost:11434/api/generate -d '{
+  "model": "codellama:code",
+  "prompt": "def compute_gcd(a, b):",
+  "suffix": "    return result",
+  "options": {
+    "temperature": 0
+  },
+  "stream": false
+}'
+```
+
+##### Response
+
+```json
+{
+  "model": "codellama:code",
+  "created_at": "2024-07-22T20:47:51.147561Z",
+  "response": "\n  if a == 0:\n    return b\n  else:\n    return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n  result = (a * b) / compute_gcd(a, b)\n",
+  "done": true,
+  "done_reason": "stop",
+  "context": [...],
+  "total_duration": 1162761250,
+  "load_duration": 6683708,
+  "prompt_eval_count": 17,
+  "prompt_eval_duration": 201222000,
+  "eval_count": 63,
+  "eval_duration": 953997000
+}
+```
+
 #### Request (JSON mode)
 
+> [!IMPORTANT]
 > When `format` is set to `json`, the output will always be a well-formed JSON object. It's important to also instruct the model to respond in JSON.
 
 ##### Request
@@ -383,9 +421,10 @@ Generate the next message in a chat with a provided model. This is a streaming e
 
 The `message` object has the following fields:
 
-- `role`: the role of the message, either `system`, `user` or `assistant`
+- `role`: the role of the message, either `system`, `user`, `assistant`, or `tool`
 - `content`: the content of the message
 - `images` (optional): a list of images to include in the message (for multimodal models such as `llava`)
+- `tool_calls` (optional): a list of tools the model wants to use
 
 Advanced parameters (optional):
 
@@ -393,6 +432,7 @@ Advanced parameters (optional):
 - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
 - `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
 - `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
+- `tools`: external tools the model can use. Not all models support this feature.
 
 ### Examples
 
@@ -622,6 +662,79 @@ curl http://localhost:11434/api/chat -d '{
 }
 ```
 
+#### Chat request (with tools)
+
+##### Request
+
+```shell
+curl http://localhost:11434/api/chat -d '{
+  "model": "mistral",
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is the weather today in Paris?"
+    }
+  ],
+  "stream": false,
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "get_current_weather",
+        "description": "Get the current weather for a location",
+        "parameters": {
+          "type": "object",
+          "properties": {
+            "location": {
+              "type": "string",
+              "description": "The location to get the weather for, e.g. San Francisco, CA"
+            },
+            "format": {
+              "type": "string",
+              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
+              "enum": ["celsius", "fahrenheit"]
+            }
+          },
+          "required": ["location", "format"]
+        }
+      }
+    }
+  ]
+}'
+```
+
+##### Response
+
+```json
+{
+  "model": "mistral:7b-instruct-v0.3-q4_K_M",
+  "created_at": "2024-07-22T20:33:28.123648Z",
+  "message": {
+    "role": "assistant",
+    "content": "",
+    "tool_calls": [
+      {
+        "function": {
+          "name": "get_current_weather",
+          "arguments": {
+            "format": "celsius",
+            "location": "Paris, FR"
+          }
+        }
+      }
+    ]
+  },
+  "done_reason": "stop",
+  "done": true,
+  "total_duration": 885095291,
+  "load_duration": 3753500,
+  "prompt_eval_count": 122,
+  "prompt_eval_duration": 328493000,
+  "eval_count": 33,
+  "eval_duration": 552222000
+}
+```
+
 ## Create a Model
 
 ```shell
diff --git a/docs/modelfile.md b/docs/modelfile.md
index 21ee1826..c3645b06 100644
--- a/docs/modelfile.md
+++ b/docs/modelfile.md
@@ -1,6 +1,7 @@
 # Ollama Model File
 
-> Note: `Modelfile` syntax is in development
+> [!NOTE]
+> `Modelfile` syntax is in development
 
 A model file is the blueprint to create and share models with Ollama.
 

From 997c903884b08aef53d0f92634f74bdb64f05c0a Mon Sep 17 00:00:00 2001
From: Michael Yang 
Date: Thu, 25 Jul 2024 16:23:40 -0700
Subject: [PATCH 3/3] Update docs/template.md

Co-authored-by: Jeffrey Morgan 
---
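
The simple chat template referenced in the hunk context below (`range .Messages` over `Role` and `Content`) can be rendered with Go's standard `text/template` package. This is a minimal sketch under stated assumptions: the `Message` type and `renderChat` helper are hypothetical and stand in for whatever data Ollama passes to the template; only the template text is taken from docs/template.md.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Message is a hypothetical type exposing the Role and Content fields
// that the template refers to.
type Message struct {
	Role    string
	Content string
}

// chatTemplate is the simple chat template from docs/template.md.
const chatTemplate = `{{- range .Messages }}
{{ .Role }}: {{ .Content }}
{{- end }}`

// renderChat parses chatTemplate and executes it against the messages.
func renderChat(messages []Message) (string, error) {
	tmpl, err := template.New("chat").Parse(chatTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := struct{ Messages []Message }{Messages: messages}
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderChat([]Message{
		{Role: "user", Content: "Hello"},
		{Role: "assistant", Content: "Hi there"},
	})
	if err != nil {
		panic(err)
	}
	// %q makes the newline placement visible.
	fmt.Printf("%q\n", out)
}
```

Inside the `range` block, `.` refers to the current message, which is why `.Role` and `.Content` resolve without a prefix. The `{{- end }}` marker trims the newline before it, so each message is emitted as a newline followed by `role: content`.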
 docs/template.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/template.md b/docs/template.md
index 8f41e8fb..f6ce06ba 100644
--- a/docs/template.md
+++ b/docs/template.md
@@ -24,7 +24,7 @@ In this example, we have:
 * Three variables: `Messages`, `Role`, and `Content` (variables)
 * A custom function (action) that iterates over an array of items (`range .Messages`) and displays each item
 
-## Adding Templates to Your Model
+## Adding templates to your model
 
 By default, models imported into Ollama have a default template of `{{ .Prompt }}`, i.e. user inputs are sent verbatim to the LLM. This is appropriate for text or code completion models but lacks essential markers for chat or instruction models.