ollama/llm/patches/01-cache.diff

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index 8fe5e0b1..3e82acb9 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -997,13 +997,15 @@ struct llama_server_context
                 slot.n_sent_text += result.text_to_send.size();
                 // add the token to slot queue and cache
             }
-            slot.add_token_string(result);
+
             if (slot.params.stream)
             {
                 send_partial_response(slot, result);
             }
         }
 
+        slot.add_token_string(result);
+
         if (incomplete)
         {
             slot.has_next_token = true;

Commits touching this patch file:
2024-01-25 21:46:20 +00:00  Fix clearing kv cache between requests with the same prompt (#2186); fix powershell script
2024-02-12 16:10:16 +00:00  patch: always add token to cache_tokens (#2459)
2024-03-11 19:57:48 +00:00  update llama.cpp submodule to `ceca1ae` (#3064)