ollama

Author	SHA1	Message	Date
Patcher	a820d2b267	readme: add observability section with OpenLIT to community-integrations	2024-11-23 18:03:12 -08:00
Meng Zhuo	2ebdb54fb3	all: update math32 go mod to v1.11.0 (#6627 )	2024-11-23 15:21:54 -08:00
josc146	bb52abfa55	readme: add ChatGPTBox and RWKV-Runner to community integrations (#4118 )	2024-11-23 13:31:27 -08:00
oza6ut0ne	31cb1ca9e5	openai: accept X-Stainless-Retry-Count header (#6910 )	2024-11-23 12:39:05 -08:00
Rodrigo Ribeiro Gomes	78f779a323	readme: add powershai, a powershell module with ollama support to community integrations (#7438 )	2024-11-23 10:08:59 -08:00
Jesse Gross	3478b2cf14	runner.go: Fix deadlock with many concurrent requests If there are no available slots for new sequences then a request will not be added to the processing queue but will continue on to wait for a response that never comes. Besides never giving a response to the request, this prevents the model from being unloaded due to the outstanding request. To prevent this, there are semaphores that prevent more requests from being processed than there are slots - one in the Ollama server and one in the runner. - The Ollama server one works but it is not designed to protect the runner's internal data structures and the runner can return a final response before clearing its data structures. - The internal runner semaphore has similar behavior where it can release the semaphore when it issues a response. This is wrong - it should only release the semaphore after it has cleared the data structure. In addition, we should return an error if a slot is not found rather than deadlocking in the event we ever get to this spot. Fixes #7779	2024-11-22 16:14:51 -08:00
Bruce MacDonald	7b5585b9cb	server: remove out of date anonymous access check (#7785 ) In the past the ollama.com server would return a JWT that contained information about the user being authenticated. This was used to return different error messages to the user. This is no longer possible since the token used to authenticate does not contain information about the user anymore. Removing this code that no longer works. Follow up changes will improve the error messages returned here, but good to clean up first.	2024-11-22 11:57:35 -08:00
Daniel Hiltgen	f0a351810c	tests: fix max queue integration test (#7782 ) This had fallen out of sync with the envconfig behavior, where max queue default was not zero.	2024-11-22 08:05:45 -08:00
Daniel Hiltgen	b85520bfb9	logs: explain client aborts better (#7783 ) Users get confused by "Failed to acquire semaphore" error="context canceled" messages in the logs, which are actually clients giving up. While there could be a legitimate hang bug in the system, sometimes this is just short client timeouts with an overloaded system, so this should help users understand what's going on better.	2024-11-22 08:05:32 -08:00
Daniel Hiltgen	d88972ea48	Be quiet when redirecting output (#7360 ) This avoids emitting the progress indicators to stderr, and the interactive prompts to the output file or pipe. Running "ollama run model > out.txt" now exits immediately, and "echo hello | ollama run model > out.txt" produces zero stderr output and a typical response in out.txt	2024-11-22 08:04:54 -08:00
Leon Sander	25c9339e2d	readme: add Local Multimodal AI Chat app to community integrations (#6931 )	2024-11-21 20:39:38 -08:00
Mikel Olasagasti Uranga	597072ef1b	readme: update google/uuid module (#7310 ) update uuid.New().String() to uuid.NewString()	2024-11-21 19:37:04 -08:00
Dustin	84b3e07f1b	readme: add ollamarama-matrix to community integrations (#7325 )	2024-11-21 17:49:30 -08:00
Edwin.JH.Lee	422d52858c	readme: add x-cmd ollama module to community integrations (#5191 )	2024-11-21 16:55:25 -08:00
Elias	723f285813	readme: add OrionChat to community integrations (#7084 ) OrionChat is a free web-based chat interface that simplifies interactions with multiple AI model providers. It provides a unified platform for chatting and exploring multiple large language models (LLMs).	2024-11-21 11:23:42 -08:00
湛露先生	eaaf5d309d	cmd: delete duplicated call to sb.Reset() (#7308 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2024-11-21 11:20:48 -08:00
Jeffrey Morgan	27d9c749d5	docs: remove tutorials, add cloud section to community integrations (#7784 )	2024-11-21 09:59:53 -08:00
R0CKSTAR	b7bddeebc1	env.sh: cleanup unused RELEASE_IMAGE_REPO (#6855 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-21 08:28:04 -08:00
Paul Robello	6a0c2ec50f	readme: add terminal tool ParLlama to community integrations (#5623 )	2024-11-21 02:55:35 -08:00
毛巳煜	baa41be2aa	readme: add a community made ollama web management tool (#7126 )	2024-11-21 02:51:45 -08:00
xuyangbocn	2157b1232e	readme: add Terraform AWS Ollama & Open WebUI community example (#5633 )	2024-11-21 02:28:57 -08:00
emrgnt-cmplxty	37711578a2	readme: add R2R to community integrations (#5587 )	2024-11-21 02:09:36 -08:00
Cyril Blaecke	fb2c9594e0	readme: Add Nosia to Community Integrations (#5381 )	2024-11-21 02:07:17 -08:00
Christian Tzolov	7fbcd55da3	readme: Add Spring AI library reference (#5981 )	2024-11-21 02:02:14 -08:00
Philippe Charrière	b4348bdd25	readme: add Parakeet to community integrations Parakeet is a GoLang SDK for Ollama --------- Co-authored-by: Parth Sareen <parth.sareen@ollama.com>	2024-11-21 02:00:32 -08:00
Marcin Szczygliński	155734e09a	readme: add community integration py-gpt (#6503 )	2024-11-21 01:54:39 -08:00
Michael	883d80e097	readme: add Promptery to community integrations (#7093 )	2024-11-21 01:46:20 -08:00
Jakub Burkiewicz	e4c9f75b23	readme: add node-red-contrib-ollama to community integrations (#4648 )	2024-11-21 01:09:37 -08:00
Dezoito	f5ec7cc872	readme: add ollama grid search, a community project (#4301 )	2024-11-21 01:02:46 -08:00
Franco Lombardo	811bafba82	readme: Add LLPhant to community integrations (#5679 )	2024-11-21 00:54:26 -08:00
Aarushi	431075fcbb	readme: add autogpt integration to list of community integrations (#6459 )	2024-11-21 00:51:38 -08:00
Kevin Brake	c4f27225ac	readme: add community contribution to readme ollama-kis (#5575 )	2024-11-21 00:31:27 -08:00
chyok	b7aa5ee06c	readme: Add tkinter-based client to community based integrations (#5412 )	2024-11-21 00:19:24 -08:00
Nico	3f87f71755	readme: add Shinkai Desktop to community integrations (#4877 )	2024-11-21 00:16:18 -08:00
Laurent Eschenauer	20623cec13	readme: add OpenGPA to community integrations (#5497 )	2024-11-21 00:13:54 -08:00
Andy Gill	0e5f31a86d	readme: add Haverscript to community integrations (#6945 ) Haverscript uses classical functional programming techniques to provide a composable interface for interacting with ollama-hosted LLMs.	2024-11-21 00:11:39 -08:00
drunkwcodes	7e92091751	readme: Terminal app bb7 to community integrations (#7064 )	2024-11-21 00:03:11 -08:00
boessu	1a742f54c9	readme: update AMD ROCm links (#7213 )	2024-11-20 23:48:55 -08:00
奶茶叔叔	6a89dcf848	readme: flutter-based chat app to community integrations (#7221 )	2024-11-20 23:30:10 -08:00
Alexander F. Rødseth	c5e238e8e5	readme: orbiton to community integrations (#7770 )	2024-11-20 23:24:05 -08:00
Nikita Ganzikov	fce30f407a	app: typo in wintray messages const (#7705 )	2024-11-20 22:01:58 -08:00
Daniel Hiltgen	d863298210	docs: Link to AMD guide on multi-GPU guidance (#7744 )	2024-11-20 16:00:46 -08:00
Jesse Gross	c4b34f2a2a	runner.go: Truncate inputs that exceed context rather than shifting Previous versions of the runner would truncate inputs to the context window before beginning processing. The main processing loop relied on this behavior if the context needed to be shifted later (due to token generation). If truncation did not occur then invariants would be broken, causing crashes or infinite loops. Later versions attempted to fix these bugs and make the logic less subtle so that all inputs could be handled. Truncation was removed to make things consistent. However, truncation is much faster than processing and shifting, so removing it caused performance problems when the input vastly exceeded the context size. This restores the input truncation as a performance optimization while keeping the more robust processing logic. Fixes #7762	2024-11-20 12:49:24 -08:00
Jesse Gross	c3ff916431	runner.go: Don't add inputs to cache view until actually processed We need to track which tokens are in the cache ourselves. We currently add tokens to the cache tracker when we add them to batch but they are not actually in the cache until we call Decode. This can cause confusion when we are shifting the cache. Avoids "could not find a KV slot for the batch" issues. Bug #7545	2024-11-20 12:49:24 -08:00
Jesse Gross	3fc1dc0e6f	runner.go: Hard fail on errors rather than potentially infinite looping We try to recover from errors by dropping the tokens that caused the problem and re-trying. However, dropping the tokens is not correct and continuing often leads to infinite loops. To avoid this, we end the sequence if such a condition is detected, which is also surprising. At this point, it is better to just report the error. This will make it easier to find problems and the alternatives are perhaps even more surprising to users. This is not a very satisfactory solution either - we should isolate the error and return it to the user without killing the whole process. However, this is an incremental step and consistent with most other failures (which either manifest as abort() or panic).	2024-11-20 12:49:24 -08:00
Jesse Gross	7121dfa309	runner.go: Retry decoding after defragmentation if needed Fragmentation of the KV cache can occur due to cache shifting or different sequences getting processed. Decode uses a heuristic to decide if it should defrag. However, this heuristic isn't 100% accurate, so decoding can sometimes fail by surprise. For these cases, if decode indicates that there is no KV cache space, we should defrag and then try again.	2024-11-20 12:49:24 -08:00
Jesse Gross	5f68fcab12	runner.go: Use correct index when retrieving embedding results This doesn't have any impact currently because NUM_PARALLEL is forced to 1 for embeddings, so both indices will always be 0.	2024-11-20 12:49:24 -08:00
Emir Sahin	ecf41eed05	readme: add llm-axe to community integrations (#5931 )	2024-11-20 10:53:14 -08:00
Marcus Ziadé	b8c66d3307	readme: add a swift community integration (#7383 )	2024-11-20 10:49:15 -08:00
thewh1teagle	303f4bc79e	readme: add vibe app to community integrations (#7607 )	2024-11-20 10:45:10 -08:00
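
The deadlock fix in commit 3478b2cf14 comes down to two rules: release the semaphore only after the slot's data structure has been cleared, and return an error when no slot is found instead of waiting forever. The following is a minimal sketch of that ordering, not the actual runner.go code. The names `Runner`, `Process`, `acquire`, and `ErrNoSlot` are hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoSlot (hypothetical) is returned instead of blocking forever
// when no free slot can be found.
var ErrNoSlot = errors.New("no free sequence slot")

// Runner is an illustrative stand-in for a runner with a fixed
// number of sequence slots guarded by a counting semaphore.
type Runner struct {
	mu    sync.Mutex
	slots []bool        // true means the slot is in use
	sem   chan struct{} // buffered channel used as a semaphore
}

func NewRunner(n int) *Runner {
	return &Runner{slots: make([]bool, n), sem: make(chan struct{}, n)}
}

// acquire marks the first free slot as used, or fails fast.
func (r *Runner) acquire() (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i, used := range r.slots {
		if !used {
			r.slots[i] = true
			return i, nil
		}
	}
	return -1, ErrNoSlot
}

// Process runs work in a slot. The slot is cleared BEFORE the
// semaphore is released, so a waiting request can never be admitted
// while the slot still looks busy.
func (r *Runner) Process(work func(slot int) string) (string, error) {
	r.sem <- struct{}{} // blocks while all slots are busy
	slot, err := r.acquire()
	if err != nil {
		<-r.sem
		return "", err
	}
	resp := work(slot)

	r.mu.Lock()
	r.slots[slot] = false // 1. clear the data structure
	r.mu.Unlock()
	<-r.sem // 2. only then release the semaphore
	return resp, nil
}

func main() {
	r := NewRunner(2)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			resp, err := r.Process(func(slot int) string {
				return fmt.Sprintf("request %d done", id)
			})
			if err != nil {
				fmt.Println("error:", err)
				return
			}
			fmt.Println(resp)
		}(i)
	}
	wg.Wait()
}
```

If the two steps at the end of `Process` were swapped, a new request could pass the semaphore, find every slot still marked as used, and hit the `ErrNoSlot` path (or, without that check, wait for a response that never arrives).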


3677 commits
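
Commit 7121dfa309 in the log above describes retrying a decode after defragmenting the KV cache when the first attempt reports that no cache space is available. A rough sketch of that retry pattern follows. The `Cache` interface, `ErrKvCacheFull`, `decodeWithRetry`, and `fakeCache` are hypothetical names invented for this example and do not reflect the real runner or llama.cpp bindings.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKvCacheFull (hypothetical) signals that decode found no KV cache space.
var ErrKvCacheFull = errors.New("no KV cache space")

// Cache is an assumed, simplified view of a KV cache.
type Cache interface {
	Decode(batch []int) error
	Defrag()
}

// decodeWithRetry tries to decode once. If the cache reports that it
// is full, it defragments and tries exactly one more time. Any other
// error is returned unchanged.
func decodeWithRetry(c Cache, batch []int) error {
	err := c.Decode(batch)
	if errors.Is(err, ErrKvCacheFull) {
		c.Defrag()
		err = c.Decode(batch)
	}
	return err
}

// fakeCache fails to decode until Defrag has been called.
type fakeCache struct {
	fragmented bool
	defrags    int
}

func (f *fakeCache) Decode(batch []int) error {
	if f.fragmented {
		return ErrKvCacheFull
	}
	return nil
}

func (f *fakeCache) Defrag() {
	f.fragmented = false
	f.defrags++
}

func main() {
	c := &fakeCache{fragmented: true}
	err := decodeWithRetry(c, []int{1, 2, 3})
	fmt.Println("error:", err, "defrags:", c.defrags)
}
```

Retrying only once keeps the failure mode bounded: if the cache is still full after defragmentation, the error is surfaced rather than looped on, which matches the "hard fail" stance of commit 3fc1dc0e6f.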