Commit graph

3677 commits

Author SHA1 Message Date
Patcher
a820d2b267
readme: add observability section with OpenLIT to community-integrations 2024-11-23 18:03:12 -08:00
Meng Zhuo
2ebdb54fb3
all: update math32 go mod to v1.11.0 (#6627) 2024-11-23 15:21:54 -08:00
josc146
bb52abfa55
readme: add ChatGPTBox and RWKV-Runner to community integrations (#4118) 2024-11-23 13:31:27 -08:00
oza6ut0ne
31cb1ca9e5
openai: accept X-Stainless-Retry-Count header (#6910) 2024-11-23 12:39:05 -08:00
Rodrigo Ribeiro Gomes
78f779a323
readme: add powershai, a powershell module with ollama support to community integrations (#7438) 2024-11-23 10:08:59 -08:00
Jesse Gross
3478b2cf14 runner.go: Fix deadlock with many concurrent requests
If there are no avilable slots for new sequences then a request
will not be added to the processing queue but will continue on
to wait for a response that never comes. Besides never giving a
response to the request, this prevents the model from being
unloaded due to the outstanding request.

To prevent this, there are semaphores that prevent more requests
from being processed than there are slots - one in the Ollama
server and one in the runner.
 - The Ollama server one works but it is not designed to protect
the runner's data internal structures and the runner can return a
final response before clearing its data structures.
 - The internal runner semaphore has similar behavior where it
 can release the semaphore when it issues a response. This is
 wrong - it should only release the semaphore after it has
 cleared the data structure.

In addition, we should return an error if a slot is not found
rather than deadlocking in the event we ever get to this spot.

Fixes #7779
2024-11-22 16:14:51 -08:00
Bruce MacDonald
7b5585b9cb
server: remove out of date anonymous access check (#7785)
In the past the ollama.com server would return a JWT that contained
information about the user being authenticated. This was used to return
different error messages to the user. This is no longer possible since the
token used to authenticate does not contain information about the user
anymore. Removing this code that no longer works.

Follow up changes will improve the error messages returned here, but good to
clean up first.
2024-11-22 11:57:35 -08:00
Daniel Hiltgen
f0a351810c
tests: fix max queue integration test (#7782)
This had fallen out of sync with the envconfig behavior, where max queue default was not zero.
2024-11-22 08:05:45 -08:00
Daniel Hiltgen
b85520bfb9
logs: explain client aborts better (#7783)
Users get confused by "Failed to acquire semaphore" error="context canceled"
messages in the logs, which are actually clients giving up.  While there could be
a legitimate hang bug in the system, sometimes this is just short client timeouts
with an overloaded system, so this should help users understand what's going on
better.
2024-11-22 08:05:32 -08:00
Daniel Hiltgen
d88972ea48
Be quiet when redirecting output (#7360)
This avoids emitting the progress indicators to stderr, and the interactive
prompts to the output file or pipe.  Running "ollama run model > out.txt"
now exits immediately, and "echo hello | ollama run model > out.txt"
produces zero stderr output and a typical response in out.txt
2024-11-22 08:04:54 -08:00
Leon Sander
25c9339e2d
readme: add Local Multimodal AI Chat app to community integrations (#6931) 2024-11-21 20:39:38 -08:00
Mikel Olasagasti Uranga
597072ef1b
readme: update google/uuid module (#7310)
update uuid.New().String() to uuid.NewString()
2024-11-21 19:37:04 -08:00
Dustin
84b3e07f1b
readme: add ollamarama-matrix to community integrations (#7325) 2024-11-21 17:49:30 -08:00
Edwin.JH.Lee
422d52858c
readme: add x-cmd ollama module to community integrations (#5191) 2024-11-21 16:55:25 -08:00
Elias
723f285813
readme: add OrionChat to community integrations (#7084)
OrionChat is a free web-based chat interface that simplifies interactions
with multiple AI model providers. It provides a unified platform for chatting
and exploring multiple large language models (LLMs).
2024-11-21 11:23:42 -08:00
湛露先生
eaaf5d309d
cmd: delete duplicated call to sb.Reset() (#7308)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-11-21 11:20:48 -08:00
Jeffrey Morgan
27d9c749d5
docs: remove tutorials, add cloud section to community integrations (#7784) 2024-11-21 09:59:53 -08:00
R0CKSTAR
b7bddeebc1
env.sh: cleanup unused RELEASE_IMAGE_REPO (#6855)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-21 08:28:04 -08:00
Paul Robello
6a0c2ec50f
readme: add terminal tool ParLlama to community integrations (#5623) 2024-11-21 02:55:35 -08:00
毛巳煜
baa41be2aa
readme: add a community made ollama web management tool (#7126) 2024-11-21 02:51:45 -08:00
xuyangbocn
2157b1232e
readme: add Terraform AWS Ollama & Open WebUI community example (#5633) 2024-11-21 02:28:57 -08:00
emrgnt-cmplxty
37711578a2
readme: add R2R to community integrations (#5587) 2024-11-21 02:09:36 -08:00
Cyril Blaecke
fb2c9594e0
readme: Add Nosia to Community Integrations (#5381) 2024-11-21 02:07:17 -08:00
Christian Tzolov
7fbcd55da3
readme: Add Spring AI library reference (#5981) 2024-11-21 02:02:14 -08:00
Philippe Charrière
b4348bdd25
readme: add Parakeet to community integrations
Parakeet is a GoLang SDK for Ollama

---------

Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
2024-11-21 02:00:32 -08:00
Marcin Szczygliński
155734e09a
readme: add community integration py-gpt (#6503) 2024-11-21 01:54:39 -08:00
Michael
883d80e097
readme: add Promptery to community integrations (#7093) 2024-11-21 01:46:20 -08:00
Jakub Burkiewicz
e4c9f75b23
readme: add node-red-contrib-ollama to community integrations (#4648) 2024-11-21 01:09:37 -08:00
Dezoito
f5ec7cc872
readme: add ollama grid search, a community project (#4301) 2024-11-21 01:02:46 -08:00
Franco Lombardo
811bafba82
readme: Add LLPhant to community integrations (#5679) 2024-11-21 00:54:26 -08:00
Aarushi
431075fcbb
readme: add autogpt integration to list of community integrations (#6459) 2024-11-21 00:51:38 -08:00
Kevin Brake
c4f27225ac
readme: add community contribution to readme ollama-kis (#5575) 2024-11-21 00:31:27 -08:00
chyok
b7aa5ee06c
readme: Add tkinter-based client to community based integrations (#5412) 2024-11-21 00:19:24 -08:00
Nico
3f87f71755
readme: add Shinkai Desktop to community integrations (#4877) 2024-11-21 00:16:18 -08:00
Laurent Eschenauer
20623cec13
readme: add OpenGPA to community integrations (#5497) 2024-11-21 00:13:54 -08:00
Andy Gill
0e5f31a86d
readme: add Haverscript to community integrations (#6945)
Haverscript uses classical functional programming techniques to provide a composable interface for interacting with ollama-hosted LLMs.
2024-11-21 00:11:39 -08:00
drunkwcodes
7e92091751
readme: Terminal app bb7 to community integrations (#7064) 2024-11-21 00:03:11 -08:00
boessu
1a742f54c9
readme: update AMD ROCm links (#7213) 2024-11-20 23:48:55 -08:00
奶茶叔叔
6a89dcf848
readme: flutter-based chat app to community integrations (#7221) 2024-11-20 23:30:10 -08:00
Alexander F. Rødseth
c5e238e8e5
readme: orbiton to community integrations (#7770) 2024-11-20 23:24:05 -08:00
Nikita Ganzikov
fce30f407a
app: typo in wintray messages const (#7705) 2024-11-20 22:01:58 -08:00
Daniel Hiltgen
d863298210
docs: Link to AMD guide on multi-GPU guidance (#7744) 2024-11-20 16:00:46 -08:00
Jesse Gross
c4b34f2a2a runner.go: Truncate inputs that exceed context rather than shifting
Previous versions of the runner would truncate inputs to the context
window before beginning processing. The main processing loop relied
on this behavior if the context needed to be shifted later (due to
token generation). If truncation did not occur then invariants
would be broken, causing crashes or infinite loops.

Later versions attempted to fix these bugs and make the logic less
subtle so that all inputs could be handled. Truncation was removed
to make things consistent.

However, truncation is much faster than processing and shifting, so
removing it caused performance problems when the input vastly exceeded
the context size. This restores the input truncation as a performance
optimization while keeping the more robust processing logic.

Fixes #7762
2024-11-20 12:49:24 -08:00
Jesse Gross
c3ff916431 runner.go: Don't add inputs to cache view until actually processed
We need to track which tokens are in the cache ourselves. We currently
add tokens to the cache tracker when we add them to batch but they are
not actually in the cache until we call Decode. This can cause
confusion when we are shifting the cache.

Avoids "could not find a KV slot for the batch" issues.

Bug #7545
2024-11-20 12:49:24 -08:00
Jesse Gross
3fc1dc0e6f runner.go: Hard fail on errors rather than potentially infinite looping
We try to recover from errors by dropping the tokens that caused the
problem and re-trying. However, dropping the tokens is not correct
and continuing often leads to infinite loops. To avoid, this we
end the sequence if such a condition is detected, which is also
surprising.

At this point, it is better to just report the error. This will make
it easier to find problems and the alternatives are perhaps even more
surprising to users.

This is not a very satisfactory solution either - we should isolate
the error and return it to the user without killing the whole process.
However, this is an incremental step and consistent with most other
failures (which either manifest as abort() or panic).
2024-11-20 12:49:24 -08:00
Jesse Gross
7121dfa309 runner.go: Retry decoding after defragmentation if needed
Fragmentation of the KV cache can occur due to cache shifting or
different sequences getting processed. Decode uses a heuristic to
decide if it should defrag. However, this heuristic isn't 100%
accurate, so decoding can sometimes fail by surprise.

For these cases, if decode indicates that there is no KV cache space,
we should defrag and then try again.
2024-11-20 12:49:24 -08:00
Jesse Gross
5f68fcab12 runner.go: Use correct index when retrieving embedding results
This doesn't have any impact currently because NUM_PARALLEL is forced
to 1 for embeddings, so both indicies will always be 0.
2024-11-20 12:49:24 -08:00
Emir Sahin
ecf41eed05
readme: add llm-axe to community integrations (#5931) 2024-11-20 10:53:14 -08:00
Marcus Ziadé
b8c66d3307
readme: add a swift community integration (#7383) 2024-11-20 10:49:15 -08:00
thewh1teagle
303f4bc79e
readme: add vibe app to community integrations (#7607) 2024-11-20 10:45:10 -08:00