Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
This also moves the checkServerHeartbeat call out of the Cobra "RunE"
machinery to the call site, where it runs after the check for
OLLAMA_MODELS. That ordering allows the helpful error message to be
printed before the server heartbeat check, and arguably makes the code
more readable by removing the magic, superfluous "pre" function caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message (#3461)
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658
default head_kv to 1
2024-04-02 16:37:59 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Daniel Hiltgen
841adda157
Fix windows lint CI flakiness
2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Daniel Hiltgen
1f11b52511
Refined min memory from testing
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204
Release gpu discovery library after use
Leaving the cudart library loaded kept roughly 30 MB of memory
pinned in the GPU by the main process. This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
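The load-query-release pattern looks roughly like this (a toy stand-in for the native cudart handle; the real code dlopens the library):

```go
package main

import "fmt"

// cudartLib stands in for a dlopen'd native library handle. Keeping
// it loaded pinned memory on the GPU, hence the explicit release.
type cudartLib struct{ loaded bool }

func loadCudart() *cudartLib          { return &cudartLib{loaded: true} }
func (l *cudartLib) Release()         { l.loaded = false }
func (l *cudartLib) FreeVRAM() uint64 { return 8 << 30 }

// discoverVRAM loads the library only for the duration of the query,
// so no GPU resources stay held while the server is idle.
func discoverVRAM() uint64 {
	lib := loadCudart()
	defer lib.Release()
	return lib.FreeVRAM()
}

func main() { fmt.Println(discoverVRAM()) }
```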
Daniel Hiltgen
0a74cb31d5
Safeguard for noexec
Some users may run into problems with our current payload model,
so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292
Detect too-old cuda driver
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
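Code 35 is cudaErrorInsufficientDriver in the CUDA runtime, so the fix amounts to translating it into something actionable (a sketch, not the exact ollama message):

```go
package main

import "fmt"

// cudartInitError turns a raw cudart init code into a readable log
// line; 35 (cudaErrorInsufficientDriver) means the installed driver
// is older than this CUDA runtime requires.
func cudartInitError(code int) string {
	if code == 35 {
		return "CUDA driver is too old for this CUDA runtime; please upgrade your GPU driver"
	}
	return fmt.Sprintf("cudart init failure: %d", code)
}

func main() {
	fmt.Println(cudartInitError(35))
	fmt.Println(cudartInitError(3))
}
```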
Daniel Hiltgen
4fec5816d6
Integration test improvements
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
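A minimal sketch of the subprocess model (names illustrative): the backend runs as a child process, so a crash or leak stays isolated from the main server, and a failed run can simply be retried with a fresh process.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runOnce launches the backend as a child process and captures its
// output; the parent is unaffected if the child misbehaves.
func runOnce(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return strings.TrimSpace(string(out)), err
}

// runWithRestart gracefully restarts the child up to max times on
// failure, the recovery behavior the commit describes.
func runWithRestart(max int, name string, args ...string) (string, error) {
	var out string
	var err error
	for i := 0; i < max; i++ {
		if out, err = runOnce(name, args...); err == nil {
			return out, nil
		}
	}
	return out, err
}

func main() {
	out, err := runWithRestart(2, "sh", "-c", "echo ok")
	fmt.Println(out, err)
}
```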
Patrick Devine
3b6a9154dd
Simplify model conversion (#3422)
2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069
fix generate output
2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282
update memory calculations
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
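Counting each layer independently amounts to a greedy fit against free VRAM, roughly like this (a sketch of the idea, not the actual estimator):

```go
package main

import "fmt"

// layersThatFit counts how many leading layers fit in free VRAM,
// sizing each layer independently rather than assuming uniform size.
func layersThatFit(layerBytes []uint64, freeVRAM uint64) int {
	var used uint64
	for i, size := range layerBytes {
		if used+size > freeVRAM {
			return i
		}
		used += size
	}
	return len(layerBytes)
}

func main() {
	// Three 4 GiB layers against 10 GiB free: only two fit.
	fmt.Println(layersThatFit([]uint64{4 << 30, 4 << 30, 4 << 30}, 10<<30))
}
```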
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations (#3437)
2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md (#3436)
2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat (#3423)
Corrective Retrieval-Augmented Generation demo, powered by LangGraph and Streamlit 🤗
Supports:
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Yaroslav
e6fb39c182
Update README.md (#3378)
Plugins list updated
2024-03-31 13:10:05 -04:00
sugarforever
e1f1c374ea
Community Integration: ChatOllama (#3400)
* Community Integration: ChatOllama
* fixed typo
2024-03-30 22:46:50 -04:00
Jeffrey Morgan
06a1508bfe
Update 90_bug_report.yml
2024-03-29 10:11:17 -04:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Daniel Hiltgen
97ae517fbf
Merge pull request #3398 from dhiltgen/release_latest
CI automation for tagging latest images
2024-03-28 16:25:54 -07:00
Daniel Hiltgen
44b813e459
Merge pull request #3377 from dhiltgen/rocm_v6_bump
Bump ROCm to 6.0.2 patch release
2024-03-28 16:07:54 -07:00
Daniel Hiltgen
539043f5e0
CI automation for tagging latest images
2024-03-28 16:07:37 -07:00
Daniel Hiltgen
dbcace6847
Merge pull request #3392 from dhiltgen/ci_build_win_cuda
CI windows gpu builds
2024-03-28 16:03:52 -07:00
Daniel Hiltgen
c91a4ebcff
Bump ROCm to 6.0.2 patch release
2024-03-28 15:58:57 -07:00
Daniel Hiltgen
b79c7e4528
CI windows gpu builds
If we're running generate, test the Windows CUDA and ROCm builds as well.
2024-03-28 14:39:10 -07:00
Michael Yang
035b274b70
Merge pull request #3379 from ollama/mxyng/origins
fix: trim quotes on OLLAMA_ORIGINS
2024-03-28 14:14:18 -07:00
Michael Yang
9c6a254945
Merge pull request #3391 from ollama/mxyng-patch-1
2024-03-28 13:15:56 -07:00
Michael Yang
f31f2bedf4
Update troubleshooting link
2024-03-28 12:05:26 -07:00
Michael Yang
756c257553
Merge pull request #3380 from ollama/mxyng/conditional-generate
fix: workflows
2024-03-28 00:35:27 +01:00
Michael Yang
5255d0af8a
fix: workflows
2024-03-27 16:30:01 -07:00
Michael Yang
af8a8a6b59
fix: trim quotes on OLLAMA_ORIGINS
2024-03-27 15:24:29 -07:00
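The quote-trimming fix can be sketched like this: users who write OLLAMA_ORIGINS="http://a.com" in a shell profile or service file can end up with literal quotes in the value, which would never match an Origin header (a sketch, not the exact ollama implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// parseOrigins splits a comma-separated OLLAMA_ORIGINS value and
// strips surrounding whitespace and stray quotes from each entry.
func parseOrigins(raw string) []string {
	var origins []string
	for _, o := range strings.Split(raw, ",") {
		o = strings.Trim(strings.TrimSpace(o), `"'`)
		if o != "" {
			origins = append(origins, o)
		}
	}
	return origins
}

func main() {
	fmt.Println(parseOrigins(`"http://localhost:3000", 'https://example.com'`))
}
```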
Michael Yang
461ad25015
Merge pull request #3376 from ollama/mxyng/conditional-generate
only generate on changes to llm subdirectory
2024-03-27 22:12:53 +01:00
Michael Yang
8838ae787d
stub stub
2024-03-27 13:59:12 -07:00
Michael Yang
db75402ade
mangle arch
2024-03-27 13:44:50 -07:00
Michael Yang
1e85a140a3
only generate on changes to llm subdirectory
2024-03-27 12:45:35 -07:00
Michael Yang
c363282fdc
Merge pull request #3375 from ollama/mxyng/conditional-generate
only generate cuda/rocm when changes to llm detected
2024-03-27 20:40:55 +01:00
Michael Yang
5b0c48d29e
only generate cuda/rocm when changes to llm detected
2024-03-27 12:23:09 -07:00
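The conditional-generate commits above map onto GitHub Actions' built-in path filtering; a hypothetical workflow fragment (trigger shape assumed, not copied from ollama's actual workflow):

```yaml
# Run the expensive generate/CUDA/ROCm jobs only when llm/ changes.
on:
  pull_request:
    paths:
      - 'llm/**'
```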