Michael Yang
a5ec9cfc0f
Merge pull request #3508 from ollama/mxyng/rope
2024-04-05 18:46:06 -07:00
Michael Yang
be517e491c
no rope parameters
2024-04-05 18:05:27 -07:00
Michael Yang
fc8e108642
Merge pull request #3496 from ollama/mxyng/cmd-r-graph
...
add command-r graph estimate
2024-04-05 12:26:21 -07:00
Daniel Hiltgen
c5d5c4a96c
Merge pull request #3491 from dhiltgen/context_bust_test
...
Add test case for context exhaustion
2024-04-04 16:20:20 -07:00
Daniel Hiltgen
dfe330fa1c
Merge pull request #3488 from mofanke/fix-windows-dll-compress
...
fix dll compress in windows building
2024-04-04 16:12:13 -07:00
Michael Yang
01f77ae25d
add command-r graph estimate
2024-04-04 14:07:24 -07:00
Daniel Hiltgen
483b81a863
Merge pull request #3494 from dhiltgen/ci_release
...
Fail fast if mingw missing on windows
2024-04-04 10:15:40 -07:00
Daniel Hiltgen
36bd967722
Fail fast if mingw missing on windows
2024-04-04 09:51:26 -07:00
Jeffrey Morgan
b0e7d35db8
use an older version of the mac os sdk in release ( #3484 )
2024-04-04 09:48:54 -07:00
Daniel Hiltgen
aeb1fb5192
Add test case for context exhaustion
...
Confirmed this fails on 0.1.30 with known regression
but passes on main
2024-04-04 07:42:17 -07:00
Daniel Hiltgen
a2e60ebcaf
Merge pull request #3490 from dhiltgen/ci_fixes
...
CI missing archive
2024-04-04 07:24:24 -07:00
Daniel Hiltgen
883ec4d1ef
CI missing archive
2024-04-04 07:23:27 -07:00
mofanke
4de0126719
fix dll compress in windows building
2024-04-04 21:27:33 +08:00
Daniel Hiltgen
9768e2dc75
Merge pull request #3481 from dhiltgen/ci_fixes
...
CI subprocess path fix
2024-04-03 19:29:09 -07:00
Daniel Hiltgen
08600d5bec
CI subprocess path fix
2024-04-03 19:12:53 -07:00
Daniel Hiltgen
a624e672d2
Merge pull request #3479 from dhiltgen/ci_fixes
...
Fix CI release glitches
2024-04-03 18:42:27 -07:00
Daniel Hiltgen
e4a7e5b2ca
Fix CI release glitches
...
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
2024-04-03 16:41:40 -07:00
Michael Yang
a0a15cfd5b
Merge pull request #3463 from ollama/mxyng/graph-estimate
...
update graph size estimate
2024-04-03 14:27:30 -07:00
Michael Yang
12e923e158
update graph size estimate
2024-04-03 13:34:12 -07:00
Jeffrey Morgan
cd135317d2
Fix macOS builds on older SDKs ( #3467 )
2024-04-03 10:45:54 -07:00
Michael Yang
4f895d633f
Merge pull request #3466 from ollama/mxyng/head-kv
...
default head_kv to 1
2024-04-03 10:41:00 -07:00
Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command ( #3470 )
...
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
stuff (that's the only word I have for that) to on-site where it's after
the check for OLLAMA_MODELS, which allows the helpful error message to
be printed before the server heartbeat check. This also arguably makes
the code more readable without the magic/superfluous "pre" function
caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
...
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message ( #3461 )
...
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658
default head_kv to 1
2024-04-02 16:37:59 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
...
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
...
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Daniel Hiltgen
841adda157
Fix windows lint CI flakiness
2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
...
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Daniel Hiltgen
1f11b52511
Refined min memory from testing
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204
Release gpu discovery library after use
...
Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process. This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
0a74cb31d5
Safeguard for noexec
...
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292
Detect too-old cuda driver
...
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
4fec5816d6
Integration test improvements
...
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion ( #3422 )
2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
...
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
...
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
...
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069
fix generate output
2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282
update memory calcualtions
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations ( #3437 )
2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md ( #3436 )
2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat ( #3423 )
...
Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗
Support:
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Yaroslav
e6fb39c182
Update README.md ( #3378 )
...
Plugins list updated
2024-03-31 13:10:05 -04:00