Commit graph

3244 commits

Author SHA1 Message Date
Daniel Hiltgen
074dc3b9d8 Integration fixes 2024-05-10 14:20:10 -07:00
Daniel Hiltgen
86f9b582d5
Merge pull request #4323 from dhiltgen/sort_by_free
Always use the sorted list of GPUs
2024-05-10 14:12:15 -07:00
Daniel Hiltgen
4142c3ef7c Always use the sorted list of GPUs
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
Jeffrey Morgan
6602e793c0
Use --quantize flag and quantize api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang
ea0fdaed28
Merge pull request #4320 from ollama/mxyng/phi2-mem
add phi2 mem
2024-05-10 12:35:08 -07:00
Michael Yang
1eb382da5a add phi2 mem 2024-05-10 12:13:28 -07:00
Jeffrey Morgan
bb6fd02298
Don't clamp ctx size in PredictServerFit (#4317)
* dont clamp ctx size in `PredictServerFit`

* minimum 4 context

* remove context warning
2024-05-10 10:17:12 -07:00
Daniel Hiltgen
7e2bceceee
Merge pull request #4316 from dhiltgen/more_buffer
Bump VRAM buffer back up
2024-05-10 10:02:34 -07:00
Daniel Hiltgen
30a7d7096c Bump VRAM buffer back up
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Michael Yang
200a18820e
Merge pull request #4306 from ollama/mxyng/fix-routes 2024-05-10 08:58:16 -07:00
Michael Yang
e03637176d fix(routes): skip bad manifests 2024-05-10 08:46:11 -07:00
Bruce MacDonald
c02db93243 omit empty done reason 2024-05-09 16:45:29 -07:00
Michael Yang
ffa4d5134a
Merge pull request #4305 from ollama/mxyng/typo
fix typo
2024-05-09 16:42:09 -07:00
Jeffrey Morgan
302d7fdbf3
prune partial downloads (#4272) 2024-05-09 16:35:20 -07:00
Michael Yang
cf442cd57e fix typo 2024-05-09 16:23:37 -07:00
Michael Yang
0e1ba65855
Merge pull request #4302 from ollama/mxyng/forward-env
only forward some env vars
2024-05-09 16:21:05 -07:00
Michael Yang
6aad333c63
Merge pull request #4298 from ollama/mxyng/log-cleanup
log clean up
2024-05-09 16:20:57 -07:00
Daniel Hiltgen
4fcc84e67a
Merge pull request #4304 from dhiltgen/signals
Fix race in shutdown logic
2024-05-09 15:58:44 -07:00
Daniel Hiltgen
3ae2f441e0 Fix race in shutdown logic
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Zander Lewis
2abb3f6424
Update README.md (#4300) 2024-05-09 15:30:49 -07:00
Michael Yang
ce3b212d12 only forward some env vars 2024-05-09 15:16:09 -07:00
Daniel Hiltgen
83d6d46e29
Merge pull request #4299 from dhiltgen/handle_vram_reporting_lag
Wait for GPU free memory reporting to converge
2024-05-09 15:08:56 -07:00
Daniel Hiltgen
354ad9254e Wait for GPU free memory reporting to converge
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Michael Yang
58876091f7 log clean up 2024-05-09 14:55:36 -07:00
Daniel Hiltgen
dc18eee39d
Merge pull request #4238 from dhiltgen/gpu_info
Record more GPU information
2024-05-09 14:26:58 -07:00
Daniel Hiltgen
8727a9c140 Record more GPU information
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Daniel Hiltgen
d0425f26cf
Merge pull request #4294 from dhiltgen/harden_subprocess_reaping
Harden subprocess reaping
2024-05-09 14:02:16 -07:00
Bruce MacDonald
cfa84b8470
add done_reason to the api (#4235) 2024-05-09 13:30:14 -07:00
Michael Yang
1580ed4c06
Merge pull request #4295 from ollama/mxyng/fix-list
routes: skip invalid filepaths
2024-05-09 11:37:34 -07:00
Michael Yang
a7ee84fc31 routes: skip invalid filepaths 2024-05-09 11:23:22 -07:00
Daniel Hiltgen
84ac7ce139 Refine subprocess reaping 2024-05-09 11:21:31 -07:00
tusharhero
788b092c49
docs: add Guix package manager in README. (#4040) 2024-05-09 11:10:24 -07:00
J S
5cde17a096
Add PromptingTools.jl (#2192) 2024-05-09 09:39:05 -07:00
Daniel Hiltgen
c3837eb08c
Merge pull request #4289 from dhiltgen/doc_container_workarounds
Doc container usage and workaround for nvidia errors
2024-05-09 09:27:29 -07:00
Daniel Hiltgen
8cc0ee2efe Doc container usage and workaround for nvidia errors 2024-05-09 09:26:45 -07:00
Jeffrey Morgan
d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale (#1983) 2024-05-09 09:06:13 -07:00
Carlos Gamez
daa1a032f7
Update langchainjs.md (#2027)
Updated sample code as per warning notification from the package maintainers
2024-05-08 20:21:03 -07:00
jmorganca
6042e8bc57 remove bash-comparemodels example 2024-05-08 19:49:45 -07:00
Daniel Hiltgen
920a4b0794 Merge remote-tracking branch 'upstream/main' into pr3702 2024-05-08 16:44:35 -07:00
Daniel Hiltgen
ee49844d09
Merge pull request #4153 from dhiltgen/gpu_verbose_response
Add GPU usage
2024-05-08 16:39:11 -07:00
Daniel Hiltgen
8a516ac862
Merge pull request #4241 from dhiltgen/fix_tmp_override
Detect noexec and report a better error
2024-05-08 15:34:22 -07:00
Daniel Hiltgen
bee2f4a3b0 Record GPU usage information
This records more GPU usage information for eventual UX inclusion.
2024-05-08 14:45:39 -07:00
Bruce MacDonald
cef45feaa4
Add preflight OPTIONS handling and update CORS config (#4086)
* Add preflight OPTIONS handling and update CORS config

- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

* allow auth, content-type, and user-agent headers

* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang
2687f02c96
Merge pull request #4265 from ollama/mxyng/fix-show-llava
routes: fix show llava models
2024-05-08 12:51:21 -07:00
Michael Yang
b25976aeb8 routes: fix show llava models 2024-05-08 12:43:36 -07:00
Michael Yang
001f167aad
Merge pull request #4261 from ollama/mxyng/fix-tag-case
types/model: fix tag case
2024-05-08 11:09:47 -07:00
Michael Yang
486a2c1d94 types/model: fix tag case 2024-05-08 08:47:16 -07:00
Michael Yang
88cf154483
Merge pull request #4244 from ollama/mxyng/skip-if-same
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald
8cbd3e7510
skip hidden files in list models handler (#4247) 2024-05-07 19:01:45 -07:00
Michael Yang
eeb695261f skip if same quantization 2024-05-07 17:44:19 -07:00