Nurgo
01c9386267
Add BrainSoup to compatible clients list ( #3473 )
2024-05-06 13:42:16 -07:00
Daniel Hiltgen
af9eb36f9f
Merge pull request #4135 from dhiltgen/no_physx
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396
Merge pull request #4067 from dhiltgen/cudart
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr
86b7fcac32
Update README.md with StreamDeploy ( #3621 )
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-06 11:14:41 -07:00
Hyden Liu
fb8ddc564e
chore: delete HEAD ( #4194 )
2024-05-06 10:32:30 -07:00
Saif
242efe6611
👌 IMPROVE: add portkey library for production tools ( #4119 )
2024-05-06 10:25:23 -07:00
Jeffrey Morgan
1b0e6c9c0e
Fix llava models not working after first request ( #4164 )
* fix llava models not working after first request
* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan
dfa2f32ca0
unload in critical section ( #4187 )
2024-05-05 17:18:27 -07:00
Daniel Hiltgen
840424a2c4
Merge pull request #4154 from dhiltgen/central_config
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx
6707768ebd
chore: format go code ( #4149 )
2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING
c78bb76a12
update libraries for langchain_community and change llama2 to llama3 ( #4174 )
2024-05-05 16:07:04 -07:00
Jeffrey Morgan
942c979232
allocate a large enough kv cache for all parallel requests ( #4162 )
2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning
06164911dd
Update README.md ( #4111 )
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-05 14:45:32 -07:00
Patrick Devine
2a21363bb7
validate the format of the digest when getting the model path ( #4175 )
2024-05-05 11:46:12 -07:00
Daniel Hiltgen
026869915f
Merge pull request #4144 from dhiltgen/max_queue
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen
45d61aaaa3
Add integration test to push max queue limits
2024-05-05 10:46:25 -07:00
Daniel Hiltgen
20f6c06569
Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen
371f5e52aa
Merge pull request #4141 from dhiltgen/win_docs
Explain the 2 different Windows download options
2024-05-04 12:50:16 -07:00
Daniel Hiltgen
e006480e49
Explain the 2 different Windows download options
2024-05-04 12:50:05 -07:00
Michael Yang
aed545872d
Merge pull request #4143 from ollama/mxyng/final-response
omit prompt and generate settings from final response
2024-05-03 17:39:49 -07:00
Michael Yang
44869c59d6
omit prompt and generate settings from final response
2024-05-03 17:00:02 -07:00
Daniel Hiltgen
52663284cf
Merge pull request #4145 from dhiltgen/fix_lint
Fix lint warnings
2024-05-03 16:53:17 -07:00
Daniel Hiltgen
42fa9d7f0a
Fix lint warnings
2024-05-03 16:44:19 -07:00
Michael Yang
b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
2024-05-03 13:01:22 -07:00
Dr Nic Williams
e8aaea030e
Update 'llama2' -> 'llama3' in most places ( #4116 )
* Update 'llama2' -> 'llama3' in most places
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-03 15:25:04 -04:00
Daniel Hiltgen
b1ad3a43cb
Skip PhysX cudart library
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen
267e25a750
Merge pull request #4129 from dhiltgen/unit_tests
Soften timeouts on sched unit tests
2024-05-03 11:10:26 -07:00
Daniel Hiltgen
9a32c514cb
Soften timeouts on sched unit tests
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang
e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
93707fa3f2
Merge pull request #4108 from ollama/mxyng/lf
fix line ending
2024-05-02 14:55:15 -07:00
Michael Yang
94c369095f
fix line ending
replace CRLF with LF
2024-05-02 14:53:13 -07:00
Jeffrey Morgan
9164b0161b
Update .gitattributes
2024-05-02 14:06:31 -04:00
Daniel Hiltgen
e592e8fccb
Support Fedora's standard ROCm location
2024-05-01 15:47:12 -07:00
Bryce Reitano
bf4fc25f7b
Add a /clear command ( #3947 )
* Add a /clear command
* change help messages
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-01 17:44:36 -04:00
Michael Yang
5b806d8d24
Merge pull request #4089 from ollama/mxyng/target-invalid
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang
cb1e072643
Merge pull request #4087 from ollama/mxyng/fix-host-port
types/model: fix name for hostport
2024-05-01 12:42:07 -07:00
Michael Yang
45b6a12e45
server: target invalid
2024-05-01 12:40:45 -07:00
alwqx
68755f1f5e
chore: fix typo in docs/development.md ( #4073 )
2024-05-01 15:39:11 -04:00
Michael Yang
997a455039
want filepath
2024-05-01 12:33:41 -07:00
Michael Yang
88775e1ff9
strip scheme from name
2024-05-01 12:26:19 -07:00
Michael Yang
8867e744ff
types/model: fix name for hostport
2024-05-01 12:14:53 -07:00
Daniel Hiltgen
4fd064bea6
Merge pull request #4031 from MarkWard0110/fix/issue-3736
Fix/issue 3736: When runners are closing or expiring, the scheduler gets dirty VRAM size readings.
2024-05-01 12:13:26 -07:00
Jeffrey Morgan
59fbceedcc
use lf for line endings ( #4085 )
2024-05-01 15:02:45 -04:00
Mark Ward
321d57e1a0
Remove goroutine calling .Wait() from load.
2024-05-01 18:51:10 +00:00
Mark Ward
ba26c7aa00
it will always return an error due to Kill() discarding Wait() errors
2024-05-01 18:51:10 +00:00
Mark Ward
63c763685f
log while waiting for the process to stop, to help debug when other tasks execute during this wait.
expire timer: clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
2024-05-01 18:51:10 +00:00
Mark Ward
34a4a94f13
ignore debug bin files
2024-05-01 18:51:10 +00:00
Mark Ward
f4a73d57a4
fix runner expiring during active use: clear the expire timer while the runner is in use, and let finish assign an expire timer so the runner expires once it is no longer used.
2024-05-01 18:51:10 +00:00
Mark Ward
948114e3e3
fix sched to wait for the runner to terminate so that the following VRAM check is more accurate
2024-05-01 18:51:10 +00:00