Commit graph

2828 commits

Author SHA1 Message Date
Mélony QIN
3f71ba406a
Correct the kubernetes terminology (#3843)
* add details on kubernetes deployment and separate the testing process

* Update examples/kubernetes/README.md

thanks for suggesting this change, I agree with you and let's make this project better together !

Co-authored-by: JonZeolla <Zeolla@gmail.com>

---------

Co-authored-by: QIN Mélony <MQN1@dsone.3ds.com>
Co-authored-by: JonZeolla <Zeolla@gmail.com>
2024-05-07 09:53:08 -07:00
Hause Lin
88a67127d8
Update README.md to include ollama-r library (#4012)
* Update README.md

Add Ollama for R - ollama-r library

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:52:30 -07:00
Jeffrey Morgan
f7dc7dcc64
Update .gitattributes 2024-05-07 09:50:19 -07:00
alwqx
04f971c84b
fix golangci workflow missing gofmt and goimports (#4190) 2024-05-07 09:49:40 -07:00
Michael Yang
548a7df014 update list handler to use model.Name 2024-05-07 09:38:45 -07:00
Michael Yang
70edb9bc4d
Merge pull request #4215 from ollama/mxyng/mem
llm: add minimum based on layer size
2024-05-07 09:26:33 -07:00
Michael Yang
3f0ed03856
Update examples/flyio/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:25:01 -07:00
Michael Yang
4736391bfb llm: add minimum based on layer size 2024-05-06 17:04:19 -07:00
CrispStrobe
7c5330413b
note on naming restrictions (#2625)
* note on naming restrictions

else push would fail with cryptic
retrieving manifest 
Error: file does not exist
==> maybe change that in code too

* Update docs/import.md

---------

Co-authored-by: C-4-5-3 <154636388+C-4-5-3@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 16:03:21 -07:00
Jeffrey Morgan
39d9d22ca3
close server on receiving signal (#4213) 2024-05-06 16:01:37 -07:00
Jackie Li
af47413dba
Add MarshalJSON to Duration (#3284)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-06 15:59:18 -07:00
Michael Yang
b2f00aa977 close zip files 2024-05-06 15:27:19 -07:00
Michael Yang
6694be5e50 convert/llama: use WriteSeeker 2024-05-06 15:24:01 -07:00
Michael Yang
f5e8b207fb s/DisplayLongest/String/ 2024-05-06 15:24:01 -07:00
Michael Yang
d245460362 only quantize language models 2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383 no iterator 2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d rebase 2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a comments 2024-05-06 15:24:01 -07:00
Michael Yang
a7248f6ea8 update tests 2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509 quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Jeffrey Chen
d091fe3c21
Windows automatically recognizes username (#3214) 2024-05-06 15:03:14 -07:00
Mohamed A. Fouad
ee02f548c8
Update linux.md (#3847)
Add -e to viewing logs in order to show end of ollama logs
2024-05-06 15:02:25 -07:00
Daniel Hiltgen
b08870aff3
Merge pull request #4188 from dhiltgen/use_our_lib
User our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Darinka
3ecae420ac
Update api.md (#3945)
* Update api.md

Changed the calculation of tps (token/s) in the documentation

* Update docs/api.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 14:39:58 -07:00
Daniel Hiltgen
4cbbf0e13b
Merge pull request #4090 from dhiltgen/rocm_paths
Support Fedoras standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen
380378cc80 Use our libraries first
Trying to live off the land for cuda libraries was not the right strategy.  We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen
0963c65027
Merge pull request #4208 from dhiltgen/fix_sched_test
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan
ed740a2504
Fix no slots available error with concurrent requests (#4160) 2024-05-06 14:22:53 -07:00
Jeffrey Morgan
c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners (#4189) 2024-05-06 14:22:24 -07:00
Daniel Hiltgen
0a954e5066 Fix stale test logic
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Adrien Brault
aa93423fbf
docs: pbcopy on mac (#3129) 2024-05-06 13:47:00 -07:00
Nurgo
01c9386267
Add BrainSoup to compatible clients list (#3473) 2024-05-06 13:42:16 -07:00
Daniel Hiltgen
af9eb36f9f
Merge pull request #4135 from dhiltgen/no_physx
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396
Merge pull request #4067 from dhiltgen/cudart
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr
86b7fcac32
Update README.md with StreamDeploy (#3621)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-06 11:14:41 -07:00
Hyden Liu
fb8ddc564e
chore: delete HEAD (#4194) 2024-05-06 10:32:30 -07:00
Saif
242efe6611
👌 IMPROVE: add portkey library for production tools (#4119) 2024-05-06 10:25:23 -07:00
Jeffrey Morgan
1b0e6c9c0e
Fix llava models not working after first request (#4164)
* fix llava models not working after first request

* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan
dfa2f32ca0
unload in critical section (#4187) 2024-05-05 17:18:27 -07:00
Daniel Hiltgen
840424a2c4
Merge pull request #4154 from dhiltgen/central_config
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen
f56aa20014 Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx
6707768ebd
chore: format go code (#4149) 2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING
c78bb76a12
update libraries for langchain_community + llama3 changed from llama2 (#4174) 2024-05-05 16:07:04 -07:00
Jeffrey Morgan
942c979232
allocate a large enough kv cache for all parallel requests (#4162) 2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning
06164911dd
Update README.md (#4111)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-05 14:45:32 -07:00
Patrick Devine
2a21363bb7
validate the format of the digest when getting the model path (#4175) 2024-05-05 11:46:12 -07:00
Daniel Hiltgen
026869915f
Merge pull request #4144 from dhiltgen/max_queue
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen
45d61aaaa3 Add integration test to push max queue limits 2024-05-05 10:46:25 -07:00
Daniel Hiltgen
20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen
371f5e52aa
Merge pull request #4141 from dhiltgen/win_docs
Explain the 2 different windows download options
2024-05-04 12:50:16 -07:00