Commit graph

15 commits

Author SHA1 Message Date
Michael Yang
72e7a49aa9 seek instead of copyn 2023-12-04 16:59:23 -08:00
Michael Yang
2cb0fa7d40 split from into one or more models 2023-12-04 16:59:23 -08:00
Michael Yang
b2816bca67 unnecessary ReadSeeker for DecodeGGML 2023-12-04 16:59:23 -08:00
Michael Yang
125d0a013a ggufv3
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other v1 will be decoded into v2.
2023-10-23 09:35:49 -07:00
Michael Yang
c02c0cd483 starcoder 2023-10-02 19:56:51 -07:00
Bruce MacDonald
86279f4ae3
unbound max num gpu layers (#591)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Bruce MacDonald
4cba75efc5
remove tmp directories created by previous servers (#559)
* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Bruce MacDonald
66003e1d05
subprocess improvements (#524)
* subprocess improvements

- increase start-up timeout
- when runner fails to start fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages

* Update llama.go

* Update llama.go

* simplify by using glob
2023-09-18 15:16:32 -04:00
Bruce MacDonald
2540c9181c
support for packaging in multiple cuda runners (#509)
* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-14 15:08:13 -04:00
Michael Yang
7dee25a07f fix falcon decode
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald
09dd2aeff9
GGUF support (#441) 2023-09-07 13:55:37 -04:00
Michael Yang
b1cececb8e add 34b model type 2023-08-24 10:35:44 -07:00
Michael Yang
a894cc792d model and file type as strings 2023-08-17 12:08:04 -07:00
Michael Yang
6ed991c8e2 ggml: fix off by one error
remove used Unknown FileType
2023-08-11 10:45:22 -07:00
Michael Yang
fccf8d179f partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00