Commit graph

61 commits

Author SHA1 Message Date
Jeffrey Morgan
5cba29b9d6
JSON mode: add `"format" as an api parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Bruce MacDonald
1ae84bc2a2
skip gpu if less than 2GB VRAM are available (#1059) 2023-11-09 13:16:16 -08:00
Jeffrey Morgan
c44b619428 remove unused fmt.Println 2023-11-03 17:24:58 -07:00
Jeffrey Morgan
17678b7225 Restore system prompt on requests and default num_keep to 0 2023-11-03 13:25:25 -07:00
Jeffrey Morgan
2e53704685
default rope params to 0 for new models (#968) 2023-11-02 08:41:30 -07:00
Michael Yang
642128b75a append LD_LIBRARY_PATH 2023-10-31 15:54:49 -07:00
Bruce MacDonald
6d283882b1
catch insufficient permissions nvidia err (#934) 2023-10-27 12:42:40 -04:00
Bruce MacDonald
2665f3c28e
offload 75% of available vram to improve stability (#921) 2023-10-26 20:49:55 -04:00
Jeffrey Morgan
7ed5a39bc7 simpler check for model loading compatibility errors 2023-10-19 14:50:49 -04:00
Jeffrey Morgan
a7dad24d92
add error for falcon and starcoder vocab compatibility (#844)
add error for falcon and starcoder vocab compatibility
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Michael Yang
235e43d7f6
Merge pull request #833 from discovertomorrow/leadingspace
Fix Issue with Leading Whitespaces in Decoded Context
2023-10-18 13:52:48 -07:00
Arne Müller
730996e530 use TrimPrefix instead of TrimLeft 2023-10-18 22:51:30 +02:00
Arne Müller
ce6197a8e0 removed redundant strings.CutPrefix from Decode 2023-10-18 22:47:20 +02:00
Arne Müller
46b9953f32 use strings.TrimLeft to remove spaces 2023-10-18 22:41:19 +02:00
Bruce MacDonald
565648f3f7
relay CUDA errors to the client (#825) 2023-10-18 15:36:56 -04:00
Arne Müller
90c49bed57 moved removal of leading space into Predict 2023-10-18 20:08:26 +02:00
Arne Müller
5dc0cff459 fix whitespace removal 2023-10-18 08:15:27 +02:00
Michael Yang
b36b0b71f8 use cut prefix 2023-10-17 14:01:39 -07:00
Michael Yang
094df37563 remove unused struct 2023-10-17 14:01:38 -07:00
Bruce MacDonald
bd93a94abd
fix MB VRAM log output (#824) 2023-10-17 15:35:16 -04:00
Michael Yang
f55bdb6f10
Merge pull request #799 from deichbewohner/jsonmarshaling
Fix JSON Marshal Escaping for Special Characters
2023-10-17 08:46:02 -07:00
Michael Yang
2870a9bfc8
Merge pull request #812 from jmorganca/mxyng/fix-format-string
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Arne Müller
8fa3f366ad Removed newline trimming and used buffer directly in POST request. 2023-10-17 08:17:35 +02:00
Michael Yang
fddb303f23 fix: format string wrong type 2023-10-16 16:14:28 -07:00
Michael Yang
cb4a80b693 fix: regression unsupported metal types
omitting `--n-gpu-layers` means use metal on macos which isn't correct
since ollama uses `num_gpu=0` to explicitly disable gpu for file types
that are not implemented in metal
2023-10-16 14:37:20 -07:00
Arne Müller
ee94693b1a handling unescaped json marshaling 2023-10-16 11:15:55 +02:00
Michael Yang
11d82d7b9b update checkvram 2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855 fix memory check 2023-10-13 14:47:29 -07:00
Michael Yang
d790bf9916
Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang
35afac099a do not use gpu binary when num_gpu == 0 2023-10-13 14:32:12 -07:00
Michael Yang
811c3d1900 no gpu if vram < 2GB 2023-10-13 14:32:12 -07:00
Bruce MacDonald
6fe178134d
improve api error handling (#781)
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Bruce MacDonald
56497663c8
relay model runner error message to client (#720)
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
Michael Yang
b599946b74 add format bytes 2023-10-11 14:08:23 -07:00
Bruce MacDonald
77295f716e
prevent waiting on exited command (#752)
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
Bruce MacDonald
f2ba1311aa
improve vram safety with 5% vram memory buffer (#724)
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
Bruce MacDonald
5d22319a2c
rename server subprocess (#700)
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Bruce MacDonald
9e2de1bd2c
increase streaming buffer size (#692) 2023-10-04 14:09:00 -04:00
Michael Yang
c02c0cd483 starcoder 2023-10-02 19:56:51 -07:00
Bruce MacDonald
b1f7123301
clean up num_gpu calculation code (#673) 2023-10-02 14:53:42 -04:00
Bruce MacDonald
1fbf3585d6
Relay default values to llama runner (#672)
* include seed in params for llama.cpp server and remove empty filter for temp

* relay default predict options to llama.cpp

- reorganize options to match predict request for readability

* omit empty stop

---------

Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Bruce MacDonald
9771b1ec51
windows runner fixes (#637) 2023-09-29 11:47:55 -04:00
Michael Yang
f40b3de758 use int64 consistently 2023-09-28 11:07:24 -07:00
Bruce MacDonald
86279f4ae3
unbound max num gpu layers (#591)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Bruce MacDonald
4cba75efc5
remove tmp directories created by previous servers (#559)
* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Bruce MacDonald
1255bc9b45 only package 11.8 runner 2023-09-20 20:00:41 +01:00
Bruce MacDonald
4e8be787c7 pack in cuda libs 2023-09-20 17:40:42 +01:00
Bruce MacDonald
66003e1d05
subprocess improvements (#524)
* subprocess improvements

- increase start-up timeout
- when runner fails to start fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages

* Update llama.go

* Update llama.go

* simplify by using glob
2023-09-18 15:16:32 -04:00
Bruce MacDonald
2540c9181c
support for packaging in multiple cuda runners (#509)
* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-14 15:08:13 -04:00
Michael Yang
7dee25a07f fix falcon decode
get model and file type from bin file
2023-09-12 12:34:53 -07:00