Bruce MacDonald
2665f3c28e
offload 75% of available vram to improve stability ( #921 )
2023-10-26 20:49:55 -04:00
Jeffrey Morgan
b0c9cd0f3b
fix metal assertion errors
2023-10-24 00:32:36 -07:00
Jeffrey Morgan
77f61c6301
update submodule commit
2023-10-24 00:30:27 -07:00
Jeffrey Morgan
f3604534e5
update submodule commit
2023-10-23 23:59:12 -07:00
Michael Yang
0c7a00a264
bump submodules
...
pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since
438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking
2023-10-23 11:17:59 -07:00
Michael Yang
36c160f1c3
Merge pull request #881 from jmorganca/mxyng/ggufv3
...
ggufv3
2023-10-23 10:50:45 -07:00
Michael Yang
c9167494cb
update default log target
2023-10-23 10:44:50 -07:00
Michael Yang
125d0a013a
ggufv3
...
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.
loosen version check to be more forward compatible. unless specified,
gguf versions other v1 will be decoded into v2.
2023-10-23 09:35:49 -07:00
Jeffrey Morgan
7ed5a39bc7
simpler check for model loading compatibility errors
2023-10-19 14:50:49 -04:00
Jeffrey Morgan
a7dad24d92
add error for falcon
and starcoder
vocab compatibility ( #844 )
...
add error for falcon and starcoder vocab compatibility
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Michael Yang
235e43d7f6
Merge pull request #833 from discovertomorrow/leadingspace
...
Fix Issue with Leading Whitespaces in Decoded Context
2023-10-18 13:52:48 -07:00
Arne Müller
730996e530
use TrimPrefix instead of TrimLeft
2023-10-18 22:51:30 +02:00
Arne Müller
ce6197a8e0
removed redundant strings.CutPrefix from Decode
2023-10-18 22:47:20 +02:00
Arne Müller
46b9953f32
use strings.TrimLeft to remove spaces
2023-10-18 22:41:19 +02:00
Bruce MacDonald
565648f3f7
relay CUDA errors to the client ( #825 )
2023-10-18 15:36:56 -04:00
Arne Müller
90c49bed57
moved removal of leading space into Predict
2023-10-18 20:08:26 +02:00
Arne Müller
5dc0cff459
fix whitespace removal
2023-10-18 08:15:27 +02:00
Michael Yang
08b0e04f40
Merge pull request #813 from jmorganca/mxyng/llama
...
refactor llm/llama.go
2023-10-17 14:05:58 -07:00
Michael Yang
b36b0b71f8
use cut prefix
2023-10-17 14:01:39 -07:00
Michael Yang
094df37563
remove unused struct
2023-10-17 14:01:38 -07:00
Bruce MacDonald
f3648fd206
Update llama.cpp gguf to latest ( #710 )
2023-10-17 16:55:16 -04:00
Bruce MacDonald
bd93a94abd
fix MB VRAM log output ( #824 )
2023-10-17 15:35:16 -04:00
Michael Yang
f55bdb6f10
Merge pull request #799 from deichbewohner/jsonmarshaling
...
Fix JSON Marshal Escaping for Special Characters
2023-10-17 08:46:02 -07:00
Michael Yang
2870a9bfc8
Merge pull request #812 from jmorganca/mxyng/fix-format-string
...
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Arne Müller
8fa3f366ad
Removed newline trimming and used buffer directly in POST request.
2023-10-17 08:17:35 +02:00
Michael Yang
fddb303f23
fix: format string wrong type
2023-10-16 16:14:28 -07:00
Michael Yang
cb4a80b693
fix: regression unsupported metal types
...
omitting `--n-gpu-layers` means use metal on macos which isn't correct
since ollama uses `num_gpu=0` to explicitly disable gpu for file types
that are not implemented in metal
2023-10-16 14:37:20 -07:00
Arne Müller
ee94693b1a
handling unescaped json marshaling
2023-10-16 11:15:55 +02:00
Michael Yang
11d82d7b9b
update checkvram
2023-10-13 14:47:29 -07:00
Michael Yang
36fe2deebf
only check system memory on macos
2023-10-13 14:47:29 -07:00
Michael Yang
4a8931f634
check total (system + video) memory
2023-10-13 14:47:29 -07:00
Michael Yang
bd6e38fb1a
refactor memory check
2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855
fix memory check
2023-10-13 14:47:29 -07:00
Michael Yang
d790bf9916
Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
...
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang
35afac099a
do not use gpu binary when num_gpu == 0
2023-10-13 14:32:12 -07:00
Michael Yang
811c3d1900
no gpu if vram < 2GB
2023-10-13 14:32:12 -07:00
Bruce MacDonald
6fe178134d
improve api error handling ( #781 )
...
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Bruce MacDonald
56497663c8
relay model runner error message to client ( #720 )
...
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
Michael Yang
b599946b74
add format bytes
2023-10-11 14:08:23 -07:00
Bruce MacDonald
77295f716e
prevent waiting on exited command ( #752 )
...
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
Bruce MacDonald
f2ba1311aa
improve vram safety with 5% vram memory buffer ( #724 )
...
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
Jeffrey Morgan
ab0668293c
llm: fix build on amd64
2023-10-06 14:39:54 -07:00
Bruce MacDonald
5d22319a2c
rename server subprocess ( #700 )
...
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Bruce MacDonald
d06bc0cb6e
enable q8, q5, 5_1, and f32 for linux gpu ( #699 )
2023-10-05 12:53:47 -04:00
Bruce MacDonald
9e2de1bd2c
increase streaming buffer size ( #692 )
2023-10-04 14:09:00 -04:00
Michael Yang
c02c0cd483
starcoder
2023-10-02 19:56:51 -07:00
Bruce MacDonald
b1f7123301
clean up num_gpu calculation code ( #673 )
2023-10-02 14:53:42 -04:00
Bruce MacDonald
1fbf3585d6
Relay default values to llama runner ( #672 )
...
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
- reorganize options to match predict request for readability
* omit empty stop
---------
Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Bruce MacDonald
9771b1ec51
windows runner fixes ( #637 )
2023-09-29 11:47:55 -04:00
Michael Yang
f40b3de758
use int64 consistently
2023-09-28 11:07:24 -07:00