Jeffrey Morgan
c79f8c9c39
Ensure nvidia and nvidia_uvm kernel modules are loaded in install.sh script and at startup ( #4652 )
* ensure kernel modules are loaded in `install.sh` script and at startup
* indentation
* use `SUDO` variable
* restart if nouveau is detected
* consistent success message for AMD
2024-05-26 14:57:17 -07:00
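The change above lives in the shell installer, but the underlying check is simple: see whether named modules are present in the kernel's loaded-module list. The Go sketch below shows one way to do that by scanning text in the /proc/modules format (module name in the first field of each line). The helper name missingModules and the sample input are hypothetical, and this is not the code from #4652.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingModules is a hypothetical helper: it scans text in the /proc/modules
// format (one module per line, name in the first field) and returns the
// required module names that do not appear in it.
func missingModules(procModules string, required []string) []string {
	loaded := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(procModules))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) > 0 {
			loaded[fields[0]] = true
		}
	}
	var missing []string
	for _, name := range required {
		if !loaded[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Sample input; on a real Linux host this text would come from
	// os.ReadFile("/proc/modules").
	sample := "nvidia 56569856 2 nvidia_modeset, Live 0x0000000000000000\n" +
		"i2c_core 86016 1 nvidia, Live 0x0000000000000000\n"
	fmt.Println(missingModules(sample, []string{"nvidia", "nvidia_uvm"}))
}
```

A missing module would then be loaded (e.g. with modprobe, which needs root, hence the SUDO variable mentioned in the commit).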
Jeffrey Morgan
485016bfbb
Update install.sh
2024-05-26 11:46:00 -07:00
Daniel Hiltgen
0165ba1651
Merge pull request #4638 from dhiltgen/better_error
Report better warning on client closed abort of load
2024-05-25 14:32:28 -07:00
Daniel Hiltgen
c4209d6d21
Report better warning on client closed abort of load
If the client closes the connection before we finish loading the model,
we abort, so let's make the log message say why, to help users
understand this failure mode
2024-05-25 09:23:28 -07:00
Michael Yang
6adca97f37
Merge pull request #4619 from noxer/patch-1
Fix download retry issue
2024-05-24 17:21:57 -07:00
Michael Yang
9a3c8003c8
Merge pull request #4624 from ollama/mxyng/fix-5
fix q5_0, q5_1
2024-05-24 16:11:21 -07:00
Michael Yang
d51f15257c
Update llm/ggml.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-24 16:10:43 -07:00
Michael Yang
8f440d579a
fix q5_0, q5_1
2024-05-24 16:01:46 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1
Fix download retry issue
2024-05-24 20:30:42 +02:00
Jeffrey Morgan
afd2b058b4
set codesign timeout to longer ( #4605 )
2024-05-23 22:46:23 -07:00
Wang,Zhe
fd5971be0b
support ollama run on Intel GPUs
2024-05-24 11:18:27 +08:00
Daniel Hiltgen
89bf98bcf2
Merge pull request #4598 from dhiltgen/docs
Tidy up developer guide a little
2024-05-23 15:14:29 -07:00
Daniel Hiltgen
1b2d156094
Tidy up developer guide a little
2024-05-23 15:14:05 -07:00
Michael Yang
714adb8bd1
bump ( #4597 )
2024-05-23 14:16:26 -07:00
Daniel Hiltgen
95b1133d0c
Merge pull request #4547 from dhiltgen/load_progress
Wire up load progress
2024-05-23 14:06:02 -07:00
Daniel Hiltgen
b37b496a12
Wire up load progress
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
Bruce MacDonald
d6f692ad1a
Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL ( #4322 )
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
2024-05-23 13:21:49 -07:00
Daniel Hiltgen
f77713bf1f
Add isolated gpu test to troubleshooting
2024-05-23 09:33:25 -07:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now ( #4580 )
* put flash attention behind flag for now
* add test
* remove print
* increase timeout for scheduler tests
2024-05-22 21:52:09 -07:00
Michael
73630a7e85
add phi 3 medium ( #4578 )
2024-05-22 12:53:45 -04:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go ( #4571 )
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Josh
9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan
353f83a9c7
add Ctrl + W shortcut
2024-05-21 16:55:09 -07:00
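The commit does not describe the shortcut's behavior; in most terminals Ctrl+W deletes the word before the cursor, and the sketch below assumes that convention. It removes spaces immediately before the cursor, then the run of non-space characters before them. deleteWordBackward is a hypothetical name, not the function added in the commit.

```go
package main

import "fmt"

// deleteWordBackward is a hypothetical helper mimicking the conventional
// terminal Ctrl+W behavior: it removes any spaces immediately before the
// cursor, then the run of non-space characters before those, and returns
// the new buffer and cursor position.
func deleteWordBackward(buf []rune, pos int) ([]rune, int) {
	start := pos
	for start > 0 && buf[start-1] == ' ' {
		start--
	}
	for start > 0 && buf[start-1] != ' ' {
		start--
	}
	out := append([]rune{}, buf[:start]...)
	out = append(out, buf[pos:]...)
	return out, start
}

func main() {
	line := []rune("ollama run llama3")
	line, cursor := deleteWordBackward(line, len(line))
	fmt.Printf("%q %d\n", string(line), cursor)
}
```

Working on []rune rather than bytes keeps the cursor arithmetic correct for multi-byte UTF-8 input.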
Patrick Devine
3bade04e10
doc updates for the faq/troubleshooting ( #4565 )
2024-05-21 15:30:09 -07:00
Michael Yang
a6d0f443eb
Merge pull request #4543 from ollama/mxyng/simple-safetensors
simplify safetensors reading
2024-05-21 14:43:55 -07:00
Michael Yang
96236b7968
Merge pull request #4268 from ollama/pdevine/llama3
Convert directly from llama3
2024-05-21 14:43:37 -07:00
Sang Park
4434d7f447
Correct typo in error message ( #4535 )
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
171eb040fc
simplify safetensors reading
2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f
add test
2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3
fix conversion for f16 or f32 inputs
2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3
cleanup
2024-05-20 16:13:57 -07:00
Michael Yang
547132e820
bpe pretokenizer
2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9
add missing file
2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f
add fixes for llama
2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed
llama3 conversion
2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c
add safetensors version
2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd
some changes for llama3
2024-05-20 16:13:57 -07:00
Michael Yang
2f81b3dce2
Merge pull request #4502 from ollama/mxyng/fix-quantize
fix quantize file types
2024-05-20 16:09:27 -07:00
jmorganca
5cab13739e
set llama.cpp submodule commit to 614d3b9
2024-05-20 15:28:17 -07:00
Josh Yan
8aadad9c72
updated updateURL
2024-05-20 15:24:32 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
alwqx
8800c8a59b
chore: fix typo in docs ( #4536 )
2024-05-20 14:19:03 -07:00
Michael Yang
b4dce13309
Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers
cache and reuse intermediate blobs
2024-05-20 13:54:41 -07:00
Sam
e15307fdf4
feat: add support for flash_attn ( #4120 )
* feat: enable flash attention if supported
* feat: add flash_attn support
2024-05-20 13:36:03 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
jmorganca
63a453554d
go mod tidy
2024-05-19 23:03:57 -07:00
Patrick Devine
105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode ( #4508 )
2024-05-18 11:51:57 -07:00
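A minimal sketch of how an environment variable like OLLAMA_NOHISTORY can gate interactive history. It assumes the value is parsed as a boolean ("1", "true", ...) and that unparseable values leave history on; the actual parsing rule in this commit may differ (for example, treating any non-empty value as set), and historyDisabled is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// historyDisabled is a hypothetical helper. It ASSUMES OLLAMA_NOHISTORY is
// parsed as a boolean; empty or unparseable values keep history enabled.
func historyDisabled(value string) bool {
	b, err := strconv.ParseBool(value)
	return err == nil && b
}

func main() {
	if historyDisabled(os.Getenv("OLLAMA_NOHISTORY")) {
		fmt.Println("interactive history: off")
	} else {
		fmt.Println("interactive history: on")
	}
}
```

Under this assumption, running `OLLAMA_NOHISTORY=1 ollama run llama3` would start an interactive session without recording history.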