3087 commits

Author SHA1 Message Date
Michael Yang
714adb8bd1
bump (#4597) 2024-05-23 14:16:26 -07:00
Daniel Hiltgen
95b1133d0c
Merge pull request #4547 from dhiltgen/load_progress
Wire up load progress
2024-05-23 14:06:02 -07:00
Daniel Hiltgen
b37b496a12 Wire up load progress
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
Bruce MacDonald
d6f692ad1a
Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
2024-05-23 13:21:49 -07:00
Daniel Hiltgen
f77713bf1f Add isolated gpu test to troubleshooting 2024-05-23 09:33:25 -07:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now (#4580)
* put flash attention behind flag for now

* add test

* remove print

* up timeout for sheduler tests
2024-05-22 21:52:09 -07:00
Michael
73630a7e85
add phi 3 medium (#4578) 2024-05-22 12:53:45 -04:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go (#4571)
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Josh
9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan
353f83a9c7 add Ctrl + W shortcut 2024-05-21 16:55:09 -07:00
Patrick Devine
3bade04e10
doc updates for the faq/troubleshooting (#4565) 2024-05-21 15:30:09 -07:00
Michael Yang
a6d0f443eb
Merge pull request #4543 from ollama/mxyng/simple-safetensors
simplify safetensors reading
2024-05-21 14:43:55 -07:00
Michael Yang
96236b7968
Merge pull request #4268 from ollama/pdevine/llama3
Convert directly from llama3
2024-05-21 14:43:37 -07:00
Sang Park
4434d7f447
Correct typo in error message (#4535)
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
171eb040fc simplify safetensors reading 2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f add test 2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3 fix conversion for f16 or f32 inputs 2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3 cleanup 2024-05-20 16:13:57 -07:00
Michael Yang
547132e820 bpe pretokenizer 2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9 add missing file 2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f add fixes for llama 2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed llama3 conversion 2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c add safetensors version 2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd some changes for llama3 2024-05-20 16:13:57 -07:00
Michael Yang
2f81b3dce2
Merge pull request #4502 from ollama/mxyng/fix-quantize
fix quantize file types
2024-05-20 16:09:27 -07:00
jmorganca
5cab13739e set llama.cpp submodule commit to 614d3b9 2024-05-20 15:28:17 -07:00
Josh Yan
8aadad9c72 updated updateURL 2024-05-20 15:24:32 -07:00
Michael Yang
807d092761 fix quantize file types 2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9 tidy intermediate blobs 2024-05-20 15:15:06 -07:00
alwqx
8800c8a59b
chore: fix typo in docs (#4536) 2024-05-20 14:19:03 -07:00
Michael Yang
b4dce13309
Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers
cache and reuse intermediate blobs
2024-05-20 13:54:41 -07:00
Sam
e15307fdf4
feat: add support for flash_attn (#4120)
* feat: enable flash attention if supported

* feat: enable flash attention if supported

* feat: enable flash attention if supported

* feat: add flash_attn support
2024-05-20 13:36:03 -07:00
Michael Yang
3520c0e4d5 cache and reuse intermediate blobs
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
jmorganca
63a453554d go mod tidy 2024-05-19 23:03:57 -07:00
Patrick Devine
105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode (#4508) 2024-05-18 11:51:57 -07:00
Daniel Hiltgen
ba04afc9a4
Merge pull request #4483 from dhiltgen/clean_exit
Don't return error on signal exit
2024-05-17 11:41:57 -07:00
Daniel Hiltgen
7e1e0086e7
Merge pull request #4482 from dhiltgen/integration_improvements
Skip max queue test on remote
2024-05-16 16:43:48 -07:00
Daniel Hiltgen
02b31c9dc8 Don't return error on signal exit 2024-05-16 16:25:38 -07:00
Daniel Hiltgen
7f2fbad736 Skip max queue test on remote
This test needs to be able to adjust the queue size down from
our default setting for a reliable test, so it needs to skip on
remote test execution mode.
2024-05-16 16:24:18 -07:00
Josh
5bece94509
Merge pull request #4463 from ollama/jyan/line-display
changed line display to be calculated with runewidth
2024-05-16 14:15:08 -07:00
Josh Yan
3d90156e99 removed comment 2024-05-16 14:12:03 -07:00
Rose Heart
5e46c5c435
Updating software for read me (#4467)
* Update README.md

Added chat/moderation bot to list of software.

* Update README.md

Fixed link error.
2024-05-16 13:55:14 -07:00
Jeffrey Morgan
583c1f472c
update llama.cpp submodule to 614d3b9 (#4414) 2024-05-16 13:53:09 -07:00
Josh Yan
26bfc1c443 go fmt'd cmd.go 2024-05-15 17:26:39 -07:00
Josh Yan
799aa9883c go fmt'd cmd.go 2024-05-15 17:24:17 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
return on part done
2024-05-15 17:16:24 -07:00
Josh Yan
c9e584fb90 updated double-width display 2024-05-15 16:45:24 -07:00
Josh Yan
17b1e81ca1 fixed width and word count for double spacing 2024-05-15 16:29:33 -07:00
Daniel Hiltgen
7e9a2da097
Merge pull request #4462 from dhiltgen/opt_out_build
Port cuda/rocm skip build vars to linux
2024-05-15 16:27:47 -07:00