Daniel Hiltgen
89bf98bcf2
Merge pull request #4598 from dhiltgen/docs
...
Tidy up developer guide a little
2024-05-23 15:14:29 -07:00
Daniel Hiltgen
1b2d156094
Tidy up developer guide a little
2024-05-23 15:14:05 -07:00
Michael Yang
714adb8bd1
bump ( #4597 )
2024-05-23 14:16:26 -07:00
Daniel Hiltgen
95b1133d0c
Merge pull request #4547 from dhiltgen/load_progress
...
Wire up load progress
2024-05-23 14:06:02 -07:00
Daniel Hiltgen
b37b496a12
Wire up load progress
...
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
Bruce MacDonald
d6f692ad1a
Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL ( #4322 )
...
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
2024-05-23 13:21:49 -07:00
Daniel Hiltgen
f77713bf1f
Add isolated gpu test to troubleshooting
2024-05-23 09:33:25 -07:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now ( #4580 )
...
* put flash attention behind flag for now
* add test
* remove print
* up timeout for scheduler tests
2024-05-22 21:52:09 -07:00
Michael
73630a7e85
add phi 3 medium ( #4578 )
2024-05-22 12:53:45 -04:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go ( #4571 )
...
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Josh
9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
...
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan
353f83a9c7
add Ctrl + W shortcut
2024-05-21 16:55:09 -07:00
Patrick Devine
3bade04e10
doc updates for the faq/troubleshooting ( #4565 )
2024-05-21 15:30:09 -07:00
Michael Yang
a6d0f443eb
Merge pull request #4543 from ollama/mxyng/simple-safetensors
...
simplify safetensors reading
2024-05-21 14:43:55 -07:00
Michael Yang
96236b7968
Merge pull request #4268 from ollama/pdevine/llama3
...
Convert directly from llama3
2024-05-21 14:43:37 -07:00
Sang Park
4434d7f447
Correct typo in error message ( #4535 )
...
Corrects the spelling of "request", previously written as "requeset", in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
171eb040fc
simplify safetensors reading
2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f
add test
2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3
fix conversion for f16 or f32 inputs
2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3
cleanup
2024-05-20 16:13:57 -07:00
Michael Yang
547132e820
bpe pretokenizer
2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9
add missing file
2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f
add fixes for llama
2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed
llama3 conversion
2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c
add safetensors version
2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd
some changes for llama3
2024-05-20 16:13:57 -07:00
Michael Yang
2f81b3dce2
Merge pull request #4502 from ollama/mxyng/fix-quantize
...
fix quantize file types
2024-05-20 16:09:27 -07:00
jmorganca
5cab13739e
set llama.cpp submodule commit to 614d3b9
2024-05-20 15:28:17 -07:00
Josh Yan
8aadad9c72
updated updateURL
2024-05-20 15:24:32 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
alwqx
8800c8a59b
chore: fix typo in docs ( #4536 )
2024-05-20 14:19:03 -07:00
Michael Yang
b4dce13309
Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers
...
cache and reuse intermediate blobs
2024-05-20 13:54:41 -07:00
Sam
e15307fdf4
feat: add support for flash_attn ( #4120 )
...
* feat: enable flash attention if supported
* feat: add flash_attn support
2024-05-20 13:36:03 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
jmorganca
63a453554d
go mod tidy
2024-05-19 23:03:57 -07:00
Patrick Devine
105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode ( #4508 )
2024-05-18 11:51:57 -07:00
Daniel Hiltgen
ba04afc9a4
Merge pull request #4483 from dhiltgen/clean_exit
...
Don't return error on signal exit
2024-05-17 11:41:57 -07:00
Daniel Hiltgen
7e1e0086e7
Merge pull request #4482 from dhiltgen/integration_improvements
...
Skip max queue test on remote
2024-05-16 16:43:48 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Daniel Hiltgen
7f2fbad736
Skip max queue test on remote
...
This test needs to be able to adjust the queue size down from
our default setting for a reliable test, so it needs to skip on
remote test execution mode.
2024-05-16 16:24:18 -07:00
Josh
5bece94509
Merge pull request #4463 from ollama/jyan/line-display
...
changed line display to be calculated with runewidth
2024-05-16 14:15:08 -07:00
Josh Yan
3d90156e99
removed comment
2024-05-16 14:12:03 -07:00
Rose Heart
5e46c5c435
Updating software for read me ( #4467 )
...
* Update README.md
Added chat/moderation bot to list of software.
* Update README.md
Fixed link error.
2024-05-16 13:55:14 -07:00
Jeffrey Morgan
583c1f472c
update llama.cpp submodule to 614d3b9 ( #4414 )
2024-05-16 13:53:09 -07:00
Josh Yan
26bfc1c443
go fmt'd cmd.go
2024-05-15 17:26:39 -07:00
Josh Yan
799aa9883c
go fmt'd cmd.go
2024-05-15 17:24:17 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
...
return on part done
2024-05-15 17:16:24 -07:00
Josh Yan
c9e584fb90
updated double-width display
2024-05-15 16:45:24 -07:00