Jeffrey Morgan
6b5bdfa6c9
update runner submodule
2023-12-18 17:33:46 -05:00
Jeffrey Morgan
c063ee4af0
update runner submodule to fix hipblas build
2023-12-18 15:41:13 -05:00
Jeffrey Morgan
b85982eb91
update runner submodule
2023-12-18 12:43:31 -05:00
Bruce MacDonald
6ee8c80199
restore model load duration on generate response ( #1524 )
...
* restore model load duration on generate response
- set model load duration on generate and chat done response
- calculate createAt time when response created
* remove checkpoints predict opts
* Update routes.go
2023-12-14 12:15:50 -05:00
Jeffrey Morgan
31f0551dab
Update runner to support mixtral and mixture of experts (MoE) ( #1475 )
2023-12-13 17:15:10 -05:00
Michael Yang
4251b342de
Merge pull request #1469 from jmorganca/mxyng/model-types
...
remove per-model types
2023-12-12 12:27:03 -08:00
Bruce MacDonald
3144e2a439
exponential back-off ( #1484 )
2023-12-12 12:33:02 -05:00
Bruce MacDonald
c0960e29b5
retry on concurrent request failure ( #1483 )
...
- remove parallel
2023-12-12 12:14:35 -05:00
Patrick Devine
910e9401d0
Multimodal support ( #1216 )
...
---------
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Michael Yang
56ffc3023a
remove per-model types
...
mostly replaced by decoding tensors except ggml models which only
support llama
2023-12-11 09:40:21 -08:00
Jeffrey Morgan
fa2f095bd9
fix model name returned by /api/generate
being different than the model name provided
2023-12-10 11:42:15 -05:00
Jeffrey Morgan
d9a250e9b5
seek to end of file when decoding older model formats
2023-12-09 21:14:35 -05:00
Jeffrey Morgan
944519ed16
seek to eof for older model binaries
2023-12-09 20:48:57 -05:00
Jeffrey Morgan
2dd040d04c
do not use --parallel 2
for old runners
2023-12-09 20:17:33 -05:00
Bruce MacDonald
bbe41ce41a
fix: parallel queueing race condition caused silent failure ( #1445 )
...
* fix: queued request failures
- increase parallel requests to 2 to complete queued request, queueing is managed in ollama
* log steam errors
2023-12-09 14:14:02 -05:00
Michael Yang
f1b049fed8
Merge pull request #1377 from jmorganca/mxyng/qwen
...
update for qwen
2023-12-06 12:31:51 -08:00
Michael Yang
b9495ea162
load projectors
2023-12-05 14:36:12 -08:00
Michael Yang
409bb9674e
Merge pull request #1308 from jmorganca/mxyng/split-from
...
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang
d3479c07a1
Merge pull request #1250 from jmorganca/mxyng/create-layer
...
refactor layer creation
2023-12-05 14:32:52 -08:00
Bruce MacDonald
195e3d9dbd
chat api endpoint ( #1392 )
2023-12-05 14:57:33 -05:00
Jeffrey Morgan
00d06619a1
Revert "chat api ( #991 )" while context variable is fixed
...
This reverts commit 7a0899d62d
.
2023-12-04 21:16:27 -08:00
Michael Yang
5a5dca13b2
comments
2023-12-04 16:59:23 -08:00
Michael Yang
72e7a49aa9
seek instead of copyn
2023-12-04 16:59:23 -08:00
Michael Yang
2cb0fa7d40
split from into one or more models
2023-12-04 16:59:23 -08:00
Michael Yang
b2816bca67
unnecessary ReadSeeker for DecodeGGML
2023-12-04 16:59:23 -08:00
Bruce MacDonald
7a0899d62d
chat api ( #991 )
...
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang
6deebf2489
update for qwen
2023-12-04 11:38:05 -08:00
Jeffrey Morgan
16a9006306
add back f16c
instructions on intel mac
2023-11-26 15:59:49 -05:00
Jeffrey Morgan
9e4a316405
update submodule commit
2023-11-26 14:52:00 -05:00
Jing Zhang
82b9b329ff
windows CUDA support ( #1262 )
...
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
Jongwook Choi
12e8c12d2b
Disable CUDA peer access as a workaround for multi-gpu inference bug ( #1261 )
...
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug of llama.cpp (or nvidia). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.
See #961 .
2023-11-24 14:05:57 -05:00
Jeffrey Morgan
d77dde126b
consistent cpu instructions on macos and linux
2023-11-22 16:26:46 -05:00
Michael Yang
199941cd15
fix: gguf int type
2023-11-22 11:40:30 -08:00
Michael Yang
a00fac4ec8
update llama.cpp
2023-11-21 09:50:02 -08:00
Jeffrey Morgan
a3fcecf943
only set main_gpu
if value > 0 is provided
2023-11-20 19:54:04 -05:00
Michael Yang
19b7a4d715
recent llama.cpp update added kernels for fp32, q5_0, and q5_1
2023-11-20 13:44:31 -08:00
Purinda Gunasekara
be61a81758
main-gpu argument is not getting passed to llamacpp, fixed. ( #1192 )
2023-11-20 10:52:52 -05:00
Jeffrey Morgan
13ba6df5ab
enable cpu instructions on intel macs
2023-11-19 23:20:26 -05:00
Jeffrey Morgan
36a3bbf65f
Update llm/llama.go
2023-11-18 21:25:07 -05:00
Bruce MacDonald
43a726149d
fix potentially inaccurate error message
2023-11-18 21:25:07 -05:00
Jeffrey Morgan
41434a7cdc
build intel mac with correct binary and compile flags
2023-11-16 22:14:51 -05:00
Jeffrey Morgan
5cba29b9d6
JSON mode: add `"format" as an api parameter ( #1051 )
...
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Bruce MacDonald
1ae84bc2a2
skip gpu if less than 2GB VRAM are available ( #1059 )
2023-11-09 13:16:16 -08:00
Michael Yang
c5e1bbabda
instead of static number of parameters for each model family, get the real number from the tensors ( #1022 )
...
* parse tensor info
* refactor decoder
* return actual parameter count
* explicit rounding
* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
Jeffrey Morgan
c44b619428
remove unused fmt.Println
2023-11-03 17:24:58 -07:00
Jeffrey Morgan
17678b7225
Restore system prompt on requests and default num_keep
to 0
2023-11-03 13:25:25 -07:00
Jeffrey Morgan
2e53704685
default rope params to 0 for new models ( #968 )
2023-11-02 08:41:30 -07:00
Michael Yang
642128b75a
append LD_LIBRARY_PATH
2023-10-31 15:54:49 -07:00
Jeffrey Morgan
3a1ed9ff70
restore building runner with AVX
on by default ( #900 )
2023-10-27 12:13:44 -07:00
Bruce MacDonald
6d283882b1
catch insufficient permissions nvidia err ( #934 )
2023-10-27 12:42:40 -04:00