Daniel Hiltgen
2619850fb4
Merge pull request #3933 from dhiltgen/ci_fixes
Move cuda/rocm dependency gathering into generate script
2024-04-26 07:01:24 -07:00
Daniel Hiltgen
8feb97dc0d
Move cuda/rocm dependency gathering into generate script
This will make it simpler for CI to accumulate artifacts from prior steps
2024-04-25 22:38:44 -07:00
Daniel Hiltgen
4e1ff6dcbb
Merge pull request #3926 from dhiltgen/ci_fixes
Fix release CI
2024-04-25 17:42:31 -07:00
Daniel Hiltgen
8589d752ac
Fix release CI
The download-artifact path option was being used incorrectly: it specifies
where to extract the zip, not which files in the zip to extract. The default
is the workspace dir, which is what we want, so omit it.
2024-04-25 17:27:11 -07:00
Michael Yang
de4ded68b0
Merge pull request #3923 from ollama/mxyng/mem
only count output tensors
2024-04-25 16:34:17 -07:00
Daniel Hiltgen
9b5a3c5991
Merge pull request #3914 from dhiltgen/mac_perf
Improve mac parallel performance
2024-04-25 16:28:31 -07:00
Jeffrey Morgan
00b0699c75
Reload model if num_gpu changes ( #3920 )
* reload model if `num_gpu` changes
* dont reload on -1
* fix tests
2024-04-25 19:02:40 -04:00
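A minimal Go sketch of the reload rule those bullets describe, assuming a hypothetical Options type and needsReload helper rather than ollama's actual server code:

```go
package main

import "fmt"

// Options mirrors the subset of request options relevant here;
// the type and field names are illustrative, not ollama's actual API.
type Options struct {
	NumGPU int // -1 means "auto": let the runtime pick the layer count
}

// needsReload reports whether a loaded model must be restarted because
// the requested num_gpu differs from what the model was loaded with.
func needsReload(loaded, requested Options) bool {
	// -1 delegates the decision to the runtime, so it never forces a reload.
	if requested.NumGPU == -1 {
		return false
	}
	return requested.NumGPU != loaded.NumGPU
}

func main() {
	fmt.Println(needsReload(Options{NumGPU: 33}, Options{NumGPU: 20})) // true: reload
	fmt.Println(needsReload(Options{NumGPU: 33}, Options{NumGPU: -1})) // false: keep model
}
```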
Jeffrey Morgan
993cf8bf55
llm: limit generation to 10x context size to avoid run-on generations ( #3918 )
* llm: limit generation to 10x context size to avoid run-on generations
* add comment
* simplify condition statement
2024-04-25 19:02:30 -04:00
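A sketch of the guard named in this commit, as a self-contained loop; numCtx, the token callback, and the function names are illustrative:

```go
package main

import "fmt"

// predictTokens runs a generation loop that stops once the output reaches
// 10x the context size, preventing run-on generations that never emit EOS.
func predictTokens(numCtx int, next func() (token string, eos bool)) []string {
	limit := 10 * numCtx
	var out []string
	for len(out) < limit {
		tok, eos := next()
		if eos {
			break
		}
		out = append(out, tok)
	}
	return out
}

func main() {
	// A degenerate model that never signals EOS still terminates at 10*numCtx.
	tokens := predictTokens(4, func() (string, bool) { return "x", false })
	fmt.Println(len(tokens)) // 40
}
```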
Michael Yang
7bb7cb8a60
only count output tensors
2024-04-25 15:24:08 -07:00
Daniel Hiltgen
b123be5b71
Adjust context size for parallelism
2024-04-25 13:58:54 -07:00
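Presumably this scales the total context by the parallel slot count, since llama.cpp splits a context evenly across slots; a sketch of that arithmetic under that assumption, with hypothetical names:

```go
package main

import "fmt"

// effectiveNumCtx scales the per-request context size by the number of
// parallel slots, so each concurrent request keeps its full context.
// Names are illustrative; the real scheduler plumbing differs.
func effectiveNumCtx(requestedCtx, numParallel int) int {
	return requestedCtx * numParallel
}

func main() {
	// With 4 parallel slots, a 2048-token context per request needs 8192 total.
	fmt.Println(effectiveNumCtx(2048, 4)) // 8192
}
```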
jmorganca
ddf5c09a9b
use matrix multiplication kernels in more cases
2024-04-25 13:58:54 -07:00
Roy Yang
5f73c08729
Remove trailing spaces ( #3889 )
2024-04-25 14:32:26 -04:00
Daniel Hiltgen
f503a848c2
Merge pull request #3895 from brycereitano/shiftloading
Move ggml loading to when attempting to fit
2024-04-25 09:24:08 -07:00
Bryce Reitano
36a6daccab
Restructure loading conditional chain
2024-04-24 17:37:03 -06:00
Bryce Reitano
ceb0e26e5e
Provide variable ggml for TestLoad
2024-04-24 17:19:55 -06:00
Bryce Reitano
284e02bed0
Move ggml loading to when we attempt fitting
2024-04-24 17:17:24 -06:00
Michael Yang
3450a57d4a
Merge pull request #3713 from ollama/mxyng/modelname
update copy handler to use model.Name
2024-04-24 16:00:32 -07:00
Michael Yang
592dae31c8
update copy to use model.Name
2024-04-24 15:54:54 -07:00
Michael Yang
2010cbc5fa
Merge pull request #3833 from ollama/mxyng/fix-from
fix: from blob
2024-04-24 15:13:47 -07:00
Michael Yang
ac0801eced
only replace if it matches command
2024-04-24 14:49:26 -07:00
Michael Yang
ad66e5b060
split temp zip files
2024-04-24 14:18:01 -07:00
Blake Mizerany
ade4b55520
types/model: make ParseName use default without question ( #3886 )
2024-04-24 11:52:55 -07:00
Daniel Hiltgen
a6d62e0617
Merge pull request #3882 from dhiltgen/amd_gfx
AMD gfx patch rev is hex
2024-04-24 11:07:49 -07:00
Daniel Hiltgen
6e76348df7
Merge pull request #3834 from dhiltgen/not_found_in_path
Report errors on server lookup instead of path lookup failure
2024-04-24 10:50:48 -07:00
Daniel Hiltgen
0d6687f84c
AMD gfx patch rev is hex
Correctly handle gfx90a discovery
2024-04-24 09:43:52 -07:00
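The gfx90a case above hinges on the trailing patch revision being hexadecimal ("a" parses as 10). A hedged sketch of such a parse; the helper name and field layout are assumptions, not ollama's actual discovery code:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseGfxTarget splits a target like "gfx90a" into major, minor, and patch.
// The patch revision is hexadecimal, so "a" parses to 10 rather than failing.
func parseGfxTarget(s string) (major, minor, patch int64, err error) {
	if len(s) < 6 {
		return 0, 0, 0, fmt.Errorf("malformed gfx target %q", s)
	}
	digits := s[3:] // e.g. "90a" or "1030"
	if major, err = strconv.ParseInt(digits[:len(digits)-2], 10, 64); err != nil {
		return
	}
	if minor, err = strconv.ParseInt(digits[len(digits)-2:len(digits)-1], 10, 64); err != nil {
		return
	}
	patch, err = strconv.ParseInt(digits[len(digits)-1:], 16, 64) // base 16
	return
}

func main() {
	fmt.Println(parseGfxTarget("gfx90a")) // 9 0 10 <nil>
}
```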
Patrick Devine
74d2a9ef9a
add OLLAMA_KEEP_ALIVE env variable to FAQ ( #3865 )
2024-04-23 21:06:51 -07:00
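For context, a hedged sketch of parsing an OLLAMA_KEEP_ALIVE-style value; the accepted formats (a Go duration, a plain number of seconds, or a negative value meaning keep the model loaded indefinitely) are assumptions about the documented behavior:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseKeepAlive interprets values like "5m", "300", or "-1".
// The accepted formats here are assumptions, not confirmed from the FAQ.
func parseKeepAlive(v string) (time.Duration, error) {
	if d, err := time.ParseDuration(v); err == nil {
		return d, nil
	}
	secs, err := strconv.ParseFloat(v, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid keep-alive %q", v)
	}
	if secs < 0 {
		// Negative values keep the model loaded indefinitely.
		return time.Duration(-1), nil
	}
	return time.Duration(secs * float64(time.Second)), nil
}

func main() {
	d, _ := parseKeepAlive("5m")
	fmt.Println(d) // 5m0s
}
```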
Patrick Devine
14476d48cc
fixes for gguf ( #3863 )
2024-04-23 20:57:20 -07:00
Patrick Devine
ce8ce82567
add mixtral 8x7b model conversion ( #3859 )
2024-04-23 20:17:04 -07:00
Blake Mizerany
4dc4f1be34
types/model: restrict digest hash part to a minimum of 2 characters ( #3858 )
This allows users of a valid Digest to rely on the hash part having a
minimum of 2 characters for use when sharding.
This is a reasonable restriction: the hash part is a SHA-256 hash, which
is 64 characters long and is the hash in common use. There is no
anticipation of using a hash with fewer than 2 characters.
Also, add MustParseDigest.
Also, replace Digest.Type with Digest.Split for getting both the type
and hash parts together, which is the most common case when asking for
either.
2024-04-23 18:24:17 -07:00
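A sketch of the Split accessor this commit describes, assuming a Digest stored as a single "type:hash" string; the struct layout is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// Digest is a "<type>:<hash>" pair such as "sha256:abc123...".
type Digest struct{ s string }

// Split returns the type and hash parts together, the common case when a
// caller would otherwise ask for either one. A valid Digest guarantees the
// hash part has at least 2 characters, so sharding by prefix is always safe.
func (d Digest) Split() (typ, hash string) {
	typ, hash, _ = strings.Cut(d.s, ":")
	return typ, hash
}

func main() {
	d := Digest{"sha256:e3b0c44298fc"}
	typ, hash := d.Split()
	fmt.Println(typ, hash[:2]) // shard blobs by the first two hash characters
}
```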
Daniel Hiltgen
16b52331a4
Merge pull request #3857 from dhiltgen/mem_escape_valve
Add back memory escape valve
2024-04-23 17:32:24 -07:00
Daniel Hiltgen
5445aaa94e
Add back memory escape valve
If we get our predictions wrong, this can be used to
set a lower memory limit as a workaround. Recent multi-GPU
refactoring accidentally removed it, so this adds it back.
2024-04-23 17:09:02 -07:00
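A sketch of such an escape valve; the OLLAMA_MAX_VRAM variable name and byte units are assumptions, not confirmed from this diff:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// maxVRAM returns the user-supplied memory ceiling in bytes, or 0 if unset.
// OLLAMA_MAX_VRAM is assumed as the variable name; it caps the scheduler's
// VRAM estimate when the automatic prediction overshoots.
func maxVRAM() uint64 {
	v := os.Getenv("OLLAMA_MAX_VRAM")
	if v == "" {
		return 0
	}
	n, err := strconv.ParseUint(v, 10, 64)
	if err != nil {
		fmt.Fprintf(os.Stderr, "ignoring invalid OLLAMA_MAX_VRAM=%q: %v\n", v, err)
		return 0
	}
	return n
}

func main() {
	if limit := maxVRAM(); limit > 0 {
		fmt.Printf("clamping memory estimate to %d bytes\n", limit)
	}
}
```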
Daniel Hiltgen
2ac3dd6853
Merge pull request #3850 from dhiltgen/windows_packaging
Move nested payloads to installer and zip file on windows
2024-04-23 16:35:20 -07:00
Daniel Hiltgen
d8851cb7a0
Harden sched TestLoad
Give the goroutine a moment to deliver the expired event
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
058f6cd2cc
Move nested payloads to installer and zip file on windows
Now that the llm runner is an executable and not just a dll, more users are facing
problems with security policy configurations on Windows that prevent users from
writing to directories and then executing binaries from the same location.
This change removes payloads from the main executable on Windows and shifts them
over to be packaged in the installer and discovered based on the executable's location.
This also adds a new zip file for people who want to "roll their own" installation model.
2024-04-23 16:14:47 -07:00
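A sketch of executable-relative payload discovery as described above; the "runners" subdirectory name and helper are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runnersDir locates runner payloads next to the installed executable
// instead of extracting them to a temp dir, sidestepping Windows policies
// that forbid executing binaries from user-writable temp locations.
func runnersDir() (string, error) {
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	return filepath.Join(filepath.Dir(exe), "runners"), nil
}

func main() {
	dir, err := runnersDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("loading runners from", dir)
}
```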
Daniel Hiltgen
790cf34d17
Merge pull request #3846 from dhiltgen/missing_runner
Detect and recover if runner removed
2024-04-23 13:14:12 -07:00
Michael
928d844896
adding phi-3 mini to readme
2024-04-23 13:58:31 -04:00
Daniel Hiltgen
939d6a8606
Make CI lint verbose
2024-04-23 10:17:42 -07:00
Daniel Hiltgen
58888a74bc
Detect and recover if runner removed
Tmp cleaners can nuke the file out from underneath us. This detects the missing
runner and re-initializes the payloads.
2024-04-23 10:05:26 -07:00
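A sketch of the detect-and-recover flow; ensureRunner and the extract callback are hypothetical stand-ins for the real payload re-initialization:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ensureRunner re-extracts the runner payload if a temp cleaner has deleted
// it out from under us. extract stands in for the real re-initialization.
func ensureRunner(path string, extract func() error) error {
	if _, err := os.Stat(path); errors.Is(err, os.ErrNotExist) {
		fmt.Fprintf(os.Stderr, "runner %s missing, re-initializing payloads\n", path)
		return extract()
	} else if err != nil {
		return err
	}
	return nil
}

func main() {
	_ = ensureRunner("/tmp/ollama/runners/cpu/server", func() error {
		fmt.Println("extracting payloads...")
		return nil
	})
}
```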
Daniel Hiltgen
cc5a71e0e3
Merge pull request #3709 from remy415/custom-gpu-defs
Adds support for customizing GPU build flags in llama.cpp
2024-04-23 09:28:34 -07:00
Michael Yang
e83bcf7f9a
Merge pull request #3836 from ollama/mxyng/mixtral
fix: mixtral graph
2024-04-23 09:15:10 -07:00
Daniel Hiltgen
5690e5ce99
Merge pull request #3418 from dhiltgen/concurrency
Request and model concurrency
2024-04-23 08:31:38 -07:00
Daniel Hiltgen
f2ea8470e5
Local unicode test case
2024-04-22 19:29:12 -07:00
Daniel Hiltgen
34b9db5afc
Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
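A sketch of reading those two knobs, using the variable names and defaults stated in the commit message; the envInt helper is illustrative:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer environment variable, falling back to def
// when the variable is unset or malformed.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return def
}

func main() {
	// Defaults match the commit message: 1 request per model, 1 loaded model.
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)
	fmt.Printf("parallel=%d max_loaded=%d\n", numParallel, maxLoaded)
}
```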
Daniel Hiltgen
8711d03df7
Report errors on server lookup instead of path lookup failure
2024-04-22 19:08:47 -07:00
Daniel Hiltgen
ee448deaba
Merge pull request #3835 from dhiltgen/harden_llm_override
Trim spaces and quotes from llm lib override
2024-04-22 19:06:54 -07:00
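A sketch of the hardening this PR title describes; OLLAMA_LLM_LIBRARY is assumed as the override variable name:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanLibOverride hardens an OLLAMA_LLM_LIBRARY-style override by trimming
// surrounding whitespace and quotes, which shells and .env files often leave
// behind.
func cleanLibOverride(v string) string {
	return strings.Trim(v, " \t\"'")
}

func main() {
	fmt.Printf("%q\n", cleanLibOverride(` "cpu_avx2" `)) // "cpu_avx2"
}
```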
Bruce MacDonald
6e8db04716
tidy community integrations
- move some popular integrations to the top of the lists
2024-04-22 17:29:08 -07:00
Bruce MacDonald
658e60cf73
Revert "stop running model on interactive exit"
This reverts commit fad00a85e5.
2024-04-22 17:23:11 -07:00
Bruce MacDonald
4c78f028f8
Merge branch 'main' of https://github.com/ollama/ollama
2024-04-22 17:22:28 -07:00
Michael Yang
435cc866a3
fix: mixtral graph
2024-04-22 17:19:44 -07:00
Hao Wu
c7d3a558f6
docs: update README to add chat (web UI) for LLM ( #3810 )
* add chat (web UI) for LLM
I have used chat with llama3 locally and it worked successfully; the code is MIT licensed.
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:19:39 -04:00