Commit graph

3166 commits

Author SHA1 Message Date
Daniel Hiltgen
1e6a28bf5b
Merge pull request #4009 from dhiltgen/cpu_concurrency
Fix concurrency for CPU mode
2024-04-28 14:20:27 -07:00
Daniel Hiltgen
d6e3b64582 Fix concurrency for CPU mode
Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads.  This adds that back, along with test coverage.

This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.
2024-04-28 13:42:39 -07:00
Blake Mizerany
114c932a8e
types/model: allow _ as starter character in Name parts (#3991) 2024-04-27 21:24:52 -07:00
Jeffrey Morgan
7f7103de06
mac: update setup command to llama3 (#3986) 2024-04-27 22:52:10 -04:00
Blake Mizerany
c631a9c726
types/model: relax name length constraint from 2 to 1 (#3984) 2024-04-27 17:58:41 -07:00
Blake Mizerany
8fd9e56804
types/structs: drop unused structs package (#3981) 2024-04-27 14:06:11 -07:00
Hernan Martinez
8a65717f55 Do not build AVX runners on ARM64 2024-04-26 23:55:32 -06:00
Hernan Martinez
6d3152a98a Use architecture specific folders in installer script 2024-04-26 23:35:16 -06:00
Hernan Martinez
b438d485f1 Use architecture specific folders in the generate script 2024-04-26 23:34:12 -06:00
Hernan Martinez
204349b17b Use architecture specific folders in the build script 2024-04-26 23:26:03 -06:00
Hernan Martinez
86e67fc4a9 Add import declaration for windows,arm64 to llm.go 2024-04-26 23:23:53 -06:00
Blake Mizerany
2bed62926e
types/model: remove Digest (for now) (#3970)
The Digest type needs more thought and is not necessary at the moment.
2024-04-26 21:14:28 -07:00
Jeffrey Morgan
aad8d128a0
also look at cwd as a root for windows runners (#3959) 2024-04-26 19:14:08 -04:00
Daniel Hiltgen
ec1acbb867
Merge pull request #3968 from dhiltgen/win_generate
Fine grain control over windows generate steps
2024-04-26 16:03:38 -07:00
Daniel Hiltgen
e4859c4563 Fine grain control over windows generate steps
This will speed up CI which already tries to only build static for unit tests
2024-04-26 15:49:46 -07:00
Nataly Merezhuk
8e30eb26bd
Updates the setup command to use llama3. (#3962) 2024-04-26 18:41:01 -04:00
Daniel Hiltgen
0b5c589ca2
Merge pull request #3966 from dhiltgen/bump
Fix target in gen_windows.ps1
2024-04-26 15:36:53 -07:00
Michael Yang
65fadddc85
Merge pull request #3964 from ollama/mxyng/weights
fix gemma, command-r layer weights
2024-04-26 15:23:33 -07:00
Daniel Hiltgen
ed5fb088c4 Fix target in gen_windows.ps1 2024-04-26 15:10:42 -07:00
Michael Yang
f81f308118 fix gemma, command-r layer weights 2024-04-26 15:00:55 -07:00
Blake Mizerany
b1390a7b37
types/model: export ParseNameBare and Merge (#3957)
These are useful outside this package.
2024-04-26 14:58:07 -07:00
Michael Yang
11d83386a5
Merge pull request #3951 from ollama/mxyng/zip
check file type before zip
2024-04-26 14:51:23 -07:00
Jeffrey Morgan
bb31def011
return code 499 when user cancels request while a model is loading (#3955) 2024-04-26 17:38:29 -04:00
Michael Yang
41e03ede95 check file type before zip 2024-04-26 14:18:07 -07:00
Michael Yang
7fea1ecdf6
Merge pull request #3958 from ollama/mxyng/fix-workflow
use merge base for diff-tree
2024-04-26 14:17:56 -07:00
Blake Mizerany
054894271d
.github/workflows/test.yaml: add in-flight cancellations on new push (#3956)
Also, remove a superfluous 'go get'
2024-04-26 13:54:24 -07:00
Michael Yang
6fef042f0b use merge base for diff-tree 2024-04-26 13:54:15 -07:00
Daniel Hiltgen
5c0c2d1d09
Merge pull request #3954 from dhiltgen/ci_fixes
Put back non-avx CPU build for windows
2024-04-26 13:09:03 -07:00
Blake Mizerany
37f9c8ad99
types/model: overhaul Name and Digest types (#3924) 2024-04-26 13:08:32 -07:00
Quinten van Buul
2a80f55e2a
Update windows.md (#3855)
Fixed a typo
2024-04-26 16:04:15 -04:00
Daniel Hiltgen
421c878a2d Put back non-avx CPU build for windows 2024-04-26 12:44:07 -07:00
Daniel Hiltgen
36666c2142
Merge pull request #3925 from dhiltgen/bump
Bump llama.cpp to b2737
2024-04-26 10:09:38 -07:00
Daniel Hiltgen
85801317d1 Fix clip log import 2024-04-26 09:43:46 -07:00
Daniel Hiltgen
2ed0d65948 Bump llama.cpp to b2737 2024-04-26 09:43:28 -07:00
Daniel Hiltgen
d459dc4ad1
Merge pull request #3950 from dhiltgen/windows_packaging
Fix exe name for zip packaging on windows
2024-04-26 09:27:37 -07:00
Daniel Hiltgen
40bc4622ef Fix exe name for zip packaging on windows
The zip file encodes the OS and architecture, so keep the short exe name
2024-04-26 09:18:05 -07:00
Daniel Hiltgen
c0f818a07a
Merge pull request #3948 from dhiltgen/win_generate
Refactor windows generate for more modular usage
2024-04-26 09:17:20 -07:00
Daniel Hiltgen
8671fdeda6 Refactor windows generate for more modular usage 2024-04-26 08:35:50 -07:00
Daniel Hiltgen
2619850fb4
Merge pull request #3933 from dhiltgen/ci_fixes
Move cuda/rocm dependency gathering into generate script
2024-04-26 07:01:24 -07:00
Daniel Hiltgen
8feb97dc0d Move cuda/rocm dependency gathering into generate script
This will make it simpler for CI to accumulate artifacts from prior steps
2024-04-25 22:38:44 -07:00
Daniel Hiltgen
4e1ff6dcbb
Merge pull request #3926 from dhiltgen/ci_fixes
Fix release CI
2024-04-25 17:42:31 -07:00
Daniel Hiltgen
8589d752ac Fix release CI
download-artifact path was being used incorrectly.  It is where to
extract the zip not the files in the zip to extract.  Default is
workspace dir which is what we want, so omit it
2024-04-25 17:27:11 -07:00
Michael Yang
de4ded68b0
Merge pull request #3923 from ollama/mxyng/mem
only count output tensors
2024-04-25 16:34:17 -07:00
Daniel Hiltgen
9b5a3c5991
Merge pull request #3914 from dhiltgen/mac_perf
Improve mac parallel performance
2024-04-25 16:28:31 -07:00
Jeffrey Morgan
00b0699c75
Reload model if num_gpu changes (#3920)
* reload model if `num_gpu` changes

* dont reload on -1

* fix tests
2024-04-25 19:02:40 -04:00
Jeffrey Morgan
993cf8bf55
llm: limit generation to 10x context size to avoid run on generations (#3918)
* llm: limit generation to 10x context size to avoid run on generations

* add comment

* simplify condition statement
2024-04-25 19:02:30 -04:00
Michael Yang
7bb7cb8a60 only count output tensors 2024-04-25 15:24:08 -07:00
Daniel Hiltgen
b123be5b71 Adjust context size for parallelism 2024-04-25 13:58:54 -07:00
jmorganca
ddf5c09a9b use matrix multiplcation kernels in more cases 2024-04-25 13:58:54 -07:00
Roy Yang
5f73c08729
Remove trailing spaces (#3889) 2024-04-25 14:32:26 -04:00