Jeffrey Morgan
d8def1ff94
llm: allow gemma 2 to context shift ( #5534 )
2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955
Update llama.cpp submodule to a8db2a9c
( #5530 )
2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc
llm: print caching notices in debug only ( #5533 )
2024-07-07 12:38:04 -04:00
Jeffrey Morgan
0ee87615c7
sched: don't error if paging to disk on Windows and macOS ( #5523 )
2024-07-06 22:01:52 -04:00
Jeffrey Morgan
f8241bfba3
gpu: report system free memory instead of 0 ( #5521 )
2024-07-06 19:35:04 -04:00
Jeffrey Morgan
4607c70641
llm: add -DBUILD_SHARED_LIBS=off
to common cpu cmake flags ( #5520 )
2024-07-06 18:58:16 -04:00
jmorganca
c12f1c5b99
release: move mingw library cleanup to correct job
2024-07-06 16:12:29 -04:00
jmorganca
a08f20d910
release: remove unwanted mingw dll.a files
2024-07-06 15:21:15 -04:00
jmorganca
6cea036027
Revert "llm: only statically link libstdc++"
...
This reverts commit 5796bfc401
.
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401
llm: only statically link libstdc++
2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56
llm: statically link pthread and stdc++ dependencies in windows build
2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e
llm: add GGML_STATIC
flag to windows static lib
2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8
llm: add COMMON_DARWIN_DEFS
to arm static build ( #5513 )
2024-07-05 22:42:42 -04:00
Jeffrey Morgan
2cc854f8cb
llm: fix missing dylibs by restoring old build behavior on Linux and macOS ( #5511 )
...
* Revert "fix cmake build (#5505 )"
This reverts commit 4fd5f3526a
.
* llm: fix missing dylibs by restoring old build behavior
* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan
5304b765b2
llm: put back old include dir ( #5507 )
...
* llm: put back old include dir
* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Michael Yang
fb6cbc02fb
update named templates
2024-07-05 16:29:32 -07:00
Jeffrey Morgan
4fd5f3526a
fix cmake build ( #5505 )
2024-07-05 19:07:01 -04:00
Daniel Hiltgen
842f85f758
Merge pull request #5502 from dhiltgen/ci_fixes
...
Always go build in CI generate steps
2024-07-05 15:39:11 -07:00
Daniel Hiltgen
9d30f9f8b3
Always go build in CI generate steps
...
With the recent cgo changes, bugs can sneak through
if we don't make sure to `go build` all the permutations
2024-07-05 15:31:52 -07:00
Blake Mizerany
631cfd9e62
types/model: remove knowledge of digest ( #5500 )
...
This was leading to ambiguity and confusion in ollama.com, and is not
used anywhere in ollama at the moment. Once manifests are addressable by
digest, we can add this back in, and in a way that is more tailored to
the concept of addressing a manifest by digest.
2024-07-05 13:42:30 -07:00
Michael Yang
326363b3a7
no funcs
2024-07-05 13:17:25 -07:00
Michael Yang
ac7a842e55
fix model reloading
...
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang
2c3fe1fd97
comments
2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2
update message processing
2024-07-05 13:16:58 -07:00
Jeffrey Morgan
78fb33dd07
fix typo in cgo directives in llm.go
( #5501 )
2024-07-05 15:18:36 -04:00
Jeffrey Morgan
8f8e736b13
update llama.cpp submodule to d7fd29f
( #5475 )
2024-07-05 13:25:58 -04:00
Jeffrey Morgan
d89454de80
Use slot with cached prompt instead of least recently used ( #5492 )
...
* Use common prefix to select slot
* actually report `longest`
2024-07-05 12:32:47 -04:00
Daniel Hiltgen
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
...
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
Jeffrey Morgan
e9188e971a
Fix assert on small embedding inputs ( #5491 )
...
* Fix assert on small embedding inputs
* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
Daniel Hiltgen
78eddfc068
Merge pull request #4412 from dhiltgen/win_docs
...
Document older win10 terminal problems
2024-07-05 08:18:22 -07:00
Daniel Hiltgen
02c24d3d01
Merge pull request #5466 from dhiltgen/fix_clip_unicode
...
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
Daniel Hiltgen
52abc8acb7
Document older win10 terminal problems
...
We haven't found a workaround, so for now recommend updating.
2024-07-03 17:32:14 -07:00
Jeffrey Morgan
4d71c559b2
fix error detection by limiting model loading error parsing ( #5472 )
2024-07-03 20:04:30 -04:00
Anatoli Babenia
0d16eb310e
fix: use envconfig.ModelsDir
directly ( #4821 )
...
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>
Co-authored-by: Maas Lalani <maas@lalani.dev>
2024-07-03 15:36:11 -07:00
Daniel Hiltgen
8072e205ff
Merge pull request #5447 from dhiltgen/fix_keepalive
...
Only set default keep_alive on initial model load
2024-07-03 15:34:38 -07:00
Daniel Hiltgen
955f2a4e03
Only set default keep_alive on initial model load
...
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load. Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Daniel Hiltgen
3c75113e37
Prevent loading models larger than total memory
...
Users may not realize the siny new model they're trying to load
fits on their disk, but can't load into system+GPU memory. Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
Daniel Hiltgen
ccd7785859
Merge pull request #5243 from dhiltgen/modelfile_use_mmap
...
Fix use_mmap for modefiles
2024-07-03 13:59:42 -07:00
royjhan
3b5a4a77f3
Return Correct Prompt Eval Count Regardless of Cache Prompt ( #5371 )
...
* openai compatibility
* Revert "openai compatibility"
This reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c.
* remove erroneous subtraction of prompt cache
2024-07-03 13:46:23 -07:00
Daniel Hiltgen
daed0634a9
Merge pull request #5467 from dhiltgen/bogus_cpu_mac_error
...
Fix corner cases on tmp cleaner on mac
2024-07-03 13:39:36 -07:00
Daniel Hiltgen
0d4dd707bc
Merge pull request #5465 from dhiltgen/better_cuda_logging
...
Better nvidia GPU discovery logging
2024-07-03 13:12:22 -07:00
Daniel Hiltgen
0e982bc1f4
Fix corner cases on tmp cleaner on mac
...
When ollama is running a long time, tmp cleaners can remove the
runners. This tightens up a few corner cases on arm macs where
we failed with "server cpu not listed in available servers map[]"
2024-07-03 13:10:14 -07:00
Daniel Hiltgen
6298f49816
Fix clip model loading with unicode paths
...
On windows, if the model dir contained unicode characters
clip models would fail to load. This fixes the file name
handling in clip.cpp to support utf16 on windows.
2024-07-03 12:46:36 -07:00
Daniel Hiltgen
ef757da2c9
Better nvidia GPU discovery logging
...
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
2024-07-03 10:50:40 -07:00
Michael Yang
e5352297d9
Merge pull request #5448 from ollama/mxyng/fix-generate
...
use model template by default
2024-07-02 16:48:06 -07:00
Michael Yang
65a5040e09
fix generate template
2024-07-02 16:42:17 -07:00
royjhan
d626b99b54
OpenAI: v1/completions compatibility ( #5209 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Add Middleware for Chat and List
* Completions Endpoint
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* Rename function
* float types
* type cleanup
* cleaning
* more cleaning
* Extra test cases
* merge conflicts
* merge conflicts
* merge conflicts
* merge conflicts
* cleaning
* cleaning
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
Michael Yang
dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
...
add model capabilities
2024-07-02 14:26:07 -07:00
Michael Yang
400056e154
Merge pull request #5420 from ollama/mxyng/insecure-path
...
err on insecure path
2024-07-02 14:03:23 -07:00
Daniel Hiltgen
d2f19024d0
Merge pull request #5442 from dhiltgen/concurrency_docs
...
Add windows radeon concurrency note
2024-07-02 12:47:47 -07:00