ollama

Author	SHA1	Message	Date
Jeffrey Morgan	4e262eb2a8	remove `GGML_CUDA_FORCE_MMQ=on` from build (#5588 )	2024-07-10 13:17:13 -07:00
Daniel Hiltgen	4cfcbc328f	Merge pull request #5124 from dhiltgen/amd_windows Wire up windows AMD driver reporting	2024-07-10 12:50:23 -07:00
Daniel Hiltgen	79292ff3e0	Merge pull request #5555 from dhiltgen/msvc_deps Bundle missing CRT libraries	2024-07-10 12:50:02 -07:00
Daniel Hiltgen	8ea500441d	Merge pull request #5580 from dhiltgen/cuda_overhead Detect CUDA OS overhead	2024-07-10 12:47:31 -07:00
Daniel Hiltgen	b50c818623	Merge pull request #5607 from dhiltgen/win_rocm_v6 Bump ROCm on windows to 6.1.2	2024-07-10 12:47:10 -07:00
Daniel Hiltgen	b99e750b62	Merge pull request #5605 from dhiltgen/merge_glitch Remove duplicate merge glitch	2024-07-10 11:47:08 -07:00
Daniel Hiltgen	1f50356e8e	Bump ROCm on windows to 6.1.2 This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly so we can't yet enable concurrency by default.	2024-07-10 11:01:22 -07:00
Daniel Hiltgen	22c81f62ec	Remove duplicate merge glitch	2024-07-10 09:01:33 -07:00
Daniel Hiltgen	2d1e3c3229	Merge pull request #5503 from dhiltgen/dual_rocm Workaround broken ROCm p2p copy	2024-07-09 15:44:16 -07:00
royjhan	4918fae535	OpenAI v1/completions: allow stop token list (#5551 ) * stop token parsing fix * add stop test	2024-07-09 14:01:26 -07:00
royjhan	0aff67877e	separate request tests (#5578 )	2024-07-09 13:48:31 -07:00
Daniel Hiltgen	f6f759fc5f	Detect CUDA OS Overhead This adds logic to detect skew between the driver and management library which can be attributed to OS overhead and records that so we can adjust subsequent management library free VRAM updates and avoid OOM scenarios.	2024-07-09 12:21:50 -07:00
Daniel Hiltgen	9544a57ee4	Merge pull request #5579 from dhiltgen/win_static_deps Statically link c++ and thread lib on windows	2024-07-09 12:21:13 -07:00
Daniel Hiltgen	b51e3b63ac	Statically link c++ and thread lib This makes sure we statically link the c++ and thread library on windows to avoid unnecessary runtime dependencies on non-standard DLLs	2024-07-09 11:34:30 -07:00
Michael Yang	6bbbc50f10	Merge pull request #5440 from ollama/mxyng/messages-templates update named templates	2024-07-09 09:36:32 -07:00
Michael Yang	9bbddc37a7	Merge pull request #5126 from ollama/mxyng/messages update message processing	2024-07-09 09:20:44 -07:00
Jeffrey Morgan	e4ff73297d	server: fix model reloads when setting `OLLAMA_NUM_PARALLEL` (#5560 ) * server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL` * remove whitespace change * undo some changes	2024-07-08 22:32:15 -07:00
Daniel Hiltgen	b44320db13	Bundle missing CRT libraries Some users are experiencing runner startup errors due to not having these msvc redist libraries on their host	2024-07-08 18:24:21 -07:00
Daniel Hiltgen	0bacb30007	Workaround broken ROCm p2p copy Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.	2024-07-08 09:40:52 -07:00
Jeffrey Morgan	53da2c6965	llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535 )	2024-07-07 14:32:05 -04:00
Jeffrey Morgan	d8def1ff94	llm: allow gemma 2 to context shift (#5534 )	2024-07-07 13:41:51 -04:00
Jeffrey Morgan	571dc61955	Update llama.cpp submodule to `a8db2a9c` (#5530 )	2024-07-07 13:03:09 -04:00
Jeffrey Morgan	0e09c380fc	llm: print caching notices in debug only (#5533 )	2024-07-07 12:38:04 -04:00
Jeffrey Morgan	0ee87615c7	sched: don't error if paging to disk on Windows and macOS (#5523 )	2024-07-06 22:01:52 -04:00
Jeffrey Morgan	f8241bfba3	gpu: report system free memory instead of 0 (#5521 )	2024-07-06 19:35:04 -04:00
Jeffrey Morgan	4607c70641	llm: add `-DBUILD_SHARED_LIBS=off` to common cpu cmake flags (#5520 )	2024-07-06 18:58:16 -04:00
jmorganca	c12f1c5b99	release: move mingw library cleanup to correct job	2024-07-06 16:12:29 -04:00
jmorganca	a08f20d910	release: remove unwanted mingw dll.a files	2024-07-06 15:21:15 -04:00
jmorganca	6cea036027	Revert "llm: only statically link libstdc++" This reverts commit `5796bfc401`.	2024-07-06 15:10:48 -04:00
jmorganca	5796bfc401	llm: only statically link libstdc++	2024-07-06 14:06:20 -04:00
jmorganca	f1a379aa56	llm: statically link pthread and stdc++ dependencies in windows build	2024-07-06 12:54:02 -04:00
jmorganca	9ae146993e	llm: add `GGML_STATIC` flag to windows static lib	2024-07-06 03:27:05 -04:00
Jeffrey Morgan	e0348d3fe8	llm: add `COMMON_DARWIN_DEFS` to arm static build (#5513 )	2024-07-05 22:42:42 -04:00
Jeffrey Morgan	2cc854f8cb	llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511 ) * Revert "fix cmake build (#5505)" This reverts commit `4fd5f3526a`. * llm: fix missing dylibs by restoring old build behavior * crlf -> lf	2024-07-05 21:48:31 -04:00
Jeffrey Morgan	5304b765b2	llm: put back old include dir (#5507 ) * llm: put back old include dir * llm: update link paths for old submodule commits	2024-07-05 19:34:21 -04:00
Michael Yang	fb6cbc02fb	update named templates	2024-07-05 16:29:32 -07:00
Jeffrey Morgan	4fd5f3526a	fix cmake build (#5505 )	2024-07-05 19:07:01 -04:00
Daniel Hiltgen	842f85f758	Merge pull request #5502 from dhiltgen/ci_fixes Always go build in CI generate steps	2024-07-05 15:39:11 -07:00
Daniel Hiltgen	9d30f9f8b3	Always go build in CI generate steps With the recent cgo changes, bugs can sneak through if we don't make sure to `go build` all the permutations	2024-07-05 15:31:52 -07:00
Blake Mizerany	631cfd9e62	types/model: remove knowledge of digest (#5500 ) This was leading to ambiguity and confusion in ollama.com, and is not used anywhere in ollama at the moment. Once manifests are addressable by digest, we can add this back in, and in a way that is more tailored to the concept of addressing a manifest by digest.	2024-07-05 13:42:30 -07:00
Michael Yang	326363b3a7	no funcs	2024-07-05 13:17:25 -07:00
Michael Yang	ac7a842e55	fix model reloading ensure runtime model changes (template, system prompt, messages, options) are captured on model updates without needing to reload the server	2024-07-05 13:17:25 -07:00
Michael Yang	2c3fe1fd97	comments	2024-07-05 13:17:24 -07:00
Michael Yang	269ed6e6a2	update message processing	2024-07-05 13:16:58 -07:00
Jeffrey Morgan	78fb33dd07	fix typo in cgo directives in `llm.go` (#5501 )	2024-07-05 15:18:36 -04:00
Jeffrey Morgan	8f8e736b13	update llama.cpp submodule to `d7fd29f` (#5475 )	2024-07-05 13:25:58 -04:00
Jeffrey Morgan	d89454de80	Use slot with cached prompt instead of least recently used (#5492 ) * Use common prefix to select slot * actually report `longest`	2024-07-05 12:32:47 -04:00
Daniel Hiltgen	af28b94533	Merge pull request #5469 from dhiltgen/prevent_system_oom Prevent loading models larger than total memory	2024-07-05 08:22:20 -07:00
Jeffrey Morgan	e9188e971a	Fix assert on small embedding inputs (#5491 ) * Fix assert on small embedding inputs * Update llm/patches/09-pooling.diff	2024-07-05 11:20:57 -04:00
Daniel Hiltgen	78eddfc068	Merge pull request #4412 from dhiltgen/win_docs Document older win10 terminal problems	2024-07-05 08:18:22 -07:00

3100 commits