Michael Yang
e4d0a9c325
fix(test): do not clobber models directory
2024-08-28 14:07:48 -07:00
Patrick Devine
7416ced70f
add llama3.1 chat template (#6545)
2024-08-28 14:03:20 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
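A Hugging Face tokenizer_config.json can carry chat_template either as a plain string or as a list of named templates, which is what this change detects. A minimal Go sketch of that shape handling; picking the entry named "default" is an assumption about the config convention, and chatTemplate is an illustrative helper, not ollama's actual function:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatTemplate handles the two shapes chat_template takes in a
// tokenizer_config.json: a plain string, or a list of named templates.
func chatTemplate(raw json.RawMessage) (string, error) {
	// Simple case: a single template string.
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil
	}

	// List case: scan for an entry named "default" (assumed convention).
	var entries []struct {
		Name     string `json:"name"`
		Template string `json:"template"`
	}
	if err := json.Unmarshal(raw, &entries); err != nil {
		return "", err
	}
	for _, e := range entries {
		if e.Name == "default" {
			return e.Template, nil
		}
	}
	return "", fmt.Errorf("no default chat template found")
}

func main() {
	cfg := json.RawMessage(`[{"name":"default","template":"{{ messages }}"}]`)
	t, err := chatTemplate(cfg)
	fmt.Println(t, err)
}
```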
Michael Yang
8e6da3cbc5
update deprecated warnings
2024-08-28 09:55:11 -07:00
Michael Yang
d9d50c43cc
validate model path
2024-08-28 09:32:57 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupported tensor sizes (#6538)
2024-08-27 17:54:04 -07:00
Daniel Hiltgen
93ea9240ae
Move ollama executable out of bin dir (#6535)
2024-08-27 16:19:00 -07:00
Michael Yang
413ae39f3c
update templates to use messages
2024-08-27 15:44:04 -07:00
Michael Yang
60e47573a6
more tokenizer tests
2024-08-27 14:51:10 -07:00
Patrick Devine
d13c3daa0b
add safetensors to the modelfile docs (#6532)
2024-08-27 14:46:47 -07:00
Patrick Devine
1713eddcd0
Fix import image width (#6528)
2024-08-27 14:19:47 -07:00
Daniel Hiltgen
4e1c4f6e0b
Update manual instructions with discrete ROCm bundle (#6445)
2024-08-27 13:42:28 -07:00
Sean Khatiri
397cae7962
llm: fix typo in comment (#6530)
2024-08-27 13:28:29 -07:00
Patrick Devine
1c70a00f71
adjust image sizes
2024-08-27 11:15:25 -07:00
Michael Yang
eae3af6807
clean up convert tokenizer
2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8
detect chat template from configs that contain lists
2024-08-27 10:49:33 -07:00
Patrick Devine
ac80010db8
update the import docs (#6104)
2024-08-26 19:57:26 -07:00
Jeffrey Morgan
47fa0839b9
server: clean up route names for consistency (#6524)
2024-08-26 19:36:11 -07:00
Daniel Hiltgen
0f92b19bec
Only enable numa on CPUs (#6484)
The numa flag may have a performance impact on multi-socket systems with GPU loads (gating sketched below).
2024-08-24 17:24:50 -07:00
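A hedged sketch of that gating: serverParams, numaSupported, and the flag plumbing are illustrative stand-ins rather than ollama's actual API.

```go
package main

import "fmt"

// serverParams is a hypothetical stand-in for the runner's configuration.
type serverParams struct {
	modelPath string
	gpuLayers int // layers offloaded to GPU; 0 means CPU-only
}

// numaSupported is a placeholder for a real platform probe.
func numaSupported() bool { return true }

// runnerArgs adds the --numa flag only for CPU-only loads, since NUMA
// balancing may hurt multi-socket systems when a GPU is carrying layers.
func runnerArgs(p serverParams) []string {
	args := []string{"--model", p.modelPath}
	if p.gpuLayers == 0 && numaSupported() {
		args = append(args, "--numa")
	}
	return args
}

func main() {
	fmt.Println(runnerArgs(serverParams{modelPath: "llama3.1.gguf", gpuLayers: 0}))
	fmt.Println(runnerArgs(serverParams{modelPath: "llama3.1.gguf", gpuLayers: 33}))
}
```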
Daniel Hiltgen
69be940bf6
gpu: Group GPU Library sets by variant (#6483)
The recent CUDA variant changes uncovered a bug in ByLibrary, which failed to group GPU types by their common variant (see the sketch below).
2024-08-23 15:11:56 -07:00
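The fix amounts to keying the grouping on library and variant together. A small Go sketch under that assumption; gpuInfo and byLibrary are simplified stand-ins for the real types:

```go
package main

import "fmt"

// gpuInfo is a simplified stand-in for ollama's GPU discovery record.
type gpuInfo struct {
	Library string // e.g. "cuda", "rocm"
	Variant string // e.g. "v11", "v12"
	ID      string
}

// byLibrary buckets GPUs by library *and* variant, so a cuda v11 card
// is never mixed into a v12 set.
func byLibrary(gpus []gpuInfo) [][]gpuInfo {
	sets := map[string][]gpuInfo{}
	var order []string
	for _, g := range gpus {
		key := g.Library + "/" + g.Variant
		if _, ok := sets[key]; !ok {
			order = append(order, key)
		}
		sets[key] = append(sets[key], g)
	}
	out := make([][]gpuInfo, 0, len(order))
	for _, k := range order {
		out = append(out, sets[k])
	}
	return out
}

func main() {
	gpus := []gpuInfo{
		{"cuda", "v12", "GPU-0"},
		{"cuda", "v11", "GPU-1"},
		{"cuda", "v12", "GPU-2"},
	}
	for _, set := range byLibrary(gpus) {
		fmt.Println(set)
	}
}
```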
Michael Yang
9638c24c58
Merge pull request #5446 from ollama/mxyng/faq
update faq
2024-08-23 14:05:59 -07:00
Michael Yang
bb362caf88
update faq
2024-08-23 13:37:21 -07:00
Michael Yang
386af6c1a0
pass OLLAMA_HOST path through to client
2024-08-23 13:23:28 -07:00
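A minimal sketch of that passthrough, assuming the goal is that a server proxied under a URL prefix keeps working: parse OLLAMA_HOST with net/url and preserve any path component instead of discarding it. clientBase is a hypothetical helper:

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// clientBase parses OLLAMA_HOST while keeping any path component,
// so a server proxied under a prefix remains reachable.
func clientBase() (*url.URL, error) {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		host = "http://127.0.0.1:11434"
	}
	u, err := url.Parse(host)
	if err != nil {
		return nil, err
	}
	// u.Path is preserved; request URLs become u.JoinPath("api", ...).
	return u, nil
}

func main() {
	os.Setenv("OLLAMA_HOST", "http://example.com/ollama")
	u, _ := clientBase()
	fmt.Println(u.JoinPath("api", "tags")) // http://example.com/ollama/api/tags
}
```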
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327)
2024-08-23 11:29:56 -07:00
Daniel Hiltgen
7a1e1c1caf
gpu: Ensure driver version set before variant (#6480)
During rebasing the ordering was inverted, breaking the CUDA version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
Daniel Hiltgen
0b03b9c32f
llm: Align cmake define for cuda no peer copy (#6455)
The define was renamed recently, and this use slipped through the cracks under the old name.
2024-08-23 11:20:39 -07:00
Daniel Hiltgen
90ca84172c
Fix embeddings memory corruption (#6467)
* Fix embeddings memory corruption
The patch was leading to buffer-overrun corruption. Once it was removed, though, parallelism in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To work around this, only slot 0 is used for embeddings (illustrated below).
* Fix embed integration test assumption
The token eval count has changed with recent llama.cpp bumps (0.3.5+).
2024-08-22 14:51:42 -07:00
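The actual workaround lives in the C++ server; this Go sketch only illustrates the routing rule, with embedding requests pinned to slot 0 while generation keeps the free-slot pool:

```go
package main

import "fmt"

type request struct {
	prompt    string
	embedding bool
}

// pickSlot pins embedding requests to slot 0 so their slot/seq IDs never
// exceed the token count, while generation requests use the free pool.
func pickSlot(r request, freeSlots []int) int {
	if r.embedding {
		return 0
	}
	if len(freeSlots) > 0 {
		return freeSlots[0]
	}
	return -1 // no slot available; caller queues the request
}

func main() {
	fmt.Println(pickSlot(request{prompt: "hi", embedding: true}, []int{2, 3})) // 0
	fmt.Println(pickSlot(request{prompt: "hi"}, []int{2, 3}))                  // 2
}
```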
Michael Yang
6bd8a4b0a1
Merge pull request #6064 from ollama/mxyng/convert-llama3
convert: update llama conversion for llama3.1
2024-08-21 12:57:09 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
e22286c9e1
Merge pull request #5365 from ollama/mxyng/convert-gemma2
convert gemma2
2024-08-21 11:48:43 -07:00
Michael Yang
107f695929
Merge pull request #4917 from ollama/mxyng/convert-bert
convert bert model from safetensors
2024-08-21 11:48:29 -07:00
Michael Yang
4ecc70d3b4
Merge pull request #6386 from zwwhdls/fix-new-layer
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
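A minimal sketch of the fix, with writeLayer as a hypothetical helper: os.WriteFile's permission argument is filtered by the process umask, so an explicit os.Chmod is needed to guarantee 0o644.

```go
package main

import (
	"log"
	"os"
)

// writeLayer creates a layer blob and explicitly chmods it to 0o644 so
// it stays world-readable even under a restrictive umask.
func writeLayer(path string, data []byte) error {
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return err
	}
	// WriteFile's mode is masked by the umask; enforce 0o644 exactly.
	return os.Chmod(path, 0o644)
}

func main() {
	if err := writeLayer("layer.bin", []byte("blob")); err != nil {
		log.Fatal(err)
	}
}
```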
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
beb49eef65
create bert models from cli
2024-08-20 17:27:34 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Daniel Hiltgen
a017cf2fea
Split rocm back out of bundle (#6432)
We're over budget for GitHub's maximum release artifact size with ROCm + 2 CUDA versions. This splits ROCm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
2024-08-20 07:26:38 -07:00
Daniel Hiltgen
19e5a890f7
CI: remove directories from dist dir before upload step (#6429)
2024-08-19 15:19:21 -07:00
Daniel Hiltgen
f91c9e3709
CI: handle directories during checksum (#6427)
2024-08-19 13:48:45 -07:00
Daniel Hiltgen
2df6905ede
Merge pull request #6424 from dhiltgen/cuda_v12
Fix overlapping artifact name on CI
2024-08-19 12:11:58 -07:00
Daniel Hiltgen
d8be22e47d
Fix overlapping artifact name on CI
2024-08-19 12:07:18 -07:00
Daniel Hiltgen
652c273f0e
Merge pull request #5049 from dhiltgen/cuda_v12
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen
88e7705079
Merge pull request #6402 from rick-github/numParallel
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen
f9e31da946
Review comments
2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328
Adjust layout to bin+lib/ollama
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
3b19cdba2a
Remove Jetpack
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
927d98a6cd
Add windows cuda v12 + v11 support
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
f6c811b320
Enable cuda v12 flags
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa
Add cuda v12 variant and selection logic
Based on compute capability and driver version, pick the v12 or v11 CUDA variant (selection sketched below).
2024-08-19 09:38:53 -07:00
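A hedged sketch of that selection rule: the compute-capability and driver cutoffs below are illustrative assumptions, not ollama's published thresholds, and the zero-driver guard mirrors the fix in #6480 above.

```go
package main

import "fmt"

// cudaVariant picks a runner variant from the card's compute capability
// and the installed driver's major version.
func cudaVariant(computeMajor, driverMajor int) string {
	if driverMajor == 0 {
		// Driver version not yet populated (the bug fixed in #6480):
		// don't guess, keep the conservative v11 build.
		return "v11"
	}
	// Illustrative cutoffs: the v12 runner needs a reasonably new card
	// and a v12-capable driver.
	if computeMajor >= 5 && driverMajor >= 12 {
		return "v12"
	}
	return "v11"
}

func main() {
	fmt.Println(cudaVariant(8, 12)) // v12
	fmt.Println(cudaVariant(8, 11)) // v11: driver too old
	fmt.Println(cudaVariant(8, 0))  // v11: driver version unknown
}
```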
Daniel Hiltgen
fc3b4cda89
Report GPU variant in log
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b
Add Jetson cuda variants for arm
This adds new arm64 variants specific to Jetson platforms.
2024-08-19 09:38:53 -07:00