ollama

Author	SHA1	Message	Date
Jeffrey Morgan	8a8c7e7f8d	only build for metal on `arm64`	2024-01-09 13:51:08 -05:00
Jeffrey Morgan	f387e9631b	use runner if cuda alloc won't fit	2024-01-09 00:44:34 -05:00
Jeffrey Morgan	cb534e6ac2	use 10% vram overhead for cuda	2024-01-08 23:17:44 -05:00
Jeffrey Morgan	58ce2d8273	better estimate scratch buffer size	2024-01-08 21:32:44 -05:00
Jeffrey Morgan	18ddf6d57d	fix windows build	2024-01-08 20:04:01 -05:00
Jeffrey Morgan	08f1e18965	Offload layers to GPU based on new model size estimates (#1850 ) * select layers based on estimated model memory usage * always account for scratch vram * dont load +1 layers * better estmation for graph alloc * Update gpu/gpu_darwin.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go * add overhead for cuda memory * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * fix build error on linux * address comments --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2024-01-08 16:42:00 -05:00
Jeffrey Morgan	5feec959ad	dont use `-Wall` in static build (#1833 )	2024-01-07 10:39:19 -05:00
Jeffrey Morgan	dbdd50b283	add `-DCMAKE_SYSTEM_NAME=Darwin` cmake flag (#1832 )	2024-01-07 00:46:17 -05:00
Bruce MacDonald	3367b5f3df	remove unused generate patches (#1810 )	2024-01-05 11:25:45 -05:00
Daniel Hiltgen	9983fa5f4e	Cleaup stale submodule If the tree has a stale submodule, make sure we clean it up first	2024-01-04 13:40:16 -08:00
Daniel Hiltgen	fac9060da5	Init submodule with new path	2024-01-04 13:00:13 -08:00
Daniel Hiltgen	77d96da94b	Code shuffle to clean up the llm dir	2024-01-04 12:12:05 -08:00
Daniel Hiltgen	e9ce91e9a6	Load dynamic cpu lib on windows On linux, we link the CPU library in to the Go app and fall back to it when no GPU match is found. On windows we do not link in the CPU library so that we can better control our dependencies for the CLI. This fixes the logic so we correctly fallback to the dynamic CPU library on windows.	2024-01-04 08:41:41 -08:00
Jeffrey Morgan	c0285158a9	tweak memory requirements error text	2024-01-03 19:47:18 -05:00
Jeffrey Morgan	77a66df72c	add macOS memory check for 47B models	2024-01-03 19:46:16 -05:00
Jeffrey Morgan	5b4837f881	remove unused filetype check	2024-01-03 19:45:39 -05:00
Jeffrey Morgan	29340c2e62	update cmake flags for `amd64` macOS (#1780 ) * update cmake flags for intel macOS * remove `LLAMA_K_QUANTS` * put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`	2024-01-03 19:22:15 -05:00
Daniel Hiltgen	d5ec730354	Merge pull request #1779 from dhiltgen/refined_amd_gpu_list Improve maintainability of Radeon card list	2024-01-03 16:18:57 -08:00
Daniel Hiltgen	ddbfa6fe31	Fix CPU only builds Go embed doesn't like when there's no matching files, so put a dummy placeholder in to allow building without any GPU support If no "server" library is found, it's safely ignored at runtime.	2024-01-03 16:08:34 -08:00
Daniel Hiltgen	16f4603b67	Improve maintainability of Radeon card list This moves the list of AMD GPUs to an easier to maintain list which should make it easier to update over time.	2024-01-03 15:16:56 -08:00
Bruce MacDonald	0b3118e0af	fix: relay request opts to loaded llm prediction (#1761 )	2024-01-03 12:01:42 -05:00
Daniel Hiltgen	0498f7ce56	Get rid of one-line llama.log This one log line was triggering a single line llama.log to be generated in the pwd of the server	2024-01-02 15:36:16 -08:00
Daniel Hiltgen	738a8d12eb	Rename the ollama cmakefile	2024-01-02 15:36:16 -08:00
Daniel Hiltgen	d966b730ac	Switch windows build to fully dynamic Refactor where we store build outputs, and support a fully dynamic loading model on windows so the base executable has no special dependencies thus doesn't require a special PATH.	2024-01-02 15:36:16 -08:00
Daniel Hiltgen	9a70aecccb	Refactor how we augment llama.cpp This changes the model for llama.cpp inclusion so we're not applying a patch, but instead have the C++ code directly in the ollama tree, which should make it easier to refine and update over time.	2024-01-02 15:35:55 -08:00
Jeffrey Morgan	d4ebdadbe7	enable `cache_prompt` by default	2023-12-27 14:23:42 -05:00
K0IN	10da41d677	Add Cache flag to api (#1642 )	2023-12-22 17:16:20 -05:00
Daniel Hiltgen	e5202eb687	Quiet down llama.cpp logging by default By default builds will now produce non-debug and non-verbose binaries. To enable verbose logs in llama.cpp and debug symbols in the native code, set `CGO_CFLAGS=-g`	2023-12-22 08:47:18 -08:00
Daniel Hiltgen	fa24e73b82	Remove CPU build, fixup linux build script	2023-12-21 18:21:31 -08:00
Daniel Hiltgen	325d74985b	Fix CPU performance on hyperthreaded systems The default thread count logic was broken and resulted in 2x the number of threads as it should on a hyperthreading CPU resulting in thrashing and poor performance.	2023-12-21 16:23:36 -08:00
Daniel Hiltgen	d9cd3d9667	Revive windows build The windows native setup still needs some more work, but this gets it building again and if you set the PATH properly, you can run the resulting exe on a cuda system.	2023-12-20 17:21:54 -08:00
Daniel Hiltgen	7555ea44f8	Revamp the dynamic library shim This switches the default llama.cpp to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6 given 5.7 builds don't work on the latest ROCm library that just shipped.	2023-12-20 14:45:57 -08:00
Daniel Hiltgen	6558f94ed0	Fix darwin intel build	2023-12-19 13:32:24 -08:00
Daniel Hiltgen	54dbfa4c4a	Carry ggml-metal.metal as payload	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	3269535a4c	Refine handling of shim presence This allows the CPU only builds to work on systems with Radeon cards	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	1b991d0ba9	Refine build to support CPU only If someone checks out the ollama repo and doesn't install the CUDA library, this will ensure they can build a CPU only version	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	9adca7f711	Bump llama.cpp to b1662 and set n_parallel=1	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	89bbaafa64	Build linux using ubuntu 20.04 This changes the container-based linux build to use an older Ubuntu distro to improve our compatibility matrix for older user machines	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	35934b2e05	Adapted rocm support to cgo based llama.cpp	2023-12-19 09:05:46 -08:00
65a	f8ef4439e9	Use build tags to generate accelerated binaries for CUDA and ROCm on Linux. The build tags rocm or cuda must be specified to both go generate and go build. ROCm builds should have both ROCM_PATH set (and the ROCM SDK present) as well as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also used to switch VRAM detection between cuda and rocm implementations, using added "accelerator_foo.go" files which contain architecture specific functions and variables. accelerator_none is used when no tags are set, and a helper function addRunner will ignore it if it is the chosen accelerator. Fix go generate commands, thanks @deadmeu for testing.	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	d4cd695759	Add cgo implementation for llama.cpp Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.	2023-12-19 09:05:46 -08:00
Bruce MacDonald	811b1f03c8	deprecate ggml - remove ggml runner - automatically pull gguf models when ggml detected - tell users to update to gguf in the case automatic pull fails Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>	2023-12-19 09:05:46 -08:00
Jeffrey Morgan	6b5bdfa6c9	update runner submodule	2023-12-18 17:33:46 -05:00
Jeffrey Morgan	c063ee4af0	update runner submodule to fix hipblas build	2023-12-18 15:41:13 -05:00
Jeffrey Morgan	b85982eb91	update runner submodule	2023-12-18 12:43:31 -05:00
Bruce MacDonald	6ee8c80199	restore model load duration on generate response (#1524 ) * restore model load duration on generate response - set model load duration on generate and chat done response - calculate createAt time when response created * remove checkpoints predict opts * Update routes.go	2023-12-14 12:15:50 -05:00
Jeffrey Morgan	31f0551dab	Update runner to support mixtral and mixture of experts (MoE) (#1475 )	2023-12-13 17:15:10 -05:00
Michael Yang	4251b342de	Merge pull request #1469 from jmorganca/mxyng/model-types remove per-model types	2023-12-12 12:27:03 -08:00
Bruce MacDonald	3144e2a439	exponential back-off (#1484 )	2023-12-12 12:33:02 -05:00
Bruce MacDonald	c0960e29b5	retry on concurrent request failure (#1483 ) - remove parallel	2023-12-12 12:14:35 -05:00

1 2 3 4

193 commits