Commit graph

  • ecbfc0182f Go bump to v1.21 to pick up slog Daniel Hiltgen 2024-01-18 11:51:34 -08:00
  • fedd705aea Mechanical switch from log to slog Daniel Hiltgen 2024-01-18 10:52:01 -08:00
  • 82ee019bfc add open interpreter to list of extensions (#2016) Mike Bird 2024-01-18 16:59:39 -05:00
  • ad9dbc2a04 Haystack Ollama Integration (#2021) Sachin Sachdeva 2024-01-18 22:38:32 +01:00
  • fccdf4c635 Merge pull request #1987 from xyproto/archlinux Daniel Hiltgen 2024-01-18 13:32:10 -08:00
  • d450fb1d1e Merge pull request #2055 from dhiltgen/cuda_docs Daniel Hiltgen 2024-01-18 12:07:31 -08:00
  • df40b11d03 Merge pull request #2007 from dhiltgen/cpu_fallback Daniel Hiltgen 2024-01-18 11:32:29 -08:00
  • 9cd20b0ec8 Refine the linux cuda/rocm developer docs Daniel Hiltgen 2024-01-18 09:44:44 -08:00
  • b992bf65fc Disable arm64 for test phase Daniel Hiltgen 2024-01-17 19:00:30 -08:00
  • 1b249748ab Add multiple CPU variants for Intel Mac Daniel Hiltgen 2024-01-12 16:28:00 -08:00
  • cbe2adc78a Merge branch 'main' into archlinux Alexander F. Rødseth 2024-01-17 12:50:11 +01:00
  • d5a7353357 Merge pull request #2026 from jmorganca/mxyng/fix-windows Michael Yang 2024-01-16 16:58:42 -08:00
  • 96cfb62641 fix: normalize name path before splitting Michael Yang 2024-01-16 16:48:05 -08:00
  • 7d00b5d110 Merge pull request #1915 from dhiltgen/bump_llama_with_new_dep Daniel Hiltgen 2024-01-16 13:36:49 -08:00
  • 795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep Daniel Hiltgen 2024-01-10 15:52:35 -08:00
  • e282bdccdd Merge pull request #1990 from dhiltgen/ci_mac_cross Daniel Hiltgen 2024-01-16 12:31:37 -08:00
  • d9bfb2f08f install: pin fedora to max 37 Michael Yang 2024-01-16 11:45:12 -08:00
  • 598d6d5572 Merge pull request #1937 from jmorganca/mxyng/remove-client-py Michael Yang 2024-01-16 11:01:41 -08:00
  • a897e833b8 do not cache prompt (#2018) Bruce MacDonald 2024-01-16 13:48:05 -05:00
  • eef50accb4 Fix show parameters (#2017) Patrick Devine 2024-01-16 10:34:44 -08:00
  • 05d53de7a1 Merge pull request #1968 from jmorganca/mxyng/fix-request-retry Michael Yang 2024-01-16 10:33:50 -08:00
  • 8795447dad Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection Daniel Hiltgen 2024-01-14 18:00:11 -08:00
  • b3035112a1 Add macos cross-compile CI coverage Daniel Hiltgen 2024-01-14 09:19:45 -08:00
  • 95ad9a9fc8 Merge pull request #1988 from dhiltgen/fix_intel_mac Daniel Hiltgen 2024-01-14 08:45:18 -08:00
  • 3ca5f69ce8 Fix typo in arm mac arch script Daniel Hiltgen 2024-01-14 08:32:57 -08:00
  • cfa6337960 Merge pull request #1982 from dhiltgen/fix_intel_mac Daniel Hiltgen 2024-01-14 08:26:46 -08:00
  • f4bf1d514f Let gpu.go and gen_linux.sh also find CUDA on Arch Linux Alexander F. Rødseth 2024-01-14 13:40:36 +01:00
  • 557110d0ba Disable mmap with lora layers (#1985) Jeffrey Morgan 2024-01-13 23:36:31 -05:00
  • 2ecb247276 Fix intel mac build Daniel Hiltgen 2024-01-13 14:46:34 -08:00
  • 288ef8ff95 add gcc -lstdc++ flag for linux cpu (#1974) Jeffrey Morgan 2024-01-13 03:53:00 -05:00
  • 4cf17990f7 use g++ to build libext_server.so on linux (#1972) Jeffrey Morgan 2024-01-13 03:12:42 -05:00
  • 27331ae3a8 download: add inactivity monitor Michael Yang 2024-01-08 11:44:59 -08:00
  • b6c0ef1e70 Merge pull request #1961 from jmorganca/mxyng/rm-double-newline Michael Yang 2024-01-12 15:18:19 -08:00
  • 356d178f6e Merge pull request #1971 from jmorganca/mxyng/max-context-length Michael Yang 2024-01-12 15:10:25 -08:00
  • eaed6f8c45 add max context length check Michael Yang 2024-01-12 14:54:01 -08:00
  • 6a5bfc2ed6 update actions/setup-go purificant 2024-01-12 17:03:06 +00:00
  • cf29bd2d72 fix: request retry with error Michael Yang 2024-01-12 13:32:24 -08:00
  • 905862e17b improve cuda detection (rel. issue #1704) Fabian Preiss 2024-01-09 21:55:36 +01:00
  • 565f8a3c44 Convert the REPL to use /api/chat for interactive responses (#1936) Patrick Devine 2024-01-12 12:05:52 -08:00
  • 5121b7ac9c remove double newlines in /set parameter Michael Yang 2024-01-12 11:21:08 -08:00
  • a70262c6b2 Update README.md Michael Yang 2024-01-12 09:43:04 -08:00
  • 40a0a90a88 Add group delete to uninstall instructions (#1924) Tristram Oaten 2024-01-12 05:07:00 +00:00
  • cbe20c4375 update readme Michael Yang 2024-01-11 16:24:37 -08:00
  • 5ffbbea1d7 remove client.py Michael Yang 2024-01-11 15:51:47 -08:00
  • 3773fb6465 Merge pull request #1935 from dhiltgen/cpu_fallback Daniel Hiltgen 2024-01-11 15:52:32 -08:00
  • 7427fa1387 Fix up the CPU fallback selection Daniel Hiltgen 2024-01-11 14:43:16 -08:00
  • f84537e0e0 Merge pull request #1934 from jmorganca/mxyng/fix-slices Michael Yang 2024-01-11 14:36:20 -08:00
  • d2be6387c9 fix typo Michael Yang 2024-01-11 14:25:21 -08:00
  • d7af35d3d0 import fmt Michael Yang 2024-01-11 14:22:32 -08:00
  • defc1dbd6e use x/exp/slices Michael Yang 2024-01-11 14:20:13 -08:00
  • de2fbdec99 Merge pull request #1819 from dhiltgen/multi_variant Daniel Hiltgen 2024-01-11 14:00:48 -08:00
  • f5faf79aa1 Add semantic kernel to Readme (#1931) Eduard van Valkenburg 2024-01-11 20:40:23 +01:00
  • f4f939de28 Merge pull request #1552 from jmorganca/mxyng/lint-test Michael Yang 2024-01-11 09:37:45 -08:00
  • 39928a42e8 Always dynamically load the llm server library Daniel Hiltgen 2024-01-09 20:29:58 -08:00
  • d88c527be3 Build multiple CPU variants and pick the best Daniel Hiltgen 2024-01-07 15:48:05 -08:00
  • 3bc8b9832b fix gpu_test.go Error (same type) uint64->uint32 (#1921) Fabian Preiß 2024-01-11 14:22:23 +01:00
  • ab6be852c7 revisit memory allocation to account for full kv cache on main gpu Jeffrey Morgan 2024-01-11 01:45:31 -05:00
  • 052b33b81b DRY out the Dockefile.build Daniel Hiltgen 2024-01-06 16:46:55 -08:00
  • 8da7bef05f Support multiple variants for a given llm lib type Daniel Hiltgen 2024-01-05 12:13:08 -08:00
  • b24e8d17b2 Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) Jeffrey Morgan 2024-01-10 19:08:51 -05:00
  • f83881390f revert submodule back to 328b83de23b33240e28f4e74900d1d06726f5eb1 Jeffrey Morgan 2024-01-10 18:42:39 -05:00
  • ac70ab6761 Merge pull request #1914 from dhiltgen/smarter_cuda_detection Daniel Hiltgen 2024-01-10 15:21:56 -08:00
  • 3c49c3ab0d Harden GPU mgmt library lookup Daniel Hiltgen 2024-01-10 14:39:51 -08:00
  • 9754ae4c89 Support optional override of the target archictures Daniel Hiltgen 2024-01-10 14:41:02 -08:00
  • 224fbf2795 update submodule to commit 1fc2f265ff9377a37fd2c61eae9cd813a3491bea until its main branch is fixed Jeffrey Morgan 2024-01-10 17:03:11 -05:00
  • 2c6e8f5248 Update submodule to 6efb8eb30e7025b168f3fda3ff83b9b386428ad6 (#1885) Jeffrey Morgan 2024-01-10 16:48:38 -05:00
  • 34344d801c clean up cmake build directory when cross compiling macOS builds Jeffrey Morgan 2024-01-09 17:13:51 -05:00
  • e868c8a5c7 Update api.md (#1878) Robin Glauser 2024-01-09 22:21:17 +01:00
  • c336693f07 calculate overhead based number of gpu devices (#1875) Jeffrey Morgan 2024-01-09 15:53:33 -05:00
  • e89dc1d54b Merge pull request #1874 from dhiltgen/correct_cuda_min Daniel Hiltgen 2024-01-09 11:37:22 -08:00
  • 1961a81f03 Set corret CUDA minimum compute capability version Daniel Hiltgen 2024-01-09 11:28:24 -08:00
  • 8a8c7e7f8d only build for metal on arm64 Jeffrey Morgan 2024-01-09 13:51:04 -05:00
  • 6df83e6daa update rough cuda overhead estimate to 15% + 384MiB Jeffrey Morgan 2024-01-09 11:47:30 -05:00
  • f921e2696e typo Michael Yang 2024-01-09 09:45:42 -08:00
  • 4a33cede20 remove unused fields and functions Michael Yang 2023-12-22 09:55:18 -08:00
  • f95d2f25f3 fix temporary history file permissions Michael Yang 2023-12-18 10:53:51 -08:00
  • 2b9892a808 fix(windows): modelpath and list Michael Yang 2023-12-15 15:50:51 -08:00
  • 2bb2bdd5d4 fix lint Michael Yang 2023-12-15 14:07:34 -08:00
  • acfc376efd add .golangci.yaml Michael Yang 2023-12-15 14:25:12 -08:00
  • 997253143f add lint and test on pull_request Michael Yang 2023-12-15 11:33:52 -08:00
  • 62023177f6 Merge pull request #1614 from jmorganca/mxyng/fix-set-template Michael Yang 2024-01-09 09:36:24 -08:00
  • 6164f378f2 revert cuda overhead to 20% Jeffrey Morgan 2024-01-09 00:54:25 -05:00
  • f387e9631b use runner if cuda alloc won't fit Jeffrey Morgan 2024-01-09 00:44:34 -05:00
  • 6566387ae3 add TODO for cuda overhead Jeffrey Morgan 2024-01-09 00:28:03 -05:00
  • 37708931fb update cuda overhead to 20% to fix crashes when switching between models and large context sizes Jeffrey Morgan 2024-01-09 00:05:23 -05:00
  • f6cb0a553c update cuda overhead to 15% or 400MiB Jeffrey Morgan 2024-01-08 23:45:45 -05:00
  • 2680078c13 fix build on linux Jeffrey Morgan 2024-01-08 23:44:13 -05:00
  • f1b7e5f560 update overhead to 15% Jeffrey Morgan 2024-01-08 23:37:45 -05:00
  • cb534e6ac2 use 10% vram overhead for cuda Jeffrey Morgan 2024-01-08 23:17:44 -05:00
  • 58ce2d8273 better estimate scratch buffer size Jeffrey Morgan 2024-01-08 21:32:44 -05:00
  • 18ddf6d57d fix windows build Jeffrey Morgan 2024-01-08 20:04:01 -05:00
  • 61e6502449 Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt Michael Yang 2024-01-08 13:48:34 -08:00
  • 08f1e18965 Offload layers to GPU based on new model size estimates (#1850) Jeffrey Morgan 2024-01-08 16:42:00 -05:00
  • 7e8f7c8358 remove ggml automatic re-pull (#1856) Bruce MacDonald 2024-01-08 14:41:01 -05:00
  • 3f3eb19a3b document response in modelfile template variables (#1428) Bruce MacDonald 2024-01-08 14:38:51 -05:00
  • 059ae4585e Merge pull request #1834 from dhiltgen/old_cuda Daniel Hiltgen 2024-01-07 10:39:49 -08:00
  • 6347f501ca Merge pull request #1828 from dhiltgen/fix_llava Daniel Hiltgen 2024-01-07 09:05:46 -08:00
  • 5feec959ad dont use -Wall in static build (#1833) Jeffrey Morgan 2024-01-07 10:39:19 -05:00
  • dbdd50b283 add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832) Jeffrey Morgan 2024-01-07 00:46:17 -05:00
  • d74ce6bd4f Detect very old CUDA GPUs and fall back to CPU Daniel Hiltgen 2024-01-06 21:40:04 -08:00