Name | Last commit message | Last commit date
ext_server | feat: add support for flash_attn (#4120) | 2024-05-20 13:36:03 -07:00
generate | Port cuda/rocm skip build vars to linux | 2024-05-15 15:56:43 -07:00
patches | update llama.cpp submodule to 614d3b9 (#4414) | 2024-05-16 13:53:09 -07:00
filetype.go | comments | 2024-05-06 15:24:01 -07:00
ggla.go | simplify safetensors reading | 2024-05-21 11:28:22 -07:00
ggml.go | simplify safetensors reading | 2024-05-21 11:28:22 -07:00
gguf.go | simplify safetensors reading | 2024-05-21 11:28:22 -07:00
llm.go | comments | 2024-05-06 15:24:01 -07:00
llm_linux.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00
memory.go | typo | 2024-05-13 14:18:34 -07:00
server.go | Use flash attention flag for now (#4580) | 2024-05-22 21:52:09 -07:00
status.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00