ext_server | feat: add support for flash_attn (#4120) | 2024-05-20 13:36:03 -07:00
generate | Port cuda/rocm skip build vars to linux | 2024-05-15 15:56:43 -07:00
patches | update llama.cpp submodule to 614d3b9 (#4414) | 2024-05-16 13:53:09 -07:00
filetype.go | comments | 2024-05-06 15:24:01 -07:00
ggla.go | refactor tensor query | 2024-04-10 11:37:20 -07:00
ggml.go | add phi2 mem | 2024-05-10 12:13:28 -07:00
gguf.go | llama3 conversion | 2024-05-20 16:13:57 -07:00
llm.go | comments | 2024-05-06 15:24:01 -07:00
llm_linux.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00
memory.go | typo | 2024-05-13 14:18:34 -07:00
server.go | feat: add support for flash_attn (#4120) | 2024-05-20 13:36:03 -07:00
status.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00