c826e57475
- Update mllama to take the cross-attention state as embeddings in a batch, closer to how Llava handles it. This improves integration with the input cache. - Pass the locations of embeddings in a prompt using tags, similar to Llava. - Abstract the interface to vision models so that the main runner accesses Clip and Mllama in the same way. Co-authored-by: Michael Yang <mxyng@pm.me> |
||
---|---|---|
.. | ||
0001-cuda.patch | ||
0002-pretokenizer.patch | ||
0003-metal.patch | ||
0004-ggml-metal.patch | ||
0005-embeddings.patch | ||
0006-clip-unicode.patch | ||
0007-solar-pro.patch | ||
0008-conditional-fattn.patch | ||
0009-blas.patch | ||
0010-add-mllama-support.patch | ||
0011-add-unpad-operator.patch | ||
0012-fix-deepseek-deseret-regex.patch |