Andrei Betlen
a5cfeb7763
feat: Update llama.cpp
2024-02-15 15:17:30 -05:00
Douglas Hanley
7bb91f025f
fix: Incorporate embedding pooling layer fixes ( #1194 )
...
* remove division by token count
* truncate to n_batch, not n_ctx
2024-02-15 15:16:30 -05:00
Andrei Betlen
ae71ad1a14
Bump version
2024-02-14 04:31:42 -05:00
Andrei Betlen
f300d4310a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-02-14 04:27:33 -05:00
Andrei Betlen
c336f78269
Update llama.cpp
2024-02-14 04:27:30 -05:00
Douglas Hanley
d7a67917ba
feat: Support batch embeddings ( #1186 )
...
* handle batched embeddings
* fix normalization issue
* fix type hints, ensure no breaking changes to embed
* Clear kv cache / reset internal state after embedding complete
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-14 04:26:09 -05:00
Andrei Betlen
36b843228f
misc: fix makefile build commands
2024-02-14 03:47:40 -05:00
Andrei Betlen
7b9960d1cb
Update llama.cpp
2024-02-14 03:47:21 -05:00
Andrei Betlen
6943bab6d8
fix: destructor exception where internal classes are missing some uninitialized attributes
2024-02-14 03:38:41 -05:00
Andrei Betlen
07a783779a
fix: Update openbuddy prompt format. Closes #1155
2024-02-13 23:57:10 -05:00
Andrei Betlen
7a79e5ac49
Update llama.cpp
2024-02-13 23:54:05 -05:00
Andrei Betlen
7dbbfdecad
fix: submodule kompute is not included in sdist. Closes #1165
2024-02-13 23:53:56 -05:00
Andrei Betlen
345215a76c
fix: more chatml-function-calling fixes
2024-02-13 23:02:50 -05:00
Andrei Betlen
b1637c2319
Bump version
2024-02-13 12:35:04 -05:00
Andrew Lapp
d6be5333e1
fix: sample idx off-by-one error for logit_processors ( #1179 )
...
* fix sample_idx off-by-one error
* self._scores is indexed differently, only modify the index within self._input_ids
---------
Co-authored-by: Andrew Lapp <andrew@rew.la>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-13 12:26:07 -05:00
Andrei Betlen
f7cdf78788
Update llama.cpp
2024-02-13 12:24:00 -05:00
Andrei Betlen
68fb71b6a2
fix: missing generation_prompt in chatml-function-calling
2024-02-13 03:24:41 -05:00
Andrei Betlen
4b0e3320bd
fix: minor formatting bugs for chatml-function-calling
2024-02-13 03:11:35 -05:00
Andrei Betlen
6fe8b427e1
Bump version
2024-02-13 02:46:52 -05:00
Andrei Betlen
d1822fed6b
fix: Don't change order of json schema object properties unless prop_order is passed, Closes #1180
2024-02-13 02:44:00 -05:00
Andrei Betlen
5efc45bdfd
Update llama.cpp
2024-02-13 02:43:07 -05:00
Andrei Betlen
4348a6cdf0
docs: Fix typo
2024-02-13 02:04:54 -05:00
Andrei Betlen
d605875772
Bump version
2024-02-12 16:28:30 -05:00
Andrei Betlen
b82b0e1014
docs: Temporarily revert function calling docs
2024-02-12 16:27:43 -05:00
Andrei Betlen
cb791716b4
fix: Always set logits_all = True when using speculative decoding
2024-02-12 16:19:05 -05:00
Andrei
153a0049d9
feat: Generic chatml Function Calling ( #957 )
...
* Add demo notebook
* Add initial chat handler
* Update OpenAI types
* Add generic chatml function calling (wip)
* Update chatml generic function calling.
* Progress on auto-tool calls
* fix streaming functions
* Remove print statements
* fix: Suppress output from llama.cpp init and grammar creation
* Add OpenAI v1 python api compatible chat completion function
* Support non-streaming multi-tool calls
* Format
* Include function_call in response.
2024-02-12 15:56:07 -05:00
Andrei Betlen
69413ce08e
Update llama.cpp
2024-02-11 19:00:17 -05:00
Andrei Betlen
9368670639
Update llama.cpp
2024-02-11 14:02:46 -05:00
Connor
a05d90446f
fix: Circular dependancy preventing early Llama object free ( #1176 )
...
commit 901827013b
introduced a cyclic dependency
within Llama objects. That change causes old models to linger in memory longer
than necessary, thereby creating memory bloat in most applications attempting
to switch between models at runtime. This patch simply removes the problematic
line, allowing models to deallocate without relying on GC. One might also
consider combining `weakref.ref` with a `@property` if the `llama` attribute is
absolutely necessary to expose in the tokenizer class.
2024-02-11 13:57:57 -05:00
Akarshan Biswas
918ff27e50
docs: Set the correct command for compiling with syscl support ( #1172 )
2024-02-11 13:55:15 -05:00
Douglas Hanley
19b55ad3e5
feat: use gpu backend for clip if available ( #1175 )
2024-02-11 13:53:59 -05:00
Andrei Betlen
63b0c37836
Update llama.cpp
2024-02-09 13:36:58 -05:00
Andrei Betlen
4abb8c9386
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-02-09 13:32:31 -05:00
Andrei Betlen
e16f06e6eb
fix: revert _create_completions.
2024-02-09 02:02:13 -05:00
Andrei Betlen
dfc1b17341
Update llama.cpp
2024-02-08 23:38:12 -05:00
Andrei Betlen
5b4ad6c80b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-02-08 23:34:45 -05:00
Andrei Betlen
85d3374b4d
fix: broken import
2024-02-08 01:13:28 -05:00
Andrei Betlen
b5fca911b5
feat: Move tokenizer to own module
2024-02-08 01:08:18 -05:00
Andrei Betlen
2ef7ba3aed
misc: rename grammar test
2024-02-08 01:07:44 -05:00
Jeffrey Fong
901827013b
feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class ( #1078 )
...
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + inteegrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* set up parallel function calling
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + inteegrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* resolve merge conflict
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* Cleanup PR, fix breaking changes
* Use hf_pretrained_model_name_or_path for tokenizer
* fix hf tokenizer in streaming
* update README
* refactor offset mapping
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-07 20:07:03 -05:00
Andrei Betlen
ce12775490
Update llama.cpp
2024-02-06 18:50:56 -05:00
Andrei Betlen
34f31040f6
Bump version
2024-02-06 12:47:59 -05:00
Andrei Betlen
5e3e67af47
Update llama.cpp
2024-02-06 12:44:07 -05:00
Andrei Betlen
310fbf4e49
Update llama.cpp
2024-02-05 22:07:14 -05:00
Andrei Betlen
59760c85ed
fix: Use llama_log_callback to avoid suppress_stdout_stderr
2024-02-05 21:52:12 -05:00
Andrei Betlen
3553b14670
Update llama.cpp
2024-02-05 13:26:50 -05:00
Andrei
7467f129e5
Revert "Fix: fileno error google colab ( #729 ) ( #1156 )" ( #1157 )
...
This reverts commit bebfba0f08
.
2024-02-02 12:18:55 -05:00
Dulsara
bebfba0f08
Fix: fileno error google colab ( #729 ) ( #1156 )
...
Instead of using a devnull just made a dummy class with a 'write()' method that does nothing.
2024-02-02 12:05:46 -05:00
Andrei Betlen
8a5911bd5d
Update llama.cpp
2024-02-02 09:41:27 -05:00
Andrei Betlen
de526d0214
Update llama.cpp
2024-02-01 12:35:31 -05:00