F16_KV appears to have been removed here: af99c6fbfc
This addresses two issues:
- #995, which requests adding the KV cache offloading param
- #1006, a NULL pointer exception when using embeddings (introduced by
leaving f16_kv in the fields struct)
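
For reference, a minimal sketch of how the new offloading param is exposed; the `offload_kqv` name is assumed from llama.cpp's context params, and the model path is hypothetical:

```python
from llama_cpp import Llama

# Assumes the KV-cache offloading flag is exposed as `offload_kqv` on the
# Llama constructor, mirroring llama.cpp's llama_context_params field.
llm = Llama(
    model_path="./models/7B/model.gguf",  # hypothetical path
    offload_kqv=True,  # keep the KV cache on the GPU alongside the weights
)
```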
If n_ctx is set to 0, the code should use the maximum context length of the selected model, but this did not work: the parameter was initialized incorrectly, and there was a related problem with n_batch.
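
A sketch of the intended behavior once fixed (the model path is hypothetical):

```python
from llama_cpp import Llama

# n_ctx=0 should now fall back to the model's own trained context length
# rather than failing during context initialization.
llm = Llama(model_path="./models/7B/model.gguf", n_ctx=0)
print(llm.n_ctx())  # reports the model's maximum context length
```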
See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses NumPy, so it is much faster than the previous version.
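
The NumPy version amounts to a numerically stable log-softmax; a sketch of the equivalent computation:

```python
import numpy as np

def logits_to_logprobs(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax: subtract the row max before
    # exponentiating, then normalize by the log of the summed exponentials.
    logits_max = np.max(logits, axis=-1, keepdims=True)
    shifted = logits - logits_max
    log_sum_exp = np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return shifted - log_sum_exp
```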
* llava v1.5 integration
* Point llama.cpp to fork
* Add llava shared library target
* Fix type
* Update llama.cpp
* Add llava api
* Revert changes to llama and llama_cpp
* Update llava example
* Add types for new gpt-4-vision-preview api
* Fix typo
* Update llama.cpp
* Update llama_types to match OpenAI v1 API
* Update ChatCompletionFunction type
* Reorder request parameters
* More API type fixes
* Even More Type Updates
* Add parameter for custom chat_handler to Llama class
* Fix circular import
* Convert to absolute imports
* Fix
* Fix pydantic Jsontype bug
* Accept list of prompt tokens in create_completion
* Add llava1.5 chat handler (usage sketch after this commit message)
* Add Multimodal notebook
* Clean up examples
* Add server docs
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
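
A usage sketch of the resulting multimodal API, assuming the handler and parameter names from this changeset (`Llava15ChatHandler`, `chat_handler`, `clip_model_path`); model paths and the image URL are hypothetical:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="./models/llava/mmproj.bin")
llm = Llama(
    model_path="./models/llava/ggml-model-q4_0.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context leaves room for image embeddings
    logits_all=True,  # the llava handler needs per-token logits
)

# Message content follows the gpt-4-vision-preview style typed here.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```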
* Add common grammars and json-schema-to-grammar utility function from llama.cpp (usage sketch after this list)
* Pass functions to format function
* Add basic functionary formatting
* Add LlamaChatHandler for more complex chat use cases
* Add function calling example notebook
* Add support for regular chat completions alongside function calling
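
A sketch of the grammar utilities this adds, assuming a `LlamaGrammar.from_string` constructor over llama.cpp's GBNF grammar text and a `grammar` parameter on completion calls; the model path is hypothetical:

```python
from llama_cpp import Llama
from llama_cpp.llama_grammar import LlamaGrammar

# Tiny GBNF grammar constraining output to "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(model_path="./models/7B/model.gguf")  # hypothetical path
out = llm.create_completion(
    "Is the sky blue? Answer yes or no: ",
    grammar=grammar,  # sampling is constrained to strings the grammar accepts
    max_tokens=4,
)
print(out["choices"][0]["text"])
```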
* Add low-level batching notebook
* fix: tokenization of special characters (#850)
It should behave like llama.cpp, where most out-of-the-box usages
treat special characters accordingly (see the sketch after this commit message)
* Update CHANGELOG
* Cleanup
* Fix runner label
* Update notebook
* Use llama_decode and batch api
* Support logits_all parameter
---------
Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
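
A sketch of the behavior referenced in the #850 fix, assuming a `special` flag on `Llama.tokenize` that mirrors llama.cpp's tokenizer option; the model path is hypothetical:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/model.gguf")  # hypothetical path

text = b"<s>Hello</s>"
# With special=True, sequences like "<s>" are parsed as special tokens
# (single ids); with special=False they are tokenized as plain text.
ids_special = llm.tokenize(text, add_bos=False, special=True)
ids_plain = llm.tokenize(text, add_bos=False, special=False)
print(ids_special, ids_plain)
```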