commit 901827013b introduced a cyclic dependency
within Llama objects. That change causes old models to linger in memory longer
than necessary, thereby creating memory bloat in most applications attempting
to switch between models at runtime. This patch simply removes the problematic
line, allowing models to deallocate without relying on GC. One might also
consider combining `weakref.ref` with a `@property` if the `llama` attribute is
absolutely necessary to expose in the tokenizer class.
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use samplingcontext for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel api
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
* Added mistral instruct chat format as "mistral"
* Fix stop sequence (merge issue)
* Update chat format name to `mistral-instruct`
---------
Co-authored-by: Andrei <abetlen@gmail.com>
* feat: Add support for jinja templating
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* fix: Refactor chat formatter and update interface for jinja templates
- Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality.
- Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility.
- Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string.
- Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation.
These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage.
* Add outline for Jinja2 templating integration documentation
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Add jinja2 as a dependency with version range for Hugging Face transformers compatibility
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Update jinja2 version constraint for mkdocs-material compatibility
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Fix attribute name in AutoChatFormatter
- Changed attribute name from `self._renderer` to `self._environment`
---------
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>