Releases · marella/ctransformers
0.2.27
Changes
- Skip evaluating tokens that have already been evaluated. This can significantly speed up prompt processing in chat applications that prepend previous messages to the prompt (see the sketch after this list).
- Deprecate the `LLM.reset()` method. Use the high-level API instead.
- Add support for batching and beam search to the 🤗 Transformers integration.
- Remove the universal binary option when building for AVX2 and AVX on macOS.
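A minimal sketch of the chat pattern the token-skipping change targets; the model repo is illustrative and the conversation formatting is simplified:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")  # illustrative model

history = ""
for user_message in ["Hello!", "Tell me more."]:
    # Each turn prepends the full conversation so far to the prompt.
    prompt = history + "User: " + user_message + "\nAssistant:"
    reply = llm(prompt, max_new_tokens=64)
    # Because the new prompt starts with already-evaluated text, 0.2.27
    # skips those tokens instead of reprocessing the shared prefix.
    history = prompt + reply + "\n"
```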
0.2.26
Changes
- Add support for 🤗 Transformers
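The README-style usage for the 🤗 Transformers integration looks like this (the model name is the one used in the project docs):

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

# Load a GGML model as a 🤗 Transformers-compatible model and tokenizer.
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=64)[0]["generated_text"])
```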
0.2.25
Changes
- Add support for GGUF v2
- Add CUDA support for Falcon GGUF models
- Add ROCm support
- Add low-level API for `add_bos_token`, `bos_token_id`
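A sketch of the new low-level calls, assuming `tokenize` accepts the `add_bos_token` flag and `bos_token_id` is exposed as a property; repo and file names are placeholders:

```python
from ctransformers import AutoModelForCausalLM

# Placeholder LLaMA-style GGUF model.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGUF", model_file="llama-2-7b.Q4_K_M.gguf"
)

# Tokenize with an explicit BOS token and verify it leads the sequence.
tokens = llm.tokenize("Hello", add_bos_token=True)
assert tokens[0] == llm.bos_token_id
```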
0.2.24
Changes
- Add GGUF format support for Llama and Falcon models
- Add support for Code Llama models
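Loading a Code Llama model in the new GGUF format might look like this; the repo, file, and prompt are illustrative:

```python
from ctransformers import AutoModelForCausalLM

# Illustrative Code Llama GGUF checkpoint.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/CodeLlama-7B-GGUF",
    model_file="codellama-7b.Q4_K_M.gguf",
    model_type="llama",  # Code Llama uses the llama model type
)
print(llm("def fibonacci(n):", max_new_tokens=64))
```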
0.2.23
Changes
- Add `mmap` and `mlock` parameters for LLaMA and Falcon models (see the sketch below)
- Add `revision` option for models on Hugging Face Hub
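A combined sketch of both options, assuming they are passed as keyword arguments to `from_pretrained`; the repo name is illustrative:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",  # illustrative repo
    revision="main",  # pin a branch, tag, or commit on the Hub
    mmap=True,        # memory-map the model file instead of loading it eagerly
    mlock=False,      # set True to lock pages in RAM and avoid swapping
)
```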
0.2.22
Changes
- Add experimental CUDA support for StarCoder, StarChat models
- Add `gpt_bigcode` as model type for StarCoder, StarChat models (example below)
- Fix loading GPTQ models from a local path
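Putting the two StarCoder changes together; the repo name and layer count are illustrative:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/starcoder-GGML",  # illustrative repo
    model_type="gpt_bigcode",   # new model type for StarCoder/StarChat
    gpu_layers=50,              # experimental CUDA offload: layers to run on GPU
)
```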
0.2.21
Changes
- Simplify CUDA installation by using precompiled runtime libraries from NVIDIA
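With the runtime libraries shipped as pip packages, CUDA support no longer requires a locally installed CUDA toolkit; the project README documents `pip install ctransformers[cuda]` as the one-step install.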
0.2.20
Changes
- Add experimental CUDA support for MPT models
0.2.19
Changes
- Add Metal support for LLaMA 2 70B models
- Update llama.cpp
0.2.18
Changes
- Add experimental support for GPTQ models using ExLlama
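A minimal sketch, assuming the GPTQ format is detected from the repo contents as described in the project README; the repo name is illustrative:

```python
from ctransformers import AutoModelForCausalLM

# Illustrative GPTQ checkpoint; ExLlama-backed support is experimental.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
print(llm("AI is going to", max_new_tokens=64))
```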