koboldcpp-1.43
- Re-added support for automatic rope scale calculations based on a model's training context (n_ctx_train). This triggers if you do not explicitly specify a `--ropeconfig`. For example, this means llama2 models will (by default) use a smaller rope scale compared to llama1 models for the same specified `--contextsize`; setting `--ropeconfig` will override this. This was bugged and removed in the previous release, but it should be working fine now.
- HIP and CUDA visible devices are now set to that GPU only, if a GPU number is provided and tensor split is not specified.
- Fixed RWKV models being broken after recent upgrades.
- Tweaked `--unbantokens` to decrease the banned token logit values further, as very rarely they could still appear. Still not using `-inf`, as that causes issues with typical sampling.
- Integrated SSE streaming improvements from @kalomaze
- Added mutex for thread-safe polled-streaming from @Elbios
- Added support for older GGML (ggjt_v3) for 34B llama2 models by @vxiiduu, note that this may still have issues if n_gqa is not 1, in which case using GGUF would be better.
- Fixed support for Windows 7, which should work in noavx2 and failsafe modes again. Also, SSE3 flags are now enabled for failsafe mode.
- Updated Kobold Lite, now uses placeholders for instruct tags that get swapped during generation.
- Tab navigation order improved in GUI launcher, though some elements like checkboxes still require mouse to toggle.
- Pulled other fixes and improvements from upstream.
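The automatic rope scale behavior described above can be pictured with a rough sketch. This is an illustrative simplification under a linear-interpolation assumption, not koboldcpp's exact formula, and the function name is hypothetical:

```python
def auto_rope_scale(n_ctx_train: int, context_size: int) -> float:
    """Illustrative linear rope scaling: when the requested context
    exceeds the model's training context, shrink the scale factor
    proportionally. (Simplified assumption, not the exact calculation
    koboldcpp performs.)"""
    if context_size <= n_ctx_train:
        # No stretching needed within the trained context window.
        return 1.0
    return n_ctx_train / context_size

# A model trained at 2048 context, launched with --contextsize 4096:
print(auto_rope_scale(2048, 4096))  # 0.5
# A model trained at 4096 context, launched at its native size:
print(auto_rope_scale(4096, 4096))  # 1.0
```

The point is only that models with different n_ctx_train values get different defaults for the same requested context size, which is why passing `--ropeconfig` explicitly overrides the heuristic.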
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once the model is loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
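As a quick sanity check against a running server, here is a minimal client sketch. The endpoint path and payload fields follow the KoboldAI API convention and are assumptions, not details taken from these notes:

```python
import json
from urllib import request

# Assumed KoboldAI-compatible generate endpoint served by koboldcpp.
API_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 80) -> dict:
    """Build a minimal generation request body."""
    return {"prompt": prompt, "max_length": max_length}

payload = build_payload("Once upon a time")
req = request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running koboldcpp instance:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```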
For more information, be sure to run the program from the command line with the `--help` flag.
Of Note:
- Reminder that HIPBLAS requires self compilation, and is not included by default in the prebuilt executables.
- Remember that token unbans can now be set via API (and Lite) in addition to the command line.