Releases: LostRuins/koboldcpp
koboldcpp-1.17
- Removed Cloudflare Insights - this was previously in Kobold Lite and was included in KoboldCpp. For disclosure: Cloudflare Insights is a GDPR-compliant tool that Kobold Lite previously used to provide information on browser and platform distribution (e.g. the ratio of desktop to mobile users) and browser type (Chrome, Firefox, etc.), to determine which browser platforms I have to support for Kobold Lite. You can read more about it here: https://www.cloudflare.com/insights/ It did not track any personal information, and did not relay any data you load, use, enter or access within Kobold. It was not intended to be included in KoboldCpp; I originally removed it but forgot to for subsequent versions. As of this version, it is removed from both Kobold Lite and KoboldCpp by request.
- Added Token Unbanning to the UI, allowing generation of the EOS token, which is required for newer Pygmalion models. You can trigger it with the --unbantokens flag (see the example launch below).
- Pulled upstream fixes.
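For example, a typical Windows launch with the flag might look like this (a minimal sketch; the model filename is a placeholder for your own file, passed the same way as in the drag-and-drop case):

  koboldcpp.exe --unbantokens pygmalion-model.ggml.bin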
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
koboldcpp-1.16
- Integrated the overhauled Token Samplers. The whole sampling system has been reworked for Top-P, Top-K and Rep Pen; all model architectures and types now use the same sampling functions. Also added 2 new samplers - Tail Free Sampling (TFS) and Typical Sampling. As I did not test the new implementations for correctness, please let me know if you are experiencing weird results (or degradations in the previously available samplers). An example API request exercising these samplers is sketched after this list.
- Integrated CLBlast support for the q5_0 and q5_1 formats. Note: the upstream llama.cpp repo has completely removed support for the q4_3 format. For now I still plan to keep q4_3 support available within KoboldCpp, but you are strongly advised not to use q4_3 anymore. Please switch or reconvert any q4_3 models if you can.
- Fixed a few edge cases with GPT2 models going OOM with small batch sizes.
- Fixed a regression where older GPT-J models (e.g. the original model from Alpin's Pyg.cpp fork) failed to load due to some upstream changes in the GGML library. You are strongly advised not to use outdated formats - reconvert if you can, it will be faster.
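Here is a rough sketch of a request that sets the reworked sampler values over the API (the endpoint and field names follow the standard KoboldAI generate API, so treat them as assumptions if they are not yet exposed in your version; all values are illustrative):

  curl -X POST http://localhost:5001/api/v1/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello,", "max_length": 32, "top_k": 40, "top_p": 0.9, "rep_pen": 1.1, "tfs": 0.97, "typical": 1.0}'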
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Disclaimer: This version has Cloudflare Insights in the Kobold Lite UI, which was subsequently removed in v1.17
koboldcpp-1.15
- Added a brand new "Easy Mode" GUI which triggers if no command line arguments are set. This is aimed to be a noob-friendly way to get into KoboldCpp, but for full functionality you are still advised to run it from the command line with customized arguments. You can skip it with any command line argument, or with the --skiplauncher flag, which does nothing else.
- Pulled the new quantization format support for q5_0 and q5_1 for llama.cpp from upstream. Also pulled the q5 changes for the GPT-2, GPT-J and GPT-NeoX formats. Note that these will not work with CLBlast yet, but OpenBLAS should work fine.
- Added a new flag, --debugmode, which shows the tokenized prompt being sent to the backend within the terminal window.
- Setting the --stream flag now automatically redirects the URL in the embedded Kobold Lite UI, so there is no need to type ?streaming=1 anymore (see the example launch after this list).
- Updated Kobold Lite, which now supports multiple custom stopping sequences; specify them in the UI separated with the ||$|| delimiter (e.g. stop1||$||stop2). Lite also now saves your custom stopping sequences into your save files and autosaves.
- Merged upstream fixes and improvements.
- Minor console fixes for Linux, and OSX compatibility.
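As an illustration (the model filename is a placeholder, and the flag combination is just an example):

  koboldcpp.exe --stream --debugmode mymodel.ggml.bin

With --stream set, the embedded Lite UI redirects itself, so browsing to plain http://localhost:5001 is enough; appending ?streaming=1 manually is no longer needed.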
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Disclaimer: This version has Cloudflare Insights in the Kobold Lite UI, which was subsequently removed in v1.17
koboldcpp-1.14
- Added backwards compatibility for an older version of NeoX with different quantizations
- Fixed a few scenarios where users may encounter OOM crashes
- Pulled upstream updates
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag
Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency
Run with your GPU using CLBlast, with the --useclblast flag for a speedup
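For instance (model filenames are placeholders; pick whichever flag applies to your setup):

  koboldcpp.exe --noavx2 mymodel.ggml.bin
  koboldcpp.exe --smartcontext mymodel.ggml.bin
  koboldcpp.exe --useclblast 0 0 mymodel.ggml.bin

The 0 0 after --useclblast are example platform and device indices; the right values vary per machine.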
Disclaimer: This version has Cloudflare Insights in the Kobold Lite UI, which was subsequently removed in v1.17
koboldcpp-1.13.1
- A multithreading bug fix has allowed CLBlast to greatly increase prompt processing speed. It should now be up to 50% faster than before, and just slightly slower than CuBLAS alternatives. Because of this, we probably will no longer need to integrate CuBLAS.
- Merged the q4_2 and q4_3 CLBlast dequantization kernels, allowing them to be used with CLBlast.
- Added a new flag, --unbantokens. Normally, KoboldAI prevents certain tokens such as EOS and square brackets from being generated; this flag unbans them.
- Edit: Fixed compile errors, made mmap automatic when a LoRA is selected, added updated quantizers, and added quantization handling for GPT-NeoX, GPT-2 and GPT-J.
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag
Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency
Run with your GPU using CLBlast, with the --useclblast flag for a speedup
Disclaimer: This version has Cloudflare Insights in the Kobold Lite UI, which was subsequently removed in v1.17
koboldcpp-1.12
This is a bugfix release
- Fixed a few more scenarios where GPT2/GPTJ/GPTNeoX would go out of memory when using BLAS. Also, the max BLAS batch size for non-llama models is currently capped at 256.
- Minor CLBlast optimizations should slightly increase speed
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag
Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency
Run with your GPU using CLBlast, with the --useclblast flag for a speedup
Disclaimer: This version has Cloudflare Insights in the Kobold Lite UI, which was subsequently removed in v1.17
koboldcpp-1.11
- Now has GPT-NeoX / Pythia / StableLM support!
- Try my special model, Pythia-70m-ChatSalad here: https://huggingface.co/concedo/pythia-70m-chatsalad-ggml/tree/main
- Added upstream LoRA file support for llama; use the --lora parameter.
- Added limited fast-forwarding capabilities for RWKV; context can be reused if it is completely unmodified.
- Kobold Lite now supports using an additional custom stopping sequence, edit it in the Memory panel.
- Updated Kobold Lite, and pulled llama improvements from upstream.
- Improved OSX and Linux build support - it now automatically builds all libraries with the requested flags, and you can select which ones to use at runtime. Example: run make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 and it will build both the OpenBLAS and CLBlast libraries on your platform; you then select CLBlast with --useclblast at runtime. A build-and-run sketch is shown below.
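A minimal build-and-run sequence on Linux or OSX might look like this (the clone URL is inferred from the repo name, the model filename is a placeholder, 0 0 are example CLBlast platform/device indices, and koboldcpp.py is assumed to be the entry script):

  git clone https://github.com/LostRuins/koboldcpp
  cd koboldcpp
  make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1
  python koboldcpp.py --useclblast 0 0 mymodel.ggml.bin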
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag
Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency
Run with your GPU using CLBlast, with the --useclblast flag for a speedup
Disclaimer: This version has Cloudflare Insights in the Kobold Lite UI, which was subsequently removed in v1.17
koboldcpp-1.10
- Now has RWKV support without needing pytorch or tokenizers or other external libraries!
- Try RWKV-v4-169m here: https://huggingface.co/concedo/rwkv-v4-169m-ggml/tree/main
- Now allows launching the browser directly with the --launch parameter. You can also do something like --stream --launch.
- Updated Kobold Lite, and pulled llama improvements from upstream.
- The API now reports the KoboldCpp version number via a new endpoint, /api/extra/version (see the example query below).
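Once the server is running, the new endpoint can be queried from the command line like this (the response format is not documented here, so treat the output as illustrative):

  curl http://localhost:5001/api/extra/version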
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag
Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency
Run with your GPU using CLBlast, with the --useclblast flag for a speedup
koboldcpp-1.9
This was such a good update that I had to make a new version, so there are 2 new releases today.
- Now has support for stopping sequences fully implemented in the API! They've been implemented in a similar and compatible way to my United PR one-some/KoboldAI-united#5, and they should shortly be usable in online Lite as well as (eventually) the main Kobold client when it gets merged. What this means is that the AI can now finish a response early even if not all the response tokens are consumed, and save time by sending the reply instead of generating excess unneeded tokens. This automatically integrates into the latest version of Kobold Lite, which sets the correct stop sequences for Chat and Instruct mode and is also updated here. An example API request is sketched after this list.
- GPT-J and GPT2 models now support BLAS mode! They will use a smaller batch size than llama models, but the effect should still be very noticeably faster!
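Here is a rough sketch of passing a stop sequence through the API (the stop_sequence field name follows the KoboldAI United API that this feature mirrors, so treat it as an assumption for your exact version; the prompt and values are illustrative):

  curl -X POST http://localhost:5001/api/v1/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "You: Hi there!\nBot:", "max_length": 80, "stop_sequence": ["You:"]}'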
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag
Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency
Run with your GPU using CLBlast, with the --useclblast flag for a speedup! (Credits to Occam)
koboldcpp-1.8.1
- Another amazing improvement by @0cc4m: CLBlast now does the 4-bit dequantization on the GPU! That translates to about a 20% speed increase when using CLBlast for me, and should be a very welcome improvement. To use it, run with --useclblast [platform_id] [device_id] (you may have to figure out the values for your GPU through trial and error; see the example below).
- Merged fixes and optimizations from upstream
- Fixed a compile error in OSX
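For example (the model filename is a placeholder, and 0 0 are just the first indices to try; your GPU may sit on a different OpenCL platform or device, so other small combinations such as 1 0 or 0 1 may be needed):

  koboldcpp.exe --useclblast 0 0 mymodel.ggml.bin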
To use, download and run koboldcpp.exe, which is a one-file pyinstaller. Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program with the --help flag.
Alternative Options:
Non-AVX2 version now included in the same .exe file, enable with the --noavx2 flag