@WhyPenguins commented Dec 21, 2025

Description

This draft PR adds support for downloading and running language models via SplashKit, using the llama.cpp library. Usage is as simple as:

write_line( generate_reply(QWEN3_0_6B_INSTRUCT, "What is the capital of Australia? Answer with one word.") );
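
For context, that one-liner would sit in a complete program something like the following (a minimal sketch, assuming the usual single splashkit.h include):

#include "splashkit.h"

int main()
{
    // On first use this downloads the model, then runs CPU inference.
    write_line(generate_reply(QWEN3_0_6B_INSTRUCT,
        "What is the capital of Australia? Answer with one word."));
    return 0;
}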

I'd be interested to know if this is on the right track, or if there are any changes that would make it more likely to be merged in. Thanks!

Details

Llama.cpp is used to perform inference for the language models - it has been added as a submodule to splashkit-external, and to CMakeLists.txt as an External Project rather than a subdirectory. This was done so that its settings could be configured independently of the main project (in particular, building it in Release mode, which is much faster).

On the API side there is an enum that contains a list of supported models (language_model), and an accompanying array that contain URLs, names, and default inference settings (models, in genai.cpp). At least for now I've built llama.cpp so that only CPU inferencing is supported, so the models are chosen such that they still run at acceptable speeds, and also download within a reasonable amount of time (500mb ~ 1.7gb).
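
To illustrate the shape of that registry, a sketch might look like this (field names and values here are illustrative assumptions, not the actual definitions in genai.cpp):

#include <string>

// Hypothetical sketch only - the real fields and values live in genai.cpp.
enum language_model
{
    QWEN3_0_6B_INSTRUCT,
    GEMMA3_1B_INSTRUCT,
    // ... base/instruct/thinking variants of each supported model
};

struct model_info
{
    std::string name;       // file name under ~/.splashkit/models/
    std::string url;        // where the GGUF file is fetched from
    int default_max_tokens; // default inference settings
};

// Indexed by the language_model enum. URLs here are placeholders.
static const model_info models[] =
{
    { "qwen3-0.6b-instruct.gguf", "https://example.com/qwen3.gguf",  512 },
    { "gemma3-1b-instruct.gguf",  "https://example.com/gemma3.gguf", 512 },
};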

When first used, a model is automatically downloaded if it doesn't already exist in ~/.splashkit/models/... - the download can be resumed if interrupted (see sk_http_get_file).
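
Conceptually the resume logic is just a byte-offset continuation: if a partial file exists, downloading continues from its current size. A minimal sketch of that idea using libcurl (not the actual sk_http_get_file code):

#include <curl/curl.h>
#include <cstdio>
#include <filesystem>
#include <string>

bool download_with_resume(const std::string &url, const std::string &path)
{
    // Resume from the size of any partial file left by an earlier attempt.
    curl_off_t offset = 0;
    if (std::filesystem::exists(path))
        offset = (curl_off_t) std::filesystem::file_size(path);

    FILE *out = std::fopen(path.c_str(), offset > 0 ? "ab" : "wb");
    if (!out) return false;

    CURL *curl = curl_easy_init();
    if (!curl) { std::fclose(out); return false; }

    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_RESUME_FROM_LARGE, offset); // byte offset to resume from
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);            // default callback fwrites the body

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    std::fclose(out);
    return res == CURLE_OK;
}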

The model is then loaded, the user's prompt is formatted (when in "reply" mode) and tokenized, and the generated text is collected and returned to the user. The backend (genai_backend.cpp/.h) abstracts this so that tokens can be fetched one at a time (which __generate_common in genai.cpp relies on).
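
In other words, __generate_common can be thought of as a loop over a token stream. A toy sketch of that contract (the names here are illustrative, not the real backend API):

#include <string>
#include <vector>

// Toy stand-in for the llama.cpp-backed stream in genai_backend.cpp.
struct token_stream
{
    std::vector<std::string> pieces; // pretend these were sampled one by one
    size_t pos = 0;
};

// Returns false once generation has finished.
bool next_token(token_stream &stream, std::string &out_text)
{
    if (stream.pos >= stream.pieces.size()) return false;
    out_text = stream.pieces[stream.pos++];
    return true;
}

// The shape of __generate_common: fetch one decoded token at a time
// and accumulate until the stream ends.
std::string generate_common(token_stream &stream)
{
    std::string result, piece;
    while (next_token(stream, piece))
        result += piece;
    return result;
}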

It's also possible to stream text back using conversation objects. These are created with create_conversation(...) and have functions for adding new messages and for receiving individual tokens plus information about them. test_genai.cpp shows the current usage - there are still some rough edges to fix up, but here's how it can look now:
[Screenshot: streamed conversation output from test_genai.cpp]
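
In code, the streaming loop could look roughly like this (only create_conversation is named above; the other function names are guesses at the in-progress API - see test_genai.cpp for the real current usage):

#include "splashkit.h"

int main()
{
    conversation chat = create_conversation(QWEN3_0_6B_INSTRUCT);
    add_message(chat, "What is the capital of Australia?"); // assumed name

    // Receive tokens one at a time as they are generated.
    while (has_next_token(chat))      // assumed name
        write(next_token_text(chat)); // assumed name
    write_line("");

    return 0;
}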

The supported models list also contains base, instruct, and thinking variants for each model (where released).

Basic usage looks like:

// Generates a reply to a prompt
string generate_reply(string prompt);
// Generates text that continues from existing text (similar to auto-complete)
string generate_text(string text);

These use the default Qwen3 0.6B Instruct model. Overloads allow the settings to be changed: one overload simply sets the model via the enum, and another exposes all of the settings. For example:

generate_reply("Hello!", option_max_tokens(option_language_model(GEMMA3_1B_INSTRUCT), 1000));

In the finished PR each option will be exposed, similar to drawing_options.
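
As a sketch of that pattern (mirroring the example above; these definitions are illustrative, not the final API):

// Illustrative only - the finished PR's option struct and helpers may differ.
enum language_model { QWEN3_0_6B_INSTRUCT, GEMMA3_1B_INSTRUCT /* , ... */ };

struct generation_options
{
    language_model model = QWEN3_0_6B_INSTRUCT; // default model
    int max_tokens = 512;                       // assumed default
};

// Start from defaults with a chosen model...
generation_options option_language_model(language_model model)
{
    generation_options opts;
    opts.model = model;
    return opts;
}

// ...then modify a copy, which is what makes the calls chainable.
generation_options option_max_tokens(generation_options opts, int max_tokens)
{
    opts.max_tokens = max_tokens;
    return opts;
}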


Hopefully that's generally on the right track - let me know if anything needs adjustment!

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation (update or new)

How Has This Been Tested?

So far testing has been a bit limited:

  • Tested downloading and running four of the supported models on Linux
  • Tested download errors when disconnected from the Internet
  • Tested errors when models are corrupted/incomplete
  • Tested a variety of prompts with generate_text and generate_reply to ensure the model variants are usable
  • Added a simple genai_test in sktest, though I plan to expand it further

I would like to test the PR on Windows as well and ensure all the models download and run.

Testing Checklist

  • Tested with sktest
  • Tested with skunit_tests

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have requested a review from ... on the Pull Request
