The stateless design of the OAI v1 chat endpoints does not play very well with the way llama.cpp works. The current context cache is a nice workaround to make the default case of turn-by-turn chat fast, but a better solution would be to offer a way to address chat sessions explicitly and to add the concept of a (stateful, persistable) thread.
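To make the contrast concrete, here's a rough sketch (illustration only; the base URL, model name and thread routes are made up, not existing endpoints): with the stateless OAI-style endpoint the client resends the whole message array every turn and the server has to figure out which cached context, if any, still matches, while a thread endpoint would let the client address the session explicitly.

```ts
// Illustration only -- hypothetical URLs and payloads.
const base = "http://localhost:3000"; // placeholder base URL

// Stateless OAI-style chat: the full history goes over the wire on every turn.
const history = [
  { role: "user", content: "Hello" },
  { role: "assistant", content: "Hi! How can I help?" },
  { role: "user", content: "Summarize our chat." },
];
await fetch(`${base}/v1/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "some-model", messages: history }),
});

// Hypothetical stateful alternative: the session is addressed explicitly,
// so the server knows exactly which llama.cpp context to keep and reuse.
const thread = await (await fetch(`${base}/threads`, { method: "POST" })).json();
await fetch(`${base}/threads/${thread.id}/messages`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ role: "user", content: "Summarize our chat." }),
});
```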
Technically it would be possible to improve the context cache for stateless requests as well (i.e., remember the last N states, allow context to be reused even when the message arrays differ more substantially, and support editing of messages without having to re-ingest everything). I'm afraid, though, that this would add a lot of complexity while still being suboptimal in some cases. That's why I don't plan to prioritize the stateless chat context cache for now. If we are going to add complexity, it seems more valuable to put it into thread state and into a "native" HTTP API that exposes functionality in a way that is modelled more closely after how the internals work, while keeping that API as simple and transparent as possible and exposing as much functionality as we can.
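What I mean by "remember the last N states", as a minimal sketch (not existing code): keep several cached token sequences around and, for a new stateless request, pick the one sharing the longest token prefix, so that only the non-matching suffix has to be re-ingested.

```ts
// Sketch only: cached contexts identified by the token sequence they contain.
interface CachedState {
  id: string;
  tokens: number[]; // tokens already evaluated into this context
}

function commonPrefixLength(a: number[], b: number[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Pick the cached state that lets us skip the most work for `promptTokens`.
// Everything after `reuseLength` still has to be evaluated from scratch.
function pickBestState(
  cache: CachedState[],
  promptTokens: number[]
): { state: CachedState | null; reuseLength: number } {
  let best: CachedState | null = null;
  let bestLen = 0;
  for (const state of cache) {
    const len = commonPrefixLength(state.tokens, promptTokens);
    if (len > bestLen) {
      best = state;
      bestLen = len;
    }
  }
  return { state: best, reuseLength: bestLen };
}
```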
What I had in mind for such an API:
- Allow creation of all task types and try to keep inputs as close as possible to their respective interfaces in node
- Support monitoring tasks and cancelling them
- Include both stateful (`/threads`?) and stateless (`/tasks/chat`?) chat endpoints (Some drafts)
- Package an in-memory thread store that can be swapped out for a persistent one (see the sketch after this list)
- Allow mutation of these threads (make requests for generation of new assistant messages explicit)
- The instance locking process (and the existence of model instances, or engines) should probably not be part of this API. I can't yet think of a way it would be useful, other than including the instance id in responses for debugging.
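To make this a bit more tangible, here's a rough TypeScript sketch of what the thread store and route layout could look like. All names, routes and types are placeholders, not a committed design:

```ts
import { randomUUID } from "node:crypto";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Thread {
  id: string;
  messages: ChatMessage[];
}

// The store sits behind an interface so the packaged in-memory default
// can be swapped out for a persistent implementation.
interface ThreadStore {
  create(): Promise<Thread>;
  get(id: string): Promise<Thread | undefined>;
  appendMessage(id: string, message: ChatMessage): Promise<void>;
  delete(id: string): Promise<void>;
}

class InMemoryThreadStore implements ThreadStore {
  private threads = new Map<string, Thread>();

  async create(): Promise<Thread> {
    const thread: Thread = { id: randomUUID(), messages: [] };
    this.threads.set(thread.id, thread);
    return thread;
  }
  async get(id: string): Promise<Thread | undefined> {
    return this.threads.get(id);
  }
  async appendMessage(id: string, message: ChatMessage): Promise<void> {
    this.threads.get(id)?.messages.push(message);
  }
  async delete(id: string): Promise<void> {
    this.threads.delete(id);
  }
}

// Possible route layout (again, just a draft):
//   POST   /threads                  create a thread
//   POST   /threads/:id/messages     append a message (no generation triggered)
//   POST   /threads/:id/assistant    explicitly request a new assistant message
//   POST   /tasks/chat               stateless one-off chat task
//   GET    /tasks/:id                monitor a running task
//   DELETE /tasks/:id                cancel it
```

Keeping the generation request (`POST /threads/:id/assistant` in this draft) separate from appending messages is what I mean by making mutation explicit: adding a user message alone never kicks off inference.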
Everything is up for debate. I have no time to work on this yet, but I thought I should post early about the rough direction I'd like it to take.
Contributions welcome :)