-
Notifications
You must be signed in to change notification settings - Fork 238
Add AUGMENT stage to request pipeline #815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recording some thoughts/questions about the code, let me know what you think.
I can start making some changes to simplify this patch soon, can I push them to your branch?
asynchronous. | ||
|
||
Returns the transformed info at the end." | ||
(if (null fns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes the augmentors to run sequentially -- not an issue for synchronous augmentors, but is unnecessarily slow if you have async ones.
I thought about why you might have designed it this way instead of using a counting callback and running through gptel-augment-handler-functions
with run-hook-with-args
-- the only reason I can think of is that alterations to fsm
might be noncommutative, and that the ordering of gptel-augment-handler-functions
might not be respected when there are async callbacks involved. Is this something you foresee as a problem?
Otherwise we can switch to a counting callback, which can shorten the augment wait considerably.
#'(lambda (fsm) (gptel--augment-info fsm (cdr fns))) | ||
fsm))) | ||
|
||
(defun gptel--handle-augment (fsm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we switch to a counting callback in gptel--augment-info
(see comment), we don't need a (recursive) gptel--augment-info
or gptel--finish-augmentation
any more, and gptel--handle-augment
can handle the whole augment process, localizing all the augment code.
@@ -2379,53 +2471,83 @@ be used to rerun or continue the request at a later time." | |||
gptel-backend (and gptel--num-messages-to-send | |||
(* 2 gptel--num-messages-to-send)))))) | |||
((consp prompt) (gptel--parse-list gptel-backend prompt))))) | |||
(info (list :data (gptel--request-data gptel-backend full-prompt) | |||
(info (list :data (list :args t | |||
:full-prompt full-prompt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem here is that full-prompt
is not a backend-agnostic format -- it can be of a different type/schema for each backend. This makes modifying the prompt in an augmentor difficult.
This can be papered over somewhat with gptel--inject-prompt
, which can act as a backend-agnostic interface. Even if I add its counterpart gptel--retrieve-prompt
and make these setf
able, it's not a great interface.
My original idea was that the copied buffer itself is the backend-agnostic prompt list, and another intermediate, universal format for prompts between it and the final messages array is unnecessary. (It can also produce a lot of garbage.)
Now we need a new backend-agnostic prompts list format. Note that it can include many roles: user
, llm
/assistant
, tool-call
, tool-result
and possibly more in the future.
I still think using the buffer can work because it's a simple data structure and Emacs is great at modifying it. Adding text to the top and bottom of the buffer (corresponding to before the first messages array and after the last) is very simple, as is deleting or adding text anywhere in the middle. This doesn't have to be done via gptel-prompt-filter-hook
-- it can be part of the augment process.
(setf (gptel-fsm-info fsm) info)) | ||
(unless dry-run (gptel--fsm-transition fsm)) ;INIT -> WAIT | ||
fsm) | ||
|
||
(defun gptel--realize-info (info) | ||
(let ((data (plist-get info :data))) | ||
(if (not (plist-member data :args)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of :args
here?
@@ -2379,53 +2471,83 @@ be used to rerun or continue the request at a later time." | |||
gptel-backend (and gptel--num-messages-to-send | |||
(* 2 gptel--num-messages-to-send)))))) | |||
((consp prompt) (gptel--parse-list gptel-backend prompt))))) | |||
(info (list :data (gptel--request-data gptel-backend full-prompt) | |||
(info (list :data (list :args t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me yet how to avoid this step of storing and rehydrating this state in :data
. I'll continue to think about this. But at minimum I think capturing many of these keys is unnecessary since you pass info
(actually the whole fsm
) to the augmentors, and they are available as :keys
inside info
in the current implementation on master.
:context
: Accessible insideinfo
as:context
anyway.:gptel-include-reasoning
: Available as:include-reasoning
in:info
.:callback
: Do we expect the augmentors to change the callback?:in-place
: Available as:in-place
in:info
.
@@ -2829,7 +2951,13 @@ the response is inserted into the current buffer after point." | |||
(kill-buffer buf))) | |||
nil t nil))) | |||
;; TODO: Add transformer here. | |||
(setf (alist-get proc-buf gptel--request-alist) fsm)))) | |||
(setf (alist-get proc-buf gptel--request-alist) | |||
(cons fsm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to capture proc-buf
into the closure, since this information is found in gptel-abort
anyway and we can pass it as an argument to the abort-callback. This should make the callbacks "static" functions that can be native-compiled.
I will read through your changes in more detail tomorrow; but in the meantime, please push whatever changes you wish! |
This PR lays the groundwork for implementation many forms of RAG in gptel, including: