Skip to content

Add AUGMENT stage to request pipeline #815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

jwiegley
Copy link

This PR lays the groundwork for implementation many forms of RAG in gptel, including:

  • Document inclusion, as selected by vector similarity
  • "Agents" that augment the context when referenced
  • "Dynamic context" that modifies the context is response to inputs or conditions

Copy link
Owner

@karthink karthink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recording some thoughts/questions about the code, let me know what you think.

I can start making some changes to simplify this patch soon, can I push them to your branch?

asynchronous.

Returns the transformed info at the end."
(if (null fns)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes the augmentors to run sequentially -- not an issue for synchronous augmentors, but is unnecessarily slow if you have async ones.

I thought about why you might have designed it this way instead of using a counting callback and running through gptel-augment-handler-functions with run-hook-with-args -- the only reason I can think of is that alterations to fsm might be noncommutative, and that the ordering of gptel-augment-handler-functions might not be respected when there are async callbacks involved. Is this something you foresee as a problem?

Otherwise we can switch to a counting callback, which can shorten the augment wait considerably.

#'(lambda (fsm) (gptel--augment-info fsm (cdr fns)))
fsm)))

(defun gptel--handle-augment (fsm)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we switch to a counting callback in gptel--augment-info (see comment), we don't need a (recursive) gptel--augment-info or gptel--finish-augmentation any more, and gptel--handle-augment can handle the whole augment process, localizing all the augment code.

@@ -2379,53 +2471,83 @@ be used to rerun or continue the request at a later time."
gptel-backend (and gptel--num-messages-to-send
(* 2 gptel--num-messages-to-send))))))
((consp prompt) (gptel--parse-list gptel-backend prompt)))))
(info (list :data (gptel--request-data gptel-backend full-prompt)
(info (list :data (list :args t
:full-prompt full-prompt
Copy link
Owner

@karthink karthink May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem here is that full-prompt is not a backend-agnostic format -- it can be of a different type/schema for each backend. This makes modifying the prompt in an augmentor difficult.

This can be papered over somewhat with gptel--inject-prompt, which can act as a backend-agnostic interface. Even if I add its counterpart gptel--retrieve-prompt and make these setfable, it's not a great interface.

My original idea was that the copied buffer itself is the backend-agnostic prompt list, and another intermediate, universal format for prompts between it and the final messages array is unnecessary. (It can also produce a lot of garbage.)

Now we need a new backend-agnostic prompts list format. Note that it can include many roles: user, llm/assistant, tool-call, tool-result and possibly more in the future.

I still think using the buffer can work because it's a simple data structure and Emacs is great at modifying it. Adding text to the top and bottom of the buffer (corresponding to before the first messages array and after the last) is very simple, as is deleting or adding text anywhere in the middle. This doesn't have to be done via gptel-prompt-filter-hook -- it can be part of the augment process.

(setf (gptel-fsm-info fsm) info))
(unless dry-run (gptel--fsm-transition fsm)) ;INIT -> WAIT
fsm)

(defun gptel--realize-info (info)
(let ((data (plist-get info :data)))
(if (not (plist-member data :args))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of :args here?

@@ -2379,53 +2471,83 @@ be used to rerun or continue the request at a later time."
gptel-backend (and gptel--num-messages-to-send
(* 2 gptel--num-messages-to-send))))))
((consp prompt) (gptel--parse-list gptel-backend prompt)))))
(info (list :data (gptel--request-data gptel-backend full-prompt)
(info (list :data (list :args t
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me yet how to avoid this step of storing and rehydrating this state in :data. I'll continue to think about this. But at minimum I think capturing many of these keys is unnecessary since you pass info (actually the whole fsm) to the augmentors, and they are available as :keys inside info in the current implementation on master.

  • :context: Accessible inside info as :context anyway.
  • :gptel-include-reasoning: Available as :include-reasoning in :info.
  • :callback: Do we expect the augmentors to change the callback?
  • :in-place: Available as :in-place in :info.

@@ -2829,7 +2951,13 @@ the response is inserted into the current buffer after point."
(kill-buffer buf)))
nil t nil)))
;; TODO: Add transformer here.
(setf (alist-get proc-buf gptel--request-alist) fsm))))
(setf (alist-get proc-buf gptel--request-alist)
(cons fsm
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to capture proc-buf into the closure, since this information is found in gptel-abort anyway and we can pass it as an argument to the abort-callback. This should make the callbacks "static" functions that can be native-compiled.

@jwiegley
Copy link
Author

jwiegley commented May 2, 2025

Recording some thoughts/questions about the code, let me know what you think.

I can start making some changes to simplify this patch soon, can I push them to your branch?

I will read through your changes in more detail tomorrow; but in the meantime, please push whatever changes you wish!

@jwiegley jwiegley marked this pull request as ready for review May 2, 2025 07:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants