Allow stream listener to work on any type #8833
@@ -3,6 +3,7 @@
 from queue import Queue
 from typing import TYPE_CHECKING, Any

+import jiter
 from litellm import ModelResponseStream

 from dspy.adapters.chat_adapter import ChatAdapter
@@ -49,6 +50,8 @@ def __init__(
         self.cache_hit = False
         self.allow_reuse = allow_reuse

+        self.json_adapter_state = {"field_accumulated_tokens": ""}
+
         self.adapter_identifiers = {
             "ChatAdapter": {
                 "start_identifier": f"[[ ## {self.signature_field_name} ## ]]",
@@ -62,7 +65,7 @@ def __init__(
                 "end_identifier": re.compile(r"\w*\"(,|\s*})"),
                 "start_indicator": '"',
                 "end_pattern_prefixes": ['"', '",', '" ', '"}'],
-                "end_pattern_contains": None,
+                "end_pattern_contains": "}",
             },
             "XMLAdapter": {
                 "start_identifier": f"<{self.signature_field_name}>",
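As an aside, the JSONAdapter end pattern above can be exercised directly. The following is a hedged illustration (the field value text is assumed) of how the end_identifier regex recognizes the end of a string field, and why "}" is now listed under end_pattern_contains:

    import re

    end_identifier = re.compile(r"\w*\"(,|\s*})")

    print(bool(end_identifier.search('The answer is Paris",')))  # True: value closed, another key follows
    print(bool(end_identifier.search('The answer is Paris"}')))  # True: value closed, object ends
    print(bool(end_identifier.search('The answer is Par')))      # False: still mid-value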
@@ -126,6 +129,7 @@ def receive(self, chunk: ModelResponseStream):
                 self.cache_hit = False
                 self.field_start_queue = []
                 self.field_end_queue = Queue()
+                self.json_adapter_state["field_accumulated_tokens"] = ""
                 self.stream_start = False
             else:
                 return
@@ -147,7 +151,7 @@ def receive(self, chunk: ModelResponseStream):
                 is_last_chunk=self.stream_end,
             )

-            if chunk_message and start_identifier in chunk_message:
+            if chunk_message and start_identifier in chunk_message and not isinstance(settings.adapter, JSONAdapter):
                 # If the cache is hit, the chunk_message could be the full response. When it happens we can
                 # directly end the stream listening. In some models like gemini, each stream chunk can be multiple
                 # tokens, so it's possible that response only has one chunk, we also fall back to this logic.
@@ -180,10 +184,13 @@ def receive(self, chunk: ModelResponseStream):
                 # Keep the part after the start_identifier from the concat_message, we need to write it to the buffer.
                 value_start_index = concat_message.find(start_identifier) + len(start_identifier)
                 chunk_message = concat_message[value_start_index:].lstrip()
-                if isinstance(settings.adapter, JSONAdapter) and chunk_message.startswith('"'):
-                    # For JSONAdapter, we need to remove the leading ". We cannot do this with the start_identifier
-                    # because there could be a few splitters between ':' and '"', e.g., '"name": "value"'.
-                    chunk_message = chunk_message[1:]
+                if isinstance(settings.adapter, JSONAdapter):
+                    # For JSONAdapter, we rely on partial json parsing to detect the end of the field we are listening
+                    # to, so we need to maintain a few extra states to help us with that.
+                    # We add an extra "{" to the beginning of the field_accumulated_tokens, so we can detect the
+                    # appearance of the next key.
+                    self.json_adapter_state["field_accumulated_tokens"] += "{" + start_identifier

             elif self._buffered_message_end_with_start_identifier(concat_message.strip(), start_identifier):
                 # If the buffered message ends with part of the start_identifier, we keep looking for the
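The partial-JSON detection these new comments describe can be sketched with jiter directly. This is a hedged illustration with assumed inputs, not the PR's exact code: jiter's partial_mode tolerates truncated JSON, so once the parsed object contains a key other than the field being listened to, that field's value must be complete.

    import jiter

    listen_field = "answer"

    # Mid-stream: the value of "answer" is still being produced.
    parsed = jiter.from_json(b'{"answer": "Par', partial_mode="trailing-strings")
    assert parsed == {"answer": "Par"}

    # Later: a second key has appeared, so the "answer" value is finished.
    parsed = jiter.from_json(
        b'{"answer": "Paris", "confidence": "high"',
        partial_mode="trailing-strings",
    )
    assert any(key != listen_field for key in parsed)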
@@ -196,30 +203,98 @@ def receive(self, chunk: ModelResponseStream):

             if self.stream_start and chunk_message:
                 # The stream is started, we keep returning the token until we see the start of the next field.
-                token = None
                 self.field_end_queue.put(chunk_message)

+                token = None
                 concat_message = "".join(self.field_end_queue.queue).strip()
                 if re.search(end_identifier, concat_message):
                     # The next field is identified, we can end the stream and flush out all tokens in the buffer.
                     self.stream_end = True
                     token = self.flush()
                     token = token.rstrip()  # Remove the trailing \n\n
-                elif not self._could_form_end_identifier(concat_message, adapter_name):
+                if not self._could_form_end_identifier(concat_message, adapter_name):
                     # Buffer cannot form end identifier, safe to flush out the tokens in the buffer.
                     token = self.flush()
                 elif self.field_end_queue.qsize() > 10:
                     # Buffer could form end identifier, but we've exceeded max buffer size
                     # Yield the oldest token to prevent unbounded buffering
                     # We keep the last 10 tokens in the buffer if they can potentially form the end_identifier to avoid
                     # sending the DSPy bolilerplate tokens to users. 10 is a heuristic number that is sufficient to capture
Copilot AI · Nov 4, 2025 — suggested change:
-                    # sending the DSPy bolilerplate tokens to users. 10 is a heuristic number that is sufficient to capture
+                    # sending the DSPy boilerplate tokens to users. 10 is a heuristic number that is sufficient to capture

Copilot AI · Nov 4, 2025:
Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
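For readers following along, here is a minimal runnable sketch of the buffering policy in the hunk above. The names and the stand-in for _could_form_end_identifier are assumed for illustration; this is not DSPy's implementation:

    import re
    from queue import Queue

    END_IDENTIFIER = re.compile(r"\w*\"(,|\s*})")  # JSONAdapter end pattern from the diff
    MAX_BUFFERED_TOKENS = 10

    def could_form_end_identifier(text: str) -> bool:
        # Stand-in for StreamListener._could_form_end_identifier: a crude check
        # for whether the buffered text could still grow into the end pattern.
        return text.endswith('"') or "}" in text

    def receive(queue: "Queue[str]", chunk: str) -> str | None:
        queue.put(chunk)
        concat = "".join(queue.queue).strip()
        if END_IDENTIFIER.search(concat):
            # End of field detected: flush everything that is buffered.
            return "".join(queue.get() for _ in range(queue.qsize()))
        if not could_form_end_identifier(concat):
            # Buffer can never become the end identifier, safe to flush.
            return "".join(queue.get() for _ in range(queue.qsize()))
        if queue.qsize() > MAX_BUFFERED_TOKENS:
            # Bound the buffer: release the oldest token to the user.
            return queue.get()
        return None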
Review comment:
Why do we accumulate chunk_message instead of token?
Copilot AI · Nov 4, 2025:
Corrected spelling of "b racket" (with extra spaces) to "bracket".
Suggested change:
-                # When the accumulated tokens ends with a curly b racket, that means the streaming for the predict we are
+                # When the accumulated tokens ends with a curly bracket, that means the streaming for the predict we are

Review comment — suggested change:
-                # When the accumulated tokens ends with a curly b racket, that means the streaming for the predict we are
+                # When the accumulated tokens end with a curly bracket, that means the streaming for the prediction we are
Review comment:
So overall we will return a raw string chunk, so the deserialization needs to happen on the caller side?
Review comment:
Can't we limit this to be ValueError?
Review comment:
This is interesting, can't we just count the number of { and }?

Author reply:
Discussed offline, please see the new implementation for a more robust solution.
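For context, a hedged illustration (inputs assumed) of why raw brace counting can misfire where partial JSON parsing does not: braces inside string values are invisible to a JSON-aware parser but corrupt a naive count.

    import jiter

    # The "}" below is part of the string value, not the end of the object.
    buffer = '{"answer": "a closing brace } inside a value"'
    naive_balance = buffer.count("{") - buffer.count("}")
    assert naive_balance == 0  # wrongly suggests the object is complete

    parsed = jiter.from_json(buffer.encode(), partial_mode="trailing-strings")
    assert parsed == {"answer": "a closing brace } inside a value"}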
Review comment:
Is parsed.keys ordered based on the key order in the json string?
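jiter returns a plain Python dict, and Python dicts preserve insertion order, so keys should come back in document order; a quick check (API usage assumed):

    import jiter

    parsed = jiter.from_json(b'{"b": 1, "a": 2}')
    assert list(parsed.keys()) == ["b", "a"]  # document order is preserved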
Copilot AI · Nov 4, 2025:
'except' clause does nothing but pass and there is no explanatory comment.
Copilot AI · Nov 4, 2025:
Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
Review comment:
Shall we delete the is_streamable method of Type?
Review comment:
Do we plan to introduce other keys to self.json_adapter_state? Or can we flatten the structure?