Added streaming functionality to the LLM. #35

Merged: 8 commits into master, Jan 31, 2025


kevincogan
Collaborator

Add Streaming Support to RAG Pipeline

Summary

This PR adds token-level streaming to the RAG pipeline so that responses can be displayed as they are generated. The new RagStreamHandler class manages real-time response streaming and is integrated into RagPipelineWrapper.

Key Changes

  • Streaming Handler:
    • Added RagStreamHandler for real-time token streaming with methods to start, stop, and yield response chunks.
  • Pipeline Integration:
    • Introduced stream_query in RagPipelineWrapper to handle streaming queries.
    • Integrated streaming callbacks into _add_llm.
    • Updated run to switch between streaming and non-streaming modes based on the stream setting.
  • Configuration:
    • Added STREAM_TIMEOUT to manage streaming timeout.
    • Enabled stream in settings.py:
      "stream": True
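
For readers unfamiliar with the pattern, here is a minimal sketch of how a handler of this kind could be structured, assuming a queue that is fed by an LLM token callback running in a background thread and drained by a generator. The class name SketchStreamHandler, its method names, the fake_llm producer, and the timeout value are all hypothetical; this is not the PR's actual RagStreamHandler code.

```python
import queue
import threading

STREAM_TIMEOUT = 5.0  # seconds; assumed value, the PR does not state the real one

_SENTINEL = object()  # marks the end of the stream


class SketchStreamHandler:
    """Hypothetical stand-in for RagStreamHandler (illustration only)."""

    def __init__(self, timeout=STREAM_TIMEOUT):
        self._queue = queue.Queue()
        self._timeout = timeout
        self._thread = None

    def on_token(self, token):
        # Callback the LLM would invoke for every generated token.
        self._queue.put(token)

    def start(self, producer):
        # Run the producer (e.g. the pipeline call) in a background thread.
        def _run():
            try:
                producer(self.on_token)
            finally:
                # Always signal completion, even if the producer fails.
                self._queue.put(_SENTINEL)

        self._thread = threading.Thread(target=_run, daemon=True)
        self._thread.start()

    def stop(self):
        # Wait (up to the timeout) for the producer thread to exit.
        if self._thread is not None:
            self._thread.join(timeout=self._timeout)
            self._thread = None

    def stream_chunks(self):
        # Yield tokens until the sentinel arrives or the queue stays
        # empty for longer than the timeout.
        while True:
            try:
                item = self._queue.get(timeout=self._timeout)
            except queue.Empty:
                break
            if item is _SENTINEL:
                break
            yield item


def fake_llm(callback):
    # Hypothetical producer that emits three tokens.
    for token in ["Hello", ", ", "world"]:
        callback(token)


handler = SketchStreamHandler(timeout=1.0)
handler.start(fake_llm)
chunks = list(handler.stream_chunks())
handler.stop()
print("".join(chunks))
```

The sentinel lets the consumer distinguish a finished stream from a stalled one; the timeout only covers the stalled case.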

Testing

  • Verified token-level streaming works correctly.
  • Confirmed compatibility with non-streaming mode.
  • Ensured proper thread and queue management.
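
The compatibility check between the two modes could be expressed as a small test like the sketch below. It assumes a run function that returns a generator when stream is true and a complete string otherwise; generate_tokens and this run signature are hypothetical stand-ins, not the PR's actual implementation.

```python
def generate_tokens(query):
    # Hypothetical stand-in for the LLM: yields the answer piece by piece.
    for word in ("echo:", query):
        yield word + " "


def run(query, stream=False):
    # Streaming mode hands back the generator; non-streaming mode
    # consumes it and returns the full text at once.
    if stream:
        return generate_tokens(query)
    return "".join(generate_tokens(query))


streamed = "".join(run("hi", stream=True))
full = run("hi", stream=False)
print(streamed == full)
```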

Impact

  • Improved Responsiveness: With streaming enabled, tokens are displayed as they are generated instead of after the full response completes.
  • No Regression: Non-streaming functionality remains intact.

How to Use

  1. Enable streaming in settings.py:
    "stream": True
  2. Use the stream_query method in RagPipelineWrapper to enable streaming queries.
  3. Handle and print streaming responses using the main.py script:
    result_generator = execute_rag_query(args.query, **custom_settings)
    for chunk in result_generator:
        decoded_chunk = chunk.decode("utf-8")
        print(decoded_chunk, end="", flush=True)
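
One caveat with decoding each chunk independently: if a multi-byte UTF-8 character were ever split across two chunks, chunk.decode("utf-8") would raise UnicodeDecodeError. Whether that can happen depends on how the handler produces its chunks, which is not shown here. A hedged sketch of a safer variant uses the standard library's incremental decoder; decode_stream is a hypothetical helper, not part of the PR.

```python
import codecs


def decode_stream(byte_chunks):
    # The incremental decoder buffers incomplete byte sequences
    # between calls instead of raising on them.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises if the stream ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = "café".encode("utf-8")  # the final character is two bytes long
pieces = list(decode_stream([data[:4], data[4:]]))  # split inside that character
print("".join(pieces))
```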

Notes for Reviewers

  • Review RagStreamHandler for robustness and edge-case handling.
  • Verify pipeline integration ensures consistent streaming behavior.

@kevincogan kevincogan self-assigned this Jan 24, 2025
@kevincogan kevincogan added the enhancement New feature or request label Jan 24, 2025
@kevincogan kevincogan linked an issue Jan 24, 2025 that may be closed by this pull request
Collaborator

@ilya-kolchinsky ilya-kolchinsky left a comment

Please address the remarks below.

pragmatic/main.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/settings.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
@kevincogan kevincogan removed the request for review from hemajv January 29, 2025 07:02
Collaborator

@ilya-kolchinsky ilya-kolchinsky left a comment

Almost done - please finish these last fixes and I will approve.

pragmatic/main.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
pragmatic/pipelines/rag.py (outdated, resolved)
test/sanity_test.py (outdated, resolved)
Collaborator

@ilya-kolchinsky ilya-kolchinsky left a comment

Approved

@ilya-kolchinsky ilya-kolchinsky merged commit fdfc83d into master Jan 31, 2025
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Pragmatic should stream the responses back in real time
2 participants