AgenticVQA

AgenticVQA is a Visual Question Answering (VQA) system that leverages LLMs and multimodal models to answer questions about images, with support for audio transcription and multi-agent workflows.

Installation

Prerequisites

  • Python 3.12+
  • pip
  • (Optional) ffmpeg (for audio conversion)

Setup

  1. Clone the repository:
    git clone https://github.com/KaramSahoo/AgenticVQA
    cd AgenticVQA
  2. Install dependencies:
    pip install -r requirements.txt
    If you use audio features, install ffmpeg:
    # Windows
    choco install ffmpeg
    # MacOS
    brew install ffmpeg
    # Linux
    sudo apt-get install ffmpeg
  3. Configure API keys for OpenAI, Anthropic, LangSmith, etc. as environment variables or in your code.
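For example, keys can be exported as environment variables before starting the app. The variable names below are assumptions based on the providers mentioned; check the code for the exact names it reads:

```shell
# Assumed variable names -- verify against what the application actually reads
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export LANGCHAIN_API_KEY="..."   # LangSmith tracing, if enabled
```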

Folder Structure

AgenticVQA/
├── answer_log.csv                # Log of answers generated by the system
├── app.py                        # Main application entry point
├── agents/                       # Core agent logic for VQA and evaluation
│   ├── florence_agent.py         # Florence VQA agent implementation
│   ├── query_evaluator.py        # Evaluates queries and answers
│   └── write_answer.py           # VLM agent that analyzes the image and audio and answers the user query
├── blueprints/                   # Flask Blueprints for managing routing logic
│   └── generate.py               # API Endpoints for performing VQA and using OD/OCR tools
├── config/                       # Configuration files for different environments
│   ├── __init__.py               # Config package init
│   ├── development_config.py     # Development settings
│   └── production_config.py      # Production settings
├── helper/                       # Helper utilities (audio, etc.)
│   └── audio.py                  # Audio file conversion and transcription
├── prompts/                      # Prompt templates for LLMs
│   ├── system_message.py         # System prompt templates
│   └── user_prompts.py           # User prompt templates
├── tools/                        # Vision and OCR tools
│   ├── ocr.py                    # OCR utility functions
│   └── od.py                     # Object detection utility functions
├── utils/                        # Utility functions and schemas
│   ├── logger.py                 # Logging functionality
│   └── schemas.py                # Structured Output schemas
├── workflows/                    # Multi-Agentic Workflow scripts for VQA using LangGraph
│   └── vqa_workflow.py           # Main VQA workflow logic
└── requirements.txt              # Python dependencies

Explanation of Key Folders and Files

app.py

Main entry point for running the AgenticVQA application. Handles initialization and routing.

agents/

Contains agent logic for answering VQA queries and evaluating responses.

  • florence_agent.py: Implements the Florence VQA agent.
  • query_evaluator.py: Evaluates the quality and correctness of answers.
  • write_answer.py: VLM agent that analyzes the image and audio and generates an answer for the user query.
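As a rough illustration of the evaluator's role, the sketch below checks whether an answer plausibly addresses a query. The real query_evaluator.py likely delegates this judgment to an LLM; the heuristic and function name here are hypothetical.

```python
import re

def _content_words(text: str) -> set[str]:
    """Lowercased alphabetic tokens of three or more letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3}

def evaluate_answer(query: str, answer: str) -> dict:
    """Return a verdict on whether the answer plausibly addresses the query."""
    if not answer.strip():
        return {"verdict": "retry", "reason": "empty answer"}
    # Naive relevance check: does the answer share content words with the query?
    overlap = _content_words(query) & _content_words(answer)
    verdict = "accept" if overlap or not _content_words(query) else "retry"
    return {"verdict": verdict, "reason": f"{len(overlap)} shared terms"}
```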

blueprints/

Flask Blueprints for managing routing logic and API endpoints.

  • generate.py: API Endpoints for performing VQA and using OD/OCR tools
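A minimal sketch of how such a blueprint might expose a VQA endpoint. The route path, payload shape, and function names are assumptions, not the actual API of generate.py:

```python
from flask import Blueprint, Flask, jsonify, request

generate_bp = Blueprint("generate", __name__)

@generate_bp.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json(force=True)
    # In the real app, this would invoke the VQA workflow on the image + query.
    return jsonify({"query": payload.get("query", ""), "answer": "(stub)"})

def create_app() -> Flask:
    app = Flask(__name__)
    app.register_blueprint(generate_bp)
    return app
```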

config/

Configuration files for different environments.

  • development_config.py: Settings for development.
  • production_config.py: Settings for production.
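Flask apps commonly select between environments with class-based configs loaded via `app.config.from_object(...)`. A plausible shape, with illustrative attribute names not taken from the actual files:

```python
# Illustrative config classes -- attribute names are assumptions.
class BaseConfig:
    DEBUG = False
    TESTING = False

class DevelopmentConfig(BaseConfig):
    DEBUG = True

class ProductionConfig(BaseConfig):
    pass
```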

helper/

Helper utilities for audio and other tasks.

  • audio.py: Functions for audio file conversion and speech-to-text transcription.
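Audio conversion with ffmpeg can be driven through subprocess; the helper names below are hypothetical, and the transcription step itself (e.g. a speech-to-text model) is omitted:

```python
import subprocess

def ffmpeg_convert_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command converting src to mono audio at sample_rate Hz."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]

def convert_audio(src: str, dst: str) -> None:
    """Run the conversion; requires ffmpeg on PATH (see Prerequisites)."""
    subprocess.run(ffmpeg_convert_cmd(src, dst), check=True)
```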

prompts/

Prompt templates for LLMs.

  • system_message.py: System-level prompt templates.
  • user_prompts.py: User-level prompt templates.
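Prompt modules of this kind typically pair constant templates with small builder functions. The wording below is illustrative only; the actual templates in prompts/ will differ:

```python
# Illustrative templates -- not the actual wording used in prompts/.
SYSTEM_MESSAGE = (
    "You are a visual question answering assistant. "
    "Use the image description and tool outputs to answer concisely."
)

USER_PROMPT = "Question: {question}\nImage context: {context}\nAnswer:"

def build_user_prompt(question: str, context: str) -> str:
    """Fill the user template with the query and any gathered image context."""
    return USER_PROMPT.format(question=question, context=context)
```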

tools/

Vision and OCR tools.

  • ocr.py: Optical Character Recognition utility.
  • od.py: Object Detection utility.
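One common way to expose such tools to an agent is a name-to-function registry the workflow can dispatch on. This is a hypothetical sketch with stubbed bodies; the real ocr.py and od.py wrap actual vision models:

```python
# Hypothetical tool registry -- the stubs stand in for real model calls.
TOOLS = {}

def register_tool(name):
    """Decorator that records a tool under the given name."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("ocr")
def run_ocr(image_path: str) -> str:
    return "(stub: text read from image)"

@register_tool("od")
def run_od(image_path: str) -> list[str]:
    return ["(stub: detected object)"]
```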

utils/

General utility functions and data schemas.

  • logger.py: Logging setup and helpers.
  • schemas.py: Data validation schemas.
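Given the answer_log.csv at the repository root, answer logging plausibly appends rows like the sketch below. The column names and function name are assumptions:

```python
import csv
import datetime
import os

def log_answer(path: str, query: str, answer: str) -> None:
    """Append one answered query to a CSV log, writing a header on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "query", "answer"])
        writer.writerow([datetime.datetime.now().isoformat(), query, answer])
```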

workflows/

Workflow scripts for orchestrating VQA tasks.

  • vqa_workflow.py: Main workflow for VQA pipeline.
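The real vqa_workflow.py orchestrates this with LangGraph; the dependency-free sketch below only illustrates the control flow such a graph encodes: generate an answer, evaluate it, and retry until it is accepted or attempts run out:

```python
def run_vqa_workflow(query: str, answer_fn, evaluate_fn, max_retries: int = 2) -> dict:
    """Drive answer/evaluate nodes until the evaluator accepts or retries run out."""
    state = {"query": query, "answer": None, "attempts": 0}
    while state["attempts"] <= max_retries:
        state["answer"] = answer_fn(state["query"])
        state["attempts"] += 1
        if evaluate_fn(state["query"], state["answer"]) == "accept":
            break
    return state
```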

requirements.txt

Lists all Python dependencies required for the project.

Usage

  1. Prepare your images and audio files in the workspace.
  2. Run app.py or use the provided notebooks in evaluations/ to start VQA tasks.
  3. Use the helper scripts for audio transcription and database management as needed.
  4. Customize prompts and workflows for your specific use case.

Contributing

Feel free to open issues or submit pull requests for improvements or new features.
