AgenticVQA is a Visual Question Answering (VQA) system that leverages LLMs and multimodal models to answer questions about images, with support for audio transcription and multi-agent workflows.
## Prerequisites

- Python 3.12+
- pip
- (Optional) ffmpeg, for audio conversion

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/KaramSahoo/AgenticVQA
   cd AgenticVQA
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. If you use audio features, install ffmpeg:

   ```bash
   # Windows
   choco install ffmpeg

   # macOS
   brew install ffmpeg

   # Linux
   sudo apt-get install ffmpeg
   ```

4. Configure API keys for OpenAI, Anthropic, LangSmith, etc. as environment variables or in your code.
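The exact variables depend on which providers you enable; the names below are the standard ones each SDK reads by default (adjust to your setup):

```shell
# Standard variable names for each SDK; adjust to the providers you actually use.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Optional: LangSmith tracing
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."
```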
## Project Structure

```
AgenticVQA/
├── answer_log.csv            # Log of answers generated by the system
├── app.py                    # Main application entry point
├── agents/                   # Core agent logic for VQA and evaluation
│   ├── florence_agent.py     # Florence VQA agent implementation
│   ├── query_evaluator.py    # Evaluates queries and answers
│   └── write_answer.py       # VLM agent that analyzes the image and audio and answers the user query
├── blueprints/               # Flask Blueprints for managing routing logic
│   └── generate.py           # API endpoints for performing VQA and using OD/OCR tools
├── config/                   # Configuration files for different environments
│   ├── __init__.py           # Config package init
│   ├── development_config.py # Development settings
│   └── production_config.py  # Production settings
├── helper/                   # Helper utilities (audio, etc.)
│   └── audio.py              # Audio file conversion and transcription
├── prompts/                  # Prompt templates for LLMs
│   ├── system_message.py     # System prompt templates
│   └── user_prompts.py       # User prompt templates
├── tools/                    # Vision and OCR tools
│   ├── ocr.py                # OCR utility functions
│   └── od.py                 # Object detection utility functions
├── utils/                    # Utility functions and schemas
│   ├── logger.py             # Logging functionality
│   └── schemas.py            # Structured-output schemas
├── workflows/                # Multi-agent VQA workflows built with LangGraph
│   └── vqa_workflow.py       # Main VQA workflow logic
└── requirements.txt          # Python dependencies
```
### `app.py`

Main entry point for running the AgenticVQA application. Handles initialization and routing.

### `agents/`

Core agent logic for answering VQA queries and evaluating responses.

- `florence_agent.py`: Implements the Florence VQA agent.
- `query_evaluator.py`: Evaluates the quality and correctness of answers.
- `write_answer.py`: VLM agent that analyzes the image and audio and generates an answer for the user query.
### `blueprints/`

Flask Blueprints for managing routing logic and API endpoints.

- `generate.py`: API endpoints for performing VQA and invoking the OD/OCR tools.
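As a rough sketch of how such an endpoint might be wired (route path, parameter names, and the commented-out workflow call are assumptions, not the actual `generate.py` API):

```python
from flask import Blueprint, jsonify, request

# Hypothetical minimal blueprint; the real generate.py delegates to the
# agents, tools, and workflow modules described above.
generate_bp = Blueprint("generate", __name__)

@generate_bp.route("/vqa", methods=["POST"])
def vqa():
    payload = request.get_json(force=True)
    image_path = payload.get("image")   # path or URL of the image to analyze
    question = payload.get("question")  # natural-language query
    if not image_path or not question:
        return jsonify({"error": "image and question are required"}), 400
    # answer = run_vqa_workflow(image_path, question)  # would call workflows/vqa_workflow.py
    return jsonify({"image": image_path, "question": question, "answer": "..."})
```

Registering the blueprint on the Flask app (`app.register_blueprint(generate_bp)`) exposes the route; the stubbed response stands in for the workflow call.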
### `config/`

Configuration files for different environments.

- `development_config.py`: Settings for development.
- `production_config.py`: Settings for production.
### `helper/`

Helper utilities for audio and other tasks.

- `audio.py`: Functions for audio file conversion and speech-to-text transcription.
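Conversion helpers of this kind typically shell out to ffmpeg; a minimal sketch (function names and arguments are illustrative, not the actual `helper/audio.py` API):

```python
import shutil
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command converting `src` to mono WAV at `sample_rate` Hz,
    the format most speech-to-text models expect."""
    return [
        "ffmpeg", "-y",          # overwrite output without prompting
        "-i", src,               # input file (mp3, m4a, ...)
        "-ac", "1",              # downmix to mono
        "-ar", str(sample_rate), # resample
        str(Path(dst).with_suffix(".wav")),
    ]

def convert_to_wav(src: str, dst: str) -> None:
    # Fails fast if the optional ffmpeg prerequisite is missing.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```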
### `prompts/`

Prompt templates for LLMs.

- `system_message.py`: System-level prompt templates.
- `user_prompts.py`: User-level prompt templates.
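Such template files often hold plain format strings; a hedged illustration (the template wording below is invented, not the repository's actual prompts):

```python
# Illustrative templates only; the real wording lives in prompts/system_message.py
# and prompts/user_prompts.py.
SYSTEM_MESSAGE = (
    "You are a visual question answering assistant. "
    "Answer strictly from the image and any transcribed audio."
)

USER_PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Audio transcript (may be empty): {transcript}\n"
    "Answer concisely."
)

def build_user_prompt(question: str, transcript: str = "") -> str:
    # Fill the user template with the query and any audio transcript.
    return USER_PROMPT_TEMPLATE.format(question=question, transcript=transcript)
```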
### `tools/`

Vision and OCR tools.

- `ocr.py`: Optical character recognition (OCR) utilities.
- `od.py`: Object detection (OD) utilities.
### `utils/`

General utility functions and data schemas.

- `logger.py`: Logging setup and helpers.
- `schemas.py`: Structured-output and data validation schemas.
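A structured-output schema of this sort might look like the sketch below (field names are assumptions, and the project may well use Pydantic rather than dataclasses):

```python
import logging
from dataclasses import dataclass, field

# Basic named logger, analogous to what utils/logger.py might expose.
logger = logging.getLogger("agenticvqa")

@dataclass
class VQAAnswer:
    """Illustrative structured-output schema; field names are assumptions."""
    question: str
    answer: str
    confidence: float = 0.0
    tools_used: list[str] = field(default_factory=list)  # e.g. ["ocr", "od"]

    def __post_init__(self):
        # Simple validation, standing in for a schema library's checks.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
```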
### `workflows/`

LangGraph workflows for orchestrating multi-agent VQA tasks.

- `vqa_workflow.py`: Main workflow for the VQA pipeline.
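The actual workflow is built with LangGraph; its answer-then-evaluate loop can be approximated in plain Python (node names, state shape, and retry logic are assumptions based on the module names above, with the model calls stubbed out):

```python
def answer_node(state: dict) -> dict:
    # Stand-in for agents/write_answer.py: the VLM call would happen here.
    state["answer"] = f"stub answer to: {state['question']}"
    return state

def evaluate_node(state: dict) -> dict:
    # Stand-in for agents/query_evaluator.py: would score the answer's quality.
    state["approved"] = bool(state.get("answer"))
    return state

def run_vqa_workflow(question: str, max_retries: int = 2) -> dict:
    """Loop answer -> evaluate until the evaluator approves or retries run out."""
    state: dict = {"question": question}
    for _ in range(max_retries + 1):
        state = evaluate_node(answer_node(state))
        if state["approved"]:
            break
    return state
```

In the real LangGraph version these functions would be graph nodes with a conditional edge from the evaluator back to the answering agent.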
### `requirements.txt`

Lists all Python dependencies required for the project.
## Usage

- Prepare your images and audio files in the workspace.
- Run `app.py` or use the provided notebooks in `evaluations/` to start VQA tasks.
- Use the helper scripts for audio transcription and database management as needed.
- Customize prompts and workflows for your specific use case.
## Contributing

Feel free to open issues or submit pull requests for improvements or new features.