Neuro-Symbolic Agent Challenge

Overview

The Neuro-Symbolic Agent Challenge aims to foster research in video-based agent tasks by encouraging the development of datasets, benchmarks, and evaluation frameworks analogous to those available for LLM-based agents. This challenge provides an initial dataset and evaluation metrics to serve as a foundation for future research.

Challenge Goals

The primary objective is to design and evaluate video agents that leverage deep-learning and neuro-symbolic methods to process videos and respond to complex natural language queries. The three specified tasks for a video agent are video search, tool calling, and video generation.

1. Video Search

Predicts the temporal span of a video segment corresponding to a query.

Requirements:

  • Parsing Queries: Extract objects, events, and temporal logic.
  • Perception: Utilize models to detect relevant elements.
  • Prediction: Identify spans with high probability.
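The pipeline above (parse the query, run perception per frame, predict a span) can be sketched as follows. This is a minimal illustration, not the challenge API: the names `Query` and `predict_span` are hypothetical, and the per-frame label sets stand in for the output of a real perception model.

```python
# Hypothetical sketch of video search: match a parsed query against
# per-frame detections and return the covering temporal span.
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    obj: str    # object mentioned in the query, e.g. "car"
    event: str  # event predicate, e.g. "turning_left"

def predict_span(query, frame_labels):
    """Return the inclusive (start, end) frame range where the queried
    (object, event) pair is detected, or None if it never occurs.

    frame_labels: one set of (object, event) detections per frame,
    standing in for a real perception model's output.
    """
    hits = [i for i, labels in enumerate(frame_labels)
            if (query.obj, query.event) in labels]
    if not hits:
        return None
    return (hits[0], hits[-1])

frames = [set(),
          {("car", "turning_left")},
          {("car", "turning_left")},
          set()]
print(predict_span(Query("car", "turning_left"), frames))  # (1, 2)
```

A real system would replace the exact-match lookup with a probabilistic score per frame and select the span maximizing that score, but the input/output contract is the same.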

2. Tool Calling

Determines the correct tool and executes it with appropriate inputs.

Requirements:

  • Tool Selection: Identify the right API/tool for a given span.
  • Tool Invocation: Provide tool inputs based on the detected video clip.
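The two requirements above amount to a lookup (selection) followed by a call with clip-derived arguments (invocation). A minimal sketch, with entirely illustrative tool names and a plain dictionary as the registry:

```python
# Hypothetical tool registry: maps a detected event type to a callable.
# The tools and event names here are illustrative, not part of the challenge.
def send_alert(clip):
    return f"alert sent for clip {clip}"

def log_event(clip):
    return f"logged clip {clip}"

TOOLS = {
    "collision": send_alert,
    "lane_change": log_event,
}

def call_tool(event, clip):
    """Select the tool registered for `event` and invoke it on `clip`."""
    tool = TOOLS.get(event)
    if tool is None:
        raise KeyError(f"no tool registered for event {event!r}")
    return tool(clip)

print(call_tool("collision", "video_001"))  # alert sent for clip video_001
```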

3. Video Generation

Synthesizes videos based on extended natural language queries.

Requirements:

  • Synthesis: Generate novel video sequences.
  • Evaluation: Ensure high visual and semantic quality.
  • Improvement & Editing: Iteratively refine videos with neuro-symbolic feedback.
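The synthesize-evaluate-refine cycle described above can be expressed as a generic loop. This is only a control-flow sketch under toy assumptions: `generate`, `score`, and `edit` are placeholders for a video generator, a neuro-symbolic scorer (e.g. NeuS-V), and an editing step, and the "video" below is just an integer quality level.

```python
# Illustrative generate-score-refine loop; all callables are stubs.
def refine_video(generate, score, edit, query, threshold=0.9, max_iters=5):
    """Generate a video, then iteratively edit it until the scorer
    reports quality >= threshold or max_iters edits have been applied."""
    video = generate(query)
    for _ in range(max_iters):
        quality = score(video, query)
        if quality >= threshold:
            break
        video = edit(video, query, quality)
    return video

# Toy stand-ins: a "video" here is just an int whose value is its quality.
result = refine_video(
    generate=lambda q: 0,
    score=lambda v, q: v / 10,
    edit=lambda v, q, s: v + 3,
    query="a car turning left at night",
)
print(result)  # 9
```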

Datasets

The challenge provides an initial dataset, the TLV dataset, hosted on Hugging Face (see Get Started below).

Evaluation Metrics

  • Accuracy of Events: F1-score comparing predicted and ground-truth spans.
  • Tool Calling: Accuracy of tool selection and input specification.
  • Synthetic Video Quality: Measured via VBench (visual fidelity) and NeuS-V (temporal coherence).
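For the first metric, one natural reading of "F1-score comparing predicted and ground-truth spans" is a frame-level F1 over the two spans. The exact definition used by the challenge is not spelled out here, so the following is an assumed interpretation:

```python
# Assumed frame-level F1 between a predicted and a ground-truth span.
# Spans are inclusive (start, end) frame indices.
def span_f1(pred, gt):
    pred_frames = set(range(pred[0], pred[1] + 1))
    gt_frames = set(range(gt[0], gt[1] + 1))
    tp = len(pred_frames & gt_frames)  # frames in both spans
    if tp == 0:
        return 0.0
    precision = tp / len(pred_frames)
    recall = tp / len(gt_frames)
    return 2 * precision * recall / (precision + recall)

print(round(span_f1((10, 30), (15, 35)), 4))  # 0.7619
```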

Get Started

  1. Clone the repository:
    git clone https://github.com/UTAustin-SwarmLab/Neuro-Symbolic-Agent-Challenge.git
  2. Install dependencies:
    pip install -r requirements.txt
  3. Download the NeuS-V distributions file (create the assets directory first so wget can write to it):
    mkdir -p assets
    wget https://raw.githubusercontent.com/UTAustin-SwarmLab/NeuS-V/main/assets/distributions.pkl -O assets/distributions.pkl
    
  4. Download the TLV Dataset from Hugging Face.
  5. Explore the dataset and benchmarks.
