The Neuro-Symbolic Agent Challenge aims to foster research on video-based agent tasks by encouraging the development of datasets, benchmarks, and evaluation frameworks analogous to those available for LLM-based agents. The challenge provides an initial dataset and evaluation metrics as a foundation for future research.
The primary objective is to design and evaluate video agents that combine deep learning and neuro-symbolic methods to process videos and respond to complex natural language queries. The challenge specifies three tasks for a video agent: video search, tool calling, and video generation.
Video search predicts the temporal span of the video segment that corresponds to a query.
- Parsing Queries: Extract objects, events, and temporal logic.
- Perception: Utilize models to detect relevant elements.
- Prediction: Identify spans with high probability.
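The prediction step can be illustrated with a minimal sketch. It assumes the perception stage has already produced one probability per frame that the query holds; the function name `probabilities_to_spans`, the threshold, and the minimum length are hypothetical choices for illustration, not part of the challenge code.

```python
from typing import List, Sequence, Tuple


def probabilities_to_spans(
    frame_probs: Sequence[float],
    threshold: float = 0.5,
    min_length: int = 1,
) -> List[Tuple[int, int]]:
    """Group consecutive frames at or above `threshold` into spans.

    Returns (start, end) frame indices with `end` inclusive. Runs shorter
    than `min_length` frames are discarded. All names and defaults here
    are illustrative assumptions.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index where the current high-probability run began
    for i, p in enumerate(frame_probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended on the previous frame.
            if i - start >= min_length:
                spans.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frame_probs) - start >= min_length:
        spans.append((start, len(frame_probs) - 1))
    return spans
```

Raising `min_length` is one simple way to suppress single-frame false positives, at the cost of missing very short events.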
Tool calling determines the correct tool and executes it with appropriate inputs.
- Tool Selection: Identify the right API/tool for a given span.
- Tool Invocation: Provide tool inputs based on the detected video clip.
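One way to picture tool invocation is a registry that maps tool names to callables, with the agent emitting a structured call for a detected span. The sketch below is an assumption-laden illustration: the `ToolCall` type, the two toy tools, and the `invoke` helper are hypothetical and do not reflect the challenge's actual tool set. Tool selection itself (choosing `name`) would be done by the agent's model and is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToolCall:
    """A structured request: which tool to run and with which inputs."""
    name: str
    arguments: Dict[str, Any]


def count_frames(span: Tuple[int, int]) -> int:
    """Hypothetical tool: number of frames in an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def span_duration_seconds(span: Tuple[int, int], fps: float = 30.0) -> float:
    """Hypothetical tool: duration of a span in seconds at a given frame rate."""
    return count_frames(span) / fps


# Registry of available tools, keyed by the name the agent must select.
TOOLS: Dict[str, Callable[..., Any]] = {
    "count_frames": count_frames,
    "span_duration_seconds": span_duration_seconds,
}


def invoke(call: ToolCall) -> Any:
    """Look up the selected tool and execute it with the provided inputs."""
    if call.name not in TOOLS:
        raise KeyError(f"unknown tool: {call.name}")
    return TOOLS[call.name](**call.arguments)
```

For example, `invoke(ToolCall("count_frames", {"span": (10, 19)}))` runs the frame-counting tool on the clip covering frames 10 through 19.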
Video generation synthesizes videos based on extended natural language queries.
- Synthesis: Generate novel video sequences.
- Evaluation: Ensure high visual and semantic quality.
- Improvement & Editing: Iteratively refine videos with neuro-symbolic feedback.
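The synthesize, evaluate, and refine steps above can be read as a loop. The following is only a sketch under stated assumptions: `generate`, `score`, and `revise` are hypothetical callables supplied by the caller (for instance a video model, a quality metric, and a prompt editor), and the target score and round limit are arbitrary defaults.

```python
from typing import Callable, Tuple, TypeVar

Video = TypeVar("Video")


def refine_video(
    prompt: str,
    generate: Callable[[str], Video],
    score: Callable[[str, Video], float],
    revise: Callable[[str, float], str],
    target: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[Video, float]:
    """Generate a video, then revise the prompt until the score is good enough.

    The score is always computed against the ORIGINAL prompt, so revisions
    cannot drift away from what the user asked for. Returns the best video
    seen and its score.
    """
    current_prompt = prompt
    best_video = generate(current_prompt)
    best_score = score(prompt, best_video)
    for _ in range(max_rounds):
        if best_score >= target:
            break
        # Use the feedback score to produce a revised prompt, then retry.
        current_prompt = revise(current_prompt, best_score)
        candidate = generate(current_prompt)
        candidate_score = score(prompt, candidate)
        if candidate_score > best_score:
            best_video, best_score = candidate, candidate_score
    return best_video, best_score
```

Keeping the best candidate rather than the latest one means a bad revision never makes the final result worse.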
- Accuracy of Events: F1-score comparing predicted and ground-truth spans.
- Tool Calling: Accuracy of tool selection and input specification.
- Synthetic Video Quality: Measured via VBench (visual fidelity) and NeuS-V (temporal coherence).
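The text does not specify how predicted and ground-truth spans are matched when computing the F1-score. As one plausible reading, the sketch below computes a frame-level F1: every frame covered by a span counts as a positive. The function names and the inclusive `(start, end)` span convention are assumptions for illustration, and the official evaluation may differ.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # (start_frame, end_frame), end inclusive


def frames_in_spans(spans: Iterable[Span]) -> Set[int]:
    """Expand spans into the set of frame indices they cover."""
    frames: Set[int] = set()
    for start, end in spans:
        frames.update(range(start, end + 1))
    return frames


def span_f1(predicted: Iterable[Span], ground_truth: Iterable[Span]) -> float:
    """Frame-level F1 between predicted and ground-truth spans (an assumed definition)."""
    pred = frames_in_spans(predicted)
    true = frames_in_spans(ground_truth)
    if not pred and not true:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(pred & true)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(true)
    return 2 * precision * recall / (precision + recall)
```

With a prediction of frames 0 to 9 and a ground truth of frames 5 to 14, five of ten predicted frames are correct and five of ten true frames are recovered, so precision and recall are both 0.5 and the F1-score is 0.5.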
- Clone the repository:
git clone https://github.com/UTAustin-SwarmLab/Neuro-Symbolic-Agent-Challenge.git
- Install dependencies:
pip install -r requirements.txt
- Download the distributions pickle file for NeuS-V:
wget https://raw.githubusercontent.com/UTAustin-SwarmLab/NeuS-V/main/assets/distributions.pkl -O assets/distributions.pkl
- Download the TLV Dataset from Hugging Face.
- Explore the dataset and benchmarks.
- Dataset: TLV Dataset
- Metrics: NeuS-V