This repository contains tools for solving Abstract Reasoning Corpus (ARC) tasks using LangGraph-based AI agents.
Purpose: This is the main script for running a multi-solution LangGraph agent on ARC tasks. It uses multiple AI models to generate, refine, and fuse solution strategies through iterative reasoning, code generation, and execution.
What it does:
- Runs AI agents to solve ARC challenges using configurable language models (GPT-4, Gemini, Llama, Qwen, etc.)
- Supports both single task testing and batch processing with parallel workers
- Generates multiple solution attempts per task through iterative refinement and fusion
- Stores results in structured output directories under `output/output_agent/`
- Supports resuming previous runs and evaluating existing solutions
- Optional features include RAG hints, visual cues, and parallel evaluation
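Batch mode with parallel workers can be pictured as a worker pool mapping over task IDs. The sketch below is hypothetical and not taken from the script: `solve_task` is a placeholder for the real generate/refine/fuse pipeline, and the task IDs are invented examples.

```python
# Hypothetical sketch of batch-mode parallelism; `solve_task` and the
# task-ID list stand in for the script's actual implementation.
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4  # mirrors the NUM_WORKERS config variable


def solve_task(task_id: str) -> dict:
    # Placeholder for the LangGraph generate/refine/fuse pipeline.
    return {"task_id": task_id, "solved": False}


def run_batch(task_ids: list[str]) -> list[dict]:
    # Each worker handles one task at a time; results keep input order.
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return list(pool.map(solve_task, task_ids))


if __name__ == "__main__":
    results = run_batch(["task_a", "task_b", "task_c"])
    print(len(results))  # one result per task
```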
How to run:
```bash
# Basic usage - runs with default configuration
python run_langgraph_agent.py
```

The script is configured via variables at the top of the file.

Key configuration variables (edit in the script):
- `REASONING_MODEL`: Model for reasoning and reflection (e.g., "gemini-2.5-flash", "gpt-4o-mini")
- `CODING_MODEL`: Model for code generation
- `MODE`: "single" for one task, "batch" for multiple tasks
- `NUM_TASKS`: Number of tasks to process in batch mode
- `NUM_WORKERS`: Number of parallel workers for batch processing
- `RESUME_RUN`: Resume a previous run (set to a run ID or "latest")
- `EVALUATE_ONLY`: Only evaluate existing solutions without generating new ones
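Taken together, the variables above might look like this at the top of `run_langgraph_agent.py`. The names match the README; the specific values are illustrative examples, not the script's defaults.

```python
# Hypothetical config block; names follow the README, values are examples.
REASONING_MODEL = "gemini-2.5-flash"  # reasoning & reflection
CODING_MODEL = "gpt-4o-mini"          # code generation
MODE = "batch"                        # "single" or "batch"
NUM_TASKS = 50                        # tasks to process in batch mode
NUM_WORKERS = 4                       # parallel workers
RESUME_RUN = None                     # run ID, "latest", or None
EVALUATE_ONLY = False                 # skip generation, only evaluate
```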
Purpose: A Flask-based web application for visualizing and inspecting ARC task solutions generated by the agent.
What it does:
- Provides a web interface to browse all runs stored in `output/output_agent/`
- Displays run summaries with accuracy statistics
- Shows detailed visualizations of input/output grids for each task
- Highlights differences between expected and predicted outputs
- Allows inspection of individual solution attempts and their reasoning traces
How to run:
```bash
# Install Flask if not already installed
pip install flask

# Start the visualizer
python arc_visualizer.py
```

Then open your browser to http://localhost:5000.

Features:
- Browse all completed runs sorted by date (newest first)
- View task-level results with color-coded grid visualizations
- Compare expected vs. predicted outputs with diff highlighting
- Inspect solution code and reasoning for each attempt
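The diff highlighting described above boils down to comparing two grids cell by cell. A minimal sketch, assuming grids are lists of lists of ints (the visualizer's actual implementation may differ):

```python
# Hypothetical grid-diff helper: returns the set of (row, col) cells
# where the predicted grid disagrees with the expected one.
def grid_diff(expected, predicted):
    """Return positions where the two grids differ.

    A size mismatch marks every expected cell as a difference, since the
    prediction cannot match a grid of the wrong shape.
    """
    if (len(expected) != len(predicted)
            or any(len(a) != len(b) for a, b in zip(expected, predicted))):
        return {(r, c) for r, row in enumerate(expected)
                for c in range(len(row))}
    return {(r, c)
            for r, (erow, prow) in enumerate(zip(expected, predicted))
            for c, (e, p) in enumerate(zip(erow, prow))
            if e != p}
```

The returned coordinates can then be used to color mismatched cells in the rendered output grid.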
Install dependencies:

```bash
pip install -r requirements.txt
```

Key dependencies include:
- langchain & langgraph for agent orchestration
- openai, anthropic, google-generativeai for LLM providers
- flask for visualization
- qdrant-client for optional RAG features
- Run the agent using `run_langgraph_agent.py` to generate solutions
- Visualize results using `arc_visualizer.py` to inspect and analyze the outputs
- Iterate by adjusting configuration parameters and re-running
Results are saved to `output/output_agent/<timestamp>/`:
- Each task has its own subdirectory with detailed JSON output
- `params.json`: Run configuration parameters
- `training_task_ids.txt` / `evaluation_task_ids.txt`: Lists of processed tasks
- Individual task folders contain solution attempts, code, and reasoning traces
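Reading a run's outputs back programmatically follows directly from the layout above. This sketch builds a sample run directory in a temp folder so it is self-contained; the `load_run` helper and the task-folder name are hypothetical illustrations, not part of the repository.

```python
# Hypothetical sketch of reading a run's output directory, following the
# layout described above (params.json plus one subdirectory per task).
import json
import tempfile
from pathlib import Path


def load_run(run_dir: Path) -> dict:
    """Load run parameters and the list of per-task subdirectories."""
    params = json.loads((run_dir / "params.json").read_text())
    task_dirs = sorted(p.name for p in run_dir.iterdir() if p.is_dir())
    return {"params": params, "tasks": task_dirs}


# Build a minimal example run directory.
root = Path(tempfile.mkdtemp())
(root / "params.json").write_text(json.dumps({"MODE": "batch"}))
(root / "0a1b2c3d").mkdir()  # one task subdirectory

summary = load_run(root)
print(summary["params"]["MODE"])  # batch
print(summary["tasks"])           # ['0a1b2c3d']
```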