
ARC Solver

This repository contains tools for solving tasks from the Abstraction and Reasoning Corpus (ARC) using LangGraph-based AI agents.

Main Scripts

1. run_langgraph_agent.py

Purpose: The main entry point for running a multi-solution LangGraph agent on ARC tasks. It orchestrates multiple AI models to generate, refine, and fuse solution strategies through iterative reasoning, code generation, and execution.

What it does:

  • Runs AI agents to solve ARC challenges using configurable language models (GPT-4, Gemini, Llama, Qwen, etc.)
  • Supports both single task testing and batch processing with parallel workers
  • Generates multiple solution attempts per task through iterative refinement and fusion
  • Stores results in structured output directories under output/output_agent/
  • Supports resuming previous runs and evaluating existing solutions
  • Optional features include RAG hints, visual cues, and parallel evaluation
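The generate/refine/fuse loop described above can be sketched roughly as follows. The function names and the scoring scheme here are illustrative stand-ins, not the script's actual API:

```python
# Illustrative control flow for multi-attempt refinement and fusion.
# generate/refine/fuse below are hypothetical stand-ins, not the real agent code.

def generate(task):
    """Produce one candidate solution for a task (stand-in implementation)."""
    return {"code": f"candidate for {task}", "score": 0}

def refine(task, cand):
    """Improve a candidate using execution feedback (stand-in: bump its score)."""
    return {**cand, "score": cand["score"] + 1}

def fuse(task, cands):
    """Merge the best ideas from all candidates (stand-in: pick the top scorer)."""
    return max(cands, key=lambda c: c["score"])

def solve_task(task, num_attempts=3, num_rounds=2):
    """Generate several candidates, refine them iteratively, then fuse."""
    cands = [generate(task) for _ in range(num_attempts)]
    for _ in range(num_rounds):
        cands = [refine(task, c) for c in cands]
    return fuse(task, cands)
```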

How to run:

# Basic usage - runs with default configuration
python run_langgraph_agent.py

# The script is configured via variables at the top of the file:
# - REASONING_MODEL: Model for reasoning & reflection
# - CODING_MODEL: Model for code generation
# - MODE: "single" or "batch"
# - NUM_TASKS: Number of tasks to process in batch mode
# - RESUME_RUN: Set to "latest" or specific run ID to resume

Key configuration variables (edit in the script):

  • REASONING_MODEL: Choose your reasoning model (e.g., "gemini-2.5-flash", "gpt-4o-mini")
  • CODING_MODEL: Choose your coding model
  • MODE: "single" for one task, "batch" for multiple tasks
  • NUM_TASKS: Number of tasks to process (in batch mode)
  • NUM_WORKERS: Number of parallel workers for batch processing
  • RESUME_RUN: Resume a previous run (set to run ID or "latest")
  • EVALUATE_ONLY: Only evaluate existing solutions without generating new ones

2. arc_visualizer.py

Purpose: A Flask-based web application for visualizing and inspecting ARC task solutions generated by the agent.

What it does:

  • Provides a web interface to browse all runs stored in output/output_agent/
  • Displays run summaries with accuracy statistics
  • Shows detailed visualizations of input/output grids for each task
  • Highlights differences between expected and predicted outputs
  • Allows inspection of individual solution attempts and their reasoning traces
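A run browser of this kind can be sketched as a small Flask app. The route below and the empty-directory handling are assumptions for illustration; the actual arc_visualizer.py may be structured differently:

```python
# Minimal sketch of a run browser, assuming runs live under
# output/output_agent/ as described above. Not the actual visualizer code.
from pathlib import Path
from flask import Flask, jsonify

OUTPUT_DIR = Path("output/output_agent")
app = Flask(__name__)

@app.route("/runs")
def list_runs():
    """List run directories, newest first (empty list if none exist yet)."""
    if not OUTPUT_DIR.exists():
        return jsonify([])
    runs = sorted((d.name for d in OUTPUT_DIR.iterdir() if d.is_dir()),
                  reverse=True)
    return jsonify(runs)

# To serve locally: app.run(port=5000)
```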

How to run:

# Install Flask if not already installed
pip install flask

# Start the visualizer
python arc_visualizer.py

# Open your browser to http://localhost:5000

Features:

  • Browse all completed runs sorted by date (newest first)
  • View task-level results with color-coded grid visualizations
  • Compare expected vs. predicted outputs with diff highlighting
  • Inspect solution code and reasoning for each attempt
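Diff highlighting of expected vs. predicted grids amounts to a cell-wise comparison. The helper below is an illustrative sketch, not the visualizer's actual code; it treats a size mismatch as every cell differing:

```python
# Sketch of grid diff highlighting; grids are lists of lists of ints,
# as in ARC tasks. Illustrative only, not arc_visualizer.py's implementation.

def diff_grids(expected, predicted):
    """Return a boolean mask marking cells where the prediction differs."""
    size_mismatch = (
        len(expected) != len(predicted)
        or any(len(e) != len(p) for e, p in zip(expected, predicted))
    )
    if size_mismatch:
        # Shapes differ: highlight the entire expected grid
        return [[True] * len(row) for row in expected]
    return [[e != p for e, p in zip(erow, prow)]
            for erow, prow in zip(expected, predicted)]
```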

Requirements

Install dependencies:

pip install -r requirements.txt

Key dependencies include:

  • langchain & langgraph for agent orchestration
  • openai, anthropic, google-generativeai for LLM providers
  • flask for visualization
  • qdrant-client for optional RAG features

Workflow

  1. Run the agent using run_langgraph_agent.py to generate solutions
  2. Visualize results using arc_visualizer.py to inspect and analyze the outputs
  3. Iterate by adjusting configuration parameters and re-running

Output Structure

Results are saved to output/output_agent/<timestamp>/:

  • Each task has its own subdirectory with detailed JSON output
  • params.json: Run configuration parameters
  • training_task_ids.txt / evaluation_task_ids.txt: Lists of processed tasks
  • Individual task folders contain solution attempts, code, and reasoning traces
