arXiv Auto-Formalizer and Lean 3 Proof Checker

Python 3.8+ | License: MIT

This repository contains an experimental pipeline for automatically formalizing mathematical proofs from arXiv papers into the Lean 3 theorem prover language and subsequently verifying the generated code. The process leverages the Google Gemini API for the natural language to formal code translation.

The project is split into two main components:

  1. Formalizer (arxiv_formalizer_public_release.ipynb): Scans new arXiv submissions, extracts LaTeX source code, and uses an LLM to generate Lean 3 proofs in a structured JSON format.
  2. Verifier (lean3proofchecker.ipynb): Sets up a complete Lean 3 and mathlib environment, and then compiles the generated Lean 3 code to check for syntactic and logical correctness.

Both scripts are designed to be run in a Google Colab environment.

Workflow Overview

The end-to-end workflow is as follows:

graph LR
    A[arXiv New Submissions] -->|Scrape Identifiers| B(arxiv_formalizer_public_release.ipynb);
    B -->|Download LaTeX Source| C{Gemini API};
    C -->|Auto-Formalize Proofs| B;
    B -->|Save Output| D["results.pkl (JSON with Lean 3)"];
    D -->|Manual Copy/Paste| E(lean3proofchecker.ipynb);
    E -->|Setup Lean 3 + Mathlib| F[Lean 3 Project];
    E -->|Compile Proofs| F;
    F -->|Verification Results| G[Console Output: Success/Failure];

Files in this Repository

1. arxiv_formalizer_public_release.ipynb

Open In Colab

This script automates the process of finding and formalizing new mathematical papers.

Features:

  • Scrapes new submission identifiers from over 30 different mathematics categories on arXiv.
  • Downloads the .tar.gz source for each paper and extracts .tex files.
  • Uses a highly specialized prompt with a "child savant" persona to instruct the Gemini API to formalize proofs.
  • Requests a structured JSON output containing separate entries for theorems, lemmas, propositions, etc., each with its natural language text and a Lean 3 proof.
  • Saves the raw LaTeX inputs (inputs.pkl) and the generated JSON outputs (results.pkl) for later use.
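
In outline, the core loop looks roughly like the sketch below. This is not the notebook's exact code: the category list, prompt, model name, and response handling are illustrative placeholders, and the real notebook parses the model output into the structured JSON described above.

    import io, pickle, re, tarfile
    import requests
    import google.generativeai as genai

    # Illustrative values -- the notebook uses its own category list, prompt, and model.
    CATEGORIES = ["math.NT", "math.CO"]   # arXiv "new submissions" pages to scan
    PROMPT = "Formalize every proof in this LaTeX source as Lean 3, returned as JSON."

    genai.configure(api_key="YOUR_GEMINI_API_KEY")   # placeholder; see the Colab secrets step below
    model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model name

    inputs, results = {}, {}
    for cat in CATEGORIES:
        listing = requests.get(f"https://arxiv.org/list/{cat}/new").text
        ids = set(re.findall(r"arXiv:(\d{4}\.\d{4,5})", listing))
        for arxiv_id in ids:
            # Download the .tar.gz source and concatenate all .tex files.
            src = requests.get(f"https://arxiv.org/e-print/{arxiv_id}")
            try:
                with tarfile.open(fileobj=io.BytesIO(src.content), mode="r:gz") as tar:
                    tex = "\n".join(
                        tar.extractfile(m).read().decode("utf-8", errors="ignore")
                        for m in tar.getmembers() if m.name.endswith(".tex")
                    )
            except tarfile.ReadError:
                continue  # some submissions ship a single file rather than a tarball
            inputs[arxiv_id] = tex
            response = model.generate_content(PROMPT + "\n\n" + tex)
            results[arxiv_id] = response.text  # the notebook parses this into structured JSON

    with open("inputs.pkl", "wb") as f:
        pickle.dump(inputs, f)
    with open("results.pkl", "wb") as f:
        pickle.dump(results, f)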

2. lean3proofchecker.ipynb

Open In Colab

This script provides a robust environment to verify the Lean 3 code generated by the formalizer.

Features:

  • Automated Setup: Installs elan (the Lean toolchain manager), sets the correct Lean 3 version (3.42.1), and installs mathlibtools.
  • Project Management: Programmatically creates a valid Lean 3 project structure.
  • Best Practices: Uses leanproject get-mathlib-cache to download pre-compiled mathlib object files (.olean), dramatically reducing compilation time.
  • Isolated Verification: Checks each proof in a temporary file within the configured project, ensuring dependencies on mathlib are correctly resolved.
  • Clear Reporting: Provides a clean, color-coded summary of which proofs passed and which failed, along with compiler errors for the failed proofs.
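
At its core, each check amounts to writing a candidate proof into the project and invoking the Lean 3 compiler on it. The sketch below illustrates the idea; the project path, file name, and helper name are assumptions for illustration, not the notebook's exact code.

    import subprocess
    from pathlib import Path

    PROJECT_DIR = Path("/content/lean_verify")  # illustrative project location

    def check_proof(lean_code):
        """Compile a single Lean 3 snippet inside the configured project."""
        target = PROJECT_DIR / "src" / "candidate.lean"
        target.write_text(lean_code)
        # Running lean from inside the project lets it resolve imports
        # against the cached mathlib .olean files via leanpkg.path.
        proc = subprocess.run(
            ["lean", str(target)],
            cwd=PROJECT_DIR, capture_output=True, text=True,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr

    ok, log = check_proof("import algebra.group.basic\n\ntheorem t : 1 + 1 = 2 := rfl")
    print("PASS" if ok else "FAIL", log)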

Setup and Usage

This pipeline is designed for Google Colab.

Prerequisites

  • A Google Account to use Colab.
  • A Google Gemini API Key. You can get one from Google AI Studio.

Step 1: Generate Formalizations

  1. Open arxiv_formalizer_public_release.ipynb in Google Colab using the badge above.
  2. Set up your Gemini API key. In the Colab notebook, go to the "Secrets" tab (key icon on the left) and create a new secret named GEMINI_API_KEY. Paste your API key as the value. The notebook reads this secret at runtime (see the sketch after this list).
  3. Run all cells in the notebook (Runtime -> Run all).
  4. The script will begin scraping arXiv, processing papers, and calling the Gemini API. This may take a significant amount of time.
  5. When finished, the notebook will have generated inputs.pkl and results.pkl. You can download these files, but more importantly, the final cell will print the contents of the results dictionary. This is what you'll need for the next step.
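
A minimal sketch of how the notebook can read the secret created in step 2 and configure the Gemini client (the model name is an illustrative placeholder):

    import google.generativeai as genai
    from google.colab import userdata  # Colab's secrets API

    # Reads the secret created in the "Secrets" tab; this raises if the secret
    # is missing or notebook access to it has not been granted.
    genai.configure(api_key=userdata.get("GEMINI_API_KEY"))
    model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model name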

Step 2: Verify the Generated Proofs

  1. Open lean3proofchecker.ipynb in Google Colab.

  2. Copy a JSON object from the output of the previous step. For example, if the formalizer script printed {'2401.12345': {'theorems': [...]}}, you would copy the entire {'theorems': [...]} dictionary.

  3. In the lean3proofchecker.ipynb notebook, find the variable JSON_DATA and replace its content with the JSON you just copied.

    # Before
    JSON_DATA = r"""
    {
      "conjectures": [],
      "theorems": [ ... old example data ... ]
    }
    """
    
    # After
    JSON_DATA = r"""
    {
      "theorems": [
        {
          "1": {
            "text": "The actual theorem text from the paper...",
            "proof": "import algebra.group.basic\n\ntheorem my_theorem : 1 + 1 = 2 := rfl"
          }
        }
      ]
    }
    """
  4. Run all cells in the notebook.

  5. The first cells will execute a one-time setup of the Lean 3 environment, which may take a few minutes as it downloads the mathlib cache. Subsequent runs will be much faster.

  6. The final cells will execute the proof checks and print a summary report to the console, indicating which proofs were successfully compiled by Lean 3.
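
Conceptually, the checking loop parses JSON_DATA and feeds each proof to the compiler, roughly as sketched below. Names are illustrative: check_proof stands in for the notebook's own compile-and-report helper (see the verifier sketch earlier), and the actual report is color-coded and more detailed.

    import json

    data = json.loads(JSON_DATA)
    for entry in data.get("theorems", []):
        for key, item in entry.items():           # entries are keyed "1", "2", ...
            ok, log = check_proof(item["proof"])  # compile one proof in the Lean 3 project
            print(f"theorem {key}: {'PASS' if ok else 'FAIL'}")
            if not ok:
                print(log)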

Limitations and Future Work

  • LLM Accuracy: The quality of the formalization is entirely dependent on the LLM. The generated Lean 3 code may contain errors, use non-existent tactics, or rely on sorry placeholders. This is an experimental tool, not a guaranteed proof generator.
  • Manual Data Transfer: The workflow currently requires a manual copy-paste step between the two scripts. Future work could involve reading directly from the results.pkl file (see the sketch after this list).
  • Environment Dependency: The scripts are tightly coupled to the Google Colab environment for dependencies and secrets management.
  • Complex LaTeX: The scripts may struggle with complex LaTeX projects involving multiple files, custom macros, or non-standard document structures.
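
For the manual copy/paste step in particular, a bridge could be as simple as loading results.pkl inside the checker notebook and serializing one paper's entry into JSON_DATA. The sketch below assumes results.pkl maps arXiv identifiers to already-parsed JSON dictionaries:

    import json, pickle

    with open("results.pkl", "rb") as f:
        results = pickle.load(f)

    arxiv_id = next(iter(results))             # pick one paper, e.g. '2401.12345'
    JSON_DATA = json.dumps(results[arxiv_id])  # feeds directly into the checker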

License

This project is licensed under the MIT License. See the LICENSE file for details.
