This repository contains an experimental pipeline for automatically formalizing mathematical proofs from arXiv papers into the Lean 3 theorem prover language and subsequently verifying the generated code. The process leverages the Google Gemini API for the natural language to formal code translation.
The project is split into two main components:
- Formalizer (
arxiv_formalizer_public_release.ipynb
): Scans new arXiv submissions, extracts LaTeX source code, and uses an LLM to generate Lean 3 proofs in a structured JSON format. - Verifier (
lean3proofchecker.ipynb
): Sets up a complete Lean 3 andmathlib
environment, and then compiles the generated Lean 3 code to check for syntactic and logical correctness.
Both scripts are designed to be run in a Google Colab environment.
The end-to-end workflow is as follows:
graph LR
A[arXiv New Submissions] -->|Scrape Identifiers| B(arxiv_formalizer_public_release.ipynb);
B -->|Download LaTeX Source| C{Gemini API};
C -->|Auto-Formalize Proofs| B;
B -->|Save Output| D[results.pkl (JSON with Lean 3)];
D -->|Manual Copy/Paste| E(lean3proofchecker.ipynb);
E -->|Setup Lean 3 + Mathlib| F[Lean 3 Project];
E -->|Compile Proofs| F;
F -->|Verification Results| G[Console Output: Success/Failure];
This script automates the process of finding and formalizing new mathematical papers.
Features:
- Scrapes new submission identifiers from over 30 different mathematics categories on arXiv.
- Downloads the
.tar.gz
source for each paper and extracts.tex
files. - Uses a highly specialized prompt with a "child savant" persona to instruct the Gemini API to formalize proofs.
- Requests a structured JSON output containing separate entries for theorems, lemmas, propositions, etc., each with its natural language text and a Lean 3 proof.
- Saves the raw LaTeX inputs (
inputs.pkl
) and the generated JSON outputs (results.pkl
) for later use.
This script provides a robust environment to verify the Lean 3 code generated by the formalizer.
Features:
- Automated Setup: Installs
elan
(the Lean toolchain manager), sets the correct Lean 3 version (3.42.1
), and installsmathlibtools
. - Project Management: Programmatically creates a valid Lean 3 project structure.
- Best Practices: Uses
leanproject get-mathlib-cache
to download pre-compiledmathlib
object files (.olean
), dramatically speeding up compilation time. - Isolated Verification: Checks each proof in a temporary file within the configured project, ensuring dependencies on
mathlib
are correctly resolved. - Clear Reporting: Provides a clean, color-coded summary of which proofs passed and which failed, along with compiler errors for the failed proofs.
This pipeline is designed for Google Colab.
- A Google Account to use Colab.
- A Google Gemini API Key. You can get one from Google AI Studio.
- Open
arxiv_formalizer_public_release.ipynb
in Google Colab using the badge above. - Set up your Gemini API key. In the Colab notebook, go to the "Secrets" tab (key icon on the left) and create a new secret named
GEMINI_API_KEY
. Paste your API key as the value. - Run all cells in the notebook (
Runtime -> Run all
). - The script will begin scraping arXiv, processing papers, and calling the Gemini API. This may take a significant amount of time.
- When finished, the notebook will have generated
inputs.pkl
andresults.pkl
. You can download these files, but more importantly, the final cell will print the contents of theresults
dictionary. This is what you'll need for the next step.
-
Open
lean3proofchecker.ipynb
in Google Colab. -
Copy a JSON object from the output of the previous step. For example, if the formalizer script printed
{'2401.12345': {'theorems': [...]}}
, you would copy the entire{'theorems': [...]}
dictionary. -
In the
lean3proofchecker.ipynb
notebook, find the variableJSON_DATA
and replace its content with the JSON you just copied.# Before JSON_DATA = r""" { "conjectures": [], "theorems": [ ... old example data ... ] } """ # After JSON_DATA = r""" { "theorems": [ { "1": { "text": "The actual theorem text from the paper...", "proof": "import algebra.group.basic\n\ntheorem my_theorem : 1 + 1 = 2 := rfl" } } ] } """
-
Run all cells in the notebook.
-
The first cells will execute a one-time setup of the Lean 3 environment, which may take a few minutes as it downloads the
mathlib
cache. Subsequent runs will be much faster. -
The final cells will execute the proof checks and print a summary report to the console, indicating which proofs were successfully compiled by Lean 3.
- LLM Accuracy: The quality of the formalization is entirely dependent on the LLM. The generated Lean 3 code may contain errors, use non-existent tactics, or rely on
sorry
placeholders. This is an experimental tool, not a guaranteed proof generator. - Manual Data Transfer: The workflow currently requires a manual copy-paste step between the two scripts. Future work could involve reading directly from the
results.pkl
file. - Environment Dependency: The scripts are tightly coupled to the Google Colab environment for dependencies and secrets management.
- Complex LaTeX: The scripts may struggle with complex LaTeX projects involving multiple files, custom macros, or non-standard document structures.
This project is licensed under the MIT License. See the LICENSE file for details.