NOVAGEN is an end-to-end generative AI pipeline for discovering novel, thermodynamically stable, and functionally specific semiconductor materials. Using a curriculum-learning approach, the system progresses from understanding basic crystallographic geometry to predicting complex quantum electronic properties.
The project is divided into three major stages: Generative Modeling, Multi-Phase Training, and Industrial Deployment. It bridges the gap between deep learning and materials engineering by ensuring all generated materials are chemically valid, physically stable, and synthesizable.
Powered by CrystalFormer, a causal (decoder-only) Transformer adapted for crystallographic data.
- Autoregressive Generation: Predicts elements, Wyckoff positions, and 3D fractional coordinates atom-by-atom.
- Symmetry-Aware: Enforces space-group rules before any physics is applied.
- Lattice Bias Head: A learnable scalar parameter that dynamically optimizes the unit cell volume.
- Continuous Coordinates: Uses a Von Mises Mixture Density Network (MDN) to handle periodic boundary conditions.
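Periodic boundary conditions mean a fractional coordinate of 0.99 is nearly identical to 0.01, so an ordinary Gaussian head would be a poor fit for atomic positions. As an illustrative sketch only (not the project's actual MDN head, which would emit these parameters from the Transformer), sampling a periodic coordinate from a von Mises mixture can be done with the Python standard library:

```python
import math
import random

def sample_fractional_coord(weights, mus, kappas, rng=random):
    """Draw one periodic fractional coordinate in [0, 1) from a
    von Mises mixture. `mus` are component means in [0, 1);
    `kappas` are concentrations (larger = sharper peak).
    Illustrative sketch -- a trained MDN would predict these
    parameters per atom and coordinate."""
    # Choose a mixture component in proportion to its weight.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # vonmisesvariate samples an angle in [0, 2*pi); that circular
    # topology maps directly onto the unit cell's periodic wrap.
    theta = rng.vonmisesvariate(2 * math.pi * mus[k], kappas[k])
    return (theta / (2 * math.pi)) % 1.0
```

With a large concentration, samples cluster tightly around the chosen component's mean and wrap cleanly across the 0/1 boundary, which is exactly the behavior a Gaussian head cannot provide.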
The model is fine-tuned with Proximal Policy Optimization (PPO), a policy-gradient reinforcement learning algorithm, using the Adam optimizer. Training progresses through three distinct phases:
- Phase I: Geometric Stabilization (Spatial): Penalizes atomic overlaps and singularities. Teaches the model how to pack atoms into a solid.
- Phase II: Thermodynamic Physics (Stability): Minimizes the "Free Energy" of the crystal using the CHGNet Graph Neural Network (GNN).
- Phase III: Functional Properties (Lab-Grade): Targets specific electronic properties, such as a 2.8 eV Band Gap, using the MEGNet Oracle.
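The three phases can be viewed as one reward function whose active terms expand as training advances. A minimal sketch, where the 0.7 Å overlap threshold, the -10 penalty, and the equal term weights are illustrative assumptions rather than the project's tuned settings:

```python
def phased_reward(phase, min_dist, energy_per_atom=None, band_gap=None,
                  target_gap=2.8):
    """Hypothetical phased reward for PPO fine-tuning.
    phase 1: geometry only -- penalize overlapping atoms.
    phase 2: add thermodynamics -- reward low energy per atom (eV).
    phase 3: add function -- reward band gaps near the 2.8 eV target."""
    # Phase I: hard geometric penalty for atoms closer than 0.7 A.
    reward = -10.0 if min_dist < 0.7 else 0.0
    if phase >= 2 and energy_per_atom is not None:
        # Phase II: lower predicted energy (e.g. from CHGNet) is better.
        reward += -energy_per_atom
    if phase >= 3 and band_gap is not None:
        # Phase III: distance from the target gap (e.g. from MEGNet).
        reward += -abs(band_gap - target_gap)
    return reward
```

Keeping the earlier terms active in later phases prevents the policy from "forgetting" geometric validity while it chases energy or band-gap targets.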
High-throughput batch generation of crystal candidates, followed by rigorous deep-relaxation validation to output lab-ready .cif files.
- config.yaml: The architectural blueprint defining Transformer layers, attention heads, embedding sizes, and vocabulary limits.
- generator_service.py: Initializes the Transformer instance from the config and injects the pre-trained state dictionary (.pt checkpoint) for inference and training.
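A hedged sketch of what such a config might look like; the field names and values below are illustrative assumptions, not the project's actual schema:

```yaml
# Illustrative config.yaml sketch -- keys and values are hypothetical.
model:
  n_layers: 8          # Transformer decoder blocks
  n_heads: 8           # attention heads per block
  d_model: 256         # embedding size
  vocab_size: 512      # elements + Wyckoff labels + special tokens
  coord_mixture: 4     # von Mises mixture components per coordinate
```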
- train_phase1_spatial.py: RL script for geometric stability. Uses low learning rates and gradient accumulation.
- train_phase2_physics.py: RL script integrating CHGNet to reward low-energy states. Includes active teaching filters (e.g., penalties for structures with more than 40 atoms).
- train_phase3_properties.py: RL script targeting specific band gaps using the CPU-bound MEGNet Oracle.
- sentinel.py / reward_phase1.py: The "Geometric Bouncer" that quickly rejects physically impossible structures.
- product_relaxer.py: Wraps CHGNet to perform GPU-accelerated atomic relaxation and force prediction.
- product_oracle.py: Wraps MEGNet to predict final electronic properties (band gap) in milliseconds.
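At its core, a "Geometric Bouncer" reduces to a minimum-image distance check between all atom pairs. A simplified sketch for an orthorhombic cell (the real sentinel.py presumably handles arbitrary lattices; the 0.7 Å hard-sphere cutoff is an illustrative assumption):

```python
import itertools
import math

def passes_geometric_check(frac_coords, cell_lengths, min_dist=0.7):
    """Reject structures with overlapping atoms.
    frac_coords: list of (x, y, z) fractional coordinates in [0, 1).
    cell_lengths: (a, b, c) in Angstroms. Assumes an orthorhombic
    cell for simplicity -- a general check needs the full lattice
    matrix and neighboring image cells."""
    for p, q in itertools.combinations(frac_coords, 2):
        d2 = 0.0
        for c1, c2, length in zip(p, q, cell_lengths):
            delta = abs(c1 - c2) % 1.0
            delta = min(delta, 1.0 - delta)  # minimum-image convention
            d2 += (delta * length) ** 2
        if math.sqrt(d2) < min_dist:
            return False  # atoms overlap: physically impossible
    return True
```

Because this is pure arithmetic with no neural network in the loop, it can veto a hopeless structure in microseconds, long before CHGNet or MEGNet are ever invoked.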
- generate_crystals.py: The high-throughput "Factory" script. Features memory purging, timeout fail-safes, and fast density filtering to mass-produce 5,000+ candidate crystals overnight.
- final_relaxation.py: The "QA Lab." Subjects surviving candidates to 500-step deep relaxation and re-verifies band gaps to confirm ground-state stability.
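The fast density filter can be implemented as a cheap mass-over-volume sanity gate that discards unphysical candidates before any expensive GNN call. A sketch, where the 1-20 g/cm³ acceptance window is an illustrative assumption, not the project's actual bounds:

```python
AMU_IN_GRAMS = 1.66053906660e-24  # one atomic mass unit in grams
A3_IN_CM3 = 1e-24                 # one cubic Angstrom in cm^3

def density_g_per_cm3(atomic_masses_amu, cell_volume_A3):
    """Crystal density from atomic masses (amu) and cell volume (A^3)."""
    total_mass_g = sum(atomic_masses_amu) * AMU_IN_GRAMS
    return total_mass_g / (cell_volume_A3 * A3_IN_CM3)

def passes_density_filter(atomic_masses_amu, cell_volume_A3,
                          lo=1.0, hi=20.0):
    """Cheap sanity gate: real inorganic solids rarely fall outside
    roughly 1-20 g/cm^3 (illustrative bounds)."""
    return lo <= density_g_per_cm3(atomic_masses_amu, cell_volume_A3) <= hi
```

As a check, silicon's conventional cell (8 atoms of 28.0855 amu in about 160.2 Å³) comes out near its known density of 2.33 g/cm³.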