akima6/NOVAGEN


NOVAGEN: AI-Driven Material Discovery Pipeline

NOVAGEN is an end-to-end generative AI pipeline for discovering novel, thermodynamically stable, and functionally specific semiconductor materials. Using a curriculum-learning approach, the system progresses from learning basic crystallographic geometry to predicting complex quantum electronic properties.

🧠 System Architecture Overview

The project is divided into three major stages: Generative Modeling, Multi-Phase Training, and Industrial Deployment. It bridges the gap between deep learning and materials engineering by ensuring all generated materials are chemically valid, physically stable, and synthesizable.

1. The Core Generator (The Brain)

Powered by CrystalFormer, a causal (decoder-only) Transformer adapted for crystallographic data.

  • Autoregressive Generation: Predicts elements, Wyckoff positions, and 3D fractional coordinates atom-by-atom.
  • Symmetry-Aware: Enforces space-group rules before any physics is applied.
  • Lattice Bias Head: A learnable scalar parameter that dynamically optimizes the unit cell volume.
  • Continuous Coordinates: Uses a Von Mises Mixture Density Network (MDN) to handle periodic boundary conditions.
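As a sketch of why a von Mises mixture suits fractional coordinates: the coordinate lives on a circle (0.99 and 0.01 are neighbors), so each coordinate is modeled as an angle. The snippet below samples one coordinate from such a mixture; the function name and parameter layout are illustrative assumptions, not the repo's actual MDN head.

```python
import numpy as np

def sample_fractional_coord(weights, mus, kappas, rng=None):
    """Sample one fractional coordinate from a von Mises mixture.

    The coordinate x in [0, 1) is represented as an angle
    theta = 2*pi*x, so the density wraps correctly across the
    periodic cell boundary. `weights`, `mus`, `kappas` are the
    per-component mixture parameters an MDN head would emit.
    """
    rng = rng or np.random.default_rng()
    k = rng.choice(len(weights), p=weights)   # pick a mixture component
    theta = rng.vonmises(mus[k], kappas[k])   # angle in [-pi, pi)
    return (theta / (2 * np.pi)) % 1.0        # wrap back to [0, 1)

x = sample_fractional_coord([0.7, 0.3], [0.0, np.pi], [10.0, 10.0])
assert 0.0 <= x < 1.0
```

A plain Gaussian mixture would put vanishing probability mass on atoms sitting near the cell edge; the wrapped distribution avoids that artifact.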

2. The Training Curriculum (Evolutionary Loop)

The model is fine-tuned with policy-gradient reinforcement learning (PPO), optimized with Adam, progressing through three distinct phases:

  • Phase I: Geometric Stabilization (Spatial): Penalizes atomic overlaps and singularities. Teaches the model how to pack atoms into a solid.
  • Phase II: Thermodynamic Physics (Stability): Minimizes the crystal's free energy as estimated by the CHGNet graph neural network (GNN).
  • Phase III: Functional Properties (Lab-Grade): Targets specific electronic properties, such as a 2.8 eV Band Gap, using the MEGNet Oracle.
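The three phases above can be read as one reward function whose definition changes per phase. The sketch below is a hypothetical shaping of that reward; the thresholds and signs are assumptions, not the repo's actual values (only the 2.8 eV target comes from the description above).

```python
def phase_reward(phase, min_dist=None, energy_per_atom=None,
                 band_gap=None, target_gap=2.8):
    """Hypothetical per-phase reward mirroring the three-phase curriculum."""
    if phase == 1:                          # geometric: penalize overlaps
        return -max(0.0, 1.0 - min_dist)    # assumed 1 Å clearance target
    if phase == 2:                          # thermodynamic: low energy is good
        return -energy_per_atom             # e.g. a CHGNet eV/atom estimate
    if phase == 3:                          # functional: hit the target gap
        return -abs(band_gap - target_gap)  # 2.8 eV target from Phase III
    raise ValueError(f"unknown phase: {phase}")

assert phase_reward(1, min_dist=2.0) == 0.0   # well-separated atoms: no penalty
assert phase_reward(3, band_gap=2.8) == 0.0   # exactly on target
```

Keeping the reward scalar and bounded per phase is what lets the same PPO loop run unchanged while only the reward engine is swapped out.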

3. Deployment & Validation (The Factory)

High-throughput batch generation of crystal candidates, followed by rigorous deep-relaxation validation to output lab-ready .cif files.


📂 Repository Structure

Configuration & Initialization

  • config.yaml: The architectural blueprint defining Transformer layers, attention heads, embedding sizes, and vocabulary limits.
  • generator_service.py: Initializes the Transformer instance from the config and injects the pre-trained state dictionary (.pt checkpoint) for inference and training.
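For orientation, a config of this kind typically looks like the fragment below; the key names and values are illustrative assumptions, not the repo's actual schema.

```yaml
# Illustrative shape only -- the actual keys in config.yaml may differ.
model:
  num_layers: 8          # Transformer decoder blocks
  num_heads: 8           # attention heads per block
  embedding_dim: 512     # token embedding size
vocab:
  max_elements: 94       # element vocabulary limit
  max_wyckoff: 27        # Wyckoff-position vocabulary limit
```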

Training Scripts

  • train_phase1_spatial.py: RL script for geometric stability. Uses low learning rates and gradient accumulation.
  • train_phase2_physics.py: RL script integrating CHGNet to reward low-energy states. Includes active teaching filters (e.g., penalties for cells with more than 40 atoms).
  • train_phase3_properties.py: RL script targeting specific band gaps using the CPU-bound MEGNet Oracle.
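train_phase1_spatial.py is described as using gradient accumulation. The sketch below shows the general mechanic, averaging micro-batch gradients before a single optimizer update, with a toy NumPy gradient function; the repo's actual loop is a PyTorch/PPO implementation, so treat this as an assumption-laden illustration.

```python
import numpy as np

def accumulate_gradients(grad_fn, batches, accum_steps):
    """Average gradients over `accum_steps` micro-batches before one
    parameter update, emulating a larger effective batch size on
    limited GPU memory."""
    total = None
    for batch in batches[:accum_steps]:
        g = grad_fn(batch) / accum_steps   # scale each micro-batch gradient
        total = g if total is None else total + g
    return total

# Toy check: gradient 2*x, accumulated over two micro-batches.
grad_fn = lambda b: 2 * np.asarray(b, dtype=float)
g = accumulate_gradients(grad_fn, [[1.0, 2.0], [3.0, 4.0]], accum_steps=2)
assert np.allclose(g, [4.0, 6.0])   # elementwise mean of [2,4] and [6,8]
```

Combined with the low learning rate the script is said to use, this keeps each RL update small and stable.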

Reward Engines & Physics Oracles

  • sentinel.py / reward_phase1.py: The "Geometric Bouncer" that quickly rejects physically impossible structures.
  • product_relaxer.py: Wraps CHGNet to perform GPU-accelerated atomic relaxation and force prediction.
  • product_oracle.py: Wraps MEGNet to predict final electronic properties (Band Gap) in milliseconds.
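The core of a "Geometric Bouncer" is a cheap minimum-interatomic-distance check under periodic boundary conditions. The sketch below uses the minimum-image convention in fractional space (valid for reasonably compact cells); it is a minimal stand-in for whatever sentinel.py actually computes.

```python
import numpy as np

def min_pairwise_distance(frac_coords, lattice):
    """Smallest interatomic distance under periodic boundary conditions,
    via the minimum-image convention. A geometric bouncer can reject any
    structure whose result falls below a hard cutoff (e.g. ~0.7 Å)."""
    frac = np.asarray(frac_coords, dtype=float)
    best = np.inf
    for i in range(len(frac)):
        for j in range(i + 1, len(frac)):
            d = frac[i] - frac[j]
            d -= np.round(d)                        # minimum image in fractional space
            best = min(best, np.linalg.norm(d @ lattice))
    return best

# Two atoms half a cell apart along a 4 Å cubic edge -> 2 Å separation.
lat = 4.0 * np.eye(3)
assert np.isclose(min_pairwise_distance([[0, 0, 0], [0.5, 0, 0]], lat), 2.0)
```

Because this is O(n²) arithmetic with no neural network in the loop, it can reject impossible structures in microseconds, long before CHGNet or MEGNet is invoked.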

Deployment (Inference)

  • generate_crystals.py: The high-throughput "Factory" script. Features memory purging, timeout fail-safes, and fast-density filtering to mass-produce 5,000+ candidate crystals overnight.
  • final_relaxation.py: The "QA Lab." Subjects surviving candidates to 500-step deep relaxation and re-verifies band gaps to confirm ground-state stability.
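A fast-density filter of the kind generate_crystals.py is said to apply can be as simple as a unit conversion plus a range check. The bounds below are assumptions for illustration, not the repo's actual thresholds.

```python
def density_filter(mass_amu, volume_A3, lo=1.0, hi=25.0):
    """Cheap density sanity check in g/cm^3.

    1 amu/Å^3 = 1.66054 g/cm^3. Structures far outside the plausible
    solid-state range (bounds here are assumed) are discarded before
    any expensive relaxation is attempted.
    """
    AMU_PER_A3_TO_G_PER_CM3 = 1.66054
    rho = mass_amu / volume_A3 * AMU_PER_A3_TO_G_PER_CM3
    return lo <= rho <= hi

# Silicon-like cell: 8 atoms x 28.085 amu in ~160 Å^3 -> ~2.33 g/cm^3.
assert density_filter(224.7, 160.2)
assert not density_filter(1.0, 1000.0)   # near-vacuum "crystal" rejected
```

Like the geometric bouncer, this costs one multiplication per candidate, which is what makes overnight runs of 5,000+ candidates feasible.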
