A Curriculum-Aligned Knowledge Graph for Benchmarking & Training Educational LLMs
Built from the official People's Education Press (PEP) Chinese K–12 textbooks, K12-KGraph aligns the same scientific concept across definition, formula, experiment, exercise, structural location, and relational neighborhood.
Modern LLMs can answer "what is the Pythagorean theorem?" but struggle with curriculum cognition, the structured understanding of:
- What are the prerequisites of a concept?
- Which experiment verifies it?
- Which exercises test it?
- Where does it live in the textbook?
- What are its taxonomic and relational neighbors?
K12-KGraph is the first open, multi-subject, official-textbook-grounded knowledge graph that explicitly aligns all five dimensions around each STEM concept, yielding two ready-to-use AI assets:
| | K12-Bench | K12-Train |
|---|---|---|
| Size | 23,640 multi-select questions | 2,267 instruction–response pairs |
| Purpose | Evaluate structural curriculum cognition | Teach it via KG-guided SFT |
| Task families / sources | Ground · Prereq · Neighbor · Evidence · Locate | Node-grounded + Edge-grounded + Deterministic templates |
| Headline result | Gemini-3-Flash reaches only 57.1% EM | Beats 8 mainstream SFT corpora on GaokaoBench & EduEval under a strict 2,300-sample budget |
Instance-level macro F1 and exact match, in %.
| Model | Overall EM | Overall F1 |
|---|---|---|
| Random guess baseline | 6.7 | 36.4 |
| Meta-LLaMA-3-8B-Instruct | 7.2 | 52.6 |
| GLM-4.7-Flash | 31.7 | 63.9 |
| GPT-4o | 31.1 | 65.9 |
| Qwen3-32B | 42.6 | 69.5 |
| Gemma-4-31B-IT | 46.4 | 69.5 |
| GPT-5.2 | 42.8 | 68.0 |
| Gemini-2.5-Flash | 48.3 | 66.7 |
| Gemini-3-Flash | 57.1 | 73.0 |
Even the strongest proprietary model leaves more than 40% of items unsolved on Prereq and Neighbor, the tasks that require directed, structural reasoning. See the project page for the full 5-task breakdown.
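The headline metrics can be reproduced for any answer format that reduces to option sets. Below is a minimal sketch of per-item exact match and instance-level (macro-averaged) F1, assuming predictions and gold answers are represented as sets of option letters; the repository's concrete answer encoding may differ.

```python
def score_item(pred: set, gold: set) -> tuple:
    """Return (exact_match, F1) for one multi-select item."""
    em = 1.0 if pred == gold else 0.0
    if not pred or not gold:
        # Degenerate case: empty prediction or gold set
        return em, em
    tp = len(pred & gold)
    if tp == 0:
        return em, 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return em, 2 * precision * recall / (precision + recall)

def score_dataset(items) -> tuple:
    """Instance-level macro averages: mean EM and mean F1 over items."""
    ems, f1s = zip(*(score_item(p, g) for p, g in items))
    return sum(ems) / len(ems), sum(f1s) / len(f1s)

# Toy example with hypothetical option letters
items = [
    ({"A", "C"}, {"A", "C"}),  # exact match -> EM 1, F1 1
    ({"A"}, {"A", "B"}),       # partial     -> EM 0, F1 2/3
]
em, f1 = score_dataset(items)
```

Because a partially correct selection still earns partial F1 credit, F1 sits well above EM for every model in the table, which matches the reported gap.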
```
K12-Dataset/
├── src/
│   ├── kg/          # Knowledge-graph construction pipeline
│   ├── benchmark/   # K12-Bench generation from graph queries
│   └── sft_qa/      # K12-Train synthesis (node & edge grounded)
│   └── utils/       # Shared config / LLM client / IO
├── eval/            # Multiple-choice evaluation runner (OpenAI / vLLM)
├── config/          # Default pipeline configuration
├── demo/            # Trimmed JSON/JSONL samples
├── books.yaml       # Book registry
├── docs/img/        # README figures
└── requirements.txt
```
Pipeline flow:

```
PDF textbooks → MinerU parsing → Section split → GPT-5.2 schema-constrained extraction
  → Hierarchical merge (book → subject → global) → DAG validation + expert review
  → K12-KGraph → K12-Bench (queries) + K12-Train (QA synthesis)
```
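The DAG-validation step above can be sketched with Kahn's algorithm: repeatedly strip zero-in-degree nodes; anything left over sits on (or is fed by) a cycle. This is an illustrative reimplementation under that assumption, not the repository's `src/kg` code, and the edge tuples are hypothetical.

```python
from collections import defaultdict

def dag_violations(edges):
    """Return the set of nodes that never reach in-degree zero,
    i.e. nodes on or downstream of a cycle. Empty set => valid DAG."""
    indeg = defaultdict(int)
    succ = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
        nodes.update((src, dst))
    # Start from all roots (in-degree zero) and peel the graph
    queue = [n for n in nodes if indeg[n] == 0]
    done = set()
    while queue:
        n = queue.pop()
        done.add(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return nodes - done

# A valid prerequisite chain yields no violations;
# a cycle flags every node on it.
ok = dag_violations([("fractions", "ratios"), ("ratios", "percentages")])
bad = dag_violations([("a", "b"), ("b", "c"), ("c", "a")])
```

Running this check per relation type (on the `is_a` and `prerequisites_for` subgraphs separately) mirrors the validation described in the quality-control section.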
```bash
git clone https://github.com/haolpku/K12-Dataset.git
cd K12-Dataset
pip install -r requirements.txt
```

If you plan to run the graph pipeline from PDFs, also install MinerU and make `magic-pdf` callable from the shell (the command name is configurable via `config/default.yaml`).
```python
from datasets import load_dataset

kg = load_dataset("lhpku20010120/K12-KGraph", split="train")
bench = load_dataset("lhpku20010120/K12-KGraph", name="bench", split="test")
train = load_dataset("lhpku20010120/K12-KGraph", name="train", split="train")
```

```bash
cp config/.env.example config/.env   # add your OPENAI_API_KEY etc.
```
```bash
python src/kg/run_pipeline.py \
    --config config/default.yaml \
    --filter-prefix <YourBookPrefix>   # e.g. math_7a_rjb
python src/benchmark/run_pipeline.py --help
python src/sft_qa/run_pipeline.py --help
```

```bash
cp eval/configs/.env.example eval/configs/.env
chmod +x eval/run.sh
./eval/run.sh <model-config-stem>   # eval/configs/models/<stem>.yaml
```

7 node types: Book · Chapter · Section · Concept · Skill · Experiment · Exercise

9 edge types: `is_a` · `prerequisites_for` · `relates_to` · `verifies` · `tests_concept` · `tests_skill` · `appears_in` · `leads_to` · `is_part_of`
Every Concept carries `name`, `definition`, `importance`, and optional `formula`, `aliases`, `examples`. Every Experiment carries `instruments`, `is_student`, `process`, `phenomena`, `conclusion`. Full schema and attribute specification in `docs/schema.md` (coming soon) or on the project page.
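As a usage sketch, node records like those described above can be filtered by type and attribute. The dict layout here is an assumption for illustration and not the released schema; check `docs/schema.md` for the authoritative field list.

```python
# Toy node records mirroring the attribute list above (hypothetical layout)
nodes = [
    {"type": "Concept", "name": "Pythagorean theorem",
     "definition": "In a right triangle, a^2 + b^2 = c^2.",
     "importance": "high", "formula": "a^2 + b^2 = c^2"},
    {"type": "Concept", "name": "Photosynthesis",
     "definition": "Conversion of light energy into chemical energy.",
     "importance": "high", "formula": None},
    {"type": "Experiment", "name": "Measuring g with a pendulum",
     "instruments": ["pendulum", "stopwatch"], "is_student": True},
]

# Select Concept nodes that carry a formula attribute
formula_concepts = [
    n["name"] for n in nodes
    if n.get("type") == "Concept" and n.get("formula")
]
```

The same pattern applies to rows loaded via `load_dataset`, since each row behaves like a dict keyed by column name.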
| Subject | Books | Concepts | Skills | Experiments | Exercises |
|---|---|---|---|---|---|
| Mathematics | 23 | 1,475 | 428 | 0 | 471 |
| Physics | 9 | 1,154 | 197 | 220 | 186 |
| Chemistry | 7 | 2,302 | 451 | 309 | 270 |
| Biology | 9 | 1,648 | 288 | 123 | 244 |
| Total | 48 | 6,579 | 1,364 | 652 | 1,171 |
- Fleiss' κ = 0.84 overall, from 12 subject-qualified expert annotators (κ by relation: `is_a` 0.91, `prerequisites_for` 0.82, `relates_to` 0.69, `verifies` 0.88)
- Automatic DAG validation on the `is_a` and `prerequisites_for` subgraphs
- Per-edge `evidence` field linking back to textbook source text for auditability
- 98.4% of stratified K12-Bench items verified as "fully correct" in a 3-expert spot-check
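The reported Fleiss' κ can be recomputed from raw annotation counts with the standard formula; a minimal sketch follows, using toy counts rather than the actual annotation data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-item category counts.
    Each inner list sums to the same number of raters n."""
    N = len(ratings)
    n = sum(ratings[0])
    k = len(ratings[0])
    # Marginal proportion of each category across all items
    p_j = [sum(item[j] for item in ratings) / (N * n) for j in range(k)]
    # Mean observed per-item agreement
    P_bar = sum((sum(c * c for c in item) - n) / (n * (n - 1))
                for item in ratings) / N
    # Chance agreement
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# 3 raters, 2 categories: perfect agreement on both items gives kappa = 1
perfect = fleiss_kappa([[3, 0], [0, 3]])
```

Values in the 0.8+ range, as reported for `is_a` and `verifies`, are conventionally read as near-perfect agreement.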
Want to browse nodes, sample bench items, or inspect the training data without cloning the repo? The companion project page offers a rich interactive view:
Contributions are welcome! We particularly appreciate:
- Adding support for other textbook publishers (BNU, Jiangsu, etc.)
- New task families that extend beyond the current 5
- Bug reports and quality issues on existing graph edges (please cite the specific edge ID)
- Translation of the schema/documentation into additional languages

Open an issue or pull request; GitHub Issues are monitored within 48 hours.
If you find K12-KGraph useful in your research, please cite:
```bibtex
@misc{k12kgraph2026,
  title        = {K12-KGraph: A Curriculum-Aligned Knowledge Graph for
                  Benchmarking and Training Educational LLMs},
  author       = {Hao Liang and others},
  year         = {2026},
  howpublished = {Submitted to NeurIPS 2026 Evaluations and Datasets Track},
  url          = {https://github.com/haolpku/K12-Dataset}
}
```

- Dataset (graph, benchmark, training data): CC BY-NC-SA 4.0
- Code (this repository): MIT

