haolpku/K12-Dataset

K12-KGraph

A Curriculum-Aligned Knowledge Graph for Benchmarking & Training Educational LLMs

Dataset on Hugging Face · Project Page · Paper · License

Built from the official People's Education Press (PEP) Chinese K–12 textbooks, K12-KGraph aligns the same scientific concept across definition, formula, experiment, exercise, structural location, and relational neighborhood.

🌐 Explore the interactive project page →


🌟 Why K12-KGraph?

Modern LLMs can answer "what is the Pythagorean theorem?" but struggle with curriculum cognition, the structured understanding of:

  • 🧭 What are the prerequisites of a concept?
  • 🔬 Which experiment verifies it?
  • 📝 Which exercises test it?
  • 📚 Where does it live in the textbook?
  • 🕸 What are its taxonomic and relational neighbors?

K12-KGraph is the first open, multi-subject, official-textbook-grounded knowledge graph that explicitly aligns all five dimensions around each STEM concept, yielding two ready-to-use AI assets:

| | K12-Bench | K12-Train |
|---|---|---|
| Size | 23,640 multi-select questions | 2,267 instruction–response pairs |
| Purpose | Evaluate structural curriculum cognition | Teach it via KG-guided SFT |
| Task families / sources | Ground · Prereq · Neighbor · Evidence · Locate | Node-grounded + Edge-grounded + Deterministic templates |
| Headline result | Gemini-3-Flash reaches only 57.1% EM | Beats 8 mainstream SFT corpora on GaokaoBench & EduEval under a strict 2,300-sample budget |

📊 Leaderboard Snapshot (K12-Bench, zero-shot)

Instance-level macro F1 and exact match, in %.

| Model | Overall EM | Overall F1 |
|---|---|---|
| Random guess baseline | 6.7 | 36.4 |
| Meta-LLaMA-3-8B-Instruct | 7.2 | 52.6 |
| GLM-4.7-Flash | 31.7 | 63.9 |
| GPT-4o | 31.1 | 65.9 |
| Qwen3-32B | 42.6 | 69.5 |
| Gemma-4-31B-IT | 46.4 | 69.5 |
| GPT-5.2 | 42.8 | 68.0 |
| Gemini-2.5-Flash | 48.3 | 66.7 |
| Gemini-3-Flash | 57.1 | 73.0 |

Even the strongest proprietary model leaves more than 40% of items unsolved on Prereq and Neighbor, the tasks requiring directed, structural reasoning. See the project page for the full 5-task breakdown.
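
Both leaderboard metrics can be computed per question and then averaged. The sketch below is one plausible reading of "exact match" and "instance-level macro F1" for multi-select items, assuming each prediction and each gold answer is a set of option letters. The function names are illustrative and are not taken from the repository's eval/ runner.

```python
def exact_match(pred: set, gold: set) -> float:
    """1.0 only when the predicted option set equals the gold set."""
    return 1.0 if pred == gold else 0.0


def instance_f1(pred: set, gold: set) -> float:
    """F1 between the predicted and gold option sets of one question."""
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def score(preds: list, golds: list) -> dict:
    """Average both metrics over all questions and report them in %."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equally long")
    n = len(golds)
    em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / n
    f1 = sum(instance_f1(p, g) for p, g in zip(preds, golds)) / n
    return {"em": round(100 * em, 1), "f1": round(100 * f1, 1)}


# Toy data (made up): one exact hit, one partial hit, one miss.
preds = [{"A", "C"}, {"B"}, {"A", "B"}]
golds = [{"A", "C"}, {"B", "D"}, {"C"}]
print(score(preds, golds))  # {'em': 33.3, 'f1': 55.6}
```

Under this reading a partially correct answer earns F1 credit but no EM credit, which is consistent with F1 sitting well above EM for every row of the table.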


🗺️ What's in this Repository?

K12-Dataset/
├── src/
│   ├── kg/          # Knowledge-graph construction pipeline
│   ├── benchmark/   # K12-Bench generation from graph queries
│   ├── sft_qa/      # K12-Train synthesis (node & edge grounded)
│   └── utils/       # Shared config / LLM client / IO
├── eval/            # Multiple-choice evaluation runner (OpenAI / vLLM)
├── config/          # Default pipeline configuration
├── demo/            # Trimmed JSON/JSONL samples
├── books.yaml       # Book registry
├── docs/img/        # README figures
└── requirements.txt

Pipeline flow:

PDF textbooks ─► MinerU parsing ─► Section split ─► GPT-5.2 schema-constrained extraction
               ─► Hierarchical merge (book → subject → global) ─► DAG validation + expert review
               ─► K12-KGraph ─► K12-Bench (queries) + K12-Train (QA synthesis)
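
The DAG validation step in this flow can be illustrated with a standard topological-sort check (Kahn's algorithm). This is a minimal sketch, not the repository's implementation: it assumes edges are available as (source, target) tuples, and `find_cycle_nodes` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def find_cycle_nodes(edges):
    """Return the nodes that lie on, or downstream of, a cycle.

    An empty result means the edge list forms a DAG.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if dst not in successors[src]:  # ignore duplicate edges
            successors[src].add(dst)
            indegree[dst] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = set()
    while queue:
        node = queue.popleft()
        visited.add(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Anything never removed is blocked by a cycle.
    return nodes - visited


# Toy prerequisite edges (made up for illustration).
edges = [("integer", "rational number"), ("rational number", "real number")]
print(find_cycle_nodes(edges))                                 # set()
print(find_cycle_nodes(edges + [("real number", "integer")]))  # all three nodes
```

Running such a check separately on each relation that must be acyclic keeps a cycle in one relation from being masked by edges of another.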

🚀 Quick Start

1. Install

git clone https://github.com/haolpku/K12-Dataset.git
cd K12-Dataset
pip install -r requirements.txt

If you plan to run the graph pipeline from PDFs, also install MinerU and make sure magic-pdf is callable from your shell (the command name is configurable in config/default.yaml).

2. Load the released dataset

from datasets import load_dataset

kg    = load_dataset("lhpku20010120/K12-KGraph", split="train")
bench = load_dataset("lhpku20010120/K12-KGraph", name="bench", split="test")
train = load_dataset("lhpku20010120/K12-KGraph", name="train", split="train")

3. Build the graph from scratch

cp config/.env.example config/.env         # add your OPENAI_API_KEY etc.
python src/kg/run_pipeline.py \
    --config config/default.yaml \
    --filter-prefix <YourBookPrefix>       # e.g. math_7a_rjb

4. Derive Bench and SFT data

python src/benchmark/run_pipeline.py --help
python src/sft_qa/run_pipeline.py   --help

5. Evaluate a model on K12-Bench

cp eval/configs/.env.example eval/configs/.env
chmod +x eval/run.sh
./eval/run.sh <model-config-stem>          # eval/configs/models/<stem>.yaml

🧱 Schema at a Glance

7 node types: Book · Chapter · Section · Concept · Skill · Experiment · Exercise

9 edge types: is_a · prerequisites_for · relates_to · verifies · tests_concept · tests_skill · appears_in · leads_to · is_part_of

Every Concept carries name, definition, importance, and optionally formula, aliases, and examples. Every Experiment carries instruments, is_student, process, phenomena, and conclusion. The full schema and attribute specification will be in docs/schema.md (coming soon) and is available on the project page.
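
To make the attribute lists concrete, the two node types can be sketched as Python dataclasses. Only the field names come from the description above; the Python types, the defaults, and the sample values are assumptions, and the released JSON may represent these attributes differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    # Required attributes (types assumed).
    name: str
    definition: str
    importance: str
    # Optional attributes.
    formula: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    instruments: List[str]
    is_student: bool  # assumed to flag a student-performed experiment
    process: str
    phenomena: str
    conclusion: str


# Illustrative instance; the values are made up, not taken from the dataset.
pythagoras = Concept(
    name="Pythagorean theorem",
    definition="In a right triangle, the square of the hypotenuse equals "
               "the sum of the squares of the two legs.",
    importance="core",
    formula="a^2 + b^2 = c^2",
)
print(pythagoras.name, pythagoras.aliases)
```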

A concrete example: how the same prerequisites_for subgraph yields a K12-Bench item (A) and a K12-Train QA pair (B)
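
The idea of deriving both assets from one subgraph can be sketched in a few lines. Everything here is hypothetical: the question wording, the four-option layout, the distractor sampling, and the function names are illustrative assumptions, and edges are assumed to be (prerequisite, dependent) pairs.

```python
import random
import string


def make_prereq_item(target, edges, all_concepts, n_options=4, seed=0):
    """Build one multi-select item asking for the direct prerequisites of target."""
    rng = random.Random(seed)
    gold = sorted({src for src, dst in edges if dst == target})
    # Distractors are concepts that are neither gold answers nor the target.
    pool = sorted(set(all_concepts) - set(gold) - {target})
    n_distractors = max(0, n_options - len(gold))
    options = gold + rng.sample(pool, min(n_distractors, len(pool)))
    rng.shuffle(options)
    labels = string.ascii_uppercase[: len(options)]
    return {
        "question": f"Which of the following are direct prerequisites of '{target}'?",
        "options": dict(zip(labels, options)),
        "answer": sorted(l for l, o in zip(labels, options) if o in gold),
    }


def make_prereq_qa(target, edges):
    """Build one instruction-response pair from the same edges."""
    prereqs = sorted({src for src, dst in edges if dst == target})
    return {
        "instruction": f"What should a student learn before studying '{target}'?",
        "response": f"Before '{target}', a student should learn: {', '.join(prereqs)}.",
    }


# Toy subgraph (made up for illustration).
edges = [
    ("ratio", "similar triangles"),
    ("angle", "similar triangles"),
    ("similar triangles", "trigonometric ratios"),
]
concepts = ["ratio", "angle", "similar triangles",
            "trigonometric ratios", "prime number", "function"]
print(make_prereq_item("similar triangles", edges, concepts))
print(make_prereq_qa("similar triangles", edges))
```

Because the gold answer is read directly from graph edges, the item can be checked automatically, while the QA pair phrases the same fact as free text for fine-tuning.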

📚 Dataset Composition

K12-Bench distribution across subjects, task families, and difficulty.
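| Subject | Books | Concepts | Skills | Experiments | Exercises |
|---|---|---|---|---|---|
| Mathematics | 23 | 1,475 | 428 | 0 | 471 |
| Physics | 9 | 1,154 | 197 | 220 | 186 |
| Chemistry | 7 | 2,302 | 451 | 309 | 270 |
| Biology | 9 | 1,648 | 288 | 123 | 244 |
| Total | 48 | 6,579 | 1,364 | 652 | 1,171 |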

🧪 Quality Assurance

  • Fleiss' κ = 0.84 overall, from 12 subject-qualified expert annotators (κ by relation: is_a 0.91, prerequisites_for 0.82, relates_to 0.69, verifies 0.88)
  • Automatic DAG validation on is_a and prerequisites_for subgraphs
  • Per-edge evidence field linking back to textbook source text for auditability
  • 98.4% of a stratified sample of K12-Bench items verified as "fully correct" in a 3-expert spot-check
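
For readers unfamiliar with the agreement statistic quoted above, Fleiss' κ compares observed rater agreement with the agreement expected by chance. The sketch below implements the standard formula; how the annotations were actually laid out is not described here, so the count-matrix input and the toy ratings are assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters who put item i in category j.

    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean share of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_categories)
    )
    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_bar - p_e) / (1 - p_e)


# Toy data (made up): 3 raters judge 4 edges as [correct, incorrect].
ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(ratings), 3))  # 0.625
```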

🌈 Explore Interactively

Want to browse nodes, sample bench items, or inspect the training data without cloning the repo? The companion project page offers a rich interactive view.


🤝 Contribute

Contributions are welcome! We particularly appreciate:

  • 🏫 Adding support for other textbook publishers (BNU, Jiangsu, etc.)
  • 🧪 New task families that extend beyond the current 5
  • 🐛 Bug reports and quality issues on existing graph edges (please cite the specific edge ID)
  • 🌍 Translation of the schema/documentation into additional languages

Open an issue or pull request; GitHub Issues are checked within 48 hours.


📖 Citation

If you find K12-KGraph useful in your research, please cite:

@misc{k12kgraph2026,
  title        = {K12-KGraph: A Curriculum-Aligned Knowledge Graph for
                  Benchmarking and Training Educational LLMs},
  author       = {Hao Liang and others},
  year         = {2026},
  howpublished = {Submitted to NeurIPS 2026 Evaluations and Datasets Track},
  url          = {https://github.com/haolpku/K12-Dataset}
}

📄 License

  • Dataset (graph, benchmark, training data): CC BY-NC-SA 4.0
  • Code (this repository): MIT

Made with care by the K12-KGraph team · Project Page · Dataset · Chinese README (中文)
