A command-line tool to convert git commit history to Parquet format, including unified diffs for data analysis and AI applications.
npm install -g git2parquet

# Export git history of current repo to gitlog.parquet
git2parquet
# Export to custom filename
git2parquet commits.parquet
# Export and open with hyperparam
git2parquet --open
# Export to custom file and open with hyperparam
git2parquet commits.parquet --open

The generated Parquet file contains the following columns:
- hash (STRING): Git commit hash
- authorName (STRING): Author's name
- authorEmail (STRING): Author's email address
- date (TIMESTAMP): Commit date in ISO format
- subject (STRING): Commit message subject line
- diff (STRING): Unified diff showing file changes
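
To make the columns above concrete, here is a rough sketch of how the same fields can be read straight from git with its standard format placeholders. This illustrates the data only; it is not a description of how git2parquet gathers it. The function names `list_commit_fields` and `show_commit_diff` are hypothetical, and mapping the date column to the author date (`%aI`) rather than the committer date is an assumption.

```shell
# Sketch only: prints one tab-separated record per commit.
# Placeholder -> column mapping (the mapping itself is an assumption):
#   %H   full commit hash                -> hash
#   %an  author name                     -> authorName
#   %ae  author email                    -> authorEmail
#   %aI  author date, strict ISO 8601    -> date
#   %s   subject line                    -> subject
list_commit_fields() {
  # Extra arguments (for example: -n 10) are passed through to git log.
  git log --pretty=format:'%H%x09%an%x09%ae%x09%aI%x09%s' "$@"
}

# Prints the unified diff for one commit, comparable to the diff column.
# An empty --format= suppresses the commit header so only the patch remains.
show_commit_diff() {
  git show --format= --patch "$1"
}
```

For example, `list_commit_fields -n 5` prints the five most recent commits, and `show_commit_diff HEAD` prints the patch for the latest one.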
Requirements:

- Node.js
- Must be run from within a git repository
- Git must be available in PATH
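
The requirements above can be checked before running the tool. The following is a minimal sketch, assuming a POSIX shell; `check_git2parquet_prereqs` is a hypothetical helper and is not part of git2parquet.

```shell
# Hypothetical preflight check; not shipped with git2parquet.
# Returns 0 when all three requirements hold, 1 otherwise.
check_git2parquet_prereqs() {
  # Node.js must be installed and on PATH.
  command -v node >/dev/null 2>&1 || {
    echo "Node.js not found in PATH" >&2
    return 1
  }
  # Git must be available in PATH.
  command -v git >/dev/null 2>&1 || {
    echo "git not found in PATH" >&2
    return 1
  }
  # The current directory must be inside a git working tree.
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || {
    echo "not inside a git repository" >&2
    return 1
  }
}
```

A typical use would be `check_git2parquet_prereqs && git2parquet`, so the export only runs when the environment is ready.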
Options:

- --help, -h: Show help message
- --open: Open the generated Parquet file with hyperparam after export
Use cases:

- Analyzing code change patterns over time
- Training ML models on code evolution
- Creating datasets for software engineering research
- Building commit history dashboards
Hyperparam is a tool for exploring and curating AI datasets. The Hyperparam CLI (npx hyperparam) is a local viewer for ML datasets that launches a small HTTP server and opens your browser to interactively explore the generated git2parquet output file.