Skip to content

hyparam/git2parquet

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

git2parquet

npm mit license dependencies

A command-line tool to convert git commit history to Parquet format, including unified diffs for data analysis and AI applications.

Installation

npm install -g git2parquet

Usage

Command Line

# Export git history of current repo to gitlog.parquet
git2parquet

# Export to custom filename
git2parquet commits.parquet

# Export and open with hyperparam
git2parquet --open

# Export to custom file and open with hyperparam
git2parquet commits.parquet --open

Output Schema

The generated Parquet file contains the following columns:

  • hash (STRING): Git commit hash
  • authorName (STRING): Author's name
  • authorEmail (STRING): Author's email address
  • date (TIMESTAMP): Commit date in ISO format
  • subject (STRING): Commit message subject line
  • diff (STRING): Unified diff showing file changes

Requirements

  • Node.js
  • Must be run from within a git repository
  • Git must be available in PATH

Options

  • --help, -h: Show help message
  • --open: Open the generated Parquet file with hyperparam after export

Use Cases

  • Analyzing code change patterns over time
  • Training ML models on code evolution
  • Creating datasets for software engineering research
  • Building commit history dashboards

Hyperparam

Hyperparam is a tool for exploring and curating AI datasets. The Hyperparam CLI (npx hyperparam) is a local viewer for ML datasets that launches a small HTTP server and opens your browser to interactively explore the generated git2parquet output file.

About

CLI tool to export git commits in parquet format

Resources

License

Stars

Watchers

Forks

Packages

No packages published