This engine sources insights about project health from the GitHub archive.
To get started with the insights tool:
- Ensure you have Ruby installed (2.7 or higher recommended)
- Clone this repository
- Install dependencies:
bundle install
- Set up your GitHub API token (required for API access):
export GITHUB_TOKEN=your_github_personal_access_token
- Run the tool with an example repository:
ruby build.rb -r etcd-io/etcd
This will generate a report for the etcd-io/etcd repository for the last 30 days (default). The data will be stored in repo.db by default.
- View the generated report in the reports directory:
open reports/etcd-io-etcd-lottery.html
build.rb # Main script to build reports
Gemfile # Ruby dependencies
repo.db # SQLite database for storing repository data
lib/
data_fetcher.rb # Fetches data from GitHub API
database.rb # Database operations
github_client.rb # GitHub API client
html_generator.rb # Generates HTML reports
lottery_factor_tool.rb # Calculate lottery factor metrics
reports/ # Generated HTML reports
Working with @jpmcb to prototype a tool that sources project health insights from a selection of CNCF projects. This is still a WIP but we will follow this path:
- POC for sourcing archive for a selection of repositories (probably a yml list)
- Store results in SQLite (replacing the earlier plan of a Postgres DB or DuckDB)
- Generate lottery factor as the first insight.
- Nice to haves or stretch goal: Contributor Confidence, YOLO coder, and Outside Contributor.
The project uses Minitest for testing, with mocks and stubs so that API interactions can be tested without making actual API calls.
To run the test suite:
# Run all tests
bundle exec rake test
# Run tests with coverage report
bundle exec rake coverage
# Run a single test file
bundle exec ruby -Ilib:test test/path/to/test_file.rb
# Run a specific test within a file
bundle exec ruby -Ilib:test test/path/to/test_file.rb -n test_method_name
Test coverage is tracked using SimpleCov. After running the tests with coverage enabled, you can:
- View the detailed HTML coverage report:
open coverage/index.html
- Generate a coverage badge for your README:
bundle exec rake coverage_badge
- See the coverage breakdown by file in the terminal:
bundle exec rake coverage
Tests automatically run on GitHub Actions for all pull requests to ensure code quality. The workflow:
- Runs all tests
- Checks code style with RuboCop
- Generates and reports test coverage metrics
The lottery factor indicates how dependent a project is on a small number of contributors. A high lottery factor suggests that if those key contributors left (won the lottery), the project might struggle.
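As a rough illustration of the idea, the sketch below takes merged pull request counts per author and returns the smallest group of contributors that accounts for at least half of them. The smaller that group, the more the project depends on a few people. The `lottery_factor` helper, the 50% threshold, and the sample data are assumptions made for illustration; the calculation in lib/lottery_factor_tool.rb may differ.

```ruby
# Hypothetical helper, not the tool's actual implementation.
# Returns the smallest set of authors whose merged PRs make up
# at least `threshold` (assumed to be 50%) of all merged PRs.
def lottery_factor(pr_counts, threshold: 0.5)
  total = pr_counts.values.sum
  return [] if total.zero?

  covered = 0
  key_contributors = []
  # Walk authors from most to least active until the threshold is covered.
  pr_counts.sort_by { |_author, count| -count }.each do |author, count|
    key_contributors << author
    covered += count
    break if covered.to_f / total >= threshold
  end
  key_contributors
end

# Made-up sample data: author => merged PR count
sample = { "alice" => 40, "bob" => 25, "carol" => 20, "dave" => 15 }
key = lottery_factor(sample)
puts "#{key.size} contributor(s) cover at least half of merged PRs: #{key.join(', ')}"
```

With this sample, the two most active authors together cover 65 of 100 merged PRs, so the helper returns both of them.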
ruby build.rb -r <owner/repo> [options]
Required parameters:
- -r, --repo REPO: GitHub repository in the format 'owner/repo' (e.g., etcd-io/etcd)
Optional parameters:
- -d, --database FILENAME: SQLite database filename (defaults to repo.db)
- -t, --time-range DAYS: Time range in days to analyze (defaults to 30)
- -o, --output FILENAME: Output HTML filename (default: owner-repo-lottery.html)
- -f, --force: Force fetching new data even if recent data exists
- --top-display COUNT: Number of top contributors to display individually (default: 6)
Example:
ruby build.rb -r etcd-io/etcd
ruby build.rb -r etcd-io/etcd -f # Force fetch new data
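To make the flags and their defaults concrete, here is a minimal sketch of how they could be parsed with Ruby's standard OptionParser. The `parse_options` method and the option hash keys are hypothetical; build.rb may be structured differently. The defaults and the owner-repo-lottery.html naming are taken from the list above.

```ruby
require "optparse"

# Hypothetical sketch of parsing the documented flags; build.rb may differ.
def parse_options(argv)
  options = { database: "repo.db", time_range: 30, force: false, top_display: 6 }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ruby build.rb -r <owner/repo> [options]"
    opts.on("-r", "--repo REPO", "GitHub repository (owner/repo)") { |v| options[:repo] = v }
    opts.on("-d", "--database FILENAME", "SQLite database filename") { |v| options[:database] = v }
    opts.on("-t", "--time-range DAYS", Integer, "Days to analyze") { |v| options[:time_range] = v }
    opts.on("-o", "--output FILENAME", "Output HTML filename") { |v| options[:output] = v }
    opts.on("-f", "--force", "Force fetching new data") { options[:force] = true }
    opts.on("--top-display COUNT", Integer, "Top contributors to list") { |v| options[:top_display] = v }
  end
  parser.parse(argv)

  raise OptionParser::MissingArgument, "-r owner/repo is required" unless options[:repo]

  # Default output name follows the documented owner-repo-lottery.html pattern.
  options[:output] ||= "#{options[:repo].tr('/', '-')}-lottery.html"
  options
end

p parse_options(%w[-r etcd-io/etcd -f])
```

Passing `Integer` to `opts.on` makes OptionParser reject non-numeric values for DAYS and COUNT instead of silently accepting them as strings.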
To debug issues with contributor data, you can inspect the SQLite database directly using the following commands:
# Open the SQLite database
sqlite3 repo.db
# List all tables
.tables
# Show schema for pull_requests table
.schema pull_requests
# Check recent pull requests (last 30 days)
SELECT pr.author, COUNT(*) as pr_count
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
WHERE pr.merged_at >= datetime('now', '-30 days')
GROUP BY pr.author
ORDER BY pr_count DESC;
# View raw pull request data with dates
SELECT pr.author, pr.merged_at
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
ORDER BY pr.merged_at DESC
LIMIT 10;
# Exit SQLite
.quit
Common issues to check:
- Verify pull request data exists in the database
- Check if dates are stored in correct ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)
- Confirm the repository owner and name are correct in the repositories table
- Ensure pull requests have proper merged_at timestamps
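The timestamp check in the list above can be scripted. The sketch below assumes merged_at values are stored as UTC strings in the YYYY-MM-DDTHH:MM:SSZ form; `valid_merged_at?` is a hypothetical helper and the sample values are made up.

```ruby
require "time"

# Assumed storage format from the checklist: YYYY-MM-DDTHH:MM:SSZ (UTC).
ISO8601_UTC = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/

# Hypothetical helper: true when the value matches the expected shape
# and Ruby's ISO 8601 parser accepts it.
def valid_merged_at?(value)
  return false unless value.is_a?(String) && value.match?(ISO8601_UTC)

  Time.iso8601(value)
  true
rescue ArgumentError
  false
end

# Made-up sample values, as they might come back from the pull_requests table
["2024-05-01T12:30:00Z", "2024-05-01 12:30:00", "05/01/2024", nil].each do |value|
  puts "#{value.inspect} => #{valid_merged_at?(value)}"
end
```

A value such as `2024-05-01 12:30:00` (space instead of `T`, no `Z`) fails this check, and a format mismatch like that is one way date filtering can end up excluding rows that are present in the database.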
If no data appears in the HTML report but exists in the database, check:
- The date filtering in the database queries
- The data transformation in the HTML generator
- The GitHub API token permissions and rate limits
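For the last item, one way to check the token is to query GitHub's REST rate limit endpoint directly. This is a minimal sketch using only the standard library; `fetch_rate_limit` and `remaining_core_requests` are hypothetical helpers and are not part of lib/github_client.rb.

```ruby
require "json"
require "net/http"
require "uri"

# Extracts the remaining core API quota from a /rate_limit response body.
# Returns nil if the body has no such field (e.g. an error response).
def remaining_core_requests(body)
  JSON.parse(body).dig("resources", "core", "remaining")
end

# Queries GitHub's rate limit endpoint using the given token.
def fetch_rate_limit(token)
  uri = URI("https://api.github.com/rate_limit")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Accept"] = "application/vnd.github+json"
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
end

# Only performs the network call when run directly with GITHUB_TOKEN set.
if __FILE__ == $PROGRAM_NAME && ENV["GITHUB_TOKEN"]
  response = fetch_rate_limit(ENV["GITHUB_TOKEN"])
  puts "HTTP #{response.code}, remaining core requests: #{remaining_core_requests(response.body).inspect}"
end
```

An HTTP 401 here points to an invalid or expired token, while a remaining count of 0 means the quota is exhausted until the reset time.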