cncf/insights


insights

This is an engine that sources insights about project health from the GitHub Archive.


Quick Start

To get started with the insights tool:

  1. Ensure you have Ruby installed (2.7 or higher recommended)
  2. Clone this repository
  3. Install dependencies:
    bundle install
    
  4. Set up your GitHub API token (required for API access):
    export GITHUB_TOKEN=your_github_personal_access_token
    
  5. Run the tool with an example repository:
    ruby build.rb -r etcd-io/etcd
    
    This generates a report for the etcd-io/etcd repository covering the last 30 days (the default time range) and stores the fetched data in repo.db (the default database file).
  6. View the generated report in the reports directory:
    open reports/etcd-io-etcd-lottery.html
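Step 4 exports GITHUB_TOKEN for API access. As a rough sketch of how a client can attach that token to GitHub REST API requests, the snippet below uses Ruby's standard net/http. The helper name build_github_request and the example endpoint are illustrative assumptions; lib/github_client.rb may work quite differently (for example, through a client gem).

```ruby
require "net/http"
require "uri"

API_ROOT = "https://api.github.com"

# Hypothetical helper: builds (but does not send) an authenticated GitHub REST API request.
# The token defaults to the GITHUB_TOKEN environment variable exported in step 4.
def build_github_request(path, token: ENV["GITHUB_TOKEN"])
  if token.nil? || token.empty?
    raise ArgumentError, "GITHUB_TOKEN is not set; export it before running build.rb"
  end

  request = Net::HTTP::Get.new(URI.join(API_ROOT, path))
  request["Authorization"] = "Bearer #{token}"
  request["Accept"] = "application/vnd.github+json"
  request
end

# Example: list closed pull requests for etcd-io/etcd (the endpoint is illustrative).
req = build_github_request("/repos/etcd-io/etcd/pulls?state=closed&per_page=100",
                           token: "example-token")
puts req.uri
```

To actually send it, something like Net::HTTP.start(req.uri.host, req.uri.port, use_ssl: true) { |http| http.request(req) } would do; that step is left out so the sketch runs offline.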
    

Project Structure

build.rb                  # Main script to build reports
Gemfile                   # Ruby dependencies
repo.db                   # SQLite database for storing repository data
lib/
  data_fetcher.rb         # Fetches data from the GitHub API
  database.rb             # Database operations
  github_client.rb        # GitHub API client
  html_generator.rb       # Generates HTML reports
  lottery_factor_tool.rb  # Calculates lottery factor metrics
reports/                  # Generated HTML reports

Plan

We are working with @jpmcb to prototype a tool that sources project health insights from a selection of CNCF projects. This is still a WIP, but we will follow this path:

Testing

The project uses Minitest for testing, relying on its mock and stub support so that API interactions can be tested without making real GitHub API calls.

Running Tests

To run the test suite:

# Run all tests
bundle exec rake test

# Run tests with coverage report
bundle exec rake coverage

# Run a single test file
bundle exec ruby -Ilib:test test/path/to/test_file.rb

# Run a specific test within a file
bundle exec ruby -Ilib:test test/path/to/test_file.rb -n test_method_name
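The rake test and rake coverage commands imply a Rakefile along the following lines. This is a guess at the setup using Rake's built-in Rake::TestTask; the *_test.rb naming pattern and the COVERAGE environment variable convention are assumptions, and the coverage_badge task is omitted.

```ruby
require "rake"
require "rake/testtask"

# Assumed layout: tests live under test/ and are named *_test.rb.
Rake::TestTask.new(:test) do |t|
  t.libs << "lib" << "test"
  t.pattern = "test/**/*_test.rb"
end

desc "Run the test suite with SimpleCov enabled"
task :coverage do
  # Hypothetical convention: test_helper.rb starts SimpleCov only when COVERAGE is set.
  ENV["COVERAGE"] = "true"
  Rake::Task["test"].invoke
end
```

Keeping coverage behind an environment variable means a plain rake test stays fast, while rake coverage reuses the same task definition.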

Test Coverage

Test coverage is tracked using SimpleCov. After running the tests with coverage enabled, you can:

  1. View the detailed HTML coverage report:

    open coverage/index.html
  2. Generate a coverage badge for your README:

    bundle exec rake coverage_badge
  3. See the coverage breakdown by file in the terminal:

    bundle exec rake coverage

Continuous Integration

Tests automatically run on GitHub Actions for all pull requests to ensure code quality. The workflow:

  • Runs all tests
  • Checks code style with RuboCop
  • Generates and reports test coverage metrics

Available Insights

Lottery Factor

The lottery factor indicates how dependent a project is on a small number of contributors. A high lottery factor suggests that if those key contributors left (for example, by winning the lottery), the project might struggle.
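The exact formula is not spelled out here. One common way to compute a lottery factor, used below purely as an assumption, is the smallest number of top contributors whose merged pull requests together make up more than half of all merged pull requests in the time range. The lottery_factor helper is hypothetical and may not match lib/lottery_factor_tool.rb.

```ruby
# Hypothetical metric: the smallest number of contributors whose merged PRs
# together exceed `threshold` (50% by default) of all merged PRs in the window.
# pr_counts maps author login => number of merged PRs.
def lottery_factor(pr_counts, threshold: 0.5)
  total = pr_counts.values.sum
  return 0 if total.zero?

  cumulative = 0
  # Walk contributors from most to fewest merged PRs.
  pr_counts.values.sort.reverse.each_with_index do |count, index|
    cumulative += count
    return index + 1 if cumulative > total * threshold
  end
  pr_counts.size # only reached when threshold >= 1.0
end

counts = { "alice" => 40, "bob" => 25, "carol" => 20, "dave" => 15 }
puts lottery_factor(counts) # => 2 (alice + bob account for 65 of 100 merged PRs)
```

Under this definition, a result of 1 or 2 means a handful of people carry most of the merged work, which corresponds to the high-risk case described above.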

Usage

ruby build.rb -r <owner/repo> [options]

Required parameters:

  • -r, --repo REPO: GitHub repository in the format 'owner/repo' (e.g., etcd-io/etcd)

Optional parameters:

  • -d, --database FILENAME: SQLite database filename (default: repo.db)
  • -t, --time-range DAYS: Time range in days to analyze (default: 30)
  • -o, --output FILENAME: Output HTML filename (default: owner-repo-lottery.html)
  • -f, --force: Force fetching new data even if recent data exists
  • --top-display COUNT: Number of top contributors to display individually (default: 6)

Example:

ruby build.rb -r etcd-io/etcd
ruby build.rb -r etcd-io/etcd -f  # Force fetch new data
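The flags above map naturally onto Ruby's standard OptionParser. The sketch below is a hypothetical parse_options helper, not the actual contents of build.rb; it applies the documented defaults and derives the default output name owner-repo-lottery.html from --repo.

```ruby
require "optparse"

# Hypothetical parser for the documented flags; defaults mirror the list above.
def parse_options(argv)
  options = { database: "repo.db", time_range: 30, force: false, top_display: 6 }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ruby build.rb -r <owner/repo> [options]"
    opts.on("-r", "--repo REPO", "GitHub repository as owner/repo") { |v| options[:repo] = v }
    opts.on("-d", "--database FILENAME", "SQLite database filename") { |v| options[:database] = v }
    opts.on("-t", "--time-range DAYS", Integer, "Days to analyze") { |v| options[:time_range] = v }
    opts.on("-o", "--output FILENAME", "Output HTML filename") { |v| options[:output] = v }
    opts.on("-f", "--force", "Force fetching new data") { options[:force] = true }
    opts.on("--top-display COUNT", Integer, "Top contributors shown individually") do |v|
      options[:top_display] = v
    end
  end
  parser.parse!(argv)

  # --repo is required and must look like owner/repo.
  raise OptionParser::MissingArgument, "--repo" unless options[:repo]
  unless options[:repo].match?(%r{\A[^/\s]+/[^/\s]+\z})
    raise OptionParser::InvalidArgument, "expected owner/repo, got #{options[:repo]}"
  end

  # Default output name: etcd-io/etcd -> etcd-io-etcd-lottery.html
  options[:output] ||= "#{options[:repo].tr('/', '-')}-lottery.html"
  options
end

parsed = parse_options(%w[-r etcd-io/etcd -f])
puts parsed[:output]
```

Validating the owner/repo shape up front gives a clear error before any API call or database write happens.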

Debugging

SQLite Database Inspection

To debug issues with contributor data, you can inspect the SQLite database directly using the following commands:

# Open the SQLite database (from your shell)
sqlite3 repo.db

-- List all tables
.tables

-- Show schema for pull_requests table
.schema pull_requests

-- Count merged pull requests per author over the last 30 days
-- (datetime() normalizes ISO 8601 values such as 2024-01-05T10:00:00Z before comparing)
SELECT pr.author, COUNT(*) AS pr_count
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
WHERE datetime(pr.merged_at) >= datetime('now', '-30 days')
GROUP BY pr.author
ORDER BY pr_count DESC;

-- View raw pull request data with dates (most recently merged first)
SELECT pr.author, pr.merged_at
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
ORDER BY pr.merged_at DESC
LIMIT 10;

-- Exit SQLite
.quit

Common issues to check:

  • Verify pull request data exists in the database
  • Check that dates are stored in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)
  • Confirm the repository owner and name are correct in the repositories table
  • Ensure pull requests have proper merged_at timestamps
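To check the ISO 8601 point above for many rows at once, a small helper can flag values that do not match the YYYY-MM-DDTHH:MM:SSZ shape. valid_timestamp? is a hypothetical name, and the sketch assumes merged_at is read back as a string, with NULL arriving as nil.

```ruby
require "time"

# Matches the YYYY-MM-DDTHH:MM:SSZ shape mentioned above (UTC, no fractional seconds).
ISO8601_UTC = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/

# Hypothetical helper: true only for strings in that shape that Ruby can also parse.
def valid_timestamp?(value)
  return false unless value.is_a?(String) && value.match?(ISO8601_UTC)

  Time.iso8601(value)
  true
rescue ArgumentError
  false # e.g. a month of 13 matches the regex but is not a real date
end

samples = ["2024-01-05T10:00:00Z", "2024-01-05 10:00:00", nil]
samples.each { |s| puts "#{s.inspect}: #{valid_timestamp?(s)}" }
```

Values with a space instead of T, or with a +00:00 offset instead of Z, are reported as invalid here, which is stricter than SQLite's own datetime() parsing.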

If the HTML report shows no data even though the data exists in the database, check:

  1. The date filtering in the database queries
  2. The data transformation in the HTML generator
  3. The GitHub API token permissions and rate limits
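For points 1 and 2, it can help to reproduce the date filtering and per-author aggregation in plain Ruby and compare the result with the SQL query above. The row shape ({ author:, merged_at: }), the method names, and the "Others" bucket used to mimic --top-display are all assumptions for illustration.

```ruby
require "time"

SECONDS_PER_DAY = 86_400

# Hypothetical equivalent of the per-author SQL query: count merged PRs per author
# within the last `days` days, most active first (ties broken alphabetically).
def merged_pr_counts(rows, days:, now: Time.now.utc)
  cutoff = now - days * SECONDS_PER_DAY
  rows
    .select { |row| row[:merged_at] && Time.iso8601(row[:merged_at]) >= cutoff }
    .group_by { |row| row[:author] }
    .transform_values(&:size)
    .sort_by { |author, count| [-count, author] }
    .to_h
end

# Keeps the first `limit` authors and folds the rest into one "Others" entry,
# a guess at what --top-display COUNT controls in the report.
def top_contributors(counts, limit: 6)
  top = counts.first(limit).to_h
  rest = counts.drop(limit).sum { |_author, count| count }
  rest.positive? ? top.merge("Others" => rest) : top
end

rows = [
  { author: "alice", merged_at: "2024-01-20T12:00:00Z" },
  { author: "alice", merged_at: "2024-01-25T08:30:00Z" },
  { author: "bob",   merged_at: "2024-01-22T09:15:00Z" },
  { author: "carol", merged_at: nil },                    # never merged
  { author: "dave",  merged_at: "2023-11-01T00:00:00Z" }  # outside the window
]
counts = merged_pr_counts(rows, days: 30, now: Time.utc(2024, 1, 31))
p counts                             # alice has 2 merged PRs in the window, bob has 1
p top_contributors(counts, limit: 1) # alice kept individually, bob folded into "Others"
```

If these counts look right but the report is still empty, the problem is more likely in the HTML generation step than in the stored data.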
