
Scoring tests separately #350

Open
seroperson wants to merge 1 commit into entrius:test from seroperson:i339-tests-separate-scoring


@seroperson

Closes #339

This PR addresses the issue that test code affects the scoring process.

Before

  • Test, source, and non-code changes are all counted toward the contribution bonus and all feed into a single density calculation.
  • Adding tests significantly reduces your score, because they contribute almost no score but still affect density.
  • Adding non-code changes also affects the source density and may lower the score. The situation is better than with tests, as files recognized by extension at least contribute some score. Unrecognized files, however, always lower the score, as they contribute zero score and still affect density.
  • The situation is the same for deleted and binary files: they contribute no score but still affect density.
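The dilution described above can be illustrated with a toy calculation. This is only a sketch under an assumed model, not the repository's formula: density is taken to be total score divided by total changed lines, pooled over every file in the PR. The function name `pooled_density` and all numbers are hypothetical.

```python
def pooled_density(files):
    """Density over ALL files in a PR (assumed model: score per changed line).

    files: list of (token_score, changed_lines) tuples.
    """
    total_score = sum(score for score, _ in files)
    total_lines = sum(lines for _, lines in files)
    return total_score / total_lines if total_lines else 0.0


# Hypothetical PR: 100 changed source lines worth 80 points.
source_only = [(80.0, 100)]
# Same PR plus 100 lines of tests that earn very little (80 * 0.05 = 4 points).
with_tests = source_only + [(4.0, 100)]
# Same PR plus a 100-line deleted file that earns nothing.
with_deleted = source_only + [(0.0, 100)]

for name, files in [("source only", source_only),
                    ("with tests", with_tests),
                    ("with deleted file", with_deleted)]:
    print(f"{name}: density={pooled_density(files):.2f}")
```

Under this assumed model the extra files roughly halve the pooled density even though the source change is identical, which is the effect this PR removes.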

After

  • Test, source, and non-code changes each have their own density and are scored separately.
  • Only source changes are counted toward the contribution bonus.
  • After each category is scored, the category scores are summed.
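A minimal sketch of the per-category idea, assuming a simplified formula in which a category's raw score is scaled by its own capped density. `DENSITY_CAP`, `category_score`, `pr_score`, `bonus_basis`, and the string keys are all hypothetical names chosen for illustration; the real scoring formula is not shown in this description.

```python
DENSITY_CAP = 1.0  # hypothetical cap, applied to each category separately


def category_score(files):
    """Score one category using only that category's own density."""
    raw = sum(score for score, _ in files)
    lines = sum(n for _, n in files)
    if lines == 0:
        return 0.0
    density = min(raw / lines, DENSITY_CAP)
    return raw * density


def pr_score(by_category):
    """Sum the independently scored categories."""
    return sum(category_score(files) for files in by_category.values())


def bonus_basis(by_category):
    """Only the source category feeds the contribution bonus."""
    return sum(score for score, _ in by_category.get("source", []))


source_only = {"source": [(80.0, 100)]}
with_tests = {"source": [(80.0, 100)], "test": [(4.0, 100)]}

# Adding tests can only add to the total; it cannot dilute the source density.
print(pr_score(source_only), pr_score(with_tests))
print(bonus_basis(source_only) == bonus_basis(with_tests))
```

Because each category divides by its own line count, a low-scoring category contributes a small non-negative amount instead of dragging the source score down.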

Implementation details

  • Added a ScoringCategory enum. Possible values are TEST (if it's a test file), SOURCE (if the scoring method is tree_sitter and it's not a test file), and NON_CODE (everything else: recognized non-code changes, unrecognized non-code changes, deleted files, binary files).
  • Added PrScoringResultCategorized, which holds a PrScoringResult for each category.
  • Scoring logic moved from scoring.py into PrScoringResultCategorized methods.
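The shape of these types might look roughly like the sketch below. Only the names `ScoringCategory`, `PrScoringResult`, `PrScoringResultCategorized`, and the `tree_sitter` method come from the description; the enum values, the fields, the `categorize`, `add_file`, and `total_score` helpers, and the `"line_count"` string are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScoringCategory(Enum):
    TEST = "test"
    SOURCE = "source"
    NON_CODE = "non_code"


def categorize(is_test_file: bool, scoring_method: str) -> ScoringCategory:
    """Hypothetical helper mirroring the rules listed above."""
    if is_test_file:
        return ScoringCategory.TEST
    if scoring_method == "tree_sitter":
        return ScoringCategory.SOURCE
    # Recognized/unrecognized non-code, deleted, and binary files land here.
    return ScoringCategory.NON_CODE


@dataclass
class PrScoringResult:
    # Assumed fields; the real class likely carries more.
    score: float = 0.0
    total_lines: int = 0


@dataclass
class PrScoringResultCategorized:
    # One independent result per category.
    by_category: dict = field(
        default_factory=lambda: {c: PrScoringResult() for c in ScoringCategory}
    )

    def add_file(self, category: ScoringCategory, score: float, lines: int) -> None:
        result = self.by_category[category]
        result.score += score
        result.total_lines += lines

    def total_score(self) -> float:
        return sum(r.score for r in self.by_category.values())


result = PrScoringResultCategorized()
result.add_file(categorize(False, "tree_sitter"), score=10.0, lines=5)
result.add_file(categorize(True, "tree_sitter"), score=0.5, lines=20)
print(result.total_score())
```

Checking the test-file flag first means a test file never falls into SOURCE, regardless of how it was scored.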

Test cases

Test files:

  • test_adding_tests_does_not_reduce_score - Adding test files to a source PR never lowers the base score
  • test_tests_do_not_affect_contribution_bonus - Small and large test files produce modest, similar increases (test weight is 0.05x)
  • test_same_code_in_test_path_scores_much_lower - Identical code in a test directory scores 10x+ lower than in a source path
  • test_tests_do_not_affect_threshold - Test files can't push a below-threshold PR past the token score threshold
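For a sense of what two of these invariants check, here is a toy version that runs against a stand-in scorer rather than the project's real one. The 0.05 test weight comes from the description above; `is_test_path`, `file_score`, the paths, and the token scores are hypothetical.

```python
TEST_WEIGHT = 0.05  # test weight quoted in the PR description


def is_test_path(path: str) -> bool:
    """Hypothetical heuristic: a 'tests' directory or a 'test_' file name."""
    parts = path.split("/")
    return "tests" in parts or parts[-1].startswith("test_")


def file_score(path: str, token_score: float) -> float:
    """Stand-in scorer: test files keep only a small fraction of their score."""
    weight = TEST_WEIGHT if is_test_path(path) else 1.0
    return token_score * weight


def test_same_code_in_test_path_scores_much_lower():
    source = file_score("pkg/scoring.py", 100.0)
    test = file_score("tests/test_scoring.py", 100.0)
    assert source >= 10 * test


def test_adding_tests_does_not_reduce_score():
    base = file_score("pkg/scoring.py", 100.0)
    combined = base + file_score("tests/test_scoring.py", 40.0)
    assert combined >= base


# Run directly without pytest.
test_same_code_in_test_path_scores_much_lower()
test_adding_tests_does_not_reduce_score()
```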

Non-code files:

  • test_adding_non_code_files_does_not_reduce_score - Adding non-code files (markdown, yaml) never lowers the base score
  • test_non_code_does_not_affect_contribution_bonus - Small and large non-code files produce the same density increase (no bonus impact)
  • test_source_code_scores_much_higher_than_non_code - Tree-diff scored source code scores 10x+ higher than line-count scored non-code files
  • test_non_code_does_not_affect_threshold - Non-code files can't push a below-threshold PR past the token score threshold

Zero-score files:

  • test_deleted_file_does_not_change_score - Deleted files (score=0) don't affect the base score
  • test_unsupported_file_does_not_change_score - Unsupported extensions (score=0) don't affect the base score

Density:

  • test_adding_test_category_increases_score_beyond_single_cap - Per-category density cap allows multiple categories to contribute independently
  • test_verbose_formatting_decreases_score - Same logic in more lines produces lower density and lower score
  • test_modified_file_scores_diff_only - Modified files score only the AST diff, not the entire file

Threshold:

  • test_below_threshold_scores_less - Trivial changes below token score threshold score less than substantial changes
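The threshold behaviour covered by these cases can be sketched as a gate that looks at the source category only. This is an assumption consistent with the test descriptions, not a confirmed detail of the implementation; `TOKEN_SCORE_THRESHOLD` and `passes_threshold` are hypothetical names and the value is made up.

```python
TOKEN_SCORE_THRESHOLD = 5.0  # hypothetical value


def passes_threshold(scores_by_category: dict) -> bool:
    """Assumed rule: only the source token score counts toward the threshold."""
    return scores_by_category.get("source", 0.0) >= TOKEN_SCORE_THRESHOLD


trivial = {"source": 2.0}
trivial_with_extras = {"source": 2.0, "test": 50.0, "non_code": 50.0}
substantial = {"source": 40.0}

print(passes_threshold(trivial))              # False
print(passes_threshold(trivial_with_extras))  # False
print(passes_threshold(substantial))          # True
```

Under this rule, piling tests or documentation onto a trivial source change leaves it below the threshold.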

@anderdc
Collaborator

anderdc commented Apr 7, 2026

just merged big changes into test please fix conflicts thanks

@seroperson seroperson force-pushed the i339-tests-separate-scoring branch from e39bbe8 to d9ff4cd Compare April 7, 2026 08:05
@seroperson seroperson force-pushed the i339-tests-separate-scoring branch from d9ff4cd to d8a9df5 Compare April 7, 2026 08:50
@seroperson
Author

@anderdc I've rebased it on current test 👍



Development

Successfully merging this pull request may close these issues.

Tests affect code density and reduce resulting score very much
