
Conversation

@paulz paulz commented Mar 15, 2025

This pull request improves the testing framework, refactors code for better organization, and enhances error handling and reporting. The most important changes are the extraction of a script for generating statistical reports, the addition of new helper functions and fixtures, and the refactoring of existing test functions for clarity and structure.

The changes fall into four areas:

  • Improvements to the testing framework and error handling
  • Refactoring and code organization
  • Enhancements to existing tests
  • Addition of helper functions

@tkersey tkersey requested a review from Copilot March 16, 2025 01:50

Copilot AI left a comment

Pull Request Overview

This PR improves the testing framework and error handling by introducing a modular script for generating statistical reports, refactoring test functions for clarity, and consolidating helper functions.

  • Extracts the CAT AI statistical report script for workflow modularity
  • Adds a new function for JSON schema validation and refactors tests to use helper functions
  • Improves test configuration and organization by refactoring redundant functions

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:

  • examples/team_recommender/src/response_matches_json_schema.py: adds a function to validate JSON responses against a schema (a sketch of such a helper appears after this list)
  • examples/team_recommender/tests/helpers.py: introduces new helper functions (e.g., natural sort, success rate assertion)
  • examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py: refactors threshold measurement tests and error messages
  • examples/team_recommender/tests/example_7_schema_validators/test_response_has_valid_schema.py: updates the schema validation test to use the new helper function
  • .github/workflows/cat-test-examples.yml: updates the workflow step to use the new statistical report script
  • examples/team_recommender/tests/conftest.py: refactors fixtures and example discovery logic
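As an illustration of the schema-validation helper listed above, here is a minimal sketch built on the jsonschema package. The function name mirrors the file name, but the exact signature, schema handling, and error reporting in the repository may differ, and the schema and payloads below are purely hypothetical.

```python
import json

from jsonschema import ValidationError, validate


def response_matches_json_schema(response_text: str, schema: dict) -> bool:
    """Return True if the response parses as JSON and satisfies the given schema."""
    try:
        validate(instance=json.loads(response_text), schema=schema)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False


# Hypothetical usage in a test (schema and payloads are illustrative only):
team_schema = {
    "type": "object",
    "properties": {"members": {"type": "array", "items": {"type": "string"}}},
    "required": ["members"],
}
assert response_matches_json_schema('{"members": ["alice", "bob"]}', team_schema)
assert not response_matches_json_schema('{"members": "alice"}', team_schema)
```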
Comments suppressed due to low confidence (2)

examples/team_recommender/tests/helpers.py:189

  • The expected order in this assertion relies on lexicographical sorting, which does not reflect natural number ordering; consider using the natural_sort_key in the sort or updating the expected order accordingly.
assert [
    "example_10_threshold",
    "example_1_text_response",
    "example_2_unit",
    "example_8_retry_network",
    "example_9_retry_with_open_telemetry",
] == sorted(unsorted), "The list should be sorted by the number in the name"
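For reference, a natural-sort key along the following lines makes sorted() order the example directories by their embedded number instead of lexicographically. This is only a sketch; the natural_sort_key in helpers.py may be implemented differently.

```python
import re


def natural_sort_key(name: str):
    """Split a name into text and integer chunks so example_10 sorts after example_2."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


examples = ["example_10_threshold", "example_1_text_response", "example_2_unit"]
assert sorted(examples, key=natural_sort_key) == [
    "example_1_text_response",
    "example_2_unit",
    "example_10_threshold",
]
```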

examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py:85

  • [nitpick] The error message 'Expected {expected_success_rate_measured} to be within of the success rate' is unclear; consider revising it to clearly indicate that the success rate is expected to lie within a specific confidence interval.
assert is_within_expected(expected_success_rate_measured, failure_count, sample_size), (
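One way to apply the suggestion, assuming is_within_expected compares the measured rate against a confidence interval derived from the failure count and sample size; the wording below is illustrative, not the repository's actual message.

```python
assert is_within_expected(expected_success_rate_measured, failure_count, sample_size), (
    f"Expected the measured success rate {expected_success_rate_measured:.2%} to fall "
    f"within the confidence interval implied by {failure_count} failures "
    f"out of {sample_size} samples"
)
```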

paulz added 20 commits March 16, 2025 16:33
The issue is that the condition uses success(), which means the step will only run if all previous steps in the job were successful.
@paulz paulz linked an issue Mar 17, 2025 that may be closed by this pull request

@austinworks austinworks left a comment

🙈

@austinworks austinworks merged commit 3794b45 into main Mar 17, 2025
2 checks passed
@austinworks austinworks deleted the ci-experiment/add-notice branch March 17, 2025 21:19

Development

Successfully merging this pull request may close these issues.

CI: - test_failure_rate_bar_graph - Failed: Timeout >10.0s
