Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jan 22, 2026

Integration tests intermittently fail in CI with HTTP 404 when accessing calibration app endpoints. Discovery completes (setUpClass passes) but individual app REST resources are not yet accessible—a timing race.

Changes

Added readiness helper that polls /apps/calibration endpoint:

  • 10s timeout, 200ms intervals
  • Returns on HTTP 200, raises unittest.SkipTest on timeout
  • Preserves test coverage when app is available, prevents false negatives when it's not
def _ensure_calibration_app_ready(self, timeout: float = 10.0, interval: float = 0.2):
    """Wait for calibration app REST resource, skip test if unavailable."""
    start_time = time.time()
    last_error = None
    while time.time() - start_time < timeout:
        try:
            response = requests.get(f'{self.BASE_URL}/apps/calibration', timeout=2)
            if response.status_code == 200:
                return
            last_error = f'Status {response.status_code}'
        except requests.exceptions.RequestException as e:
            last_error = str(e)
        time.sleep(interval)
    
    raise unittest.SkipTest(
        f'Calibration app not available after {timeout}s '
        f'(flaky discovery readiness race in CI). Last error: {last_error}'
    )

Applied to 4 tests that depend on calibration app:

  • test_12_app_no_topics
  • test_31_operation_call_calibrate_service
  • test_32_operation_call_nonexistent_operation
  • test_37_operations_listed_in_app_discovery

Each test now starts with:

self._ensure_calibration_app_ready()

Impact

  • Eliminates CI false negatives from discovery race
  • Skip messages clearly identify the race condition
  • Typical overhead: ~200ms when app is available
  • No gateway implementation changes
Original prompt

This section details on the original issue you should resolve

<issue_title>[BUG] Flaky tests: Integration tests fails on CI</issue_title>
<issue_description># Bug report

Integration tests in ros2_medkit_gateway are flaky. Tests that depend on the calibration app intermittently fail with HTTP 404 (app not found), even though the calibration demo nodes are started and discovery later reports completion.

Steps to reproduce

  1. Run CI workflow build_and_test (GitHub Actions) on main or on a PR.
  2. Observe colcon test-result --verbose for ros2_medkit_gateway.
  3. Integration suite test_integration fails intermittently.

Expected behavior

  • CI on main should not fail due to flaky integration tests.
  • Tests that rely on discovery should wait until the required entities (apps/operations) are present before asserting on endpoints.

Actual behavior

The suite fails with 2 errors and 2 failures. The recurring pattern is that the REST API returns 404 Not Found for endpoints under /api/v1/apps/calibration....

Failing tests (clustered around the same underlying issue):

test_12_app_no_topics
GET /api/v1/apps/calibration/data → 404

test_31_operation_call_calibrate_service
expects 200, gets 404

test_37_operations_listed_in_app_discovery
GET /api/v1/apps/calibration → 404

Proposed fix options:
Option A (preferred): Make tests robust (remove flakiness)
Add an explicit readiness wait in the integration test setup before any assertions that depend on calibration.

Option B: Temporarily quarantine flaky tests
Mark the affected tests as skipped in CI until the race is fixed.

Additionally:

test_32_operation_call_nonexistent_operation fails because it expects 'Operation not found' but the API returns 'Entity not found'. This appears to be a contract/message mismatch, not necessarily flakiness.

Additional information

    NO TESTS RAN
    -- run_test.py: return code 1
    -- run_test.py: verify result file '/__w/ros2_medkit/ros2_medkit/build/ros2_medkit_gateway/test_results/ros2_medkit_gateway/test_integration.xunit.xml'
  >>>
build/ros2_medkit_gateway/test_results/ros2_medkit_gateway/test_integration.xunit.xml: 70 tests, 2 errors, 2 failures, 0 skipped
- ros2_medkit_gateway.TestROS2MedkitGatewayIntegration test_12_app_no_topics
  <<< error message
    Traceback (most recent call last):
      File "/__w/ros2_medkit/ros2_medkit/src/ros2_medkit_gateway/test/test_integration.test.py", line 684, in test_12_app_no_topics
        data = self._get_json('/apps/calibration/data')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/__w/ros2_medkit/ros2_medkit/src/ros2_medkit_gateway/test/test_integration.test.py", line 332, in _get_json
        response.raise_for_status()
      File "/usr/lib/python3/dist-packages/requests/models.py", line 1021, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:8080/api/v1/apps/calibration/data
  >>>
- ros2_medkit_gateway.TestROS2MedkitGatewayIntegration test_31_operation_call_calibrate_service
  <<< failure message
    Traceback (most recent call last):
      File "/__w/ros2_medkit/ros2_medkit/src/ros2_medkit_gateway/test/test_integration.test.py", line 1201, in test_31_operation_call_calibrate_service
        self.assertEqual(response.status_code, 200)
    AssertionError: 404 != 200
  >>>
- ros2_medkit_gateway.TestROS2MedkitGatewayIntegration test_32_operation_call_nonexistent_operation
  <<< failure message
    Traceback (most recent call last):
      File "/__w/ros2_medkit/ros2_medkit/src/ros2_medkit_gateway/test/test_integration.test.py", line 1237, in test_32_operation_call_nonexistent_operation
        self.assertIn('Operation not found', data['error'])
    AssertionError: 'Operation not found' not found in 'Entity not found'
  >>>
- ros2_medkit_gateway.TestROS2MedkitGatewayIntegration test_37_operations_listed_in_app_discovery
  <<< error message
    Traceback (most recent call last):
      File "/__w/ros2_medkit/ros2_medkit/src/ros2_medkit_gateway/test/test_integration.test.py", line 1359, in test_37_operations_listed_in_app_discovery
        data = self._get_json('/apps/calibration')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/__w/ros2_medkit/ros2_medkit/src/ros2_medkit_gateway/test/test_integration.test.py", line 332, in _get_json
        response.raise_for_status()
      File "/usr/lib/python3/dist-packages/requests/models.py", line 1021, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:8080/api/v1/apps/calibration
  >>>

Summary: 1573 tests, 2 errors, 3 failures, 173 skipped
```</issue_description>

<agent_instructions>You are working in the ros2_medkit repository. Please implement a temporary CI quarantine for flaky ROS 2 integration tests in:

src/ros2_medkit_gateway/test/test_integration.te...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes selfpatch/ros2_medkit#131

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs.

Copilot AI and others added 2 commits January 22, 2026 16:40
Co-authored-by: bburda <24573178+bburda@users.noreply.github.com>
Co-authored-by: bburda <24573178+bburda@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix flaky integration tests in ros2_medkit_gateway Fix flaky integration tests with calibration app readiness check Jan 22, 2026
Copilot AI requested a review from bburda January 22, 2026 16:44
@bburda bburda marked this pull request as ready for review January 22, 2026 16:57
Copilot AI review requested due to automatic review settings January 22, 2026 16:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses flaky integration tests in CI by implementing a readiness check for the calibration app. The tests were intermittently failing with HTTP 404 errors due to a race condition where discovery completed but individual app REST resources were not yet accessible.

Changes:

  • Added _ensure_calibration_app_ready() helper method that polls the calibration app endpoint with a 10-second timeout
  • Applied the readiness check to 4 failing tests identified in the bug report
  • Tests now skip gracefully with clear messages if the calibration app is unavailable, preventing false negatives in CI

@mfaferek93 mfaferek93 self-requested a review January 22, 2026 17:22
@bburda bburda merged commit 1b405e9 into main Jan 22, 2026
10 checks passed
@bburda bburda deleted the copilot/fix-flaky-integration-tests branch January 22, 2026 17:23
@bburda bburda linked an issue Jan 23, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Flaky tests: Integration tests fails on CI

3 participants