Enforce HTTP 200 requirement for MCP gateway /health endpoint with 120s curl retry#9632
Conversation
Remove acceptance of HTTP 204 (No Content) from health checks to comply with MCP Gateway Specification v1.3.0 which requires /health endpoint to return HTTP 200 with JSON body containing specVersion and gatewayVersion fields. Changes: - verify_mcp_gateway_health.sh: Only accept HTTP 200 (removed 204) - start_mcp_gateway.sh: Explicitly check for HTTP 200 and non-empty response body Fixes compliance with specification section 8.1.1 and tests T-HLT-006 through T-HLT-009. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation... |
|
📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident... |
|
💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges... |
|
💀 Blimey! Changeset Generator failed and walked the plank! No treasure today, matey! ☠️ |
See error Error: Process completed with exit code 7. |
- Log health endpoint HTTP code and response body on every attempt - Log last HTTP code and response before error exit - Improve debugging when gateway health checks fail This helps diagnose failures like "Process completed with exit code 7" by showing what the health endpoint actually returned. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in 84bd8a4. Both scripts now:
This provides full diagnostic info when gateway health checks fail, helping diagnose errors like "Process completed with exit code 7". |
|
🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation... |
|
📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident... |
|
💀 Blimey! Changeset Generator failed and walked the plank! No treasure today, matey! ☠️ |
|
💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges... |
|
📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident... |
|
💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges... |
|
🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation... |
|
💀 Blimey! Changeset Generator failed and walked the plank! No treasure today, matey! ☠️ |
Fixed in 59dadc8. Added curl retry options to both health check scripts: curl -s --retry 3 --retry-delay 1 --retry-connrefusedBenefits:
This makes health checks more resilient to gateway startup delays and transient network issues, addressing the CI reliability concerns. |
|
📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident... |
|
🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation... |
|
💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges... |
|
🎉 Yo ho ho! Changeset Generator found the treasure and completed successfully! ⚓💰 |
|
@copilot health should retry for a total of 120s, use exponential back off. Use curl options |
- Increase total retry time from 30s to 120s - Implement exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s (capped at 32s) - Use curl --retry-max-time to respect remaining time per attempt - Track elapsed time and stop when 120s limit reached - Quick retries initially (1-4s) for fast failures - Longer waits later (16-32s) for slow gateway startup Timeline: ~8 attempts over 120s vs previous 30 attempts over 30s Benefits: - Matches gateway startup requirements (40-50s typical) - Efficient retry pattern for both fast and slow scenarios - curl --retry-max-time prevents hanging beyond time limit Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in af591a3. Implemented 120s timeout with exponential backoff: Configuration:
Timeline example: Benefits:
|
|
@copilot keep it simple and configure curl to try 120 attempts at 1s interval |
- Remove exponential backoff shell logic - Use curl built-in retry: --retry 120 --retry-delay 1 - Total timeout: 120 seconds (120 attempts × 1s = 120s) - Added --retry-all-errors to retry on any error condition - Much simpler implementation: removed 145 lines, added 40 lines - Same 120s total timeout, but using curl's native retry mechanism Benefits: - Simpler code, easier to understand and maintain - Relies on curl's proven retry logic - Consistent 1s delays throughout (no complexity) - Same total timeout as before (120s) Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in 00496dc. Simplified to use curl's built-in retry mechanism: curl -s --retry 120 --retry-delay 1 --retry-connrefused --retry-all-errorsConfiguration:
Benefits:
|
|
✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟 |
|
🎬 THE END — Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨ |
|
🎉 Yo ho ho! Changeset Generator found the treasure and completed successfully! ⚓💰 |
|
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤 |
Smoke Test Results - ClaudeLast 2 Merged PRs:
✅ GitHub MCP: Retrieved PR data Overall: PARTIAL PASS (5/6 tests passed - safeinputs-gh unavailable)
|
The health check scripts accepted HTTP 204 (No Content), violating MCP Gateway Specification v1.3.0 which requires
/healthto return HTTP 200 with JSON body containingspecVersionandgatewayVersionfields. HTTP 204 has no body by definition.Changes
verify_mcp_gateway_health.sh: Changed fromgrep -q "200\|204"to exact match[ "$http_code" = "200" ]start_mcp_gateway.sh: Replacedcurl -f(accepts any 2xx) with explicit HTTP 200 check plus non-empty response validation--retry 120 --retry-delay 1 --retry-connrefused --retry-all-errorsfor 120-second total timeoutBefore:
After:
Logging improvements:
Simple retry mechanism:
Aligns with specification section 8.1.1 and compliance tests T-HLT-006 through T-HLT-009. The enhanced logging helps diagnose failures like "Process completed with exit code 7" by showing exactly what endpoint is being called and what the health endpoint returns, making connection issues (wrong hostname, port, or path) immediately visible. The 120s timeout using curl's built-in retry mechanism (120 attempts × 1s = 120s) matches gateway startup requirements (40-50s typical) and provides reliable retry handling with simple, maintainable code. The implementation uses curl's proven retry logic with
--retry-all-errorsfor automatic recovery from transient network issues, improving CI reliability.Code Simplification
The final implementation achieves the 120-second timeout requirement using curl's native retry mechanism instead of custom shell logic, resulting in a 70% code reduction (removed 145 lines, added 40 lines) while maintaining the same functionality and reliability.
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.