Enforce HTTP 200 requirement for MCP gateway /health endpoint with 120s curl retry #9632

Merged
pelikhan merged 7 commits into main from copilot/review-health-endpoint-compliance
Jan 11, 2026

Copilot AI commented Jan 11, 2026

Contributor

The health check scripts accepted HTTP 204 (No Content), which violates MCP Gateway Specification v1.3.0: the specification requires /health to return HTTP 200 with a JSON body containing the specVersion and gatewayVersion fields, and an HTTP 204 response has no body by definition.
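A minimal sketch of the check the specification implies, assuming it is enough for the two field names to appear as JSON keys in the body. The function name is hypothetical, and the substring match stands in for real JSON parsing.

```shell
#!/usr/bin/env bash
# Sketch only: checks the /health contract described above.
# is_compliant_health_response is a hypothetical name; the field check is a
# substring match on the raw body, not a JSON parse.
is_compliant_health_response() {
  local http_code="$1"
  local body="$2"

  # Only HTTP 200 is acceptable; 204 (No Content) cannot carry a body.
  [ "$http_code" = "200" ] || return 1

  # An empty body cannot contain the required fields.
  [ -n "$body" ] || return 1

  # Both required keys must appear in the body.
  case "$body" in
    *'"specVersion"'*) ;;
    *) return 1 ;;
  esac
  case "$body" in
    *'"gatewayVersion"'*) ;;
    *) return 1 ;;
  esac
  return 0
}
```

A caller would run is_compliant_health_response "$http_code" "$health_response" || exit 1, which rejects a 204 even though the old grep-based check accepted it.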

Changes

  • verify_mcp_gateway_health.sh: Changed from grep -q "200\|204" to exact match [ "$http_code" = "200" ]
  • start_mcp_gateway.sh: Replaced curl -f (which succeeds for any status below 400, including 204) with an explicit HTTP 200 check plus non-empty response validation
  • Enhanced Logging: Added comprehensive logging to always display HTTP code and response body on every health check attempt
  • Curl Details Logging: Added logging of the complete health endpoint URL before each curl call to help identify connection issues
  • Simple 120s Retry: Use curl's built-in retry mechanism with --retry 120 --retry-delay 1 --retry-connrefused --retry-all-errors, i.e. up to 120 retries (121 attempts) 1 second apart, for roughly 120 seconds of retry delay
  • Error Diagnostics: Added logging of last HTTP code and response before error exit to help diagnose gateway startup failures

Before:

if echo "$http_code" | grep -q "200\|204"; then

After:

if [ "$http_code" = "200" ]; then
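A quick way to see the behavioral difference between the two checks is to run both against a few status codes. The helper names old_check and new_check are hypothetical; the grep pattern is copied from the Before line, and its \| alternation in a basic regex relies on GNU grep.

```shell
#!/usr/bin/env bash
# Before: regex match that accepts both 200 and 204.
# Note: "\|" alternation in a basic regex is a GNU grep extension.
old_check() {
  echo "$1" | grep -q "200\|204"
}

# After: exact string comparison that accepts only 200.
new_check() {
  [ "$1" = "200" ]
}

# Compare both checks on a few representative codes.
for code in 200 204 503; do
  old=rejected
  new=rejected
  old_check "$code" && old=accepted
  new_check "$code" && new=accepted
  echo "$code: old=$old new=$new"
done
```

This prints "200: old=accepted new=accepted", "204: old=accepted new=rejected" and "503: old=rejected new=rejected", one per line; 204 is the only code on which the two checks disagree.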

Logging improvements:

# Log the URL being called
echo "Calling health endpoint: ${gateway_url}/health"
echo "Retrying up to 120 times with 1s delay (120s total timeout)"

# Always log the health response for debugging
echo "Health endpoint HTTP code: $http_code"
if [ -n "$health_response" ]; then
  echo "Health response body: $health_response"
else
  echo "Health response body: (empty)"
fi

Simple retry mechanism:

# Up to 120 retries (121 attempts), 1 second apart: roughly 120s of retry delay
curl -s --retry 120 --retry-delay 1 --retry-connrefused --retry-all-errors \
  -w "\n%{http_code}" "${gateway_url}/health"
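Because -w "\n%{http_code}" appends a newline and the status code after the body, the caller has to split the captured output back into body and code before logging. A sketch of that step, assuming the variable names http_code and health_response used in the logging snippet above; split_health_output and fetch_health are hypothetical helpers, and gateway_url is assumed to be set by the caller.

```shell
#!/usr/bin/env bash
# Split "<body>\n<status>" (the shape produced by -w "\n%{http_code}")
# into the globals used by the logging snippet: http_code, health_response.
split_health_output() {
  # The status code is the last line of the captured output.
  http_code=$(printf '%s\n' "$1" | tail -n 1)
  # Everything before the last line is the response body (may be empty).
  health_response=$(printf '%s\n' "$1" | sed '$d')
}

# Hypothetical wrapper: one retried curl call, then log both parts.
fetch_health() {
  local output
  output=$(curl -s --retry 120 --retry-delay 1 --retry-connrefused --retry-all-errors \
    -w "\n%{http_code}" "${gateway_url}/health")
  split_health_output "$output"
  echo "Health endpoint HTTP code: $http_code"
  if [ -n "$health_response" ]; then
    echo "Health response body: $health_response"
  else
    echo "Health response body: (empty)"
  fi
}
```

Taking the last line as the code and everything before it as the body also handles an empty body: a 204 leaves health_response empty, so the "(empty)" branch is logged.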

Aligns with specification section 8.1.1 and compliance tests T-HLT-006 through T-HLT-009.

The enhanced logging helps diagnose failures like "Process completed with exit code 7" (7 is curl's "failed to connect to host" exit code) by showing exactly which endpoint is being called and what it returns, so connection issues (wrong hostname, port, or path) are immediately visible.

The retry budget of roughly 120 seconds (120 retries × 1s delay, plus the time each attempt takes) covers typical gateway startup (40-50s). Relying on curl's built-in retry logic, with --retry-all-errors to retry on any transfer error, keeps the code simple and makes CI more resilient to transient network issues.
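To make an exit code like 7 self-explanatory in the log, the script could translate curl's exit status before exiting. A sketch, assuming the step's exit code is propagated from curl (as happens when the step's shell stops on the first failing command); the function name and messages are hypothetical, while the codes themselves come from curl's documented exit-code list.

```shell
#!/usr/bin/env bash
# Map a few documented curl exit codes to a short diagnostic hint.
# describe_curl_exit and its messages are hypothetical.
describe_curl_exit() {
  case "$1" in
    0)  echo "transfer completed; check the HTTP code separately" ;;
    6)  echo "could not resolve host; check the hostname in the URL" ;;
    7)  echo "failed to connect to host (e.g. gateway not listening on that host and port yet)" ;;
    22) echo "HTTP error status (reported only when --fail is used)" ;;
    28) echo "operation timed out" ;;
    *)  echo "curl exit code $1; see the curl exit code list" ;;
  esac
}
```

A caller could save rc=$? immediately after the curl call and echo "curl failed: $(describe_curl_exit "$rc")" before exiting with the same code, so the log states the cause next to the number.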

Code Simplification

The final implementation reaches the 120-second retry budget with curl's native retry mechanism instead of custom shell logic, removing 145 lines and adding 40 (a net reduction of 105 lines, roughly 70% of the removed code) while maintaining the same functionality and reliability.

Original prompt

Review handling of /health in MCP gateway check to comply with MCP gateway compliance



Remove acceptance of HTTP 204 (No Content) from health checks to comply with
MCP Gateway Specification v1.3.0 which requires /health endpoint to return
HTTP 200 with JSON body containing specVersion and gatewayVersion fields.

Changes:
- verify_mcp_gateway_health.sh: Only accept HTTP 200 (removed 204)
- start_mcp_gateway.sh: Explicitly check for HTTP 200 and non-empty response body

Fixes compliance with specification section 8.1.1 and tests T-HLT-006 through T-HLT-009.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Review handling of /health endpoint for compliance" to "Enforce HTTP 200 requirement for MCP gateway /health endpoint" Jan 11, 2026
Copilot AI requested a review from pelikhan January 11, 2026 15:14
pelikhan added the smoke label Jan 11, 2026
github-actions Bot commented Jan 11, 2026

Contributor

🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation...

github-actions Bot commented Jan 11, 2026

Contributor

📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident...

github-actions Bot commented Jan 11, 2026

Contributor

💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges...

github-actions Bot commented Jan 11, 2026

Contributor

💀 Blimey! Changeset Generator failed and walked the plank! No treasure today, matey! ☠️

@pelikhan

Collaborator

@copilot

  • always log the health response
  • log error before returning

See error

Error: Process completed with exit code 7.

- Log health endpoint HTTP code and response body on every attempt
- Log last HTTP code and response before error exit
- Improve debugging when gateway health checks fail

This helps diagnose failures like "Process completed with exit code 7"
by showing what the health endpoint actually returned.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jan 11, 2026

Contributor Author

@copilot

  • always log the health response
  • log error before returning...

Fixed in 84bd8a4. Both scripts now:

  • Always log HTTP code and response body on every health check attempt
  • Log last HTTP code and response before error exit

This provides full diagnostic info when gateway health checks fail, helping diagnose errors like "Process completed with exit code 7".

github-actions Bot commented Jan 11, 2026

Contributor

🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation...

github-actions Bot commented Jan 11, 2026

Contributor

📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident...

github-actions Bot commented Jan 11, 2026

Contributor

💀 Blimey! Changeset Generator failed and walked the plank! No treasure today, matey! ☠️

github-actions Bot commented Jan 11, 2026

Contributor

💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges...

pelikhan added smoke and removed smoke labels Jan 11, 2026
github-actions Bot commented Jan 11, 2026

Contributor

📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident...

github-actions Bot commented Jan 11, 2026

Contributor

💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges...

github-actions Bot commented Jan 11, 2026

Contributor

🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation...

github-actions Bot commented Jan 11, 2026

Contributor

💀 Blimey! Changeset Generator failed and walked the plank! No treasure today, matey! ☠️

Copilot AI commented Jan 11, 2026

Contributor Author

@copilot make sure /health call is retried a few times

https://github.com/githubnext/gh-aw/actions/runs/20897369280/job/60038030028#step:21:1

Fixed in 59dadc8. Added curl retry options to both health check scripts:

curl -s --retry 3 --retry-delay 1 --retry-connrefused

Benefits:

  • Each health check call retries up to 3 times internally (1 second between retries)
  • --retry-connrefused handles connection refused errors during gateway startup
  • Two-level retry: curl internal retries (fast) + outer loop retries (diagnostic logging)
  • Reduces false failures from transient issues and timing problems

This makes health checks more resilient to gateway startup delays and transient network issues, addressing the CI reliability concerns.
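The two-level structure described in that reply can be sketched as an outer loop around a short curl retry. This is an illustration under assumptions, not the committed script: wait_for_gateway_health is a hypothetical name, the default of 30 outer attempts mirrors the "30 attempts over 30s" figure quoted later in the thread, and gateway_url is assumed to be set by the caller.

```shell
#!/usr/bin/env bash
# Outer loop: one logged health check per iteration.
# Inner level: curl retries each call up to 3 times, 1 s apart.
wait_for_gateway_health() {
  local max_outer="${1:-30}"
  local attempt output http_code health_response
  attempt=1
  while :; do
    echo "Calling health endpoint: ${gateway_url}/health (attempt $attempt/$max_outer)"
    output=$(curl -s --retry 3 --retry-delay 1 --retry-connrefused \
      -w "\n%{http_code}" "${gateway_url}/health")
    # Last line is the status code; the rest is the body.
    http_code=$(printf '%s\n' "$output" | tail -n 1)
    health_response=$(printf '%s\n' "$output" | sed '$d')
    echo "Health endpoint HTTP code: $http_code"
    echo "Health response body: ${health_response:-(empty)}"

    # Success: HTTP 200 with a non-empty body.
    if [ "$http_code" = "200" ] && [ -n "$health_response" ]; then
      return 0
    fi
    # Out of outer attempts: stop without a final pointless sleep.
    if [ "$attempt" -ge "$max_outer" ]; then
      break
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  # Log the last result before failing, so CI output shows the cause.
  echo "ERROR: gateway not healthy; last HTTP code: $http_code" >&2
  return 1
}
```

The inner retries absorb brief connection refusals quickly, while each outer iteration leaves a log line, which is the trade-off the reply describes.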

github-actions Bot commented Jan 11, 2026

Contributor

📰 DEVELOPING STORY: Smoke Copilot reports failed. Our correspondents are investigating the incident...

github-actions Bot commented Jan 11, 2026

Contributor

🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation...

github-actions Bot commented Jan 11, 2026

Contributor

💫 TO BE CONTINUED... Smoke Claude failed! Our hero faces unexpected challenges...

github-actions Bot commented Jan 11, 2026

Contributor

🎉 Yo ho ho! Changeset Generator found the treasure and completed successfully! ⚓💰

@pelikhan

Collaborator

@copilot health should retry for a total of 120s, use exponential back off. Use curl options

- Increase total retry time from 30s to 120s
- Implement exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s (capped at 32s)
- Use curl --retry-max-time to respect remaining time per attempt
- Track elapsed time and stop when 120s limit reached
- Quick retries initially (1-4s) for fast failures
- Longer waits later (16-32s) for slow gateway startup

Timeline: ~8 attempts over 120s vs previous 30 attempts over 30s
Benefits:
- Matches gateway startup requirements (40-50s typical)
- Efficient retry pattern for both fast and slow scenarios
- curl --retry-max-time stops starting new retries once the time limit is reached (a single in-flight request is bounded only by --max-time)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jan 11, 2026

Contributor Author

@copilot health should retry for a total of 120s, use exponential back off. Use curl options

Fixed in af591a3. Implemented 120s timeout with exponential backoff:

Configuration:

  • Total timeout: 120 seconds (vs previous 30s)
  • Exponential backoff: 1s → 2s → 4s → 8s → 16s → 32s (capped at 32s)
  • curl --retry-max-time respects remaining time budget
  • ~8 attempts over 120s (vs 30 attempts over 30s)

Timeline example:

0s:   Attempt 1, wait 1s
1s:   Attempt 2, wait 2s
3s:   Attempt 3, wait 4s
7s:   Attempt 4, wait 8s
15s:  Attempt 5, wait 16s
31s:  Attempt 6, wait 32s
63s:  Attempt 7, wait 32s
95s:  Attempt 8, wait 25s (remaining)
120s: Timeout

Benefits:

  • Quick retries early (1-4s) for fast failures
  • Patient waiting later (16-32s) for slow gateway startup
  • Matches gateway requirements (40-50s typical startup)
  • More efficient: 73% fewer network calls
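The timeline above follows a simple rule: start at 1s, double after each attempt, cap at 32s, and clip the last wait to whatever remains of the 120s budget. A sketch that reproduces it, assuming (as the timeline does) that each attempt itself takes no time; backoff_schedule is a hypothetical name. For reference, curl documents its own default when --retry is given without --retry-delay as a backoff that waits 1 second and then doubles, up to 10 minutes.

```shell
#!/usr/bin/env bash
# Print the wait before each retry for a capped exponential backoff
# within a fixed total budget (attempt durations assumed to be zero).
backoff_schedule() {
  local budget="${1:-120}" cap="${2:-32}"
  local elapsed=0 delay=1 pause schedule=""
  while [ "$elapsed" -lt "$budget" ]; do
    pause=$delay
    # Never wait past the remaining budget.
    if [ $((budget - elapsed)) -lt "$pause" ]; then
      pause=$((budget - elapsed))
    fi
    schedule="$schedule $pause"
    elapsed=$((elapsed + pause))
    # Double the delay, but never beyond the cap.
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
  done
  # Drop the leading space.
  echo "${schedule# }"
}

backoff_schedule 120 32
```

Running backoff_schedule 120 32 prints 1 2 4 8 16 32 32 25: the eight waits in the timeline, summing to 120.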

Copilot AI changed the title from "Enforce HTTP 200 requirement for MCP gateway /health endpoint" to "Enforce HTTP 200 requirement for MCP gateway /health endpoint with 120s timeout and exponential backoff" Jan 11, 2026
@pelikhan

Collaborator

@copilot keep it simple and configure curl to try 120 attempts at 1s interval

- Remove exponential backoff shell logic
- Use curl built-in retry: --retry 120 --retry-delay 1
- Total retry delay: about 120 seconds (120 retries × 1s)
- Added --retry-all-errors to retry on any error condition
- Much simpler implementation: removed 145 lines, added 40 lines
- Same 120s total timeout, but using curl's native retry mechanism

Benefits:
- Simpler code, easier to understand and maintain
- Relies on curl's proven retry logic
- Consistent 1s delays throughout (no complexity)
- Same total timeout as before (120s)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jan 11, 2026

Contributor Author

@copilot keep it simple and configure curl to try 120 attempts at 1s interval

Fixed in 00496dc. Simplified to use curl's built-in retry mechanism:

curl -s --retry 120 --retry-delay 1 --retry-connrefused --retry-all-errors

Configuration:

  • Up to 120 retries with a 1 second delay: about 120s of retry delay in total
  • Uses curl's proven retry logic instead of custom shell code
  • --retry-all-errors handles any error condition

Benefits:

  • 105 fewer lines of code (removed 145, added 40)
  • Much simpler and easier to maintain
  • Same 120s timeout as requested
  • Consistent 1s delay between all retries
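The arithmetic behind 120 retries at 1s is worth spelling out: --retry 120 allows 120 retries after the first attempt (121 attempts), and --retry-delay 1 contributes 120 seconds of sleep in total, so wall-clock time is that plus however long each attempt takes. If a hard ceiling is wanted, curl's --retry-max-time stops further retries once the given number of seconds has passed, and --max-time limits each individual transfer. Note also that without --fail, curl does not treat most 4xx responses as errors, so --retry-all-errors does not retry them. The sketch below is illustrative: both helper names, and the 120 and 5 second values in the bounded variant, are assumptions rather than settings from this PR.

```shell
#!/usr/bin/env bash
# Print attempt count and total sleep for --retry N --retry-delay D.
retry_budget() {
  local retries="$1" delay="$2"
  # One initial attempt plus one per retry; one delay before each retry.
  echo "attempts=$((retries + 1)) sleep_total=$((retries * delay))s"
}

# Hypothetical bounded variant of the health check call:
# --retry-max-time 120 stops starting new retries after 120 s,
# --max-time 5 caps each individual request at 5 s.
bounded_health_curl() {
  curl -s --retry 120 --retry-delay 1 --retry-connrefused --retry-all-errors \
    --retry-max-time 120 --max-time 5 \
    -w "\n%{http_code}" "${gateway_url}/health"
}

retry_budget 120 1
```

retry_budget 120 1 prints attempts=121 sleep_total=120s. With the bounded variant, a request that hangs is cut off after 5 seconds and retried, instead of consuming the whole budget.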

Copilot AI changed the title from "Enforce HTTP 200 requirement for MCP gateway /health endpoint with 120s timeout and exponential backoff" to "Enforce HTTP 200 requirement for MCP gateway /health endpoint with 120s curl retry" Jan 11, 2026
pelikhan added smoke and removed smoke labels Jan 11, 2026
github-actions Bot commented Jan 11, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

github-actions Bot commented Jan 11, 2026

Contributor

🎬 THE END. Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨

github-actions Bot commented Jan 11, 2026

Contributor

🎉 Yo ho ho! Changeset Generator found the treasure and completed successfully! ⚓💰

github-actions Bot commented Jan 11, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

pelikhan marked this pull request as ready for review January 11, 2026 18:03
pelikhan merged commit 7824250 into main Jan 11, 2026
29 checks passed
pelikhan deleted the copilot/review-health-endpoint-compliance branch January 11, 2026 18:03
@github-actions

Contributor

Smoke Test Results - Claude

Last 2 Merged PRs:

  • Add core.info logging to MCP gateway log parser
  • Fix capitalization and TOC structure in campaign docs

✅ GitHub MCP: Retrieved PR data
✅ Go: v1.25.0 available
✅ Playwright: GitHub page loaded
❌ safeinputs-gh: Tool not found
✅ File write: Test file created
✅ Bash: File verified

Overall: PARTIAL PASS (5/6 tests passed - safeinputs-gh unavailable)

AI generated by Smoke Claude

2 participants