fix: refine reconnect strategy in azure asr extension #1927
base: main
sunxilin
commented
Dec 29, 2025
Pull Request Review: Refine Reconnect Strategy in Azure ASR Extension

Summary

This PR transitions the Azure ASR extension from a fixed 5-attempt retry limit to an unlimited reconnection strategy with exponential backoff capped at 2.4 seconds. This is a significant architectural change that improves resilience for transient network issues.

Code Quality & Best Practices

✅ Strengths
Potential Issues & Concerns
Pull Request Review: Refine Reconnect Strategy in Azure ASR Extension

Summary

This PR removes the 5-attempt limit for reconnection retries in the Azure ASR extension, implementing an unlimited retry strategy with exponential backoff capped at 2.4 seconds. The changes are well-structured with good test coverage.

✅ Strengths

1. Clear Intent and Good Motivation
2. Well-Structured Code Changes
3. Excellent Test Coverage
🔍 Issues & Concerns

CRITICAL: Potential Infinite Loop Risk

Location:

Issue: The current implementation has no mechanism to stop retrying. If Azure ASR is permanently unavailable (service outage, invalid credentials, etc.), the extension will retry indefinitely.

Impact:

Recommendations:

ISSUE: Error Reporting Semantics

Location:

Problem: Using

Recommendation: Use

MINOR: Test Case Issues

1. Test Configuration Inconsistency

Location:

Question: Is this change related to the reconnection strategy? The commit message and PR description don't explain this modification. If this is a separate bugfix, it should be documented or split into a separate commit.

2. Integration Test File Reference

Location:

Question: Why change from the hotwords test file to the standard file? This appears unrelated to reconnection logic.

MINOR: Documentation Issues

Location:

Issue: The docstring phrase "or max attempts are reached" is now incorrect since there's no maximum.

🔒 Security Considerations

✅ No new security issues introduced
🎯 Performance Considerations

✅ Performance Impact: Minimal to Positive

Pros:

Cons:

🧪 Test Coverage

✅ Good Test Coverage

Strengths:

📋 Recommendations Summary

Must Fix (Critical):
Should Fix (Important):
Nice to Have (Minor):
🎓 Code Quality Score: B+

Breakdown:

Overall: This is a well-intentioned change with good code quality, but needs safeguards to prevent indefinite retries in production.

💬 Questions for Author

Review completed by: Claude Code (Automated Code Review)
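The capped exponential backoff this review refers to can be sketched as follows. This is a minimal illustration and not the code from the PR: the function name `backoff_delay` and the 0.3 s base delay are assumptions, and only the 2.4 s cap comes from the review. Because the attempt counter is unbounded under unlimited retries, the sketch also clamps the exponent so that `2 ** n` never grows too large to convert to a float.

```python
def backoff_delay(attempt: int, base_delay: float = 0.3, max_delay: float = 2.4) -> float:
    """Return the delay in seconds before reconnect attempt `attempt` (1-based).

    base_delay is an assumed value; only the 2.4 s cap is stated in the review.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Clamp the exponent: with unlimited retries the counter is unbounded, and
    # multiplying a float by 2 ** n raises OverflowError once n exceeds ~1023.
    exponent = min(attempt - 1, 32)
    # Double the delay on each attempt, but never exceed the cap.
    return min(base_delay * (2 ** exponent), max_delay)


if __name__ == "__main__":
    # Print the first few delays to show where the cap takes effect.
    print([backoff_delay(n) for n in range(1, 7)])
```

With these assumed defaults the delay doubles until it reaches the cap and then stays there, so a permanently failing connection settles into a steady retry every 2.4 seconds.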
Pull Request Review: Refine reconnect strategy in azure asr extension

Summary

This PR removes the reconnection retry limit from the Azure and Tencent ASR extensions, implementing an unlimited retry strategy with exponential backoff. The changes allow the ASR services to continuously attempt reconnection until successful, rather than giving up after 5 attempts.

✅ Strengths

1. Well-Tested Changes
2. Clean API Design
3. Clear Documentation
4. Consistent Cross-Extension Changes
Code Review: PR #1927 - Refine Reconnect Strategy in Azure ASR Extension

Summary

This PR refactors the Azure ASR extension's reconnection strategy from a limited retry model (5 attempts) to an unlimited retry model with exponential backoff. The changes also align the Tencent ASR extension with similar unlimited reconnection parameters.

Positive Aspects

✅ Well-tested change: The new test validates the unlimited reconnection beyond the old 5-attempt limit with 8 failures before success.

✅ Clear intent: The change addresses a real user pain point where ASR connections would fail permanently after 5 attempts, requiring manual intervention.

✅ Good documentation: Updated docstrings clearly explain the unlimited retry strategy and exponential backoff with capping.

✅ Consistent approach: Both Azure and Tencent ASR extensions now use unlimited reconnection, providing consistent behavior across providers.

Issues and Concerns

Critical: Resource Exhaustion Risk

Location: reconnect_manager.py:51-111

The unlimited retry strategy with no circuit breaker or maximum timeout poses significant risks:
Recommendation: Consider one of these approaches:
Moderate: Error Handling Semantics Changed

Location: extension.py:523-530

The removal of the can_retry() check in _handle_reconnect() means error codes sent via send_asr_error() are now marked as FATAL_ERROR on every attempt, but the extension continues retrying. This is semantically inconsistent.

Recommendation: Change to ModuleErrorCode.NON_FATAL_ERROR.value or create a new error code like RECONNECTION_ATTEMPT_FAILED to better reflect the transient nature.

Moderate: Missing Configuration Validation

Location: test_invalid_params.py:63-68

The test was updated to include a key parameter with an empty string. However, with unlimited retries, an empty/invalid key will cause infinite reconnection attempts. The extension should validate critical configuration parameters (key, region) at initialization and fail fast with a clear error rather than entering an infinite retry loop.

Recommendation: Add validation in on_init() to check for empty/invalid credentials before attempting any connection.

Minor: Inconsistent Max Delay Values

Azure max_delay=2.4s vs Tencent reconnect_max_delay=4s. Consider documenting why different values are appropriate for each provider.

Minor: Test File Name Change

test_multi_language.py:306 - Changed from 16k_zh_cn_hotwords.pcm to 16k_zh_cn.pcm. This appears unrelated to the reconnection changes.

Performance Considerations
Security Concerns
Test Coverage

Good Coverage:

Missing Test Cases:

Code Quality

✅ Clean, readable code with clear comments

Recommendations Summary

Before Merge:

Post-Merge (Nice to Have):

Overall Assessment

This is a good improvement that addresses a real operational issue. The code quality is high and the testing demonstrates the feature works as intended. However, the unlimited retry strategy needs safeguards to prevent resource exhaustion and infinite loops in edge cases.

Recommendation: Request changes to add at least a maximum total retry duration before merging.

Review generated with Claude Code
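One way to add the maximum total retry duration requested above is sketched below. It is an illustration under stated assumptions, not the extension's actual ReconnectManager: the names `reconnect_with_budget` and `RetryBudgetExceeded`, the 300 s budget, and the 0.3 s base delay are all hypothetical, and `connect` is assumed to return True on success rather than raise. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import asyncio
import time
from typing import Awaitable, Callable


class RetryBudgetExceeded(Exception):
    """Raised when reconnection has kept failing for longer than the budget."""


async def reconnect_with_budget(
    connect: Callable[[], Awaitable[bool]],
    *,
    base_delay: float = 0.3,           # assumed; not stated in the review
    max_delay: float = 2.4,            # cap quoted in the review for Azure
    max_total_seconds: float = 300.0,  # hypothetical budget
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Retry `connect` until it returns True; return the number of attempts made.

    Raises RetryBudgetExceeded once `max_total_seconds` have elapsed since
    the first attempt without a successful connection.
    """
    started = clock()
    attempts = 0
    while True:
        attempts += 1
        if await connect():
            return attempts
        elapsed = clock() - started
        if elapsed >= max_total_seconds:
            raise RetryBudgetExceeded(
                f"gave up after {attempts} attempts in {elapsed:.1f}s"
            )
        # Capped exponential backoff; the exponent is clamped so an
        # unbounded attempt counter cannot overflow the float conversion.
        delay = min(base_delay * 2 ** min(attempts - 1, 32), max_delay)
        await sleep(delay)
```

A caller could catch `RetryBudgetExceeded` and report a single fatal error at that point, which keeps the per-attempt failures non-fatal while still giving operators a definite signal that the connection has not recovered.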
Pull Request Review: Refine Reconnect Strategy in Azure ASR Extension

Summary

This PR changes the Azure ASR extension's reconnection strategy from a limited retry approach (5 attempts) to an unlimited retry approach with exponential backoff. Similar changes are applied to the Tencent ASR extension.

Positive Aspects

1. Improved Resilience
2. Good Test Coverage
3. Code Quality
Critical Concerns

1. Lack of Circuit Breaker / Escape Mechanism

Severity: HIGH

The unlimited retry strategy has no escape mechanism for persistent failures. Consider these scenarios:

Recommendation: Implement a circuit breaker pattern to detect persistent failures and alert operators.

2. Error Reporting Issues

Severity: MEDIUM

In reconnect_manager.py:102-109, every reconnection failure reports a FATAL_ERROR. This is problematic because FATAL_ERROR is semantically incorrect if the system will keep retrying. This may flood error monitoring systems with false alarms and makes it difficult to distinguish between temporary and permanent failures.

Recommendation:

3. Missing Graceful Shutdown Mechanism

Severity: MEDIUM

With unlimited retries, there is no clean way to stop the reconnection loop. A shutdown flag should be added to allow clean exit from retry loops.

4. Memory Leak Risk

Severity: LOW-MEDIUM

The attempts counter in reconnect_manager.py:70 grows indefinitely without bounds. While an integer overflow is unlikely in Python, this could cause issues in long-running processes.

Recommendation: Cap the counter or reset it periodically.

Code-Specific Issues

5. Inconsistent Visibility Change

In reconnect_manager.py:33, reset_counter() was changed from public to private. This changes the public API without clear justification. If external code depends on reset_counter(), this is a breaking change.

6. Outdated Comment in extension.py

The method comment at azure_asr_python/extension.py:520 mentions "max attempts are reached" which is now outdated. It should be updated to reflect unlimited retries.

Architecture & Design Concerns

7. Inconsistency with Other ASR Extensions

Severity: MEDIUM

The codebase has 12+ other ASR extensions with ReconnectManager classes. This PR only updates Azure and Tencent extensions.

Questions:
Recommendation: Consider extracting ReconnectManager to a shared module and applying the unlimited retry strategy consistently across all ASR extensions.

8. Configuration Flexibility

Severity: LOW

The retry parameters are hardcoded (base_delay = 0.5, max_delay = 4.0). For the Tencent extension these are passed as parameters, which is good. The Azure extension uses the defaults.

Recommendation: Consider making these configurable via manifest/config files for operational flexibility.

Testing Gaps

9. Missing Test Cases

The new test file test_unlimited_reconnect.py is good but could be improved with additional scenarios:

Security Considerations

10. Resource Exhaustion

Severity: MEDIUM

Unlimited retries could be exploited: an attacker who causes continuous connection failures could keep the system in a retry loop indefinitely. This consumes CPU, memory, and logging resources.

Recommendation:

Minor Issues

11. Test File Change in test_multi_language.py

The audio file name change from "16k_zh_cn_hotwords.pcm" to "16k_zh_cn.pcm" seems unrelated to the PR's main purpose. If this is a bug fix, mention it in the PR description.

12. Documentation Clarity

In reconnect_manager.py:13, consider clarifying "0.5s, 1s, 2s, 4s (capped)" to "0.5s, 1s, 2s, 4s, 4s, 4s..." to make it clear the delay stays at 4s.

Recommendations Summary

High Priority:
Medium Priority:
Low Priority:
Final Verdict

Overall Assessment: The PR improves resilience but introduces operational risks due to the lack of escape mechanisms for persistent failures.

Recommendation: Request Changes - the PR needs circuit breaker logic and proper error classification before merging to production.

The core idea is sound, but production systems need safeguards against infinite retry loops. With the suggested improvements, this would be a solid enhancement to the ASR extensions.
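The circuit breaker and error classification requested here could look roughly like the sketch below. Everything in it is hypothetical: `ReconnectCircuitBreaker`, the `Severity` enum (a stand-in for the extension's real error codes), the threshold of 20 failures, and the 60 s cool-down are illustrative choices, not part of the PR. Failures below the threshold are classified as non-fatal; the failure that trips the breaker is classified as fatal, and attempts are then blocked until the cool-down has passed.

```python
import time
from enum import Enum
from typing import Callable, Optional


class Severity(Enum):
    # Hypothetical stand-ins for the extension's real error codes.
    NON_FATAL = "non_fatal"
    FATAL = "fatal"


class ReconnectCircuitBreaker:
    """Blocks reconnect attempts after too many consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 20,     # assumed value
        cooldown_seconds: float = 60.0,  # assumed value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_attempt(self) -> bool:
        """Return True if a reconnect attempt may be made now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Half-open: allow a probe attempt; one more failure re-opens.
            self._opened_at = None
            self._consecutive_failures = self._threshold - 1
            return True
        return False

    def record_success(self) -> None:
        """Reset the breaker after a successful connection."""
        self._consecutive_failures = 0
        self._opened_at = None

    def record_failure(self) -> Severity:
        """Count a failure and classify it for error reporting."""
        self._consecutive_failures += 1
        if self._consecutive_failures >= self._threshold:
            self._opened_at = self._clock()
            return Severity.FATAL
        return Severity.NON_FATAL
```

In a retry loop, the caller would check `allow_attempt()` before each connection attempt and pass the result of `record_failure()` to whatever error-reporting call the extension uses, so monitoring sees a fatal error only when the breaker opens.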