
Conversation

anexpn (Contributor) commented Jan 7, 2026

No description provided.

claude bot commented Jan 7, 2026

Code Review: Expose Confidence of Soniox Tokens

Thank you for this contribution! The implementation adds confidence score averaging to Soniox ASR results. Here's my detailed review:

Strengths

  1. Well-Structured Implementation: The _calculate_average_confidence method is clean, focused, and handles edge cases properly (extension.py:607-614); a sketch of the described behavior follows this list

  2. Excellent Test Coverage: The unit tests in test_confidence.py are comprehensive:

    • All tokens with confidence
    • Some tokens missing confidence
    • All tokens missing confidence
    • Empty token list
    • Single token cases
    • Boundary values (0.0 and 1.0)
  3. Proper Null Handling: Correctly filters out None confidence values before averaging

  4. Consistent with Framework Patterns: The metadata structure follows the pattern used by other ASR extensions (e.g., Speechmatics)
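
For reference, here is a minimal sketch of what the helper presumably looks like, reconstructed from the behavior described above (the method name and token type come from the PR; the actual implementation in extension.py may differ):

from typing import List, Optional

def _calculate_average_confidence(
    self, tokens: List["SonioxTranscriptToken"]
) -> Optional[float]:
    """Average token confidences, skipping tokens without one."""
    # Drop tokens whose confidence is None before averaging.
    confidences = [t.confidence for t in tokens if t.confidence is not None]
    if not confidences:
        return None  # empty input, or no token carried a confidence
    return sum(confidences) / len(confidences)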

🔍 Observations & Suggestions

1. Metadata Structure Consistency

The code adds confidence to metadata["asr_info"] (extension.py:639). While this works, I noticed:

  • Other ASR extensions (Deepgram, Google) don't currently populate metadata with confidence
  • Speechmatics uses a flat metadata dict approach

Recommendation: Consider documenting the asr_info structure or making it consistent across ASR extensions. This nested structure is fine, but consistency helps downstream consumers.

2. Integration Test Gap

The unit tests are excellent, but there's no integration test verifying:

  • That confidence values flow through the full ASR pipeline
  • That metadata reaches the asr_result data output
  • How the extension behaves with real Soniox tokens containing confidence

Looking at test_soniox_asr.py:130-191, the integration test creates tokens but doesn't include confidence values.

Recommendation: Add an integration test case similar to:

token = SonioxTranscriptToken(
    text="hello world",
    start_ms=0,
    end_ms=1000,
    is_final=True,
    language="en",
    confidence=0.95  # Add confidence
)

Then verify the metadata contains the expected confidence in the output.
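
One possible shape for that check, assuming the test can call the result-building method directly and that the result exposes metadata as a plain dict (the extension fixture and call pattern here are illustrative):

import pytest

result = extension._create_single_asr_result([token])
confidence = result.metadata.get("asr_info", {}).get("confidence")
assert confidence == pytest.approx(0.95)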

3. Empty Metadata Object Creation

When there's no confidence, the code still creates an empty dict (extension.py:637-639):

metadata = {}
if avg_confidence is not None:
    metadata["asr_info"] = {"confidence": avg_confidence}

Minor suggestion: Consider only creating metadata when needed:

metadata = (
    {"asr_info": {"confidence": avg_confidence}}
    if avg_confidence is not None
    else {}
)

This is purely stylistic and the current approach is perfectly acceptable.

4. Potential Edge Case: Zero Confidence

The test at line 70-76 of test_confidence.py correctly tests boundary values (0.0 and 1.0). However, 0.0 is a valid confidence value that may signal low reliability, so consider whether downstream systems need to distinguish between:

  • No confidence data available (None)
  • Very low confidence (0.0 or near-zero)

Note: This is likely fine as-is since the averaging properly handles 0.0, but worth considering for API documentation.
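
If that distinction matters, a downstream consumer could branch on it explicitly. A sketch, where the handler names and the 0.1 threshold are purely illustrative:

confidence = metadata.get("asr_info", {}).get("confidence")
if confidence is None:
    handle_missing_confidence()  # provider sent no confidence data at all
elif confidence < 0.1:
    handle_low_confidence()      # valid but very low reported score
else:
    accept_transcript()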

🎯 Code Quality

  • Type Hints: Excellent use of Optional[float] return type
  • Documentation: Good docstring explaining the None-skipping behavior
  • Readability: Clear and maintainable code
  • No Security Concerns: Pure data transformation with no external dependencies
  • No Performance Issues: O(n) complexity is appropriate for token processing

📋 Minor Nitpicks

  1. Test Import Location (test_confidence.py:3): The relative import from ..extension import SonioxASRExtension works but consider if from soniox_asr_python.extension import SonioxASRExtension would be more consistent with the test framework patterns.

  2. Float Comparison Precision (test_confidence.py:24): Using abs(result - expected) < 0.0001 is good practice. Consider if the test framework has a standard epsilon constant.
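
If no shared epsilon constant exists, pytest's built-in approx helper is a common alternative to a hand-rolled tolerance:

import pytest

assert result == pytest.approx(expected, abs=1e-4)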

🚀 Overall Assessment

This is a solid, production-ready implementation. The code is clean, well-tested at the unit level, and integrates smoothly into the existing architecture.

Approval Status: ✅ Approved with minor suggestions for enhancement

The main recommendation is to add an integration test to verify the confidence flows through the complete pipeline. Everything else is either optional refinement or documentation suggestions.

Great work!


Review generated by Claude Code

wangyoucao577 previously approved these changes Jan 7, 2026
claude bot commented Jan 7, 2026

Pull Request Review: feat: expose confidence of soniox tokens

Summary

This PR adds confidence score calculation to the Soniox ASR extension by computing the average confidence from token-level confidence values and exposing it in the ASR result metadata. The implementation is well-structured with comprehensive unit tests.


✅ Code Quality & Best Practices

Strengths:

  1. Clean separation of concerns: The _calculate_average_confidence() method is properly isolated and follows single responsibility principle
  2. Good documentation: Clear docstring explaining the function's behavior
  3. Proper error handling: Gracefully handles None confidence values
  4. Type hints: Function signature includes proper type annotations (List[SonioxTranscriptToken], Optional[float])
  5. Follows repository patterns: Implementation matches the metadata pattern used elsewhere in the codebase

Minor Suggestions:

  1. Metadata initialization location (extension.py:637-639): Consider initializing metadata as empty dict earlier if you plan to add more metadata fields in the future, though current approach is fine for single field:

    # Current approach is acceptable, but if more fields are added:
    metadata = {}
    avg_confidence = self._calculate_average_confidence(tokens)
    if avg_confidence is not None:
        metadata["asr_info"] = {"confidence": avg_confidence}
    # Could add more metadata fields here
  2. Version bump: Version changed from 0.3.6 to 0.3.8 (skipping 0.3.7). This is fine if intentional, but ensure this aligns with your versioning strategy.


✅ Potential Bugs & Edge Cases

No bugs identified. The implementation handles all edge cases properly:

✓ Empty token list returns None
✓ All tokens with None confidence returns None
✓ Mixed None and valid confidence values - correctly filters out None
✓ Division by zero prevented by checking empty list before division
✓ No mutation of input data

One consideration:

  • Empty tokens list: In _create_single_asr_result() at extension.py:621-623, there's an assumption that tokens list is non-empty:
    start_ms = tokens[0].start_ms  # Would raise IndexError if tokens is empty
    end_ms = tokens[-1].end_ms
    The _calculate_average_confidence() method handles empty lists gracefully, but the calling function does not. This appears to be a pre-existing assumption in the codebase rather than one introduced by this PR, but it is worth noting; a defensive guard is sketched below.
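
If the maintainers want to harden that assumption, a guard at the top of the function could look like the following (a sketch only; whether to return early or raise is a design choice for the callers):

if not tokens:
    return None  # or: raise ValueError("expected at least one token")
start_ms = tokens[0].start_ms
end_ms = tokens[-1].end_ms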

⚡ Performance Considerations

Excellent performance characteristics:

  1. Single pass through tokens: O(n) time complexity for confidence calculation
  2. List comprehension: Efficient filtering using list comprehension
  3. No unnecessary allocations: Minimal memory overhead
  4. Conditional population: Only adds the asr_info entry to metadata when a confidence value exists

No performance concerns.


🔒 Security Considerations

No security issues identified.

  • No external input directly used in calculations
  • No injection risks
  • Confidence values are numeric and validated by Soniox API
  • No logging of sensitive data in new code

✅ Test Coverage

Excellent test coverage! The test file covers all critical scenarios:

Comprehensive test cases:

  1. ✓ All tokens have confidence (test_all_tokens_have_confidence)
  2. ✓ Some tokens missing confidence (test_some_tokens_missing_confidence)
  3. ✓ All tokens missing confidence (test_all_tokens_missing_confidence)
  4. ✓ Empty token list (test_empty_token_list)
  5. ✓ Single token with confidence (test_single_token_with_confidence)
  6. ✓ Single token without confidence (test_single_token_without_confidence)
  7. ✓ Boundary values 0.0 and 1.0 (test_confidence_range_boundaries)

Test quality observations:

  • Uses proper floating-point comparison with tolerance (abs(result - expected) < 0.0001)
  • Follows repository test patterns (setup_method, pytest style)
  • Clear test names and docstrings
  • Good use of fixtures

Missing integration test:
The unit tests are excellent, but consider adding an integration test that verifies:

  • The confidence metadata appears in the final asr_result Data message
  • The metadata structure matches expectations downstream

You could add to test_soniox_asr.py:

# In the on_data handler, after line 115:
if "metadata" in data_dict:
    metadata = data_dict["metadata"]
    if "asr_info" in metadata and "confidence" in metadata["asr_info"]:
        confidence = metadata["asr_info"]["confidence"]
        self.stop_test_if_checking_failed(
            ten_env_tester,
            isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0,
            f"confidence should be a number between 0 and 1: {confidence}",
        )

📋 Additional Recommendations

  1. Documentation: Consider updating any user-facing documentation or API docs to mention the new metadata.asr_info.confidence field

  2. Changelog: If you maintain a changelog, document this new feature

  3. Property schema: Check if manifest.json API definitions need updating to document the metadata field structure


🎯 Overall Assessment

Status: ✅ APPROVED with minor suggestions

This is a well-implemented feature with:

  • Clean, maintainable code
  • Comprehensive test coverage
  • No bugs or security issues
  • Good performance characteristics
  • Follows repository conventions

The code is ready to merge. The only suggestion is to consider adding an integration test to verify the metadata propagates correctly through the full ASR pipeline.

Great work! 🚀

claude bot commented Jan 7, 2026

Pull Request Review: Expose Confidence of Soniox Tokens

Summary

This PR adds confidence score calculation and exposure for Soniox ASR results by computing the average confidence from individual tokens and including it in the result metadata.

✅ Positive Aspects

Code Quality

  • Clean implementation: The _calculate_average_confidence method is well-structured and follows Python best practices
  • Proper type hints: Good use of Optional[float] return type and List[SonioxTranscriptToken] parameter
  • Clear logic: The filtering of None values and handling of empty lists is correct
  • Good documentation: The docstring clearly explains the method's behavior

Test Coverage

  • Comprehensive unit tests: The test file test_confidence.py covers all important scenarios:
    • All tokens with confidence values
    • Mixed tokens (some with, some without confidence)
    • All tokens missing confidence
    • Empty token list
    • Single token cases
    • Boundary values (0.0 and 1.0)
  • Proper test structure: Using pytest and class-based organization
  • Good test naming: Test names clearly describe what they're testing

🔍 Issues and Concerns

1. Metadata Structure Inconsistency (High Priority)

Issue: The metadata structure used in this PR doesn't follow the established patterns in the codebase.

In extension.py:637-639:

metadata = {}
if avg_confidence is not None:
    metadata["asr_info"] = {"confidence": avg_confidence}

Problem:

  • Other ASR extensions in the codebase don't pass metadata in this nested structure
  • The ASRResult metadata field is primarily used for session_id tracking (as evidenced by integration tests)
  • No other ASR extension uses asr_info as a nested key in metadata
  • This could potentially conflict with or overwrite existing metadata like session_id

Current code behavior:

# Your PR creates metadata like:
metadata = {"asr_info": {"confidence": 0.85}}

# But existing code expects:
metadata = {"session_id": "user_session_123"}

Recommendation: Consider these alternatives:

Option A - Add as top-level metadata field:

metadata = {}
if avg_confidence is not None:
    metadata["confidence"] = avg_confidence

Option B - Preserve existing metadata and merge:

metadata = tokens[0].metadata if hasattr(tokens[0], 'metadata') else {}
if avg_confidence is not None:
    metadata["confidence"] = avg_confidence

Option C - Use the top-level confidence field:
The ASRResult dataclass appears to have a top-level optional confidence field. Consider using:

return ASRResult(
    text=text,
    final=is_final,
    start_ms=self._adjust_timestamp(start_ms),
    duration_ms=duration_ms,
    language=language,
    words=words,
    confidence=avg_confidence,  # Use top-level field instead
)

Files to check:

  • extension.py:641-649 - ASRResult instantiation

2. Missing Integration Test (Medium Priority)

Issue: While unit tests are excellent, there's no integration test verifying that confidence values flow through the entire system.

Recommendation: Add an integration test to test_soniox_asr.py that:

  1. Creates tokens with confidence values
  2. Verifies the confidence appears correctly in the asr_result Data output
  3. Checks the metadata structure matches expectations

Example addition to test_soniox_asr.py:

def test_confidence_in_asr_result(patch_soniox_ws):
    # Similar to test_asr_result but with confidence values
    token = SonioxTranscriptToken(
        text="hello world",
        start_ms=0,
        end_ms=1000,
        is_final=True,
        language="en",
        confidence=0.9
    )
    # ... verify confidence appears in result

Files to check:

  • tests/test_soniox_asr.py:130-191 - Add confidence test case

3. Version Bump Seems Large (Low Priority)

Issue: Version jumped from 0.3.6 to 0.3.8, skipping 0.3.7.

Question: Was this intentional? If not, consider using 0.3.7 instead.

Files to check:

  • manifest.json:4 - Version number

🔒 Security Considerations

No security issues identified. The confidence calculation operates on already-validated float values.

⚡ Performance Considerations

  • The average confidence calculation is O(n) where n is the number of tokens, which is acceptable
  • The list comprehension for filtering is efficient
  • No performance concerns identified

📋 Recommendations Summary

Must Address:

  1. Clarify metadata structure: Decide whether to use nested asr_info or top-level confidence field. The top-level field seems more consistent with ASRResult's design.

Should Address:

  1. Add integration test: Ensure confidence values flow through the entire system correctly

Nice to Have:

  1. Verify version number: Confirm the skip from 0.3.6 to 0.3.8 is intentional

🎯 Overall Assessment

The code quality is good with solid unit test coverage. However, there's a potential architectural mismatch with how metadata is structured. I recommend clarifying the intended metadata structure before merging, preferably by using the top-level confidence field of ASRResult if available.

Recommendation: Request changes to clarify metadata structure.

claude bot commented Jan 7, 2026

Code Review

Summary

This PR adds confidence score exposure for Soniox ASR tokens by calculating the average confidence across all tokens in a result and including it in the metadata. The implementation is solid with comprehensive test coverage.

✅ Strengths

  1. Clean Implementation: The _calculate_average_confidence method is well-designed:

    • Properly handles None confidence values by filtering them out
    • Returns None when no valid confidence values exist
    • Simple and efficient averaging logic
  2. Excellent Test Coverage: The test file (test_confidence.py) is comprehensive with 7 test cases covering:

    • All tokens with confidence
    • Mixed tokens (some with/without confidence)
    • Edge cases (empty list, all None values)
    • Single token scenarios
    • Boundary values (0.0 and 1.0)
  3. Backward Compatibility: The metadata structure is additive-only:

    • Only adds asr_info.confidence when confidence is available
    • Existing consumers won't break if they don't expect this field
    • Follows existing metadata pattern used elsewhere in the codebase
  4. Type Safety: Proper use of Optional[float] return type clearly indicates when confidence might not be available

🔍 Observations

  1. Version Bump: The manifest version was appropriately bumped from 0.3.7 to 0.3.8 ✓

  2. Metadata Structure: The confidence is nested under metadata.asr_info.confidence, which creates a namespace for future ASR-related metadata

  3. Documentation Gap: The README.md shows a metadata example but doesn't mention the new confidence field. Consider updating it in a follow-up to help users discover this feature:

    "metadata": {
      "session_id": "session_identifier",
      "asr_info": {
        "confidence": 0.95
      }
    }

🤔 Minor Considerations

  1. Integration Test: While unit tests are excellent, consider adding an integration test that verifies the confidence field appears correctly in the actual asr_result data sent through the TEN framework pipeline (in a future PR if not already covered by existing tests)

  2. Confidence Range Documentation: Consider documenting the expected range (0.0 to 1.0) in the docstring or README
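
For instance, the docstring could spell out the expected range. A sketch, with wording that is illustrative only; the 0.0 to 1.0 range matches the boundary values exercised in the tests:

def _calculate_average_confidence(self, tokens):
    """Average the per-token confidence scores.

    Returns a float in [0.0, 1.0], or None when no token carries
    a confidence value.
    """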

🎯 Verdict

Approved - This is a clean, well-tested feature addition that enhances the ASR extension without breaking existing functionality. The code quality is high and follows best practices.


Review conducted by Claude Code

claude bot commented Jan 7, 2026

Pull Request Review: feat: expose confidence of soniox tokens

Thank you for this contribution! This PR adds confidence score calculation to the Soniox ASR extension. Overall, the implementation is solid and well-tested. Here are my findings:

Strengths

  1. Clean Implementation: The _calculate_average_confidence method is well-designed with proper handling of None values
  2. Excellent Test Coverage: The test file (test_confidence.py) is comprehensive with 7 test cases covering edge cases including:
    • All tokens with confidence
    • Mixed tokens (some with/without confidence)
    • Empty lists
    • Boundary values (0.0, 1.0)
  3. Consistent with Codebase: The metadata pattern follows existing conventions (e.g., speechmatics_asr_python/tests/mock.py:40)
  4. Proper Version Bump: Updated manifest version from 0.3.7 to 0.3.8

📝 Code Quality Observations

extension.py:614-621 - The confidence calculation logic is clean and efficient:

  • Properly filters out None values
  • Returns None when no valid confidence values exist
  • Simple arithmetic mean calculation

extension.py:642-646 - Good defensive coding:

  • Only adds metadata when confidence is available
  • Uses nested structure {"asr_info": {"confidence": ...}} for extensibility

test_confidence.py - Excellent test structure:

  • Follows pytest conventions
  • Uses class-based test organization
  • Includes descriptive docstrings
  • Tests use appropriate float comparison (abs(result - expected) < 0.0001)

🔍 Potential Considerations

  1. Metadata Structure Documentation: Consider documenting the metadata["asr_info"]["confidence"] structure so downstream consumers know how to access it. However, this may already be documented elsewhere in the codebase.

  2. Test Execution: The tests look solid but I couldn't verify they pass in the CI environment. Ensure they run successfully with the full test suite.

  3. Integration Testing: While unit tests are excellent, consider if integration tests exist or are needed to verify the confidence scores flow correctly through the entire ASR pipeline.

🔒 Security & Performance

  • ✅ No security concerns identified
  • ✅ Performance impact is minimal (simple arithmetic on small token lists)
  • ✅ No potential for memory leaks or resource issues

📊 Summary

This is a well-implemented feature with excellent test coverage. The code follows Python best practices and integrates cleanly with the existing codebase. The changes are focused and don't introduce unnecessary complexity.

Recommendation: ✅ Approve - This PR is ready to merge pending successful CI tests.


Note: This review assumes the ASRResult class from ten_ai_base.asr already supports the metadata parameter, which appears to be the case based on other ASR extensions in the codebase.

plutoless merged commit 0e991ca into main Jan 8, 2026
34 checks passed
plutoless deleted the push-kkyllkpvuvpv branch January 8, 2026 02:34