
Conversation

anexpn (Contributor) commented Jan 7, 2026

No description provided.

claude bot commented Jan 7, 2026

Code Review: Expose Confidence of Soniox Tokens

Thank you for this contribution! The implementation adds confidence score averaging to Soniox ASR results. Here's my detailed review:

Strengths

  1. Well-Structured Implementation: The _calculate_average_confidence method is clean, focused, and handles edge cases properly (extension.py:607-614); a sketch of the described behavior follows this list

  2. Excellent Test Coverage: The unit tests in test_confidence.py are comprehensive:

    • All tokens with confidence
    • Some tokens missing confidence
    • All tokens missing confidence
    • Empty token list
    • Single token cases
    • Boundary values (0.0 and 1.0)
  3. Proper Null Handling: Correctly filters out None confidence values before averaging

  4. Consistent with Framework Patterns: The metadata structure follows the pattern used by other ASR extensions (e.g., Speechmatics)
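
For reference, here is a minimal sketch of what the helper presumably looks like, reconstructed from the behavior described above (the method name and token type come from the PR; the actual implementation in extension.py may differ):

from typing import List, Optional

def _calculate_average_confidence(
    self, tokens: List["SonioxTranscriptToken"]
) -> Optional[float]:
    """Average token confidences, skipping tokens without one."""
    # Drop tokens whose confidence is None before averaging.
    confidences = [t.confidence for t in tokens if t.confidence is not None]
    if not confidences:
        return None  # empty input, or no token carried a confidence
    return sum(confidences) / len(confidences)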

🔍 Observations & Suggestions

1. Metadata Structure Consistency

The code adds confidence to metadata["asr_info"] (extension.py:639). While this works, I noticed:

  • Other ASR extensions (Deepgram, Google) don't currently populate metadata with confidence
  • Speechmatics uses a flat metadata dict approach

Recommendation: Consider documenting the asr_info structure or making it consistent across ASR extensions. This nested structure is fine, but consistency helps downstream consumers.

2. Integration Test Gap

The unit tests are excellent, but there's no integration test verifying:

  • That confidence values flow through the full ASR pipeline
  • That metadata reaches the asr_result data output
  • How the extension behaves with real Soniox tokens containing confidence

Looking at test_soniox_asr.py:130-191, the integration test creates tokens but doesn't include confidence values.

Recommendation: Add an integration test case similar to:

token = SonioxTranscriptToken(
    text="hello world",
    start_ms=0,
    end_ms=1000,
    is_final=True,
    language="en",
    confidence=0.95  # Add confidence
)

Then verify the metadata contains the expected confidence in the output.
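
One possible shape for that check, assuming the test can call the result-building method directly and that the result exposes metadata as a plain dict (the extension fixture and call pattern here are illustrative):

import pytest

result = extension._create_single_asr_result([token])
confidence = result.metadata.get("asr_info", {}).get("confidence")
assert confidence == pytest.approx(0.95)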

3. Empty Metadata Object Creation

When there's no confidence, the code still creates an empty dict (extension.py:637-639):

metadata = {}
if avg_confidence is not None:
    metadata["asr_info"] = {"confidence": avg_confidence}

Minor suggestion: Consider only creating metadata when needed:

metadata = (
    {"asr_info": {"confidence": avg_confidence}}
    if avg_confidence is not None
    else {}
)

This is purely stylistic and the current approach is perfectly acceptable.

4. Potential Edge Case: Zero Confidence

The test at line 70-76 of test_confidence.py correctly tests boundary values (0.0 and 1.0). However, 0.0 is a valid confidence value that may signal low reliability, so consider whether downstream systems need to distinguish between:

  • No confidence data available (None)
  • Very low confidence (0.0 or near-zero)

Note: This is likely fine as-is since the averaging properly handles 0.0, but worth considering for API documentation.
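
If that distinction matters, a downstream consumer could branch on it explicitly. A sketch, where the handler names and the 0.1 threshold are purely illustrative:

confidence = metadata.get("asr_info", {}).get("confidence")
if confidence is None:
    handle_missing_confidence()  # provider sent no confidence data at all
elif confidence < 0.1:
    handle_low_confidence()      # valid but very low reported score
else:
    accept_transcript()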

🎯 Code Quality

  • Type Hints: Excellent use of Optional[float] return type
  • Documentation: Good docstring explaining the None-skipping behavior
  • Readability: Clear and maintainable code
  • No Security Concerns: Pure data transformation with no external dependencies
  • No Performance Issues: O(n) complexity is appropriate for token processing

📋 Minor Nitpicks

  1. Test Import Location (test_confidence.py:3): The relative import from ..extension import SonioxASRExtension works but consider if from soniox_asr_python.extension import SonioxASRExtension would be more consistent with the test framework patterns.

  2. Float Comparison Precision (test_confidence.py:24): Using abs(result - expected) < 0.0001 is good practice. Consider if the test framework has a standard epsilon constant.
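
If no shared epsilon constant exists, pytest's built-in approx helper is a common alternative to a hand-rolled tolerance:

import pytest

assert result == pytest.approx(expected, abs=1e-4)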

🚀 Overall Assessment

This is a solid, production-ready implementation. The code is clean, well-tested at the unit level, and integrates smoothly into the existing architecture.

Approval Status: ✅ Approved with minor suggestions for enhancement

The main recommendation is to add an integration test to verify the confidence flows through the complete pipeline. Everything else is either optional refinement or documentation suggestions.

Great work!


Review generated by Claude Code

wangyoucao577 previously approved these changes Jan 7, 2026
claude bot commented Jan 7, 2026

Pull Request Review: feat: expose confidence of soniox tokens

Summary

This PR adds confidence score calculation to the Soniox ASR extension by computing the average confidence from token-level confidence values and exposing it in the ASR result metadata. The implementation is well-structured with comprehensive unit tests.


✅ Code Quality & Best Practices

Strengths:

  1. Clean separation of concerns: The _calculate_average_confidence() method is properly isolated and follows single responsibility principle
  2. Good documentation: Clear docstring explaining the function's behavior
  3. Proper error handling: Gracefully handles None confidence values
  4. Type hints: Function signature includes proper type annotations (List[SonioxTranscriptToken], Optional[float])
  5. Follows repository patterns: Implementation matches the metadata pattern used elsewhere in the codebase

Minor Suggestions:

  1. Metadata initialization location (extension.py:637-639): Consider initializing metadata as empty dict earlier if you plan to add more metadata fields in the future, though current approach is fine for single field:

    # Current approach is acceptable, but if more fields are added:
    metadata = {}
    avg_confidence = self._calculate_average_confidence(tokens)
    if avg_confidence is not None:
        metadata["asr_info"] = {"confidence": avg_confidence}
    # Could add more metadata fields here
  2. Version bump: Version changed from 0.3.6 to 0.3.8 (skipping 0.3.7). This is fine if intentional, but ensure this aligns with your versioning strategy.


✅ Potential Bugs & Edge Cases

No bugs identified. The implementation handles all edge cases properly:

✓ Empty token list returns None
✓ All tokens with None confidence returns None
✓ Mixed None and valid confidence values - correctly filters out None
✓ Division by zero prevented by checking empty list before division
✓ No mutation of input data

One consideration:

  • Empty tokens list: In _create_single_asr_result() at extension.py:621-623, there's an assumption that tokens list is non-empty:
    start_ms = tokens[0].start_ms  # Would raise IndexError if tokens is empty
    end_ms = tokens[-1].end_ms
    The _calculate_average_confidence() method handles empty lists gracefully, but the calling function does not. This appears to be a pre-existing assumption in the codebase rather than one introduced by this PR, but it is worth noting; a defensive guard is sketched below.
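
If the maintainers want to harden that assumption, a guard at the top of the function could look like the following (a sketch only; whether to return early or raise is a design choice for the callers):

if not tokens:
    return None  # or: raise ValueError("expected at least one token")
start_ms = tokens[0].start_ms
end_ms = tokens[-1].end_ms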

⚡ Performance Considerations

Excellent performance characteristics:

  1. Single pass through tokens: O(n) time complexity for confidence calculation
  2. List comprehension: Efficient filtering using list comprehension
  3. No unnecessary allocations: Minimal memory overhead
  4. Conditional population: Only adds the asr_info entry to metadata when a confidence value exists

No performance concerns.


🔒 Security Considerations

No security issues identified.

  • No external input directly used in calculations
  • No injection risks
  • Confidence values are numeric and validated by Soniox API
  • No logging of sensitive data in new code

✅ Test Coverage

Excellent test coverage! The test file covers all critical scenarios:

Comprehensive test cases:

  1. ✓ All tokens have confidence (test_all_tokens_have_confidence)
  2. ✓ Some tokens missing confidence (test_some_tokens_missing_confidence)
  3. ✓ All tokens missing confidence (test_all_tokens_missing_confidence)
  4. ✓ Empty token list (test_empty_token_list)
  5. ✓ Single token with confidence (test_single_token_with_confidence)
  6. ✓ Single token without confidence (test_single_token_without_confidence)
  7. ✓ Boundary values 0.0 and 1.0 (test_confidence_range_boundaries)

Test quality observations:

  • Uses proper floating-point comparison with tolerance (abs(result - expected) < 0.0001)
  • Follows repository test patterns (setup_method, pytest style)
  • Clear test names and docstrings
  • Good use of fixtures

Missing integration test:
The unit tests are excellent, but consider adding an integration test that verifies:

  • The confidence metadata appears in the final asr_result Data message
  • The metadata structure matches expectations downstream

You could add to test_soniox_asr.py:

# In the on_data handler, after line 115:
if "metadata" in data_dict:
    metadata = data_dict["metadata"]
    if "asr_info" in metadata and "confidence" in metadata["asr_info"]:
        confidence = metadata["asr_info"]["confidence"]
        self.stop_test_if_checking_failed(
            ten_env_tester,
            isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0,
            f"confidence should be a number between 0 and 1: {confidence}",
        )

📋 Additional Recommendations

  1. Documentation: Consider updating any user-facing documentation or API docs to mention the new metadata.asr_info.confidence field

  2. Changelog: If you maintain a changelog, document this new feature

  3. Property schema: Check if manifest.json API definitions need updating to document the metadata field structure


🎯 Overall Assessment

Status: ✅ APPROVED with minor suggestions

This is a well-implemented feature with:

  • Clean, maintainable code
  • Comprehensive test coverage
  • No bugs or security issues
  • Good performance characteristics
  • Follows repository conventions

The code is ready to merge. The only suggestion is to consider adding an integration test to verify the metadata propagates correctly through the full ASR pipeline.

Great work! 🚀

claude bot commented Jan 7, 2026

Pull Request Review: Expose Confidence of Soniox Tokens

Summary

This PR adds confidence score calculation and exposure for Soniox ASR results by computing the average confidence from individual tokens and including it in the result metadata.

✅ Positive Aspects

Code Quality

  • Clean implementation: The _calculate_average_confidence method is well-structured and follows Python best practices
  • Proper type hints: Good use of Optional[float] return type and List[SonioxTranscriptToken] parameter
  • Clear logic: The filtering of None values and handling of empty lists is correct
  • Good documentation: The docstring clearly explains the method's behavior

Test Coverage

  • Comprehensive unit tests: The test file test_confidence.py covers all important scenarios:
    • All tokens with confidence values
    • Mixed tokens (some with, some without confidence)
    • All tokens missing confidence
    • Empty token list
    • Single token cases
    • Boundary values (0.0 and 1.0)
  • Proper test structure: Using pytest and class-based organization
  • Good test naming: Test names clearly describe what they're testing

🔍 Issues and Concerns

1. Metadata Structure Inconsistency (High Priority)

Issue: The metadata structure used in this PR doesn't follow the established patterns in the codebase.

In extension.py:637-639:

metadata = {}
if avg_confidence is not None:
    metadata["asr_info"] = {"confidence": avg_confidence}

Problem:

  • Other ASR extensions in the codebase don't pass metadata in this nested structure
  • The ASRResult metadata field is primarily used for session_id tracking (as evidenced by integration tests)
  • No other ASR extension uses asr_info as a nested key in metadata
  • This could potentially conflict with or overwrite existing metadata like session_id

Current code behavior:

# Your PR creates metadata like:
metadata = {"asr_info": {"confidence": 0.85}}

# But existing code expects:
metadata = {"session_id": "user_session_123"}

Recommendation: Consider these alternatives:

Option A - Add as top-level metadata field:

metadata = {}
if avg_confidence is not None:
    metadata["confidence"] = avg_confidence

Option B - Preserve existing metadata and merge:

metadata = tokens[0].metadata if hasattr(tokens[0], 'metadata') else {}
if avg_confidence is not None:
    metadata["confidence"] = avg_confidence

Option C - Use the top-level confidence field:
The ASRResult dataclass appears to have a top-level optional confidence field. Consider using:

return ASRResult(
    text=text,
    final=is_final,
    start_ms=self._adjust_timestamp(start_ms),
    duration_ms=duration_ms,
    language=language,
    words=words,
    confidence=avg_confidence,  # Use top-level field instead
)

Files to check:

  • extension.py:641-649 - ASRResult instantiation

2. Missing Integration Test (Medium Priority)

Issue: While unit tests are excellent, there's no integration test verifying that confidence values flow through the entire system.

Recommendation: Add an integration test to test_soniox_asr.py that:

  1. Creates tokens with confidence values
  2. Verifies the confidence appears correctly in the asr_result Data output
  3. Checks the metadata structure matches expectations

Example addition to test_soniox_asr.py:

def test_confidence_in_asr_result(patch_soniox_ws):
    # Similar to test_asr_result but with confidence values
    token = SonioxTranscriptToken(
        text="hello world",
        start_ms=0,
        end_ms=1000,
        is_final=True,
        language="en",
        confidence=0.9
    )
    # ... verify confidence appears in result

Files to check:

  • tests/test_soniox_asr.py:130-191 - Add confidence test case

3. Version Bump Seems Large (Low Priority)

Issue: Version jumped from 0.3.6 to 0.3.8, skipping 0.3.7.

Question: Was this intentional? If not, consider using 0.3.7 instead.

Files to check:

  • manifest.json:4 - Version number

🔒 Security Considerations

No security issues identified. The confidence calculation operates on already-validated float values.

⚡ Performance Considerations

  • The average confidence calculation is O(n) where n is the number of tokens, which is acceptable
  • The list comprehension for filtering is efficient
  • No performance concerns identified

📋 Recommendations Summary

Must Address:

  1. Clarify metadata structure: Decide whether to use nested asr_info or top-level confidence field. The top-level field seems more consistent with ASRResult's design.

Should Address:

  1. Add integration test: Ensure confidence values flow through the entire system correctly

Nice to Have:

  1. Verify version number: Confirm the skip from 0.3.6 to 0.3.8 is intentional

🎯 Overall Assessment

The code quality is good with solid unit test coverage. However, there's a potential architectural mismatch with how metadata is structured. I recommend clarifying the intended metadata structure before merging, preferably by using the top-level confidence field of ASRResult if available.

Recommendation: Request changes to clarify metadata structure.

claude bot commented Jan 7, 2026

Code Review

Summary

This PR adds confidence score exposure for Soniox ASR tokens by calculating the average confidence across all tokens in a result and including it in the metadata. The implementation is solid with comprehensive test coverage.

✅ Strengths

  1. Clean Implementation: The _calculate_average_confidence method is well-designed:

    • Properly handles None confidence values by filtering them out
    • Returns None when no valid confidence values exist
    • Simple and efficient averaging logic
  2. Excellent Test Coverage: The test file (test_confidence.py) is comprehensive with 7 test cases covering:

    • All tokens with confidence
    • Mixed tokens (some with/without confidence)
    • Edge cases (empty list, all None values)
    • Single token scenarios
    • Boundary values (0.0 and 1.0)
  3. Backward Compatibility: The metadata structure is additive-only:

    • Only adds asr_info.confidence when confidence is available
    • Existing consumers won't break if they don't expect this field
    • Follows existing metadata pattern used elsewhere in the codebase
  4. Type Safety: Proper use of Optional[float] return type clearly indicates when confidence might not be available

🔍 Observations

  1. Version Bump: The manifest version was appropriately bumped from 0.3.7 to 0.3.8 ✓

  2. Metadata Structure: The confidence is nested under metadata.asr_info.confidence, which creates a namespace for future ASR-related metadata

  3. Documentation Gap: The README.md shows a metadata example but doesn't mention the new confidence field. Consider updating it in a follow-up to help users discover this feature:

    "metadata": {
      "session_id": "session_identifier",
      "asr_info": {
        "confidence": 0.95
      }
    }

🤔 Minor Considerations

  1. Integration Test: While unit tests are excellent, consider adding an integration test that verifies the confidence field appears correctly in the actual asr_result data sent through the TEN framework pipeline (in a future PR if not already covered by existing tests)

  2. Confidence Range Documentation: Consider documenting the expected range (0.0 to 1.0) in the docstring or README
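
For instance, the docstring could spell out the expected range. A sketch, with wording that is illustrative only; the 0.0 to 1.0 range matches the boundary values exercised in the tests:

def _calculate_average_confidence(self, tokens):
    """Average the per-token confidence scores.

    Returns a float in [0.0, 1.0], or None when no token carries
    a confidence value.
    """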

🎯 Verdict

Approved - This is a clean, well-tested feature addition that enhances the ASR extension without breaking existing functionality. The code quality is high and follows best practices.


Review conducted by Claude Code

claude bot commented Jan 7, 2026

Pull Request Review: feat: expose confidence of soniox tokens

Thank you for this contribution! This PR adds confidence score calculation to the Soniox ASR extension. Overall, the implementation is solid and well-tested. Here are my findings:

Strengths

  1. Clean Implementation: The _calculate_average_confidence method is well-designed with proper handling of None values
  2. Excellent Test Coverage: The test file (test_confidence.py) is comprehensive with 7 test cases covering edge cases including:
    • All tokens with confidence
    • Mixed tokens (some with/without confidence)
    • Empty lists
    • Boundary values (0.0, 1.0)
  3. Consistent with Codebase: The metadata pattern follows existing conventions (e.g., speechmatics_asr_python/tests/mock.py:40)
  4. Proper Version Bump: Updated manifest version from 0.3.7 to 0.3.8

📝 Code Quality Observations

extension.py:614-621 - The confidence calculation logic is clean and efficient:

  • Properly filters out None values
  • Returns None when no valid confidence values exist
  • Simple arithmetic mean calculation

extension.py:642-646 - Good defensive coding:

  • Only adds metadata when confidence is available
  • Uses nested structure {"asr_info": {"confidence": ...}} for extensibility

test_confidence.py - Excellent test structure:

  • Follows pytest conventions
  • Uses class-based test organization
  • Includes descriptive docstrings
  • Tests use appropriate float comparison (abs(result - expected) < 0.0001)

🔍 Potential Considerations

  1. Metadata Structure Documentation: Consider documenting the metadata["asr_info"]["confidence"] structure so downstream consumers know how to access it. However, this may already be documented elsewhere in the codebase.

  2. Test Execution: The tests look solid but I couldn't verify they pass in the CI environment. Ensure they run successfully with the full test suite.

  3. Integration Testing: While unit tests are excellent, consider if integration tests exist or are needed to verify the confidence scores flow correctly through the entire ASR pipeline.

🔒 Security & Performance

  • ✅ No security concerns identified
  • ✅ Performance impact is minimal (simple arithmetic on small token lists)
  • ✅ No potential for memory leaks or resource issues

📊 Summary

This is a well-implemented feature with excellent test coverage. The code follows Python best practices and integrates cleanly with the existing codebase. The changes are focused and don't introduce unnecessary complexity.

Recommendation: ✅ Approve - This PR is ready to merge pending successful CI tests.


Note: This review assumes the ASRResult class from ten_ai_base.asr already supports the metadata parameter, which appears to be the case based on other ASR extensions in the codebase.

plutoless merged commit 0e991ca into main Jan 8, 2026
34 checks passed
plutoless deleted the push-kkyllkpvuvpv branch January 8, 2026 02:34