This project implements audio comparison using mel-spectrograms and the Longest Common Subsequence (LCS) algorithm, and demonstrates it on three modification cases: word replacement, word insertion, and word deletion.
Feature Extraction
- Mel-spectrogram conversion
- Parameters:
  - n_mels: 128 (mel frequency bins)
  - hop_length: 512 samples (frame shift, 32 ms at 16 kHz)
  - win_length: 2048 samples (frame size, 128 ms at 16 kHz)
  - Sample rate: 16 kHz
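
The parameters above can be tried out with a minimal NumPy sketch of a log-mel spectrogram. It is an illustration only, not necessarily how the project extracts features: the function names, the HTK-style mel formula, the Hann window, the FFT size equal to `win_length`, and the log scaling are all assumptions.

```python
import numpy as np

SR = 16_000        # sample rate from the parameter list
N_MELS = 128       # mel frequency bins
HOP_LENGTH = 512   # frame shift in samples
WIN_LENGTH = 2048  # frame size in samples (also used as FFT size here, an assumption)


def hz_to_mel(hz):
    # HTK-style mel scale; other toolkits may use a different variant.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=WIN_LENGTH, n_mels=N_MELS):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram(signal, sr=SR, n_mels=N_MELS,
                    hop_length=HOP_LENGTH, win_length=WIN_LENGTH):
    """Return a log-mel spectrogram of shape (n_frames, n_mels)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < win_length:
        # Pad very short clips so that at least one full frame exists.
        signal = np.pad(signal, (0, win_length - signal.size))
    n_frames = 1 + (signal.size - win_length) // hop_length
    window = np.hanning(win_length)
    frames = np.stack([
        signal[k * hop_length:k * hop_length + win_length] * window
        for k in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sr, win_length, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10)  # small floor avoids log(0)
```

Each row of the result is one frame, so one second of 16 kHz audio yields 28 frames of 128 values with these settings. A library routine such as `librosa.feature.melspectrogram` could replace this sketch, although its default padding and scaling differ.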
Distance Calculation
- Frame-to-frame Euclidean distance
- Distance matrix computation
- Adaptive thresholding
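
As a rough sketch of these three steps, the snippet below builds the frame-to-frame Euclidean distance matrix and derives a threshold from the data. The source does not say how the adaptive threshold is chosen, so the percentile rule here is purely an assumption, and the function names are hypothetical.

```python
import numpy as np


def distance_matrix(feats_a, feats_b):
    """Euclidean distance between every frame of A (rows) and every frame of B (columns)."""
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]  # shape (len_a, len_b, n_features)
    return np.sqrt((diff ** 2).sum(axis=-1))


def adaptive_threshold(dist, percentile=10.0):
    """Assumed rule: take a low percentile of all pairwise distances as the cutoff."""
    return float(np.percentile(dist, percentile))


def match_matrix(dist, threshold):
    """Boolean matrix: True where two frames are close enough to count as a match."""
    return dist <= threshold
```

The broadcasted difference uses memory proportional to `len_a * len_b * n_features`, which is fine for clips of a few seconds; longer recordings would need a chunked computation.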
Sequence Alignment
- LCS algorithm implementation
- Dynamic programming approach
- Matching path identification
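
The alignment step can be sketched as a standard LCS dynamic program followed by a backtrack that recovers the matching path. The function name and the boolean `match` input (True where frame `i` of the original matches frame `j` of the modified audio) are assumptions made for illustration.

```python
def lcs_alignment(match):
    """Return the LCS matching path as a list of (i, j) index pairs.

    match[i][j] is truthy when element i of sequence A matches element j of B.
    """
    n = len(match)
    m = len(match[0]) if n else 0

    # table[i][j] = LCS length of the first i elements of A and first j of B.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if match[i - 1][j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to collect the matched pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        if match[i - 1][j - 1]:
            path.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


# Word-level illustration of the replacement case.
original = "the big cat".split()
modified = "the small cat".split()
word_match = [[a == b for b in modified] for a in original]
print(lcs_alignment(word_match))  # [(0, 0), (2, 2)]
```

On audio the same function would be applied to a frame-level match matrix, where time and memory both grow with the product of the two frame counts.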
Word Replacement
Purpose: Demonstrate word substitution while maintaining context
Implementation:
- Original: "the big cat"
- Modified: "the small cat"
- Process:
  - Generate audio segments
  - Add 200 ms silence between words
  - Compare sequences
  - Identify replaced segment
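
The first two process steps can be sketched as follows. This is not the content of `create_demo_replacement.py`: sine tones stand in for spoken words, and the `WORDS` frequency table, the 300 ms word length, and the helper names are hypothetical placeholders.

```python
import numpy as np

SR = 16_000       # sample rate in Hz
SILENCE_MS = 200  # silence inserted between words


def tone(freq_hz, duration_ms=300, sr=SR):
    """Placeholder 'word': a sine tone. A real demo would use recorded or synthesized speech."""
    t = np.arange(sr * duration_ms // 1000) / sr
    return 0.5 * np.sin(2.0 * np.pi * freq_hz * t)


def join_with_silence(segments, sr=SR, silence_ms=SILENCE_MS):
    """Concatenate segments with a fixed block of silence between neighbours."""
    gap = np.zeros(sr * silence_ms // 1000)
    parts = []
    for k, segment in enumerate(segments):
        if k > 0:
            parts.append(gap)
        parts.append(segment)
    return np.concatenate(parts)


# Hypothetical mapping from words to tone frequencies, for illustration only.
WORDS = {"the": 300.0, "big": 500.0, "small": 700.0, "cat": 900.0}

original = join_with_silence([tone(WORDS[w]) for w in "the big cat".split()])
modified = join_with_silence([tone(WORDS[w]) for w in "the small cat".split()])
```

Because every placeholder word has the same length, the two signals line up sample for sample and differ only in the middle segment. Real spoken words differ in duration, so a replacement can also change the overall timing.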
Expected Results:
- Matching segments at start/end
- Different segment in middle
- Preserved timing structure
!python create_demo_replacement.py
Word Insertion
Purpose: Demonstrate word addition between existing words
Implementation:
- Original: "the cat"
- Modified: "the big cat"
- Process:
  - Generate base audio
  - Insert new word
  - Add silence buffers
  - Analyze timeline expansion
Expected Results:
- Matching segments before/after insertion
- New segment identified
- Timeline shift detection
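
One way to detect a timeline shift is to track the offset `j - i` along the matching path and report every point where it changes. The sketch below assumes the path is a list of `(original_frame, modified_frame)` pairs and uses the hop length and sample rate listed under Feature Extraction; the function name is hypothetical.

```python
HOP_LENGTH = 512  # frame shift in samples
SR = 16_000       # sample rate in Hz


def timeline_shifts(path, hop_length=HOP_LENGTH, sr=SR):
    """Return (time_in_original_s, shift_s) for each change of offset along the path.

    A positive shift means the modified audio gained material (insertion);
    a negative shift means it lost material (deletion).
    """
    shifts = []
    previous_offset = 0
    for i, j in path:
        offset = j - i
        if offset != previous_offset:
            position_s = i * hop_length / sr
            shift_s = (offset - previous_offset) * hop_length / sr
            shifts.append((position_s, shift_s))
            previous_offset = offset
    return shifts


# Frames 0-2 align directly; from frame 3 on, the modified audio runs 10 frames later.
example_path = [(0, 0), (1, 1), (2, 2), (3, 13), (4, 14)]
shifts = timeline_shifts(example_path)
```

In this example the single reported shift sits 0.096 s into the original and is 0.32 s long, that is, ten frames of 32 ms each.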
!python create_demo_insertion.py
Word Deletion
Purpose: Demonstrate word removal from sequence
Implementation:
- Original: "the big cat"
- Modified: "the cat"
- Process:
  - Generate full sequence
  - Remove middle word
  - Maintain silence buffers
  - Analyze timeline compression
Expected Results:
- Matching segments preserved
- Deleted segment identified
- Timeline compression detected
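
A deleted segment appears as a run of original frames that the matching path skips. The helper below is a hypothetical sketch of that idea, again assuming the path is a list of `(original_frame, modified_frame)` pairs.

```python
def unmatched_runs(path, length, side):
    """Return half-open (start, end) runs of frame indices missing from the path.

    side=0 inspects the original sequence, side=1 the modified one.
    """
    matched = {pair[side] for pair in path}
    runs = []
    start = None
    for index in range(length):
        if index not in matched:
            if start is None:
                start = index  # a new unmatched run begins here
        elif start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, length))  # run extends to the end of the sequence
    return runs


# Original has 7 frames, modified has 4; original frames 2-4 have no counterpart.
example_path = [(0, 0), (1, 1), (5, 2), (6, 3)]
deleted = unmatched_runs(example_path, length=7, side=0)   # [(2, 5)]
inserted = unmatched_runs(example_path, length=4, side=1)  # []
```

Runs found only on the original side suggest a deletion and runs found only on the modified side suggest an insertion. Runs on both sides at the same point in the alignment suggest a replacement.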
!python create_demo_deletion.py
Purpose: Show temporal alignment
- Dual waveform display
- Matching point connections
- Color-coded segments
  - Green: Matching
  - Red: Modified
  - Gray: Connections
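
A display along these lines could be approximated with Matplotlib as sketched below. The function name, its arguments, and the layout are assumptions; matches are given as `(time_in_original_s, time_in_modified_s)` pairs and modified regions as `(start_s, end_s)` spans.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import ConnectionPatch


def plot_alignment(wave_a, wave_b, matches, modified_a, modified_b, sr=16_000):
    """Draw two stacked waveforms with red modified spans and gray match connections."""
    fig, (ax_a, ax_b) = plt.subplots(2, 1, figsize=(10, 5))
    panels = (
        (ax_a, wave_a, modified_a, "Original"),
        (ax_b, wave_b, modified_b, "Modified"),
    )
    for ax, wave, spans, title in panels:
        times = np.arange(len(wave)) / sr
        ax.plot(times, wave, color="green", linewidth=0.8)  # matching audio in green
        for start, end in spans:
            ax.axvspan(start, end, color="red", alpha=0.3)  # modified regions in red
        ax.set_title(title)
        ax.set_ylim(-1.0, 1.0)

    # Gray lines join matching time points across the two panels.
    for t_a, t_b in matches:
        link = ConnectionPatch(
            xyA=(t_a, -1.0), coordsA="data", axesA=ax_a,
            xyB=(t_b, 1.0), coordsB="data", axesB=ax_b,
            color="gray", linewidth=0.6,
        )
        fig.add_artist(link)
    return fig
```

Drawing a connection for every matched frame quickly clutters the figure, so subsampling the matches (for example every tenth pair) tends to read better.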
Purpose: Display similarity patterns
- Frame-level distances
- Alignment path
- Modification regions
Purpose: Detailed matching visualization
- Temporal correspondence
- Segment boundaries
- Modification highlights
Purpose: Threshold analysis
- Distance distribution
- Similarity patterns
- Threshold selection