feat: add fuzzy matching and semantic detection for swear words by vaisahub · Pull Request #2 · jithinolickal/claude-code-swear-counter

vaisahub · 2026-02-22T02:18:37Z

Implements Issue #1: Detect misspelled, obfuscated, and indirect swearing

Features:

✅ Fuzzy matching with Levenshtein distance (f***k, fuuuck, fvck, $hit)
✅ Obfuscation pattern detection (l33tspeak, asterisks, symbols)
✅ Semantic analysis for indirect swearing ("what is wrong with you")
✅ Frustration/hostility detection (patterns + keywords + punctuation)
✅ Zero dependencies (pure TypeScript algorithms)
✅ 86.4% detection rate on test cases
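The PR body does not show the code behind the fuzzy-matching bullet. The block below is a minimal sketch of Levenshtein-based matching with light normalization to make the idea concrete. It is not the PR's src/fuzzy.ts. The helper names (`levenshtein`, `normalize`, `fuzzyMatch`), the leetspeak table, and the default threshold of 1 are all assumptions made for illustration.

```typescript
// Hypothetical sketch of Levenshtein-based fuzzy matching; not the PR's src/fuzzy.ts.

// Assumed leetspeak/symbol substitutions; the PR's actual table may differ.
const LEET: { [ch: string]: string | undefined } = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "$": "s", "@": "a", "!": "i",
};

// Classic dynamic-programming edit distance (insert, delete, substitute each cost 1),
// keeping only the previous row in memory.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Lowercase, undo common symbol substitutions, and collapse runs of 3+ identical
// characters ("fuuuck" -> "fuck") while leaving doubles such as "hell" intact.
function normalize(word: string): string {
  let mapped = "";
  for (const ch of word.toLowerCase()) mapped += LEET[ch] ?? ch;
  return mapped.replace(/(.)\1{2,}/g, "$1");
}

// Return the first base word within maxDistance edits of the normalized input, or null.
function fuzzyMatch(word: string, bases: string[], maxDistance = 1): string | null {
  const normalized = normalize(word);
  for (const base of bases) {
    if (levenshtein(normalized, base) <= maxDistance) return base;
  }
  return null;
}

console.log(fuzzyMatch("fuuuck", ["fuck", "shit"])); // fuck
console.log(fuzzyMatch("$hit", ["fuck", "shit"]));   // shit
console.log(fuzzyMatch("help", ["hell"]));           // hell
```

The last call shows the tradeoff. With a threshold of 1, "help" is a single substitution away from "hell", so a pure edit-distance matcher flags it. This is the same kind of false positive raised in the review below.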

New Files:

src/fuzzy.ts - Levenshtein distance, normalization, pattern matching
src/semantic.ts - Indirect swearing detection, frustration analysis
src/scanner-ai.ts - Optional AI-powered mode (future feature)
src/compare.ts - Comparison tool to test both approaches
DETECTION_COMPARISON.md - Full analysis and results

Detection Results:

Direct swearing: 100% (fuck, shit, damn)
Obfuscated: 87.5% (f***k, $hit, fuuuck)
Indirect/Semantic: 90% ("what is wrong with you", "this makes no sense")
False positive rate: ~5%
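The PR does not say how these percentages were computed. One common reading is that the detection rate is the share of labeled swearing cases that get flagged (recall), and the false positive rate is the share of labeled clean cases that get flagged. The harness below is a hypothetical sketch of that reading and is not the PR's src/compare.ts. The `LabeledCase` and `evaluate` names and the toy cases are invented for illustration.

```typescript
// Hypothetical evaluation harness; not the PR's src/compare.ts.
interface LabeledCase {
  text: string;
  isSwearing: boolean; // ground-truth label assigned by a human
}

interface EvalResult {
  detectionRate: number;     // flagged positives / all positives (recall)
  falsePositiveRate: number; // flagged negatives / all negatives
}

function evaluate(cases: LabeledCase[], detect: (text: string) => boolean): EvalResult {
  let positives = 0, truePositives = 0, negatives = 0, falsePositives = 0;
  for (const c of cases) {
    const flagged = detect(c.text);
    if (c.isSwearing) {
      positives++;
      if (flagged) truePositives++;
    } else {
      negatives++;
      if (flagged) falsePositives++;
    }
  }
  return {
    detectionRate: positives === 0 ? 0 : truePositives / positives,
    falsePositiveRate: negatives === 0 ? 0 : falsePositives / negatives,
  };
}

// Toy cases (invented for illustration) and a naive substring detector.
const toyCases: LabeledCase[] = [
  { text: "what the hell", isSwearing: true },
  { text: "go to hell", isSwearing: true },
  { text: "Thank you for helping", isSwearing: false },
  { text: "nice work", isSwearing: false },
];
const naive = (text: string): boolean => /hel/i.test(text);
console.log(evaluate(toyCases, naive));
```

Whatever the harness looks like, the false positive figure is only as meaningful as the clean cases in the test set. A set that lacks everyday phrases such as "I give up" or "Thank you for helping" will understate it.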

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

jithinolickal · 2026-02-28T09:34:50Z

Thanks for the contribution! The idea of detecting obfuscated swears (f***k, $hit, fuuuck) is interesting, but I have some concerns before this could be merged:

Over-engineering for the use case

This is a lightweight, fun CLI tool; adding Levenshtein distance algorithms, semantic analysis, keyword weighting systems, and a composite scoring engine (800+ lines) is a lot of machinery for a novelty tool. The obfuscation detection could be achieved with ~20 lines of additional regex patterns in patterns.ts.

False positives

The PR itself documents "Thank you for helping" triggering a match (detects "hell" in "helping")
The h[e3][l1|]{1,2} obfuscation regex would also match "help", "held", "helm" etc.
Semantic detection flags normal phrases like "I give up", "I can't believe this", and "Are you serious right now" as swearing. These aren't swearing; they're just English. This would massively inflate counts and make the tool less trustworthy and less fun.
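The regex concern is easy to reproduce. Used as a substring search, `h[e3][l1|]{1,2}` only needs "h", an e-like character, and one l-like character, so any word starting with "hel" matches. The tighter variant below is a hypothetical alternative and does not come from the PR. It requires exactly two l-like characters, a word boundary before the "h", and no letter or digit immediately after the match. A negative lookahead is used at the end instead of `\b` so that symbol-only endings such as "h3||" still match.

```typescript
// The obfuscation regex quoted in the review, used as a plain substring search.
const loose = /h[e3][l1|]{1,2}/i;

// A hypothetical tighter variant (not from the PR): exactly two l-like characters,
// a word boundary before the "h", and no letter or digit right after the match.
const tight = /\bh[e3][l1|]{2}(?![a-z0-9])/i;

const samples = [
  "help", "held", "helm", "Thank you for helping",
  "hello", "shell", "what the hell", "h3ll", "h3||",
];
for (const s of samples) {
  console.log(`${JSON.stringify(s)}: loose=${loose.test(s)} tight=${tight.test(s)}`);
}
```

The loose pattern accepts every sample. The tight one accepts only "what the hell", "h3ll", and "h3||". It also rejects "hello" and "shell", which the loose pattern would have counted.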

Dead code

scanner-ai.ts (183 lines) is not wired up to anything — it's placeholder code for a future feature that doesn't exist yet. Shouldn't be in this PR.

Documentation bloat

IMPLEMENTATION_SUMMARY.md (265 lines) and DETECTION_COMPARISON.md (162 lines) are implementation notes, not user-facing docs. IMPLEMENTATION_SUMMARY.md also contains your local machine path (/Users/vaisakhma/Documents/my-projects/).

Performance concern

Fuzzy matching computes the Levenshtein distance between every word in every message and each of the 19 base words. For users with large conversation histories, this would be noticeably slower than the current instant regex approach.
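Each full Levenshtein computation costs time proportional to the product of the two word lengths, and that cost is multiplied by 19 base words for every word scanned. If fuzzy matching were kept, three standard mitigations would cut most of this work. First, the distance is never smaller than the length difference, so most pairs can be rejected immediately. Second, the minimum of each DP row never decreases, so the computation can stop once a whole row exceeds the threshold. Third, chat logs repeat words constantly, so results can be cached per distinct word. The sketch below is hypothetical and is not code from the PR. The `withinDistance` and `makeMatcher` names are illustrative.

```typescript
// Hypothetical mitigation sketch, not code from the PR.
// Returns true iff the Levenshtein distance between a and b is <= max,
// exiting early once that is impossible.
function withinDistance(a: string, b: string, max: number): boolean {
  // Cheap lower bound: the distance is at least the length difference.
  if (Math.abs(a.length - b.length) > max) return false;
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
      rowMin = Math.min(rowMin, curr[j]);
    }
    // Row minima never decrease, so once every cell exceeds max we can stop.
    if (rowMin > max) return false;
    prev = curr;
  }
  return prev[b.length] <= max;
}

// Memoize per distinct lowercased word: chat logs repeat the same words many times.
function makeMatcher(bases: string[], max: number): (word: string) => string | null {
  const cache = new Map<string, string | null>();
  return (word: string): string | null => {
    const key = word.toLowerCase();
    const cached = cache.get(key);
    if (cached !== undefined) return cached;
    const match = bases.find((base) => withinDistance(key, base, max)) ?? null;
    cache.set(key, match);
    return match;
  };
}

const match = makeMatcher(["fuck", "shit"], 1);
console.log(match("FVCK"), match("hello"));
```

These mitigations speed things up but leave the false positive problem untouched, so they are an argument about cost only.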

What I'd suggest instead

If you want to contribute obfuscation detection, a much simpler approach would work:

Add a few targeted regex patterns to patterns.ts for common obfuscations (e.g., f*+k, sh*t, repeated chars like fuuu+ck)
Keep it minimal and in the existing pattern structure
No new files needed
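The suggestion above could look roughly like the sketch below. The real structure of patterns.ts is not shown in this thread, so the `ObfuscationPattern` shape, the `OBFUSCATION_PATTERNS` list, and the `countObfuscated` helper are all hypothetical. Only the regexes illustrate the idea: masked letters (`\*+`), repeated vowels (`u+`, `i+`), and a few symbol swaps. The `$` alternative has no leading `\b` because `$` is not a word character, so `\b$` would fail at the start of a word.

```typescript
// Hypothetical shape: the existing patterns.ts may organize its patterns differently.
interface ObfuscationPattern {
  word: string;  // canonical word credited in the count
  regex: RegExp; // carries the "g" flag so String.prototype.match returns every hit
}

const OBFUSCATION_PATTERNS: ObfuscationPattern[] = [
  // f*ck, f**k, f***k, fuuuck, fvck
  { word: "fuck", regex: /\bf(?:[uv]+|\*+)c*k+\b/gi },
  // sh*t, sh1t, sh!t, shiiit, $hit
  { word: "shit", regex: /(?:\$|\bs)h(?:[i1!]+|\*+)t+\b/gi },
];

// Total number of obfuscated matches across all patterns in one message.
function countObfuscated(text: string): number {
  let total = 0;
  for (const { regex } of OBFUSCATION_PATTERNS) {
    total += (text.match(regex) ?? []).length;
  }
  return total;
}

console.log(countObfuscated("what the f***k, this is sh*t"));
console.log(countObfuscated("Thank you for helping, the shift key works"));
```

These patterns also match the plain spellings "fuck" and "shit". If patterns.ts already counts those, the two lists would need to be reconciled so that a word is not counted twice.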

Happy to review a slimmed-down version!

vaisahub wants to merge 1 commit into jithinolickal:main from vaisahub:feature/fuzzy-semantic-detection

vaisahub commented Feb 22, 2026

Uh oh!

jithinolickal commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vaisahub commented Feb 22, 2026

Uh oh!

jithinolickal commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants