Owner

@tychy tychy commented Dec 30, 2025

Summary

  • Fix an issue where 「渡邉」 was parsed as just 「渡」 because the PDF's ToUnicode map mapped 「邉」 to a private use area code point (U+E157)
  • Add a mapping from code 57687 to 「邉」 in the normalizeKanji function
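
As an illustration of the kind of mapping described above, here is a minimal Python sketch. It is not the repository's code: the table name `PUA_TO_KANJI` and the function name `normalize_kanji` are hypothetical, and the real normalizeKanji function may be structured differently. The only detail taken from this PR is that code 57687 (U+E157) should become 「邉」.

```python
# Hypothetical lookup table: private use area code point -> intended kanji.
# Only the U+E157 entry comes from this PR; the structure is an assumption.
PUA_TO_KANJI = {
    0xE157: "邉",  # 57687 in decimal
}


def normalize_kanji(text: str) -> str:
    """Replace known private use area code points with the kanji they stand for."""
    # str.translate accepts a dict keyed by code point (int) with string values;
    # characters that are not in the table pass through unchanged.
    return text.translate(PUA_TO_KANJI)
```

For example, `normalize_kanji("渡\ue157紘平")` yields the name with 「邉」 restored, while text without private use area characters is returned unchanged.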

Test plan

  • make test passed
  • Confirmed that 「渡邉紘平」 is parsed correctly in sample1523.pdf

Add mapping for private use area code 57687 (U+E157) to 邉 in
normalizeKanji function. This fixes the issue where the name
渡邉 was incorrectly parsed as just 渡.
@tychy tychy merged commit 669c09c into main Dec 30, 2025
2 checks passed
@tychy tychy deleted the fix/parse-watanabe-kanji branch December 30, 2025 07:53
