Fix parsing of 職務執行者 with half-width space separator #159

tychy · 2025-11-07T11:59:10Z

Summary

Fixed parsing issue for 職務執行者 (job executor) entries that use half-width spaces between position and name
Updated regex pattern in parse_body.go to accept both half-width ( ) and full-width (　) spaces

Problem

The parser was only matching position-name pairs separated by full-width spaces (　+). However, some registration documents (samples 770, 796, 797, 866) use half-width spaces, causing 職務執行者 entries to be missed.

Solution

Changed the regex pattern from (%s)　+([%s]+) to (%s)[ 　]+([%s]+) to accept both space types.

Test plan

Verified samples 770, 796, 797, 866 now pass their tests
Ran full test suite (1522 samples) - all tests pass
No regressions in existing functionality

🤖 Generated with Claude Code

Previously, the parser only recognized 職務執行者 (job executor) entries when they were separated from names by full-width spaces (　). However, some registration documents use half-width spaces instead. This commit updates the regex pattern to accept both half-width and full-width spaces between position titles and names. Fixes parsing for samples: 770, 796, 797, 866 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

tychy · 2025-11-07T12:06:02Z

After investigation, the parser already handles 職務執行者 correctly in all mentioned samples (770, 796, 797, 866). The PDF text extraction produces full-width spaces (U+3000), which the existing regex pattern already matches. This change was unnecessary.

tychy closed this Nov 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix parsing of 職務執行者 with half-width space separator #159

Fix parsing of 職務執行者 with half-width space separator #159

Uh oh!

tychy commented Nov 7, 2025

Uh oh!

tychy commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix parsing of 職務執行者 with half-width space separator #159

Fix parsing of 職務執行者 with half-width space separator #159

Uh oh!

Conversation

tychy commented Nov 7, 2025

Summary

Problem

Solution

Test plan

Uh oh!

tychy commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants