Refactor: Improve space insertion logic for Pinyin conversion #29
The previous approach to adding spaces was overly mechanical: it inserted spaces indiscriminately, without considering the surrounding characters. This produced unexpected spaces in the output.
This commit refactors the space insertion logic to be context-aware. It now checks whether adjacent characters belong to the unicode.Punct or unicode.Symbol categories, and inserts a space only if neither neighboring character is punctuation or a symbol. This eliminates the separate replacement step that was needed to remove the redundant spaces added by the previous approach.
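The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: joinWithContext and isPunctOrSymbol are hypothetical names, and the input is assumed to be already split into tokens (converted Pinyin syllables and untouched pass-through characters). Only unicode.In, unicode.Punct, and unicode.Symbol come from the Go standard library.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// isPunctOrSymbol reports whether r falls in the Unicode Punct (P) or
// Symbol (S) categories. Hypothetical helper for this sketch.
func isPunctOrSymbol(r rune) bool {
	return unicode.In(r, unicode.Punct, unicode.Symbol)
}

// joinWithContext concatenates tokens and inserts a single space between two
// neighbors only when neither the last rune of the left token nor the first
// rune of the right token is punctuation or a symbol.
// Hypothetical helper; the real function in the PR may look different.
func joinWithContext(tokens []string) string {
	var b strings.Builder
	prev := ""
	for _, tok := range tokens {
		if tok == "" {
			continue // nothing to emit, and nothing to compare against
		}
		if prev != "" {
			last, _ := utf8.DecodeLastRuneInString(prev)
			first, _ := utf8.DecodeRuneInString(tok)
			if !isPunctOrSymbol(last) && !isPunctOrSymbol(first) {
				b.WriteByte(' ')
			}
		}
		b.WriteString(tok)
		prev = tok
	}
	return b.String()
}
```

With the tokens "《", "hóng", "lóu", "mèng", "》", "hěn", "hǎo", "。", this sketch yields 《hóng lóu mèng》hěn hǎo。, so no cleanup pass is needed afterwards. It does not treat whitespace tokens specially; that is left out here to keep the example short.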
Additionally, the "allowed characters" setting has been removed. All content from the original text now appears in the Pinyin output, so characters that the filtering mechanism previously excluded, such as book title marks (《》) and French characters, are no longer lost.