Fix middle-word-em interfering with strongs (#637) #639
This PR fixes #637.
The issue came down to the fact that the extra ran before the italics and bold stage. It would attempt to ignore the `<strong>` syntax and then process strict `<em>`s and middle-word-ems. The problem is that the syntax for strongs and ems is very similar, and crafting a regex that can differentiate between them is tough.

The way this extra worked previously was to process valid `<em>` syntax and then hash anything that looks like `<em>` syntax but isn't quite valid.

The new approach is simply to find any `_` or `*` character in the middle of a word and hash it. This way, the regular italics and bold stage doesn't have to worry about them and we can keep the regexes simple.

The hash we use is basically the same as the one you find in `self._escape_table`, except we prefix the extra's name to the input text to prevent interference with escaped/hashed chars from other stages.
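
For readers who want the idea in concrete form, here is a minimal standalone sketch of the approach described above. It is not the code from this PR: `EXTRA_NAME`, `hash_char`, `protect`, and `restore` are hypothetical names, and the regex and the `md5-` prefix are assumptions chosen for illustration.

```python
import hashlib
import re

# Hypothetical prefix; stands in for "the extra's name" mentioned above.
EXTRA_NAME = "middle-word-em"


def hash_char(ch: str) -> str:
    """Return a placeholder for ch that contains no '_' or '*'.

    Prefixing the extra's name keeps this placeholder distinct from a
    plain hash of the same character produced by some other stage.
    """
    digest = hashlib.md5((EXTRA_NAME + ch).encode("utf-8")).hexdigest()
    return "md5-" + digest


# A '_' or '*' with a letter or digit on both sides. [^\W_] is used instead
# of \w so that the underscores in '__strong__' are not treated as mid-word.
MID_WORD = re.compile(r"(?<=[^\W_])[_*](?=[^\W_])")


def protect(text: str) -> str:
    """Hash mid-word emphasis characters before the italics/bold stage."""
    return MID_WORD.sub(lambda m: hash_char(m.group(0)), text)


def restore(text: str) -> str:
    """Swap the placeholders back for the literal characters."""
    for ch in "_*":
        text = text.replace(hash_char(ch), ch)
    return text
```

In this sketch, `protect` would run before the emphasis regexes and `restore` after them, so `snake_case` survives as literal text while `__strong__` and `*em*` are left for the normal stage to handle.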