Fix a number of em/strong issues (#641, #642, #643) #644
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR fixes #641, fixes #642 and fixes #643.
I put all these fixes in the same PR because I wanted to make sure they were compatible with each other, but that does mean it's a bit messy.
Middle word em breaking
(*em*)(#641)The middle word em regex would check for
*_chars that aren't preceded by another em char or a whitespace. The text(*em*)matches that, as the leading em char is not preceded by a space. It would then prevent this text from being processed as a valid em.I've updated the regex to look for ems that aren't preceded by non-word chars (instead of whitespace) and that fixed this issue.
The result is that we process this as expected:
Improve handling for leading underscores (#642)
In this issue we had what looked like an
<em>span, but it was straddling two other<strong>spans:This is not a valid em. Spans can be nested but they shouldn't stay open after the parent span closes.
I added some additional logic in the italics and bold stage that will check to see if the matched strong/em has any nested spans and that those spans are balanced and closed. If not, the strong/em is deemed invalid.
The result is that we process the strongs here, but not the em:
Consecutive strong/em can overlap (#643)
The strong/em regexes were starting their matches early as possible, and including as much text in the span as possible. This lead to the following text being processed like so:
**strong***em***strong**<strong>strong*</strong>em<strong>*strong</strong><strong>strong<em></strong>em<strong></em>strong</strong>This renders fine in most browsers, but is invalid html.
To fix this, I modified the strong regex to try to ignore as many leading
*_chars as possible to try to get the opening<strong>tag as close to the actual contents as possible, and try to close the</strong>as soon as possible.Previously the strong regex would process
***abc***as<strong>*abc*</strong>but now it will do*<strong>abc</strong>The effect of this is when we have consecutive strong and ems, they won't overlap anymore.
The unfortunate side effect is Github will process
***abc**as<strong>*abc</strong>, but we will output*<strong>abc</strong>instead, omitting that first em char from the span.