Optimize method of stripping validation response of affixed HTML comments #6021

Merged

pierlon
Contributor

@pierlon pierlon commented Mar 24, 2021

Summary

Fixes #6011

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@pierlon pierlon self-assigned this Mar 24, 2021
@pierlon pierlon added the WS:Core Work stream for Plugin core label Mar 24, 2021
@pierlon pierlon added this to the v2.1 milestone Mar 24, 2021
@codecov

codecov bot commented Mar 24, 2021

Codecov Report

Merging #6021 (219a7ef) into develop (d955987) will increase coverage by 0.00%.
The diff coverage is 88.88%.

❗ Current head 219a7ef differs from pull request most recent head b222c5d. Consider uploading reports for the commit b222c5d to get more accurate results

@@            Coverage Diff             @@
##             develop    #6021   +/-   ##
==========================================
  Coverage      75.26%   75.27%           
- Complexity      5703     5709    +6     
==========================================
  Files            218      218           
  Lines          17275    17283    +8     
==========================================
+ Hits           13002    13009    +7     
- Misses          4273     4274    +1     
Flag        Coverage Δ                 Complexity Δ
javascript  80.05% <ø> (ø)             0.00 <ø> (ø)
php         75.06% <88.88%> (+<0.01%)  0.00 <0.00> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                         Coverage Δ       Complexity Δ
...s/amp-validation-status/revalidate-notification.js  80.00% <ø> (ø)   0.00 <0.00> (?)
...nents/amp-validation-status/status-notification.js  100.00% <ø> (ø)  0.00 <0.00> (?)
...block-validation/components/error/error-content.js  93.54% <ø> (ø)   0.00 <0.00> (?)
...k-validation/components/error/error-panel-title.js  100.00% <ø> (ø)  0.00 <0.00> (?)
...ock-validation/components/error/error-type-icon.js  100.00% <ø> (ø)  0.00 <0.00> (?)
...idation/components/error/get-error-source-title.js  93.75% <ø> (ø)   0.00 <0.00> (?)
...ets/src/block-validation/components/error/index.js  100.00% <ø> (ø)  0.00 <0.00> (?)
...sets/src/block-validation/components/icon/index.js  100.00% <ø> (ø)  0.00 <0.00> (?)
...alidation/components/sidebar-notification/index.js  100.00% <ø> (ø)  0.00 <0.00> (?)
...dation/components/with-amp-toolbar-button/index.js  100.00% <ø> (ø)  0.00 <0.00> (?)
... and 5 more

@github-actions
Contributor

github-actions bot commented Mar 24, 2021

Plugin builds for b222c5d are ready 🛎️!

@@ -1848,7 +1848,7 @@ public static function validate_url( $url ) {
 $response = ltrim( $response );

 // Strip HTML comments that may have been injected at the end of the response (e.g. by a caching plugin).
-$response = preg_replace( '/<!--.*?-->\s*$/s', '', $response );
+$response = preg_replace( '/}\s*?<!--.*?-->\s*$/', '}', $response );
Member

Removing the s flag causes a problem: without it, . stops matching newlines, so the pattern will not match a multi-line comment like this:

</body></html>
<!--
	generated 2 seconds ago
	generated in 1.134 seconds
	served from batcache in 0.003 seconds
	expires in 298 seconds
-->

I've added a failing test case to demonstrate this: 9f7ef9c.
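
The effect of the modifier can be reproduced in isolation. The following is a minimal sketch, assuming a made-up response body: the JSON payload and the comment text are illustrative and are not taken from the plugin's tests.

```php
<?php
// Hypothetical response: a JSON object followed by a multi-line comment
// of the kind a page cache might append.
$response = "{\"results\":[]}\n<!--\n\tserved from batcache in 0.003 seconds\n-->\n";

// Without the `s` (PCRE_DOTALL) modifier, `.` does not match newlines,
// so `.*?` cannot span the comment body and the pattern fails to match.
$without_s = preg_replace( '/}\s*?<!--.*?-->\s*$/', '}', $response );

// With `s`, `.` also matches newlines and the trailing comment is removed.
$with_s = preg_replace( '/}\s*?<!--.*?-->\s*$/s', '}', $response );

var_dump( $without_s === $response );     // bool(true): the comment is still there.
var_dump( $with_s === '{"results":[]}' ); // bool(true): only the JSON remains.
```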

Contributor Author

Added the s flag in b7c7e29.

Member

@westonruter westonruter left a comment

I think an alternative approach is needed, perhaps atomic grouping? This would seem to be avoidable if the regex engine started searching from the end of the string rather than the beginning.

@pierlon
Contributor Author

pierlon commented Mar 30, 2021

> I think an alternative approach is needed, perhaps atomic grouping? This would seem to be avoidable if the regex engine started searching from the end of the string rather than the beginning.

I've tried several variants of atomic grouping, but all of them either failed to find a match or led to catastrophic backtracking, unfortunately. I've also looked into searching from the end of the string, but PCRE doesn't support that feature.

One flaw I just noticed in the current regex is that catastrophic backtracking can occur if the HTML comment being matched contains too much text, as demonstrated here. I'm not sure how to resolve that yet.

@schlessera
Collaborator

How about not using a regex, but rather fseek() or similar from the end and cycle through the string?

I'm not entirely sure what that means in terms of performance, but:

  • You can bail very early if the response ends with a } and not a -->. So, most of the time, it might just take a look at the last char and bail.
  • It cannot lead to catastrophic backtracking. Worst case is cycling over the entire string character by character once.
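
To make the two points above concrete, here is a rough sketch of a regex-free approach. It is only an illustration under stated assumptions: the function name strip_trailing_html_comment() is hypothetical, it handles a single trailing comment, and it is not the code that was eventually merged.

```php
<?php
/**
 * Hypothetical helper (not the plugin's actual code): strip one HTML comment
 * appended after a JSON object, using string functions instead of a regex.
 *
 * @param string $response Raw response body.
 * @return string Response without the trailing comment, or unchanged if none was found.
 */
function strip_trailing_html_comment( $response ) {
	$trimmed = rtrim( $response );

	// Early bail: if the response does not end with "-->", only its tail is inspected.
	if ( '-->' !== substr( $trimmed, -3 ) ) {
		return $response;
	}

	// strrpos() locates the last comment opener in a single pass, with no backtracking.
	$start = strrpos( $trimmed, '<!--' );
	if ( false === $start ) {
		return $response;
	}

	// Only strip when the text before the comment ends with the closing brace of the JSON.
	$before = rtrim( substr( $trimmed, 0, $start ) );
	if ( '}' !== substr( $before, -1 ) ) {
		return $response;
	}

	return $before;
}

// Usage with a made-up payload.
$clean = strip_trailing_html_comment( "{\"a\":1}\n<!--\n\tcached\n-->\n" );
var_dump( $clean === '{"a":1}' ); // bool(true)
```

Each step is a linear scan at worst, so the cost stays proportional to the length of the response regardless of how much text the comment contains.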

@schlessera
Collaborator

Actually, I'm not sure it is even possible to cycle backwards char-by-char when you can have multibyte chars. I don't think you can detect that when scanning from back to front.
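
On the multibyte concern: assuming the response is UTF-8 (an assumption here, not something the thread states), a byte-wise search for an ASCII marker such as <!-- should be safe, because every byte of a multibyte UTF-8 character has its high bit set and so can never equal an ASCII byte. PHP's strrpos() and substr() operate on bytes. A small sketch with a made-up payload:

```php
<?php
// Hypothetical payload with multibyte UTF-8 characters both in the JSON
// and inside the trailing comment. This file must be saved as UTF-8.
$response = '{"title":"Café ☕"}<!-- généré en 0,3 s -->';

// Byte-wise search from the end. The ASCII bytes of "<!--" cannot occur
// in the middle of a multibyte character, so the offset lands on a boundary.
$start = strrpos( $response, '<!--' );

// Cutting at that byte offset leaves valid UTF-8 that still decodes as JSON.
$json    = substr( $response, 0, $start );
$decoded = json_decode( $json, true );

var_dump( $decoded['title'] === 'Café ☕' ); // bool(true)
```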

@schlessera
Collaborator

schlessera commented Mar 30, 2021

How about something like this: https://3v4l.org/orG5Y

@pierlon
Contributor Author

pierlon commented Mar 31, 2021

> How about something like this: 3v4l.org/orG5Y

That seems like a great alternative. Any objections @westonruter?

@westonruter
Member

Yes, that looks good. I made a small tweak to make sure that the $length is large enough to subtract from: https://3v4l.org/BTeIn

Co-authored-by: Alain Schlesser
Co-authored-by: Weston Ruter
@pierlon pierlon requested a review from westonruter March 31, 2021 04:48
@pierlon pierlon changed the title from "Optimize regex used to strip validation response of affixed HTML comments" to "Optimize method of stripping validation response of affixed HTML comments" Mar 31, 2021
@westonruter
Member

I love this:

image

Thanks for the co-authorship attribution!

@westonruter westonruter merged commit b9bf0d2 into develop Mar 31, 2021
@westonruter westonruter deleted the fix/6011-catastrophic-backtrace-during-validation branch March 31, 2021 05:13
Development

Successfully merging this pull request may close these issues.

Validating a URL that has a massive number of validation errors causes validation request to fail
3 participants