README.md (+7, -6)
@@ -50,7 +50,7 @@ In browsers:
 - A modern regex baseline so you don't need to continually opt-in to best practices.
 - Always-on flag <kbd>v</kbd> gives you the best level of Unicode support, extra features, and strict errors.
 - Always-on implicit flag <kbd>x</kbd> allows you to freely add whitespace and comments to your regexes.
-- Always-on implicit flag <kbd>n</kbd> (*no auto capture* mode) improves the readability and efficiency of your regexes.
+- Always-on implicit flag <kbd>n</kbd> (*named capture only* mode) improves the readability and efficiency of your regexes.
 - No unreadable escaped backslashes `\\\\` since it's a raw string template tag.
 - Atomic groups via `(?>…)` that can dramatically improve performance and prevent ReDoS.
 - Context-aware and safe interpolation of regexes, strings, and partial patterns.
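
The "no unreadable escaped backslashes" point in the hunk above can be seen with nothing but the native `String.raw` tag. This is a minimal sketch using built-in JavaScript only; it does not call `Regex.make`, and the variable names are illustrative.

```javascript
// With an ordinary string, every regex backslash must itself be escaped.
const fromString = new RegExp("\\d+\\.\\d+");

// String.raw keeps backslashes as written, so the pattern reads like a regex literal.
const fromRaw = new RegExp(String.raw`\d+\.\d+`);

console.log(fromString.source === fromRaw.source); // true
console.log(fromRaw.test("3.14"));                 // true

// Matching one literal backslash: two characters in a raw template,
// versus four characters in an ordinary string literal.
console.log(String.raw`\\`.length);                // 2
```

A raw template tag sees the pattern text before any string escape processing, which is why the doubled backslashes disappear.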
@@ -89,7 +89,7 @@ Due to years of legacy and backward compatibility, regular expression syntax in
 4. UnicodeSets mode with flag <kbd>v</kbd>, an upgrade to <kbd>u</kbd> which improves case-insensitive matching and changes escaping rules within character classes, in addition to adding new features/syntax.
 </details>
 
-Additionally, JavaScript regex syntax is hard to write and even harder to read and refactor. But it doesn't have to be that way! With a few key features — raw template strings, insignificant whitespace, comments, no auto capture, and interpolation (coming soon: definition blocks and subexpressions as subroutines) — even long and complex regexes can be beautiful, grammatical, and easy to understand.
+Additionally, JavaScript regex syntax is hard to write and even harder to read and refactor. But it doesn't have to be that way! With a few key features — raw template strings, insignificant whitespace, comments, *named capture only* mode, and interpolation (coming soon: definition blocks and subexpressions as subroutines) — even long and complex regexes can be beautiful, grammatical, and easy to understand.
 
 `Regex.make` adds all of these features and returns native `RegExp` instances. It always uses flag <kbd>v</kbd> (already a best practice for new regexes) so you never forget to turn it on and don't have to worry about the differences in other parsing modes. It supports atomic groups via `(?>…)` to help you improve the performance of your regexes and avoid catastrophic backtracking. And it gives you best-in-class, context-aware interpolation of `RegExp` instances, escaped strings, and partial patterns.
 
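
To illustrate what an atomic group changes, the sketch below uses a well-known native workaround: a lookahead that captures, followed by a backreference that replays the capture. This is an assumption-labeled illustration of the concept, not a statement of how `Regex.make` transpiles `(?>…)`.

```javascript
// Ordinary group: after `bc` leaves no `c` to match, the engine backtracks and tries `b`.
const backtracking = /a(?:bc|b)c/;

// Emulated atomic group: the lookahead picks its match once, and \1 replays exactly
// that text. The engine cannot re-enter the lookahead to try the `b` alternative.
const atomicLike = /a(?=(bc|b))\1c/;

console.log(backtracking.test("abc")); // true
console.log(atomicLike.test("abc"));   // false
console.log(atomicLike.test("abcc"));  // true
```

Discarding backtracking positions is what lets atomic groups cut off the repeated retries behind catastrophic backtracking.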
@@ -193,12 +193,12 @@ const date = Regex.make`
 
 ### Flag `n`
 
-Flag <kbd>n</kbd> gives you *no auto capture* mode, which turns `(…)` into a non-capturing group but preserves named capture. It's always implicitly on, though it doesn't extend into interpolated `RegExp` instances (to avoid changing their meaning).
+Flag <kbd>n</kbd> gives you *named capture only* mode, which prevents the grouping metacharacters `(…)` from capturing. It's always implicitly on, though it doesn't extend into interpolated `RegExp` instances (to avoid changing their meaning).
 
-Motivation: Requiring the syntactically clumsy `(?:…)` where you could just use `(…)` hurts readability and encourages adding unneeded captures (which hurt efficiency and refactoring). Flag <kbd>n</kbd> fixes this, making your regexes more readable.
+Requiring the syntactically clumsy `(?:…)` where you could just use `(…)` hurts readability and encourages adding unneeded captures (which hurt efficiency and refactoring). Flag <kbd>n</kbd> fixes this, making your regexes more readable.
 
 > [!NOTE]
-> Flag <kbd>n</kbd> is based on .NET, C++, PCRE, Perl, and XRegExp, which share the `n` flag letter but call it *explicit capture*, *no auto capture*, or *nosubs*. In `Regex.make`, the implicit flag <kbd>n</kbd> also disables numbered backreferences to named groups in the outer regex, which follows the behavior of C++. Referring to named groups by number is a footgun, and the way that named groups are numbered is inconsistent across regex flavors.
+> Flag <kbd>n</kbd> is based on .NET, C++, PCRE, Perl, and XRegExp, which share the `n` flag letter but call it *explicit capture*, *no auto capture*, or *nosubs*. In `Regex.make`, the implicit flag <kbd>n</kbd> also prevents using numbered backreferences to named groups in the outer regex, which follows the behavior of C++. Referring to named groups by number is a footgun, and the way that named groups are numbered is inconsistent across regex flavors.
 
 > Aside: Flag <kbd>n</kbd>'s behavior also enables `Regex.make` to emulate atomic groups and recursion.
 
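
For comparison, here is what *named capture only* style looks like when written by hand in native `RegExp` syntax, where `(…)` always captures and `(?:…)` is needed to opt out. The sketch uses built-in JavaScript only, and the date pattern is just an example.

```javascript
// Native default: every plain (…) creates a numbered capture, wanted or not.
const autoCapture = /(\d{4})-(\d{2})/.exec("2024-05");

// Hand-written "named capture only" style: name what you need, use (?:…) for the rest.
const namedOnly = /(?<year>\d{4})-(?:\d{2})/.exec("2024-05");

console.log(autoCapture.length);    // 3 (full match plus two numbered captures)
console.log(namedOnly.length);      // 2 (full match plus the one named capture)
console.log(namedOnly.groups.year); // "2024"

// Native syntax lets a named group be referenced by number or by name;
// \k<name> keeps working if groups are later added or reordered.
console.log(/(?<ch>a)\1/.test("aa"));     // true
console.log(/(?<ch>a)\k<ch>/.test("aa")); // true
```

The native form needs `(?:…)` on every group that is only there for grouping, which is the clumsiness the hunk above describes.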
@@ -263,7 +263,7 @@ Some examples of where context awareness comes into play:
 - A `~` is not escaped at the top level, but it must be escaped within character classes in case it's immediately followed by another `~` (in or outside of the interpolation) which would turn it into a reserved UnicodeSets double punctuator.
 - Leading digits must be escaped if they're preceded by a numbered backreference or `\0`, else `RegExp` throws (or in Unicode-unaware mode they might turn into octal escapes).
 - Letters `A`-`Z` and `a`-`z` must be escaped if preceded by uncompleted token `\c`, else they'll convert what should be an error into a valid token that probably doesn't match what you expect.
-- You can't escape your way out of protecting against a preceding unescaped `\`. Doing nothing could turn e.g. `w` into `\w` and introduce a bug, but then escaping the first character (e.g. with a hex code) wouldn't prevent the `\` from mangling it, and if you escaped the preceding `\` elsewhere in your code you'd change its meaning.
+- You can't escape your way out of protecting against a preceding unescaped `\`. Doing nothing could turn e.g. `w` into `\w` and introduce a bug, but then escaping the first character wouldn't prevent the `\` from mangling it, and if you escaped the preceding `\` elsewhere in your code you'd change its meaning.
 
 These and other issues (including the effects of current and future flags like `x`) make escaping without context unsafe to use at arbitrary positions in a regex, or at least complicated to get right. The existing popular regex escaping libraries are all pretty bad at giving you something you can use reliably.
 
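
Two of the hazards listed in the hunk above can be reproduced with native `RegExp` and a context-free escaper. In this sketch, `naiveEscape` is a hypothetical helper written for illustration, and wrapping the value in `(?:…)` is shown only as one possible way to isolate it, not as the approach `Regex.make` necessarily takes.

```javascript
// Hypothetical naive escaper: handles regex syntax characters but ignores context.
function naiveEscape(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const prefix = "(a)\\1";        // pattern text: (a)\1
const value = naiveEscape("0"); // "0" has nothing to escape, so it stays "0"

// Plain concatenation yields (a)\10. Without flag u this is read as a legacy octal escape.
const legacy = new RegExp(prefix + value);
console.log(legacy.test("aa0")); // false

// With flag u, the same text is a syntax error (there is no group 10).
let threw = false;
try {
  new RegExp(prefix + value, "u");
} catch (err) {
  threw = err instanceof SyntaxError;
}
console.log(threw); // true

// Isolating the value keeps \1 and the digit apart.
const isolated = new RegExp(prefix + "(?:" + value + ")", "u");
console.log(isolated.test("aa0")); // true

// A preceding unescaped backslash silently changes the value's meaning.
const mangled = new RegExp("^\\" + naiveEscape("w") + "$"); // pattern text: ^\w$
console.log(mangled.test("x")); // true: any word character matches, not only "w"
```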
@@ -443,6 +443,7 @@ The above descriptions of interpolation might feel complex. But there are three
 
 - *Atomized* means that something is treated as a complete unit; it isn't related to the *atomic groups* feature. Example: In default context, `${x}*` matches any number of the value specified by `x`, and not just its last token. In character class context, set operators (union, subtraction, intersection) apply to the entire atom.
 - *Sandboxed* means that the value can't change the meaning or error status of characters outside of the interpolation, and vice versa.
+- Character classes have a sub-context on the borders of ranges. Only one character node (ex: `a` or `\u0061`) can be interpolated at these positions. This is explained in [*Interpolating partial patterns*](#interpolating-partial-patterns).
 
 > The implementation details vary for how `Regex.make` accomplishes sandboxing and atomization, based on the details of the specific pattern. But the concepts should always hold up.
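
The *atomized* behavior described above (a quantifier after the interpolation applies to the whole value) can be contrasted with plain string concatenation in native `RegExp`. This is a rough sketch; the `(?:…)` wrapper stands in for atomization and is an assumption, given the note that the implementation details vary.

```javascript
const x = "ab";

// Plain concatenation: the pattern text is ^ab*$, so * applies only to the last token, b.
const concatenated = new RegExp("^" + x + "*$");

// Treating the value as one unit: the pattern text is ^(?:ab)*$.
const atomized = new RegExp("^(?:" + x + ")*$");

console.log(concatenated.test("abab")); // false
console.log(concatenated.test("abbb")); // true
console.log(atomized.test("abab"));     // true
console.log(atomized.test("abbb"));     // false
```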