Commit 64c8942
fix(es/ast): Fix unicode lone surrogates handling (#10987)
**Description:**
This PR fixes an issue with how lone surrogates are handled in Rust.
All credit for this fix goes to the Oxc team (see
#10978 (comment)).
What I'm doing here is porting the fix that was made in Oxc and making it
work in SWC.
### Problem:
The problem is related to the fundamental difference between how Rust
and JavaScript handle Unicode, especially lone surrogates.
**JavaScript's Unicode Model**
```javascript
// JavaScript allows this - lone surrogates are stored in UTF-16
let str = "\uD800"; // High surrogate alone - technically invalid Unicode
let obj = { "\uD800": "value" }; // Works fine in JS
```
JavaScript uses UTF-16 internally and tolerates invalid Unicode
sequences:
- Strings are UTF-16 code unit sequences, not Unicode scalar sequences
- Lone surrogates (U+D800-U+DFFF) are allowed and preserved
- No validation that surrogates come in proper high/low pairs
- Engine just stores the raw UTF-16 code units
**Rust's Unicode Model**
```rust
// This CANNOT exist in Rust:
let s = "\u{D800}"; // ❌ COMPILE ERROR - not a valid Unicode scalar
let c: char = '\u{D800}'; // ❌ COMPILE ERROR - char excludes surrogates
```
Rust enforces strict Unicode validity:
- String is UTF-8 and must contain valid Unicode scalar values
- char represents Unicode scalar values (U+0000-U+D7FF, U+E000-U+10FFFF)
- Surrogate code points (U+D800-U+DFFF) are explicitly excluded
- No way to represent lone surrogates in Rust's standard string types
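The mismatch between the two models can be observed with the standard library alone. The sketch below is illustrative and not part of this PR; the helper names `lone_surrogates` and `to_rust_string` are hypothetical. It treats a JavaScript string as a slice of raw UTF-16 code units and shows where a strict conversion to `String` fails.

```rust
/// Returns every unpaired (lone) surrogate found in a UTF-16 sequence.
/// Hypothetical helper, built on the standard `char::decode_utf16`.
fn lone_surrogates(units: &[u16]) -> Vec<u16> {
    char::decode_utf16(units.iter().copied())
        // Each `Err` carries the offending unpaired surrogate code unit.
        .filter_map(|decoded| decoded.err().map(|e| e.unpaired_surrogate()))
        .collect()
}

/// Strictly converts UTF-16 to a Rust `String`.
/// Returns `None` as soon as any surrogate is unpaired.
fn to_rust_string(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}
```

For the JavaScript string `"a\uD800b"`, i.e. the code units `[0x0061, 0xD800, 0x0062]`, `lone_surrogates` reports `0xD800` and `to_rust_string` returns `None`. The lossy alternative, `String::from_utf16_lossy`, would succeed but replace the surrogate with U+FFFD, so the original code unit could not be recovered.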
### Key Changes:
1. AST Structure: Added `lone_surrogates: bool` field to `Str` and
`TplElement` structs to track when strings contain lone surrogates
2. Encoding Strategy: Lone surrogates are encoded using `\u{FFFD}`
(replacement character) followed by the original hex digits for internal
representation
3. Code Generation: Modified string output to properly escape lone
surrogates back to `\uXXXX` format during codegen
4. Test: Also fixed some cases related to member expression
optimizations and string concatenation optimizations
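The encoding strategy in item 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the function name `encode_lone_surrogates` is hypothetical, the input is modelled as raw UTF-16 code units, and uppercase hex digits are assumed to match the `\u{FFFD}D800` example. How a literal U+FFFD in the source text is kept distinguishable from the escape marker is not covered here.

```rust
/// Hypothetical sketch: build a valid Rust `String` from UTF-16 code units,
/// writing each lone surrogate as U+FFFD followed by its four hex digits.
/// The returned flag plays the role of the `lone_surrogates` field.
fn encode_lone_surrogates(units: &[u16]) -> (String, bool) {
    let mut out = String::new();
    let mut lone_surrogates = false;
    for decoded in char::decode_utf16(units.iter().copied()) {
        match decoded {
            // Well-formed scalar values (including proper pairs) pass through.
            Ok(c) => out.push(c),
            // An unpaired surrogate becomes the marker plus four hex digits.
            Err(e) => {
                lone_surrogates = true;
                out.push('\u{FFFD}');
                out.push_str(&format!("{:04X}", e.unpaired_surrogate()));
            }
        }
    }
    (out, lone_surrogates)
}
```

Under these assumptions, the JavaScript string `"a\uD800"` becomes the Rust string `"a\u{FFFD}D800"` with the flag set to `true`, while a string without lone surrogates is returned unchanged with the flag set to `false`.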
### TODOs:
1. Add support for serializing and deserializing literals with lone
surrogates in `swc_estree_compat`
2. Reflect AST changes in `binding` crates
### Breaking changes:
Breaks the AST by adding a `lone_surrogates` field to `Str` and
`TplElement`, and changes the meaning of `value` in `Str` and `cooked` in
`TplElement`. Both fields use `\u{FFFD}` (Replacement
Character) as an escape marker when `lone_surrogates` is set to `true`.
To consume the real value, first check whether `lone_surrogates`
is `true`, then unescape it by removing the `\u{FFFD}` char and rebuilding
the code unit from the four hex digits that follow it (from `\u{FFFD}D800` to `\uD800`).
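The unescaping step described above might look like the sketch below. It is an assumption-laden illustration rather than an SWC API: `decode_to_utf16` is a hypothetical name, the `value` and `lone_surrogates` parameters stand in for the AST fields, and the function assumes that every `\u{FFFD}` in a flagged value is followed by exactly four hex digits. The result is a sequence of UTF-16 code units, since the real value cannot be stored in a `String`.

```rust
/// Hypothetical sketch: recover the original UTF-16 code units from an
/// encoded value. Returns `None` if the escape sequence is malformed.
fn decode_to_utf16(value: &str, lone_surrogates: bool) -> Option<Vec<u16>> {
    // Without the flag, U+FFFD is an ordinary character: no unescaping.
    if !lone_surrogates {
        return Some(value.encode_utf16().collect());
    }
    let mut units = Vec::new();
    let mut buf = [0u16; 2];
    let mut chars = value.chars();
    while let Some(c) = chars.next() {
        if c == '\u{FFFD}' {
            // Drop the marker and read the four hex digits that follow it.
            let hex: String = chars.by_ref().take(4).collect();
            if hex.len() != 4 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            units.push(u16::from_str_radix(&hex, 16).ok()?);
        } else {
            units.extend_from_slice(c.encode_utf16(&mut buf));
        }
    }
    Some(units)
}
```

With the flag set, `"a\u{FFFD}D800"` decodes to the code units `[0x0061, 0xD800]`. With the flag unset, the same marker character is left alone and comes back as `0xFFFD`.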
**Related issue:**
- Closes #10978
- Closes #10353
Fixed a regression of #7678
1 parent ed2fdce commit 64c8942
File tree
2,237 files changed (+30470 -15079 lines changed)
- .changeset
- bindings/binding_core_node/src
- crates
- swc_bundler
- examples
- src
- tests
- swc_ecma_ast/src
- swc_ecma_codegen
- src
- tests/fixture
- issue-10978
- string
- template-literal
- vercel/2
- swc_ecma_compat_es2015/src
- block_scoping
- classes
- swc_ecma_compat_es2018/src
- swc_ecma_compat_es2022/src/class_properties
- swc_ecma_compat_es3/src
- swc_ecma_lexer/src
- common
- lexer
- parser
- lexer
- swc_ecma_minifier
- src
- compress
- optimize
- pure
- option
- tests
- benches-full
- fixture
- issues/2257/full
- next/wrap-contracts
- terser/compress/ascii/ascii_only_true_identifier_es5
- swc_ecma_parser
- src
- lexer
- parser
- expr
- jsx
- tests
- common
- jsx/basic
- 18
- 19
- 3
- 4
- 7
- custom
- issue-614
- issue-615
- tpl-space
- tpl
- unary-paren
- unary
- fragment-6
- issue-10729
- issue-6522
- string
- template
- js
- deferred-import-evaluation
- attributes-declaration
- attributes-expression
- defer-as-default
- dynamic-import-no-createImportExpressions
- dynamic-import
- import-defer
- explicit-resource-management
- valid-using-as-identifier-for-in
- valid-using-as-identifier-for-of
- import-assertions-with-keyword
- dynamic-import-with-valid-syntax
- string-literal
- trailing-comma-dynamic
- trailing-comma
- valid-empty-assertion
- valid-export-variable
- valid-export-without-from
- valid-string-assertion-key
- valid-syntax-export-star-as-with-assertions
- valid-syntax-export-star-with-assertions
- valid-syntax-export-with-and-assertions-multiple-lines
- valid-syntax-export-with-assertions-and-value
- valid-syntax-export-with-assertions
- valid-syntax-export-with-invalid-value
- valid-syntax-export-with-no-type-assertion
- valid-syntax-export-with-object-method-assertion
- valid-syntax-export-without-assertions
- valid-syntax-with-assertions-and-value
- valid-syntax-with-assertions-multiple-lines
- valid-syntax-with-assertions
- valid-syntax-with-invalid-value
- valid-syntax-with-no-type-assertion
- valid-syntax-with-object-method-assertion
- valid-syntax-without-assertions
- without-plugin
- import-assertions
- dynamic-import-with-valid-syntax
- import-assert-call-expression
- string-literal
- trailing-comma-dynamic
- trailing-comma
- valid-empty-assertion
- valid-export-variable
- valid-export-without-from
- valid-string-assertion-key
- valid-syntax-export-star-as-with-attributes
- valid-syntax-export-star-with-attributes
- valid-syntax-export-with-and-attributes-multiple-lines
- valid-syntax-export-with-attributes-and-value
- valid-syntax-export-with-attributes
- valid-syntax-export-with-invalid-value
- valid-syntax-export-with-no-type-attribute
- valid-syntax-export-with-object-method-attribute
- valid-syntax-export-without-attributes
- valid-syntax-with-attributes-and-value
- valid-syntax-with-attributes-multiple-lines
- valid-syntax-with-attributes
- valid-syntax-with-invalid-value
- valid-syntax-with-no-type-attribute
- valid-syntax-with-object-method-attribute
- valid-syntax-without-attributes
- without-plugin
- import-attributes-deprecatedAssertKeyword
- _deprecated-syntax-not-enabled
- dynamic-import-with-valid-syntax
- import-assert-call-expression
- incorrect-arity
- string-literal
- trailing-comma-dynamic
- trailing-comma
- valid-empty-attribute
- valid-export-variable
- valid-export-without-from
- valid-string-attribute-key
- valid-syntax-export-star-as-with-attributes
- valid-syntax-export-star-with-attributes
- valid-syntax-export-with-and-attributes-multiple-lines
- valid-syntax-export-with-attributes-and-value
- valid-syntax-export-with-attributes
- valid-syntax-export-with-invalid-value
- valid-syntax-export-with-no-type-attribute
- valid-syntax-export-with-object-method-attribute
- valid-syntax-export-without-attributes
- valid-syntax-with-attributes-and-value
- valid-syntax-with-attributes-multiple-lines
- valid-syntax-with-attributes
- valid-syntax-with-invalid-value
- valid-syntax-with-no-type-attribute
- valid-syntax-with-object-method-attribute
- valid-syntax-without-attributes
- without-plugin
- import-attributes
- dynamic-import-with-valid-syntax
- string-literal
- trailing-comma-dynamic
- trailing-comma
- valid-empty-attribute
- valid-export-variable
- valid-export-without-from
- valid-string-attribute-key
- valid-syntax-export-star-as-with-attributes
- valid-syntax-export-star-with-attributes
- valid-syntax-export-with-and-attributes-multiple-lines
- valid-syntax-export-with-attributes-and-value
- valid-syntax-export-with-attributes
- valid-syntax-export-with-invalid-value
- valid-syntax-export-with-no-type-attribute
- valid-syntax-export-with-object-method-attribute
- valid-syntax-export-without-attributes
- valid-syntax-with-attributes-and-value
- valid-syntax-with-attributes-multiple-lines
- valid-syntax-with-attributes
- valid-syntax-with-invalid-value
- valid-syntax-with-no-type-attribute
- valid-syntax-with-object-method-attribute
- valid-syntax-without-attributes
- without-plugin
- issue-4176/1
- issue-8482
- source-phase-imports
- attributes-declaration
- attributes-expression
- dynamic-import-comments
- dynamic-import-createImportExpressions-false
- dynamic-import-createImportExpressions-true
- dynamic-import-no-createImportExpressions-babel7
- dynamic-import-options-comments
- dynamic-import-options
- dynamic-import
- import-default-binding-source
- import-source-binding-from
- import-source-binding-source
- import-source-comments
- import-source
- test262-error-references/fail
- tsc
- typescript
- amaro-194
- class
- method-return-type
- property-declare
- property-private
- cts
- custom
- arrow/complex-tsc
- default-followed-by-type
- arrow
- function
- dynamic-import-expr-ctx
- dynamic-import-top-level
- import-type/typeof
- as
- simple
- issue-259
- issue-327-1
- issue-327-2
- issue-374
- issue-401-const-assertion-literal-ts
- issue-401-const-assertion-literal
- issue-401-const-assertion-object-ts
- issue-401-const-assertion-object
- issue-461
- issue-535
- issue-623
- issue-716
- tsx-unary
- type-only/import
- aliased
- default
- specific
- type-only-specifier
- deno-9620/case1
- deno/dso/reflect
- enum/members-strings
- estree-compat/shorthand-ambient-module
- export/namespace-from
- import-assertions
- dynamic-import
- expr
- top
- export
- test1
- test2
- test3
- stmt
- test1
- test2
- import
- equals-require
- equals-type-only
- export-import-require
- export-import-type-require
- not-top-level
- instantiation-expr
- base
- relational
- issue-1505
- case1
- case2
- issue-1512/case1
- issue-1549
- issue-1708/case1
- issue-1862/case1
- issue-2417
- issue-2853
- issue-2896
- issue-3236
- issue-3241/1
- issue-3337
- issue-4178
- 1
- 2
- 3
- issue-4296/1
- issue-4911/3
- issue-6430
- issue-6601
- issue-7042
- case1
- case2
- case3
- issue-814/case1
- issue-8308
- 1
- 2
- 3
- issue-8526
- issue-913
- issue-915
- issue-944
- issue-9802
- module-namespace
- declare-shorthand
- global-in-module
- head-declare
- head
- mts
- next
- 0001
- stack-overflow/1
- object/getter-prop
- optional-chaining/optional-tagged-template-literals
- regression/member-expr-assign
- stack-size
- stc/0001
- template-literal-type
- ts-import-type
- type-arguments
- tagged-template-no-asi
- tagged-template
- types
- conditional-infer-extends/basic
- literal-string
- variance-annotations
- 1
- with_jsx
- vercel/web-875
- swc_ecma_preset_env/src
- swc_ecma_quote_macros/src/ast
- swc_ecma_transforms_base
- src/helpers
- tests
- swc_ecma_transforms_classes/src
- swc_ecma_transforms_module/src
- swc_ecma_transforms_optimization/src
- simplify/expr
- swc_ecma_transforms_proposal/src
- decorators
- legacy
- swc_ecma_transforms_react/src
- display_name
- jsx_src
- jsx
- refresh
- swc_ecma_utils/src
- swc_ecma_visit/src
- swc_estree_compat/src/swcify
- swc_node_bundler
- src/loaders
- tests
- swc_plugin_backend_tests/tests/fixture/swc_internal_plugin/src
- swc_plugin_backend_wasmer/src
- swc_typescript/src/fast_dts
- swc/tests
- fixture/issues-7xxx/7678/output
- tsc-references
- vercel/full/utf8-1/output