Unescape XHTML entities in JSX literals (moved from parser) #1514

syranide · 2014-05-11T12:00:49Z

We discussed this briefly some time ago: because &nbsp; is unescaped in the parser and JSX receives no information about it, it ends up being identical to " " (a regular space), which JSX trims when it appears left-most or right-most on a line.

Here's a simple proposal that moves the unescaping from the parser to JSX. I also made the unescaping routine a bit more accurate while making it lenient like browsers, so XJS & JSX is valid (XJS &amp; JSX is quite cumbersome and annoying).
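To illustrate what "lenient like browsers" could mean here, below is a minimal sketch of a decoder that leaves anything it does not recognize untouched. It is not the routine from this PR: the function name decodeEntitiesLenient and the tiny ENTITIES table are assumptions made for illustration only.

```javascript
// Hypothetical, deliberately tiny entity table (assumption); a real one
// would cover the full XHTML entity set.
const ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00A0' };

// Decode entities leniently: anything that is not a recognized entity
// (a bare "&", an unknown name, an out-of-range code point) is left as-is.
function decodeEntitiesLenient(raw) {
  const pattern = /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return raw.replace(pattern, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(ENTITIES, body)
      ? ENTITIES[body]
      : match;
  });
}
```

With this sketch, a bare ampersand never needs escaping: decodeEntitiesLenient('XJS & JSX') and decodeEntitiesLenient('XJS &amp; JSX') both produce the same text, and an unknown name such as &bogus; passes through unchanged.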

An alternative implementation could have the parser parse both XJSText and XJSEntity, for instance, and introduce XJSStringLiteral, which would then consist of one or more XJSText/XJSEntity nodes. This is perhaps the neater solution, but also more invasive (I don't mind though).

Feel free to agree, disagree, reject, etc.

Depends on facebookarchive/esprima#19

syranide · 2014-05-11T12:02:01Z

cc @jeffmo

facebook-github-bot · 2014-05-18T18:51:52Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

RReverser · 2014-08-05T10:06:53Z

We discussed this briefly some time ago, because &nbsp; is unescaped in the parser and JSX receives no information about it ...

That's not completely true. The "raw" property for string literals in Esprima, Acorn, etc. is provided exactly for this kind of advanced handling, while the literal's value stays preprocessed.

Currently JSX literals have this property as well, so the transformer can use it for advanced operations while the JSX parser produces general-purpose output (preprocessed value in value + raw value in raw).

I'm not sure it makes much sense to make JSX literals behave differently from simple string literals for the sake of React's post-processing.

syranide · 2014-08-05T10:12:16Z

@RReverser Hmm? Not sure if I understand, parsing XHTMLEntities in esprima is an issue as information is lost for React/JSX. Sure, esprima could still parse XHTMLEntities, but we wouldn't use them anyway for React/JSX so it makes no sense for esprima to keep doing it. Or?

RReverser · 2014-08-05T10:24:40Z

@syranide Feels like we don't understand each other.

Not sure if I understand, parsing XHTMLEntities in esprima is an issue as information is lost for React/JSX.

What information is lost? As I said in my previous comment, you have all the needed information returned from the parser: both the pre-processed and the raw strings. Check out https://github.com/facebook/esprima/blob/b697889d421178976dde9284c2dadc27a1828d26/test/fbtest.js#L422-L431 for example.

What I'm saying is, if we want to keep JSX from being tightly bound to React (and, AFAIK, we do want this), we shouldn't change the current parser's behavior; instead we should teach React's transformer to use the raw property whenever it needs to (i.e., for distinguishing between &nbsp; and a regular space).

This way:

We preserve structural compatibility between XJS literals and native ones (preprocessed string in value, raw string in raw).
Any code that is fine with using the pre-processed string literal and doesn't care about the original entities can simply use the string from value, while any code that does distinguish between entities and their decoded values has everything it needs in the raw property.
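As a rough sketch of the two kinds of consumers described above, assuming a literal node that carries both a decoded value and the original raw text (the node object and both helper names are hypothetical, not taken from the parser's test suite):

```javascript
// Hypothetical literal node with the value/raw pair described above.
const node = { type: 'Literal', value: '\u00A0hello', raw: '&nbsp;hello' };

// A consumer that does not care about entities just reads the decoded value.
function textOf(literal) {
  return literal.value;
}

// A consumer that does care can inspect raw, e.g. to see whether the text
// starts with something written as an entity rather than a literal character.
function startsWithEntity(literal) {
  return /^&[#A-Za-z0-9]+;/.test(literal.raw);
}
```

Here textOf(node) returns the decoded string, while startsWithEntity(node) reports that the leading non-breaking space was written as an entity in the source.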

Anyway, better to wait for @jeffmo's comments on this.

syranide · 2014-08-05T10:31:14Z

@RReverser Ah perhaps, I don't disagree, but AFAIK React/JSX would never want to use the decoded value (it also seems like a bad idea to mix use of "decode raw" and "value as-is"), so what use is it there in keeping it? Are there other projects that depend on it? That's where I'm confused :)

RReverser · 2014-08-05T10:36:41Z

React/JSX would never want to use the decoded value (it also seems like a bad idea to mix use of "decode raw" and "value as-is")

Currently, React uses the decoded value everywhere, and trimming feels to me like a very specific task where the raw value might be needed (and, as far as I understood from the diff, your PR relies on this assumption as well).

syranide · 2014-08-05T10:42:34Z

@RReverser Right, it currently uses the decoded value everywhere, but I see no reason to keep doing that if we have to do some of the decoding in JSX anyway; there is no benefit to reading the decoded value instead of always decoding the raw value. So we would have to maintain two separate decoding implementations in separate projects for no actual benefit?

RReverser · 2014-08-05T11:02:10Z

@syranide If it's just about trimming, you don't even need to decode values again. Instead, you can count the spaces at the beginning and end of the raw string and cut value correspondingly.
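A minimal sketch of that count-and-cut idea, under two assumptions that are mine and not confirmed by the thread: literal whitespace in raw maps one-to-one onto the same characters in value, and only the two ends of the whole string are trimmed (the real JSX trimming works per line). The name trimUsingRaw is hypothetical.

```javascript
// Trim only whitespace that was written literally in the source.
// Whitespace produced by an entity such as &nbsp; is not whitespace in raw,
// so it is never counted and therefore never cut from value.
function trimUsingRaw(literal) {
  const lead = /^[ \t\r\n]*/.exec(literal.raw)[0].length;
  if (lead === literal.raw.length) {
    return ''; // raw consists of literal whitespace only
  }
  const trail = /[ \t\r\n]*$/.exec(literal.raw)[0].length;
  return literal.value.slice(lead, literal.value.length - trail);
}
```

For a node with raw "  &nbsp;a " the function drops the two leading spaces and the trailing space from value but keeps the decoded non-breaking space. The approach breaks as soon as the one-to-one assumption does not hold, which is the fragility raised in the reply below.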

Btw, do you know why the hell attribute values should be trimmed at all?

syranide · 2014-08-05T11:09:30Z

@RReverser Sure, we can count/cut/etc, but that seems super fragile and just decoding in JSX seems like the logical choice.

Quite a while since I last looked/touched that code, but IIRC there's quite a number of things about attributes that are kind of weird. But no, attribute values definitely shouldn't be trimmed, perhaps it's a multi-line thing? (JSX attribute strings can be multiline I think ... which is really weird)

RReverser · 2014-08-05T11:17:01Z

just decoding in JSX seems like the logical choice

I still disagree, since any string decoding feels like the parser's task and should not be moved out of it. In the worst case, I'd rather expose a method for entity decoding from the parser and reuse it in React, but definitely not remove it from the parser itself.

Quite a while since I last looked/touched that code, but IIRC there's quite a number of things about attributes that is kind of weird.

Yeah, I think we should first figure out why this feature exists before arguing about solutions for the bug it introduces.

syranide · 2014-08-05T11:51:26Z

I still disagree, since any string decoding feels like the parser's task and should not be moved out of it. In the worst case, I'd rather expose a method for entity decoding from the parser and reuse it in React, but definitely not remove it from the parser itself.

I'm inclined to agree, but I think there's a slight subtlety to decoding XHTMLEntities vs strings in the parser. Perhaps what should be done is to move some of the JSX inline text trimming to esprima... then esprima becomes a "true" XHTML parser and it suddenly makes sense, currently it's only a JS parser IMO (it expects JSX to process the values) and as such decoding XHTMLEntities doesn't make sense to me (because information is lost).

Not sure if I'm making any sense, but I think decoding should move from esprima to JSX or trimming should move from JSX to esprima (so that JSX doesn't do any "interpretation/processing" of the value). The latter probably makes a lot more sense the more I think about it, as that's how attributes are handled.

Yeah, I think we should first figure out why this feature exists before arguing about solutions for the bug it introduces.

The trimming I'm referring to applies to inline text, not attribute values. Or did I misunderstand?

RReverser · 2014-08-05T12:07:05Z

trimming should move from JSX to esprima (so that JSX doesn't do any "interpretation/processing" of the value)

Personally I like this idea.

The trimming I'm referring to applies to inline text, not attribute values. Or did I misunderstand?

Well, it's inside renderXJSString so it's applied to both. And while trimming inline text might make some sense (still not much IMO), trimming attribute values seems completely useless and probably even wrong to me:

> console.log(require('react-tools').transform('/** @jsx DOM */<x a="1" a="2\n3"> 1 2 </x>'))
/** @jsx DOM */x({a: "1", a: "2" + ' ' +
"3"}, " 1 2 ")

Maybe it's desired behavior, but anyway I don't see benefits from it.

syranide · 2014-08-05T12:13:39Z

Personally I like this idea.

Awesome, I think it makes a lot more sense than this PR the more I think about it, and it seems like we both agree 👍

Maybe it's desired behavior, but anyway I don't see benefits from it.

Ah yes, that rings a bell. IIRC most of the attribute code uses non-attribute specific helpers, which kind of works, but for attributes has many small weird edge-cases. So unless @jeffmo thinks otherwise, it seems to me that attributes really should use their own logic where it makes sense.

RReverser · 2014-08-05T12:36:20Z

Awesome, I think it makes a lot more sense than this PR the more I think about it, and it seems like we both agree 👍

🍰

RReverser · 2014-08-05T12:38:12Z

I see you've already closed facebookarchive/esprima#19, probably makes sense to close this one, too? (since it's linked)

Then a new PR with trimming can be created in esprima (with a dependency on facebookarchive/esprima#32).

syranide · 2014-08-05T15:02:16Z

@RReverser Yep, 👍 for facebookarchive/esprima#32 and it seems there's a pretty straight-forward goal, closing :)

syranide · 2014-08-06T12:09:18Z

Hmm, quickly thinking about this again, I'm not sure how one would go about doing this. It's straight-forward enough to move the trimming to the parser, but React/JSX should preserve line-numbers and necessary information is again lost in translation... or?

...or, perhaps we simply don't care about 1:1 mapping with inline text and just replicate the number of newlines. Practically everything is the same, but we lose a bit of familiarity when looking at the JSX output.

RReverser · 2014-08-06T12:22:30Z

React/JSX should preserve line-numbers and necessary information is again lost in translation... or?

Or it uses location info from the loc properties (as it already does), which is the proper way to preserve line numbers, generate source maps, etc.

syranide · 2014-08-06T13:02:11Z

@RReverser but loc only gives the starting line, not where each individual newline is in the "value". Unless we generate an XJSText for every line I suppose... hmm.

RReverser · 2014-08-06T13:45:09Z

...or, perhaps we simply don't care about 1:1 mapping with inline text and just replicate the number of newlines

Ah, so you mean visual "similarity" between JSX and generated JS where each JSX line corresponds to compiled JS line?

syranide · 2014-08-06T14:13:06Z

@RReverser Yeah, that's an intended feature today AFAIK.

RReverser · 2014-08-06T14:53:53Z

@syranide Well, the current code in React's vendor/fbtransform/transforms/xjs artificially looks for the first/last empty lines anyway and inserts the corresponding number of them in the JS, so we could simply teach it to use the difference between loc.start values instead of counting them again using regexps.
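One way to read that suggestion, sketched under assumptions: nodes carry esprima-style loc objects with 1-based line numbers, and the number of newlines to emit between two adjacent nodes is the gap between where one ends and the next starts. The helper names are hypothetical and this is not the code in vendor/fbtransform.

```javascript
// Number of line breaks separating two adjacent nodes, derived purely from
// location info instead of scanning the text with regexps.
function newlinesBetween(prev, next) {
  return Math.max(0, next.loc.start.line - prev.loc.end.line);
}

// Padding that keeps `next` on the same output line it had in the source.
function linePadding(prev, next) {
  return '\n'.repeat(newlinesBetween(prev, next));
}

// Hypothetical nodes: the first ends on line 3, the second starts on line 6.
const first = { loc: { start: { line: 1, column: 0 }, end: { line: 3, column: 4 } } };
const second = { loc: { start: { line: 6, column: 2 }, end: { line: 6, column: 9 } } };
```

With these two nodes, linePadding(first, second) yields three newline characters, so the generated JS keeps the second node on line 6.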

Unescape XHTML entities in JSX literals (moved from parser)

0ba7875

syranide mentioned this pull request May 11, 2014

Do not unescape XHTML entities in XJSText tokens (moved to JSX) facebookarchive/esprima#19

Open

syranide mentioned this pull request Jun 10, 2014

Unrecognized entities give strange undefined #1667

Closed

syranide closed this Aug 5, 2014

syranide deleted the jsxent branch August 5, 2014 15:02

syranide mentioned this pull request Sep 29, 2014

Newlines as whitespace text facebook/jsx#19

Open
