-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Implement RFC 3503: frontmatters #140035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Implement RFC 3503: frontmatters #140035
Conversation
This comment has been minimized.
This comment has been minimized.
I'll do a review pass next Monday/Tuesday, but in the meantime, I'll roll this to Wesley who I think have more context on the general vibes. r? @wesleywiser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the implementation! I left some (adversarial) review notes under the impression based on reading back #137193 that frontmatter is a Rust language syntax. However, the existing diagnostics here I find is already quite nice.
I especially appreciate using error annotations in between frontmatter openers/closers, I found that very clever (and cute) 😆
#![feature(frontmatter)] | ||
|
||
--- | ||
//~^ ERROR: expected item, found `-` | ||
// FIXME(frontmatter): make this diagnostic better | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: yeah, I'm not sure how to detect this properly without making some convoluted logic, even though I imagine this would be one of the more common mistakes to make. At least this does error, maybe we could provide some kind of contextual HELP like if we see ---
(3+ starting dashes)
error: expected item, found `-`
--> $DIR/frontmatter-after-tokens.rs:3:1
|
LL | ---
| ^ expected item
|
= note: for a full list of items that can appear in modules, see <https://doc.rust-lang.org/reference/items.html>
= help: if you meant to write a frontmatter, the frontmatter must come after an optional shebang but before any regular source code
(With some better wording, I find it tricky to explain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Words are hard 🙂
If the file doesn't start with a shebang line, maybe we could just say something like
= help: if you meant to write a frontmatter, the frontmatter must appear first in the file
Shebangs are not super commonly used so if we can leave that detail out, it would simplify the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this can be achieved not at the lexer/parser level but at the tokens to AST level. That code would not reside in places this PR touches so we can do this as a separate followup! :)
☔ The latest upstream changes (presumably #140180) made this pull request unmergeable. Please resolve the merge conflicts. |
#![feature(frontmatter)] | ||
|
||
--- | ||
//~^ ERROR: expected item, found `-` | ||
// FIXME(frontmatter): make this diagnostic better | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Words are hard 🙂
If the file doesn't start with a shebang line, maybe we could just say something like
= help: if you meant to write a frontmatter, the frontmatter must appear first in the file
Shebangs are not super commonly used so if we can leave that detail out, it would simplify the error message.
cc @epage for awareness 🙂 |
Thanks for your work on this! |
compiler/rustc_lexer/src/cursor.rs
Outdated
#[cfg(debug_assertions)] | ||
prev: char, | ||
} | ||
|
||
pub(crate) const EOF_CHAR: char = '\0'; | ||
|
||
impl<'a> Cursor<'a> { | ||
pub fn new(input: &'a str) -> Cursor<'a> { | ||
pub fn new(input: &'a str, frontmatter_allowed: bool) -> Cursor<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have an implementation using the lexer, instead of stripping (#137193), what are your thoughts on the trade offs?
The cases I can think of
- General code design. I can't speak to this
- Other tools (r-a, rustfmt, etc). I suspect this will need slightly more work for them to passively support the new syntax. For richer features (processing the content, mutating), either they'll still need their own parser or the token's capabilities will need to be expanded which is where we were at with Add unstable frontmatter support #137193
- Other uses for the token existing? I can't think of any but maybe you have some in mind.
- Diagnostics: Provide helpful diagnostics for shebang lookalikes #137619 shows that stripping can still have decent diagnostics. I can't speak to how bad it would be to implement each side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#137619 still does error reporting in the parsing step, which still suggests we can't report errors in the lexer level. As far as my knowledge of the compiler goes, parsing it as a token and reporting it when the token gets "cooked" in the parsing step is pretty much the best way. That is how the implementation for c"str"
literals worked and it was stabilized with that implementation.
f9112c6
to
4f62442
Compare
rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer |
This comment has been minimized.
This comment has been minimized.
4f62442
to
d979912
Compare
This comment has been minimized.
This comment has been minimized.
d979912
to
54e4711
Compare
☔ The latest upstream changes (presumably #140529) made this pull request unmergeable. Please resolve the merge conflicts. |
54e4711
to
70a2a0f
Compare
@rustbot ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test coverage is looking quite nice! I updated the tracking issue #136889 to reflect of some of the changes we made here, feel free to also edit.
The compiler changes looks good to me. I left a few more discussions, and a... cursed edge case for us to consider (but not necessarily change in this initial impl PR).
// copied from `eat_identifier`, but allows `.` in infostring to allow something like | ||
// `---Cargo.toml` as a valid opener | ||
if is_id_start(self.first()) { | ||
self.bump(); | ||
self.eat_while(|c| is_id_continue(c) || c == '.'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: to double-check, the infostring grammar we're going with is (EBNF-esque syntax):
infostring = XID_Start , { (XID_Continue | '.') } * ;
As this would prevent the infostring from starting with -
, preventing the ambiguity w/ the opener. This may be further relaxed in the future, but this is what we're going with in this initial impl PR.
Furthermore, we no longer permit 0+ whitespaces between opener (dashes) and infostring, so the opener + infostring grammar is sth like:
frontmatter_start = dashes , infostring , { whitespace } * , '\n' ;
dashes = "---" , { '-' } * ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd support this syntax as it makes the most sense to me. Curious as to whether @epage has any thoughts about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, we no longer permit 0+ whitespaces between opener (dashes) and infostring, so the opener + infostring grammar is sth like:
Why was that change made? The syntax should be
frontmatter_start = dashes , { whitespace } * , infostring , { whitespace } * , '\n' ;
frontmatter_end = dashes , { whitespace } * , '\n' ;
infostring = XID_Start , { (XID_Continue | '.') } * ;
dashes = "---" , { '-' } * ;
(with start/end dashes
matching in length)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What benefit would allowing whitespace before the infostring give? I can implement the behavior to allow whitespace easily, but this feels like a very arbitrary decision. I've never seen any code blocks in markdown with spaces before the specifier or any of the sort. Are there other tools that allow this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Markdown allows it which was the primary source of inspiration for this syntax. Granted, we made changes but that was when there was a practical, motivating, reason (e.g. the character was changed with one reason being the difficulty of then embedding these in markdown which is made worse by how many markdown parsers incorrectly handle escaping).
Allowing it would also be consistent with Rust which allows arbitrary whitespace (which also includes newlines and comments) between pieces of syntax, even pieces that people don't expect to be separate syntax like #
and [
for attributes.
When we changed the infostring, that was because we ran into a technical problem with the last minute RFC change. I would expect other grammar changes to be a similar bar (or "this could get in the way of a future design"). I'm not seeing either of those here and instead this is seeming to be more of a personal preference at which point I would say we should stick to the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion (mostly for rustdoc maintainers, and most likely not for this PR): I... I came up with another cursed example. I'm sorry-
Somewhat surprisingly, this works (these are functional rustdoc-ui
tests):
//@ check-pass
//@ compile-flags: --test --test-args=--test-threads=1
//@ normalize-stdout: "tests/rustdoc-ui" -> "$$DIR"
//@ normalize-stdout: "finished in \d+\.\d+s" -> "finished in $$TIME"
//@ normalize-stdout: ".rs:\d+:\d+" -> ".rs:$$LINE:$$COL"
/// ```
/// ---hurts_my_head
/// ---
/// #![feature(frontmatter)]
/// fn main() {}
/// ```
pub struct ExplicitMain;
But these don't:
Version | Test |
---|---|
Implicit main, stmt, inner feature attr |
//@ check-fail
//@ compile-flags: --test --test-args=--test-threads=1
//@ normalize-stdout: "tests/rustdoc-ui" -> "$$DIR"
//@ normalize-stdout: "finished in \d+\.\d+s" -> "finished in $$TIME"
//@ normalize-stdout: ".rs:\d+:\d+" -> ".rs:$$LINE:$$COL"
// This doesn't work, as one might expect?
/// ```
/// ---hurts_my_head
/// ---
/// #![feature(frontmatter)]
/// assert_eq!(1, 1);
/// ```
pub struct ImplicitWrapMain; |
Implicit main, stmt, outside doctest source crate-level feature attr |
//@ check-fail
//@ compile-flags: --test --test-args=--test-threads=1
//@ normalize-stdout: "tests/rustdoc-ui" -> "$$DIR"
//@ normalize-stdout: "finished in \d+\.\d+s" -> "finished in $$TIME"
//@ normalize-stdout: ".rs:\d+:\d+" -> ".rs:$$LINE:$$COL"
#![feature(frontmatter)]
// This doesn't work either
/// ```
/// ---hurts_my_head
/// ---
/// assert_eq!(1, 1);
/// ```
pub struct ImplicitWrapMain; |
In the second case, even hoisting the #![feature(frontmatter)]
into sources (outside of doctest)
doesn't work, because I want to say it's interpreted as an expression (triple negation...) because
of doctest implicit expr main wrapping behavior? But then the generated doctest binary source file doesn't pass on the frontmatter? Idk
Additionally, I have no idea if 2024 doctest merging behavior can introduce some other obscure edge cases here.
I don't work on rustdoc, but my 2 cents is that we might need to ban frontmatters from doctests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of doctest implicit expr main wrapping behavior
I'm pretty sure doctests try to detect whether a fn main
exists and if it doesn't it will wrap an implicit fn main() {}
to the statements within the doctest. I can look into whether there is an easy fix to ban frontmatters in doctests entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah s/expr/stmts/, not sure why I said expr 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For detecting the main fn, nowadays we wrap the doctest in a function (so we don't need to reparse item ctxt vs stmt ctxt (which we used to do)), parse the fn as an item (duh) and finally traverse the AST nodes to see if all stmts are items or … or …. This all happens in doctest/make.rs
.
I'm surprised that it permits frontmatters tho, since we'd be parsing both code blocks as fn f(){\n---hurts_my_head…
(f
is the wrapper fn) and frontmatter ought not be valid inside fn bodies. It's very early for me, my brain hasn't fully booted yet, so I don't know what's going on (for comparison, shebangs always lead to a syntax error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from make.rs
, we also have warn-by-default lint rustdoc::invalid_rust_codeblocks
which does not parse the doctest source (as it advertises) but only lexes it essentially, well it parses it into a TokenStream. Not sure if that'd need any updating or not, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual code lives in https://doc.rust-lang.org/stable/nightly-rustc/src/rustdoc/doctest/make.rs.html#137-251 which does not wrap the code if the doctest meets certain conditions (one of them is containing a main function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure whether we need to just ban frontmatters in doctests completely, or at least in this initial PR. This should be figured out in stabilization (maybe detect whether a file is compiled as a doctest then error if a frontmatter token is reached?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah like I said, probably not for this initial PR, but would probably be a stabilization blocker.
EDIT: let me just write this down in the tracking issue, we can leave it for follow-ups.
EDIT 2: I added this as an unresolved question on the tracking issue, not a blocker for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: https://doc.rust-lang.org/stable/nightly-rustc/src/rustdoc/doctest/make.rs.html#137-251 is not necessarily up to date with nightly (it's the stable/
distribution), check https://doc.rust-lang.org/nightly/nightly-rustc/src/rustdoc/doctest/make.rs.html
(one of them is containing a main function)
Yes, but I was talking about parse_source
in make.rs
which determines whether there was a main
fn and that code wraps everything in fn f() {…}
which should lead to a syntax error in theory and should lead to rustdoc generating a dummy doctest (DocTestBuilder::invalid
). If it doesn't, then that'd be interesting to know but I might be missing something, I'm still waking up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I don't want to burden you. I also don't consider this blocking.
@epage I took a quick look, and from what I can tell the frontmatter isn't parsed as an AST node (let me know if I've missed anything there), so these changes shouldn't impact rustfmt at all. That said, I'll double check that's the case sometime over the next couple days. |
Well, |
Tracking issue: #136889
Supercedes #137193. This implements RFC 3503.
This might break rust-analyzer. Will look into how to fix that.
Suggestions welcome for how to improve diagnostics.