Skip to content

Implement RFC 3503: frontmatters #140035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fee1-dead
Copy link
Member

@fee1-dead fee1-dead commented Apr 19, 2025

Tracking issue: #136889

Supercedes #137193. This implements RFC 3503.

This might break rust-analyzer. Will look into how to fix that.

Suggestions welcome for how to improve diagnostics.

@rustbot
Copy link
Collaborator

rustbot commented Apr 19, 2025

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Apr 19, 2025
@rust-log-analyzer

This comment has been minimized.

@jieyouxu jieyouxu added the F-frontmatter `#![feature(frontmatter)]` label Apr 19, 2025
@jieyouxu
Copy link
Member

I'll do a review pass next Monday/Tuesday, but in the meantime, I'll roll this to Wesley who I think have more context on the general vibes.

r? @wesleywiser

@rustbot rustbot assigned wesleywiser and unassigned jieyouxu Apr 19, 2025
@jieyouxu jieyouxu self-assigned this Apr 19, 2025
Copy link
Member

@jieyouxu jieyouxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the implementation! I left some (adversarial) review notes under the impression based on reading back #137193 that frontmatter is a Rust language syntax. However, the existing diagnostics here I find is already quite nice.

I especially appreciate using error annotations in between frontmatter openers/closers, I found that very clever (and cute) 😆

Comment on lines +1 to +6
#![feature(frontmatter)]

---
//~^ ERROR: expected item, found `-`
// FIXME(frontmatter): make this diagnostic better
---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark: yeah, I'm not sure how to detect this properly without making some convoluted logic, even though I imagine this would be one of the more common mistakes to make. At least this does error, maybe we could provide some kind of contextual HELP like if we see --- (3+ starting dashes)

error: expected item, found `-`
  --> $DIR/frontmatter-after-tokens.rs:3:1
   |
LL | ---
   | ^ expected item
   |
   = note: for a full list of items that can appear in modules, see <https://doc.rust-lang.org/reference/items.html>
   = help: if you meant to write a frontmatter, the frontmatter must come after an optional shebang but before any regular source code

(With some better wording, I find it tricky to explain)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Words are hard 🙂

If the file doesn't start with a shebang line, maybe we could just say something like

   = help: if you meant to write a frontmatter, the frontmatter must appear first in the file

Shebangs are not super commonly used so if we can leave that detail out, it would simplify the error message.

Copy link
Member Author

@fee1-dead fee1-dead Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this can be achieved not at the lexer/parser level but at the tokens to AST level. That code would not reside in places this PR touches so we can do this as a separate followup! :)

@bors
Copy link
Collaborator

bors commented Apr 23, 2025

☔ The latest upstream changes (presumably #140180) made this pull request unmergeable. Please resolve the merge conflicts.

Comment on lines +1 to +6
#![feature(frontmatter)]

---
//~^ ERROR: expected item, found `-`
// FIXME(frontmatter): make this diagnostic better
---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Words are hard 🙂

If the file doesn't start with a shebang line, maybe we could just say something like

   = help: if you meant to write a frontmatter, the frontmatter must appear first in the file

Shebangs are not super commonly used so if we can leave that detail out, it would simplify the error message.

@jieyouxu jieyouxu added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 25, 2025
@wesleywiser
Copy link
Member

cc @epage for awareness 🙂

@epage
Copy link
Contributor

epage commented Apr 30, 2025

Thanks for your work on this!

#[cfg(debug_assertions)]
prev: char,
}

pub(crate) const EOF_CHAR: char = '\0';

impl<'a> Cursor<'a> {
pub fn new(input: &'a str) -> Cursor<'a> {
pub fn new(input: &'a str, frontmatter_allowed: bool) -> Cursor<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have an implementation using the lexer, instead of stripping (#137193), what are your thoughts on the trade offs?

The cases I can think of

  • General code design. I can't speak to this
  • Other tools (r-a, rustfmt, etc). I suspect this will need slightly more work for them to passively support the new syntax. For richer features (processing the content, mutating), either they'll still need their own parser or the token's capabilities will need to be expanded which is where we were at with Add unstable frontmatter support #137193
  • Other uses for the token existing? I can't think of any but maybe you have some in mind.
  • Diagnostics: Provide helpful diagnostics for shebang lookalikes #137619 shows that stripping can still have decent diagnostics. I can't speak to how bad it would be to implement each side

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#137619 still does error reporting in the parsing step, which still suggests we can't report errors in the lexer level. As far as my knowledge of the compiler goes, parsing it as a token and reporting it when the token gets "cooked" in the parsing step is pretty much the best way. That is how the implementation for c"str" literals worked and it was stabilized with that implementation.

@epage
Copy link
Contributor

epage commented Apr 30, 2025

@ytmimi as this doesn't update rustfmt for frontmatter support (unsure if that is intended for this PR or a follow up), any feedback on how well this will work for rustfmt, at least in terms of tolerating them (ie feature parity with #137193)

@rustbot
Copy link
Collaborator

rustbot commented Apr 30, 2025

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented May 1, 2025

☔ The latest upstream changes (presumably #140529) made this pull request unmergeable. Please resolve the merge conflicts.

@fee1-dead fee1-dead force-pushed the push-oszwkkvmpkks branch from 54e4711 to 70a2a0f Compare May 1, 2025 03:36
@fee1-dead
Copy link
Member Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 1, 2025
@jieyouxu jieyouxu mentioned this pull request May 1, 2025
11 tasks
Copy link
Member

@jieyouxu jieyouxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test coverage is looking quite nice! I updated the tracking issue #136889 to reflect of some of the changes we made here, feel free to also edit.

The compiler changes looks good to me. I left a few more discussions, and a... cursed edge case for us to consider (but not necessarily change in this initial impl PR).

Comment on lines +533 to +538
// copied from `eat_identifier`, but allows `.` in infostring to allow something like
// `---Cargo.toml` as a valid opener
if is_id_start(self.first()) {
self.bump();
self.eat_while(|c| is_id_continue(c) || c == '.');
}
Copy link
Member

@jieyouxu jieyouxu May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion: to double-check, the infostring grammar we're going with is (EBNF-esque syntax):

infostring = XID_Start , { (XID_Continue | '.') } * ;

As this would prevent the infostring from starting with -, preventing the ambiguity w/ the opener. This may be further relaxed in the future, but this is what we're going with in this initial impl PR.

Furthermore, we no longer permit 0+ whitespaces between opener (dashes) and infostring, so the opener + infostring grammar is sth like:

frontmatter_start = dashes , infostring , { whitespace } * , '\n' ;
dashes = "---" , { '-' } * ;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd support this syntax as it makes the most sense to me. Curious as to whether @epage has any thoughts about this.

Copy link
Contributor

@epage epage May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, we no longer permit 0+ whitespaces between opener (dashes) and infostring, so the opener + infostring grammar is sth like:

Why was that change made? The syntax should be

frontmatter_start = dashes , { whitespace } * , infostring , { whitespace } * , '\n' ;
frontmatter_end = dashes , { whitespace } * , '\n' ;
infostring = XID_Start , { (XID_Continue | '.') } * ;
dashes = "---" , { '-' } * ;

(with start/end dashes matching in length)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What benefit would allowing whitespace before the infostring give? I can implement the behavior to allow whitespace easily, but this feels like a very arbitrary decision. I've never seen any code blocks in markdown with spaces before the specifier or any of the sort. Are there other tools that allow this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markdown allows it which was the primary source of inspiration for this syntax. Granted, we made changes but that was when there was a practical, motivating, reason (e.g. the character was changed with one reason being the difficulty of then embedding these in markdown which is made worse by how many markdown parsers incorrectly handle escaping).

Allowing it would also be consistent with Rust which allows arbitrary whitespace (which also includes newlines and comments) between pieces of syntax, even pieces that people don't expect to be separate syntax like # and [ for attributes.

When we changed the infostring, that was because we ran into a technical problem with the last minute RFC change. I would expect other grammar changes to be a similar bar (or "this could get in the way of a future design"). I'm not seeing either of those here and instead this is seeming to be more of a personal preference at which point I would say we should stick to the RFC.

Copy link
Member

@jieyouxu jieyouxu May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion (mostly for rustdoc maintainers, and most likely not for this PR): I... I came up with another cursed example. I'm sorry-

Somewhat surprisingly, this works (these are functional rustdoc-ui tests):

//@ check-pass
//@ compile-flags: --test --test-args=--test-threads=1
//@ normalize-stdout: "tests/rustdoc-ui" -> "$$DIR"
//@ normalize-stdout: "finished in \d+\.\d+s" -> "finished in $$TIME"
//@ normalize-stdout: ".rs:\d+:\d+" -> ".rs:$$LINE:$$COL"

/// ```
/// ---hurts_my_head
/// ---
/// #![feature(frontmatter)]
/// fn main() {}
/// ```
pub struct ExplicitMain;

But these don't:

Version Test
Implicit main, stmt, inner feature attr
//@ check-fail
//@ compile-flags: --test --test-args=--test-threads=1
//@ normalize-stdout: "tests/rustdoc-ui" -> "$$DIR"
//@ normalize-stdout: "finished in \d+\.\d+s" -> "finished in $$TIME"
//@ normalize-stdout: ".rs:\d+:\d+" -> ".rs:$$LINE:$$COL"

// This doesn't work, as one might expect?
/// ```
/// ---hurts_my_head
/// ---
/// #![feature(frontmatter)]
/// assert_eq!(1, 1);
/// ```
pub struct ImplicitWrapMain;
Implicit main, stmt, outside doctest source crate-level feature attr
//@ check-fail
//@ compile-flags: --test --test-args=--test-threads=1
//@ normalize-stdout: "tests/rustdoc-ui" -> "$$DIR"
//@ normalize-stdout: "finished in \d+\.\d+s" -> "finished in $$TIME"
//@ normalize-stdout: ".rs:\d+:\d+" -> ".rs:$$LINE:$$COL"

#![feature(frontmatter)]

// This doesn't work either
/// ```
/// ---hurts_my_head
/// ---
/// assert_eq!(1, 1);
/// ```
pub struct ImplicitWrapMain;

In the second case, even hoisting the #![feature(frontmatter)] into sources (outside of doctest)
doesn't work, because I want to say it's interpreted as an expression (triple negation...) because
of doctest implicit expr main wrapping behavior? But then the generated doctest binary source file doesn't pass on the frontmatter? Idk

Additionally, I have no idea if 2024 doctest merging behavior can introduce some other obscure edge cases here.

I don't work on rustdoc, but my 2 cents is that we might need to ban frontmatters from doctests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because of doctest implicit expr main wrapping behavior

I'm pretty sure doctests try to detect whether a fn main exists and if it doesn't it will wrap an implicit fn main() {} to the statements within the doctest. I can look into whether there is an easy fix to ban frontmatters in doctests entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah s/expr/stmts/, not sure why I said expr 😄

Copy link
Member

@fmease fmease May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For detecting the main fn, nowadays we wrap the doctest in a function (so we don't need to reparse item ctxt vs stmt ctxt (which we used to do)), parse the fn as an item (duh) and finally traverse the AST nodes to see if all stmts are items or … or …. This all happens in doctest/make.rs.

I'm surprised that it permits frontmatters tho, since we'd be parsing both code blocks as fn f(){\n---hurts_my_head… (f is the wrapper fn) and frontmatter ought not be valid inside fn bodies. It's very early for me, my brain hasn't fully booted yet, so I don't know what's going on (for comparison, shebangs always lead to a syntax error).

Copy link
Member

@fmease fmease May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from make.rs, we also have warn-by-default lint rustdoc::invalid_rust_codeblocks which does not parse the doctest source (as it advertises) but only lexes it essentially, well it parses it into a TokenStream. Not sure if that'd need any updating or not, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual code lives in https://doc.rust-lang.org/stable/nightly-rustc/src/rustdoc/doctest/make.rs.html#137-251 which does not wrap the code if the doctest meets certain conditions (one of them is containing a main function)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure whether we need to just ban frontmatters in doctests completely, or at least in this initial PR. This should be figured out in stabilization (maybe detect whether a file is compiled as a doctest then error if a frontmatter token is reached?)

Copy link
Member

@jieyouxu jieyouxu May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah like I said, probably not for this initial PR, but would probably be a stabilization blocker.

EDIT: let me just write this down in the tracking issue, we can leave it for follow-ups.
EDIT 2: I added this as an unresolved question on the tracking issue, not a blocker for this PR.

Copy link
Member

@fmease fmease May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: https://doc.rust-lang.org/stable/nightly-rustc/src/rustdoc/doctest/make.rs.html#137-251 is not necessarily up to date with nightly (it's the stable/ distribution), check https://doc.rust-lang.org/nightly/nightly-rustc/src/rustdoc/doctest/make.rs.html

(one of them is containing a main function)

Yes, but I was talking about parse_source in make.rs which determines whether there was a main fn and that code wraps everything in fn f() {…} which should lead to a syntax error in theory and should lead to rustdoc generating a dummy doctest (DocTestBuilder::invalid). If it doesn't, then that'd be interesting to know but I might be missing something, I'm still waking up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I don't want to burden you. I also don't consider this blocking.

@ytmimi
Copy link
Contributor

ytmimi commented May 1, 2025

@ytmimi as this doesn't update rustfmt for frontmatter support (unsure if that is intended for this PR or a follow up), any feedback on how well this will work for rustfmt, at least in terms of tolerating them (ie feature parity with #137193)

@epage I took a quick look, and from what I can tell the frontmatter isn't parsed as an AST node (let me know if I've missed anything there), so these changes shouldn't impact rustfmt at all. That said, I'll double check that's the case sometime over the next couple days.

@fmease
Copy link
Member

fmease commented May 1, 2025

Well, rustfmt ought not strip frontmatter but reproduce it verbatim similar to shebangs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-frontmatter `#![feature(frontmatter)]` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants