Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-generate AST boilerplate #15544

Merged
merged 32 commits into from
Jan 17, 2025
Merged

Auto-generate AST boilerplate #15544

merged 32 commits into from
Jan 17, 2025

Conversation

dcreager
Copy link
Member

This PR replaces most of the hard-coded AST definitions with a generation script, similar to what happens in rust_python_formatter. I've replaced every "rote" definition that I could find, where the content is entirely boilerplate and only depends on what syntax nodes there are and which groups they belong to.

This is a pretty massive diff, but it's entirely a refactoring. It should make absolutely no changes to the API or implementation. In particular, this required adding some configuration knobs that let us override default auto-generated names where they don't line up with types that we created previously by hand.

Test plan

There should be no changes outside of the rust_python_ast crate, which verifies that there were no API changes as a result of the auto-generation. Aggressive cargo clippy and uvx pre-commit runs after each commit in the branch.

@dcreager dcreager added the internal An internal refactor or improvement label Jan 17, 2025
Copy link
Contributor

github-actions bot commented Jan 17, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for doing this!

Unrelated to this PR, Biome does something similar (https://github.com/biomejs/biome/tree/main/xtask/codegen).

crates/ruff_python_ast/ast.toml Outdated Show resolved Hide resolved
crates/ruff_python_ast/generate.py Show resolved Hide resolved
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is great.

I think my long term preference would be to use something like ungrammar that would also allow us to e.g. generate the source order visitor but I think this is a great start.

crates/ruff_python_ast/ast.toml Outdated Show resolved Hide resolved
crates/ruff_python_ast/ast.toml Outdated Show resolved Hide resolved
Comment on lines 24 to 28
root = check_output(["git", "rev-parse", "--show-toplevel"], text=True).strip()
ast_path = Path(root).joinpath("crates", "ruff_python_ast", "ast.toml")
out_path = Path(root).joinpath("crates", "ruff_python_ast", "src", "generated.rs")
with ast_path.open("rb") as ast_file:
ast = tomllib.load(ast_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the code a bit hard to read because it mixes top level statement with class and declarations. I'd prefer to instead have a main function that's called in if __name__ == "__main__".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 74 to 77
out = ["""
// This is a generated file. Don't modify it by hand!
// Run `crates/ruff_python_ast/generate.py` to re-generate the file.
"""] # fmt: skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use one of:

out = [
    textwrap.dedent("""
        // This is a generated file. Don't modify it by hand!
        // Run `crates/ruff_python_ast/generate.py` to re-generate the file.
    """)
]


out = [
    textwrap.dedent(
        """
        // This is a generated file. Don't modify it by hand!
        // Run `crates/ruff_python_ast/generate.py` to re-generate the file.
        """
    )
]

Or use the auto formatted code. IMO, it's roughly equally esthetic as the manually formatted array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moot as part of reorganizing into functions.

Comment on lines 79 to 83
# ------------------------------------------------------------------------------
# Owned enum
#
# For each group, we create an enum that contains an owned copy of a syntax
# node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think each of those blocks could be functions instead where the inline comment becomes a docstring.

It would also be nice if the docstring could contain. code snippet of what it generates:

Generates the owned enum for each group, e.g Stmt, Expr etc. For each, generate:

  • The enum
  • Implement From<Variant>
  • Implement Ranged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/// See also [stmt](https://docs.python.org/3/library/ast.html#ast.stmt)
#[derive(Clone, Debug, PartialEq, is_macro::Is)]
pub enum Stmt {
#[is(name = "function_def_stmt")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but I think it would be nice if we generated the is implementations as well. The is macro doesn't always work well with ide's and macros are also more expensive to compile.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some Python nitpicks ;)

crates/ruff_python_ast/generate.py Outdated Show resolved Hide resolved
crates/ruff_python_ast/generate.py Outdated Show resolved Hide resolved
crates/ruff_python_ast/generate.py Outdated Show resolved Hide resolved
crates/ruff_python_ast/generate.py Outdated Show resolved Hide resolved
Comment on lines 84 to 85
--extend-select,
I,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to put this config in the tool.ruff section of our pyproject.toml file at the directory root?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it would! Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth considering moving the ruff config in the scripts dir into the root dir pyproject.toml. Isort and some other common rules are already enabled there, so it may be useful to enable them for the whole project if scripts are being added outwith the scripts dir.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I agree with @calumy's comment at #15544 (comment), but that could be done as a followup!

crates/ruff_python_ast/generate.py Outdated Show resolved Hide resolved
* main:
  [red-knot] Inline `SubclassOfType::as_instance_type_of_metaclass()` (#15556)
  [`flake8-comprehensions`] strip parentheses around generators in `unnecessary-generator-set` (`C401`) (#15553)
  [`pylint`] Implement `redefined-slots-in-subclass` (`W0244`) (#9640)
  [`flake8-bugbear`] Do not raise error if keyword argument is present and target-python version is less or equals than 3.9 (`B903`) (#15549)
  [red-knot] `type[T]` is disjoint from `type[S]` if the metaclass of `T` is disjoint from the metaclass of `S` (#15547)
  [red-knot] Pure instance variables declared in class body (#15515)
  Update snapshots of #15507 with new annotated snipetts rendering (#15546)
  [`pylint`] Do not report methods with only one `EM101`-compatible `raise` (`PLR6301`) (#15507)
  Fix unstable f-string formatting for expressions containing a trailing comma (#15545)
  Support `knot.toml` files in project discovery (#15505)
  Add support for configuring knot in `pyproject.toml` files (#15493)
  Fix bracket spacing for single-element tuples in f-string expressions (#15537)
  [`flake8-simplify`] Do not emit diagnostics for expressions inside string type annotations (`SIM222`, `SIM223`) (#15405)
  [`flake8-pytest-style`] Do not emit diagnostics for empty `for` loops (`PT012`, `PT031`) (#15542)
@dcreager dcreager merged commit 8e3633f into main Jan 17, 2025
39 checks passed
@dcreager dcreager deleted the dcreager/generate-ast branch January 17, 2025 19:23
@calumy calumy mentioned this pull request Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants