Enhanced AST #44

Closed
@willcrichton

For my inlining tool, there are a number of features I need that aren't present in the standard ast module. I'm thinking about the design of an "enhanced" AST that enables these features, and I'd like to know what use cases y'all have as well.

Requirements

  1. Preserve comments and whitespace: when e.g. inlining a function, because I'm generating human-readable code, it's useful to preserve comments and whitespace for low-level legibility and higher-level comprehension. ast.parse drops this information after tokenization.

  2. Generate a string from the AST, and preserve the line number mapping to AST nodes: to test whether a line of code is executed in a program, I turn the AST into a string. Then I execute the program under a trace function (with frame.f_trace_lines enabled) to track executed lines. Then, I need to map this information back to an AST node. The only way to do that currently is to re-generate the AST from the created source file. However, this throws away any in-memory information attached to the AST, unless it is somehow explicitly dumped into the generated Python source. Ideally, the AST would not be re-generated, and instead the AST-to-string routine would remember the AST node <-> line number mapping.

  3. Annotate AST nodes: it would be useful to annotate AST nodes with information like a source map, or a history of transformations applied to them. This is possible right now by just attaching members to the AST node object, although that tends to interfere with other things, e.g. structural AST comparisons.
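
To make requirements 2 and 3 concrete, here is a minimal sketch, not a proposed design. It assumes Python 3.9+ (for ast.unparse) and assumes the unparsed source re-parses to a tree of the same shape as the original, which should hold for trees that came from ast.parse but is not guaranteed for hand-built ones. The helper name unparse_with_line_map, the history side table, and the "inlined from g" note are all hypothetical. The re-parsed tree is used only to read off line numbers; the original nodes, and anything keyed on them, are kept.

```python
import ast

def unparse_with_line_map(tree):
    """Return (source, {line number in source: [original nodes starting there]})."""
    source = ast.unparse(tree)      # requires Python 3.9+
    reparsed = ast.parse(source)    # used only to read off positions
    line_map = {}
    # ast.walk visits both trees in the same order as long as their shapes match.
    for original, fresh in zip(ast.walk(tree), ast.walk(reparsed)):
        if type(original) is not type(fresh):
            raise ValueError("trees diverged; unparse did not round-trip")
        lineno = getattr(fresh, "lineno", None)
        if lineno is not None:
            line_map.setdefault(lineno, []).append(original)
    return source, line_map

tree = ast.parse("def f(x):\n    y = x * 2\n\n\n    return y\n")
assign = tree.body[0].body[0]

# Annotations live in a side table keyed by node object (AST nodes hash by
# identity), so nothing is attached to the nodes themselves.
history = {assign: ["inlined from g"]}

generated, line_map = unparse_with_line_map(tree)

# Suppose tracing reported that line 2 of the generated source was executed.
executed_line = 2
notes = [history[n] for n in line_map[executed_line] if n in history]
```

Because the annotations sit in a separate dict rather than on the nodes, ast.dump(tree) is unchanged by them, so comparisons based on it are unaffected.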

Prior work

There are two main projects that have a subset of these features:

  1. baron: this is a seemingly popular tool designed for source-preserving refactoring. It completely circumvents the CPython AST facilities, having its own AST, grammar, visitor framework, and so on. It was specifically designed such that to_string(parse(code)) == code, i.e. 1-to-1 source mapping. However, we probably wouldn't want to tie ourselves to an AST framework that won't interoperate with all other AST tools.

  2. horast: this is a small tool that just preserves comments in the AST (not whitespace). It's a lightweight extension on top of CPython's ast library. It seems like a good reference implementation for comment preservation even if we don't literally use the library.
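
For reference, the comment loss described in requirement 1 is easy to reproduce with the standard library alone, and the tokenize module still sees the comments. The sketch below is not how horast or baron work internally; collect_comments is a hypothetical helper, and ast.unparse assumes Python 3.9+.

```python
import ast
import io
import tokenize

source = "x = 1  # the answer\n\n# standalone note\ny = x + 1\n"

# Round-tripping through the standard AST loses comments and blank lines.
round_tripped = ast.unparse(ast.parse(source))

def collect_comments(code):
    """Map line number -> comment text, using the tokenizer instead of the AST."""
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string
    return comments

comments = collect_comments(source)
```

An enhanced AST could use a table like this to reattach each comment to the node that starts on, or just after, the same line.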
