Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced AST #44

Closed
willcrichton opened this issue Mar 13, 2020 · 7 comments
Closed

Enhanced AST #44

willcrichton opened this issue Mar 13, 2020 · 7 comments

Comments

@willcrichton
Copy link
Contributor

For my inlining tool, there's a number of features I need that aren't present in the standard AST module. I'm thinking about the design of an "enhanced" AST that enables these features, and I'd like to know what use cases y'all have as well.

Requirements

  1. Preserve comments and whitespace: when e.g. inlining a function, because I'm generating human-readable code, it's useful to preserve comments and whitespace for low-level legibility and higher-level comprehension. ast.parse drops this information after tokenization.

  2. Generate string from AST, and preserve line number mapping to AST nodes: to test whether a line of code is executed in a program, I turn the AST into a string. Then I execute the program using Frame.f_trace_lines to track executed lines. Then, I need to map this information back to an AST node. The only way to do that currently is to re-generate the AST from the created source file. However, this throws away any in-memory information attached to the AST, unless it somehow is explicitly dumped into the generated Python source. Ideally, the AST would not be re-generated, and instead the AST-to-string routine can remember the AST node <-> line number mapping.

  3. Annotate AST nodes. It would be useful to annotate AST nodes with information like a source map, or a history of transformations applied to it. This is possible right now by just attaching members to the AST node object, although that usually messes with something, e.g. structural AST comparisons.

Prior work

There are two main projects that have a subset of these features:

  1. baron: this is a seemingly popular tool designed for source-preserving refactoring. It completely circumvents the CPython AST facilities, having its own AST, grammar, visitor framework, so on. It was specifically designed such that to_string(parse(code)) == code, i.e. 1-to-1 source mapping. However, we probably wouldn't want to tie ourselves to an AST framework that won't interoperate with all other AST tools.

  2. horast: this is a small tool that just preserves comments in the AST (not whitespace). It's a lightweight extension on-top of CPython's ast library. It seems like a good reference implementation for comment-preservation even if we don't literally use the library.

@cdonovick
Copy link
Collaborator

cdonovick commented Mar 13, 2020

I also want "enhanced AST". Although my desire are pretty different from yours.

I want it to be immutable, so that nodes can be compared / hashed. But this means I want to destroy line information etc. on the nodes themselves. This would also allow nodes to be reused without deep copying them . Source maps could be maintained for nodes based on identity but this would run into issue if nodes are used in multiple places without deep copying.

I want a version independent AST, so passes are more portable. My vision for this would basically be what ever the bleeding edge AST is, with older versions only allowing a subset of nodes. Perhaps the most notable recent change that I want to back port is unified constant nodes, instead having different nodes for every type of literal.

Finally I want to a distinct node for Stmt*, so that it is possible to visit_block.

Also persevering and parsing comments would be nice but is lower priority for me than 3 concerns above.

@cdonovick
Copy link
Collaborator

Was just chatting with @willcrichton want to move this a discussion here and get @leonardt opinion. Debian has moved to python 3.8 which breaks ast_tools so I have more motivation to either make my own alt ast or migrate to an existing one.

Will uses LibCST and plans on continuing to use it. However, it has the downside of not supporting returning List[Stmt] to replace Stmt in a transformer (which is necessary for many of our transformations). Will does however have a mixin which would allow us to get the that behavior.

Right now am I leaning toward moving to LibCST however this would some engineering effort as all of AST tools would need to be rewritten.

@leonardt are there features in AST that you don't have in the standard module which you would like for silica and fault? We should do a review of our requirements before a major rewrite.

@leonardt
Copy link
Owner

Right now magma is using the pass infrastructure (begin/end rewrite). It would be nice if we could do some brainstorming to see if we could come up with a slightly more ergonomic interface (at some point I think we liked the idea of @ast_tools.rewrite(passes=[...]) or something of that nature (basically a list of ordered passes with a single decorator, rather than having to wrap the decorators with begin/end). In terms of passes, we use ssa and if_to_phi.

Other than that there's no real immediate feature requirements from me, although I'm sure any useful/reusable code would help clean up the internals. This PR has a good overview of the new comb/seq flow for the AST (a lot simpler now) https://github.com/phanrahan/magma/pull/668/files

@cdonovick
Copy link
Collaborator

Okay I think I am going to start working on this. I think for now I am going to create cst_to_ast / ast_to_cst passes and not do a major rewrite before I am sure that LibCST is what I want to use.

@cdonovick
Copy link
Collaborator

Actually it turns out that making an ast_to_cst pass is somewhat annoying because of how the pass infrastructure works. To convert to CST I need to serialize and reparse but this requires a end_rewrite to strip decorators. So this gives me a good reason to build the apply_passes decorator.

For now there will be separate apply_ast_passes and apply_cst_passes decorators.

@leonardt
Copy link
Owner

I think we can close this now that we've moved to libcst?

@cdonovick
Copy link
Collaborator

agreed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants