
[RFC] Multi-Proc AST-gen + fuzzing #2027

Draft
wants to merge 5 commits into
base: main

Conversation

schilkp
Contributor

@schilkp schilkp commented Apr 2, 2025

Hi All!

Because I wanted to dig around the AST generator/fuzzing subsystems a bit (and the other admin stuff I should actually be doing is totally boring), I tried my hand at improving the fuzzing subsystem to cover multi-proc DSLX programs.

I realize I committed the sin of "building a bunch of things without ever asking if you actually want it" - but my motivation for working on this was mostly personal interest. Since I probably won't have much more time to play around with it (the boring admin stuff is getting more pressing), I thought I would open this to get a sense if there are any bits and pieces in here that you would actually want. No worries if you don't. At the very least it forces me to write everything down in case someone ever does want to look into this.

I have marked this as a draft PR because I assume that, if you are interested in some part of this, it will be much easier if I break it out into multiple PRs. The code is also not yet "well-baked" in many places, so I would of course clean those parts up before actually sending them in :)

Motivation

From my side: Most of the bugs I have run into have been related to proc interaction/FIFOs (1 2 3) so I got thinking about how difficult it would be to cover some of these paths with the fuzzer.

In general, I feel there is some asymmetry between the test coverage of the "creating procs" and the "connecting procs together" parts of XLS. Hence it seemed potentially worthwhile to work on this a bit.

State

I mostly focused on the AST generator, and building the ground work for generating random hierarchies of procs. I have spent some time running these ASTs through the sample runner and worked on some of the immediate issues, but there are numerous points of friction left - see below.

In short, it is possible to now run the fuzzer reliably with the following flow:

  • Generate random ASTs containing procs that spawn a random "child" proc, with this child being both connected to the parent proc body and directly exposed via the parent proc's interface.
  • Generate test vectors for these multi-proc samples.
  • Have the sample runner perform IR conversion, IR optimization, and codegen.
  • While I have added a fair amount of deadlock mitigation to the AST generator (see below), plenty of deadlocks remain. With deadlock added as a known failure, the sample runner can also reliably:
    • Run the DSLX evaluator
    • Run the IR interpreter / JIT on both the un-optimized and optimized IR.

Trying to run simulation as well causes everything to go haywire. I have not dug into it much.

Implementation

Roughly, my implementation strategy is as follows:

  • Fix some small fuzzer/AST generator bugs (mostly already upstreamed)

  • Add an ast_generator_main tool to allow me to manually drive the AST generator. This made testing it during development a whole bunch easier :)

  • Refactor the AstGenerator to pull all proc-related generation state from AstGenerator itself into a separate ProcContext inside AstGenerator::Context, making it possible for proc generations to nest.

  • Re-work proc generation to make it "signature driven", just like the function AST generation.

    • For functions, a random signature is chosen, then a matching function body is generated.

    • For procs, the state type was pre-determined, but channels were created every time a new ChannelOp was generated, changing the proc's interface.

    • This patch reworks the proc generation to also first generate a proc interface (set of input/output channels), then generates a matching proc body.

    • This makes it much easier to generate proc hierarchies: When generating a child proc we can pick the signature we want (as is done for child functions for map calls) instead of having to make a random child proc signature work.

    • This significantly increased the number of channel ops generated, which is how I found the incorrect pipeline stage count propagation through counted for loops: it was resulting in tens of "too few pipeline stages" known failures a second. It also exposed some other limitations related to combinational proc codegen - see below.

  • Add the ability to generate proc hierarchies:

    • When generating a proc, if enabled, we sometimes generate a random child proc that interacts with the parent proc. To keep things simple I generate at most one child proc. Currently, the "spawn-depth" is limited to one, to maintain sanity.
  • Some initial/WIP small hacks to the sample runner/generator to support multi-proc samples.

The commits mirror this flow, hopefully making them a bit easier to look at.
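
To make the "signature driven" idea concrete, here is a minimal toy sketch in Python (not the actual C++ `AstGenerator` code). It picks a proc interface first and then generates a body against it, adding a channel op for every channel that was left unused, as the commit message further down describes. The names `Channel`, `generate_signature`, and `generate_body` are hypothetical and exist only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    direction: str  # "in" or "out", as seen from the proc


def op_for(chan):
    """Returns the only kind of op that is legal on this channel."""
    return ("recv" if chan.direction == "in" else "send", chan.name)


def generate_signature(rng, max_channels=4):
    """Step 1: pick the proc interface up front (hypothetical helper)."""
    count = rng.randint(1, max_channels)
    return [Channel(f"chan_{i}", rng.choice(["in", "out"])) for i in range(count)]


def generate_body(rng, signature, num_random_ops=3):
    """Step 2: generate channel ops that match the fixed signature."""
    ops = []
    used = set()
    for _ in range(num_random_ops):
        chan = rng.choice(signature)
        ops.append(op_for(chan))
        used.add(chan.name)
    # Every channel must be interacted with, so emit one extra op
    # for each channel the random ops did not touch.
    for chan in signature:
        if chan.name not in used:
            ops.append(op_for(chan))
    return ops


rng = random.Random(0)
signature = generate_signature(rng)
body = generate_body(rng, signature)
```

Because the signature is fixed before the body exists, a parent generator could hand a child generator exactly the interface it needs, which is the property the hierarchy generation relies on.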

Wiring the Proc Config Function

Some notes about how the channels are routed in the generated config function, if we are generating a child:

Since the proc interface (the arguments of the proc's config function) is now pre-defined, we send a random portion of these channels to the generated child (chns_io_child). The rest of the channels are used as proc members and are interacted with in the proc next function (chns_io_body).

We also generate a random set of internal channels (chns_child_body) that connect the child proc to the proc members, allowing us to interact with it from the next function. The required interface for the child proc is therefore the combination of the channels connecting it to the parent proc interface and to the parent proc next function (chns_io_child and chns_child_body).

This scheme is illustrated below:

                          Proc I/O
                   (Config func. params)
                        ▲         ▲
                        │ │       │ │
                   ─────┼─┼───────┼─┼──
                        │ │       │ │
       chns_io_child    │ │       │ │
                        │ │       │ │
                     ┌──┴─▼──┐    │ │
                     │       │    │ │
          child proc │       │    │ │ chns_io_body
                     │       │    │ │
                     └──▲─┬──┘    │ │
                        │ │       │ │
     chns_child_body    │ │       │ │
                        │ │       │ │
                   ─────┼─┼───────┼─┼───
                        │ │       │ │
                          ▼         ▼
                       Proc Members
                 (Channels used in `next`)

I always use the following channel ordering:

Proc I/O: [*chns_io_child, *chns_io_body]

Child I/O: [*chns_io_child, *chns_child_body]

Proc Members: [*chns_child_body, *chns_io_body]
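
The three orderings above can be sketched as a small Python function. This is only a toy model of the routing, assuming channels are represented as plain strings; `wire_config` and the `child_body_N` names are hypothetical and not part of the patch.

```python
import random


def wire_config(proc_io, num_internal, rng):
    """Splits the pre-defined proc interface and derives the three orderings."""
    # A random prefix of the interface goes straight to the child; the rest
    # becomes proc members used by the parent's `next` function.
    split = rng.randint(0, len(proc_io))
    chns_io_child = proc_io[:split]
    chns_io_body = proc_io[split:]
    # Internal channels connecting the child to the parent's `next` function.
    chns_child_body = [f"child_body_{i}" for i in range(num_internal)]
    return {
        "chns_io_child": chns_io_child,
        "chns_io_body": chns_io_body,
        "chns_child_body": chns_child_body,
        "proc_io": chns_io_child + chns_io_body,
        "child_io": chns_io_child + chns_child_body,
        "proc_members": chns_child_body + chns_io_body,
    }


wiring = wire_config(["a", "b", "c", "d"], num_internal=2, rng=random.Random(1))
```

Splitting on a prefix keeps the "Proc I/O" list identical to the original interface, so the config function parameters never need to be reordered.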

Known Points of Friction

Some notes on places that will likely need attention to "finish" full multi-proc fuzzing.

Deadlocks (ofc..)

The immediate issue with random multi-proc ASTs is the potential for deadlock.

I added the following AST generation restrictions to reduce the chance of deadlock:

    1. The child proc must send first on all channels before receiving.
    2. The child proc may only contain unconditional sends/receives.
    3. The top proc may only interact with the child proc using unconditional sends/receives.
    4. The FIFOs connecting the child and top proc have a minimum depth of 1.

At first glance it seemed to me that these ought to be enough to prevent deadlocking of fuzzing samples, but there are still plenty of deadlocks to be had.

For now, I added a known_failure.

There is a discussion to be had about whether it might be worth actually fuzzing these deadlocking samples. If, instead of crashing when we can't tick N times, we tick "as many times as we can but at most N times", it might still be possible to check that these samples behave well. The non-greediness of the interpreters would make this difficult though (see below).
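
As a rough illustration of the "at most N times" idea, here is a toy Python loop. It assumes a hypothetical `tick` callable that returns `False` once no proc can make progress; the real sample runner and interpreters do not expose such an interface as far as this description goes.

```python
from collections import deque


def run_up_to(tick, max_ticks):
    """Ticks until the network stalls or max_ticks is reached."""
    completed = 0
    for _ in range(max_ticks):
        if not tick():
            break  # stalled: stop early instead of reporting a crash
        completed += 1
    return completed


def make_tick(fifo):
    """Toy network: one tick succeeds while the input FIFO has data."""
    def tick():
        if not fifo:
            return False
        fifo.popleft()
        return True
    return tick


stalled_early = run_up_to(make_tick(deque([1, 2, 3])), max_ticks=10)
hit_limit = run_up_to(make_tick(deque([1, 2, 3, 4, 5])), max_ticks=2)
```

The returned tick count could then be compared across the evaluation backends, rather than treating an early stall as a failure.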

The AST generator does not know about "combinational proc codegen"

Related to this discussion here: #1996

It seems that the combinational proc codegen pipeline has more restrictive/different limitations on channel exclusivity.

The proc generator has no notion of these, and the sample runner will attempt to run the combinational backend with incompatible procs.

As far as I can see this is a limitation even without my changes, but the much higher frequency of channel ops being generated has surfaced it more.

For now, I have added this as a "known failure". It is fairly rare and a limitation of the fuzzer and not the codegen after all.

"The interpreters are not greedy enough"

The interpreters are sensitive to the order of operations, even if the IR is semantically equivalent. This surfaces in "pseudo-deadlocks" (samples that are deadlocked but not detected as such because one proc is running in a circle doing nothing). This results in fuzzing failures because optimization can change the order of operations, causing such pseudo-deadlock samples to behave differently before and after optimization.

For example, consider this snippet:

package repro

chan output(bits[10], id=2, kind=streaming, ops=send_only, flow_control=ready_valid, strictness=proven_mutually_exclusive)
chan always_empty(bits[64], id=10, kind=streaming, ops=send_receive, flow_control=ready_valid, strictness=proven_mutually_exclusive, fifo_depth=1, bypass=true, register_push_outputs=true, register_pop_outputs=false)

top proc __sample__main_0_next() {
  ready: token = after_all(id=4839)
  data: bits[10] = literal(value=0, id=4840)
  x95: (token, bits[64]) = receive(ready, channel=always_empty, id=4845) // A
  x110: token = send(ready, data, channel=output, id=4846)               // B
}

proc never_send() {
  now: token = after_all(id=4841)
  data: bits[64] = literal(value=0, id=4843)
  zero: bits[1] = literal(value=0, id=4842)
  x46: token = send(now, data, predicate=zero, channel=always_empty, id=4844)
}

Note that line A will never complete since the channel is always empty. Semantically, the send on line B is not blocked by this, meaning I would expect the proc to send to the output channel once. I guess this is what the physical circuit would do.

However, the interpreters get "blocked" on line A, and B never executes.

If the two lines are swapped, we see the single send. If optimizations cause this swap to occur, we have a mismatch in behavior between optimized and un-optimized interpreted IR.
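
The order sensitivity can be reproduced with a toy Python model. This is not how the XLS interpreters are implemented; it only assumes an evaluator that runs the ops of one activation in listed order and stops at the first op that cannot complete. `run_in_order` is a hypothetical name.

```python
from collections import deque


def run_in_order(ops, channels):
    """Runs ops in listed order; stops at the first op that cannot complete."""
    executed = []
    for kind, chan in ops:
        if kind == "receive":
            if not channels[chan]:
                break  # blocked on an empty channel: later ops never run
            channels[chan].popleft()
        else:  # "send"
            channels[chan].append(0)
        executed.append((kind, chan))
    return executed


# Line A before line B, as in the snippet above.
order_ab = [("receive", "always_empty"), ("send", "output")]
# The same two ops with B before A.
order_ba = [("send", "output"), ("receive", "always_empty")]

chans_ab = {"always_empty": deque(), "output": deque()}
chans_ba = {"always_empty": deque(), "output": deque()}
run_in_order(order_ab, chans_ab)
run_in_order(order_ba, chans_ba)

sends_ab = len(chans_ab["output"])  # no send observed
sends_ba = len(chans_ba["output"])  # one send observed
```

In this model the two orderings produce different output traces, matching the mismatch seen between un-optimized and optimized IR.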

I currently work around this also using the deadlock mitigation restrictions:

    1. The child proc must send first on all channels before receiving.
    2. The child proc may only contain unconditional sends/receives.
    3. The top proc may only interact with the child proc using unconditional sends/receives.
    4. The FIFOs connecting the child and top proc have a minimum depth of 1.

These restrictions mean that every cycle both procs can activate fully, and there is always data available to the top proc from the child proc.

While troubleshooting this, and before realizing that this is a fundamental limitation of the interpreters, I had written a unit test that exercises this here: https://github.com/schilkp/xls/tree/schilkp/interpreter_is_not_greedy - in case that is of use to anyone.

Extracting Proc Signature/Logging output channels

When extracting the "signature" of the top proc to generate test vectors, the sample generator previously used the proc members. However, with interface-to-child and child-to-body/members channels, this is not correct:

  • Input channels to the top proc that are directly connected to the child proc are missed.
  • Channels from the child proc to the top proc are treated as top inputs.
  • ...

Instead, for now, I grab the set of input channels from the config function parameters. This is "good enough" for the current AST generator, but only works because we give the proc member the same name as the config function parameter that feeds it. The name of the actual input to the proc is the name of the proc member and not the config function parameter after all.

Similarly, the interpreters would dump the values sent to any channel as the proc output. I restricted it to only dump channels that are kSendOnly, which seems to do the trick but feels a bit hacky.
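
The output filtering can be pictured with a short Python sketch. It models channels by the `ops=` value that appears in the IR text; `ChannelInfo` and `channels_to_dump` are hypothetical names, and the actual change lives in the C++ interpreter tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelInfo:
    name: str
    ops: str  # "send_only", "receive_only", or "send_receive", as in the IR text


def channels_to_dump(channels):
    """Keeps only channels the package sends on but never receives from."""
    return [c.name for c in channels if c.ops == "send_only"]


# The two channels from the repro package above.
repro_channels = [
    ChannelInfo("output", "send_only"),
    ChannelInfo("always_empty", "send_receive"),
]
dumped = channels_to_dump(repro_channels)
```

Internal parent/child channels are `send_receive` at the package level, so this filter drops them and leaves only true outputs.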

I guess this might all get easier with proc-scoped channels?


Sorry for another wall-of-text...

Cheers!

schilkp added 5 commits April 2, 2025 19:46
Allows manual driving of the random ast generator. Useful for quick
tests while working on it.
…stable `Context`

Previously, the properties of a generated proc were tracked directly as
a member of `AstGenerator`. This makes it impossible to generate
another supplemental proc during the generation of a top proc.
Previously, the random proc AST generator would start with zero
channels, and, every time a ChannelOp was generated, add a new channel
of the appropriate type.

This updates the generator to first pick a random proc "signature" (i.e.
set of input/output channels) and then generate a matching proc body.
Since all channels must be interacted with, the AST generator now tracks
which channels have already been "used", and will generate additional
ChannelOps for all unused channels at the end of the `next` function if
any are left.

Note that this requires the ability to always generate a `ChannelOp` for
any channel, so the decision logic to pick predicates and values for
sends has been updated to generate a new literal value if no matching
value can be found in the environment.

With this change, we can now generate procs of a pre-determined
"signature", just like we can for functions. This will make
generating and fuzzing multi-proc ASTs/procs which spawn child procs
significantly easier down the road.

In combination, these changes significantly increase the number of channels
per proc and channel-ops seen when fuzzing. This exposes a new AST
generator/fuzzer limitation: The combinational proc codegen backend has
stricter requirements for the exclusivity of output channels that are
not guaranteed by the AST generator, leading to many fuzzer crashes.
This has been added as a "known_failure" since it seems to be a
limitation of the AST generator, not the codegen backend.
When generating a proc, if enabled, we sometimes generate a random child
proc that interacts with the parent proc.

For now, I generate at most one child proc.

Currently, the "spawn-depth" is limited to one.
@ericastor
Collaborator

@cdleary @meheff @richmckeever - I think you all might want to take a look at this!
