[RFC] Multi-Proc AST-gen + fuzzing #2027
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi All!
Because I wanted to dig around the AST generator/fuzzing subsystems a bit (and the other admin stuff I should actually be doing is totally boring), I tried my hand at improving the fuzzing subsystem to cover multi-proc DSLX programs.
I realize I committed the sin of "building a bunch of things without ever asking if you actually want it" - but my motivation for working on this was mostly personal interest. Since I probably won't have much more time to play around with it (the boring admin stuff is getting more pressing), I thought I would open this to get a sense if there are any bits and pieces in here that you would actually want. No worries if you don't. At the very least it forces me to write everything down in case someone ever does want to look into this.
I have marked this as a draft PR because I assume that, if you are interested in some part of this, it will be much easier if I break it out into multiple PRs. The code is also not yet "well-backed" in many places, so I would clean those parts up as well before actually sending them in of course:)
Motivation
From my side: Most of the bugs I have run into have been related to proc interaction/FIFOs (1 2 3) so I got thinking about how difficult it would be to cover some of these paths with the fuzzer.
In general, I feel there is some asymmetry between the test coverage of the "creating procs" and the "connecting procs together" parts of XLS. Hence it seemed potentially worthwhile to work on this a bit.
State
I mostly focused on the AST generator, and building the ground work for generating random hierarchies of procs. I have spent some time running these ASTs through the sample runner and worked on some of the immediate issues, but there are numerous points of friction left - see below.
In short, it is possible to now run the fuzzer reliably with the following flow:
Trying to run simlulation as well causes everything to go haywire. I have not dug into it much.
Implementation
Roughly, my implementation strategy is as follows:
Fix some small fuzzer/AST generator bugs (mostly already upstreamed)
Add an
ast_generator_main
tool to allow me to manually drive the AST generator. This made testing it during development a whole bunch easier :)Refactor the
AstGenerator
to pull all proc-related generation state fromAstGenerator
itself into a separateProcContext
insideAstGenerator::Context
, making it possible for proc generations to nest.Re-work proc generation to make it "signature driven", just like the function AST generation.
For functions, a random signature is chosen, then a matching function body is generated.
For procs, the state type was pre-determined, but channels were created every time a new
ChannelOp
was generated, changing the proc's interface.This patch reworks the proc generation to also first generate a proc interface (set of input/output channels), then generates a matching proc body.
This makes it much easier to generate proc hierarchies: When generating a child proc we can pick signature we want (as is done for child functions for
map
calls) instead of having to make a random child proc signature work.This significantly increased the number of channel ops generated, which is how I found the incorrect pipeline stage count propagation through counted for loops - It was result in tens of "too few pipeline stages" known failures a second. It also exposed some other limitations related to combinational proc codegen - see below.
Add the ability to generate proc hierarchies;
Some initial/WIP small hacks to the sample runner/generator to support multi-proc samples.
The commits mirror this flow, hopefully making them a bit easier to look at.
Wiring the Proc Config Function
Some notes about how the channels are routed in the generated config function, if we are generating a child:
Since the proc interface (the arguments of the proc's config function) are now pre-defined, we send a random portion of these channels to the generated child (
chns_io_child
). The rest of the channels are used as proc members and are interacted with in the proc next function (chns_io_body
).We also generate a random set of internal channels (
chns_child_body
) that connect the child proc to the proc members, allowing the us to interact with it from the next function. The required interface for the child proc is therefor the combination of channels connecting it to the parent proc interface and parent proc next function (chns_io_child
andchns_io_body
).This scheme is illustrated below:
I always use the following channel ordering:
Proc I/O:
[*chns_io_child, *chns_io_body]
Child I/O:
[*chns_io_child, *chns_child_body]
Proc Members:
[*chns_child_body, *chns_io_body]
Known Points of Friction
Some notes on places that will likely need attention to "finish" full multi-proc fuzzing.
Deadlocks (ofc..)
The immediate issue with random multi-proc ASTs is the potential for deadlock.
I added the following AST generation restrictions to reduce the chance of deadlock:
At a first glance it seemed to me that these ought to be enough to prevent dead locking of fuzzing samples, but there are still plenty of deadlocks to be had.
For now, I added a
known_failure
.There is a discussion to be had about if it might be worth actually fuzzing these deadlocking samples. If, instead of crashing if we can't tick N times, we tick "as many times as we can but at most N times", it might be possible to still check that these samples behave well. The non-greediness of interpreters would make this difficult thought (see below).
The AST generator does not know about "combinational proc codegen"
Related to this discussion here: #1996
It seems that the combinational proc codegen pipeline has more restrictive/different limitations on channel exclusivity.
The proc generator has no notion of these, and the sample runner will attempt to run the combinational backend with incompatible procs.
As far as I can see this is a limitation even without my changes, but the much higher frequency of channel ops being generated has surfaced it more.
For now, I have added this as a "known failure". It is fairly rare and a limitation of the fuzzer and not the codegen afterall.
"The interpreters are not greedy enough"
The interpreters are sensitive to the order of operations, even if the IR is semantically equivalent. This surfaces in "pseudo-deadlocks" (samples that are deadlocked but not detected as such because one proc is running in a circle doing nothing). This results in fuzzing failures because optimization can change the order of operations, causing such pseudo-deadlock samples to behave differently before and after optimization.
For example, consider this snippet:
Note that line
A
will never complete since the channel is always empty. Semantically, the send on lineB
is not blocked by this, meaning I would expect the proc to send to the output channel once. I guess this is what the physical circuit would do.However, the interpreters get "blocked" on line
A
, andB
never executes.If the two lines are swapped, we see the single send. If optimizations cause this swap to occur, we have a mismatch in behavior between optimized and un-optimized interpreted IR.
I currently work around this also using the deadlock mitigation restrictions:
These restrictions mean that every cycle both procs can activate fully, and there is always data available to the top proc from the child proc.
While troubleshooting this and before realizing that this a fundamental limitation of the interpreters, I had written a unit test that exercises this here: https://github.com/schilkp/xls/tree/schilkp/interpreter_is_not_greedy - Should that be of use to anyone.
Extracting Proc Signature/Logging output channels
When extracting the "signature" of the top proc to generate test vectors, the sample generator previously used the proc members. However, with interface-to-child and child-to-body/members channels,
this is not correct:
Instead I - for now - I grab the set of input channels from the config function parameters. This is "good enough" for the current AST generator, but only works because we give the proc member the same name as the config function parameter that feeds it. The name of the actual input to the proc is the name of the proc member and not the config function parameter after all.
Similarly, the interpreters would dump the values send to any channel as the proc output. I restricted it to only dump channels that are
kSendOnly
which seems to do the trick but feels a bit hacky.I guess this might all get easier with proc-scoped channels?
Sorry for another wall-of-text...
Cheers!