-
Notifications
You must be signed in to change notification settings - Fork 195
modules/zstd: add frame header parser #1168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f1417d6
to
63bb702
Compare
@proppy we updated the code, please take a look:
C++ tests generate 250 test cases with pseudorandom valid FrameHeaders. Test cases run in parallel through Please do note that this branch is based on #1167 and it also includes commits cherry-picked from #1166. Please ignore those when reviewing this PR. EDIT: I also added next batch of parallelized tests. Those generate random vectors of bytes of length from 0 to 32 bytes and then run those vectors through |
3ae186d
to
e6dc8d2
Compare
strip_prefix = "zstd-1.4.7", | ||
urls = ["https://github.com/facebook/zstd/releases/download/v1.4.7/zstd-1.4.7.tar.gz"], | ||
build_file = "@//dependency_support/com_github_facebook_zstd:bundled.BUILD.bazel", | ||
patches = ["@com_google_xls//dependency_support/com_github_facebook_zstd:decodecorpus.patch"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment about the patch? has it been submitted upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment.
Changes were not submitted upstream yet. The patch modifies decodecorpus
utility from zstd library. Unmodified tool allows generating valid zstd frames with randomized contents and size. With our modifications it is possibile to generate only some parts of the zstd frame. In this case we modify decodecorpus
so that it allows generating only frame headers.
args[3] = "-p" + std::string(output_path); | ||
|
||
XLS_ASSIGN_OR_RETURN(auto result, CallDecodecorpus(args)); | ||
auto raw_data = ReadFileAsRawData(output_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason we need to use a real file, rather than handling in memory buffer returned by libstd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not possible in original decodecorpus
and we wanted to provide the minimal changes required for generating only frame headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a "big change" to patch decodecorpus
further so that it exposes its intermediate functions (IIRC they are currently static
)?
We could remove the main from the XLS build and directly call them and consume their outputs (rather than dealing with processes and files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was discussed with you on one of the meetings. We settled on leaving decodecorpus
as it is.
|
||
TEST_P(ZSTDFrameHeaderSeededTest, ParseMultipleFrameHeaders) { | ||
auto seed = GetParam(); | ||
auto frame_header = zstd::GenerateFrameHeader(seed, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this redundant to what we're doing in GenerateRandomFrameHeader
or does it have more coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenerateFrameHeader()
calls decodecorpus in order to generate frame header which is valid in case of zstd frame header specification. The contents and length of those frame headers are randomized but the header is always valid - it should always be correctly parsed both by libzstd and by our decoder (frame header parser).
GenerateRandomFrameHeader()
generates completely random vector of bytes of randomly picked size from range 0 to arbitrary max size of 32 bytes. With this generator we are able to test more 'negative' test cases, meaning:
- Not enough data in buffer for finishing parsing
- Corrupted header - e.g.
reserved
bit inFrame_Header_Descriptor
is set - Handling of buffers larger than bare minimum for parsing frame header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, thanks for the explanation!
Would it makes sense to add more docstring to the tests and helpers so that it's more obvious to future reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more comments
true, true); | ||
} | ||
|
||
std::vector<uint8_t> GenerateRandomFrameHeader() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if #476 would help doing this kind of things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of testing approach would be great if we didn't have decodecorpus
which always generate valid zstd frames/frame headers. We could say that decodecorpus has embedded in itself the properties that describe valid frame header.
Without it we would have to manually specify those properties in form of multiple RC_ASSERT()
statements in rapidcheck test cases in order to generate valid test data. I think that when it comes to 'positive' tests (comparing against valid frame headers) it is better to use decodecorpus.
However, I'm not sure how would it look like for 'negative' tests (generating potentially invalid frame header data and comparing parsing through libzstd against simulation of our decoder).
Frame Header parsing should fail basically only in 2 situations:
- there was not enough data to finish the parsing
- reserved bit in frame header descriptor was set
Currently we generate random test vectors in order to check if decoder fails when those situations occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something like https://github.com/google/fuzztest could be useful; I'd be curious to know what @ericastor as a migrated a lot of "random generation tests" to fuzztest in 1cfc4cc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that using GoogleFuzzTest here would probably result in better coverage - though it might end up discovering some valid headers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, I've replaced generating random frame headers with FuzzTest
. It is true that it can discover valid headers. We can work on constraining the tested domain further to precisely test only invalid frame headers but I believe it is better to do a general check with valid and invalid headers.
600d73f
to
95685e8
Compare
95685e8
to
d7e203c
Compare
77a2ce2
to
c114b98
Compare
Fixed errors in fuzz tests caused by differences in priority of reporting errors between |
c114b98
to
ce6194d
Compare
ce6194d
to
1236ea3
Compare
I've noticed a failure in one of the CI runs caused by Here is the failure log:
I've already investigated the issue and managed to fix it. It was caused by not discarding ZSTD frames with EDIT: Updated the PR with fixes |
1236ea3
to
5c44314
Compare
fb2294d
to
070089f
Compare
070089f
to
bdacd4d
Compare
This commit adds a DSLX Buffer library that provides the Buffer struct, and helper functions that can be used to operate on it. The Buffer is meant to be a storage for data coming from the channel. It acts like a FIFO, allowing data of any length to be put in or popped out of it. Provided DSLX tests verify the correct behaviour of the library. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]>
This commit adds a simple test that shows, how one can use the Buffer struct inside a Proc. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]>
Signed-off-by: Pawel Czarnecki <[email protected]>
This commit adds the library with functions for parsing a magic number and tests that verify its correctness. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]>
This commit adds the library with functions for parsing a frame header. The provided tests verify the correcness of the library. Internal-tag: [#49967] Co-authored-by: Roman Dobrodii <[email protected]> Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
Internal-tag: [#53329] Signed-off-by: Pawel Czarnecki <[email protected]>
Required for expected_status inference in C++ tests for ZSTD decoder components Internal-tag: [#53465] Signed-off-by: Pawel Czarnecki <[email protected]>
Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]>
This commit adds a binary that calls decoding to generate data and loads it into a vector of bytes. Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]>
Internal-tag: [#50967] Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
bdacd4d
to
e25cbe1
Compare
google#1168 modules/zstd: Add library for parsing magic number This commit adds the library with functions for parsing a magic number and tests that verify its correctness. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add library for parsing frame header This commit adds the library with functions for parsing a frame header. The provided tests verify the correcness of the library. Internal-tag: [#49967] Co-authored-by: Roman Dobrodii <[email protected]> Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]> modules/zstd/frame_header: Add benchmarking rules Internal-tag: [#53329] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support/libzstd: Make zstd_errors.h public Required for expected_status inference in C++ tests for ZSTD decoder components Internal-tag: [#53465] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support: Add decodecorpus binary Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add data generator library This commit adds a binary that calls decoding to generate data and loads it into a vector of bytes. Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add zstd frame header tests Internal-tag: [#50967] Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
google#1168 modules/zstd: Add library for parsing magic number This commit adds the library with functions for parsing a magic number and tests that verify its correctness. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add library for parsing frame header This commit adds the library with functions for parsing a frame header. The provided tests verify the correcness of the library. Internal-tag: [#49967] Co-authored-by: Roman Dobrodii <[email protected]> Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]> modules/zstd/frame_header: Add benchmarking rules Internal-tag: [#53329] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support/libzstd: Make zstd_errors.h public Required for expected_status inference in C++ tests for ZSTD decoder components Internal-tag: [#53465] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support: Add decodecorpus binary Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add data generator library This commit adds a binary that calls decoding to generate data and loads it into a vector of bytes. Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add zstd frame header tests Internal-tag: [#50967] Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
should we close this and focus on reviewing #1315 ? |
This PR adds Zstd frame header parsing machinery written in DSLX + tests for it.
NOTE: this is based on #1167 , please ignore commits from that branch when reviewing.