Use file content heuristics to decide file reader. #1962
…sed on the magic number.
…ics detection method.
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## dev #1962 +/- ##
==========================================
+ Coverage 83.41% 83.43% +0.01%
==========================================
Files 311 311
Lines 55002 55197 +195
Branches 12098 12145 +47
==========================================
+ Hits 45878 46051 +173
- Misses 7889 7894 +5
- Partials 1235 1252 +17
Tests/Pcap++Test/Tests/FileTests.cpp
PTF_ASSERT_NOT_NULL(dynamic_cast<pcpp::PcapNgFileReaderDevice*>(genericReader));
PTF_ASSERT_TRUE(genericReader->open());
// ------- IFileReaderDevice::createReader() Factory
// TODO: Move to a separate unit test.
We should add the following to get more coverage:
- Open a snoop file
- Open a file that is not any of the options
- Open pcap files with different magic numbers
- Assuming we add a version check for snoop and pcap files: create temp files with bogus data that has the magic number but wrong versions
3d713ab adds the following tests:
- Pcap, PcapNG, Zst file with correct content + extension
- Pcap, PcapNG file with correct content + wrong extension
- Bogus content file with correct extension (pcap, pcapng, zst)
- Bogus content file with wrong extension (txt)
Haven't found a snoop file to add. Do we have any?
> Open pcap files with different magic numbers
Do you mean Pcap content that has just its magic number changed? Because IMO it is reasonable to consider that invalid format and fail as regular bogus data.
> Assuming we add a version check for snoop and pcap files: create temp files with bogus data that has the magic number but wrong versions
Pending on #1962 (comment).
Move it out if it needs to be reused somewhere.
Libpcap has supported reading this format since 0.9.1. The heuristic detection identifies this magic number as pcap and leaves the final support decision to the pcap backend infrastructure.
@Dimi1010 some CI tests fail...
Tests/Pcap++Test/Tests/FileTests.cpp
}
};

PTF_TEST_CASE(TestIFileReaderDeviceFactory_Pcap_MicroPrecision)
In addition to testing a real pcap file, maybe we can add synthetic files that have a different magic number to test all options?
We don't have to put them in `PcapExample/file_heuristics`; instead we can create vectors with the content (`std::vector<uint8_t>`) and save them to temp files.
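For illustration, a minimal sketch of what such a synthetic file could look like, using only the standard library. The helper names `makeSpoofedNanoPcapHeader` and `writeBytesToFile` are hypothetical and not part of PcapPlusPlus. The bytes form just a 24-byte pcap global header with the nanosecond-precision magic number, so the file looks like pcap at the start without containing any packets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>   // std::remove, for cleaning up the file after a test
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: a 24-byte pcap global header, written big-endian,
// with the nanosecond-precision magic number 0xa1b23c4d.
std::vector<uint8_t> makeSpoofedNanoPcapHeader()
{
    std::vector<uint8_t> header = {
        0xa1, 0xb2, 0x3c, 0x4d,  // magic number (nanosecond precision)
        0x00, 0x02, 0x00, 0x04,  // version 2.4
        0x00, 0x00, 0x00, 0x00,  // thiszone
        0x00, 0x00, 0x00, 0x00,  // sigfigs
        0x00, 0x00, 0xff, 0xff,  // snaplen = 65535
        0x00, 0x00, 0x00, 0x01   // link type 1 (Ethernet)
    };
    assert(header.size() == 24);
    return header;
}

// Hypothetical helper: writes the bytes to `path`, throwing if that fails.
void writeBytesToFile(const std::string& path, const std::vector<uint8_t>& content)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot create " + path);
    out.write(reinterpret_cast<const char*>(content.data()),
              static_cast<std::streamsize>(content.size()));
    out.flush();
    if (!out)
        throw std::runtime_error("cannot write " + path);
}
```

A test could call `writeBytesToFile("spoofed_nano.pcap", makeSpoofedNanoPcapHeader())`, pass that path to the factory, and delete the file with `std::remove` afterwards. Swapping the first four bytes would cover the other magic numbers.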
Hmm, what would be the purpose? Just to test that it returns `nullptr`?
Doesn't `TestIFileReaderDeviceFactory_Invalid` already handle that?
I mean the other possible magic numbers of a valid pcap file. Since it's not easy to find such pcap files, we can generate synthetic files that are not actually valid, but will look valid for the sake of the test
So, you want a spoofed pcap sample for just these:
// Libpcap 0.9.1 and later support reading a modified pcap format that contains an extended header.
// Format reference: https://wiki.wireshark.org/Development/LibpcapFileFormat#modified-pcap
0xa1'b2'cd'34, // Alexey Kuznetzov's modified libpcap format
0x34'cd'b2'a1 // Alexey Kuznetzov's modified libpcap format (byte-swapped)
or for the byte swapped versions of micro and nano too?
I'd suggest we have spoofed pcap samples for all options
Honestly, if we really want to do a unit test on every magic number, this would be easier to do by exposing `CaptureFileFormatDetector` in the header under `internal` and unit testing on the passed `std::istream` content directly than to have a spoofed pcap file.
Is it so hard to create those spoofed pcap files just for the tests? 🤔
Tbf, no. I can have them done.
My idea is that the scenario would essentially test the content detection system and not the factory function creating the devices, because the devices would be "invalid". If the tests go through the factory function, they can't go on to open the device, etc.
Having it done directly on the detection system would remove the requirement for external files, as that operates on streams.
Of course, that comes with the tradeoff of having the detection system exposed in the headers, as it needs to be referenceable.
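To make the stream-based option concrete, here is a rough sketch of how every pcap magic number could be checked against an in-memory stream. Everything named here (`PcapVariant`, `detectPcapVariant`, `detectFromMagic`, `checkAllPcapMagicNumbers`) is a hypothetical stand-in and not the actual `CaptureFileFormatDetector` interface, which may look quite different.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type; the enum used in the PR may look different.
enum class PcapVariant { NotPcap, MicroPrecision, NanoPrecision, Modified };

// Classifies a stream by its first four bytes, accepting both byte orders.
PcapVariant detectPcapVariant(std::istream& content)
{
    char raw[4] = {};
    content.read(raw, sizeof(raw));
    if (content.gcount() != 4)
        return PcapVariant::NotPcap;

    // Assemble the bytes in file order, most significant byte first.
    uint32_t magic = 0;
    for (char byte : raw)
        magic = (magic << 8) | static_cast<uint8_t>(byte);

    switch (magic)
    {
    case 0xa1b2c3d4: case 0xd4c3b2a1: return PcapVariant::MicroPrecision;
    case 0xa1b23c4d: case 0x4d3cb2a1: return PcapVariant::NanoPrecision;
    case 0xa1b2cd34: case 0x34cdb2a1: return PcapVariant::Modified;
    default: return PcapVariant::NotPcap;
    }
}

// Builds an in-memory "file" from four bytes, so no file on disk is needed.
PcapVariant detectFromMagic(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    const char bytes[4] = { static_cast<char>(b0), static_cast<char>(b1),
                            static_cast<char>(b2), static_cast<char>(b3) };
    const std::string data(bytes, sizeof(bytes));
    std::istringstream content(data);
    return detectPcapVariant(content);
}

// One check per magic number, including the byte-swapped forms.
void checkAllPcapMagicNumbers()
{
    assert(detectFromMagic(0xa1, 0xb2, 0xc3, 0xd4) == PcapVariant::MicroPrecision);
    assert(detectFromMagic(0xd4, 0xc3, 0xb2, 0xa1) == PcapVariant::MicroPrecision);
    assert(detectFromMagic(0xa1, 0xb2, 0x3c, 0x4d) == PcapVariant::NanoPrecision);
    assert(detectFromMagic(0x4d, 0x3c, 0xb2, 0xa1) == PcapVariant::NanoPrecision);
    assert(detectFromMagic(0xa1, 0xb2, 0xcd, 0x34) == PcapVariant::Modified);
    assert(detectFromMagic(0x34, 0xcd, 0xb2, 0xa1) == PcapVariant::Modified);
    assert(detectFromMagic(0x00, 0x01, 0x02, 0x03) == PcapVariant::NotPcap);
}
```

The tradeoff mentioned above still applies: a test like this only works if the detection entry point is reachable from the test target.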
Updated pcap file detection to return the precise format of Pcap instead of just `true` / `false`. Updated detect format to always return the detected format. Previous responsibility for unsupported zstd archive files has been passed up the call stack to the factory function `createReader`.
…ptimizations and branch pruning.
…on 1 line and doxygen errors when its in 2 lines.
PTF_TEST_CASE(TestReaderFactory_Snoop)
{
constexpr const char* SNOOP_FILE_PATH = EXAMPLE_SOLARIS_SNOOP;
nit: this variable is not needed, we can just use EXAMPLE_SOLARIS_SNOOP
I would prefer to keep it, to have a level of detachment from the global macro in case it needs to be changed later.
Let's add `createReader()` to the tests?
Added tests for the scenarios that throw. The success branches should be covered by `tryCreate` 🤔?
I guess we can add one or two success cases, but not necessary
The PR adds heuristics based on the file content, which are more robust than deciding based on the file extension.
The new decision model scans the start of the file for its magic number signature. It then compares it to the signatures of supported file types [1] and constructs a reader instance based on the result.
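As a rough illustration of that decision model (not the code in this PR), the sketch below reads the first bytes of a stream and compares them against a table of well-known signatures. The names `CaptureFormat`, `Signature` and `detectCaptureFormat` are hypothetical. The signature bytes are the publicly documented ones for pcap, pcapng, snoop and Zstandard frames.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical format enum; the PR's own type may differ.
enum class CaptureFormat { Unknown, Pcap, PcapNg, Snoop, ZstArchive };

// A signature is a byte prefix as it appears at the start of the file.
struct Signature
{
    CaptureFormat format;
    const char* bytes;
    std::size_t length;
};

// Returns the first format whose signature matches the start of the stream.
// The stream is consumed; a real implementation would likely rewind it.
CaptureFormat detectCaptureFormat(std::istream& content)
{
    static const Signature signatures[] = {
        { CaptureFormat::Pcap, "\xa1\xb2\xc3\xd4", 4 },    // microsecond precision
        { CaptureFormat::Pcap, "\xd4\xc3\xb2\xa1", 4 },    // microsecond, byte-swapped
        { CaptureFormat::Pcap, "\xa1\xb2\x3c\x4d", 4 },    // nanosecond precision
        { CaptureFormat::Pcap, "\x4d\x3c\xb2\xa1", 4 },    // nanosecond, byte-swapped
        { CaptureFormat::Pcap, "\xa1\xb2\xcd\x34", 4 },    // modified pcap
        { CaptureFormat::Pcap, "\x34\xcd\xb2\xa1", 4 },    // modified pcap, byte-swapped
        { CaptureFormat::PcapNg, "\x0a\x0d\x0d\x0a", 4 },  // section header block type
        { CaptureFormat::Snoop, "snoop\0\0\0", 8 },        // 8-byte identification pattern
        { CaptureFormat::ZstArchive, "\x28\xb5\x2f\xfd", 4 },  // Zstandard frame magic
    };

    char head[8] = {};
    content.read(head, sizeof(head));
    const std::size_t available = static_cast<std::size_t>(content.gcount());

    for (const Signature& signature : signatures)
    {
        assert(signature.length <= sizeof(head));
        if (available >= signature.length &&
            std::memcmp(head, signature.bytes, signature.length) == 0)
            return signature.format;
    }
    return CaptureFormat::Unknown;
}
```

A factory built on top of this would then map each `CaptureFormat` value to the matching reader type and treat `Unknown` as an error.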
New functions `createReader` and `tryCreateReader` have been added due to changes in the public API of the factory. The functions differ in their error handling scheme: `createReader` throws and `tryCreateReader` returns `nullptr` on error.
Method behaviour changes during erroneous scenarios:
getReader
createReader
tryCreateReader
nullptr
PcapFileDeviceReader
nullptr
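The difference between the throwing and the non-throwing variant can be sketched as follows. This is a self-contained illustration with a stand-in `FakeReader` type. The `std::unique_ptr` return type, the layering of one function on the other, and the "file cannot be opened" failure condition are all assumptions rather than the signatures from this PR.

```cpp
#include <cassert>
#include <fstream>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for a reader device; not a PcapPlusPlus class.
struct FakeReader
{
    std::string path;
};

// Non-throwing variant: reports failure by returning nullptr.
std::unique_ptr<FakeReader> tryCreateReader(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return nullptr;
    return std::unique_ptr<FakeReader>(new FakeReader{ path });
}

// Throwing variant: same logic, but failure becomes an exception.
std::unique_ptr<FakeReader> createReader(const std::string& path)
{
    std::unique_ptr<FakeReader> reader = tryCreateReader(path);
    if (!reader)
        throw std::runtime_error("could not create a reader for " + path);
    return reader;
}

// Usage: both variants reject the same missing file, each in its own way.
void checkErrorHandling(const std::string& missingPath)
{
    assert(tryCreateReader(missingPath) == nullptr);

    bool threw = false;
    try
    {
        createReader(missingPath);
    }
    catch (const std::runtime_error&)
    {
        threw = true;
    }
    assert(threw);
    (void)threw;  // avoids an unused-variable warning when asserts are disabled
}
```

Callers that treat a missing or unrecognised file as a normal outcome would pick the `nullptr` form, while callers that consider it exceptional would pick the throwing form.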