Crashtracking for Windows #892
Benchmarks
Comparison
Benchmark execution time: 2025-02-27 14:15:50
Comparing candidate commit 0beb9a1 in PR branch
Found 4 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 2 unstable metrics.
scenario:credit_card/is_card_number/x371413321323331
scenario:credit_card/is_card_number_no_luhn/x371413321323331
Candidate
Candidate benchmark details: Group 1 to Group 13
Baseline
Omitted due to size.
aa6bc16 to 14136df
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #892      +/-   ##
==========================================
+ Coverage   71.77%   71.79%   +0.02%
==========================================
  Files         328      328
  Lines       48577    48702     +125
==========================================
+ Hits        34866    34967     +101
- Misses      13711    13735      +24
use std::ptr::{addr_of, read_unaligned};
use windows::core::{w, HRESULT, PCWSTR};
use windows::Win32::Foundation::{BOOL, ERROR_SUCCESS, E_FAIL, HANDLE, HMODULE, S_OK, TRUE};
#[cfg(target_arch = "x86_64")]
Do we need to consider ARM?
Maybe in the future, but for now Windows Server on ARM is not a thing (only the desktop version of Windows runs on ARM).
let mut path = env::temp_dir().join(process_name);
path.set_extension("dll");

// Attempt to move it just in case it already exists
How would this happen?
Should we log something here? Is this unexpected?
I reused the logic from the trampoline (since the need is the same): https://github.com/DataDog/libdatadog/blob/main/spawn_worker/src/win32.rs#L48
The filename is made of the user SID and the version number, so if multiple instances of PHP are running they will share the same file. I think this is a good thing for crashtracking because we need to add the path to the registry, and I'm afraid we would add a lot of garbage if the path was random.
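To make the "stable path, move the old file aside" idea concrete, here is a minimal sketch. It is not the code from this PR or from `spawn_worker`: the function name `write_stable_dll`, the `old-<pid>` suffix, and the best-effort error handling are all assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: writes `contents` to the stable path `<dir>/<name>.dll`.
///
/// Assumption: if a file is already there (for example, written by another
/// instance), it is renamed aside first, because a file that is in use can
/// often be renamed even when it cannot be overwritten.
pub fn write_stable_dll(dir: &Path, name: &str, contents: &[u8]) -> io::Result<PathBuf> {
    let mut path = dir.join(name);
    path.set_extension("dll");

    if path.exists() {
        // Hypothetical naming scheme for the moved-aside copy.
        let mut old = path.clone();
        old.set_extension(format!("old-{}", std::process::id()));
        // Best effort: a failed rename is ignored and the write below decides.
        let _ = fs::rename(&path, &old);
    }

    fs::write(&path, contents)?;
    Ok(path)
}
```

Because the path depends only on `dir` and `name`, repeated calls return the same path, which is the property that keeps the registry entry stable.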
.file("src/crashtracking_trampoline.cpp") // Path to your C++ file
.warnings(true)
.warnings_into_errors(true)
.flag("/std:c++17") // Set the C++ standard (adjust as needed)
Does having a C++ binary increase the size of libdatadog vs C?
Probably, but the size is still reasonable. The only reason I used C++ is that it has regex support in the stdlib. The DLL size is 160 KB, which I believe is acceptable (it was ~60 KB in C with manual parsing).
if (!EnumProcessModules(process, nullptr, 0, &cbNeeded))
{
    OutputDebugStringW(L"Failed to enumerate process modules (1st)");
What does 1st vs 2nd mean?
We call EnumProcessModules twice (first to get the number of modules, then to populate them). It's simply to know if we failed in the first or the second call.
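The two-call pattern can be sketched in portable Rust as follows. This is an illustration only: `enumerate_two_pass` and its closure parameter are hypothetical stand-ins for the real Win32 call, and the stand-in reports an item count, whereas `EnumProcessModules` reports a size in bytes.

```rust
/// Hypothetical wrapper around a "query size, then fill" API.
///
/// `api` stands in for a call such as `EnumProcessModules`: it fills as much
/// of the buffer as fits, stores the total number of items in the second
/// argument, and returns `false` on failure.
pub fn enumerate_two_pass<T: Clone + Default>(
    mut api: impl FnMut(&mut [T], &mut usize) -> bool,
) -> Result<Vec<T>, &'static str> {
    let mut items: Vec<T> = Vec::new();
    let mut needed = 0usize;

    // 1st call: empty buffer, only to learn how many items exist.
    if !api(&mut items[..], &mut needed) {
        return Err("failed to enumerate (1st call)");
    }

    // 2nd call: buffer sized from the first answer, now actually filled.
    items.resize(needed, T::default());
    if !api(&mut items[..], &mut needed) {
        return Err("failed to enumerate (2nd call)");
    }

    // The set may have shrunk between the two calls.
    items.truncate(needed.min(items.len()));
    Ok(items)
}
```

Keeping a distinct message per call is what makes it possible to tell from a debug log which step failed.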
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
    write!(
        f,
        "{:08x}{:04x}{:04x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}",
Is this documented somewhere?
This is standard GUID formatting on Windows, but without the dashes: https://devblogs.microsoft.com/oldnewthing/20220928-00/?p=107221
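As a self-contained illustration of that format string, the sketch below uses a local `Guid` struct that mirrors the field layout of the Windows `GUID` type (one `u32`, two `u16`, eight bytes). The struct and its field names are assumptions for the example, not the type used in the PR.

```rust
use std::fmt;

/// Hypothetical local type mirroring the layout of the Windows `GUID` struct.
pub struct Guid {
    pub data1: u32,
    pub data2: u16,
    pub data3: u16,
    pub data4: [u8; 8],
}

impl fmt::Display for Guid {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Same widths as the usual 8-4-4-4-12 rendering, minus the dashes.
        write!(f, "{:08x}{:04x}{:04x}", self.data1, self.data2, self.data3)?;
        for byte in &self.data4 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

The zero-padded widths matter: without them, a field such as `data2 = 0x2` would shorten the string and the result would no longer be a fixed 32 hex digits.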
let debug_data_dir: IMAGE_DATA_DIRECTORY = if is_pe32 {
    let nt_headers32: IMAGE_NT_HEADERS32 = read_memory(process_handle, nt_headers_address)?;
    nt_headers32.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG.0 as usize]
} else {
    let nt_headers64: IMAGE_NT_HEADERS64 = read_memory(process_handle, nt_headers_address)?;
    nt_headers64.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG.0 as usize]
};
Is there documentation for why this is the case?
I don't think there is good official documentation about the PE format. It's mostly the definitions in the official headers (https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_nt_headers64) and then a bunch of third-party articles: https://learn.microsoft.com/en-us/archive/msdn-magazine/2002/february/inside-windows-win32-portable-executable-file-format-in-detail https://wiki.osdev.org/PE
For the Rust implementation, I simply converted the C++ code we wrote for crashtracking in .NET: https://github.com/DataDog/dd-trace-dotnet/blob/master/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Windows/CrashReportingWindows.cpp#L271
which has proper testing: https://github.com/DataDog/dd-trace-dotnet/blob/master/profiler/test/Datadog.Profiler.Native.Tests/CrashReportingTest.cpp
We probably want to add similar testing in the libdatadog repository.
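For context on why the 32-bit and 64-bit branches differ, here is a small sketch that locates the debug data directory in a PE image held in a byte slice. It is not the PR's code (which reads another process's memory through `read_memory`); `debug_directory`, `DataDirectory`, and the helper names are hypothetical. The offsets come from the PE layout: the optional header follows the 4-byte signature and the 20-byte file header, and its data directories start at offset 96 for PE32 and 112 for PE32+.

```rust
const PE32_MAGIC: u16 = 0x10b; // 32-bit optional header
const PE32_PLUS_MAGIC: u16 = 0x20b; // 64-bit optional header
const IMAGE_DIRECTORY_ENTRY_DEBUG: usize = 6;

/// Hypothetical counterpart of `IMAGE_DATA_DIRECTORY`.
#[derive(Debug, PartialEq)]
pub struct DataDirectory {
    pub virtual_address: u32,
    pub size: u32,
}

fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    let b = buf.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the debug data directory, or `None` if the image is malformed.
pub fn debug_directory(image: &[u8]) -> Option<DataDirectory> {
    // DOS header: "MZ" magic, then `e_lfanew` at 0x3c points to the NT headers.
    if image.get(0..2)? != b"MZ" {
        return None;
    }
    let nt = read_u32(image, 0x3c)? as usize;
    if image.get(nt..nt.checked_add(4)?)? != b"PE\0\0" {
        return None;
    }
    // Skip the 4-byte signature and the 20-byte file header.
    let opt = nt.checked_add(24)?;
    // The magic decides which optional header layout applies.
    let dirs = match read_u16(image, opt)? {
        PE32_MAGIC => opt.checked_add(96)?,
        PE32_PLUS_MAGIC => opt.checked_add(112)?,
        _ => return None,
    };
    // Each directory entry is 8 bytes: virtual address, then size.
    let entry = dirs.checked_add(IMAGE_DIRECTORY_ENTRY_DEBUG * 8)?;
    Some(DataDirectory {
        virtual_address: read_u32(image, entry)?,
        size: read_u32(image, entry.checked_add(4)?)?,
    })
}
```

A function of this shape, operating on plain bytes, is also easy to cover with synthetic images in unit tests, which may help with the testing gap mentioned above.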
if thread_entry.th32OwnerProcessID == pid {
    thread_ids.push(thread_entry.th32ThreadID);
}
We loop over every thread on the machine? Could that be expensive?
Yes and yes. But as crazy as it sounds, this is the normal way.
In "recent" versions of Windows (since 2012 R2) there is an alternative that doesn't require enumerating all threads (using something called "process snapshotting", which can be thought of as Windows' sane version of vfork). However, we would need to confirm that it works correctly in the context of WER, so that requires additional research. Our implementation of crashtracking in .NET uses CreateToolhelp32Snapshot, so I'd rather rely on this battle-tested solution for now.
What does this PR do?
A brief description of the change being made with this pull request.
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.