ekump/APMSP-2151 create ddsketch ffi crate #1135
Benchmarks
Comparison
Benchmark execution time: 2025-07-09 23:49:39
Comparing candidate commit 26d59f7 in PR branch.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 2 unstable metrics.
Candidate
Candidate benchmark details: Group 1 through Group 13
Baseline
Omitted due to size.
Artifact Size Benchmark Report
Targets: aarch64-alpine-linux-musl, aarch64-unknown-linux-gnu, libdatadog-x64-windows, libdatadog-x86-windows, x86_64-alpine-linux-musl, x86_64-unknown-linux-gnu
c10edbf to c7e835c
c7e835c to d10aeef
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1135    +/-   ##
========================================
  Coverage   71.27%   71.27%
========================================
  Files         343      346      +3
  Lines       52396    52611    +215
========================================
+ Hits        37347    37501    +154
- Misses      15049    15110     +61
Nice to see ddsketch joining the family! :D
@@ -79,6 +79,7 @@ COPY "ddtelemetry-ffi/Cargo.toml" "ddtelemetry-ffi/"
 COPY "datadog-log/Cargo.toml" "datadog-log/"
 COPY "datadog-log-ffi/Cargo.toml" "datadog-log-ffi/"
 COPY "ddsketch/Cargo.toml" "ddsketch/"
+COPY "ddsketch-ffi/Cargo.toml" "ddsketch-ffi/"

Minor: I see a `cargo run --bin release ...` below that doesn't list ddsketch. Should it?
-cargo run --bin release --features profiling,telemetry,data-pipeline,symbolizer,crashtracker,library-config,log --release -- --out
+cargo run --bin release --features profiling,telemetry,data-pipeline,symbolizer,crashtracker,library-config,log,ddsketch --release -- --out

Minor: I'm a bit surprised there's no single place we can update with this... Should we maybe replace it with `build-profiling-ffi` or something similar?
[export]
include = ["ddsketch-ffi"]
prefix = "ddog_"

Should this be `ddog_ddsketch` by default?
-prefix = "ddog_"
+prefix = "ddog_ddsketch"

Or possibly just `ddsketch`. Like if we were greenfielding this thing I'd say `ddog_sketch`, but `ddsketch` has been used for years as a name and I believe that includes papers and such.
/// Structure that contains error information that DDSketch FFI API can return.
#[repr(C)]
#[derive(Debug)]
pub struct DDSketchError {
    pub code: DDSketchErrorCode,
    pub msg: CString,
}

impl DDSketchError {
    pub fn new(code: DDSketchErrorCode, msg: &str) -> Self {
        Self {
            code,
            msg: CString::new_or_empty(msg),
        }
    }
}

I'm curious about the need for this -- are `ddcommon_ffi::Result` and `ddcommon_ffi::Error` not usable for the same purpose?
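
To make the question concrete, a crate-agnostic result type might look roughly like the sketch below. `FfiError`, `FfiResult`, and `parse_count` are hypothetical names written for this illustration; they are not the actual `ddcommon_ffi` definitions, which should be checked before assuming this shape.

```rust
use std::ffi::CString;
use std::fmt::Display;

/// Hypothetical generic error: a message only, with no crate-specific code enum.
#[repr(C)]
#[derive(Debug)]
pub struct FfiError {
    pub message: CString,
}

/// Hypothetical generic result that any FFI crate could return.
#[repr(C)]
#[derive(Debug)]
pub enum FfiResult<T> {
    Ok(T),
    Err(FfiError),
}

impl<T, E: Display> From<Result<T, E>> for FfiResult<T> {
    fn from(result: Result<T, E>) -> Self {
        match result {
            Ok(value) => FfiResult::Ok(value),
            Err(err) => FfiResult::Err(FfiError {
                // An interior NUL in the message falls back to an empty string.
                message: CString::new(err.to_string()).unwrap_or_default(),
            }),
        }
    }
}

/// Hypothetical fallible operation that reports failures through the shared type.
pub fn parse_count(input: &str) -> FfiResult<u64> {
    input.trim().parse::<u64>().into()
}
```

With a shape like this, a sketch-specific error struct is only needed if callers must branch on an error code rather than read a message.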
pub unsafe extern "C" fn ddog_ddsketch_error_free(error: Option<Box<DDSketchError>>) {
    drop(error)
}

I'm hoping we may be able to move away from the ddsketch-specific error code; but if not, I think this should be `_drop`, not `_free`, for consistency with other APIs?
/// A bin from a DDSketch containing a value and its weight.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct DDSketchBin {
    pub value: f64,
    pub weight: f64,
}

/// Returns the ordered bins from the DDSketch.
///
/// # Safety
///
/// The `sketch` parameter must be a valid pointer to a DDSketch instance.
/// The returned bins must be freed with `ddog_ddsketch_bins_drop`.
/// Returns empty bins if sketch is null.
#[no_mangle]
pub unsafe extern "C" fn ddog_ddsketch_ordered_bins(
    sketch: Option<&DDSketch>,
) -> ffi::Vec<DDSketchBin> {
    let sketch = match sketch {
        Some(s) => s,
        None => return ffi::Vec::new(),
    };

    let bins = sketch.ordered_bins();
    let result: Vec<DDSketchBin> = bins
        .into_iter()
        .map(|(value, weight)| DDSketchBin { value, weight })
        .collect();

    ffi::Vec::from(result)
}

/// Drops a DDSketchBins instance.
///
/// # Safety
///
/// Only pass a valid DDSketchBins instance.
#[no_mangle]
pub unsafe extern "C" fn ddog_ddsketch_bins_drop(bins: ffi::Vec<DDSketchBin>) {
    drop(bins);
}

I'm curious -- do we need to expose this part of the API? Is it mostly for tests? E.g. My understanding of ddsketch is that it was mostly a write-only affair -- drop points in, and at the end, get the "gist" of it and report it back.
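
As a rough illustration of the "drop points in, read the gist out" model, here is a toy log-bucketed sketch. It is not libdatadog's implementation: `ToySketch` and its methods are hypothetical, it handles positive values only, and it assumes the usual DDSketch bucketing with gamma = (1 + alpha) / (1 - alpha).

```rust
use std::collections::BTreeMap;

/// Toy log-bucketed sketch for positive values (illustration only).
pub struct ToySketch {
    gamma: f64,
    /// Bucket index -> accumulated weight, kept in ascending index order.
    bins: BTreeMap<i32, f64>,
}

impl ToySketch {
    /// `alpha` is the target relative accuracy, e.g. 0.01 for 1%.
    pub fn new(alpha: f64) -> Self {
        Self {
            gamma: (1.0 + alpha) / (1.0 - alpha),
            bins: BTreeMap::new(),
        }
    }

    /// Write path: record one value. Non-positive and NaN values are ignored here.
    pub fn add(&mut self, value: f64) {
        if !(value > 0.0) {
            return;
        }
        // Bucket i covers the interval (gamma^(i-1), gamma^i].
        let index = (value.ln() / self.gamma.ln()).ceil() as i32;
        *self.bins.entry(index).or_insert(0.0) += 1.0;
    }

    /// Read path: (representative value, weight) pairs in ascending order.
    pub fn ordered_bins(&self) -> Vec<(f64, f64)> {
        self.bins
            .iter()
            .map(|(&i, &w)| (2.0 * self.gamma.powi(i) / (self.gamma + 1.0), w))
            .collect()
    }
}
```

In this model the write path is the hot one, and the read path is only needed by whoever serializes or inspects the result, which is the distinction the question above is probing.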
fn test_ddsketch_bins_manual() {
    let bins_vec = vec![
        DDSketchBin {
            value: 1.0,
            weight: 1.0,
        },
        DDSketchBin {
            value: 2.0,
            weight: 1.0,
        },
    ];

    let bins = ffi::Vec::from(bins_vec);
    assert_eq!(bins.len(), 2);
    assert!(!bins.is_empty());

    // Test that we can access the data through the slice
    let slice = bins.as_slice();
    assert_eq!(slice[0].value, 1.0);
    assert_eq!(slice[0].weight, 1.0);
    assert_eq!(slice[1].value, 2.0);
    assert_eq!(slice[1].weight, 1.0);

    unsafe {
        ddog_ddsketch_bins_drop(bins);
    }
}

I don't quite understand this test -- what are we testing here again?
#[test]
fn test_error_with_null_bytes() {
    let code = DDSketchErrorCode::InvalidInput;
    let error = Box::new(DDSketchError::new(code, "Error with\0null bytes"));

    assert_eq!(error.code, DDSketchErrorCode::InvalidInput);
    let msg = error.msg.as_cstr().into_std().to_str().unwrap();
    assert_eq!(msg, ""); // Should fall back to empty string

    unsafe { ddog_ddsketch_error_free(Some(error)) };
}

Re: my comment in ddcommon for the null bytes in the middle of the string, I don't quite understand how this would happen in normal code? 👀
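
For context on what the test exercises: the standard library's `CString::new` rejects any input with an interior NUL byte, so a fallback is needed whenever the message is not a literal (for example, if it were built from caller-supplied data, which is an assumption here). The sketch below uses only `std::ffi::CString`; `cstring_or_empty` and `cstring_lossy` are hypothetical helpers, not the PR's `new_or_empty`.

```rust
use std::ffi::CString;

/// Hypothetical helper: fall back to an empty string when `msg`
/// contains an interior NUL byte.
pub fn cstring_or_empty(msg: &str) -> CString {
    // CString::new returns Err(NulError) if any byte of `msg` is 0.
    CString::new(msg).unwrap_or_default()
}

/// Hypothetical alternative: keep the rest of the message by dropping NUL bytes.
pub fn cstring_lossy(msg: &str) -> CString {
    let bytes: Vec<u8> = msg.bytes().filter(|b| *b != 0).collect();
    // No NUL bytes remain, so this cannot fail; the default is never used.
    CString::new(bytes).unwrap_or_default()
}
```

The empty-string fallback avoids a panic but discards the whole message, while the lossy variant preserves most of it; which trade-off is preferable depends on how callers use the text.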
// Clean up the sketch (note: sketch is consumed by ddog_ddsketch_encode)
// ddog_ddsketch_drop is not called here because the sketch was consumed

This "the pointer gets consumed" design is a bit error-prone to the caller, which has just been left with a dangling pointer.
One thing we've done in a lot of the profiling apis is to require that the pointer location gets passed in, and set it to NULL when we consume the pointer, so that it doesn't get accidentally reused.
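
A minimal sketch of that pattern follows, assuming a plain heap-allocated object behind a raw pointer. `Sketch` and the `example_sketch_*` functions are hypothetical and are not the crate's API; a real export would also carry the `no_mangle` attribute.

```rust
use std::ptr;

/// Hypothetical stand-in for the sketch type; not the real DDSketch.
pub struct Sketch {
    count: u64,
}

/// Allocates a sketch and hands ownership to the caller.
pub extern "C" fn example_sketch_new() -> *mut Sketch {
    Box::into_raw(Box::new(Sketch { count: 0 }))
}

/// Records one point. Does nothing if `sketch` is null.
pub unsafe extern "C" fn example_sketch_add(sketch: *mut Sketch) {
    if let Some(s) = unsafe { sketch.as_mut() } {
        s.count += 1;
    }
}

/// Consumes the sketch stored at `*slot` and returns its point count.
/// The caller's pointer is set to NULL, so an accidental second call
/// returns 0 instead of touching freed memory.
pub unsafe extern "C" fn example_sketch_finish(slot: *mut *mut Sketch) -> u64 {
    if slot.is_null() {
        return 0;
    }
    // Take the pointer out of the caller's variable and leave NULL behind.
    let raw = unsafe { ptr::replace(slot, ptr::null_mut()) };
    if raw.is_null() {
        return 0;
    }
    // Reclaim ownership; the Box is freed when this function returns.
    let sketch = unsafe { Box::from_raw(raw) };
    sketch.count
}
```

On the C side the caller would pass `&sketch_ptr`, and any later use of `sketch_ptr` would see NULL rather than a dangling address.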
pub struct DDSketchError {
    pub code: DDSketchErrorCode,
    pub msg: CString,
}

If we do keep `DDSketchError` (see my note below), I believe it should be renamed `ddog_ddsketch_DDSketchError` / `ddog_ddsketch_DDSketchErrorCode`; right now it's missing the prefixes in the `.h` files 👀
What does this PR do?
Initial creation of the ddsketch-ffi crate. Exposes public API of ddsketch for FFI.
Motivation
DSM would like to use the libdatadog implementation of ddsketch for dd-trace-rb.
Additional Notes
Also added a helper method for `CString` to `unwrap_or_default` to an empty string (which also requires an unwrap, even though it's safe) to reduce the need to add clippy allow annotations all over the codebase for something that shouldn't ever panic.
How to test the change?
Describe here in detail how the change can be validated.