Skip to content

Conversation

@p-datadog
Copy link
Member

@p-datadog p-datadog commented Dec 5, 2025

What does this PR do?

Adds two C extension functions for dynamic instrumentation:

  1. all_iseqs: returns RubyVM::InstructionSequence objects for loaded code. Primarily useful/needed for instrumenting third-party code with line probes.
  2. exception_message: returns the argument passed to Exception constructor, which is normally the message string for built-in classes.

Motivation:

  1. To set line probes, DI requires an InstructionSequence object (hereafter "iseq") to target the trace point. These objects are generally not available for files that are already loaded, from Ruby code. The VM does have these objects but they are not exposed to Ruby side. The all_iseqs method accesses these objects and exposes them to Ruby code.

  2. One of DI constraints is we do not call customer code. We want to report exception messages to customers since these are generally very useful, however the messages could be overridden (by defining a +message+ method on exception classes) and DI is not allowed to call customer methods.

A workaround for this is to access the basic exception message that is stored in the exception attributes at C level. This is what is returned for built-in classes like NameError. The exception_message method returns this message, ignoring any override of +message+ in Ruby code.

Caveats:

a) exception_message is not required to be a string, or a message at all. This is simply what was passed to the base Exception constructor. Ruby does not enforce the type of this argument and arbitrary objects can be passed and will be returned by exception_message subsequently.

b) A custom exception class also may not take the message as a constructor parameter, instead returning it via the overridden +message+ method for example. These cases aren't handled by exception_message which may return "nonsense" - the first constructor argument. It's still safe to return arbitrary values because they go through DI serializer which is itself safe.

Presently DI does not report any exception messages at all. A follow-up PR willl be to incorporate exception_message into the rest of DI code at which point the messages from base classes will have correct messages reported, and exceptions from custom classes may or may not have correct messages reported depending on the class' implementation.

Change log entry
None

Additional Notes:

This is an innovation week project.

The C code is in libdatadog_api extension, however it does not use libdatadog. This was done to not create another extension.

Follow-up work:
/ Integrate the functionality added in this PR with the rest of DI: use exception_message and populate the iseq map with the loaded iseqs
/ Use trace_points field of the iseqs to detect line probes targeting lines that have no executable code (and also no :return trace point), produce an error in this case instead of installing a probe that will never produce events
/ Report ranges of executable code to SymDB

How to test the change?

Unit tests added

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

Thank you for updating Change log entry section 👏

Visited at: 2025-12-05 17:39:23 UTC

@p-datadog p-datadog mentioned this pull request Dec 5, 2025
@pr-commenter
Copy link

pr-commenter bot commented Dec 5, 2025

Benchmarks

Benchmark execution time: 2025-12-08 21:56:17

Comparing candidate commit 8be2f98 in PR branch di-c-ext with baseline commit 48b4d79 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:error - without error tracking, with_error=true

  • 🟩 throughput [+587.740op/s; +634.904op/s] or [+5.218%; +5.636%]

@p-datadog p-datadog marked this pull request as ready for review December 8, 2025 21:26
@p-datadog p-datadog requested review from a team as code owners December 8, 2025 21:26
@datadog-datadog-prod-us1
Copy link
Contributor

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 33.33%
Total Coverage: 95.23% (-0.00%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 8be2f98 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@ivoanjo
Copy link
Member

ivoanjo commented Dec 9, 2025

The C code is in libdatadog_api extension, however it does not use libdatadog. This was done to not create another extension.

I think this is the right decision -- managing multiple extensions adds complexity AND makes installation slower.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool! Left a few notes :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'm not sure this is very valuable -- I think having the init header inlined in ìnit.c` is fine if there's no C APIs being exposed. (It saves us from a bit of an explosion of files)

(Same note for di.h)

Comment on lines +5 to +10
inline int ddtrace_imemo_type(VALUE imemo) {
// This mask is the same between Ruby 2.5 and 3.3-preview3. Furthermore, the intention of this method is to be used
// to call `rb_imemo_name` which correctly handles invalid numbers so even if the mask changes in the future, at most
// we'll get incorrect results (and never a VM crash)
return (RBASIC(imemo)->flags >> FL_USHIFT) & IMEMO_MASK;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Do you maybe want to move this to datadog_ruby_common.h (making it static inline) so we'd share it between both extensions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are not shared among all files, I'd suggest defining the prototypes in the files that need them (organization in C is hard enough, so I think there's value in avoiding an explosion of files, if we can)

Comment on lines +12 to +22
// Returns whether the argument is an IMEMO of type ISEQ.
inline bool ddtrace_imemo_iseq_p(VALUE v) {
if (rb_objspace_internal_object_p(v)) {
if (RB_TYPE_P(v, T_IMEMO)) {
if (ddtrace_imemo_type(v) == IMEMO_TYPE_ISEQ) {
return true;
}
}
}
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I'd suggest maybe moving this to di.c, since this is only used there?

Comment on lines +14 to +21
if (rb_objspace_internal_object_p(v)) {
if (RB_TYPE_P(v, T_IMEMO)) {
if (ddtrace_imemo_type(v) == IMEMO_TYPE_ISEQ) {
return true;
}
}
}
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor:

is (this)
  a really (awkward way)
    of (writing)
      return rb_objspace_internal_object_p(v) && RB_TYPE_P(v, T_IMEMO) && ddtrace_imemo_type(v) == IMEMO_TYPE_ISEQ;

Comment on lines +6 to +8
#ifndef DDTRACE_UNUSED
#define DDTRACE_UNUSED __attribute__((unused))
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already defined in datadog_ruby_common.h btw ;)

Comment on lines +15 to +33
struct ddtrace_di_os_each_struct {
VALUE array;
};

static int ddtrace_di_os_obj_of_i(void *vstart, void *vend, size_t stride, void *data)
{
struct ddtrace_di_os_each_struct *oes = (struct ddtrace_di_os_each_struct *)data;
VALUE array = oes->array;

VALUE v = (VALUE)vstart;
for (; v != (VALUE)vend; v += stride) {
if (ddtrace_imemo_iseq_p(v)) {
VALUE iseq = rb_iseqw_new((void *) v);
rb_ary_push(array, iseq);
}
}

return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth mentioning here that we're doing the same as https://github.com/ruby/debug/blob/master/ext/debug/iseq_collector.c although we did arrive at it independently in the beginning (but when there's a reference for what we're doing that's maintained by ruby core, I think that's a very relevant reference to keep an eye on!)

Comment on lines +39 to +84
There are several types of iseqs:

- The ones from eval'd code. These have a nil +absolute_path+.
- The ones for a whole loaded file. These have +absolute_path+ set
and have a +first_lineno+ of 0.
- The ones for a particular method defined in a file. These have
+absolute_path+ set and +first_lineno+ of greater than 0.

The first type, eval'd iseqs, are not currently of interest to DI
because the UI only supports line probes defined on a line of
source file and we interpret the lines as the "base layer" of source.

The second type, iseqs for a whole file, are only available for a
relatively small subset of loaded files. My theory is that after a
file is fully loaded, its complete iseq is no longer needed for
anything and is subject to garbage collection.

The full-file iseqs are easiest to deal with from the DI perspective
as we just need to match the file path to the probe specification and
we can use the full-file iseq to target any line in the file.

The third type, iseqs for a method, is the only iseqs we have available
for much of third-party code. They require DI to identify the correct
iseq object in a particular file that contains the line that the probe
is trying to instrument. Doing so, as far as I can tell, requires
examining the iseq's instructions because a method can define another
method via +define_method+, in which case the line numbers within one
method definition are not contiguous and methods are nested.

The +trace_points+ method of the iseq object appears te be the easiest
way of accessing the line numbers that correspond to that iseq.

Finally, it is possible for the same line of code to be present in
multiple methods. Consider for example:

class Foo
def self.bar; define_method(:dynamic) { 42 } end
end

After +bar+ executes, it will have defined the method +dynamic+ which
exists on the same lines of code as +bar+.

This situation is considered an error in DI/LD products - a probe
specification must resolve to exactly one code location, because the UI
only has provision to how one code location as the instrumentation target.
Thus, in this case, instrumentation should produce an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is relevant info... and yet I'm not sure it belongs here attached to this method. E.g. this is more on the consumer side "how to deal with the data we get from all_iseqs" than "this is about all_iseqs". Maybe move it elsewhere...?

Comment on lines +87 to +92
struct ddtrace_di_os_each_struct oes;

oes.array = rb_ary_new();
rb_objspace_each_objects(ddtrace_di_os_obj_of_i, &oes);
RB_GC_GUARD(oes.array);
return oes.array;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor:

  1. It's possible to pass the array directly into rb_objspace_each_objects as a pointer (without the extra struct).
  2. The RB_GC_GUARD is not needed since the compiler MUST keep the array around on the stack -- it's going to return it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants