Experimental debugger API #9604

jcpetruzza · 2025-03-18T19:27:34Z

This PR is the complement to #9334 (adding the debug_line instruction) and together with #8670 (pausing proc timers) add the support on the VM that drives the new edb debugger.

These are main parts of the PR:

A new +D emulator flag is added. When given, the VM becomes "debuggable", which means that when modules are loaded, the debug_line instruction is instrumented similar to what is currently done for tracing-breakpoints, so that one can enable and disable breakpoints on that particular lines.
An experimental erl_debugger module is added, with a new debugging API. Essentially, it allows a single, local, process to be registered as the "debugger" process for the node. This process is the one that will receive messages notifying that a process hit a breakpoint. This way, we can decouple debugger implementations that consume this API (like edb) from OTP itself.
The erl_debugger module also exposes new BIFs to inspect X and Y registers of a suspended process. Together with the new code-information BIFs added in Add support for debug information for a native debugger #9334, this let's a debugger show the values of variables in scope for a suspended process.

github-actions · 2025-03-18T19:28:24Z

CT Test Results

38 files 1 003 suites 7h 26m 30s ⏱️
12 420 tests 11 713 ✅ 703 💤 4 ❌
26 816 runs 24 824 ✅ 1 988 💤 4 ❌

For more details on these failures, see this check.

Results for commit 1fb9829.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

bjorng · 2025-03-25T16:22:12Z

I've pushed a fixup commit with mainly documentation fixes.

erts/emulator/beam/emu/instrs.tab

erts/emulator/beam/jit/arm/beam_asm_module.cpp

jcpetruzza · 2025-04-01T10:59:22Z

Applied comment fixes
Rebased and updated code to use the new version of the debug_line format
Fixed beam_debug_info_SUITE
Updated comment style

Summary: This imports the latest version of [#9604](erlang/otp#9604). In particular, it is rebased on top of [#9646](erlang/otp#9646), that extended the `debug_line` instruction with an atom to distinguish the "entry" marker and the actual lines. Reviewed By: TD5 Differential Revision: D72258205 fbshipit-source-id: 1bdd29b37ce867b6c2b6b330b6c1a865b046baf0

Summary: We update the toolchain to the latest version of [#9604](erlang/otp#9604), which fixes the build for ARM. Diff of changes [here](https://github.com/jcpetruzza/otp/compare/edb-2025-04-03..edb-2025-04-01) Reviewed By: TD5 Differential Revision: D72394516 fbshipit-source-id: 755f9afa91a95099fd9037d6a51d36f003196e52

bjorng · 2025-04-04T07:55:09Z

For security, we have a policy that only members of the OTP team are allowed to add BEAM files to primary bootstrap or erts/preloaded/ebin. Therefore, I've rebuilt erts_internal.beam and force-pushed to your branch.

bjorng · 2025-04-04T08:28:30Z

I've revised the last commit to adhere to our comment conventions and force-pushed.

erts/emulator/beam/erl_debugger.h

bjorng · 2025-04-04T11:18:05Z

I've force-pushed a new version, where I have:

Run make format to update the formatting for C++ files in the JIT (and .exs files for the doc build).
Updated the headers.
Fixed a broken link in the documentation. (Added back ticks around t:stack_frame/0.)

We add a new erl_debugger module to the kernel application, that will provides an API for writing debuggers. Here we set up the general interface, with a stub for "setting a breakpoint" as the only debugging primitive. In the next commits we first implement the "breakpoint" primitive, and later add additional ones. About the general design: - A node needs to have debugging support enabled on startup with the `+D` option and this can't be enabled afterwards. This gives admins a way to prevent processes from blocking on prod due to an accidental debugger usage, etc. - Debugging primitives based on code instrumentation can be individually enabled/disabled, etc. - A debugger is an Erlang process; there can be at most one debugger process per node. - The system will communicate with the debugger process via `debugger_events` messages. For now we introduce only one message for notifying that a process hit a breakpoint

On module loading, `debug_line` are currently ignored. Now, when debugger support is enabled (and the line-breakpoint instrumentation enabled), we emit a `i_debug_line` instruction. For now, we only use it to add entries to the line-table, the actual breakpoint instrumentation is introduced next.

The logic was duplicated and out of sync. We want to add another internal function for dealing with breakpoints, so better to have all in one place

bjorng · 2025-04-04T14:34:25Z

Added to our daily builds.

bjorng · 2025-04-07T15:26:27Z

@jcpetruzza

This PR does not build on Windows (see the unsuccessful tests above).

It seems to be because of this new line in beam_asm.h:

void beamasm_emit_align(void *instance, uint32_t alignment);

The culprit seems to be the use of uint32_t. It is unclear to me why it does not work here; it seems to work in some other places in the runtime system.

I suggest that you replace all your new uses of uint32_t with Uint32, which is our own type that we use for all internal APIs.

jcpetruzza · 2025-04-07T16:53:07Z

@bjorng Interesting, I'll fix (need to get some windows box where I can test it first)

bjorng · 2025-04-08T06:13:54Z

It turns out that beamasm_emit_align() was neither defined nor used, so I removed its prototype, and now this PR builds on Windows. I've added it to our daily builds.

Regarding the uint32_t type on Windows, I think that it is defined when compiling C++ files, but not when compiling C files. Therefore, it should never be used in a .h file.

jcpetruzza · 2025-04-08T18:31:59Z

@bjorng Thanks for the quick fix! beamasm_emit_align() was a remnant of a previous version that I failed to remove from the header file. Re uint32_t, in C it is available via <stdint.h>; it is likely we were getting this include transitively in some platforms but not others.

bjorng · 2025-04-09T12:15:02Z

So now it seems to build on all platforms.

Unfortunately, we had multiple core dumps in our daily builds. Here is one of them:

Thread 1 (Thread 0x7fbf84fc0700 (LWP 7510)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fbfc687b859 in __GI_abort () at abort.c:79
#2  0x000055d2758e6579 in erl_assert_error (expr=expr@entry=0x55d275cb0958 "checked_cp_val((sp[0]),\"beam/erl_debugger.c\",627) == ((void *)0) || sp < (Eterm*)checked_cp_val((sp[0]),\"beam/erl_debugger.c\",627)", func=func@entry=0x55d275cb0a80 <__func__.24841> "erl_debugger_stack_frames_2", file=file@entry=0x55d275cb09db "beam/erl_debugger.c", line=line@entry=627) at sys/unix/sys.c:1082
#3  0x000055d2757cc4b9 in erl_debugger_stack_frames_2 (A__p=0x55d2779d2ab0, BIF__ARGS=, A__I=) at beam/erl_debugger.c:630
#4  0x000055d27568fd08 in debug_call_light_bif (c_p=0x55d2779d2ab0, reg=0x7fbf84fbbd40, I=0x7fbf750df61c , vbf=0x55d2757cc1a4 ) at beam/jit/x86/instr_bif.cpp:394
#5  0x00007fbf7400068b in global::call_light_bif_shared ()

This happens for a debug-enabled runtime with perf support. The layout of the stack is different when perf-support is enabled.

jcpetruzza · 2025-04-09T15:16:32Z

I'll look into this, thanks!

bjorng · 2025-04-10T04:59:00Z

I've now examined another core dump. It turns out that erl_debugger_SUITE leaves some processes in an inconsistent state, and if a later test suite calls erlang:system_info(procs) the runtime system could crash.

I've pushed a fixup commit that calls erlang:system_info(procs) at the end of each group. That makes it possible to trigger the crash by running only erl_debugger_SUITE. The runtime system needs to be built in the following way:

make TYPE=debug FLAVOR=emu

The runtime system needs to be started in the following way:

$ERL_TOP/bin/cerl -emu_type debug -emu_flavor emu

If running on a Mac (and possibly when compiling the runtime system with clang on Linux), it is also necessary to apply the following patch to avoid falsely triggering an assertion:

iff --git a/erts/emulator/beam/emu/beam_emu.c b/erts/emulator/beam/emu/beam_emu.c
index 892ef2cdce..2d34882fee 100644
--- a/erts/emulator/beam/emu/beam_emu.c
+++ b/erts/emulator/beam/emu/beam_emu.c
@@ -88,9 +88,7 @@
 #ifdef NO_JUMP_TABLE
 #  define VALID_INSTR(IP) (BeamCodeAddr(IP) < (NUMBER_OF_OPCODES*2+10))
 #else
-#  define VALID_INSTR(IP) \
-    ((BeamInstr)LabelAddr(emulator_loop) <= BeamCodeAddr(IP) && \
-     BeamCodeAddr(IP) < (BeamInstr)LabelAddr(end_emulator_loop))
+#  define VALID_INSTR(IP) 1
 #endif /* NO_JUMP_TABLE */
 
 #define SET_I(ip) \

jcpetruzza · 2025-04-10T14:25:49Z

@bjorng Re the erlang:system_info(procs) issue, it looks like this is a pre-existent problem that is exposed by the erl_debugger_SUITE. I've boiled it down to this, that also fails on master. Examining the stack-trace, the process with the issue has the same pid printed by the test:

-module(repro_SUITE).

-export([all/0, groups/0]).
-export([init_per_group/2, end_per_group/2]).

-export([test_stuff/1]).

-include_lib("common_test/include/ct.hrl").


all() ->
    [{group, g}].

groups() ->
    [{g, [test_stuff]}].

init_per_group(_Group, Config) ->
    Config.

end_per_group(_Group, _Config) ->
    true = is_binary(erlang:system_info(procs)),
    ok.

test_stuff(Config) ->
    Mod = dyn_module,
    compile_and_load_dyn_module(Config, []),

    erlang:register(?MODULE, self()),

    P = erlang:spawn(Mod, sync_and_hibernate, []),
    ct:pal("HIBERNATING PROCESS ~p~n", [P]),
    receive {sync, P} -> ok end,

    compile_and_load_dyn_module(Config, []),
    compile_and_load_dyn_module(Config, []),
    ok.

compile_and_load_dyn_module(Config, Opts) when is_list(Opts) ->
    PrivDir = proplists:get_value(priv_dir, Config),
    File = filename:join(PrivDir, "dyn_module.erl"),
    file:write_file(File, dyn_module_source()),

    {ok, Mod, Code} = compile:file(File, [binary, report | Opts]),
    {module, Mod} = code:load_binary(Mod, "", Code),
    ok.

dyn_module_source() ->
    ~"""
    -module(dyn_module).
    -export([wake_up/1, sync_and_hibernate/0]).

    wake_up(X) ->
        X + 1.

    sync_and_hibernate() ->
        repro_SUITE ! {sync, self()},
        erlang:hibernate(?MODULE, wake_up, [10]).
    """.

Running it as follows, crashes on 90% of the runs for me:

> git rev-parse HEAD                                    
72903d0f471da9f49000e7376c3c59562a76b0e9

> $ERL_TOP/bin/erl -emu_type debug -emu_flavor emu -eval 'ct:run(".", "repro_SUITE"), erlang:halt().'
Erlang/OTP 28 [RELEASE CANDIDATE 2] [erts-15.2.5] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [type-assertions] [debug-compiled] [lock-checking]

Eshell V15.2.5 (press Ctrl+G to abort, type help(). for help)

Common Test: Running make in test directories...

CWD set to: "/home/danielgo/repos/otp/foo/[email protected]_15.22.29"

TEST INFO: 1 test(s), 1 case(s) in 1 suite(s)

Testing otp.foo.repro_SUITE: Starting test, 1 test cases

----------------------------------------------------
2025-04-10 15:22:29.745
HIBERNATING PROCESS <0.179.0>


1> beam/erl_printf_term.c:402:checked_is_not_list() Assertion failed: TYPE ASSERTION: !is_header(x)
                                                                                                   [1]    1071937 IOT instruction (core dumped)  $ERL_TOP/bin/erl -emu_type debug -emu_flavor emu -eval

Looking into the perf one now

jcpetruzza · 2025-04-11T11:45:39Z

Added two fixups: 1) cleanup in a testcase to workaround the hibernation bug above, 2) handle FP with perf enabled for inspecting the stack. Breakpoints are not working correctly with perf enabled, looking into that now

jcpetruzza · 2025-04-11T14:33:07Z

Pushed fix for line-breakpoints with perf enabled

jcpetruzza · 2025-04-12T12:47:11Z

Pushed further fixes to ensure erl_debugger_SUITE passes on emu flavor as well

jcpetruzza · 2025-04-14T16:10:35Z

One more fixup: handle correctly setting/clearing breakpoints on a deleted module

Summary: We bumpt the toolchain to the latest version of the [#9604](erlang/otp#9604) Main changes: * Formatting changes * Fixes build on windows * Workaround issue in VM related to hibernated processes and code purging * Fix issues when running with perf support * Fix issues when running with flavor `emu` * Fix issue setting/clearing breakpoints on deleted module Full list of changes [here](https://github.com/jcpetruzza/otp/compare/edb-2025-04-03..edb-2025-04-14) Reviewed By: alanz Differential Revision: D72969015 fbshipit-source-id: 0bca3514ae30a07b91b37ed41a089439a67e34b7

bjorng · 2025-04-17T11:35:52Z

Thanks! Added to our daily builds.

We add trampolines on each debug_line instruction, similar to the ones used on function entry. The main difference is that we need to save all live X-registers on the stack before calling `erts_internal:breakpoint/4`, and thus may require a GC.

We want to reuse the logic currently use to flip beakpoints for tracing, etc. However, at the moment, the functions where this happens not only change the code, but also set some flags value that are specific to those cases... We factor out this part of the code to their own functions. In doing so, we observe that for x86-64, the code was referring to `BEAM_ASM_FUNC_PROLOGUE_SIZE` to determine the offset to use when "disabling" a breakpoint that doesn't seem relevant. The value is indeed 6 (see, e.g., the assertion on the "enabling" case), and this corresponds to: - 1 byte for the NOP 0x90 due to call alignment - 1 byte for the opcode of a `CALL` - 4 bytes for the 32-bit offset of the call target

bjorng added team:VM Assigned to OTP team VM feature labels Mar 19, 2025

bjorng assigned bjorng and jhogberg Mar 19, 2025

bjorng reviewed Mar 26, 2025

View reviewed changes

erts/emulator/beam/emu/instrs.tab Outdated Show resolved Hide resolved

erts/emulator/beam/jit/arm/beam_asm_module.cpp Outdated Show resolved Hide resolved

jcpetruzza force-pushed the erl_debugger branch 2 times, most recently from 4921f0a to 93f4ca9 Compare April 1, 2025 10:57

jcpetruzza force-pushed the erl_debugger branch from 93f4ca9 to f9d0b36 Compare April 2, 2025 09:27

jcpetruzza force-pushed the erl_debugger branch from f9d0b36 to ca63bab Compare April 3, 2025 15:36

bjorng requested review from jhogberg and garazdawi April 4, 2025 07:23

bjorng force-pushed the erl_debugger branch from ca63bab to 012d7cf Compare April 4, 2025 07:51

bjorng added the testing currently being tested, tag is used by OTP internal CI label Apr 4, 2025

bjorng force-pushed the erl_debugger branch from 012d7cf to fff5e3b Compare April 4, 2025 08:27

garazdawi removed the testing currently being tested, tag is used by OTP internal CI label Apr 4, 2025

garazdawi reviewed Apr 4, 2025

View reviewed changes

erts/emulator/beam/erl_debugger.h Outdated Show resolved Hide resolved

bjorng force-pushed the erl_debugger branch from fff5e3b to de4b520 Compare April 4, 2025 11:14

bjorng added the testing currently being tested, tag is used by OTP internal CI label Apr 4, 2025

jcpetruzza added 3 commits April 4, 2025 13:57

erl_process: Refactor logic to print internal functions in stack traces

ba0f659

The logic was duplicated and out of sync. We want to add another internal function for dealing with breakpoints, so better to have all in one place

bjorng force-pushed the erl_debugger branch from de4b520 to 0953774 Compare April 4, 2025 12:01

bjorng added the testing currently being tested, tag is used by OTP internal CI label Apr 4, 2025

bjorng removed the testing currently being tested, tag is used by OTP internal CI label Apr 7, 2025

bjorng force-pushed the erl_debugger branch from b3caeb8 to b4805d1 Compare April 8, 2025 06:07

garazdawi added the full-build-and-check Run a more thorough testing of this PR label Apr 10, 2025

jcpetruzza force-pushed the erl_debugger branch from f6c02ff to 60edf71 Compare April 11, 2025 12:54

jhogberg mentioned this pull request Apr 14, 2025

erts: Check process registers during CLA check #9724

Merged

bjorng added the testing currently being tested, tag is used by OTP internal CI label Apr 17, 2025

jcpetruzza added 4 commits April 26, 2025 06:47

erl_debugger: implement erl_debugger:breakpoint/3 BIF

7518877

erl_debugger: add support for inspecting registers and stack frames

076a646

bjorng removed the testing currently being tested, tag is used by OTP internal CI label Apr 26, 2025

bjorng force-pushed the erl_debugger branch from 1fb9829 to 076a646 Compare April 26, 2025 04:57

bjorng approved these changes Apr 26, 2025

View reviewed changes

bjorng merged commit 0a3e258 into erlang:master Apr 26, 2025
58 of 59 checks passed

Experimental debugger API #9604

Experimental debugger API #9604

Uh oh!

Conversation

jcpetruzza commented Mar 18, 2025

Uh oh!

github-actions bot commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CT Test Results

Artifacts

Uh oh!

bjorng commented Mar 25, 2025

Uh oh!

Uh oh!

Uh oh!

jcpetruzza commented Apr 1, 2025

Uh oh!

bjorng commented Apr 4, 2025

Uh oh!

bjorng commented Apr 4, 2025

Uh oh!

Uh oh!

bjorng commented Apr 4, 2025

Uh oh!

bjorng commented Apr 4, 2025

Uh oh!

bjorng commented Apr 7, 2025

Uh oh!

jcpetruzza commented Apr 7, 2025

Uh oh!

bjorng commented Apr 8, 2025

Uh oh!

jcpetruzza commented Apr 8, 2025

Uh oh!

bjorng commented Apr 9, 2025

Uh oh!

jcpetruzza commented Apr 9, 2025

Uh oh!

bjorng commented Apr 10, 2025

Uh oh!

jcpetruzza commented Apr 10, 2025

Uh oh!

jcpetruzza commented Apr 11, 2025

Uh oh!

jcpetruzza commented Apr 11, 2025

Uh oh!

jcpetruzza commented Apr 12, 2025

Uh oh!

jcpetruzza commented Apr 14, 2025

Uh oh!

bjorng commented Apr 17, 2025

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 18, 2025 •

edited

Loading