@@ -17,7 +17,7 @@ Coroutines are generally used either as generators or for asynchronous
programming. In this document, we will discuss both use cases. Even if you are
using coroutines for asynchronous programming, you should still read the
generators section, as it will introduce foundational debugging techniques also
- applicable to the debugging of asynchronous programming.
+ applicable to the debugging of asynchronous programs.

Both compilers (clang, gcc, ...) and debuggers (lldb, gdb, ...) are
still improving their support for coroutines. As such, we recommend using the
@@ -42,11 +42,11 @@ earlier.
Debugging generators
====================

- The first major use case for coroutines in C++ are generators, i.e., functions
- which can produce values via ``co_yield``. Values are produced lazily,
- on-demand. For that purpose, every time a new value is requested the coroutine
- gets resumed. As soon as it reaches a ``co_yield`` and thereby returns the
- requested value, the coroutine is suspended again.
+ One of the two major use cases for coroutines in C++ is generators, i.e.,
+ functions that produce values via ``co_yield``. Values are produced lazily,
+ on demand. For that purpose, every time a new value is requested, the
+ coroutine is resumed. As soon as it reaches a ``co_yield`` and thereby
+ returns the requested value, the coroutine is suspended again.

This logic is encapsulated in a ``generator`` type similar to this one:

@@ -483,7 +483,7 @@ Known issues & workarounds for older LLDB versions

LLDB before 21.0 did not yet show the ``__coro_frame`` inside
``coroutine_handle``. To inspect the coroutine frame, you had to use the
- approach described in the :ref:`Devirtualization of coroutine handles` section.
+ approach described in the :ref:`devirtualization` section.

LLDB before 18.0 was hiding the ``__promise`` and ``__coro_frame``
variable by default. The variables are still present, but they need to be
@@ -492,7 +492,7 @@ explicitly added to the "watch" pane in VS Code or requested via

LLDB before 16.0 did not yet provide a pretty-printer for
``std::coroutine_handle``. To inspect the coroutine handle, you had to manually
- use the approach described in the :ref:`Devirtualization of coroutine handles`
+ use the approach described in the :ref:`devirtualization`
section.

Toolchain Implementation Details
@@ -590,6 +590,44 @@ the promise as follows:

print (task::promise_type)*(0x416eb0+16)

+ Implementation in clang / LLVM
+ ------------------------------
+
+ The C++ Coroutines feature in the Clang compiler is implemented in two parts of
+ the compiler. Semantic analysis is performed in the Clang frontend, while
+ coroutine construction and optimization take place in the LLVM middle end.
+
+ For each coroutine function, the frontend generates a single corresponding
+ LLVM-IR function. This function uses special ``llvm.coro.suspend`` intrinsics
+ to mark the suspension points of the coroutine. The middle end first optimizes
+ this function and applies, e.g., constant propagation across the whole,
+ non-split coroutine.
+
+ ``CoroSplit`` then splits the function into ramp, resume and destroy functions.
+ This pass also moves stack-local variables which are alive across suspension
+ points into the coroutine frame. Most of the heavy lifting to preserve debug
+ information is done in this pass, since it needs to rewrite all variable
+ locations to point into the coroutine frame.
+
+ Afterwards, a few additional optimizations are applied before code gets
+ emitted, but none of them are particularly relevant to debug
+ information.
+
+ For more details on the IR representation of coroutines and the relevant
+ optimization passes, see `Coroutines in LLVM <https://llvm.org/docs/Coroutines.html>`_.
+
+ Emitting the debug information for coroutine frames inside ``CoroSplit`` means
+ that this debug information is necessarily incomplete. Usually, the compiler
+ generates debug information in the frontend, as debug information is highly
+ language specific. However, this is not possible for coroutine frames because
+ the frames are constructed in the LLVM middle end.
+
+ To mitigate this problem, the LLVM middle end attempts to generate some debug
+ information, which is unfortunately incomplete, since much of the
+ language-specific information is not available in the middle end.
+
+ .. _devirtualization:
+
Devirtualization of coroutine handles
-------------------------------------

@@ -651,11 +689,7 @@ clang / LLVM usually use variables like ``__int_32_0`` to represent this
optimized storage. Those values usually do not directly correspond to variables
in the source code.

- For example, when compiling the following program, the compiler creates a
- single entry ``__int_32_0`` in the coroutine state. Intuitively, one might
- assume that ``__int_32_0`` represents the value of the local variable ``a``.
- However, inspecting ``__int_32_0`` in the debugger while single-stepping will
- show the following values:
+ When compiling the program

.. code-block:: c++

@@ -675,10 +709,16 @@ show the following values:
std::cout << a << "\n";
}

- The value of ``__int_32_0`` seemingly does not change, despite being frequently
- incremented. While this might be surprising, this is a result of the optimizer
- recognizing that it can eliminate most of the load/store operations. The above
- code gets optimized to the equivalent of:
+ clang creates a single entry ``__int_32_0`` in the coroutine state.
+
+ Intuitively, one might assume that ``__int_32_0`` represents the value of the
+ local variable ``a``. However, inspecting ``__int_32_0`` in the debugger while
+ single-stepping will reveal that the value of ``__int_32_0`` stays constant,
+ despite ``a`` being frequently incremented.
+
+ While this might be surprising, this is a result of the optimizer recognizing
+ that it can eliminate most of the load/store operations.
+ The above code gets optimized to the equivalent of:

.. code-block:: c++
