@@ -17,7 +17,7 @@ Coroutines are generally used either as generators or for asynchronous
programming. In this document, we will discuss both use cases. Even if you are
using coroutines for asynchronous programming, you should still read the
generators section, as it will introduce foundational debugging techniques also
- applicable to the debugging of asynchronous programming.
+ applicable to the debugging of asynchronous programs.

Both compilers (clang, gcc, ...) and debuggers (lldb, gdb, ...) are
still improving their support for coroutines. As such, we recommend using the
@@ -42,11 +42,11 @@ earlier.
Debugging generators
====================

- The first major use case for coroutines in C++ are generators, i.e., functions
- which can produce values via ``co_yield``. Values are produced lazily,
- on-demand. For that purpose, every time a new value is requested the coroutine
- gets resumed. As soon as it reaches a ``co_yield`` and thereby returns the
- requested value, the coroutine is suspended again.
+ One of the two major use cases for coroutines in C++ are generators, i.e.,
+ functions which can produce values via ``co_yield``. Values are produced
+ lazily, on-demand. For that purpose, every time a new value is requested the
+ coroutine gets resumed. As soon as it reaches a ``co_yield`` and thereby
+ returns the requested value, the coroutine is suspended again.

This logic is encapsulated in a ``generator`` type similar to this one:

@@ -590,6 +590,42 @@ the promise as follows:

  print (task::promise_type)*(0x416eb0+16)

+ Implementation in clang / LLVM
+ ------------------------------
+
+ The C++ Coroutines feature in the Clang compiler is implemented in two parts of
+ the compiler. Semantic analysis is performed in Clang, and Coroutine
+ construction and optimization takes place in the LLVM middle-end.
+
+ For each coroutine function, the frontend generates a single corresponding
+ LLVM-IR function. This function uses special ``llvm.coro.suspend`` intrinsics
+ to mark the suspension points of the coroutine. The middle end first optimizes
+ this function and applies, e.g., constant propagation across the whole,
+ non-split coroutine.
+
+ CoroSplit then splits the function into ramp, resume and destroy functions.
+ This pass also moves stack-local variables which are alive across suspension
+ points into the coroutine frame. Most of the heavy lifting to preserve debugging
+ information is done in this pass. This pass needs to rewrite all variable
+ locations to point into the coroutine frame.
+
+ Afterwards, a couple of additional optimizations are applied, before code
+ gets emitted, but none of them are really interesting regarding debugging
+ information.
+
+ For more details on the IR representation of coroutines and the relevant
+ optimization passes, see `Coroutines in LLVM <https://llvm.org/docs/Coroutines.html>`_.
+
+ Emitting debug information inside ``CoroSplit`` forces us to generate
+ insufficient debugging information. Usually, the compiler generates debug
+ information in the frontend, as debug information is highly language specific.
+ However, this is not possible for coroutine frames because the frames are
+ constructed in the LLVM middle-end.
+
+ To mitigate this problem, the LLVM middle end attempts to generate some debug
+ information, which is unfortunately incomplete, since much of the language
+ specific information is missing in the middle end.
+
Devirtualization of coroutine handles
-------------------------------------

@@ -651,11 +687,7 @@ clang / LLVM usually use variables like ``__int_32_0`` to represent this
optimized storage. Those values usually do not directly correspond to variables
in the source code.

- For example, when compiling the following program, the compiler creates a
- single entry ``__int_32_0`` in the coroutine state. Intuitively, one might
- assume that ``__int_32_0`` represents the value of the local variable ``a``.
- However, inspecting ``__int_32_0`` in the debugger while single-stepping will
- show the following values:
+ When compiling the program

.. code-block:: c++

@@ -675,10 +707,16 @@ show the following values:
    std::cout << a << "\n";
  }

- The value of ``__int_32_0`` seemingly does not change, despite being frequently
- incremented. While this might be surprising, this is a result of the optimizer
- recognizing that it can eliminate most of the load/store operations. The above
- code gets optimized to the equivalent of:
+ clang creates a single entry ``__int_32_0`` in the coroutine state.
+
+ Intuitively, one might assume that ``__int_32_0`` represents the value of the
+ local variable ``a``. However, inspecting ``__int_32_0`` in the debugger while
+ single-stepping will reveal that the value of ``__int_32_0`` stays constant,
+ despite ``a`` being frequently incremented.
+
+ While this might be surprising, this is a result of the optimizer recognizing
+ that it can eliminate most of the load/store operations.
+ The above code gets optimized to the equivalent of:

.. code-block:: c++
