@@ -17,7 +17,7 @@ Coroutines are generally used either as generators or for asynchronous
programming. In this document, we will discuss both use cases. Even if you are
using coroutines for asynchronous programming, you should still read the
generators section, as it will introduce foundational debugging techniques also
- applicable to the debugging of asynchronous programming.
+ applicable to the debugging of asynchronous programs.

Both compilers (clang, gcc, ...) and debuggers (lldb, gdb, ...) are
still improving their support for coroutines. As such, we recommend using the
@@ -42,11 +42,11 @@ earlier.
Debugging generators
====================

- The first major use case for coroutines in C++ are generators, i.e., functions
- which can produce values via ``co_yield``. Values are produced lazily,
- on-demand. For that purpose, every time a new value is requested the coroutine
- gets resumed. As soon as it reaches a ``co_yield`` and thereby returns the
- requested value, the coroutine is suspended again.
+ One of the two major use cases for coroutines in C++ are generators, i.e.,
+ functions which can produce values via ``co_yield``. Values are produced
+ lazily, on-demand. For that purpose, every time a new value is requested the
+ coroutine gets resumed. As soon as it reaches a ``co_yield`` and thereby
+ returns the requested value, the coroutine is suspended again.

This logic is encapsulated in a ``generator`` type similar to this one:
@@ -589,6 +589,42 @@ the promise as follows:
.. code-block::
  print (task::promise_type)*(0x416eb0+16)

+ Implementation in clang / LLVM
+ ------------------------------
+
+ The C++ Coroutines feature in the Clang compiler is implemented in two parts of
+ the compiler. Semantic analysis is performed in Clang, and Coroutine
+ construction and optimization takes place in the LLVM middle-end.
+
+ For each coroutine function, the frontend generates a single corresponding
+ LLVM-IR function. This function uses special ``llvm.coro.suspend`` intrinsics
+ to mark the suspension points of the coroutine. The middle end first optimizes
+ this function and applies, e.g., constant propagation across the whole,
+ non-split coroutine.
+
+ CoroSplit then splits the function into ramp, resume and destroy functions.
+ This pass also moves stack-local variables which are alive across suspension
+ points into the coroutine frame. Most of the heavy lifting to preserve debugging
+ information is done in this pass. This pass needs to rewrite all variable
+ locations to point into the coroutine frame.
+
+ Afterwards, a couple of additional optimizations are applied, before code
+ gets emitted, but none of them are really interesting regarding debugging
+ information.
+
+ For more details on the IR representation of coroutines and the relevant
+ optimization passes, see `Coroutines in LLVM <https://llvm.org/docs/Coroutines.html>`_.
+
+ Emitting debug information inside ``CoroSplit`` forces us to generate
+ insufficient debugging information. Usually, the compiler generates debug
+ information in the frontend, as debug information is highly language specific.
+ However, this is not possible for coroutine frames because the frames are
+ constructed in the LLVM middle-end.
+
+ To mitigate this problem, the LLVM middle end attempts to generate some debug
+ information, which is unfortunately incomplete, since much of the language
+ specific information is missing in the middle end.
+
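The split into ramp, resume and destroy functions described above can be
pictured with a hand-written C++ sketch. This is only an illustration and not
actual compiler output: the names ``CounterFrame``, ``counter_ramp``,
``counter_resume``, ``counter_destroy`` and ``sum_counter`` are hypothetical,
the frame layout is simplified, and the modeled coroutine is assumed to be a
generator that yields 0, 1 and 2. The point is that the local variable ``i``
no longer lives on the stack but in the heap-allocated frame, which is why its
debug location has to be rewritten to point into the frame.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a coroutine frame. Models:
//   generator counter() { for (int i = 0; i < 3; ++i) co_yield i; }
struct CounterFrame {
    void (*resume_fn)(CounterFrame*);   // filled in by the ramp function
    void (*destroy_fn)(CounterFrame*);  // filled in by the ramp function
    int suspend_index;                  // which suspension point we are parked at
    int i;                              // local that is alive across suspensions
    int current_value;                  // stand-in for the value kept in the promise
    bool done;                          // set once the body has run to completion
};

// "Resume" part: continues the body from the recorded suspension point.
void counter_resume(CounterFrame* f) {
    switch (f->suspend_index) {
    case 0:        // first resumption: run the loop initializer
        f->i = 0;
        break;
    case 1:        // resumed after a co_yield: run the loop increment
        ++f->i;
        break;
    }
    if (f->i < 3) {
        f->current_value = f->i;  // co_yield i
        f->suspend_index = 1;     // remember where to continue
        return;                   // suspend
    }
    f->done = true;               // fell off the end of the body
}

// "Destroy" part: releases the frame.
void counter_destroy(CounterFrame* f) { delete f; }

// "Ramp" part: allocates and initializes the frame, then returns to the caller.
CounterFrame* counter_ramp() {
    CounterFrame* f = new CounterFrame{};
    f->resume_fn = counter_resume;
    f->destroy_fn = counter_destroy;
    f->suspend_index = 0;
    return f;
}

// Drives the sketch the way a generator's iterator would.
int sum_counter() {
    CounterFrame* frame = counter_ramp();
    int sum = 0;
    for (frame->resume_fn(frame); !frame->done; frame->resume_fn(frame))
        sum += frame->current_value;
    frame->destroy_fn(frame);
    return sum;
}
```

Calling ``sum_counter()`` resumes the frame until ``done`` is set and adds up
the yielded values. Inspecting ``frame->i`` between resumptions is the
hand-written analogue of a debugger reading a variable out of the coroutine
frame.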
Devirtualization of coroutine handles
-------------------------------------
@@ -650,11 +686,7 @@ clang / LLVM usually use variables like ``__int_32_0`` to represent this
optimized storage. Those values usually do not directly correspond to variables
in the source code.

- For example, when compiling the following program, the compiler creates a
- single entry ``__int_32_0`` in the coroutine state. Intuitively, one might
- assume that ``__int_32_0`` represents the value of the local variable ``a``.
- However, inspecting ``__int_32_0`` in the debugger while single-stepping will
- show the following values:
+ When compiling the program

.. code-block:: c++
@@ -674,10 +706,16 @@ show the following values:
    std::cout << a << "\n";
  }

- The value of ``__int_32_0`` seemingly does not change, despite being frequently
- incremented. While this might be surprising, this is a result of the optimizer
- recognizing that it can eliminate most of the load/store operations. The above
- code gets optimized to the equivalent of:
+ clang creates a single entry ``__int_32_0`` in the coroutine state.
+
+ Intuitively, one might assume that ``__int_32_0`` represents the value of the
+ local variable ``a``. However, inspecting ``__int_32_0`` in the debugger while
+ single-stepping will reveal that the value of ``__int_32_0`` stays constant,
+ despite ``a`` being frequently incremented.
+
+ While this might be surprising, this is a result of the optimizer recognizing
+ that it can eliminate most of the load/store operations.
+ The above code gets optimized to the equivalent of:

.. code-block:: c++