Author: Grigorios Pavlakis <[email protected]>
This repository contains a proof-of-concept-level implementation of the static analysis flow proposed in my thesis "Automated estimation of worst-case energy consumption of embedded software using static analysis methods" (in Greek, «Αυτόματη εκτίμηση χειρότερης περίπτωσης κατανάλωσης ενέργειας ενσωματωμένου λογισμικού με χρήση μεθόδων στατικής ανάλυσης»).
The thesis was mentored and supervised by:
- Dr. Stefanos Skalistis (GWF Messsysteme AG) and
- Dr. Papaefstathiou Ioannis (Aristotle University).
In short, this POC shows that it is feasible to produce a software toolset that statically estimates the worst-case energy footprint of a given piece of C code on a given microcontroller platform, using widely-available, well-known embedded industry tools such as WCET static analyzers and C compilers.
The flow is highly automated, minimizing explicit programmer input and is designed to run transparently and provide the worst-case energy estimation as part of the usual CI processes.
The flow is built around three distinct CLI tools:
LoopFinder, a Clang plugin that is loaded as part of a normal compilation process. Its purpose is to retrieve raw data (loop locations and maximum number of repetitions) from inside Clang. This raw data will then be post-processed into a suitable input to a WCET analyzer for the next step of the pipeline.
To achieve this mission, LoopFinder has to:
- identify and visibly mark the location of the preheader (basic block just before the start) of each loop in Clang's IR
- output a text file with estimated loop bounds for all the loop constructs contained in the currently-processed TU
The marking is visible in the final binary's DWARF debug information, allowing the
retrieval of the exact starting address of each loop inside the binary by simple
post-processing. The loop bound estimations originate from Clang's ScalarEvolution
static loop analysis, and are used to save the programmer's time from having to
manually annotate all loops in their program.
BBInfo, an executable frontend to a WCET static analyzer, which is used to provide:
- absolute (virtual) addresses of the first and last instruction of each basic block
- for each basic block, an estimation of the number of times it will be executed at the worst case, given the information produced by LoopFinder.
For this POC, the OTAWA (https://tracesgroup.net/otawa) toolbox is used, specifically a basic IPET analysis.
FlowGen, the general go-between post-processor. It is responsible for generating the flow fact file for BBInfo, as well as for calculating the final energy estimate from BBInfo's results and a given energy model of an MCU.
You will need:
- Clang 16 prefix with development headers. The POC was tested on a built-from-source
ARM Clang toolchain from https://github.com/ARM-software/LLVM-embedded-toolchain-for-Arm
(tag:
release-16.0.0), in order to include the headers - An OTAWA prefix with ARM support, ELF loader and the lp_solve5 ILP solver.
Follow the instructions for a manual build (the automated
otawa-installscript doesn't work correctly) at http://www.tracesgroup.net/otawa/?page_id=81. The packages to build and install are (in this order):gel,gelpp,elm,otawa: https://git.renater.fr/anonscm/git/otawa/<PACKAGE_NAME>.gitgliss2, (either ofarmv5t/armv7t): https://git.renater.fr/anonscm/git/gliss2/<PACKAGE_NAME>.gitlp_solve5,otawa-arm,otawa-lp_solve5: https://git.renater.fr/anonscm/git/otawa/<PACKAGE_NAME>.git
Be prepared to patch things like missing includes etc., this code is quite old.
Given correct OTAWA and Clang prefixes, simply issue:
cmake . -Bbuild -DPWR_LLVM_PREFIX=<clang-16-prefix-path> -DOTAWA_DIR=<OTAWA-prefix-path>
cd build
make
Resulting artifacts can be found at build/src/LoopFinder/LoopFinder.so and
build/src/BBInfo/BBInfo.
FlowGen can be found in the src/FlowGen directory of the flowgen branch for now.
It is written in Python 3.11. To build it, you will need the Poetry
package manager. Follow the instructions in its README.md file to set it up in a clean
virtual environment.
Given the source:
int main(void) {
int sum = 80;
int sum2 = 10;
int sum3 = 0;
for (int i = 0; i < 15; i++) {
for (int j = 0; j < i; j++) {
if ((sum3 & 0x2) == 0) {
sum3 += i * j;
} else {
sum3 += i + j;
}
}
}
for (int i = 0; i < 5; i++) {
sum2 -= 10;
}
while (sum > 8) {
sum -= 2;
}
do {
sum += 8;
} while (sum < 1000);
return sum2;
}compile it with
clang --target="arm-none-eabi" \
-Xclang -disable-O0-optnone \
-Xclang -load -Xclang /path/to/LoopFinder.so \
-Xclang -plugin -Xclang -find-loops \
-fpass-plugin=/path/to/LoopFinder.so \
-fsave-optimization-record -foptimization-record-passes="LabeledLoopBound" \
-g -O0 test.cand run:
flowgen facts /path/to/binary /path/to/opt/report.yaml /path/to/arm-llvm-strip -o bin.ff
BBInfo /path/to/stripped/binary -f bin.ff > bin.bbiFor the final energy estimation (FlowGen comes with an energy model of the
Microchip AT91SAM7x256 ARMv5 MCU, see the FlowGen README for source) run:
flowgen estimate /path/to/stripped/binary /path/to/bbinfo/text/output --device=AT91SAM7x256!!! TODO not yet complete!
The path to llvm-strip from the ARM clang toolchain needs to be provided to FlowGen
due to a bug in OTAWA -- OTAWA hangs when given a binary containing debug information.
This way, FlowGen will retrieve everything needed from the DWARF in one go and then
delete it before passing the stripped version over to BBInfo. This has no impact on
the analysis results.
LoopFinder needs to be registered both as an optimization pass as well as a
frontend plugin, thus both the -Xclang -load and -fpass-plugin need to be
present. To output the YAML file with the loop bounds, LoopFinder needs access
to the optimization record infrastructure of Clang, which is enabled by
-fsave-optimization-record. The -foptimization-record-passes filters the
file to only contain data generated from the LabeledLoopBound pass, which
is the pass inserted by LoopFinder responsible for retrieving the loop bounds.
-disable-O0-optnone and -O0 are there due to a quirk of Clang and a limitation
of this POC, respectively. LoopFinder's marking and FlowGen's post-processing
currently rely on the fact that the loop's preheader basic block actually exists,
which is not always the case with optimizations enabled. However, disabling
optimizations also marks the produced LLVM IR with an optnone attribute, which
prevents the execution of any pass, even if it doesn't modify the IR at all
(like analysis passes) and is registered to Clang's pass manager.
For the adventurous, there is an old experiment to generate bogus DWARF labels
that by some experiments seem to be pointing at the same basic block as the IR
they were placed in. With some help from BBInfo's block start/end addresses, it
could be used as an inspiration to bypass the optimization hurdle, and can be
found at the bb-annotation-dwarf branch.