A minimal, educational ELF binary generator built from scratch using C11 and LLVM.
ELF From Zero demonstrates how to construct an Executable and Linkable Format (ELF) file manually while leveraging the LLVM backend to synthesize machine code dynamically. It serves as a bridge between high-level compiler concepts and low-level systems programming, making it an ideal study resource for understanding binary formats, cross-compilation, and the LLVM C API.
- Project Overview
- Features & Security
- Supported Architectures
- Prerequisites
- Building the Project
- Usage
- Design Decisions
- Project Structure
- Documentation
The primary goal of this project is to demystify the creation of executable binaries. Instead of relying on a standard linker (ld), this tool:
- Generates Machine Code: Uses LLVM to compile a simple "Hello World" function into raw machine code for a specific target architecture.
- Constructs ELF Headers: Manually populates Elf64_Ehdr and Elf64_Phdr structs to create a valid executable container.
- Injects Code: Places the generated machine code into the .text segment of the ELF file.
- Handles Syscalls: Uses inline assembly (defined in a JSON catalog) to perform system calls (write, exit) across different architectures.
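The header and injection steps above can be sketched in a few lines of C. This is a minimal illustration, not the project's elf_writer.c: the function name build_minimal_elf64, the load address 0x400000, and the single-segment layout are assumptions made for the example. It also stores fields in host byte order, so it is only correct on a little-endian host targeting x86-64.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout constants, chosen for this sketch only. */
#define LOAD_VADDR  0x400000ULL
#define CODE_OFFSET (sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr))

static_assert(sizeof(Elf64_Ehdr) == 64, "unexpected Elf64_Ehdr layout");
static_assert(sizeof(Elf64_Phdr) == 56, "unexpected Elf64_Phdr layout");

/*
 * Write an ELF64 header, one PT_LOAD program header, and the code bytes
 * into `out`. Returns the image size, or 0 if the arguments are invalid
 * or `out` is too small.
 */
size_t build_minimal_elf64(uint8_t *out, size_t out_cap,
                           const uint8_t *code, size_t code_len)
{
    if (out == NULL || code == NULL || out_cap < CODE_OFFSET ||
        code_len > out_cap - CODE_OFFSET) {
        return 0;
    }
    const size_t total = CODE_OFFSET + code_len;

    Elf64_Ehdr eh;
    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);      /* 0x7f 'E' 'L' 'F' */
    eh.e_ident[EI_CLASS]   = ELFCLASS64;
    eh.e_ident[EI_DATA]    = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_type      = ET_EXEC;
    eh.e_machine   = EM_X86_64;
    eh.e_version   = EV_CURRENT;
    eh.e_entry     = LOAD_VADDR + CODE_OFFSET; /* first code byte */
    eh.e_phoff     = sizeof(Elf64_Ehdr);       /* phdr follows the header */
    eh.e_ehsize    = sizeof(Elf64_Ehdr);
    eh.e_phentsize = sizeof(Elf64_Phdr);
    eh.e_phnum     = 1;
    /* e_shoff and e_shnum stay 0: no section headers are emitted. */

    Elf64_Phdr ph;
    memset(&ph, 0, sizeof ph);
    ph.p_type   = PT_LOAD;
    ph.p_flags  = PF_R | PF_X;
    ph.p_offset = 0;            /* map the file from its first byte */
    ph.p_vaddr  = LOAD_VADDR;
    ph.p_paddr  = LOAD_VADDR;
    ph.p_filesz = total;
    ph.p_memsz  = total;
    ph.p_align  = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + CODE_OFFSET, code, code_len);
    return total;
}
```

Mapping the whole file as one segment keeps p_offset and p_vaddr congruent modulo the page size, which the loader requires. The entry point is then simply the load address plus the combined size of the two headers.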
This project adheres to paranoid coding standards to serve as a reference for secure C development:
- Strict C11 Compliance: No non-standard extensions.
- Compiler Agnostic: Supports both Clang (preferred) and GCC.
- Hardened Build (Optional):
  - Full Relocation Read-Only (RELRO) and Immediate Binding (BIND_NOW).
  - Non-executable stack (NX).
  - Position Independent Executable (PIE).
  - Stack Canaries (-fstack-protector-strong).
  - Compile-time buffer checks (_FORTIFY_SOURCE=2).
- Runtime Sanitizers: Can be enabled via CMake options (ASan, UBSan, LSan).
- Static Analysis: Verified with clang-tidy using strict configuration (zero warnings).
The project currently supports code generation for the following architectures:
- x86 (i386)
- x86-64 (AMD64)
- ARM (32-bit)
- AArch64 (ARM64)
- RISC-V (32-bit)
- RISC-V (64-bit)
- MIPS (32-bit Big Endian)
- MIPS (64-bit Big Endian)
- MIPS (32-bit Little Endian)
- MIPS (64-bit Little Endian)
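Each architecture in this list corresponds to a particular combination of ELF machine type, word size, and byte order. The table below is a sketch of that mapping using the constants from elf.h. The short names, the struct arch_desc type, and the find_arch helper are hypothetical and not taken from the project's catalog; the ARM entry assumes a little-endian target.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

#ifndef EM_RISCV
#define EM_RISCV 243 /* value from the ELF machine registry, for old elf.h */
#endif

/* Hypothetical descriptor: one row per supported target. */
struct arch_desc {
    const char    *name;      /* short name, assumed for this sketch */
    unsigned short e_machine; /* value for the e_machine header field */
    unsigned char  ei_class;  /* ELFCLASS32 or ELFCLASS64 */
    unsigned char  ei_data;   /* ELFDATA2LSB or ELFDATA2MSB */
};

static const struct arch_desc ARCHES[] = {
    {"i386",     EM_386,     ELFCLASS32, ELFDATA2LSB},
    {"x86_64",   EM_X86_64,  ELFCLASS64, ELFDATA2LSB},
    {"arm",      EM_ARM,     ELFCLASS32, ELFDATA2LSB},
    {"aarch64",  EM_AARCH64, ELFCLASS64, ELFDATA2LSB},
    {"riscv32",  EM_RISCV,   ELFCLASS32, ELFDATA2LSB},
    {"riscv64",  EM_RISCV,   ELFCLASS64, ELFDATA2LSB},
    {"mips",     EM_MIPS,    ELFCLASS32, ELFDATA2MSB},
    {"mips64",   EM_MIPS,    ELFCLASS64, ELFDATA2MSB},
    {"mipsel",   EM_MIPS,    ELFCLASS32, ELFDATA2LSB},
    {"mips64el", EM_MIPS,    ELFCLASS64, ELFDATA2LSB},
};

/* Return the descriptor for `name`, or NULL if it is not in the table. */
const struct arch_desc *find_arch(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof ARCHES / sizeof ARCHES[0]; i++) {
        if (strcmp(ARCHES[i].name, name) == 0) {
            return &ARCHES[i];
        }
    }
    return NULL;
}
```

Note that the four MIPS variants share one e_machine value and differ only in class and byte order, as do the two RISC-V variants.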
To build and run this project, you need the following tools installed on your system:
- C Compiler: Clang (recommended) or GCC, supporting C11.
- CMake: Version 3.16 or higher.
- LLVM Development Libraries: libllvm (headers and libraries).
- Python 3: For the configuration generator script.
- Make or Ninja: Build tool that runs the build files generated by CMake.
Optional:
- Doxygen: For generating API documentation.
- LaTeX/pdfTeX: For generating PDF documentation.
- Mypy: For static type checking of Python scripts.
We use CMake to manage the build configuration. This project employs "Modern CMake" practices, ensuring out-of-source builds and preventing source tree pollution.
- Create a build directory:
    mkdir build && cd build
- Configure the project:
    cmake ..
  Options:
  - -DENABLE_PARANOID_HARDENING=ON: Enable strict security flags and sanitizers.
  - -DTARGET_ARCH=<arch>: Force generation for a specific architecture (e.g., riscv64, mips).
  - -DCMAKE_C_COMPILER=gcc: Force GCC usage.
- Compile:
    make
To build the tool, generate an ELF binary for your host architecture, and execute it immediately:
    make run_demo
You can also run the steps manually:
- Run the creator:
    ./elf_creator
  Output: A file named elf in the current directory. Verbose output will show the detected architecture, generated IR, and machine code hex dump.
- Execute the generated binary:
    ./elf
  Output:
    Hello!
To generate code for a different architecture, ensure you configured the build with -DTARGET_ARCH=<arch> (or ensure the catalog includes it), then pass the LLVM triple:
    ./elf_creator --target=riscv64-unknown-linux-gnu
Note: The tool automatically handles 32-bit/64-bit headers and endianness swapping. You will need a compatible emulator (like QEMU) to run the resulting binary.
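To make the endianness handling concrete, here is one way a header field can be written in the target's byte order regardless of the host. This is a sketch of a common alternative to swapping bytes in place: it serializes each field byte by byte. The names put_uint and enum byte_order are hypothetical and do not come from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for the target byte order. */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/*
 * Store the low `width` bytes of `v` at `dst` in the requested order.
 * `width` is the field size in bytes (2, 4, or 8 for ELF header fields).
 * The result does not depend on the host's own byte order.
 */
void put_uint(uint8_t *dst, uint64_t v, size_t width, enum byte_order order)
{
    assert(dst != NULL);
    assert(width >= 1 && width <= 8);
    for (size_t i = 0; i < width; i++) {
        /* Little endian: byte i holds bits 8i..8i+7. Big endian: reversed. */
        size_t shift = 8 * (order == ORDER_LITTLE ? i : width - 1 - i);
        dst[i] = (uint8_t)(v >> shift);
    }
}
```

With a helper like this, the 32-bit versus 64-bit difference reduces to the width passed for address-sized fields (4 or 8 bytes), and the big-endian MIPS targets only change the order argument.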
The generated ELF file is minimalist: it contains Program Headers (which the OS loader needs) but omits Section Headers (which are optional for execution but used by debugging tools).
- readelf -l elf: Use this to verify the binary structure. You will see a LOAD segment with R E (Read/Execute) permissions.
- objdump -d elf: This will likely fail or show nothing, as objdump relies on section headers to find the .text section.
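The same check that readelf -l performs can be done programmatically by walking the program header table. The function below is a rough sketch under several assumptions: the name has_rx_load_segment is hypothetical, the image is already in memory, and it is an ELF64 file in the host's byte order.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(Elf64_Phdr) == 56, "unexpected Elf64_Phdr layout");

/*
 * Return true if the in-memory ELF64 image has at least one PT_LOAD
 * program header whose flags are exactly read + execute.
 */
bool has_rx_load_segment(const uint8_t *img, size_t len)
{
    Elf64_Ehdr eh;
    if (img == NULL || len < sizeof eh) {
        return false;
    }
    memcpy(&eh, img, sizeof eh); /* copy out to avoid unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr)) {
        return false;
    }

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* Reject tables that run past the end of the buffer. */
        if (eh.e_phoff > len || len - eh.e_phoff < (i + 1) * sizeof ph) {
            return false;
        }
        memcpy(&ph, img + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD && ph.p_flags == (PF_R | PF_X)) {
            return true;
        }
    }
    return false;
}
```

Nothing here touches e_shoff or e_shnum, which is why this kind of check works on a binary without section headers while a section-based disassembly does not.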
You might wonder why we manually define assembly strings in data/arch_catalog.json instead of using libc or asking LLVM to generate them.
- Freestanding Environment: We are building a binary "from zero," meaning no libc is linked. We must provide the raw system call instructions ourselves.
- Cross-Architecture Support: A local libc only supports the host architecture. To support RISC-V, ARM, and MIPS simultaneously without installing massive cross-compilation toolchains, we define the minimal required assembly (the "Micro-Libc") in a lightweight JSON format.
- LLVM Limitations: LLVM is a compiler backend, not an OS interface. It knows how to generate machine code, but it does not inherently know that Linux write is syscall 1 on x86_64 or 64 on RISC-V. This OS-specific knowledge must be supplied externally.
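To illustrate the kind of OS-specific knowledge involved, here is a small table of Linux syscall numbers for write and exit. It is a sketch written as a C table and does not reproduce the format or contents of data/arch_catalog.json. The struct, the short names, and the find_syscalls helper are hypothetical. The MIPS rows assume the o32 ABI for 32-bit and the n64 ABI for 64-bit, and the ARM row assumes EABI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: Linux syscall numbers needed by the demo program. */
struct syscall_abi {
    const char *arch; /* short name, assumed for this sketch */
    long nr_write;
    long nr_exit;
};

static const struct syscall_abi SYSCALLS[] = {
    {"i386",    4,    1},
    {"x86_64",  1,    60},
    {"arm",     4,    1},    /* EABI */
    {"aarch64", 64,   93},
    {"riscv32", 64,   93},
    {"riscv64", 64,   93},
    {"mips",    4004, 4001}, /* o32 ABI, either byte order */
    {"mips64",  5001, 5058}, /* n64 ABI, either byte order */
};

/* Return the syscall numbers for `arch`, or NULL if it is unknown. */
const struct syscall_abi *find_syscalls(const char *arch)
{
    assert(arch != NULL);
    for (size_t i = 0; i < sizeof SYSCALLS / sizeof SYSCALLS[0]; i++) {
        if (strcmp(SYSCALLS[i].arch, arch) == 0) {
            return &SYSCALLS[i];
        }
    }
    return NULL;
}
```

The numbers are only half of the picture: each architecture also has its own trap instruction and its own registers for the syscall number and arguments, which is why the catalog stores assembly strings rather than bare numbers.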
The project follows a clean separation of concerns. Generated files are kept strictly within the build directory.
ELF_from_zero/
├── CMakeLists.txt # Main build configuration
├── README.md # Project documentation
├── data/
│ └── arch_catalog.json # JSON database of architecture syscalls
├── src/
│ ├── arch_support.c # Architecture selection logic
│ ├── elf_creator.h # Core definitions and API
│ ├── elf_writer.c # ELF binary file construction
│ ├── llvm_emit.c # LLVM IR generation and compilation
│ └── llvm_runtime.c # LLVM initialization
├── elf_creator.c # Main entry point
└── tools/
└── gen_arch_config.py # Python script to generate C config headers
The source code is extensively documented using Doxygen-style comments.
To generate the documentation locally:
- HTML:
    make docs
  Open build/docs/html/index.html in your browser.
- PDF:
    make docs_pdf
  The PDF will be available at build/docs/latex/refman.pdf.
- Type Checking: Run make check_types to verify Python scripts with mypy.
- Editor Integration: Link build/compile_commands.json to your project root for LSP support.