PyC is an experimental compiler project designed to compile Python-like code into executable machine code using the LLVM infrastructure as its backend. Written primarily in C, with some C++ and CUDA components, PyC explores the full compilation pipeline: frontend parsing, intermediate representation (IR) generation, optimization, and backend code generation. This project is under active development by DarkStarStrix and serves as both a learning exercise and a foundation for a lightweight compiler targeting a subset of Python syntax. It is not yet fully functional, with several features still in development.
PyC currently supports the following features:
- Frontend: Loads source code, tokenizes it, and parses it into an Abstract Syntax Tree (AST).
  - Supports basic expressions (e.g., numbers, identifiers, and binary operations such as `+`, `-`, `*`, `/`).
- IR Generation: Converts the AST into LLVM Intermediate Representation (IR) for simple arithmetic operations.
- Backend:
  - JIT (Just-In-Time) compilation for immediate execution.
  - Object file generation with multithreaded compilation across the available CPU cores.
- Optimization: Applies basic LLVM optimization passes, such as instruction combining and Global Value Numbering (GVN).
- Cross-Platform: Designed with portability in mind, though currently tested only on Windows.
- Testing: Includes a basic test suite for the parser, covering numbers, identifiers, and binary operations.
- CUDA Integration: Experimental (and currently undeveloped) CUDA-based tokenization in `kernel.cu`.
- Limited Language Support: Only basic expressions (numbers, identifiers, and binary operations) are supported. Full Python syntax (e.g., loops, conditionals, functions) is not yet implemented.
- Incomplete Symbol Table: Variable tracking and scoping are not fully functional.
- No Indentation Preprocessing: Python’s indentation-based block structure is not yet processed.
- Experimental CUDA: The CUDA parser (`parser.cu`) is a placeholder and not operational.
- Error Handling: Lacks robust syntax and semantic error reporting.
The project is organized as follows:
darkstarstrix-pyc/
├── README.md # Project documentation (this file)
├── CMakeLists.txt # CMake build configuration
├── Hello.py # Sample Python file for testing ("Hello, World!")
├── hello.spec # PyInstaller spec file for Hello.py
├── kernel.cu # CUDA kernel for tokenization (experimental)
├── C_Files/ # Core C source files
│ ├── backend.c # Backend logic (JIT, object file generation)
│ ├── codegen.c # LLVM IR generation from AST
│ ├── Core.cpp # AST node management (C++)
│ ├── error_handler.c # Basic error handling system
│ ├── frontend.c # Source code loading and preprocessing
│ ├── IR.c # Intermediate Representation utilities
│ ├── ir_generator.c # LLVM IR generation logic
│ ├── main.c # Compiler entry point
│ ├── parser.cu # CUDA-based parser (experimental)
│ ├── parser.cuh # CUDA parser header with AST construction
│ ├── stack.c # Stack implementation for parsing
│ ├── symbol_table.c # Symbol table management (incomplete)
│ └── test_parser.c # Parser unit tests
├── Header_Files/ # Header files
│ ├── backend.h # Backend function declarations
│ ├── Core.h # AST node definitions
│ ├── error_handler.h # Error handling declarations
│ ├── frontend.h # Frontend function declarations
│ ├── lexer.h # Lexer interface (assumed external)
│ ├── parser.h # Parser and AST definitions
│ ├── stack.h # Stack interface
│ └── symbol_table.h # Symbol table interface
└── hello/ # PyInstaller output for Hello.py
    └── *.toc, *.pyz, etc. # Build artifacts from PyInstaller
To build and use PyC, you’ll need the following:
- CMake: Version 3.29.6 or later.
- LLVM: Installed and configured (update `CMakeLists.txt` with the correct path if needed).
- C/C++ Compiler: Compatible with C11 and C++14 (e.g., MSVC, GCC).
- Python 3.x: Required for testing and running PyInstaller.
- CUDA Toolkit: Optional, only needed for experimental CUDA features (`parser.cu`).
- Clone the Repository:

      git clone https://github.com/DarkStarStrix/PyC.git
      cd PyC
- Configure with CMake:
  - Edit `CMakeLists.txt` to set the `LLVM_DIR` variable to your LLVM installation path (default: `C:/Users/kunya/CLionProjects/PyC/llvm-project/build/lib/cmake/llvm`).
  - Run the following commands:

        mkdir build
        cd build
        cmake ..
- Build the Project:

      cmake --build . --config Release # or Debug

  - The executable `MyCompiler` will be generated in `build/bin/`.
Run the compiler using the following command:
./build/bin/MyCompiler [options] input_file
- `-o <file>`: Specify the output file name (default: `a.out`).
- `-O`: Enable LLVM optimization passes.
- `-jit`: Perform JIT compilation and execute immediately (no object file generated).
- `-v`: Enable verbose output for debugging.
- `-h, --help`: Display the help message.
To compile a simple input file with verbose output and optimizations:
./build/bin/MyCompiler -v -O test_input.pc -o test_output
- Note: The input file (`test_input.pc`) must contain supported expressions (e.g., `x + 42`). Full Python syntax, such as the `print("Hello, World!")` in `Hello.py`, is not yet supported. The `Hello.py` file is included as a sample for future development.
x + 42
This will generate LLVM IR, apply optimizations (if `-O` is used), and either execute it via JIT (with `-jit`) or produce an object file.
- Lexer: Basic tokenization of numbers, identifiers, and operators (via `lexer.h`, assumed external).
- Parser: Constructs an AST from basic expressions (numbers, identifiers, binary operations).
- IR Generation: Produces LLVM IR for simple arithmetic expressions.
- Backend: Supports JIT compilation and multi-threaded object file generation.
- Full Python Support: Process indentation and handle complex statements (e.g., loops, functions).
- Symbol Table: Implement variable tracking and scoping.
- Error Handling: Add robust syntax and semantic error reporting.
- CUDA Integration: Develop a functional CUDA-based parser.
For detailed tasks and updates, see the Issues tab on GitHub.
Run the parser tests with:
./build/bin/MyCompiler
- Note: This assumes `test_parser.c` is linked into the executable. The current test suite verifies parsing of numbers, identifiers, and binary operations.
- Code Standards: Follow C11 and C++14 standards. Include comments for clarity.
- Modularity: The project separates concerns into frontend (`frontend.c`), IR generation (`ir_generator.c`, `codegen.c`), and backend (`backend.c`) components.
- Future Plans: Expand language support, improve error handling, and integrate CUDA for parallel parsing.
Contributions are welcome! To contribute:
- Fork the repository on GitHub.
- Create a branch for your feature or bug fix:
git checkout -b feature/your-feature-name
- Commit your changes with clear messages:
git commit -m "Add feature X"
- Push your branch and submit a pull request:
git push origin feature/your-feature-name
- Guidelines: Adhere to C11/C++14 standards, add comments, and provide a clear pull request description.
- Feedback: Use the "Provide feedback" link on GitHub for suggestions or questions.