-
Notifications
You must be signed in to change notification settings - Fork 381
Bletner/llvm include sort bug #1874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refactors the LLVM backend to use a dependency-based intermediate representation instead of a virtual stack approach. The key changes include replacing the virtual stack with a dependency blocks IR that tracks instruction dependencies, moving LLVM initialization into a thread-safe singleton pattern, and adding support for disk-based caching of compiled LLVM code.
- Introduced dependency blocks IR to replace virtual stack-based compilation
- Moved LLVM initialization to thread-safe singleton with
std::once_flag - Added file-based caching for compiled LLVM code
- Removed the
init_llvm()function from test code and integrated it into the compilation flow
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| category/vm/llvm/dependency_blocks.hpp | New file implementing dependency-based IR for EVM bytecode compilation |
| category/vm/llvm/execute.cpp | Added thread-safe LLVM initialization and disk loading functionality |
| category/vm/llvm/llvm.cpp | Implemented file cache loading logic for compiled LLVM objects |
| category/vm/llvm/emitter.hpp | Refactored to use dependency blocks IR instead of virtual stack |
| category/vm/llvm/llvm_state.hpp | Added helper methods for disk loading and updated GEP signature |
| test/vm/vm/test_vm.cpp | Removed manual init_llvm() calls as initialization is now automatic |
| category/vm/llvm/virtual_stack.hpp | Removed file as it's replaced by dependency blocks |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| evmc_revision const rev = | ||
| static_cast<evmc_revision>(std::stoi(exts[0].substr(1))); | ||
| uint256_t const hash256 = | ||
| runtime::uint256_t::from_string("0x" + exts[1].substr(1)); | ||
|
|
||
| evmc::bytes32 const code_hash = bytes32_from_uint256(hash256); | ||
|
|
||
| std::shared_ptr<LLVMState> const ptr = | ||
| monad::vm::llvm::load_from_disk(rev, entry.path().string()); | ||
|
|
||
| cached_llvm_code_[rev].insert({code_hash, ptr}); |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code assumes exts[0] starts with a specific character and calls substr(1) without validation. If the extension doesn't follow the expected format (e.g., it's empty or doesn't start with the expected character), this will cause undefined behavior or exceptions. Add validation before calling substr(1) to ensure the string has the expected format.
| evmc_revision const rev = | |
| static_cast<evmc_revision>(std::stoi(exts[0].substr(1))); | |
| uint256_t const hash256 = | |
| runtime::uint256_t::from_string("0x" + exts[1].substr(1)); | |
| evmc::bytes32 const code_hash = bytes32_from_uint256(hash256); | |
| std::shared_ptr<LLVMState> const ptr = | |
| monad::vm::llvm::load_from_disk(rev, entry.path().string()); | |
| cached_llvm_code_[rev].insert({code_hash, ptr}); | |
| // Validate exts[0] and exts[1] before calling substr(1) | |
| if (!exts[0].empty() && exts[0][0] == '.' && | |
| !exts[1].empty() && exts[1][0] == '.') { | |
| evmc_revision const rev = | |
| static_cast<evmc_revision>(std::stoi(exts[0].substr(1))); | |
| uint256_t const hash256 = | |
| runtime::uint256_t::from_string("0x" + exts[1].substr(1)); | |
| evmc::bytes32 const code_hash = bytes32_from_uint256(hash256); | |
| std::shared_ptr<LLVMState> const ptr = | |
| monad::vm::llvm::load_from_disk(rev, entry.path().string()); | |
| cached_llvm_code_[rev].insert({code_hash, ptr}); | |
| } |
| evmc_revision const rev = | ||
| static_cast<evmc_revision>(std::stoi(exts[0].substr(1))); | ||
| uint256_t const hash256 = | ||
| runtime::uint256_t::from_string("0x" + exts[1].substr(1)); |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous issue, exts[1].substr(1) is called without validating that exts[1] has at least 2 characters. This can cause undefined behavior if the extension is empty or has only one character. Add validation before calling substr(1).
| InstrIdx unspill_offset_of(StackIdx i) | ||
| { | ||
| return (static_cast<InstrIdx>(-i - 1)); |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formula -i - 1 for calculating unspill offset is not self-documenting and could be confusing. Consider adding a comment explaining the relationship between stack index and unspill instruction index, or using more descriptive intermediate variables to make the calculation clearer.
| InstrIdx unspill_offset_of(StackIdx i) | |
| { | |
| return (static_cast<InstrIdx>(-i - 1)); | |
| // Converts a stack index to the corresponding unspill instruction index. | |
| // The formula (-i - 1) is used because unspill instructions are indexed | |
| // in reverse order relative to stack indices: stack index 0 maps to the | |
| // last unspill instruction (-1), stack index 1 to (-2), etc. | |
| InstrIdx unspill_offset_of(StackIdx i) | |
| { | |
| InstrIdx unspill_instr_idx = static_cast<InstrIdx>(-i - 1); | |
| return unspill_instr_idx; |
| // you can eliminate this subtraction by rewriting the jumptable | ||
| // address |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 547-548 suggests an optimization but doesn't explain how it would work or when it should be implemented. Either provide more detailed documentation about the suggested optimization or remove the comment if it's not actionable.
| // you can eliminate this subtraction by rewriting the jumptable | |
| // address |
| verifyModule(*llvm_module)); // BAL: this fails when O2 and | ||
| // O3 are applied. Why? | ||
| verifyModule(*llvm_module)); // this fails when O2 and O3 | ||
| // are applied. Why? |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment indicates a known issue where module verification fails with O2/O3 optimization levels, but there's no explanation or tracking reference. This suggests potential correctness issues that should be investigated. Consider adding a TODO with context or a reference to a tracking issue.
| // are applied. Why? | |
| // are applied. | |
| // TODO: Investigate why verifyModule fails when O2/O3 optimizations are applied. | |
| // This may indicate a correctness issue in the optimization pipeline or module construction. | |
| // Track this issue (create a GitHub issue or internal ticket) and link it here. |
| inline bool is_pure(OpCode op) | ||
| { | ||
| return ( | ||
| op == Shl || op == Shr || op == Sar || op == Add || op == Mul || | ||
| op == Sub || op == Div || op == SDiv || op == Mod || op == SMod || | ||
| op == AddMod || op == MulMod || op == SignExtend || op == Lt || | ||
| op == Gt || op == SLt || op == SGt || op == Eq || op == IsZero || | ||
| op == And || op == Or || op == XOr || op == Not || op == Byte); | ||
|
|
||
| // should just put true or false in the OpCodeInfo table in opcodes.hpp | ||
| } |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 41 suggests this logic should be moved to the OpCodeInfo table. This would be a better design as it centralizes opcode metadata and avoids maintaining duplicate lists. Consider implementing this suggestion or creating a tracking issue for it.
| if (!std::filesystem::exists(nm_obj)) { | ||
| ObjectTransformLayer &otl = lljit->getObjTransformLayer(); | ||
| DumpObjects const dumpobjs = DumpObjects("", nm); | ||
| otl.setTransform(dumpobjs); | ||
| } |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The logic conditionally sets up object dumping only if the file doesn't exist. However, this creates an implicit side effect where the behavior changes based on file system state. Consider making this behavior more explicit or documenting why the check is necessary.
No description provided.