[feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv #109914

Closed
wants to merge 48 commits into from
Changes from 10 commits
Commits
48 commits
a46dd92
added evaluateInstruction method. needs tests
arjunUpatel Sep 25, 2024
30dd584
dealing with git
arjunUpatel Sep 25, 2024
fce36f1
sign extend relevant immediates
arjunUpatel Oct 14, 2024
4c8d769
fix indentation
arjunUpatel Oct 15, 2024
81e6acd
fix ADDI + sign extension bugs
arjunUpatel Oct 15, 2024
c31baef
prevent symbol resolution in empty sections
arjunUpatel Oct 15, 2024
de457f9
added support for compressed instructions
arjunUpatel Oct 20, 2024
d19be54
Merge branch 'llvm:main' into main
arjunUpatel Oct 20, 2024
ecfcf56
call evaluateInstruction only when target=RISCV
arjunUpatel Oct 20, 2024
33e100e
fix bug causing test failures with build
arjunUpatel Oct 21, 2024
3fc4e31
Instr eval based on reg width and attempt to pass tests
arjunUpatel Nov 19, 2024
e425a7c
address comments, merge instruction evaluation and pass register widt…
arjunUpatel Nov 21, 2024
571f056
remove debugging code
arjunUpatel Nov 21, 2024
c36f94e
silly me forgot to save changes
arjunUpatel Nov 21, 2024
c461809
run clang format
arjunUpatel Nov 21, 2024
96dd3c3
fix code suggestion
arjunUpatel Dec 23, 2024
846d055
remove absolute first
arjunUpatel Dec 23, 2024
b781312
objdump prioritize actual symbols over dummy symbols during resolution
arjunUpatel Jan 8, 2025
146043d
remove unist.h from includes
arjunUpatel Jan 8, 2025
e8ea3bb
Revert "objdump prioritize actual symbols over dummy symbols during r…
arjunUpatel Jan 21, 2025
1a23c2c
modify test to effectively test new functionality
arjunUpatel Jan 23, 2025
707a1ed
Update .gitignore
arjunUpatel Jan 26, 2025
e94080f
Update tests to match new functionality
arjunUpatel Jan 26, 2025
4eb81b8
Add tests of scenarios provided in issue description (see issue relat…
arjunUpatel May 19, 2025
4169bd4
Update tests to accurately match new symbol resolution search pattern
arjunUpatel May 19, 2025
e52cbb9
Added tests to increase code coverage of new functionality
arjunUpatel May 19, 2025
fe84244
Help llvm-lit find new tests
arjunUpatel May 19, 2025
f2b402b
Fix zero register bug. Previously address resolution would be trigger…
arjunUpatel May 19, 2025
5f801eb
Remove ignore of local folder as per comments
arjunUpatel May 19, 2025
7d66e20
Remove extraneous header as per comments
arjunUpatel May 19, 2025
b42cdbb
Use unsigned instead of signed int for values that are always positiv…
arjunUpatel May 19, 2025
9476135
Add support for Zcb extensions + corresponding tests
arjunUpatel May 20, 2025
e3e96c5
Added support for stack pointer based load and stores
arjunUpatel May 21, 2025
e2888d7
Merge branch 'main' into main
arjunUpatel May 21, 2025
d1be8f7
Use unsigned int for ArchRegWidth
arjunUpatel May 21, 2025
9699b57
Update tests to match new functionality
arjunUpatel May 21, 2025
e86e92e
One more try at passing tests
arjunUpatel May 22, 2025
a284a64
Non-exact offset match for failing test
arjunUpatel May 23, 2025
8696193
Improve documentation for evaluateInstruction
arjunUpatel Jun 5, 2025
77e8c52
Merge branch 'main' into main
arjunUpatel Jun 5, 2025
090c062
Fix typo in llvm/test/tools/llvm-objdump/RISCV/riscv-ar-coverage.s do…
arjunUpatel Jun 9, 2025
0366e87
Avoid else case as per comments
arjunUpatel Jun 9, 2025
a01fa24
Likely did a bad merge in the past. Updating to reflect correct code …
arjunUpatel Jun 9, 2025
afd4861
Merge branch 'main' into main
arjunUpatel Jun 9, 2025
3e57664
Merge branch 'main' of github.com:arjunUpatel/llvm-project
arjunUpatel Jun 9, 2025
47d964e
Reduce map lookup of section symbols
arjunUpatel Jun 10, 2025
52ebb65
Remove changes affecting non-RISCV targets
arjunUpatel Jun 10, 2025
1737696
Merge branch 'main' of github.com:arjunUpatel/llvm-project
arjunUpatel Jun 11, 2025
4 changes: 4 additions & 0 deletions llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -181,6 +181,10 @@ class MCInstrAnalysis {
evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const;

virtual bool
evaluateInstruction(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const;

/// Given an instruction tries to get the address of a memory operand. Returns
/// the address on success.
virtual std::optional<uint64_t>
6 changes: 6 additions & 0 deletions llvm/lib/MC/MCInstrAnalysis.cpp
@@ -30,6 +30,12 @@ bool MCInstrAnalysis::evaluateBranch(const MCInst & /*Inst*/, uint64_t /*Addr*/,
return false;
}

bool MCInstrAnalysis::evaluateInstruction(const MCInst &Inst,
uint64_t Addr, uint64_t Size,
uint64_t &Target) const {
return false;
}

std::optional<uint64_t> MCInstrAnalysis::evaluateMemoryOperandAddress(
const MCInst &Inst, const MCSubtargetInfo *STI, uint64_t Addr,
uint64_t Size) const {
135 changes: 121 additions & 14 deletions llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -31,7 +31,9 @@
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include <bitset>
#include <cstdint>

#define GET_INSTRINFO_MC_DESC
#define ENABLE_INSTR_PREDICATE_VERIFIER
@@ -178,21 +180,35 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
}

switch (Inst.getOpcode()) {
default: {
// Clear the state of all defined registers for instructions that we don't
// explicitly support.
auto NumDefs = Info->get(Inst.getOpcode()).getNumDefs();
for (unsigned I = 0; I < NumDefs; ++I) {
auto DefReg = Inst.getOperand(I).getReg();
if (isGPR(DefReg))
setGPRState(DefReg, std::nullopt);
case RISCV::LUI: {
setGPRState(Inst.getOperand(0).getReg(),
SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
break;
}
case RISCV::C_LUI: {
MCRegister Reg = Inst.getOperand(0).getReg();
if (Reg == RISCV::X2)
break;
setGPRState(Reg, SignExtend64<18>(Inst.getOperand(1).getImm() << 12));
break;

}
case RISCV::AUIPC: {
setGPRState(Inst.getOperand(0).getReg(),
Addr + SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
break;
}
default: {
// Clear the state of all defined registers for instructions that we don't
// explicitly support.
auto NumDefs = Info->get(Inst.getOpcode()).getNumDefs();
for (unsigned I = 0; I < NumDefs; ++I) {
auto DefReg = Inst.getOperand(I).getReg();
if (isGPR(DefReg))
setGPRState(DefReg, std::nullopt);
}
break;
}
break;
}
case RISCV::AUIPC:
setGPRState(Inst.getOperand(0).getReg(),
Addr + (Inst.getOperand(1).getImm() << 12));
break;
}
}
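
The hunk above sign-extends the shifted upper immediate before recording it as register state. The following standalone sketch illustrates that arithmetic outside of LLVM. `signExtend64` is a local stand-in for `llvm::SignExtend64<B>`, and `luiValue`, `cLuiValue`, and `auipcValue` are hypothetical helper names, not functions from the patch. It assumes the immediate operand is held unshifted, as the `<< 12` in the diff suggests.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for llvm::SignExtend64<B>: sign-extend the low B bits of X.
template <unsigned B> constexpr int64_t signExtend64(uint64_t X) {
  static_assert(B > 0 && B <= 64, "bit width out of range");
  return static_cast<int64_t>(X << (64 - B)) >> (64 - B);
}

// Value LUI writes to rd: the 20-bit immediate placed in bits 31:12, then
// sign-extended from bit 31 (this matters when registers are 64 bits wide).
inline uint64_t luiValue(uint64_t Imm20) {
  assert(Imm20 <= 0xFFFFF && "expected an unshifted 20-bit immediate");
  return static_cast<uint64_t>(signExtend64<32>(Imm20 << 12));
}

// Same idea for C.LUI, where the shifted immediate is sign-extended from
// bit 17, mirroring the SignExtend64<18> call in the diff.
inline uint64_t cLuiValue(uint64_t Imm) {
  return static_cast<uint64_t>(signExtend64<18>(Imm << 12));
}

// Value AUIPC writes to rd: the instruction address plus the same
// sign-extended upper immediate. Unsigned addition wraps as intended.
inline uint64_t auipcValue(uint64_t Addr, uint64_t Imm20) {
  return Addr + luiValue(Imm20);
}
```

Without the sign extension, an AUIPC with the top immediate bit set would be recorded as a large forward offset instead of a backward one, which appears to be the case the `SignExtend64<32>` change addresses. For example, `auipcValue(0x2000, 0xFFFFF)` yields `0x1000` rather than `0x100001000`.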

@@ -230,6 +246,97 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
return false;
}

bool evaluateInstruction(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const override {
switch(Inst.getOpcode()) {
default:
return false;
case RISCV::ADDI: {
if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
// TODO: Figure out ways to find the actual value of XLEN during analysis
int XLEN = 32;
uint64_t Mask = ~((uint64_t)0) >> (64 - XLEN);
Target = *TargetRegState + SignExtend64<12>(Inst.getOperand(2).getImm());
Target &= Mask;
return true;
}
break;
}
case RISCV::ADDIW: {
if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
uint64_t Mask = ~((uint64_t)0) >> 32;
Target = *TargetRegState + SignExtend64<12>(Inst.getOperand(2).getImm());
Target &= Mask;
Target = SignExtend64<32>(Target);
return true;
}
break;
}
case RISCV::C_ADDI: {
int64_t Offset = Inst.getOperand(2).getImm();
if (Offset == 0)
break;
if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
Target = *TargetRegState + SignExtend64<6>(Offset);
return true;
}
break;
}
case RISCV::C_ADDIW: {
int64_t Offset = Inst.getOperand(2).getImm();
if (Offset == 0)
break;
if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
uint64_t Mask = ~((uint64_t)0) >> 32;
Target = *TargetRegState + SignExtend64<6>(Offset);
Target &= Mask;
Target = SignExtend64<32>(Target);
return true;
}
break;
}
case RISCV::LB:
case RISCV::LH:
case RISCV::LD:
case RISCV::LW:
case RISCV::LBU:
case RISCV::LHU:
case RISCV::LWU:
case RISCV::SB:
case RISCV::SH:
case RISCV::SW:
case RISCV::SD:
case RISCV::FLH:
case RISCV::FLW:
case RISCV::FLD:
case RISCV::FSH:
case RISCV::FSW:
case RISCV::FSD: {
int64_t Offset = SignExtend64<12>(Inst.getOperand(2).getImm());
if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg()))
Target = *TargetRegState + Offset;
else
Target = Offset;
return true;
}
case RISCV::C_LD:
case RISCV::C_SD:
case RISCV::C_FLD:
case RISCV::C_FSD:
case RISCV::C_SW:
case RISCV::C_LW:
case RISCV::C_FSW:
case RISCV::C_FLW: {
if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
Target = *TargetRegState + Inst.getOperand(2).getImm();
return true;
}
break;
}
}
return false;
}

bool isTerminator(const MCInst &Inst) const override {
if (MCInstrAnalysis::isTerminator(Inst))
return true;
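
The `ADDI` and `ADDIW` cases in `evaluateInstruction` above combine a tracked base register value with a sign-extended 12-bit immediate. A minimal standalone sketch of that computation follows. `addiTarget` and `addiwTarget` are hypothetical names, `signExtend64` is a local stand-in for `llvm::SignExtend64<B>`, and the register width is passed in explicitly as an assumption (the diff currently hard-codes `XLEN = 32` behind a TODO).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Local stand-in for llvm::SignExtend64<B>: sign-extend the low B bits of X.
template <unsigned B> constexpr int64_t signExtend64(uint64_t X) {
  static_assert(B > 0 && B <= 64, "bit width out of range");
  return static_cast<int64_t>(X << (64 - B)) >> (64 - B);
}

// ADDI rd, rs1, imm: if the value of rs1 is known, the result is
// base + sext(imm12), truncated to the register width XLen.
inline std::optional<uint64_t> addiTarget(std::optional<uint64_t> Base,
                                          int64_t Imm12, unsigned XLen) {
  assert((XLen == 32 || XLen == 64) && "sketch only handles RV32 and RV64");
  if (!Base)
    return std::nullopt; // Unknown base register: nothing to resolve.
  uint64_t Mask = ~uint64_t{0} >> (64 - XLen);
  uint64_t Offset =
      static_cast<uint64_t>(signExtend64<12>(static_cast<uint64_t>(Imm12)));
  return (*Base + Offset) & Mask;
}

// ADDIW rd, rs1, imm: the sum is truncated to 32 bits and the result is
// then sign-extended to 64 bits.
inline std::optional<uint64_t> addiwTarget(std::optional<uint64_t> Base,
                                           int64_t Imm12) {
  if (!Base)
    return std::nullopt;
  uint64_t Offset =
      static_cast<uint64_t>(signExtend64<12>(static_cast<uint64_t>(Imm12)));
  // signExtend64<32> discards bits 63:32, so no separate mask is needed.
  return static_cast<uint64_t>(signExtend64<32>(*Base + Offset));
}
```

With a 32-bit register width, `addiTarget(0x0, -1, 32)` gives `0xFFFFFFFF`, while the same call with width 64 gives `0xFFFFFFFFFFFFFFFF`. That difference is why the hard-coded width in the diff can print the wrong target for RV64 objects.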
15 changes: 8 additions & 7 deletions llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -1513,8 +1513,8 @@ collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
if (MIA) {
if (Disassembled) {
uint64_t Target;
bool TargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
if (TargetKnown && (Target >= Start && Target < End) &&
bool BranchTargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
if (BranchTargetKnown && (Target >= Start && Target < End) &&
!Labels.count(Target)) {
// On PowerPC and AIX, a function call is encoded as a branch to 0.
// On other PowerPC platforms (ELF), a function call is encoded as
@@ -2323,9 +2323,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
if (Disassembled && DT->InstrAnalysis) {
llvm::raw_ostream *TargetOS = &FOS;
uint64_t Target;
bool PrintTarget = DT->InstrAnalysis->evaluateBranch(
Inst, SectionAddr + Index, Size, Target);

bool PrintTarget = DT->InstrAnalysis->evaluateBranch(Inst, SectionAddr + Index, Size, Target) ||
DT->InstrAnalysis->evaluateInstruction(Inst, SectionAddr + Index, Size, Target);
if (!PrintTarget) {
if (std::optional<uint64_t> MaybeTarget =
DT->InstrAnalysis->evaluateMemoryOperandAddress(
@@ -2368,6 +2367,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
if (It->first != TargetSecAddr)
break;
TargetSectionSymbols.push_back(&AllSymbols[It->second]);
if (AllSymbols[It->second].empty())
TargetSecAddr = 0;
}
} else {
TargetSectionSymbols.push_back(&Symbols);
@@ -2398,7 +2399,7 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
break;
}

// Branch targets are printed just after the instructions.
// Branch and instruction targets are printed just after the instructions.
// Print the labels corresponding to the target if there's any.
bool BBAddrMapLabelAvailable = BBAddrMapLabels.count(Target);
bool LabelAvailable = AllLabels.count(Target);
@@ -2479,7 +2480,7 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
<< ">";
} else if (LabelAvailable) {
*TargetOS << " <" << AllLabels[Target] << ">";
}
}
// By convention, each record in the comment stream should be
// terminated.
if (TargetOS == &CommentStream)
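
The `disassembleObject` change above tries `evaluateBranch` first, falls through to the new `evaluateInstruction` hook, and only then consults `evaluateMemoryOperandAddress`. The sketch below models that fallback order in isolation. `Analysis`, `AddrPlusEight`, and `resolveTarget` are hypothetical names, and the hook signatures are simplified (no `MCInst`, `Size`, or subtarget arguments), so this illustrates the control flow only, not the LLVM interface.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for MCInstrAnalysis, reduced to the three hooks the
// disassembly loop consults. Every hook fails by default, so targets that do
// not override evaluateInstruction keep their existing behavior.
struct Analysis {
  virtual ~Analysis() = default;
  virtual bool evaluateBranch(uint64_t /*Addr*/, uint64_t & /*Target*/) const {
    return false;
  }
  virtual bool evaluateInstruction(uint64_t /*Addr*/,
                                   uint64_t & /*Target*/) const {
    return false;
  }
  virtual std::optional<uint64_t>
  evaluateMemoryOperandAddress(uint64_t /*Addr*/) const {
    return std::nullopt;
  }
};

// Hypothetical target that resolves only non-branch instructions.
struct AddrPlusEight : Analysis {
  bool evaluateInstruction(uint64_t Addr, uint64_t &Target) const override {
    assert(Addr <= UINT64_MAX - 8 && "address would wrap");
    Target = Addr + 8;
    return true;
  }
};

// Mirrors the order used in the diff: branch evaluation, then instruction
// evaluation (short-circuited by ||), then the memory-operand fallback.
inline std::optional<uint64_t> resolveTarget(const Analysis &A, uint64_t Addr) {
  uint64_t Target = 0;
  if (A.evaluateBranch(Addr, Target) || A.evaluateInstruction(Addr, Target))
    return Target;
  return A.evaluateMemoryOperandAddress(Addr);
}
```

Because `||` short-circuits, `evaluateInstruction` never overwrites a target that `evaluateBranch` already produced, and a base-class `Analysis` still yields no target at all.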