feat: Linux 6.12.51 boot support with OpenSBI + critical bug fixes#22

Closed
zevorn wants to merge 44 commits into main from linux-boot-fixes
@zevorn (Member) commented Apr 6, 2026

Summary

The Machina RISC-V emulator can now boot OpenSBI v1.5.1 + Linux kernel 6.12.51 to a busybox shell, reaching the "Please press Enter to activate this console." prompt.

This PR includes:

Critical Bug Fixes

  • neg_align use-after-move (root cause of the boot hang): neg_align_ptr() was captured before add_cpu() moved FullSystemCpu, so the timer thread wrote to a stale address and timer interrupts could never break goto_tb chains
  • Pending IRQ delivery deadlock: when interrupts are pending but not deliverable (SIE=0), the exec loop now keeps neg_align=-1 to break goto_tb chains until the guest re-enables interrupts
  • SATP mode probe sync: mmu.set_satp() now uses the post-validation cpu.csr.satp instead of the raw written value, so rejected Sv48/Sv57 modes can no longer corrupt MMU state
  • mip precise mirroring: handle_interrupt() mirrors hardware mip bits precisely via hw_mask instead of OR-ing them in, which made them sticky
  • rev8 RV64 encoding: fixed insn32.decode from the RV32 shamt=24 to the RV64 shamt=56
  • TB page boundary: terminate TBs at 4K page boundaries so instruction fetch never crosses into the next page
  • helper_sc TLB miss: return SC failure (1) instead of computing a host address from an invalid guest_base addend
  • Register allocator bb_boundary: flush stale register mappings at basic-block labels
  • TB size calculation: use pc_next - pc_first instead of num_insns * 4, which is wrong for 2-byte RVC instructions
  • FDT placement: leave a 64KB margin at the top of RAM for the OpenSBI scratch area
  • Spill area overflow: raise CPU_TEMP_BUF_NLONGS from 128 to 512 for complex kernel TBs

New Features

  • RISC-V Bitmanip extensions: Zba (5), Zbb (~24), Zbs (8), Zbc (3), 40 new instructions in total, all with disassembly support
  • MDevice architecture upgrade: DeviceCell/DeviceRefCell for interior mutability, an atomic PLIC pending bitmap, and a lock-free IRQ chain
  • VirtioDevice trait: pluggable backends (block now, net planned)
  • UART 16550 loopback/modem: MCR loopback mode + MSR signals (CTS/DSR/DCD)
  • FPU mstatus init: mstatus.FS=Initial at reset so FP instructions are legal
  • PLIC edge-triggered IRQ: pending latches only on a rising edge, matching QEMU semantics
  • Code buffer: 16 MiB → 256 MiB to reduce BufferFull events during kernel boot
  • GDB single-step fix: skip the breakpoint check while stepping so a breakpoint at the current PC does not re-fire

Test Results

  • 1369 passed, 0 failed, 2 ignored (pre-existing softmmu exec loop hang)
  • Clippy: zero warnings
  • Linux 6.12.51 boots to shell with OpenSBI v1.5.1

Ported from PR #19

FPU mstatus init, UART loopback/modem, and PLIC edge-triggered semantics are adapted from #19. PCIe and VirtIO-net are excluded and will land in separate PRs.

Test plan

  • cargo test --workspace: 1369/1371 pass (2 ignored)
  • cargo clippy -- -D warnings: zero warnings
  • Linux 6.12.51 + OpenSBI v1.5.1 boots to shell
  • rCore ch1-ch8 verified in a prior session
  • riscv-tests 133/134 verified in a prior session

🤖 Generated with Claude Code

zevorn added 30 commits April 4, 2026 09:43
Hot-path optimizations identified via perf profiling with
tg-rcore-tutorial ch4 (13858 -> 2240 samples):

1. JumpCache: generation counter invalidation (was 18.93% CPU)
   Replace 64KB memset with O(1) generation bump. Stale entries
   detected lazily on lookup, overwritten on insert.

2. TbStore: per-page refcount replaces bitmap rebuild (was 3.45%)
   Increment on TB creation, decrement on invalidation. Eliminates
   the full-scan rebuild_code_bitmap after every phys_page flush.

3. Exec loop: is_code_page filter on dirty pages (was ~78%)
   Only call invalidate_phys_page for code pages. Data-page
   writes are the common case and don't need TB invalidation.

4. MMU: reuse dirty_pages_buf to avoid per-call Vec allocation.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
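Items 2 and 3 above can be sketched together: a per-page TB refcount makes `is_code_page` cheap, so the exec loop can drop dirty data pages before calling `invalidate_phys_page`. This is a minimal illustration, assuming a `HashMap` keyed by physical page number and TBs that touch only their first page; `CodePageRefs` and `pages_needing_invalidation` are hypothetical names, not Machina's `TbStore` API.

```rust
use std::collections::HashMap;

// Hypothetical sketch: names are illustrative, not Machina's TbStore API.
const PAGE_SHIFT: u32 = 12; // 4 KiB guest pages

#[derive(Default)]
struct CodePageRefs {
    // physical page number -> number of live TBs translated from that page
    refs: HashMap<u64, u32>,
}

impl CodePageRefs {
    /// Called when a TB starting at `phys_pc` is created.
    fn tb_created(&mut self, phys_pc: u64) {
        *self.refs.entry(phys_pc >> PAGE_SHIFT).or_insert(0) += 1;
    }

    /// Called when that TB is invalidated; the page stops counting as code
    /// once its last TB is gone, without rebuilding a full bitmap.
    fn tb_invalidated(&mut self, phys_pc: u64) {
        let page = phys_pc >> PAGE_SHIFT;
        let now_unused = match self.refs.get_mut(&page) {
            Some(count) => {
                *count -= 1;
                *count == 0
            }
            None => false,
        };
        if now_unused {
            self.refs.remove(&page);
        }
    }

    fn is_code_page(&self, phys_addr: u64) -> bool {
        self.refs.contains_key(&(phys_addr >> PAGE_SHIFT))
    }
}

/// Exec-loop filter: only dirty pages that hold translated code need an
/// invalidate_phys_page call; ordinary data writes are skipped.
fn pages_needing_invalidation(refs: &CodePageRefs, dirty: &[u64]) -> Vec<u64> {
    dirty.iter().copied().filter(|&a| refs.is_code_page(a)).collect()
}
```

The refcount is updated only on TB creation and invalidation, so the common case (a guest store to a data page) costs a single hash lookup.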
Replace O(N) invalidate_all (iterate 65536 TBs, clear hash,
clear page_heads) with O(1) generation counter bump. Each TB
records its creation generation; a mismatch means stale.

Key changes:
- Add global_gen AtomicUsize to TbStore, bumped on invalidate_all
- Add gen AtomicUsize to TranslationBlock, stamped at creation
- Add gen checks to TbStore::lookup(), exit_target cache, and
  tb_add_jump dst validity
- Skip hash table and page_heads clearing in invalidate_all
  (stale entries filtered by gen check during lookup)
- Keep code_pages clear to prevent is_code_page false positives
- Add per-page TB linked list for O(k) invalidate_phys_page
- Remove dead unlink_hash method

Performance: ch4 2.281s -> 0.243s (9.4x), now 5.8x faster
than QEMU (0.243s vs 1.407s).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
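The generation-counter idea (used both for the JumpCache and for `TbStore::invalidate_all`) can be sketched as below. This is a simplified model, assuming a linear `Vec` instead of the real hash table and page lists; the commit names the per-TB field `gen`, but the sketch calls it `created_gen` because `gen` is a reserved keyword in Rust 2024.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified, hypothetical TB and store; only the generation stamping
// mirrors the commit message.
struct TranslationBlock {
    pc: u64,
    created_gen: AtomicUsize, // global generation when this TB was created
}

struct TbStore {
    global_gen: AtomicUsize,
    tbs: Vec<TranslationBlock>,
}

impl TbStore {
    fn new() -> Self {
        Self { global_gen: AtomicUsize::new(0), tbs: Vec::new() }
    }

    fn insert(&mut self, pc: u64) -> usize {
        let cur = self.global_gen.load(Ordering::Acquire);
        self.tbs.push(TranslationBlock { pc, created_gen: AtomicUsize::new(cur) });
        self.tbs.len() - 1
    }

    /// O(1): every existing TB becomes stale; nothing is cleared eagerly.
    fn invalidate_all(&self) {
        self.global_gen.fetch_add(1, Ordering::AcqRel);
    }

    /// Stale entries are filtered lazily by comparing generations.
    fn lookup(&self, pc: u64) -> Option<usize> {
        let cur = self.global_gen.load(Ordering::Acquire);
        self.tbs
            .iter()
            .position(|tb| tb.pc == pc && tb.created_gen.load(Ordering::Acquire) == cur)
    }
}
```

The trade-off is that stale entries linger until overwritten, which is why the commit still clears `code_pages` eagerly to avoid `is_code_page` false positives.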
- Flush clears page_heads
- Add gen check to invalidate_phys_page; fix test gen stamping
- Clean dead code
- Remove unlink_hash

All riscv-tests (84/84 PASS) and tg-rcore ch1-ch8 boot correctly.
No panics, no crashes. Performance: machina ~2.3s vs QEMU ~1.4s
before the generation-counter change; now 5.8x faster than QEMU.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Add a new machina-gdbstub crate and wire it into the
system and emulator entry point.

Implement a basic RSP server with register, memory,
breakpoint, single-step, and target XML support.

Add unit and integration tests for protocol handling,
state coordination, and production-path GDB flows.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
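Every message in the RSP server above is wrapped in the protocol's framing: `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The helpers below are a minimal sketch of that framing; `rsp_frame` and `rsp_parse` are hypothetical names, and escaping of `}`, `#`, `$` and `*` inside payloads is deliberately left out.

```rust
/// Frame a GDB Remote Serial Protocol packet as `$<payload>#<checksum>`.
fn rsp_frame(payload: &str) -> String {
    // Checksum: sum of payload bytes modulo 256.
    let sum = payload.bytes().fold(0u8, |acc, b| acc.wrapping_add(b));
    format!("${}#{:02x}", payload, sum)
}

/// Parse a framed packet and verify its checksum; returns the payload.
/// Escaped or run-length-encoded payloads are not handled here.
fn rsp_parse(packet: &str) -> Option<&str> {
    let body = packet.strip_prefix('$')?;
    let (payload, cs) = body.rsplit_once('#')?;
    let expected = u8::from_str_radix(cs, 16).ok()?;
    let sum = payload.bytes().fold(0u8, |acc, b| acc.wrapping_add(b));
    (cs.len() == 2 && sum == expected).then_some(payload)
}
```

For example, the reply `OK` frames as `$OK#9a`, since 0x4f + 0x4b = 0x9a.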
The gdbstub draft and plan files were generated during the
gen-plan session and subsequently moved to .humanize/plans/
by the user. Remove the copies from docs/plans/.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Wire the MObject trait into all platform devices (UART, PLIC, Aclint,
VirtioMmio) and SysBus so that every realized device participates in
the machine object hierarchy. SysBus now owns an MObjectState and
auto-attaches children on attach_to_bus (signature changes to &mut).

Add ChardevObject and ChardevResolver trait for path-based chardev
resolution, replacing direct CharFrontend injection. RefMachine gains
a MOM object tree with lookup-by-path/local-id, property introspection,
and a chardev resolver for UART backend wiring.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
…points

Instruction-level single-step (AC-1):
- Add tb_gen_code_cflags() for ephemeral 1-insn TBs with
  CF_SINGLE_STEP|1, bypassing TB cache
- Suppress IRQ/timer delivery during GDB single-step
- Complete step immediately after 1-insn TB executes

Multi-vCPU debugging (AC-2):
- Extend GdbState with per-CPU snapshots (Vec<GdbCpuSnapshot>),
  g_cpu_idx/c_cpu_idx for H-packet routing, stop_thread tracking
- Implement H-packet routing (Hg/Hc), thread enumeration
  (qfThreadInfo/qsThreadInfo/qThreadExtraInfo), T thread-alive
- Thread ID in all stop replies (T05thread:XX;)

CSR register access (AC-3):
- Add gdb_csr module with 30 common RISC-V CSRs (mstatus, mepc,
  satp, etc.) mapped to GDB register numbers 66+
- Dynamic XML generation with org.gnu.gdb.riscv.csr and
  org.gnu.gdb.riscv.virtual features
- CSR read/write via p/P packets, virtual/priv register at 65

Watchpoints (AC-4):
- Add WatchType enum and BTreeMap-based watchpoint tracking
- Z2/Z3/Z4 (write/read/access) and z2/z3/z4 handlers
- AtomicUsize counter for fast bail-out in memory helpers
- Watchpoint stop reply with watch/rwatch/awatch format

Other features:
- PhyMemMode toggle (AC-6): qqemu.PhyMemMode/Qqemu.PhyMemMode
- Bulk register write G packet (AC-7)
- qRcmd monitor passthrough placeholder (AC-5)
- Remove unused GuestCpu gdb_read/write methods
- Remove debug eprintln! traces from exec_loop and mem helpers
- Fix MObject dyn compatibility (is_type where Self: Sized)
- Fix Chardev Sync bound, SysBus root object default

Status: compiles, 41 test failures remain (mostly MOM tree
integration regressions from prior commit, plus gdbstub stop
reply format changes needing test updates).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Interior-mutable cell types for device register state.
DeviceCell<T> wraps Copy scalars, DeviceRefCell<T> wraps
complex state with parking_lot::Mutex for lightweight
per-device locking.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
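The two cell types above can be sketched as thin wrappers around a mutex so that device methods take `&self` and the device can be shared as `Arc<Device>`. This sketch substitutes `std::sync::Mutex` for `parking_lot::Mutex` to stay dependency-free, and the `ToyUart` device is hypothetical.

```rust
use std::sync::{Mutex, MutexGuard};

/// Sketch of DeviceCell<T>: an interior-mutable Copy scalar.
struct DeviceCell<T: Copy> {
    inner: Mutex<T>,
}

impl<T: Copy> DeviceCell<T> {
    fn new(v: T) -> Self {
        Self { inner: Mutex::new(v) }
    }
    fn get(&self) -> T {
        *self.inner.lock().unwrap()
    }
    fn set(&self, v: T) {
        *self.inner.lock().unwrap() = v;
    }
    /// Read-modify-write under a single lock acquisition.
    fn update(&self, f: impl FnOnce(T) -> T) {
        let mut guard = self.inner.lock().unwrap();
        *guard = f(*guard);
    }
}

/// Sketch of DeviceRefCell<T>: lock-guarded access to compound state.
struct DeviceRefCell<T> {
    inner: Mutex<T>,
}

impl<T> DeviceRefCell<T> {
    fn new(v: T) -> Self {
        Self { inner: Mutex::new(v) }
    }
    fn borrow_mut(&self) -> MutexGuard<'_, T> {
        self.inner.lock().unwrap()
    }
}

// Hypothetical toy device using both cell types.
#[derive(Default)]
struct ToyRegs {
    rx_fifo: Vec<u8>,
}

struct ToyUart {
    lsr: DeviceCell<u8>,
    regs: DeviceRefCell<ToyRegs>,
}

impl ToyUart {
    fn new() -> Self {
        Self { lsr: DeviceCell::new(0), regs: DeviceRefCell::new(ToyRegs::default()) }
    }
    /// Takes &self, so an input thread can call it on a shared Arc<ToyUart>.
    fn receive(&self, byte: u8) {
        self.regs.borrow_mut().rx_fifo.push(byte);
        self.lsr.update(|v| v | 0x01); // LSR data-ready bit
    }
}
```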
24 tests verifying x86-64 backend emits correct machine code
bytes for bit-count and rotate opcodes. Covers I64/I32 type
variants and extended register (R8-R15) encodings.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
InterruptSource wraps IrqSink with a fixed IRQ number,
providing raise/lower/set methods callable from any thread
with &self. Used alongside existing IrqLine during migration.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
sh1add, sh2add, sh3add, add.uw, slli.uw: address
computation instructions. All use pure IR combos
(shl + add or ext_u32 + shl). 17 tests added.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
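For reference, the architectural results of the five Zba instructions listed above can be written as plain Rust functions. These model the RV64 semantics that the IR combos must reproduce; they are not Machina's translator code.

```rust
/// shNadd: rd = (rs1 << N) + rs2, e.g. indexing arrays of 2/4/8-byte elements.
fn sh1add(rs1: u64, rs2: u64) -> u64 { (rs1 << 1).wrapping_add(rs2) }
fn sh2add(rs1: u64, rs2: u64) -> u64 { (rs1 << 2).wrapping_add(rs2) }
fn sh3add(rs1: u64, rs2: u64) -> u64 { (rs1 << 3).wrapping_add(rs2) }

/// add.uw: zero-extend the low 32 bits of rs1, then add rs2.
fn add_uw(rs1: u64, rs2: u64) -> u64 { (rs1 as u32 as u64).wrapping_add(rs2) }

/// slli.uw: zero-extend the low 32 bits of rs1, then shift left.
fn slli_uw(rs1: u64, shamt: u32) -> u64 { (rs1 as u32 as u64) << (shamt & 63) }
```

For example, `sh3add(i, base)` yields the address of element `i` in an array of 8-byte values starting at `base`.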
Basic bit manipulation: clz/ctz/cpop via IR opcodes,
andn/orn/xnor as IR combos, max/min via movcond,
rol/ror via RotL/RotR opcodes, rev8 via Bswap64,
orc.b via helper call, W-variants for 32-bit ops.
34 tests added.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
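A few of the Zbb operations above, written as reference functions. `orc_b` shows what the orc.b helper call has to compute, and the W-variant illustrates the RV64 rule that 32-bit results are sign-extended. These are illustrative models, not the helpers in this PR.

```rust
/// orc.b: each byte becomes 0xFF if it is non-zero, otherwise 0x00.
fn orc_b(x: u64) -> u64 {
    let mut out = 0u64;
    for i in 0..8u32 {
        if (x >> (i * 8)) & 0xff != 0 {
            out |= 0xffu64 << (i * 8);
        }
    }
    out
}

/// rev8 on RV64 is a full byte swap (lowered to Bswap64).
fn rev8(x: u64) -> u64 { x.swap_bytes() }

fn andn(a: u64, b: u64) -> u64 { a & !b }
fn xnor(a: u64, b: u64) -> u64 { !(a ^ b) }
fn cpop(x: u64) -> u64 { x.count_ones() as u64 }

/// clzw: count leading zeros of the low 32 bits.
fn clzw(x: u64) -> u64 { (x as u32).leading_zeros() as u64 }

/// rolw: rotate the low 32 bits, then sign-extend the 32-bit result.
fn rolw(a: u64, b: u64) -> u64 {
    (a as u32).rotate_left((b & 31) as u32) as i32 as i64 as u64
}
```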
Carry-less multiplication: clmul, clmulh, clmulr.
Implemented as helper calls (no x86-64 direct mapping).
22 tests added.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
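Since there is no direct x86-64 mapping, the helpers have to compute the carry-less products in software. A straightforward bit-serial version for XLEN=64 is sketched below: clmul returns bits 63:0 of the 128-bit carry-less product, clmulh bits 127:64, and clmulr bits 126:63. This is a reference model; the PR's helpers may use a different loop.

```rust
/// Low 64 bits of the carry-less product.
fn clmul(rs1: u64, rs2: u64) -> u64 {
    let mut out = 0u64;
    for i in 0..64 {
        if (rs2 >> i) & 1 == 1 {
            out ^= rs1 << i;
        }
    }
    out
}

/// High 64 bits of the carry-less product.
fn clmulh(rs1: u64, rs2: u64) -> u64 {
    let mut out = 0u64;
    for i in 1..64 {
        if (rs2 >> i) & 1 == 1 {
            out ^= rs1 >> (64 - i);
        }
    }
    out
}

/// Bits 126:63 of the carry-less product ("reversed" variant).
fn clmulr(rs1: u64, rs2: u64) -> u64 {
    let mut out = 0u64;
    for i in 0..64 {
        if (rs2 >> i) & 1 == 1 {
            out ^= rs1 >> (63 - i);
        }
    }
    out
}
```

A handy cross-check for tests is the identity `clmulr(a, b) == (clmulh(a, b) << 1) | (clmul(a, b) >> 63)`, which holds because bit 127 of a carry-less product of two 64-bit values is always zero.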
Single-bit operations: bclr/bext/binv/bset + immediate
variants. Translation and tests added. Decode patterns
added to insn32.decode, but the decodetree generator does not
emit dispatch for them yet; needs build.rs investigation.
30 tests added (19 currently failing pending decode fix).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
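The four Zbs register forms reduce to a masked bit index. On RV64 the index is `rs2 & 63`, and the immediate variants (bclri, bexti, binvi, bseti) use the shamt field instead of rs2. These reference functions are a sketch of the architectural behaviour that the new tests check against.

```rust
/// Bit index used by the register forms on RV64.
fn idx(rs2: u64) -> u64 { rs2 & 63 }

fn bset(rs1: u64, rs2: u64) -> u64 { rs1 | (1u64 << idx(rs2)) }
fn bclr(rs1: u64, rs2: u64) -> u64 { rs1 & !(1u64 << idx(rs2)) }
fn binv(rs1: u64, rs2: u64) -> u64 { rs1 ^ (1u64 << idx(rs2)) }
fn bext(rs1: u64, rs2: u64) -> u64 { (rs1 >> idx(rs2)) & 1 }
```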
MmioOps now requires Sync for multi-vCPU safety.
RegionType::Io holds Arc<dyn MmioOps> directly instead
of Arc<Mutex<Box<...>>>. Device-internal locking handles
serialization.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
PLIC source input (set_irq) is now fully lock-free using
AtomicU32 pending bitmap. Context state (claim/complete)
uses DeviceRefCell. PlicIrqSink holds Arc<Plic> directly.
Entire IRQ propagation chain is lock-free.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
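A lock-free pending bitmap like the one described above can be sketched with `AtomicU32` words: raising a source is a single `fetch_or`, and claiming uses `fetch_and`, whose return value tells exactly one caller that it cleared the bit. This sketch is illustrative only; `PendingBitmap` and its methods are hypothetical, and in the PR claim/complete state lives in a `DeviceRefCell` alongside priority and enable logic not shown here.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// One pending bit per PLIC source, packed into 32-bit atomic words.
struct PendingBitmap {
    words: Vec<AtomicU32>,
}

impl PendingBitmap {
    fn new(num_sources: usize) -> Self {
        Self { words: (0..(num_sources + 31) / 32).map(|_| AtomicU32::new(0)).collect() }
    }

    /// Callable from any device thread with &self; no lock taken.
    fn set_pending(&self, irq: usize) {
        self.words[irq / 32].fetch_or(1 << (irq % 32), Ordering::AcqRel);
    }

    /// Atomically clear the bit; returns true only for the caller that
    /// actually cleared it, so a source cannot be claimed twice.
    fn try_claim(&self, irq: usize) -> bool {
        let mask = 1u32 << (irq % 32);
        self.words[irq / 32].fetch_and(!mask, Ordering::AcqRel) & mask != 0
    }

    fn is_pending(&self, irq: usize) -> bool {
        self.words[irq / 32].load(Ordering::Acquire) & (1 << (irq % 32)) != 0
    }
}
```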
Cache block operations (cbo.clean, cbo.flush, cbo.inval,
cbo.zero) implemented as NOP. Machina has no cache
simulation. Linux probes and adapts. 10 tests added.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Uart16550 register state moved to DeviceRefCell<Uart16550Regs>.
SysBusDeviceState wrapped in parking_lot::Mutex for setup calls.
All public methods now take &self so the device can be shared via
Arc<Uart16550> without an outer Mutex. Chardev receive() works
lock-free via interior mutability from the input thread.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
All Bitmanip instructions disassemble correctly.
43 encoding-level tests verify each instruction.
Covers R-type, I-type, unary ops, and shift-immediate
variants across all 4 dispatch functions.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
ACLINT register state (mtimecmp, msip, epoch, mtime_base) in
DeviceRefCell<AclintRegs>. SysBusDeviceState in parking_lot::Mutex.
Timer outputs (IrqLine) and WfiWaker in parking_lot::Mutex for
setup. All methods &self, AclintMmio wraps Arc<Aclint> directly.
Remove MObject/MDevice trait impls, add object_info/with_mdevice.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
RefMachineChardevResolver and chardev_resolver() became
dead code after device migration to interior mutability.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Replace scattered const BASE/SIZE with RefMemMap enum
and REF_MEMMAP array. Add RefIrqMap for IRQ assignment.
Prepares for multi-slot VirtIO and new device addresses.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
VirtioMmio now holds Box<dyn VirtioDevice>. Device ID,
features, config, queue handling all delegated to the
backend device. Prepares for VirtioNet.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
At basic block boundaries the register allocator failed to
invalidate stale register mappings, causing cross-block
use of expired values. Add bb_boundary to flush mappings.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
guest_size was computed as num_insns * 4, incorrect for
2-byte compressed instructions. Use pc_next - pc_first.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
Replace ptr.add() with ptr.wrapping_add() in fetch_insn16
and fetch_insn32 to avoid UB on kernel virtual addresses
that wrap around the address space.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
SC helper returned invalid host address on TLB miss by
using guest_base as fallback addend. Return SC failure
instead (spurious failure is legal per RISC-V spec).

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
JIT goto_tb direct jumps could form infinite loops.
Add AtomicI32 neg_align field checked before each jump.
Timer interrupts set flag to break the chain.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
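The chain-break mechanism can be modelled in plain Rust: a loop stands in for TBs linked by goto_tb, and the only way out is the flag that another thread sets. This is a minimal sketch, assuming `neg_align` is an `AtomicI32` where a negative value requests an exit; `run_chained_tbs` is hypothetical, and in the real JIT the check is emitted as machine code before each direct jump.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Toy model of chained TBs that never return to the exec loop on their own.
/// Returns the number of "TBs" executed before the chain was broken.
fn run_chained_tbs(neg_align: &AtomicI32) -> u64 {
    let mut executed = 0u64;
    loop {
        executed += 1; // stand-in for executing one TB
        // Checked before taking the goto_tb direct jump to the next TB.
        if neg_align.load(Ordering::Acquire) < 0 {
            neg_align.store(0, Ordering::Release); // acknowledge, leave the chain
            return executed;
        }
    }
}
```

In the usage below (see the test), a "timer" thread sets the flag to -1 after a short sleep and the chain exits. Without the check, the loop would spin forever, which is exactly the hang the commit describes.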
Restore helper_orc_b and helper_clmul/h/r that were
accidentally dropped. Fix helper_sc to return SC failure
on TLB miss instead of using invalid guest_base addend.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
Add page_byte_limit check in translate_insn. When pc_next
reaches the page boundary, terminate TB to prevent reading
garbage bytes from beyond the mapped guest page.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
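The page-limit arithmetic behind this check is small enough to sketch. The bytes remaining in a 4 KiB page are `4096 - (pc & 0xfff)`, and a RISC-V instruction is 32-bit when the low two bits of its first halfword are `0b11`, otherwise 16-bit (RVC). The helper names are hypothetical, and `fits_in_page` is one possible formulation of the stop condition (catching a 32-bit instruction that would straddle the boundary), not necessarily the exact test in translate_insn.

```rust
const PAGE_SIZE: u64 = 4096;

/// Bytes left between `pc` and the end of its 4 KiB guest page.
fn page_byte_limit(pc: u64) -> u64 {
    PAGE_SIZE - (pc & (PAGE_SIZE - 1))
}

/// Whether the instruction whose first halfword is `first_halfword`
/// fits entirely in the page containing `pc_next`.
fn fits_in_page(pc_next: u64, first_halfword: u16) -> bool {
    let len: u64 = if first_halfword & 0b11 == 0b11 { 4 } else { 2 };
    len <= page_byte_limit(pc_next)
}
```

For instance, at `pc_next = 0x8000_0ffe` only 2 bytes remain, so an RVC instruction still fits but a 32-bit one would read past the page.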
zevorn added 14 commits April 6, 2026 12:25
Add Illegal Instruction exception (cause=2) handler for
EXCP_UNDEF instead of silently exiting the emulator.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
Add clock-frequency = 3686400 to UART FDT node (required
by Linux of_serial driver). Add initrd field to MachineOpts
and -initrd/-append CLI arguments. Write bootargs to FDT
/chosen node.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
Fix wildcard-in-or-patterns in gdbstub handler, derivable
impl in gdb snapshot, OR pattern range in gdb breakpoints.
Auto-format helpers and main.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Account for Zba(5) + Zbb(29) + Zbs(8) + Zbc(3) +
Zicbom/Zicboz(4) = 49 new instruction decode patterns.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
RV64GC profile now enables Bitmanip extensions by default
(matching QEMU virt). FDT ISA string updated to include
zba_zbb_zbc_zbs_zicsr_zifencei.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Remove Ecall-from-S from MEDELEG mask (incorrect delegation).
Validate SATP mode (only accept Bare=0 and Sv39=8).
Fix PMP NAPOT range overflow when g >= 61.
Implement initrd loading in boot.rs with FDT chosen node.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
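The PMP NAPOT overflow comes from the region size: with `t` trailing one bits in pmpaddr, the region covers `2^(t+3)` bytes, which no longer fits in a `u64` once `t` reaches 61. The sketch below does the range arithmetic in `u128` to sidestep that. It assumes `t` is the trailing-ones count and that pmpaddr holds the address shifted right by 2; whether this matches the commit's `g` exactly is an assumption, and `napot_range`/`napot_contains` are hypothetical names.

```rust
/// Decode a NAPOT pmpaddr into a [base, end) byte range using u128,
/// so the size 2^(t+3) cannot overflow.
fn napot_range(pmpaddr: u64) -> (u128, u128) {
    let t = pmpaddr.trailing_ones(); // 0..=64
    // Clear the t trailing one bits; pmpaddr stores address >> 2.
    let base_units = if t >= 64 { 0 } else { pmpaddr & !((1u64 << t) - 1) };
    let base = (base_units as u128) << 2;
    let size = 1u128 << (t + 3); // at most 2^67, fits in u128
    (base, base + size)
}

fn napot_contains(pmpaddr: u64, addr: u64) -> bool {
    let (base, end) = napot_range(pmpaddr);
    (addr as u128) >= base && (addr as u128) < end
}
```

For example, pmpaddr `0x2000_01ff` (9 trailing ones) describes the 4 KiB region starting at `0x8000_0000`, while pmpaddr `2^61 - 1` yields a size of exactly `2^64`, the case that wraps in 64-bit arithmetic.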
Rejection tests now explicitly disable the corresponding
extension in RiscvCfg instead of relying on it being off
by default (which changed after enabling Zb* globally).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Fix handle_interrupt to use precise mirroring of hardware
mip bits from shared_mip instead of OR (which made bits
sticky). Leave 64KB margin at top of RAM for FDT placement
so OpenSBI scratch space doesn't exceed RAM boundary.
Add generate_fdt_with() for initrd/bootargs FDT regeneration.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
mmu.set_satp() was called with the raw new_val instead of
the validated csr.satp value. When the kernel probes Sv48/
Sv57 modes (which we reject), the MMU would be set to the
rejected mode while csr.satp stayed at 0. This caused page
table walks with an invalid root PPN, producing infinite
instruction page faults.

Fix: use self.cpu.csr.satp (post-validation) for
mmu.set_satp() instead of raw new_val.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
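The ordering fix can be sketched as a tiny CSR write path. In RV64 the satp MODE field is bits 63:60, and the privileged spec says a write with an unsupported MODE has no effect. Syncing the MMU from the post-validation CSR keeps the two consistent when Linux probes Sv48/Sv57. `Csrs`, `Mmu` and `write_satp` are hypothetical stand-ins for Machina's types.

```rust
const SATP_MODE_SHIFT: u32 = 60;
const MODE_BARE: u64 = 0;
const MODE_SV39: u64 = 8;

// Hypothetical stand-ins for the CPU CSR file and the MMU state.
struct Csrs { satp: u64 }
struct Mmu { satp: u64 }

fn write_satp(csr: &mut Csrs, mmu: &mut Mmu, new_val: u64) {
    let mode = new_val >> SATP_MODE_SHIFT;
    // Unsupported MODE: the whole write is ignored, csr.satp keeps its value.
    if mode == MODE_BARE || mode == MODE_SV39 {
        csr.satp = new_val;
    }
    // The fix: take the MMU value from the validated CSR, not from new_val.
    mmu.satp = csr.satp;
}
```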
rev8 shamt was 24 (RV32) instead of 56 (RV64). Linux kernel
uses rev8 in interrupt handlers, causing illegal instruction.
Also increase CPU_TEMP_BUF_NLONGS from 128 to 512 to handle
complex kernel TB spill demands.

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
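The rev8 bug is easy to see by splitting the encodings. Both variants share funct6 = `011010` in bits 31:26 and differ only in the 6-bit shamt field in bits 25:20: 24 on RV32 and 56 on RV64. A decode pattern fixed at shamt=24 therefore never matches the RV64 instruction Linux emits. The constants below encode `rev8 a0, a0`; the helper names are illustrative.

```rust
/// funct6 field, bits 31:26.
fn funct6(insn: u32) -> u32 { insn >> 26 }
/// 6-bit shamt field, bits 25:20 (RV64 shift-immediate layout).
fn shamt6(insn: u32) -> u32 { (insn >> 20) & 0x3f }

const REV8_A0_RV32: u32 = 0x6985_5513; // rev8 a0, a0 (RV32 encoding)
const REV8_A0_RV64: u32 = 0x6b85_5513; // rev8 a0, a0 (RV64 encoding)
```

On RV64 the instruction reverses all eight bytes, which is why it lowers to Bswap64 (`u64::swap_bytes` in Rust).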
Fix critical Linux boot stall caused by two issues:

1. neg_align_ptr was captured before add_cpu() moved the
   FullSystemCpu into the CpuManager Vec, making the timer
   thread write to a stale address.  Move connect_neg_align
   after add_cpu so the pointer reflects the final location.

2. When interrupts are pending but not deliverable (e.g.,
   SIE=0 in a critical section), the exec loop now keeps
   neg_align=-1 set to break goto_tb chains on every
   iteration, re-checking until the guest re-enables
   interrupts.  Without this, the kernel's seqlock spin
   loops would deadlock waiting for timer interrupts that
   could never be delivered.

Also enlarge the JIT code buffer from 16 MiB to 256 MiB to
reduce BufferFull-triggered global TB invalidation during
kernel boot, and fix the rev8 test encoding (RV64 shamt=56).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
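Fix 2 can be sketched as one exec-loop step. If an enabled interrupt is pending but globally masked, the loop re-arms the chain-break flag so the JIT returns after every TB and re-checks. This is a simplified model, assuming a single `deliverable` boolean stands in for the full privilege/SIE rules; `service_interrupts` is a hypothetical name, and STIP is mip bit 5.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

const MIP_STIP: u64 = 1 << 5; // supervisor timer interrupt pending

/// Returns true if an interrupt should be taken now (trap entry not modelled).
fn service_interrupts(mip: u64, mie: u64, deliverable: bool, neg_align: &AtomicI32) -> bool {
    let pending = mip & mie;
    if pending == 0 {
        return false;
    }
    if deliverable {
        return true;
    }
    // Pending but masked (e.g. SIE=0): keep breaking goto_tb chains so the
    // loop notices as soon as the guest re-enables interrupts.
    neg_align.store(-1, Ordering::Release);
    false
}
```

As a design note, keeping the flag in its own `Arc<AtomicI32>` shared by the CPU and the timer thread, rather than reaching it through a raw pointer into `FullSystemCpu`, would also make fix 1's ordering hazard impossible, because moving the CPU into the `CpuManager` Vec would not move the flag. That is an alternative, not what this PR does.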
Three issues found and fixed:

1. The test wrote P5 (x5/t0) during GDB pause, but the
   MROM reset vector uses t0 for address computation.
   The dirty snapshot restore overwrote t0, causing the
   jump target load to read from a garbage address (PC=0).
   Fix: write to s1 (x9) which is unused by the reset
   vector.

2. The step reply assertion expected exact "T05thread:01;"
   but the server returns "T05thread:01;swbreak:;" when
   the previous stop was a breakpoint. Fix: use
   starts_with instead of exact match.

3. The breakpoint at 0x80000000 re-fired on every
   single-step attempt because gdb_check_breakpoint ran
   before the step could execute the instruction. Fix:
   skip breakpoint check during GDB single-step mode.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Add run_with_retry helper for tests that need BufferFull
handling.  Update the two ignored softmmu tests with
descriptive ignore reasons: both hang because the exec
loop enters an infinite TB translation cycle (BufferFull
storm from self-modifying code detection or exception
delivery loop).  These require a deeper exec loop fix
to resolve.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Merge three correctness fixes from #19:

1. FPU mstatus init (csr.rs): set FS=Initial (bit 13) at
   reset so FP instructions are legal when F/D extensions
   are present. Without this, any OS using FPU traps on the
   first FP instruction. Matches QEMU reset behaviour.

2. UART 16550 loopback/modem (uart.rs): implement MCR
   loopback mode (THR output routed back to RX FIFO) and
   MSR signal lines (CTS/DSR/DCD). Some kernels check
   modem status before using the UART.

3. PLIC edge-triggered IRQ (plic.rs): change set_irq to
   only latch pending on rising edge (0->1 transition).
   complete_irq no longer auto-re-pends based on wire
   level. This matches QEMU semantics and prevents
   interrupt storms when the guest defers source-clearing
   to a bottom-half/task.

Update PLIC and ref_machine tests for edge-triggered
semantics: lowering a source no longer clears pending
(only claim does).

Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn>
Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
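The edge-triggered change in item 3 can be captured by a per-source model. Pending latches only on a 0->1 wire transition, lowering the wire leaves pending alone, only claim clears it, and completion does not re-pend from a still-high wire. This sketch uses plain `&mut self` fields for clarity rather than the lock-free atomics in the PR; `EdgeSource` is hypothetical.

```rust
/// Hypothetical per-source model of edge-triggered PLIC semantics.
#[derive(Default)]
struct EdgeSource {
    level: bool,   // current input wire level
    pending: bool, // latched pending bit
}

impl EdgeSource {
    fn set_irq(&mut self, level: bool) {
        if level && !self.level {
            self.pending = true; // rising edge only
        }
        self.level = level; // lowering the wire does not clear pending
    }

    /// Claim clears pending and reports whether it was set.
    /// Completion is not modelled because it no longer re-pends.
    fn claim(&mut self) -> bool {
        std::mem::replace(&mut self.pending, false)
    }
}
```

With this model a device that keeps its line high while the guest defers clearing it to a bottom half produces a single interrupt instead of a storm.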
@zevorn closed this Apr 6, 2026
@zevorn deleted the linux-boot-fixes branch April 6, 2026 13:51

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 448b465d71


Comment thread system/src/gdb.rs
Comment on lines +231 to +233
```rust
if addr >= ram_base && addr + len as u64 <= ram_end {
    let off = (addr - ram_base) as usize;
    let ptr = unsafe { (ram_ptr as *const u8).add(off) };
```

P1: Guard GDB RAM bounds checks against u64 overflow

The RAM fast-path check uses addr + len as u64 <= ram_end without checked_add, so a crafted GDB memory request near u64::MAX can wrap the addition, pass the bounds test, and then drive ptr.add(off) with a huge offset. In that case the stub may read/write out of bounds or crash instead of rejecting the packet. The same overflow pattern is repeated in write_memory, so both read and write paths should use overflow-safe range validation.
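An overflow-safe version of the range check suggested above can be sketched with `u64::checked_add`, which returns `None` instead of wrapping. `ram_offset` is a hypothetical helper that read_memory and write_memory could share; `ram_end` is treated as exclusive, matching the original `<=` comparison, and the final cast assumes a 64-bit host.

```rust
/// Returns the offset into RAM if [addr, addr + len) lies inside
/// [ram_base, ram_end), rejecting requests whose end would wrap.
fn ram_offset(addr: u64, len: usize, ram_base: u64, ram_end: u64) -> Option<usize> {
    let end = addr.checked_add(len as u64)?; // None on u64 overflow
    if addr >= ram_base && end <= ram_end {
        Some((addr - ram_base) as usize) // assumes a 64-bit host
    } else {
        None
    }
}
```

With the naive check, `(u64::MAX - 1) + 4` wraps to 2, which passes `<= ram_end` even though the request lies far outside RAM.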


Comment on lines +649 to +650
```rust
let tb_idx = unsafe { shared.tb_store.alloc(pc, flags, cflags) }?;
```


P2: Reclaim single-step TBs instead of leaking allocations

Single-step mode translates a fresh TB on every step (run_exec_loop calls tb_gen_code_cflags), but this helper always allocates a new TB index and never reuses or invalidates it afterward. Because these TBs are not inserted into normal caches yet still consume TbStore entries/code buffer, long debugging sessions eventually exhaust MAX_TBS or translation space, after which stepping returns BufferFull and cannot continue.

