Conversation
Stage-gated incremental strategy with 5 stages (S0-S4), hybrid difftest (serial output + event tracing), and RLCR iteration loop per stage. Signed-off-by: machina-ai <machina-ai@gevico.org>
Stage-gated incremental plan for booting OpenSBI + Linux 6.12.51 to userspace shell. Covers difftest infrastructure, S0-S4 stages, and final validation. Signed-off-by: machina-ai <machina-ai@gevico.org>
Structured implementation plan with acceptance criteria, path boundaries, phase dependencies, and risk estimates. Signed-off-by: machina-ai <machina-ai@gevico.org>
Thread-local event tracer for CSR, exception, and MMIO events. Enabled via --trace <file> flag. Output is one structured record per line for comparison with QEMU -d traces. Signed-off-by: machina-ai <machina-ai@gevico.org>
Add trace calls to handle_priv_csr (CSR writes), handle_exception (exception entries), and AddressSpace read/write (MMIO dispatch). These are are trace records for a structured log for debugging boot failures against QEMU reference. Signed-off-by: machina-ai <machina-ai@gevico.org>
Signed-off-by: machina-ai <machina-ai@gevico.org>
OpenSBI v1.5.1 + Linux 6.12.51 full boot log on virt machine. Updated gen_ref.sh with correct rootfs path. Signed-off-by: machina-ai <machina-ai@gevico.org>
…inux kernel 6.12.51 all the way to a user-space shell. The boot completes within 60 seconds, displaying Please press Enter to activate this console., and the busybox shell can execute commands (e.g., ls /, uname -a).
Critical Bug Fixes
1. Register Allocator bb_boundary Bug (accel/src/regalloc.rs)
At basic block boundaries (label positions), the register allocator failed to invalidate stale register mappings, causing cross-block use of expired register values that resulted in pc=0 crashes. Added a bb_boundary function to correctly flush non-fixed temporary mappings at labels.
2. TB Size Calculation Error (system/src/cpus.rs)
guest_size was computed as num_insns * 4, which is incorrect for 2-byte compressed (RVC) instructions. Changed to pc_next - pc_first for accurate size calculation.
3. Infinite goto_tb Chain (accel/src/x86_64/emitter.rs, guest/riscv/src/riscv/cpu.rs)
JIT-generated goto_tb direct jumps could form infinite loops with no mechanism to break out (unlike QEMU's icount_decr / exit_request). Implemented a neg_align atomic flag:
Added an AtomicI32 field to RiscvCpu
emit_goto_tb now checks this flag before jumping; if negative, the chain breaks
ACLINT timer threads set the flag to -1 on interrupt to break the chain
exec_loop resets the flag at the start of each iteration
4. Missing FDT clock-frequency (hw/riscv/src/ref_machine.rs)
The UART device tree node lacked a clock-frequency property, causing the of_serial driver probe to fail with -ENOENT, which prevented the kernel console from being registered. Added clock-frequency = 3686400.
5. helper_sc NULL Pointer Dereference (guest/riscv/src/riscv/trans/helpers.rs)
The SC (store-conditional) helper used guest_base as a fallback addend on TLB miss, producing invalid host addresses for Sv39 user-mode virtual addresses. Fix: return SC failure on TLB miss (spurious failure is allowed by the RISC-V specification).
6. Interrupt Bit Synchronization (system/src/cpus.rs)
Hardware-controlled mip bits (MTI/MSI/MEI/SEI) were synchronized via simple OR, causing bits to become sticky. Changed to precise mirroring from shared_mip, leaving software-controlled bits (STIP/SSIP) untouched.
7. ACLINT 4-byte Split Access (hw/intc/src/aclint.rs)
ACLINT read/write handlers ignored the size parameter, failing to handle 4-byte split accesses to 8-byte mtime/mtimecmp registers.
8. Instruction Fetch Pointer UB (guest/riscv/src/riscv/mod.rs)
fetch_insn16/fetch_insn32 used .add() on raw pointers which has undefined behavior when the result wraps around the address space (common for kernel virtual addresses). Changed to wrapping_add for well-defined wrapping arithmetic.
Test Status
All 1139 unit tests pass
Linux 6.12.51 + initramfs boots to user-space shell
------
example
------
./target/release/machina \
-M riscv64-ref \
-m 256M \
-nographic \
-kernel /path/to/linux-6.12.51/arch/riscv/boot/Image \
-initrd /path/to/linux-6.12.51/chytest/rootfs.cpio.gz \
-append "console=ttyS0 earlycon"
There was a problem hiding this comment.
Pull request overview
This PR advances the Machina RISC-V full-system emulator to successfully boot OpenSBI and Linux 6.12.51 to a userspace shell, adding difftest tooling and multiple correctness fixes across the JIT, CPU/MMU, interrupt/timer devices, and FDT/boot pipeline.
Changes:
- Add initrd + kernel cmdline plumbing (CLI →
MachineOpts→ FDT/chosen→ boot loader), plus new difftest scripts/logs. - Add event tracing (
--trace) and instrument CSR/exception/MMIO paths to aid QEMU trace comparison. - Fix several runtime correctness issues impacting Linux boot (e.g., TB sizing for RVC, goto_tb chain breaking, regalloc BB boundary invalidation, ACLINT split 32-bit accesses, safer fetch pointer math, SC helper behavior).
Reviewed changes
Copilot reviewed 36 out of 37 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
util/src/trace.rs, util/src/lib.rs |
New trace module and export. |
src/main.rs |
CLI flags for -initrd, -append, --trace; wiring neg_align into ACLINT. |
core/src/machine.rs |
Extend MachineOpts with initrd. |
hw/riscv/src/ref_machine.rs, hw/riscv/src/boot.rs |
FDT updates (bootargs, initrd props, UART clock); initrd loading; QEMU-style FDT placement. |
hw/intc/src/aclint.rs |
Split 4B/8B access handling + neg_align signaling on timer interrupts. |
accel/src/x86_64/emitter.rs, accel/src/exec/exec_loop.rs |
goto_tb chain-breaking check + flag reset each loop. |
memory/src/address_space.rs |
MMIO trace hooks. |
tools/difftest/* |
Reference generation + serial diff tooling and reference log. |
Cargo.toml |
Add machina-util dependency; adjust release profile. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| static TRACE_ENABLED: AtomicBool = AtomicBool::new(false); | ||
|
|
||
| thread_local! { | ||
| static TRACE_FILE: RefCell<Option<File>> = RefCell::new(None); | ||
| } | ||
|
|
||
| pub fn init_trace(path: &str) -> std::io::Result<()> { | ||
| let f = File::create(path)?; | ||
| TRACE_FILE.with(|t| *t.borrow_mut() = Some(f)); | ||
| TRACE_ENABLED.store(true, Ordering::Relaxed); | ||
| Ok(()) |
There was a problem hiding this comment.
TRACE_FILE is thread-local, but init_trace() only initializes it for the calling thread. If tracing is enabled and events are emitted from any other thread (e.g., future multi-vCPU exec threads), those events will be silently dropped because TRACE_ENABLED is global but TRACE_FILE is None in other threads. Consider using a global OnceLock<Mutex<File>> (or similar) so all threads write to the same sink, or make per-thread files explicit (e.g., suffix with thread id) and initialize them lazily when first used.
| if machina_util::trace::is_enabled() { | ||
| machina_util::trace::trace_mmio( | ||
| addr.0, size as u8, 0, false, "", | ||
| ); | ||
| } | ||
| let ops = ops.lock().unwrap(); | ||
| ops.read(region_off, size) |
There was a problem hiding this comment.
MMIO trace events currently log dev as an empty string, and the read-path logs val=0 before performing the read. This makes the trace ambiguous and can be misleading when comparing against QEMU traces. Consider threading through the MemoryRegion name (or FlatRange metadata) as dev, and logging the actual read value after ops.read() returns.
| if machina_util::trace::is_enabled() { | |
| machina_util::trace::trace_mmio( | |
| addr.0, size as u8, 0, false, "", | |
| ); | |
| } | |
| let ops = ops.lock().unwrap(); | |
| ops.read(region_off, size) | |
| let ops = ops.lock().unwrap(); | |
| let val = ops.read(region_off, size); | |
| if machina_util::trace::is_enabled() { | |
| machina_util::trace::trace_mmio( | |
| addr.0, size as u8, val, false, "io", | |
| ); | |
| } | |
| val |
| OUTDIR="${1:-.}" | ||
| mkdir -p "$OUTDIR" | ||
| KERNEL="/home/chyyuu/thecodes/buildkernel/linux-6.12.51/arch/riscv/boot/Image" | ||
| INITRD="/home/chyyuu/thecodes/buildkernel/linux-6.12.51/chytest/rootfs.cpio.gz" | ||
| echo "Generating QEMU reference serial output..." |
There was a problem hiding this comment.
This script hard-codes absolute KERNEL/INITRD paths under /home/..., which makes it unusable for other developers and CI. Consider taking kernel/initrd paths as arguments (or environment variables) and failing with a clear usage message if they’re missing.
| let initrd_size = std::fs::metadata(path) | ||
| .map_err(|e| format!("initrd: {}: {}", path.display(), e))? | ||
| .len(); | ||
| let initrd_start = | ||
| RAM_BASE + ((self.ram_size - 32 * 1024 * 1024) & !0xFFF); | ||
| let initrd_end = initrd_start + initrd_size; | ||
| if initrd_end > RAM_BASE + self.ram_size - 0x1000 { | ||
| return Err(format!( |
There was a problem hiding this comment.
initrd_start is computed with self.ram_size - 32 * 1024 * 1024 without checking that ram_size is at least 32MiB. For smaller RAM sizes this subtraction can underflow (panic in debug; wrap in release), yielding an invalid initrd placement address and potentially allowing out-of-bounds loads. Add an explicit size check (e.g., require ram_size >= 32MiB + initrd_size + safety_margin) or use checked/saturating arithmetic and return a clear error when RAM is too small for the initrd placement strategy.
| let initrd_size = std::fs::metadata(path) | |
| .map_err(|e| format!("initrd: {}: {}", path.display(), e))? | |
| .len(); | |
| let initrd_start = | |
| RAM_BASE + ((self.ram_size - 32 * 1024 * 1024) & !0xFFF); | |
| let initrd_end = initrd_start + initrd_size; | |
| if initrd_end > RAM_BASE + self.ram_size - 0x1000 { | |
| return Err(format!( | |
| const INITRD_RESERVED_TAIL: u64 = 32 * 1024 * 1024; | |
| const INITRD_SAFETY_MARGIN: u64 = 0x1000; | |
| let initrd_size = std::fs::metadata(path) | |
| .map_err(|e| format!("initrd: {}: {}", path.display(), e))? | |
| .len(); | |
| let min_ram_size = INITRD_RESERVED_TAIL | |
| .checked_add(initrd_size) | |
| .and_then(|v| v.checked_add(INITRD_SAFETY_MARGIN)) | |
| .ok_or_else(|| { | |
| format!( | |
| "initrd ({} bytes) is too large for the RAM placement calculation", | |
| initrd_size | |
| ) | |
| })?; | |
| if self.ram_size < min_ram_size { | |
| return Err(format!( | |
| "ram_size ({} bytes) is too small for initrd placement; need at least {} bytes", | |
| self.ram_size, min_ram_size | |
| ) | |
| .into()); | |
| } | |
| let initrd_offset = self | |
| .ram_size | |
| .checked_sub(INITRD_RESERVED_TAIL) | |
| .ok_or_else(|| { | |
| format!( | |
| "ram_size ({} bytes) is too small for initrd placement", | |
| self.ram_size | |
| ) | |
| })?; | |
| let initrd_start = RAM_BASE | |
| .checked_add(initrd_offset & !0xFFF) | |
| .ok_or_else(|| "initrd start address overflow".to_string())?; | |
| let initrd_end = initrd_start | |
| .checked_add(initrd_size) | |
| .ok_or_else(|| "initrd end address overflow".to_string())?; | |
| let max_initrd_end = RAM_BASE | |
| .checked_add(self.ram_size) | |
| .and_then(|v| v.checked_sub(INITRD_SAFETY_MARGIN)) | |
| .ok_or_else(|| "RAM end address underflow/overflow".to_string())?; | |
| if initrd_end > max_initrd_end { | |
| return Err(format!( |
| /// Add a CPU to be managed. Ownership is transferred. | ||
| pub fn add_cpu(&mut self, cpu: FullSystemCpu) { | ||
| /// Returns a mutable reference to the added CPU so | ||
| /// callers can obtain stable pointers after placement. | ||
| pub fn add_cpu( | ||
| &mut self, | ||
| cpu: FullSystemCpu, | ||
| ) -> &mut FullSystemCpu { | ||
| self.cpus.push(cpu); | ||
| self.cpus.last_mut().unwrap() | ||
| } |
There was a problem hiding this comment.
add_cpu() returns a mutable reference and the comment claims this enables callers to obtain “stable pointers after placement”, but CPUs are stored in a Vec<FullSystemCpu>. Any subsequent push() that reallocates will move existing elements and invalidate previously taken raw pointers (e.g., the neg_align pointer handed to ACLINT). If you need stable addresses, store CPUs as Box<FullSystemCpu> (or Pin<Box<...>>) or reserve capacity for all CPUs up-front before taking pointers.
| /// ```text | ||
| /// cmp DWORD [rbp + neg_off], 0 | ||
| /// js past_jmp ; neg → exit TB | ||
| /// <nop padding for alignment> | ||
| /// JMP rel32 ; patched to next TB | ||
| /// past_jmp: | ||
| /// <fall through to exit_tb> | ||
| /// ``` | ||
| /// | ||
| /// The disp32 field is aligned to 4 bytes so that | ||
| /// concurrent patching is atomic on x86-64. | ||
| pub fn emit_goto_tb( | ||
| &self, | ||
| buf: &mut CodeBuffer, | ||
| neg_off: i32, | ||
| ) -> (usize, usize) { | ||
| // cmp DWORD [rbp + neg_off], 0 | ||
| // 83 BD <disp32> 00 | ||
| buf.emit_u8(0x83); | ||
| buf.emit_u8(0xBD); | ||
| buf.emit_u32(neg_off as u32); | ||
| buf.emit_u8(0x00); |
There was a problem hiding this comment.
The generated goto_tb sequence reads neg_align via a plain memory operand (cmp DWORD [rbp + neg_off], 0). Since neg_align is an AtomicI32 written from other threads, mixing atomic stores with non-atomic reads is undefined behavior in Rust’s memory model (and can also break on architectures where plain loads aren’t atomic/ordered). Consider making the JIT side perform an atomic load (e.g., via a tiny helper that does AtomicI32::load(Ordering::Relaxed)), or change the design so both sides use non-atomic + external synchronization (e.g., stop-the-world), or place the flag in shared memory that is only accessed atomically from all threads.
| FlatRangeKind::Io { ops } => { | ||
| if machina_util::trace::is_enabled() { | ||
| machina_util::trace::trace_mmio( | ||
| addr.0, size as u8, val, true, "", | ||
| ); | ||
| } | ||
| let ops = ops.lock().unwrap(); | ||
| ops.write(region_off, size, val); | ||
| } |
There was a problem hiding this comment.
MMIO write trace events also log dev as an empty string, which makes it hard to correlate accesses with specific devices/regions when debugging. Consider passing a useful device/region identifier (e.g., MemoryRegion name) into the trace call.
| machina-util = { workspace = true } | ||
| libc = { workspace = true } | ||
|
|
||
| [profile.release] |
There was a problem hiding this comment.
[profile.release] debug = 1 changes release build settings for the entire workspace, increasing binary size and potentially impacting performance. If the goal is just better backtraces for local debugging, consider using RUSTFLAGS/cargo build -Z options locally, or gating this behind a dedicated Cargo profile (e.g., profile.release-with-debug) so production/CI release builds remain unchanged.
| [profile.release] | |
| [profile.release-with-debug] | |
| inherits = "release" |
…CP_UNDEF Two bugs caused machina to silently exit when running user-mode binaries (e.g., busybox /init), preventing the Linux shell from starting: 1. Page boundary crossing in TB translation (root cause): gen_code computed max_insns as `avail_bytes / 2`, assuming each instruction is 2 bytes (compressed). However, 4-byte instructions consume 2 halfwords but only count as 1 instruction, so the translator could exceed the page boundary when mixing 32-bit and compressed instructions. Beyond the boundary, `guest_base` (valid only for the current page) would read wrong memory, producing garbage bytes that land the PC mid-instruction. Fix: add `page_byte_limit` to RiscvDisasContext and check it after each instruction in translate_insn. When pc_next reaches the page boundary, the TB is terminated with TooMany. This cooperates with the existing cross_page_insn mechanism for boundary-spanning instructions. 2. Missing EXCP_UNDEF handler in exec_loop: When the translator encountered undecodable bytes (garbage from the page crossing bug), it generated EXCP_UNDEF. But exec_loop had no match arm for EXCP_UNDEF, so it returned ExitReason::Exit(5) which caused machina to print "execution exited" and terminate, instead of raising an Illegal Instruction exception (cause=2) for the guest kernel to handle. Fix: add EXCP_UNDEF arm that calls handle_exception(2, pc).
…yOS support
Three fixes to enable StarryOS (ArceOS-based RISC-V OS) to boot on Machina:
1. UART loopback mode (hw/char/src/uart.rs):
When MCR bit 4 (LOOP_BACK) is set, route THR writes back into
the RX FIFO instead of the chardev. The uart_16550 crate's
test_loopback() sends 0x42 and a 16-byte message, expecting
them echoed back -- without this fix the test byte leaks to the
console ('B') and the blocking read hangs forever.
2. UART modem status signals (hw/char/src/uart.rs):
Set MSR to CTS|DSR|DCD (0xB0) when a chardev is attached,
indicating the virtual receiver is ready. The uart_16550 v0.5
ready_to_send() checks MSR::CTS before transmitting; with
MSR=0 all output after init was blocked indefinitely.
Also implement MCR->MSR loopback mapping per 16550 spec
(DTR->DSR, RTS->CTS, OUT1->RI, OUT2->DCD).
3. mstatus.FS initialisation (guest/riscv/src/riscv/csr.rs):
Initialise mstatus with FS=Initial (bit 13) so that FP
instructions are legal when F/D extensions are present.
Previously FS=Off caused IllegalInstruction on the first
fmv.d.x in clear_fp_registers() during task context setup.
Matches QEMU's CPU reset behaviour.
Import VirtioDevice trait in pci.rs so trait methods (features, config_read, handle_queue) resolve on VirtioBlk, and add the queue_idx parameter required by the new handle_queue signature.
Symptom StarryOS's busybox init process crashed with a segmentation fault on Machina (store page fault at VA 0xcb000, WRITE | USER), while the identical binary ran perfectly on QEMU. The OS booted through RustSBI, ArceOS initialization, VirtIO block device discovery, ext4/pseudofs mounting, and entered user space — but busybox was killed by SIGSEGV immediately after launch. Diagnosis Instrumented Machina with diagnostic prints in the exception handler (exception.rs), syscall path (lib.rs), and memory fault handler (cpus.rs) — including full GPR dumps, Sv39 page table walks, and user-mode ecall argument logging. Identified the faulting instruction: sb zero, 16(a5) at PC 0x402200a inside ld-musl-riscv64.so.1. The store target 0xcb000 was unmapped. Traced the bad address upstream: Register a5 was computed from a4, which was set by addiw a4, s0, -20 at PC 0x4021ffe. With s0 = 0x50, the correct result should be a4 = 0x3c, yielding store address 0xcafec (within the mapped BSS page at 0xca000). Instead, a4 = 0x50 (unchanged from s0), shifting the address to 0xcb000. Ruled out JIT codegen issues: Exhaustively reviewed the addiw translation path (gen_arith_imm_w → gen_set_gpr_sx32 → ExtI32I64), the register allocator (sync_globals, clobber_caller_saved), the IR optimizer (constant folding, copy propagation), and the x86-64 backend (movslq emission, QemuSt MMIO path, InsnStart fault-PC saving). All were correct. Discovered the root cause: The 4-byte addiw instruction at 0x4021ffe straddles the virtual page boundary at 0x4022000 — its first 2 bytes are on one page and the last 2 bytes on the next. Machina's cross_page pre-fetch mechanism (in tb_gen_code) only activates when page_remain % 4 == 2 at the start of a translation block. When a TB begins page-aligned and the cross-page instruction appears mid-block, the check is never triggered. The translator's fetch_insn32 then performs an unaligned 4-byte read via a raw host pointer, reading 2 bytes from the correct physical page and 2 bytes from whatever physically-contiguous host memory follows — which belongs to a different guest physical page. The corrupted encoding 0x0004071b decodes as addiw a4, s0, 0 instead of the correct 0xfec4071b (addiw a4, s0, -20). Fix Modified translate_insn in machina-cursor/guest/riscv/src/riscv/mod.rs to add an explicit guard before fetching any 32-bit instruction: if the instruction would cross the page boundary (pc_next + 4 > page_byte_limit) and was not already handled by the cross_page pre-fetch path, the current translation block is terminated early (cur_insn_len = 0; is_jmp = TooMany). This forces the next TB to start at that PC with page_remain == 2, which correctly triggers the existing cross-page pre-fetch logic to assemble the full instruction from both physical pages. Verification After applying the fix and cleanup, rebuilt Machina (cargo build --release) and ran StarryOS. The full boot sequence completed successfully — busybox shell prompt starry:~# appeared, matching QEMU's behavior exactly.
------ Problem StarryOS, when built with its default configuration (features = ["qemu"]), uses VirtIO-PCI transport to discover block devices. It enumerates PCI devices through the ECAM (Enhanced Configuration Access Mechanism) at 0x30000000, matching the QEMU riscv64 virt platform layout. Machina only implemented VirtIO-MMIO transport at 0x10001000, which requires StarryOS to be specially compiled with the bus-mmio feature. Without PCI support, the default StarryOS kernel could not find any block device, causing a panic: "No block device found!". Root Cause Analysis 1. StarryOS's axdriver uses a compile-time bus selection: bus-mmio enables VirtIO-MMIO probing; without it, bus-pci is the default (enabled by the qemu feature via axfeat/bus-pci). 2. The PCI driver enumerates devices by reading the ECAM configuration space at PCI_ECAM_BASE = 0x30000000, looking for VirtIO vendor ID 0x1AF4. 3. Machina had no PCI infrastructure — no ECAM controller, no PCI configuration space, no VirtIO-PCI device model. 4. The DTB generated by Machina lacked a pci-host-ecam-generic node, so even if PCI probing code existed in the kernel, it would have no device tree entry to work with. Implementation New file: hw/virtio/src/pci.rs — VirtIO-PCI Transport Implemented a complete VirtIO-PCI modern (v1) device with: - PCI configuration space (256 bytes): Vendor 0x1AF4, Device 0x1042 (modern block device), status with capabilities-list bit set, header type 0, interrupt pin INTA#. - Four VirtIO PCI capabilities chained from offset 0x40: VIRTIO_PCI_CAP_COMMON_CFG, VIRTIO_PCI_CAP_ISR_CFG, VIRTIO_PCI_CAP_NOTIFY_CFG (with 20-byte extended format including notify_off_multiplier), and - -VIRTIO_PCI_CAP_DEVICE_CFG. Each points into BAR0 with appropriate offsets and lengths. - BAR0 register region (4 KiB): Common config (feature negotiation, device status, queue configuration), ISR status with auto-clear and interrupt de-assertion, notify register triggering virtqueue processing, and device-specific config (block capacity, sector size). - BAR size probing: Writing 0xFFFFFFFF to BAR0 returns the size mask 0xFFFFF000, allowing the guest to determine the 4 KiB BAR size. - PciEcamOps (impl MmioOps): Decodes ECAM addresses (bus << 20 | device << 15 | function << 12 | register), dispatches to the VirtIO-PCI device's configuration space for bus 0 at the configured slot. - PciBarOps (impl MmioOps): Covers the PCI MMIO window; on each access, checks whether the address falls within the device's current BAR0 range (after the guest programs it), and dispatches to the appropriate BAR0 register handler. Both ops types share the same Arc<Mutex<VirtioPciState>>, ensuring consistent state between config-space BAR writes and BAR-region accesses. Modified: hw/riscv/src/ref_machine.rs — Machine Integration - Added address constants matching the QEMU riscv64 virt platform: ECAM at 0x30000000 (256 MiB), PCI MMIO at 0x40000000 (1 GiB), PCI I/O at 0x03000000 (64 KiB), PCI interrupt range at PLIC sources 32–35. - In init(): When -drive is specified, creates both a VirtIO-PCI device (on PCI bus, slot 1) and a VirtIO-MMIO device (at 0x10001000). Each has its own VirtioBlk backend instance (separate mmap of the same disk image). This dual-mode approach ensures compatibility with both default (bus-pci) and bus-mmio StarryOS builds. - Registers the PCI ECAM and PCI MMIO window as direct subregions of the root memory region (not via sysbus, since they're fixed-address host-bridge regions). - In generate_fdt(): Added a pci@30000000 node with compatible = "pci-host-ecam-generic", device_type = "pci", proper #address-cells/#size-cells (3/2 per PCI binding), bus-range 0–255, ranges covering PIO and 32-bit MMIO windows, and a full interrupt-map mapping 4 slots × 4 pins to PLIC IRQs 32–35 using the standard swizzle formula irq = PCIE_IRQ_BASE + ((slot + pin - 1) % 4). Modified: hw/virtio/src/lib.rs Added pub mod pci; to export the new module. Verification After building (cargo build --release, zero warnings), ran ./runmachina-starry.sh with the unmodified default StarryOS kernel (features = ["qemu"], no bus-mmio). The full boot sequence completed successfully: - RustSBI initialized and handed off to supervisor mode - ArceOS banner displayed, memory/scheduler/drivers initialized - VirtIO block device discovered via PCI — ext4 filesystem mounted - Pseudofs (devfs, tmpfs, proc, sys) mounted - User space entered, busybox executed - Shell prompt starry:~# appeared — identical behavior to QEMU
--------
Problem
StarryOS running on Machina displayed the shell prompt (starry:~#) but refused all keyboard input. The same kernel binary worked perfectly on QEMU with full interactive shell capability.
Diagnosis
A systematic trace through the entire input pipeline — from host stdin to guest interrupt delivery — revealed:
1. Stdin → UART path: OK. The StdioChardev thread successfully read bytes from stdin and delivered them to Uart16550::receive(). The UART FIFO accumulated bytes, LSR_DR was set, and IER had bit 0 (RX interrupt) enabled.
2. UART → PLIC path: OK. update_irq() correctly computed IIR_RX_AVAIL, set irq_pending = true, and called irq_line.set(true) which invoked PlicIrqSink::set_irq(10, true).
3. PLIC → CPU path: OK. After StarryOS configured PLIC source 10 (priority=6, enabled for S-mode context), update_outputs() correctly asserted the S-mode external interrupt line, setting bit 9 (SEI) in shared_mip.
4. Root cause: interrupt storm in PLIC complete_irq. The CPU was receiving and handling the external interrupt. However, ArceOS's interrupt handler uses a deferred-processing model: the PLIC claim/complete cycle runs in the interrupt handler, but the actual UART read (try_receive_bytes) is performed by a separate tty-reader task woken via register_irq_waker. The handler does not read the UART RBR register to clear the interrupt source.
Machina's PLIC implemented strict level-triggered resample semantics: complete_irq() checked source_level[irq] and, if still asserted, immediately re-set the pending bit. Since the UART IRQ line remained high (LSR_DR still set, byte unread), this caused an infinite interrupt loop:
```
claim(10) → handler (no UART read) → complete(10)
→ source still high → re-pend → SEI re-asserted
→ CPU takes interrupt again → claim(10) → ...
```
The tty-reader task was starved of CPU time and could never run to read the UART and clear the interrupt source.
QEMU's PLIC uses edge-triggered semantics at the claim/complete boundary: it only latches pending on a 0→1 transition of the source wire and does not automatically re-pend on complete. After claim clears pending and complete clears the claim record, the interrupt remains quiescent until the device de-asserts and re-asserts its IRQ line.
Fix
Modified hw/intc/src/plic.rs with two changes to match QEMU's interrupt semantics:
1. set_irq() — rising-edge pending latch:
```
pub fn set_irq(&mut self, source: u32, level: bool) {
let prev = self.source_level[source as usize];
self.source_level[source as usize] = level;
if level && !prev {
// Only latch pending on 0→1 transition
self.set_pending(source, true);
}
self.update_outputs();
}
```
Previously, every call with level=true unconditionally set the pending bit, causing redundant PLIC activations on every update_irq() call from the UART (including TX-complete paths).
2. complete_irq() — no automatic re-pend:
```
pub fn complete_irq(&mut self, context: u32, irq: u32) {
if self.claim[ctx] == irq {
self.claim[ctx] = 0;
}
self.update_outputs();
// Removed: source_level re-pend logic
}
```
Previously, complete_irq checked source_level[irq] and re-set pending if the wire was still high, causing the infinite interrupt loop.
Result
After the fix, the interrupt lifecycle works correctly:
1. UART receives byte → IRQ line 0→1 → PLIC latches pending → CPU takes SEI
2. Handler: claim(10) → wake tty-reader → complete(10) → no re-pend
3. tty-reader task runs → reads UART RBR → FIFO empty → IRQ line goes low (1→0)
4. Next byte arrives → IRQ line 0→1 → new interrupt cycle
Verified with automated testing: echo hello → hello, pwd → /root, confirming full interactive shell functionality on Machina.
------ 1. VirtIO Network Device Implementation Implemented full virtio-net device support for the machina RISC-V emulator across multiple crates: - hw/virtio/src/device.rs (new) — Introduced a VirtioDevice trait to abstract the common interface for different VirtIO device types, decoupling device-specific logic from the MMIO transport layer. - hw/virtio/src/net.rs (new) — Implemented VirtioNet with Linux TAP backend, including tap_open() for host TAP device creation, parse_mac() for MAC address parsing, full VirtioDevice trait implementation (device ID, feature negotiation, config space for MAC/status/max_virtqueue_pairs), TX path via handle_queue(), and RX path via standalone fill_rx_queue(). - hw/virtio/src/block.rs — Refactored VirtioBlk to implement the new VirtioDevice trait. - hw/virtio/src/mmio.rs — Refactored VirtioMmio transport to be generic over Box<dyn VirtioDevice>, added an asynchronous RX polling thread for incoming TAP packets. - core/src/machine.rs — Added NetdevOpts struct and netdev field to MachineOpts. - src/main.rs — Implemented QEMU-compatible CLI parsing for -netdev tap,id=...,ifname=... and -device virtio-net-device,netdev=...[,mac=...]. - hw/riscv/src/ref_machine.rs — Integrated virtio-net into the reference RISC-V machine with dedicated MMIO region (0x10002000), interrupt line, and FDT node generation. 2. Comprehensive Unit Tests (25 tests) Wrote 25 unit tests covering: - parse_mac — valid input, edge cases (all-FF, too few/many octets, bad hex, empty string) - VirtioDevice trait — device ID, queue count, feature bits (VERSION_1, MAC, STATUS) - Config space — MAC byte/u16 reads, link status, max virtqueue pairs, out-of-range access - RX path — single packet injection, no-descriptor case, chained descriptors, sequential multi-packet - TX path — single packet, chained header+data, multiple packets, queue 0 no-op verification - Round-trip — RX inject then TX forward, verifying end-to-end data integrity Used libc::pipe to simulate a TAP device in user-space without requiring privileges. 3. Test Infrastructure Investigation Analyzed machina's testing methodology — central machina-tests crate in tests/src/ plus per-crate #[cfg(test)] modules — and verified all tests (1139 existing + 25 new) pass with cargo test --workspace. 4. Test Migration to Project Convention Migrated all 25 virtio-net tests from the inline #[cfg(test)] mod tests block in hw/virtio/src/net.rs to a dedicated tests/src/virtio_net.rs file in the central machina-tests crate, following the project's established convention. This involved: - Creating tests/src/virtio_net.rs with proper external crate imports replacing super::* - Registering the new module in tests/src/lib.rs - Removing ~680 lines of test code from production source net.rs - Locally defining VirtIO spec constants (VIRTIO_NET_F_MAC, VIRTIO_NET_F_STATUS) that were private in the source crate Final result: 1164 tests passing, 0 failures across the entire workspace.
--------- Bug Machina's virtio-net device declared `VIRTIO_F_VERSION_1` (modern VirtIO) but used a header size of **10 bytes** instead of the correct **12 bytes** required by VirtIO v1. Root Cause The VirtIO v1 (modern) `virtio_net_hdr` includes a `num_buffers` field not present in the legacy format: | Field | Size | |-------|------| | flags | 1 | | gso_type | 1 | | hdr_len | 2 | | gso_size | 2 | | csum_start | 2 | | csum_offset | 2 | | **num_buffers** (v1 only) | **2** | | **Total** | **12** | `VIRTIO_NET_HDR_SIZE` was set to 10, causing both TX and RX paths to miscalculate the boundary between the virtio header and the Ethernet frame. Symptom - **TX**: Only 10 bytes were skipped before writing to TAP, so 2 extra bytes (the `num_buffers` field) leaked into the Ethernet frame, shifting the entire frame by 2 bytes. `tcpdump` showed all outgoing frames with `ethertype Unknown (0x3456)` — the last 2 bytes of MAC address `52:54:00:12:34:56` being misinterpreted as EtherType. - **RX**: The host never replied because it couldn't parse the malformed frames. Additionally, RX injection prepended a 10-byte header instead of 12, corrupting incoming frames for the guest driver. - **Result**: 100% packet loss on ping. Fix Single-line change in hw/virtio/src/net.rs: ```rust // Before: pub const VIRTIO_NET_HDR_SIZE: usize = 10; // After: pub const VIRTIO_NET_HDR_SIZE: usize = 12; ``` Additional Fixes Applied During Investigation 1. **`tx_one()` return value**: Was always returning `0` regardless of bytes written. Fixed to return the actual byte count (used by the virtqueue used ring). 2. **TAP fd non-blocking**: Added `O_NONBLOCK` to `tap_open()` so the RX poll loop works correctly with the non-blocking read pattern in `rx_loop()`. Verification After the fix, `tcpdump` correctly shows ARP request/reply and ICMP echo request/reply. Guest ping to host (`10.0.2.1`) succeeds with 0% packet loss. ``` sudo tcpdump -i tap0 -n -c 20 sudo ./runmachina-linux.sh 2>machina-debug.log ip link set eth0 up ip addr add 10.0.2.15/24 dev eth0 ping -c 3 10.0.2.1
Merge three correctness fixes from #19: 1. FPU mstatus init (csr.rs): set FS=Initial (bit 13) at reset so FP instructions are legal when F/D extensions are present. Without this, any OS using FPU traps on the first FP instruction. Matches QEMU reset behaviour. 2. UART 16550 loopback/modem (uart.rs): implement MCR loopback mode (THR output routed back to RX FIFO) and MSR signal lines (CTS/DSR/DCD). Some kernels check modem status before using the UART. 3. PLIC edge-triggered IRQ (plic.rs): change set_irq to only latch pending on rising edge (0->1 transition). complete_irq no longer auto-re-pends based on wire level. This matches QEMU semantics and prevents interrupt storms when the guest defers source-clearing to a bottom-half/task. Update PLIC and ref_machine tests for edge-triggered semantics: lowering a source no longer clears pending (only claim does). Signed-off-by: Yu CHEN <yuchen@tsinghua.edu.cn> Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
|
Thanks for the contribution! The following fixes from this PR have been ported into #23: Merged:
Not merged (will be separate PRs):
These are larger features that need independent review and testing. |
--------- Root Causes Two independent bugs caused 6 test failures: Bug 1: PLIC level-triggered interrupt semantics (hw/intc/src/plic.rs) The PLIC set_irq method only latched the pending bit on a rising edge (0→1) but never cleared it on a falling edge (1→0). Additionally, complete_irq did not re-pend the interrupt when the source wire was still asserted. This meant: - After a device de-asserted its IRQ line, the PLIC output stayed high because the pending bit was never cleared. - After claim+complete with the source still active, no new interrupt was generated. This broke 5 tests: test_plic_set_irq_propagates, test_plic_level_triggered_resample, test_irq_updates_cpu_mip, test_ref_machine_irq_wiring, and test_uart_rx_irq_to_plic. Fix: - set_irq: Added falling-edge handling — when level transitions from high to low, clear the pending bit. - complete_irq: After clearing the claim, if the source wire is still asserted, re-latch the pending bit (level-triggered resample). Bug 2: Hardcoded header size in test (tests/src/virtio_net.rs) VIRTIO_NET_HDR_SIZE was changed from 10 to 12, but test_net_rx_chained_descriptors still used hardcoded [0u8; 10] and offset 10 instead of the constant. This caused a length mismatch (12 vs 10 bytes) in the assertion. Fix: Replaced hardcoded values with VIRTIO_NET_HDR_SIZE and derived offsets from the constant. Result All 1164 tests pass, 0 failures.
… risdv64-elf-ubuntu-24.04-gcc
machina can run Linux-6.12.51