Spike-vs-RTL lockstep: what it proves, and what it doesn’t
Lockstep co-simulation is one of the most convincing functional checks you can run on a RISC-V core: run the same program on a trusted reference model and on your RTL, and compare what each instruction did, in order. When the two retire traces agree, that is strong evidence the core executed that program correctly. The trap is in the phrasing — a passing lockstep is bounded evidence for the executed window, not a proof of ISA completeness and not a signoff. This guide walks through how it works and is deliberate about that boundary.
The reference model: Spike
A reference model is an independent implementation of the same specification, used as the golden answer. For RISC-V that is Spike, the official ISA simulator: it executes a program and reports, for every committed instruction, the program counter, the instruction word, which registers were read and written, and any memory access. Spike is upstream and maintained by the RISC-V project — it is the oracle, not something ChipVerify authors. The evidence is only as good as the reference is trustworthy, so the reference model and its version travel with the result.
RVFI: the comparison contract
To compare a core against Spike you need the core to report what it did at the same granularity the reference does. That is exactly what RVFI (the RISC-V Formal Interface) standardizes: a per-instruction retire channel — rvfi_valid, rvfi_order, the instruction word, register read/write addresses and data, the program counter before and after, trap, and memory access. With RVFI in place, the DUT emits a retire trace in the same shape as Spike’s, and the comparison becomes a row-by-row check.
Retire-trace comparison, step by step
In lockstep, both the reference and the DUT execute the same program. On each retired instruction the harness lines up the two traces by instruction order and compares the fields: same PC, same decoded instruction, same architectural register write, same memory effect. The first row where they disagree is the failure — and it is precise, because it points at the exact instruction and the exact field that diverged, not a vague downstream symptom many cycles later.
| Compared per instruction | A mismatch usually means |
|---|---|
| Program counter (pc_rdata / pc_wdata) | Wrong branch/jump target or fetch fault |
| Decoded instruction word | Fetch/alignment or decode bug |
| Register write addr + data | ALU, forwarding, or writeback error |
| Memory address + read/write data | Load/store unit or ordering bug |
What a passing lockstep does not prove
This is the part worth being blunt about. A green lockstep run says: for the instructions this program executed, with these operands, the core agreed with the reference. It does not say:
- that the core is ISA-complete — instructions, operand ranges, and corner cases the program never reached are simply untested.
- that it is a compliance result — an ISA compliance claim is a coverage statement over the whole specification and needs its own methodology and reviewed reference.
- that it is a signoff — functional agreement on an executed window is evidence, not a tapeout sign-off, which also spans timing, power, and physical verification.
The honest framing is “reference-model agreement for the program we ran,” and it is only meaningful alongside the question how much did that program touch? — which is what coverage answers. A clean lockstep over a program that exercises a sliver of the core is a small piece of evidence, not a large one.
Where lockstep fits in the flow
Lockstep complements the other evidence rather than replacing it. Lint and CDC catch structural problems; synthesis confirms the RTL maps to gates; directed and random simulation drive scenarios; and RVFI-based formal proofs from upstream riscv-formal bound behavior over all inputs to a depth. Lockstep ties execution to a golden model for the programs you actually run, and pairs best with functional coverage so you know how much of the design that execution reached. For the integrated-core view of these engines, see the RISC-V pre-signoff evidence guide.
Related reading
- RISC-V core: pre-signoff evidence — integrate an upstream core and run the evidence engines on it.
- Integrating an open-source RISC-V core (SERV vs Ibex) — choosing a core, license/provenance, and bus interfaces.
- Functional coverage closure — the question lockstep can’t answer: how much did we touch?
- SystemVerilog assertions — the property style RVFI formal proofs build on.
Run RISC-V evidence on your core
Sign in and ChipVerify AI runs lint, CDC, synthesis, simulation, and the RVFI/riscv-formal readiness scaffold on an integrated RISC-V core — pre-signoff evidence about the RTL, framed honestly. Reference-model lockstep is reference-model agreement for the executed program, not ISA compliance or a foundry signoff.