Does a passing Spike-vs-RTL lockstep mean the core is ISA-compliant?

No. A lockstep run only proves that, for the instructions the test program actually executed, the core's retire trace matched the Spike reference model. It says nothing about instructions, operand values, or corner cases the program never reached. ISA compliance is a coverage claim over the whole instruction set and requires a dedicated compliance methodology and a reviewed reference; a clean lockstep over one program is bounded evidence for that executed window, not a completeness or signoff statement.

What is RVFI and why is it used for lockstep comparison?

RVFI, the RISC-V Formal Interface, is a standard set of retire-channel signals a core exposes when an instruction commits: valid, instruction order, the instruction word, register reads and writes, the program counter before and after, and any memory access. It gives a tool a uniform, per-instruction view of what the core actually did, which is exactly the granularity a reference model produces — so RVFI is the natural contract for comparing a DUT against Spike instruction by instruction.

What is a reference model in this context?

A reference model is an independent implementation of the same specification used as the golden answer. For RISC-V, Spike (the official ISA simulator) executes the same program and emits its own retire trace. Lockstep checks the DUT's trace against the reference's trace step by step. The strength of the evidence depends on the reference being trustworthy and on the stimulus exercising the behavior you care about; agreement on a program that never exercises that behavior proves little.

Where does lockstep fit in a verification flow?

Lockstep is a strong functional check that complements, but does not replace, the rest of the flow: lint and CDC catch structural issues, synthesis confirms the RTL maps to gates, directed and random simulation drive scenarios, and RVFI-based formal proofs (riscv-formal) bound behavior over all inputs within a depth. Lockstep ties execution to a golden model for the programs you run; combining it with coverage tells you how much of the design that execution actually touched.

Spike-vs-RTL lockstep: what it proves, and what it doesn’t

Lockstep co-simulation is one of the most convincing functional checks you can run on a RISC-V core: run the same program on a trusted reference model and on your RTL, and compare what each instruction did, in order. When the two retire traces agree, that is strong evidence the core executed that program correctly. The trap is in the phrasing: a passing lockstep is bounded evidence for the executed window, not a proof of ISA completeness and not a signoff. This guide walks through how it works and is deliberate about that boundary.


The reference model: Spike

A reference model is an independent implementation of the same specification, used as the golden answer. For RISC-V that is Spike, the official ISA simulator: it executes a program and reports, for every committed instruction, the program counter, the instruction word, which registers were read and written, and any memory access. Spike is upstream and maintained by the RISC-V project — it is the oracle, not something ChipVerify authors. The evidence is only as good as the reference is trustworthy, so the reference model and its version travel with the result.

RVFI: the comparison contract

To compare a core against Spike you need the core to report what it did at the same granularity the reference does. That is exactly what RVFI (the RISC-V Formal Interface) standardizes: a per-instruction retire channel — rvfi_valid, rvfi_order, the instruction word, register read/write addresses and data, the program counter before and after, trap, and memory access. With RVFI in place, the DUT emits a retire trace in the same shape as Spike’s, and the comparison becomes a row-by-row check.
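
As a rough illustration of that contract, the sketch below models one retired instruction as a record whose fields mirror the RVFI signal names from riscv-formal. The `RetireRecord` class, the `from_rvfi_sample` helper, and the idea of a per-cycle `sample` mapping keyed by signal name are assumptions made for this example; they are not part of RVFI, Spike, or any particular harness.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RetireRecord:
    """One retired instruction; field names mirror the RVFI signals."""
    order: int          # rvfi_order: retire index, used to line up traces
    insn: int           # rvfi_insn: instruction word
    pc_rdata: int       # rvfi_pc_rdata: PC of this instruction
    pc_wdata: int       # rvfi_pc_wdata: PC of the next instruction
    rd_addr: int = 0    # rvfi_rd_addr: destination register (0 = no write)
    rd_wdata: int = 0   # rvfi_rd_wdata: value written to rd
    trap: bool = False  # rvfi_trap
    mem_addr: int = 0   # rvfi_mem_addr
    mem_wmask: int = 0  # rvfi_mem_wmask: byte lanes written
    mem_wdata: int = 0  # rvfi_mem_wdata


def from_rvfi_sample(sample: Mapping[str, int]) -> Optional[RetireRecord]:
    """Build a record from one cycle's signal values (hypothetical mapping).

    Returns None on cycles where rvfi_valid is low, i.e. nothing retired.
    """
    if not sample.get("rvfi_valid", 0):
        return None
    return RetireRecord(
        order=sample["rvfi_order"],
        insn=sample["rvfi_insn"],
        pc_rdata=sample["rvfi_pc_rdata"],
        pc_wdata=sample["rvfi_pc_wdata"],
        rd_addr=sample.get("rvfi_rd_addr", 0),
        rd_wdata=sample.get("rvfi_rd_wdata", 0),
        trap=bool(sample.get("rvfi_trap", 0)),
        mem_addr=sample.get("rvfi_mem_addr", 0),
        mem_wmask=sample.get("rvfi_mem_wmask", 0),
        mem_wdata=sample.get("rvfi_mem_wdata", 0),
    )
```

Only cycles with `rvfi_valid` set produce a row, which is what lets the DUT trace line up with a per-instruction reference trace rather than a per-cycle waveform.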

Retire-trace comparison, step by step

In lockstep, both the reference and the DUT execute the same program. On each retired instruction the harness lines up the two traces by instruction order and compares the fields: same PC, same decoded instruction, same architectural register write, same memory effect. The first row where they disagree is the failure — and it is precise, because it points at the exact instruction and the exact field that diverged, not a vague downstream symptom many cycles later.

Compared per instruction	A mismatch usually means
Program counter (rvfi_pc_rdata / rvfi_pc_wdata)	Wrong branch/jump target or fetch fault
Decoded instruction word	Fetch/alignment or decode bug
Register write addr + data	ALU, forwarding, or writeback error
Memory address + read/write data	Load/store unit or ordering bug
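
The row-by-row check and the table above can be sketched as a small comparator. This is a minimal illustration, assuming each trace has already been parsed into a list of dicts keyed by shortened RVFI-style names (`order`, `pc_rdata`, `insn`, and so on); the `first_divergence` function and the `FIELDS` list are hypothetical names for this example, and a real harness would also need to handle traps, interrupts, and register x0.

```python
from itertools import zip_longest

# Fields compared per retired instruction, each paired with the usual suspect
# from the table above. Missing fields are treated as 0 in this sketch.
FIELDS = [
    ("pc_rdata", "wrong branch/jump target or fetch fault"),
    ("pc_wdata", "wrong branch/jump target or fetch fault"),
    ("insn", "fetch/alignment or decode bug"),
    ("rd_addr", "ALU, forwarding, or writeback error"),
    ("rd_wdata", "ALU, forwarding, or writeback error"),
    ("mem_addr", "load/store unit or ordering bug"),
    ("mem_wdata", "load/store unit or ordering bug"),
]


def first_divergence(ref, dut):
    """Return (order, field, ref_value, dut_value, hint) for the first
    mismatching row, or None if the traces agree everywhere."""
    for r, d in zip_longest(ref, dut):
        if r is None or d is None:
            present = r if r is not None else d
            return (present["order"], "<missing row>", r, d,
                    "one trace ended before the other")
        if r["order"] != d["order"]:
            return (r["order"], "order", r["order"], d["order"],
                    "traces are misaligned")
        for field, hint in FIELDS:
            rv, dv = r.get(field, 0), d.get(field, 0)
            if rv != dv:
                return (r["order"], field, rv, dv, hint)
    return None


# Two instructions: addi x1, x0, 5 then addi x2, x1, 1.
ref = [
    {"order": 0, "pc_rdata": 0x80000000, "pc_wdata": 0x80000004,
     "insn": 0x00500093, "rd_addr": 1, "rd_wdata": 5},
    {"order": 1, "pc_rdata": 0x80000004, "pc_wdata": 0x80000008,
     "insn": 0x00108113, "rd_addr": 2, "rd_wdata": 6},
]
dut = [dict(ref[0]), dict(ref[1], rd_wdata=7)]  # DUT writes the wrong value
print(first_divergence(ref, dut))
```

The function stops at the first disagreeing row on purpose: everything after a divergence is downstream of it, so later mismatches add noise rather than information.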

What a passing lockstep does not prove

This is the part worth being blunt about. A green lockstep run says: for the instructions this program executed, with these operands, the core agreed with the reference. It does not say:

- that the core is ISA-complete: instructions, operand ranges, and corner cases the program never reached are simply untested.
- that it is a compliance result: an ISA compliance claim is a coverage statement over the whole specification and needs its own methodology and reviewed reference.
- that it is a signoff: functional agreement on an executed window is evidence, not a tapeout signoff, which also spans timing, power, and physical verification.

The honest framing is “reference-model agreement for the program we ran,” and it is only meaningful alongside the question “how much did that program touch?”, which is what coverage answers. A clean lockstep over a program that exercises a sliver of the core is a small piece of evidence, not a large one.
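
One very coarse way to put a number on "how much did that program touch" is to look at which major opcodes appear in the retired instruction words. The sketch below is an assumption-laden proxy, not functional coverage: it only looks at bits 6:0 of uncompressed 32-bit encodings against the RV32I base opcode map, and it says nothing about operand values or corner cases. The `opcode_reach` helper is a hypothetical name for this example.

```python
# RV32I base major opcodes (instruction bits 6:0).
RV32I_OPCODES = {
    0x37: "LUI", 0x17: "AUIPC", 0x6F: "JAL", 0x67: "JALR",
    0x63: "BRANCH", 0x03: "LOAD", 0x23: "STORE",
    0x13: "OP-IMM", 0x33: "OP", 0x0F: "MISC-MEM", 0x73: "SYSTEM",
}


def opcode_reach(insn_words):
    """Split the RV32I major opcodes into those the trace hit and missed."""
    seen = {word & 0x7F for word in insn_words}
    hit = sorted(name for op, name in RV32I_OPCODES.items() if op in seen)
    missed = sorted(name for op, name in RV32I_OPCODES.items() if op not in seen)
    return hit, missed


# Instruction words from a tiny passing run: two ADDIs and one AUIPC.
hit, missed = opcode_reach([0x00500093, 0x00108113, 0x00000297])
print(f"hit {len(hit)}/{len(RV32I_OPCODES)}: {hit}")
print(f"never executed: {missed}")
```

A run like this one can pass lockstep cleanly while never retiring a load, store, branch, or jump, which is the sense in which a green result over a narrow program is a small piece of evidence.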

Where lockstep fits in the flow

Lockstep complements the other evidence rather than replacing it. Lint and CDC catch structural problems; synthesis confirms the RTL maps to gates; directed and random simulation drive scenarios; and RVFI-based formal proofs from upstream riscv-formal bound behavior over all inputs to a depth. Lockstep ties execution to a golden model for the programs you actually run, and pairs best with functional coverage so you know how much of the design that execution reached. For the integrated-core view of these engines, see the RISC-V pre-signoff evidence guide.

Run RISC-V evidence on your core

Sign in and ChipVerify AI runs lint, CDC, synthesis, simulation, and the RVFI/riscv-formal readiness scaffold on an integrated RISC-V core — pre-signoff evidence about the RTL, framed honestly. Reference-model lockstep is reference-model agreement for the executed program, not ISA compliance or a foundry signoff.
