Virtus Cyber Academy

RV32I-Lite Encoding Card


VCA-CSA-101 cross-chapter quick-reference handout. Anchors: §4.5-§4.10; spec source: 5).

Purpose: complete RV32I-Lite ISA encoding reference for hand-encoding, hand-decoding, and assembler/disassembler verification. Print double-sided; pin to the wall during Labs 4.1-4.5, Ch 6 assembler work, and Ch 6a linker relocation work. Every byte your CPU fetches (on Tang Primer 25K canonical Phase-1 silicon, or Tang Nano 20K advanced-track silicon) is encoded by this card; every byte riscv32-unknown-elf-objdump disassembles agrees with it.


At a glance

| Property | Value |
|---|---|
| Base | RV32I subset (deliberate; everything emitted is real RV32I-legal) |
| Instructions | 11 real + 9 pseudo (8 expand to one real instruction; la expands to two via the R_VIRTUS_LA_GP12 reloc; see §The 9 pseudo-instructions) |
| Formats | 4 of RV32I's 6: R, I, S, B |
| Word size | 32 bits (every instruction is exactly 4 bytes) |
| Endianness | Little-endian |
| Registers | 8 general-purpose (x0-x7); x0 hardwired to zero |
| Alignment | All instructions 4-byte aligned; all data accesses word-aligned |
| Branch range | ±4 KiB (B-format 13-bit signed byte offset with bit 0 always zero: −4096 to +4094) |
| Compatibility | Bit-for-bit compatible with full RV32I. riscv32-unknown-elf-as accepts our source; riscv32-unknown-elf-objdump decodes our binaries |

Register file

RV32I-Lite has 8 registers (full RV32I has 32; we use a strict subset). The ABI mnemonics below match full-RV32I conventions, so RV32I-Lite source is directly readable as RV32I source.

| Register | ABI name | Saver | Role in CSA-101 / Virtus OS v1 |
|---|---|---|---|
| x0 | zero | (hardwired) | Hardwired to 0; reads return 0; writes discarded |
| x1 | ra | Caller | Return address (set by jalr linkage; consumed by ret pseudo) |
| x2 | sp | Caller | Stack pointer; descending; word-aligned |
| x3 | gp | - | Global pointer; reserved by ABI; unused in Virtus OS v1 |
| x4 | tp | - | Thread pointer; reserved by ABI; unused in Virtus OS v1 |
| x5 | t0 | Caller | Temporary / argument 0 / scratch |
| x6 | t1 | Caller | Temporary / argument 1 / scratch |
| x7 | t2 | Caller | Temporary / argument 2 / scratch |

Encoding: any register field (5 bits) takes values 00000-00111 for x0-x7. Values 01000-11111 are reserved for full-RV32I registers x8-x31 and never emitted by RV32I-Lite code.

Register convention (CSA-201 forward-compatible)

The convention is committed in vm/protocol.py:14-39 (canonical block comment; ground-truth source). Ch 8 §8.6.2 promotes it to chapter prose.

RV32I-Lite emits a strict subset of the wider RV32I ABI; the wider classes are forward-compatible reservations.

| Class | Members in CSA-101 emit | Reserved (CSA-201+) | Caller's responsibility | Callee's responsibility |
|---|---|---|---|---|
| Caller-clobbered ("temporary") | t0, t1, t2 | t3-t6 | Save before call if value needed after | None; may overwrite freely |
| Callee-preserved ("saved") | (none in current emit) | s0-s11 | None; assume preserved across call | Save to stack before use; restore before return |
| Argument / return | (passed via stack; see vm-segment-cheat-sheet) | a0-a7 (arg); a0 (return) | Push args before call; pop result after | Read args from stack; push result before return |
| Stack pointer | sp (= x2) | - | None | Restore sp to ARG + 4 before return (per Ch 8 §8.7 step 4) |
| Global pointer | gp (= x3) | - | None | Never write gp. Only the linker prologue (linker/prologue.py, R7.2-α) writes it; user code reads through segment-pointer slots at gp+0..0x10 |
| Return address | (delivered via stack push; see Ch 8 §8.6) | ra (= x1) | Push return-label before call | Pop into temp; jalr to it |
| Thread pointer | (not used) | tp (= x4) | - | - |

Why students rarely see this directly. Jack-emitted programs go compiler → translator → assembler → linker → silicon, and the register convention is consumed entirely between translator and assembler: by the time bytecode reaches the assembler, the register choices are already baked into emission templates. The Jack programmer never picks t0 or t1; the translator does. The convention only becomes visible to the student in three places: Lab 8.2 (writing the translator that picks the registers), Lab 8.4 (gdb session against running silicon, where info registers shows t0/t1/t2 holding scratch values), and CSA-201's inline-asm / hand-rolled-extension labs (where the student honors the convention themselves).

Cross-references: Ch 8 §8.6.2 (chapter-prose enumeration); vm/protocol.py:14-39 (canonical source); cross-chapter-vm-segment-cheat-sheet.md (saved-frame layout); cross-chapter-instr-mem-layout.md (gp+0..0x10 segment-pointer slots + gp+0x40..0x3FF la-ptr-table reserve).


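The hardwired-zero trick. Because x0 always reads as 0 and discards writes, existing real instructions double as pseudo-instructions:

  li rd, imm       addi rd, x0, imm      (0 + imm)
  neg rd, rs       sub rd, x0, rs        (0 - rs)
  beqz rs, label   beq rs, x0, label     (compare against 0)
  bnez rs, label   bne rs, x0, label     (compare against 0)
  nop              addi x0, x0, 0        (result discarded)
  ret              jalr x0, x1, 0        (link value discarded)
  jr rs            jalr x0, rs, 0        (link value discarded)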

Each pseudo costs zero opcodes; the hardwired zero buys an entire family of conventional-looking instructions for free.


The 11 real instructions

R-format. Register-register arithmetic (5 instructions)

opcode = 0110011 for all five. Differentiated by funct3 and funct7.

| Mnemonic | Action | funct7 | funct3 | opcode | Hex template |
|---|---|---|---|---|---|
| add rd, rs1, rs2 | rd = rs1 + rs2 | 0000000 | 000 | 0110011 | 0x00..0033 |
| sub rd, rs1, rs2 | rd = rs1 - rs2 | 0100000 | 000 | 0110011 | 0x40..0033 |
| and rd, rs1, rs2 | rd = rs1 & rs2 | 0000000 | 111 | 0110011 | 0x00..7033 |
| or rd, rs1, rs2 | rd = rs1 \| rs2 | 0000000 | 110 | 0110011 | 0x00..6033 |
| xor rd, rs1, rs2 | rd = rs1 ^ rs2 | 0000000 | 100 | 0110011 | 0x00..4033 |

Note: add and sub differ only in one bit of funct7: instruction bit 30 (funct7[5]), not the top bit of the field. In silicon, that one bit selects the adder's carry-in and the inversion of rs2 (0 = add, 1 = subtract via two's-complement rs1 + ~rs2 + 1).
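
The shared-adder idea can be sketched as a behavioural model in Python. This is an illustration only, not the course's HDL; the name alu_add_sub and the MASK32 constant are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # keep every value inside one 32-bit word


def alu_add_sub(rs1: int, rs2: int, bit30: int) -> int:
    """Model one adder shared by add and sub.

    bit30 is instruction bit 30 (funct7[5]): 0 selects add, 1 selects sub.
    """
    # For sub, invert rs2 and feed a carry-in of 1: rs1 + ~rs2 + 1.
    operand = (~rs2 & MASK32) if bit30 else (rs2 & MASK32)
    return ((rs1 & MASK32) + operand + bit30) & MASK32


print(hex(alu_add_sub(7, 5, 0)))  # add
print(hex(alu_add_sub(7, 5, 1)))  # sub
```

The same bit drives both the inverter and the carry-in, which is why the decoder needs no separate subtract path.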

I-format. Register-immediate, loads, jalr (3 instructions)

| Mnemonic | Action | funct3 | opcode | Hex template |
|---|---|---|---|---|
| addi rd, rs1, imm12 | rd = rs1 + sext(imm12) | 000 | 0010011 | 0x..0013 |
| lw rd, imm12(rs1) | rd = M32[rs1 + sext(imm12)] | 010 | 0000011 | 0x..2003 |
| jalr rd, rs1, imm12 | t = rs1 + sext(imm12); rd = PC+4; PC = t & ~1 | 000 | 1100111 | 0x..0067 |

Immediate range: −2048 to +2047 (12-bit signed). Sign-extended at decode.

Word alignment: lw's effective address (rs1 + imm12) must be a multiple of 4. Misaligned loads trap on real silicon (in CSA-101, behavior is undefined; CSA-201's PMP traps on misalignment).

jalr low-bit force-zero: the target address has its low bit cleared, regardless of rs1+imm12. This preserves 2-byte alignment for the C extension (which RV32I-Lite does not use, but the encoding preserves bit-for-bit).
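
The three I-format rules above (sign extension, lw word alignment, jalr low-bit clearing) can be sketched in Python. The helper names sext12, lw_address_ok, and jalr_target are hypothetical and exist only for this illustration.

```python
MASK32 = 0xFFFFFFFF


def sext12(imm12: int) -> int:
    """Sign-extend a 12-bit field to a signed integer in -2048..2047."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def lw_address_ok(rs1: int, imm12: int) -> bool:
    """True when the effective address rs1 + sext(imm12) is a multiple of 4."""
    return ((rs1 + sext12(imm12)) & MASK32) % 4 == 0


def jalr_target(rs1: int, imm12: int) -> int:
    """Compute t = rs1 + sext(imm12), then clear bit 0 (PC = t & ~1)."""
    return (rs1 + sext12(imm12)) & MASK32 & ~1
```

For example, sext12(0xFFF) yields −1, and jalr_target(0x1001, 0) yields 0x1000 because the odd low bit is dropped.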

S-format. Stores (1 instruction)

| Mnemonic | Action | funct3 | opcode | Hex template |
|---|---|---|---|---|
| sw rs2, imm12(rs1) | M32[rs1 + sext(imm12)] = rs2 | 010 | 0100011 | 0x..2023 |

Operand-order quirk: sw rs2, offset(rs1) lists the source register first in assembly source. (RV32I convention; not RV32I-Lite invention.)

Immediate split: the 12-bit immediate is stored in two pieces, imm[11:5] at bits [31:25] and imm[4:0] at bits [11:7]. The split preserves the position of rs1 and rs2 so register-file read ports do not need format-aware muxing.
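
A minimal Python sketch of the S-format split and its inverse follows. The function names s_imm_split and s_imm_join are assumptions for illustration, not part of any course tool.

```python
def s_imm_split(imm: int) -> tuple[int, int]:
    """Return (imm[11:5], imm[4:0]) for a signed 12-bit store offset."""
    if not -2048 <= imm <= 2047:
        raise ValueError("S-format immediate out of range")
    imm &= 0xFFF
    return (imm >> 5) & 0x7F, imm & 0x1F


def s_imm_join(word: int) -> int:
    """Recover the signed offset from a 32-bit S-format instruction word."""
    # bits [31:25] hold imm[11:5]; bits [11:7] hold imm[4:0]
    raw = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    return raw - 0x1000 if raw & 0x800 else raw
```

Splitting an offset of 12 gives (0, 12): the high piece is zero and the low piece lands where rd would sit in an R- or I-format word.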

B-format. Conditional branches (2 instructions)

| Mnemonic | Action | funct3 | opcode | Hex template |
|---|---|---|---|---|
| beq rs1, rs2, label | if rs1 == rs2 then PC += sext(imm13) | 000 | 1100011 | 0x..0063 |
| bne rs1, rs2, label | if rs1 != rs2 then PC += sext(imm13) | 001 | 1100011 | 0x..1063 |

13-bit signed byte offset, always a multiple of 2 → −4096 to +4094 bytes (±4 KiB) reachable from the branch instruction. Bit 0 of the offset is not encoded and is always zero (preserved for C extension compatibility).

The B-format immediate is the most heavily shuffled in RV32I-Lite:

  imm[12]     at bit  [31]
  imm[10:5]   at bits [30:25]
  imm[4:1]    at bits [11:8]
  imm[11]     at bit  [7]

The shuffle preserves rs1 at [19:15] and rs2 at [24:20].
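
The B-format shuffle can be sketched as a pack/unpack pair in Python. The names b_imm_pack and b_imm_unpack are hypothetical; the bit positions follow the layout described in this card.

```python
def b_imm_pack(offset: int) -> int:
    """Place a branch byte offset into the B-format immediate bit positions."""
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and within -4096..4094")
    imm = offset & 0x1FFF
    return (
        ((imm >> 12) & 0x1) << 31      # imm[12]   -> bit 31
        | ((imm >> 5) & 0x3F) << 25    # imm[10:5] -> bits 30:25
        | ((imm >> 1) & 0xF) << 8      # imm[4:1]  -> bits 11:8
        | ((imm >> 11) & 0x1) << 7     # imm[11]   -> bit 7
    )


def b_imm_unpack(word: int) -> int:
    """Reassemble the signed byte offset from a B-format instruction word."""
    raw = (
        ((word >> 31) & 0x1) << 12
        | ((word >> 7) & 0x1) << 11
        | ((word >> 25) & 0x3F) << 5
        | ((word >> 8) & 0xF) << 1
    )
    return raw - 0x2000 if raw & 0x1000 else raw
```

Packing and unpacking round-trip for every even offset in range, which makes the pair a convenient self-check when hand-decoding branches.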


Bit-field layouts

Each format is a 32-bit word. Bit 31 (MSB) is on the left. Register fields are 5 bits; funct3 is 3 bits; funct7 is 7 bits; opcode is 7 bits.

bit:  31      25 24    20 19    15 14  12 11      7 6        0
      ┌─────────┬────────┬────────┬──────┬─────────┬──────────┐
   R: │ funct7  │  rs2   │  rs1   │funct3│   rd    │  opcode  │
      ├─────────┴────────┼────────┼──────┼─────────┼──────────┤
   I: │    imm[11:0]     │  rs1   │funct3│   rd    │  opcode  │
      ├─────────┬────────┼────────┼──────┼─────────┼──────────┤
   S: │imm[11:5]│  rs2   │  rs1   │funct3│imm[4:0] │  opcode  │
      ├─┬───────┼────────┼────────┼──────┼───────┬─┼──────────┤
   B: │s│imm10:5│  rs2   │  rs1   │funct3│imm 4:1│b│  opcode  │
      └─┴───────┴────────┴────────┴──────┴───────┴─┴──────────┘
       31 30..25 24..20    19..15  14..12 11..8  7   6..0
       ^                                          ^
       imm[12]                                    imm[11]

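Cross-format invariants (the entire reason CPU decode is cheap):

  opcode   always at bits [6:0]
  rd       at bits [11:7] whenever the format has one
  funct3   always at bits [14:12]
  rs1      always at bits [19:15]
  rs2      at bits [24:20] whenever the format has one
  sign     bit [31] is the sign bit of every immediate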


Hand-encoding checklist

For any instruction:

  1. Identify the format from the opcode/family table above.
  2. Place each field at its bit position in the format layout.
  3. Concatenate to 32 bits.
  4. Group into nibbles for hex.
  5. Verify with printf "<asm>" | riscv32-unknown-elf-as -march=rv32i -o /tmp/t.o - then riscv32-unknown-elf-objdump -d /tmp/t.o.

Worked: addi x1, x0, 5

 imm[11:0]    rs1   funct3  rd     opcode
 000000000101 00000 000     00001  0010011

Worked: sw x6, 12(x2)

 imm[11:5] rs2   rs1   funct3 imm[4:0] opcode
 0000000   00110 00010 010    01100    0100011

Worked: beq x5, x0, +8 (branch forward 8 bytes)

 imm[12] imm[10:5] rs2   rs1   funct3 imm[4:1] imm[11] opcode
 0       000000    00000 00101 000    0100     0       1100011
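
The three worked examples can be reproduced with a short Python sketch that places each field at its bit position and prints the resulting word. The encode_i, encode_s, and encode_b helpers are hypothetical names for this illustration; they are not part of the course assembler.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """I-format: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_s(imm, rs2, rs1, funct3, opcode):
    """S-format: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) \
        | ((imm & 0x1F) << 7) | opcode


def encode_b(offset, rs2, rs1, funct3, opcode):
    """B-format: imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode."""
    imm = offset & 0x1FFF
    return (((imm >> 12) & 1) << 31) | (((imm >> 5) & 0x3F) << 25) | (rs2 << 20) \
        | (rs1 << 15) | (funct3 << 12) | (((imm >> 1) & 0xF) << 8) \
        | (((imm >> 11) & 1) << 7) | opcode


addi_word = encode_i(5, 0, 0b000, 1, 0b0010011)    # addi x1, x0, 5
sw_word = encode_s(12, 6, 2, 0b010, 0b0100011)     # sw x6, 12(x2)
beq_word = encode_b(8, 0, 5, 0b000, 0b1100011)     # beq x5, x0, +8

print(f"{addi_word:08x} {sw_word:08x} {beq_word:08x}")
# prints: 00500093 00612623 00028463
```

Grouping the bit strings from the worked examples into nibbles by hand should give the same three words.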

The 9 pseudo-instructions

Eight pseudos expand to one real instruction at assembly time. The ninth (la) is a CSA-101-specific carve-out that expands to two instructions plus a linker relocation; see the note immediately after the table. No pseudo has its own opcode. The student writes the pseudo; the assembler emits the real bytes; the disassembler may display either form (controlled by objdump's -M no-aliases option).

| Pseudo | Expansion | Use |
|---|---|---|
| nop | addi x0, x0, 0 | No-op; padding; pipeline stall |
| mv rd, rs | addi rd, rs, 0 | Copy register (RV32I has no dedicated move) |
| li rd, imm (imm fits in 12 bits) | addi rd, x0, imm | Load small immediate (range −2048..+2047) |
| neg rd, rs | sub rd, x0, rs | Arithmetic negation: rd = -rs |
| beqz rs, label | beq rs, x0, label | Branch if rs == 0 |
| bnez rs, label | bne rs, x0, label | Branch if rs != 0 |
| ret | jalr x0, x1, 0 | Return from subroutine |
| jr rs | jalr x0, rs, 0 | Jump to register; no link |
| la rd, sym | addi rd, gp, off12 + lw rd, 0(rd) (see note below) | Materialise the address of a .data-resident symbol via a linker-resolved 12-bit offset off gp (CSA-101 carve-out; non-standard) |

Note on la (CSA-101 carve-out; non-standard expansion). Standard RV32I expands la rd, sym to auipc rd, %pcrel_hi(sym) + addi rd, rd, %pcrel_lo(label of that auipc). CSA-101 has neither auipc nor lui, so it cannot synthesise a 32-bit absolute address from a 20-bit upper immediate. Instead, the linker emits a per-symbol pointer-table entry in .data, anchored off gp (the global pointer). The assembler emits a placeholder addi rd, gp, 0 + lw rd, 0(rd) pair carrying a single R_VIRTUS_LA_GP12 relocation; the linker patches the 12-bit immediate to the symbol's pointer-table offset.

The la-ptr-table itself lives at gp+0x40 (= .data + 0x40). The leading 64 bytes (gp+0x00..gp+0x3F) are reserved for the VM segment-pointer region (LCL_addr / ARG_addr / THIS_addr / THAT_addr at gp+0x00..gp+0x0F + temp[0..7] at gp+0x10..gp+0x2F per cross-chapter-vm-segment-cheat-sheet.md), with a 16-byte alignment cushion at gp+0x30..gp+0x3F for future segment additions. This caps the table at (2048 − 0x40) / 4 = 496 la-references per program, which is adequate for any CSA-101 program.

This is a CSA-101 simplification; CSA-201 reverts to the standard auipc+addi expansion once U-format lands. The pointer-table approach (Option A) was selected; the alternative of a dedicated la-base register (Option B) was deferred.
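
The slot arithmetic and the patched instruction pair can be sketched in Python. This is an illustration under stated assumptions, not the linker's code: the names la_slot_capacity, la_slot_offset, and expand_la are hypothetical, and slot indices are assumed to be assigned sequentially from 0 by the linker.

```python
LA_TABLE_BASE = 0x40   # first la-ptr-table slot, as a byte offset from gp
IMM12_LIMIT = 2048     # non-negative I-format offsets run 0..2047
SLOT_BYTES = 4         # one 32-bit pointer per slot
GP = 3                 # gp is x3


def la_slot_capacity() -> int:
    """Number of pointer slots reachable with a non-negative 12-bit offset."""
    return (IMM12_LIMIT - LA_TABLE_BASE) // SLOT_BYTES


def la_slot_offset(slot: int) -> int:
    """gp-relative byte offset of a slot (sequential assignment is assumed)."""
    if not 0 <= slot < la_slot_capacity():
        raise ValueError("la-ptr-table slot out of range")
    return LA_TABLE_BASE + slot * SLOT_BYTES


def expand_la(rd: int, slot: int) -> tuple[int, int]:
    """Return the patched pair: addi rd, gp, off12 then lw rd, 0(rd)."""
    off = la_slot_offset(slot)
    addi = (off << 20) | (GP << 15) | (0b000 << 12) | (rd << 7) | 0b0010011
    lw = (0 << 20) | (rd << 15) | (0b010 << 12) | (rd << 7) | 0b0000011
    return addi, lw


print(la_slot_capacity())  # prints: 496
```

Under these assumptions the last usable slot sits at gp+0x7FC, the highest word-aligned offset a non-negative 12-bit immediate can reach.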

Disassembler display. objdump recognises the pseudo patterns and displays them by default:

$ printf 'ret\n' | riscv32-unknown-elf-as -march=rv32i -o /tmp/t.o -
$ riscv32-unknown-elf-objdump -d /tmp/t.o
   0:   00008067    ret           # default: pseudo form
$ riscv32-unknown-elf-objdump -d -M no-aliases /tmp/t.o
   0:   00008067    jalr  zero,ra,0   # literal underlying instruction

Ghidra always shows the literal underlying instruction. Decompiler is conservative; pseudos are display-layer convenience, not semantic content.
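
The eight single-instruction expansions from the pseudo table can be sketched as a lookup in Python. This is a text-level illustration only; expand_pseudo is a hypothetical helper, not the course assembler's API, and la is left out because it needs the linker.

```python
def expand_pseudo(mnemonic: str, *ops: str) -> str:
    """Expand one single-instruction pseudo into its real instruction text."""
    table = {
        "nop":  lambda: "addi x0, x0, 0",
        "mv":   lambda rd, rs: f"addi {rd}, {rs}, 0",
        "li":   lambda rd, imm: f"addi {rd}, x0, {imm}",
        "neg":  lambda rd, rs: f"sub {rd}, x0, {rs}",
        "beqz": lambda rs, label: f"beq {rs}, x0, {label}",
        "bnez": lambda rs, label: f"bne {rs}, x0, {label}",
        "ret":  lambda: "jalr x0, x1, 0",
        "jr":   lambda rs: f"jalr x0, {rs}, 0",
    }
    # li only works when the immediate fits in 12 signed bits.
    if mnemonic == "li" and not -2048 <= int(ops[1], 0) <= 2047:
        raise ValueError("li immediate must fit in 12 signed bits")
    return table[mnemonic](*ops)


print(expand_pseudo("ret"))             # prints: jalr x0, x1, 0
print(expand_pseudo("li", "x5", "42"))  # prints: addi x5, x0, 42
```

Every entry reuses an opcode already in the real instruction set, which is the point of the pseudo layer.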


What's deliberately NOT in RV32I-Lite

The full RV32I base ISA has 47 instructions; RV32I-Lite has 11. Each missing instruction is a deliberate teaching choice: the cost of its absence is felt in CSA-101 and recovered in CSA-201.

Missing instructions (from full RV32I base)

| Instruction(s) | What it does | Cost in CSA-101 | Returns in CSA-201 |
|---|---|---|---|
| jal rd, imm21 | J-format unconditional jump-and-link with a 21-bit signed PC-relative offset (20 bits encoded; bit 0 implied zero) | All control transfer must use jalr (register-indirect) or beq x0, x0 (always-taken branch) | CSA-201 §1. Adds J-format |
| lui rd, imm20 | Load upper immediate (high 20 bits) | Cannot materialise 32-bit immediates in one instruction; large constants go through .data-resident pointers via la pseudo | CSA-201 §1. Adds U-format |
| auipc rd, imm20 | Add upper immediate to PC; produces PC-relative 32-bit address | Cannot do PC-relative 32-bit addressing; cross-section calls use linker-resolved indirection | CSA-201 §1. Adds U-format |
| slt, slti, sltu, sltiu | Set if less than (signed/unsigned, register/immediate) | lt/gt comparisons require sub + sign-bit extraction + branch (~12 instructions per lt/gt) | CSA-201 §2. Adds set-less-than family |
| blt, bge, bltu, bgeu | Signed/unsigned compare-and-branch (less-than, greater-or-equal) | Inequality branches synthesised from sub + beq/bne (or compose via slt once it lands) | CSA-201 §2 |
| xori, andi, ori | Immediate bitwise (XOR/AND/OR with sign-extended 12-bit immediate) | Cannot mask/toggle immediate bits in one instruction; must materialise constant first | CSA-201 §1 |
| slli, srli, srai | Shift left/right (logical/arithmetic) by immediate | Left shift by k implemented as k self-adds (add rd, rd, rd repeated), so a 1-cycle shift becomes a k-cycle loop; right shifts need a longer software routine | CSA-201 §1 |
| sll, srl, sra | Shift left/right by register | Same: software-loop shift | CSA-201 §1 |
| lh, lhu, lb, lbu | Half-word and byte loads (signed/unsigned) | All loads are word loads; byte access requires lw + mask + shift | CSA-201 §1 |
| sh, sb | Half-word and byte stores | All stores are word stores; byte modify requires read-modify-write | CSA-201 §1 (the framebuffer's RMW hazard from Ch 12 §12.6.5 lands here) |
| fence, fence.i | Memory ordering / instruction-fence | Single-threaded, single-core, no caches; fences are no-ops in CSA-101 | CSA-201 §6 (when preemption arrives) |
| ecall, ebreak | Environment call (syscall trap) / debugger trap | No supervisor mode; OS services called via direct jalr | CSA-201 §2. Adds privilege levels + traps |
| CSR instructions (csrrw etc.) | Control-status register read/write | No CSRs in CSA-101 (no privilege state, no machine-mode bookkeeping) | CSA-201 §2 |
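
The "shift by k self-adds" cost from the table can be sketched in Python as a behavioural model. The function name shift_left_by_adds is hypothetical; each loop iteration stands for one add rd, rd, rd.

```python
MASK32 = 0xFFFFFFFF


def shift_left_by_adds(value: int, k: int) -> int:
    """Emulate a left shift by k using only add rd, rd, rd."""
    rd = value & MASK32
    for _ in range(k):
        rd = (rd + rd) & MASK32   # add rd, rd, rd doubles the register
    return rd


print(shift_left_by_adds(3, 4))  # prints: 48
```

The technique only covers left shifts, since doubling can never move bits toward the low end of the word.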

Missing extensions

| Extension | What it adds | When it returns |
|---|---|---|
| M extension (mul, mulh, div, rem) | Hardware multiply and divide | CSA-201 §3. Math.lib's ~1000-cycle software multiply becomes a 1-cycle mul; the gap is the speedup the student personally measures |
| A extension (lr.w, sc.w, atomic RMW) | Atomic load-reserved/store-conditional; atomic arithmetic | CSA-201 §6, when preemption arrives and the framebuffer's RMW hazard becomes real |
| F extension (single-precision FP) | IEEE 754 binary32 add/sub/mul/div/sqrt | con-101 (Virtus Console retro-FPU course); RV32I-Lite has no FPU |
| D extension (double-precision FP) | IEEE 754 binary64 | (not in CSA-201 either; advanced electives only) |
| C extension (compressed) | 16-bit instruction encodings interleaved with 32-bit | Out of scope across CSA-101 + CSA-201. The encoding preserves C-extension compatibility (branches are 2-byte aligned), but no 16-bit instructions are emitted |
| Zicfilp/Zicfiss (control-flow integrity) | Forward-edge type tags + backward-edge shadow stack | CSA-201 §8. Closes the CFI gap from Ch 12 §12.11 |

Pseudo-instructions deferred (need missing instructions)

| Pseudo | Expansion | Why deferred |
|---|---|---|
| not rd, rs | xori rd, rs, -1 | Needs xori |
| seqz rd, rs | sltiu rd, rs, 1 | Needs sltiu |
| snez rd, rs | sltu rd, x0, rs | Needs sltu |
| blez rs, l | bge x0, rs, l | Needs bge |
| bgez rs, l | bge rs, x0, l | Needs bge |
| bltz rs, l | blt rs, x0, l | Needs blt |
| bgtz rs, l | blt x0, rs, l | Needs blt |
| j label | jal x0, label | Needs jal (J-format) |
| call label | auipc x1, ... + jalr x1, ... | Needs auipc (U-format) |
| tail label | auipc x6, ... + jalr x0, ... | Needs auipc |
| nop-N (multi-cycle pad) | N × addi x0, x0, 0 | Available in CSA-101 (just N nops); not a single pseudo |

(la rd, sym is a CSA-101 pseudo. See §The 9 pseudo-instructions above. Its CSA-101 expansion is non-standard.)


Identifier syntax (labels, symbol names)

The assembler accepts the following character classes in identifiers (label definitions, symbol references, directive arguments):

| Position | Allowed characters |
|---|---|
| First character | A-Z, a-z, _, . |
| Continuation | A-Z, a-z, 0-9, _, ., $ |

Why $ is admitted in continuation. The Ch 7/8 VM translator emits per-function label namespacing in the form <funcname>$<label>, e.g. Main.run$IF_FALSE_0, Main.run$WHILE_TOP_2. This is the Jack/Hack convention preserved in the Virtus VM (Ch 8 §8.2): the $ character disambiguates a function-local label from the function's own name without requiring per-function symbol tables in the assembler. Since Ch 4 doesn't pin identifier syntax (Lab 4.x labels are simple alphanumeric) and Ch 6 introduced labels without enumerating $, this carve-out was confirmed when the VM translator's $-bearing labels first reached the assembler (control-flow inside function).

Why . is admitted everywhere. Ch 11 stdlib service mangling (Output.printString, Math.multiply) uses . as the class-method delimiter; the linker resolves these symbols verbatim against the stdlib roster. Same convention as full RV32I.
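
The two character classes translate directly into a regular expression. The Python sketch below is an illustration, not the assembler's actual lexer; IDENT_RE and is_identifier are hypothetical names.

```python
import re

# First character: A-Z, a-z, _, .   Continuation: A-Z, a-z, 0-9, _, ., $
IDENT_RE = re.compile(r"[A-Za-z_.][A-Za-z0-9_.$]*")


def is_identifier(name: str) -> bool:
    """True when the whole string is a legal label or symbol name."""
    return IDENT_RE.fullmatch(name) is not None


print(is_identifier("Main.run$IF_FALSE_0"))  # prints: True
print(is_identifier("$start"))               # prints: False
```

Note that $ is rejected in first position, so a translator-emitted label always starts with its function name.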


Cross-format common-bit-position reference

For the decoder you build in Ch 5 (and for hand-decoding in Lab 4.2):

Read these bits unconditionally, every cycle, regardless of format:

  bit [6:0]     opcode             (always)
  bit [11:7]    rd  OR  imm[4:0]   (R/I have rd; S has imm-low; B has imm[4:1]+imm[11])
  bit [14:12]   funct3              (when format has it)
  bit [19:15]   rs1                 (always - even formats with no rs1 read here harmlessly)
  bit [24:20]   rs2                 (always - same)
  bit [31:25]   funct7 OR imm[11:5] (R has funct7; S has imm-high; B has imm[12]+imm[10:5])
  bit [31]      sign bit of any immediate (sign-extender driven unconditionally from this bit)

Format-disambiguation by opcode (the only field whose meaning is format-independent):

| opcode | Format | Family |
|---|---|---|
| 0110011 | R | Register-register arithmetic (add/sub/and/or/xor) |
| 0010011 | I | Register-immediate arithmetic (addi) |
| 0000011 | I | Loads (lw) |
| 0100011 | S | Stores (sw) |
| 1100011 | B | Branches (beq/bne) |
| 1100111 | I | jalr |
| 0000000 | (illegal in RV32I-Lite) | The all-zeros word; Lab 4.2's HALT-trap test case |
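
The unconditional field reads and the opcode-to-format table combine into a small decoder. The Python sketch below is a software model for hand-decoding practice, not the Ch 5 hardware decoder; the names FORMATS, sext, and decode are assumptions for illustration.

```python
FORMATS = {
    0b0110011: "R", 0b0010011: "I", 0b0000011: "I",
    0b0100011: "S", 0b1100011: "B", 0b1100111: "I",
}


def sext(value: int, bits: int) -> int:
    """Sign-extend a bits-wide unsigned field to a signed integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(word: int) -> dict:
    """Split a 32-bit word into fields; reconstruct the immediate by format."""
    word &= 0xFFFFFFFF
    fields = {
        "opcode": word & 0x7F,          # bits [6:0], always
        "rd": (word >> 7) & 0x1F,       # bits [11:7]
        "funct3": (word >> 12) & 0x7,   # bits [14:12]
        "rs1": (word >> 15) & 0x1F,     # bits [19:15]
        "rs2": (word >> 20) & 0x1F,     # bits [24:20]
        "funct7": (word >> 25) & 0x7F,  # bits [31:25]
    }
    fmt = FORMATS.get(fields["opcode"])
    if fmt is None:
        raise ValueError(f"illegal opcode {fields['opcode']:07b}")
    fields["format"] = fmt
    if fmt == "I":
        fields["imm"] = sext(word >> 20, 12)
    elif fmt == "S":
        fields["imm"] = sext((fields["funct7"] << 5) | fields["rd"], 12)
    elif fmt == "B":
        raw = ((word >> 31) << 12 | ((word >> 7) & 1) << 11
               | ((word >> 25) & 0x3F) << 5 | ((word >> 8) & 0xF) << 1)
        fields["imm"] = sext(raw, 13)
    return fields
```

Calling decode(0x00500093) reports an I-format word with rd = 1, rs1 = 0, and imm = 5; the all-zeros word raises ValueError, mirroring the illegal-opcode row.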

Verification. Bit-identical against the GNU toolchain

Every byte RV32I-Lite emits is bit-identical to what riscv32-unknown-elf-as -march=rv32i emits for the same source. This is the chapter's central claim and the chapter's reproducibility test.

# Encode the same instruction by hand and via GNU as
$ printf 'addi x1, x0, 5\n' | riscv32-unknown-elf-as -march=rv32i -o /tmp/gnu.o -
$ riscv32-unknown-elf-objdump -d /tmp/gnu.o
   0:   00500093    addi  ra,zero,5

# Hand-encoded: 0x00500093 - match

The chapter's Lab 4.4 ("hand-encoded vs GNU-emitted bit-comparison") is graded on this property. The Virtus subset is real RISC-V, not a Virtus fork. Disassembly works in objdump, in Ghidra, and in Compiler Explorer.
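
The comparison step can be scripted. The Python sketch below pulls instruction words out of objdump -d text and compares them with hand-encoded values; it assumes the listing lines look like the sample shown in this card, and objdump_words is a hypothetical helper rather than part of the lab harness.

```python
import re


def objdump_words(listing: str) -> list[int]:
    """Extract the 32-bit instruction words from `objdump -d` text."""
    # Matches lines such as "   0:   00500093    addi  ra,zero,5".
    pattern = re.compile(r"^\s*[0-9a-f]+:\s+([0-9a-f]{8})\b", re.MULTILINE)
    return [int(hex_word, 16) for hex_word in pattern.findall(listing)]


sample = "   0:   00500093    addi  ra,zero,5\n"
hand_encoded = [0x00500093]

print(objdump_words(sample) == hand_encoded)  # prints: True
```

In practice the listing string would come from running the objdump command above and capturing its output.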


Where to read more