VCA-CSA-201: Computer Systems Architecture II
CSA-101 closed at the system line: a Tang Primer 25K (canonical Phase-1 silicon) running an OS the
student wrote, on a CPU the student synthesised. Every layer the chapter omitted was named.
CSA-201 pays the bills. Full RV32I (the 11-instruction RV32I-Lite gains
jal in J-format, the M extension's mul/div/rem,
lui and auipc for full 32-bit immediate materialisation, and the privileged
ISA's U/S-mode split). Sv32 paged virtual memory with an MMU. PMP physical memory protection. CSRs
and the ecall trap. A register allocator and a peephole-optimisation pass in the compiler.
The driver-writing track that opens the Peripheral IP Pack's black boxes. The DE10-Nano's
Cyclone V replaces the Tang Primer 25K as the production-scale target where tighter pipelining, larger
BRAM, and external DRAM all become available; CSA-101's Tang silicon (Primer 25K canonical; Nano 20K
advanced-track alt) returns as the comparison benchmark for every measured speedup.
Every cost CSA-101 deliberately paid is now measured against the recovered version.
Workbench Tab 3 (yowasp) supports both Tang silicon targets in-browser for pre-flash bitstream sanity checks before the DE10-Nano-side Quartus work begins. Optional: external DRAM module, logic analyser (Saleae or open-source) (see hardware platform; we update this as the kit firms up).
Course Overview
CSA-201 is the academy's Part-II anchor course. It assumes CSA-101's graduates: students
who have personally written, synthesised, and shipped a complete computing stack. The pedagogical
contract is that CSA-201 is comparative anatomy: for every cost CSA-101 paid (no
mul; no register allocator; no MMU; no privilege boundary; no W^X enforcement; no scheduler;
no syscall trap; no filesystem), CSA-201 introduces the recovered version, the student measures the
speedup or hardening, and the unoptimised baseline they personally own makes the measurement
meaningful.
Closes the CSA-101 forward-promises. Lab 7.4's 3-5× translator-bloat
baseline closes against CSA-201's register-allocator chapter. Lab 11.4's 30-80×
compiler-output bloat closes against CSA-201's peephole-optimisation chapter, the inlining pass,
and the optional Compiler Explorer (godbolt.org) production-grade comparison module. Math.lib's
~1,000-cycle multiply closes against CSA-201's M-extension chapter (single-cycle mul;
~1,000× speedup measured on the same Tang Primer 25K bitstream the student carried over from
CSA-101). Virtus OS v1's deliberately-vulnerable
surface (W^X / ASLR / canaries / CFI all absent per Ch 12 §12.11) closes against CSA-201's
mitigation track, which adds each defence one at a time and measures its cycle and code-size cost.
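To make the multiply cost concrete: on a core without the M extension, multiplication has to be assembled from shifts and adds. The following is a minimal sketch in C, assuming a classic shift-and-add loop. The names `soft_mul32` and `soft_mul32_iterations` are hypothetical, and this is not the Math.lib routine from CSA-101, whose source is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a software multiply on a core without the
 * M extension: classic shift-and-add, one loop iteration per multiplier bit.
 * Returns the low 32 bits of a * b, the same value RV32M `mul` produces. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {
            product += a;   /* add the shifted multiplicand when the bit is set */
        }
        a <<= 1;            /* multiplicand moves to the next power of two */
        b >>= 1;            /* consume one multiplier bit */
    }
    return product;
}

/* Number of loop iterations soft_mul32 performs for a given multiplier:
 * the position of its highest set bit plus one, so at most 32. */
unsigned soft_mul32_iterations(uint32_t b)
{
    unsigned n = 0;
    while (b != 0) {
        n++;
        b >>= 1;
    }
    assert(n <= 32);
    return n;
}
```

Each iteration costs several RV32I instructions (a bit test, a conditional add, two shifts, a branch), which is why a software multiply is far slower than a hardware `mul`. The ~1,000-cycle figure quoted above belongs to the course's own Math.multiply and toolchain, not to this sketch.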
Position relative to peer offerings. CSA-201 is the only formal curriculum at this level that assumes the student personally wrote a 1,500-line RISC-V CPU and a 1,500-line OS in the prior course. University graduate-level computer-architecture courses (MIT 6.823, CMU 18-447, Stanford CS149) cover similar mechanisms but assume the student is reading about the architecture rather than having personally built its precursor. CSA-201's pace and depth are calibrated against the apparatus CSA-101's graduates already own.
Pedagogy. The three CSA-101 teaching habits continue at advanced depth, with the paired-textbook system carried forward at production-grade depth. The build-it-yourself anchor continues with Patterson & Hennessy's Computer Organization & Design: The Hardware / Software Interface, RISC-V Edition (Morgan Kaufmann; the canonical undergraduate computer-architecture textbook; comparator for the M-extension / privileged-ISA / pipelining modules) and Bryant & O'Hallaron's Computer Systems: A Programmer's Perspective (CSAPP; Addison-Wesley; the canonical systems-programming textbook for the MMU / linker / virtual-memory modules). The down-to-earth-narrative anchor continues with Petzold's CODE at advanced depth (~25 new weaves across CSA-201's chapters; the privileged-ISA chapter touches Petzold on the mainframe-era separation of supervisor and user modes; the MMU chapter touches Petzold on segment and page-table descriptors; CSA-track Petzold weaves are doctrinal per the track-specific-foundational-anchors framework). Tool Journal (~30 new entries, including Verilator, the Sail RISC-V golden model, riscv-tests, riscv-formal, godbolt.org for compiler-output comparison, perf, and CSR dumpers). Compare and Contrast (full RV32I vs ARM Cortex-A vs MIPS R3000 vs x86_64; the comparison thread carries the 6502 / SB6141 / Apple M-series anchors forward).
Curriculum Outline
Fourteen modules across ~14 weeks. Each module recovers a specific cost CSA-101 deliberately paid.
| Module | Topic | What CSA-101 cost it recovers |
|---|---|---|
| 1 | Full RV32I + M extension | Math.multiply's ~1,000-cycle cost (~1,000× speedup); mul + div + rem as single-cycle |
| 2 | The privileged ISA + ecall trap | U/S split; syscall mechanism; OS-app boundary becomes hardware-enforced |
| 3 | Compiler register allocator | Lab 7.4 baseline; ~3-5× bloat reduction at translator level |
| 4 | Compiler peephole optimisation | Lab 11.4 quantitative-climax baseline; ~1.5-2× reduction |
| 5 | Compiler inlining + constant folding | Library-call overhead from Ch 11; closes the Lab 11.4 5-categories |
| 6 | SSA-IR + Compiler Explorer (godbolt.org) module | Production-grade compiler comparison; observe LLVM optimisations on the same source |
| 7 | Sv32 paged virtual memory + MMU | CSA-101's flat-physical address space; page tables; TLB; VA → PA translation |
| 8 | PMP + W^X enforcement | Ch 12 §12.11 W^X absence; per-region R/W/X bits; classical stack-smash defended |
| 9 | Stack canaries + CFI | Compiler-flag -fstack-protector; CFI shadow stack; Zicfilp/Zicfiss when available |
| 10 | Tracing garbage collection | Ch 12 §12.5.4 omission. Allocator gains GC; mark-and-sweep variant |
| 11 | Preemption + scheduler | Ch 12 §12.1 single-task baseline; round-robin scheduler; context switch cost measured |
| 12 | Driver-writing track | SSD1306 OLED + SD-card SPI + ENC28J60 Ethernet from datasheets. Opens IP-Pack black boxes |
| 13 | External DRAM + filesystem | Tang silicon BRAM ceiling (1,008 Kbit on Primer 25K; 624 Kbit on Nano 20K); SD-card driver from Module 12; FAT16/exFAT walker |
| 14 | Capstone, Virtus OS v2 on DE10-Nano | Production-scale OS with U/S, MMU, PMP, scheduler, FS, ~4,000 lines vs CSA-101's 1,500 |
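Module 7's VA → PA translation can be previewed in software before any HDL is written. The sketch below follows the two-level Sv32 walk described in the RISC-V privileged specification: a 32-bit virtual address splits into VPN[1] (bits 31:22), VPN[0] (bits 21:12), and a 12-bit page offset. It is a simplified model under stated assumptions: physical memory is a plain word array, and permission checks, the U bit, and A/D-bit handling are left out. The function name `sv32_translate` and the `mem` harness are hypothetical, not course-supplied code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_V (1u << 0)   /* valid */
#define PTE_R (1u << 1)   /* readable */
#define PTE_W (1u << 2)   /* writable */
#define PTE_X (1u << 3)   /* executable */

#define SV32_FAULT UINT64_MAX   /* sentinel standing in for a page fault */

/* Walk an Sv32 page table held in a simulated physical memory.
 *   mem        word-indexed backing store (hypothetical test harness)
 *   mem_words  number of 32-bit words in mem
 *   root_ppn   physical page number of the root table (the satp PPN field)
 *   va         virtual address to translate
 * Returns the physical address (up to 34 bits) or SV32_FAULT. */
uint64_t sv32_translate(const uint32_t *mem, size_t mem_words,
                        uint32_t root_ppn, uint32_t va)
{
    assert(mem != NULL);
    uint32_t vpn[2] = { (va >> 12) & 0x3FFu, (va >> 22) & 0x3FFu };
    uint64_t table = (uint64_t)root_ppn << 12;

    for (int level = 1; level >= 0; level--) {
        uint64_t pte_addr = table + (uint64_t)vpn[level] * 4u;
        if (pte_addr / 4u >= mem_words) {
            return SV32_FAULT;              /* walk left the simulated memory */
        }
        uint32_t pte = mem[pte_addr / 4u];

        if (!(pte & PTE_V) || (!(pte & PTE_R) && (pte & PTE_W))) {
            return SV32_FAULT;              /* invalid or reserved encoding */
        }

        uint64_t ppn = pte >> 10;           /* 22-bit physical page number */
        if (pte & (PTE_R | PTE_X)) {        /* leaf PTE */
            if (level == 1) {
                if (ppn & 0x3FFu) {
                    return SV32_FAULT;      /* misaligned 4 MiB megapage */
                }
                return (ppn << 12) | (va & 0x3FFFFFu);
            }
            return (ppn << 12) | (va & 0xFFFu);
        }
        if (level == 0) {
            return SV32_FAULT;              /* pointer PTE at the last level */
        }
        table = ppn << 12;                  /* descend to the next-level table */
    }
    return SV32_FAULT;
}
```

A hardware MMU performs the same steps with a TLB in front of the walk. Keeping a software model like this alongside the HDL gives a reference to diff against when a translation misbehaves.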
Learning Outcomes
- Remember. State the privileged-ISA register set (CSRs), the Sv32 page-table format, the M-extension instructions, and the eight argument registers (a0-a7) of the RISC-V calling convention.
- Understand. Explain why each of CSA-101's deliberate omissions costs what it costs, in cycles, gates, and code size.
- Apply. Implement a register allocator over the CSA-101 compiler's tree-walking emit; measure the bloat-reduction.
- Apply. Implement Sv32 paged virtual memory in HDL on the DE10-Nano; demonstrate VA → PA translation on a running program.
- Apply. Implement PMP regions; demonstrate stack-smash defended at the silicon level.
- Apply. Write a driver from the SSD1306 datasheet; demonstrate output on a real OLED panel.
- Analyze. Use godbolt.org to compare the CSA-101 compiler's output against gcc/clang on the same source; classify each optimisation gap.
- Synthesize. Ship Virtus OS v2: a production-scale OS on DE10-Nano with U/S split, MMU, PMP, scheduler, filesystem, and toggleable mitigations.
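The PMP outcome above rests on a simple rule: the lowest-numbered matching region decides, and a U/S-mode access that matches no region is denied. The following is a minimal sketch in C, assuming only the OFF and TOR (top-of-range) address-matching modes and 32-bit physical addresses. NA4/NAPOT matching, the lock bit, and M-mode behaviour are deliberately left out. `struct pmp_entry` and `pmp_allows` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pmpcfg bit layout per the RISC-V privileged spec */
#define PMP_R       (1u << 0)
#define PMP_W       (1u << 1)
#define PMP_X       (1u << 2)
#define PMP_A_MASK  (3u << 3)
#define PMP_A_OFF   (0u << 3)
#define PMP_A_TOR   (1u << 3)

struct pmp_entry {
    uint8_t  cfg;    /* one pmpcfg byte: R, W, X, A[1:0] */
    uint32_t addr;   /* pmpaddr value: physical address >> 2 */
};

/* Decide whether a U/S-mode access of kind `want` (exactly one of
 * PMP_R, PMP_W, PMP_X) to physical address `pa` is allowed.
 * Simplification: only TOR entries can match in this sketch. */
bool pmp_allows(const struct pmp_entry *e, size_t n, uint32_t pa, uint8_t want)
{
    assert(want == PMP_R || want == PMP_W || want == PMP_X);
    uint32_t word  = pa >> 2;   /* pmpaddr registers hold address bits above [1:0] */
    uint32_t lower = 0;         /* entry 0 uses 0 as its TOR lower bound */

    for (size_t i = 0; i < n; i++) {
        uint8_t mode = e[i].cfg & PMP_A_MASK;
        if (mode == PMP_A_TOR && word >= lower && word < e[i].addr) {
            return (e[i].cfg & want) != 0;   /* first match decides */
        }
        lower = e[i].addr;      /* next TOR entry starts here, even if this one is OFF */
    }
    return false;               /* no match: U/S-mode access is denied */
}
```

Configured with a read+execute region for code and a read+write region for the stack, this model denies an instruction fetch from a stack address, which is the W^X property Module 8 and Lab 8.1 demonstrate in silicon.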
Hands-On Labs
Fourteen modules, one capstone. Each lab measures a CSA-101 cost and recovers it.
- Lab 1.1: M extension, mul single-cycle vs Math.multiply 1,000-cycle. Speedup measured.
- Lab 2.1: ecall trap. First user-to-supervisor transition. Cycle cost measured.
- Lab 3.1: register allocator pass added to compiler; emit reduction observed.
- Lab 4.1: peephole pass emits ~30% smaller assembly per the §11.9 5-categories.
- Lab 5.1: inliner pass; library-call overhead measured before/after.
- Lab 6.1: godbolt.org module. Compare CSA-101 compiler output against gcc -O0/-O2/-O3 on identical C source.
- Lab 7.1: Sv32 paged VM running; demonstrate page-fault handler.
- Lab 8.1: PMP-defended stack-smash, the same exploit that landed in Ch 12 §12.11 now traps cleanly.
- Lab 9.1: stack canaries detect return-address overwrite; CFI shadow stack catches ROP.
- Lab 10.1: tracing GC running on Memory.lib; cycle-cost-of-GC measured.
- Lab 11.1: round-robin scheduler with two demo tasks; context-switch cost measured.
- Lab 12.1: SSD1306 OLED driver written from datasheet; output verified.
- Lab 13.1: SD-card filesystem walker reads FAT16 partition.
- Lab 14 (capstone): Virtus OS v2 on DE10-Nano with all of the above integrated.
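As a flavour of what a peephole pass such as the one in Lab 4.1 does, the sketch below slides over an instruction list and drops two locally redundant patterns: an `addi` that adds zero to a register in place, and a load that re-reads a value the immediately preceding store just wrote from the same register. It is a sketch under stated assumptions: the `struct insn` representation is hypothetical (not the CSA-101 compiler's IR), the sequence is a single basic block with no branch targets inside it, and the memory involved is ordinary RAM rather than memory-mapped I/O. The two patterns are illustrative and are not claimed to be the §11.9 categories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum op { OP_ADDI, OP_SW, OP_LW, OP_OTHER };

/* Hypothetical, heavily simplified instruction record. */
struct insn {
    enum op op;
    uint8_t reg;    /* rd for ADDI and LW; the stored register for SW */
    uint8_t base;   /* rs1: source register (ADDI) or address base (SW/LW) */
    int32_t imm;    /* immediate or address offset */
};

/* addi rX, rX, 0 changes no architectural state. */
static bool is_nop_addi(const struct insn *i)
{
    return i->op == OP_ADDI && i->reg == i->base && i->imm == 0;
}

/* sw rX, off(rB) followed by lw rX, off(rB): the load is redundant. */
static bool is_reload_of_store(const struct insn *st, const struct insn *ld)
{
    return st->op == OP_SW && ld->op == OP_LW &&
           st->reg == ld->reg && st->base == ld->base && st->imm == ld->imm;
}

/* Rewrite `code` in place, dropping redundant instructions.
 * Returns the new instruction count. */
size_t peephole(struct insn *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t out = 0;
    for (size_t in = 0; in < n; in++) {
        if (is_nop_addi(&code[in])) {
            continue;                       /* drop the no-op */
        }
        if (out > 0 && is_reload_of_store(&code[out - 1], &code[in])) {
            continue;                       /* drop the redundant reload */
        }
        code[out++] = code[in];             /* keep everything else */
    }
    return out;
}
```

Comparing the returned count with the original `n` gives the kind of before/after size figure the lab asks students to report.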
Assessment
First, your project must work: Virtus OS v2 boots on DE10-Nano, demonstrates a U/S transition, translates VA→PA through the MMU, defends against a stack-smash with PMP, runs two tasks under the scheduler, and mounts the SD-card filesystem. Then we score it on three dimensions (40/30/30): mitigation depth (40%) · measurement quality of speedups vs the CSA-101 baseline (30%) · demo + 6-8 page report covering the 14 modules' recovered costs (30%). B− minimum on Tier 2 for the certificate.
Career Outcomes & Cross-Course Bridges
- → Part-II electives. CSA-201 is the prerequisite for the academy's six named Part-II electives: VCA-ARM-201 (ARMv8 architecture and Apple-M-series silicon), VCA-NET-201 (advanced networking, networking-track anchor), VCA-EMB-201 (embedded firmware engineering at production scale), VCA-NET-301 (network-protocol reverse engineering), VCA-X86-201 (x86_64 architecture and reverse engineering), VCA-MIPS-201 (MIPS R3000+ in industry contexts).
- → VCA-RE-101. CSA-201 graduates land RE-101's SB6141 binary-analysis with the privileged-ISA and MMU mental models already in place.
- → VCA-AI-301. The XD-strand on-ramp reuses CSA-201's mitigation track to study toggleable defences.
- Industry. Junior CPU architects; OS-kernel engineers; compiler engineers; firmware-security researchers; FPGA engineers at production scale.
Tool Journal: CSA-201 Originating Entries
~30 new tools enter the diary in CSA-201.
- Quartus Prime Lite (DE10-Nano synthesis)
- SignalTap (Cyclone V on-die logic analyser)
- Verilator (production HDL simulation; CSA-101 referenced it as optional, CSA-201 makes it mandatory)
- Sail RISC-V: the official RISC-V golden model in Sail
- riscv-tests: the ISA conformance test suite
- riscv-formal: formal-verification framework for RISC-V cores
- godbolt.org: Compiler Explorer for production-grade comparison
- perf / perf-stat: Linux performance profiler when the CPU runs Linux
- OpenSBI: the RISC-V supervisor binary interface bootstrap
- U-Boot RISC-V: bootloader for production-scale Linux
- Linux 6.x RISC-V: ported and run on student silicon
- BusyBox: user-space for Linux on RISC-V
- perf trace + ftrace: kernel tracing
- strace (deeper use; CSA-101 used it on Python tooling, CSA-201 uses it on user-mode Linux on student silicon)
- gdb-multiarch + JTAG: hardware debugger over JTAG
- OpenOCD: on-chip debugger
- QEMU + KVM: full-system emulation when bare-metal is impractical
- LLDB: alternative debugger
- RISC-V instruction tracer (custom, optional)
- perf record + flamegraph: profile-guided understanding
- SSD1306 datasheet workflow: the discipline of driver writing from spec
- SD-card SPI test harness
- ENC28J60 SPI Ethernet test harness
- FAT16 filesystem walker
- cycle-counter telemetry pipeline: time-series of measured optimisation gains
- PMP region tracer: observe which regions are R/W/X enabled at runtime
- page-fault tracer: observe Sv32 page-fault paths
- scheduler-trace tool: visualise context switches
- checkpatch.pl + clang-format: Linux-kernel style discipline
- buildroot: full-system rootfs builder
Recommended Readings
Primary anchor pair (continued from CSA-101 at advanced depth)
- David Patterson and John Hennessy, Computer Organization and Design: The Hardware / Software Interface, RISC-V Edition, 2nd ed. Morgan Kaufmann, 2020 (ISBN 978-0-12-820331-6). The build-it-yourself anchor at advanced depth: pipelining, the M extension, exceptions / interrupts, virtual memory, and parallelism on the same RISC-V baseline CSA-101 used. Library-acquire or paperback ~$90-100.
- Charles Petzold, CODE: The Hidden Language of Computer Hardware and Software, 2nd ed. Microsoft Press, 2022. The down-to-earth-narrative anchor (CSA-track-only by design); ~25 new weaves across the CSA-201 chapters at advanced depth. Continues from CSA-101's ~37 weaves.
Module-specific anchors (CSA-201 introduces)
- Randal Bryant and David O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd ed. (CSAPP). Addison-Wesley, 2015. The systems-programming companion at advanced depth; primary anchor for the linker (Module 6), MMU (Module 7), and exception-control-flow (Modules 2 / 11) chapters. Paperback ~$130-160.
- Andrew Waterman and Krste Asanović (eds.), The RISC-V Instruction Set Manual, Volumes I & II. RISC-V International (FREE; current published edition). Module 1 (full RV32I + M extension) and Module 2 (privileged ISA) normative reference.
- Daniel P. Bovet and Marco Cesati, Understanding the Linux Kernel, 3rd ed. O'Reilly. Optional companion for Modules 11 (preemption / scheduler) and 13 (filesystem).
Before You Start
- Have you completed CSA-101 and shipped its capstone? (If no → CSA-101's capstone is the central prerequisite; without it, CSA-201's comparative-anatomy thesis has no baseline.)
- Can you read RISC-V assembly fluently and explain a calling-convention prologue/epilogue? (If no → CSA-101 §4 + §8 review; revisit the Ch 4 lab pack.)
- Have you read the Sv32 page-table format from the privileged spec? (If no → CSA-201 Module 7 prereq reading; pre-read the Sail RISC-V Sv32 model.)
- Are you comfortable installing closed-source tooling (Quartus Prime Lite)? (If no → CSA-201 Module 1 prereq install discipline.)
- Do you have ~$210 for the DE10-Nano + Pi station kit + USB-C + microSD? (If no → HW-101's BoM section + cohort-shared bench access.)
Format Prescriptions
Hour budget: ~30 lec hr + ~55 lab hr + ~95 indep hr (= ~180 hr total).
Live (standard cadence)
2 sessions/wk × 90 min over 14 weeks. Best for college-elective post-CSA-101.
Night class
1-2 sessions/wk evenings; ~30 weeks. Module 7 (MMU) and Module 14 (capstone) need extended-evening blocks.
Bootcamp
40 hr/wk × 5 weeks intensive. Compressed but feasible.
Async self-paced
Recorded video; per-student DE10-Nano kit; AI-assistant tier add-on; 1:1 tutoring premium for MMU + driver-writing.
High school / homeschool co-op
Year-long cadence at HS scheduling. Recommended pairing with CSA-101 in the prior year.