The RISC-V Files - Part 2: Digging Deeper

An old fart documents his exploration of a new(ish) technology

A look at the ISA

As before, please note that I am not a RISC-V expert. This information has been gleaned from multiple sources and where possible I'll link to the original documents.

The RISC-V ISA was designed to be as modular as possible so that manufacturers could target as wide a range of applications as possible, from tiny embedded devices to high-end server systems. Consequently, there are a handful of "base" ISAs that form the bare minimum of a RISC-V processor, and then a selection of optional "extensions" which provide extra functionality. For the base ISA you have a choice of:

Base ISA Name    Description
RV32I            32-bit Integer
RV32E            32-bit Integer, with a reduced number of integer registers - intended for smaller, embedded systems
RV64I            64-bit Integer
RV128I           128-bit Integer

The ISAs are essentially identical except for the bit-size (referred to internally as "XLEN") which indicates the size of the registers and the address-space. All RISC-V processors must implement one of these base ISAs, together with a selection of extensions such as the ones listed below.

Extension Name    Description
M                 Integer Multiplication and Division
A                 Atomics
F                 Single-Precision Floating-Point
D                 Double-Precision Floating-Point
G                 General
Q                 Quad-Precision Floating-Point
L                 Decimal Floating-Point
C                 16-bit Compressed Instructions
B                 Bit Manipulation
V                 Vector Instructions

Part of the spec is a naming convention that consists of the base ISA followed by the letters of the implemented extensions.
For example, RV64IMAFD describes a 64-bit RISC-V with integer multiplication and division, atomic instructions, single-precision floating-point, and double-precision floating-point instructions.
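
To make the naming convention concrete, here is a minimal Python sketch that pulls a name apart. The function names are made up for illustration, the descriptions are copied from the table above, and it only handles the simple single-letter form. It treats "G" as shorthand for IMAFD (newer versions of the spec also fold a couple of multi-letter extensions into G, which this sketch ignores).

```python
import re

# Descriptions copied from the extension table above; I and E are the base ISAs.
EXTENSIONS = {
    "I": "Base integer instructions",
    "E": "Base integer instructions with a reduced register set",
    "M": "Integer Multiplication and Division",
    "A": "Atomics",
    "F": "Single-Precision Floating-Point",
    "D": "Double-Precision Floating-Point",
    "Q": "Quad-Precision Floating-Point",
    "L": "Decimal Floating-Point",
    "C": "16-bit Compressed Instructions",
    "B": "Bit Manipulation",
    "V": "Vector Instructions",
}


def parse_isa_name(name):
    """Split a name such as 'RV64IMAFD' into (xlen, [letters])."""
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a simple RISC-V ISA name: {name!r}")
    xlen = int(match.group(1))
    # Assumption: treat G purely as shorthand for IMAFD.
    letters = match.group(2).replace("G", "IMAFD")
    return xlen, list(letters)


def describe(name):
    """Return one human-readable line per letter in the ISA name."""
    xlen, letters = parse_isa_name(name)
    lines = [f"{xlen}-bit registers and address space"]
    for letter in letters:
        lines.append(f"{letter}: {EXTENSIONS.get(letter, 'not in the table above')}")
    return lines
```

Calling describe("RV64IMAFD") gives one line for the bit-size plus one per letter, and any letter missing from the table is reported as such rather than raising an error.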

Privilege Modes

In addition to the regular extensions, there are also special privilege modes which may be implemented, that control what running code can access. Support for these privilege modes is required by operating systems such as Linux/BSD that implement separation between user programs and system-level processes. The available RISC-V privilege modes are:

Level Abbreviation    Name                Description
U                     User/Application    Unprivileged mode - intended for OS users.
S                     Supervisor          Supervisor mode - intended for OS kernel.
M                     Machine             Provides full access to the hardware.

Machine mode (M) is the only mode required to be implemented and allows for full access to all of the processor's features and memory. It is akin to what you would normally expect from a microcontroller. However, if S and U are implemented then the RISC-V becomes capable of running protected Operating Systems such as Linux/BSD.
In addition to these modes, there is also a hypervisor mode, but frankly, I haven't even begun to scratch the surface on that stuff yet.
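
As a rough illustration of the paragraph above, here is a small Python sketch of the three modes. The numeric values are the level encodings from the privileged spec (2 is reserved); the helper name can_host_protected_os is mine, not anything official.

```python
from enum import IntEnum


class PrivilegeLevel(IntEnum):
    """Privilege level encodings; higher numbers mean more privilege."""
    USER = 0        # U
    SUPERVISOR = 1  # S
    MACHINE = 3     # M (the value 2 is reserved)


def can_host_protected_os(modes):
    """True when M, S and U are all implemented, e.g. modes='MSU'.

    Hypothetical helper: it simply restates the rule in the text that
    Linux/BSD-style systems need S and U on top of the mandatory M.
    """
    return {"M", "S", "U"} <= set(modes.upper())
```

So a bare microcontroller-style core ("M") fails the check, while a core implementing "MSU" passes it.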

You've Gotta Have Hart

A mysterious term that you will encounter in RISC-V world is "hart" [sic]. It's defined by the spec as a "hardware thread", but this has confused many people because it's a concept that doesn't easily map to other architectures. Consequently, it's not uncommon to find people equating it with "core" - which is wrong; a RISC-V core may have multiple harts. You'll also find people defining it in terms of hardware components, such as registers and instruction fetch units; this isn't necessarily true either.

How a hart is implemented in hardware is not really relevant, provided each hart can run independently and keeps its own state and execution context. It's very similar to hyper-threading in that regard; however, the implementation is not prescribed, so it could be engineered in multiple ways.

Registers

The RISC-V has 32 general-purpose registers (16 in the RV32E) named x0 to x31, and a separate program counter register that contains the address of the instruction currently being executed. x0 is special in that it always contains the value 0. The width of these registers is either 32, 64, or 128 bits, as dictated by the XLEN value (see above). Technically, these registers can be used in any way you like; however, RISC-V specifies recommended usages for each of them, together with alternative names (AKA "ABI names") that indicate their purpose. Here are a few examples:

Register name    ABI name    Description
x0               zero        always set to zero
x1               ra          return address
x2               sp          stack pointer
x3               gp          global pointer
x4               tp          thread pointer
x5               t0          temporary register 0
x6               t1          temporary register 1
x7               t2          temporary register 2
x8               s0 / fp     saved register 0 / frame pointer
x9               s1          saved register 1
x10              a0          function argument 0 / return value 0
x11              a1          function argument 1 / return value 1
x12              a2          function argument 2
x13              a3          function argument 3
x14              a4          function argument 4
x15              a5          function argument 5
x16              a6          function argument 6
x17              a7          function argument 7
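
The table stops at x17, so here is a short Python sketch that generates the ABI name for all 32 registers. The rows beyond x17 (s2 to s11 and t3 to t6) come from the standard RISC-V calling convention rather than from the table above, and the function name abi_names is just something I made up.

```python
def abi_names():
    """Return a list where index N holds the ABI name of register xN."""
    names = {0: "zero", 1: "ra", 2: "sp", 3: "gp", 4: "tp"}
    names.update({5 + i: f"t{i}" for i in range(3)})        # x5-x7   -> t0-t2
    names.update({8: "s0", 9: "s1"})                        # x8 doubles as fp
    names.update({10 + i: f"a{i}" for i in range(8)})       # x10-x17 -> a0-a7
    names.update({18 + i: f"s{i + 2}" for i in range(10)})  # x18-x27 -> s2-s11
    names.update({28 + i: f"t{i + 3}" for i in range(4)})   # x28-x31 -> t3-t6
    return [names[i] for i in range(32)]


def describe_register(number):
    """Format one register as 'x10 (a0)'."""
    return f"x{number} ({abi_names()[number]})"
```

On an RV32E core only the first 16 entries (x0 to x15) would apply.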

Instructions

As with all RISC processors, the number of instructions in the ISA is small: the base RV32I has just 47 instructions. Instructions are generally 32 bits wide; however, the encoding has been designed in such a cunning way that it can be extended to 192 bits and beyond. As I mentioned before, the RISC-V design is really pretty good.
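
The "cunning way" is that the lowest bits of an instruction tell you how long it is. Below is a minimal Python sketch of the length rules as I read them in the 2019 version of the unprivileged spec; treat the formats longer than 32 bits as an assumption, since as far as I can tell they are a scheme the spec sketches out rather than something today's cores rely on. The function name is mine.

```python
def instruction_length_bits(first_halfword):
    """Work out an instruction's length from its lowest 16 bits.

    Returns the length in bits, or None for the pattern reserved for
    instructions of 192 bits or more.
    """
    hw = first_halfword & 0xFFFF
    if (hw & 0b11) != 0b11:
        return 16                      # compressed (C extension) instruction
    if ((hw >> 2) & 0b111) != 0b111:
        return 32                      # the standard instruction size
    if (hw & 0b111111) == 0b011111:
        return 48
    if (hw & 0b1111111) == 0b0111111:
        return 64
    # Bits [6:0] are now all ones; bits [14:12] select the longer formats.
    nnn = (hw >> 12) & 0b111
    if nnn != 0b111:
        return 80 + 16 * nnn
    return None                        # reserved for >= 192 bits


# 'add x3, x1, x2' encodes as 0x002081B3; its low halfword is 0x81B3.
assert instruction_length_bits(0x002081B3 & 0xFFFF) == 32
```

The nice property is that a decoder only ever needs to look at the first 16 bits to know how many more to fetch.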

Rather than pathetically transcribing the instructions from the spec to here, how about a different idea (for those that are interested in those kinds of details)? Building it yourself!

Wait, wait, don't run away! Even if you have no idea about VHDL or Verilog, this amazing tutorial documents the experience of someone going from knowing nothing about it at all, to designing and building their own tiny little RISC-V core called "FemtoRV". Even if you don't actually follow the steps, reading along will give you greater insight into the guts of RISC-V than you will possibly gain from reading dry texts (e.g. the RISC-V spec).

So, if you're interested in embarking on an extraordinarily enlightening journey, I can't recommend Bruno Levy's "From Blinky to RISC-V" tutorial highly enough. You don't need any hardware, but investing in a low-cost FPGA makes the whole thing more magical. For what it's worth, it inspired me to learn about Verilog and follow along. Running code on a processor you built yourself feels a lot like a Jedi using his homebrew lightsaber for the first time. Probably.

If you're not interested in the nitty-gritty then fair enough, there's a lot more to dig into without that. And of course, a quick Google will get you the summary.

The $11 RISC-V Linux Machine

Picture of the M1S Dock

A few weeks ago someone on Twitter posted a link to an $11 RISC-V Linux machine, and within a few minutes, I'd ordered it. [Actually, it turns out I'd accidentally ordered the crappy 3d-printed case on its own; so I had to re-order the right part number. But let's not talk of that ever again. Eventually, I got the right part delivered.]

It's called the "M1S Dock", and apart from the low price, it has some interesting specs, not least of which is that it contains three RISC-V CPUs:
· RV64GCV 480MHz
· RV32GCP 320MHz
· RV32EMC 160MHz

It also contains a general-purpose hardware accelerator that can be used for video/audio handling.

Getting Linux up on this was pretty straightforward. Sipeed has made available a simple Linux example that is ready to run.

As with most of the documentation for these Sipeed devices, it's written in Chinese, and at best there are "AI" translations which can be a little hard to follow. [I'm not knocking them at all for this by the way! These devices are incredible for the price and it's nice of them to make any kind of English docs available at all as far as I'm concerned. But occasionally it is kind of funny, especially when the hollow marketing bullshit lines are translated into things like "Embrace Digital Intelligence Future with Chip Power"...but I digress.] Regardless, here is my interpretation of them, including things that I had to discover the hard way.

Firstly, a word about nomenclature. You'll see this device and its components described in different ways, and attributed to different manufacturers - I found it quite confusing when trying to get answers. Google Translate is your friend.

The "M1s Dock" is made by Sipeed and based around the Sipeed M1s module. Documentation starts on the Sipeed Wiki.
The M1s module is based around a SoC (System on Chip) called the BL808, made by Bouffalo[sic] Labs. The BL808 in turn includes cores designed by Alibaba's subsidiary, T-Head, most notably the Xuantie c906. These cores are not only based on RISC-V, but they are open source with an Apache 2.0 license!

So, to get started, download the Bouffalo Labs "DevCube" software. The main bundle contains binaries for x86_64 Linux, macOS and Windows. If this is no good for you, there are open-source tools you can build for your specific needs, but that's a longer story for another day.

Next, download and unzip the example Linux build:

https://dl.sipeed.com/fileList/MAIX/M1s/M1s_Dock/7_Firmware/m1sdock_linux_20221116.zip

Inside you will find three .bin files and a markdown file with instructions for installing.

At this point, plug in your M1S Dock: connect the USB-C socket marked "UART" (on the left of the module) to a USB port on your machine. Then put it into Boot Mode:

  • Hold down the Boot button

  • Press and release the Reset button

  • Release the Boot button

Now run the DevCube binary. You may need to chmod u+x the binary so that you can run it. For example, on my Mac:

implant:~/devel/sipeed/devcube$ chmod u+x BLDevCube-macos-arm64
implant:~/devel/sipeed/devcube$ ./BLDevCube-macos-arm64

The first thing you should be presented with is a dialog asking which chip you're working with.

Choose the BL808 and click finish.

The main DevCube window should appear. On the top right is a pull-down menu for selecting which serial port you want to use. When the device is plugged in, you should see two ports listed. The one you want to select is the one with the higher number.

Then select the "MCU" tab and fill in the form as below. Rather than enter the pathnames, click the adjacent "Browse" button and use your file browser to select the correct file.

Here it is in text form:

M0    group0    0x58000000    /path/to/low_load_bl808_m0@0x58000000.bin
D0    group1    0x58000000    /path/to/low_load_bl808_d0@0x58000000.bin

Click "Create & Download" and the flashing should commence, with details in the log window.
Once the progress bar turns green at 100%, change to the "IOT" tab.

In the "Single Download Options" section, check the "Enable" box. Enter "0xD2000" into the next box, and then click "Browse". Use your file browser to select the "whole_img_linux@0xD2000.bin" file. Then, when you're ready, click "Create & Download" again. This will take longer than before, and don't worry when it pauses at the erasing stage - just let it go and wait until it finishes.
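
One thing that helps when filling in these forms: the address each file wants is baked into its name after the "@" sign. Here is a small Python sketch that extracts it, so you can double-check what you typed into DevCube. It assumes the name@0xADDRESS.bin pattern seen in this particular bundle; the function name is hypothetical.

```python
import re
from pathlib import PurePosixPath


def load_address(path):
    """Extract the address from a file name like 'image@0xD2000.bin'."""
    name = PurePosixPath(path).name
    match = re.search(r"@(0x[0-9a-fA-F]+)\.bin$", name)
    if match is None:
        raise ValueError(f"no @0x... address in file name: {name!r}")
    return int(match.group(1), 16)


# The three files from the bundle, as named in the steps above.
for image in (
    "/path/to/low_load_bl808_m0@0x58000000.bin",
    "/path/to/low_load_bl808_d0@0x58000000.bin",
    "/path/to/whole_img_linux@0xD2000.bin",
):
    print(f"{PurePosixPath(image).name}: {load_address(image):#x}")
```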

At this point, you should have a working, albeit tiny, Linux installation. Fire up a serial terminal at 2 million baud. Yes, really. I know. On a Linux box this isn't a problem; GNU screen and friends handle it just fine. On a Mac, this is an issue for some reason. Thanks to the good folk of The Internet, it turns out that a Python tool called "miniterm" handles it well. It's part of pyserial. You can run it with:

python -m serial.tools.miniterm

--- Available ports:
---  1: /dev/cu.Bluetooth-Incoming-Port 'n/a'
---  2: /dev/cu.usbserial-SI88480 'USB TO DUALUART'
---  3: /dev/cu.usbserial-SI88481 'USB TO DUALUART'
--- Enter port index or full name: /dev/cu.usbserial-SI88480

This time, select the lower of the two serial port numbers. Then type 'CTRL-T' followed by 'b' to change the baud rate, and type '2000000'.
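
Since it's easy to mix up which port is which (higher number for flashing, lower number for the console), here is a hedged Python sketch of the selection logic. The pick_ports and open_console helpers are my own invention; the 'USB TO DUALUART' description is what miniterm printed on my Mac, and I'm assuming it looks the same on other systems. Only open_console needs pyserial installed.

```python
import re

DOCK_DESCRIPTION = "USB TO DUALUART"  # as shown in the miniterm port listing
CONSOLE_BAUD = 2_000_000              # 2 million baud, as described above


def _natural_key(name):
    """Sort key that compares embedded numbers numerically (USB9 < USB10)."""
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


def pick_ports(ports):
    """Given (device, description) pairs, return (console, flashing).

    The console is the lower-numbered dock port; flashing uses the higher.
    """
    dock = sorted(
        (device for device, description in ports
         if DOCK_DESCRIPTION in description),
        key=_natural_key,
    )
    if len(dock) != 2:
        raise ValueError(f"expected 2 dock ports, found {len(dock)}")
    return dock[0], dock[1]


def open_console():
    """Open the Linux console using pyserial (pip install pyserial)."""
    import serial
    from serial.tools import list_ports

    found = [(p.device, p.description or "") for p in list_ports.comports()]
    console, _flashing = pick_ports(found)
    return serial.Serial(console, CONSOLE_BAUD, timeout=1)
```

With the port listing shown above, pick_ports would choose /dev/cu.usbserial-SI88480 for the console and /dev/cu.usbserial-SI88481 for flashing.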

You should be set.

Press RESET and you should see the boot log:

--- Settings: /dev/cu.usbserial-SI88480  2000000,8,N,1
--- RTS: active    DTR: active    BREAK: inactive
--- CTS: active    DSR: inactive  RI: inactive  CD: inactive
--- software flow control: inactive
--- hardware flow control: inactive
--- serial input encoding: UTF-8
--- serial output encoding: UTF-8
--- EOL: CRLF
--- filters: default
dynamic memory init success,heap size = 26 Kbyte
C906 start...
mtimer clk:1000000
linux load start...
len:0x00376c53
vm linux load done!
dtb load done!
opensbi load done!

load time: 426340 us

OpenSBI v0.6
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name          : T-HEAD Xuantie c910
Platform HART Features : RV64ACDFIMSUVX
Platform Max HARTs     : 1
Current Hart           : 0
Firmware Base          : 0x3eff0000
Firmware Size          : 56 KB
Runtime SBI Version    : 0.2

MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b1ff
[    0.000000] Linux version 5.10.4 (taorye@tao-b660mstx) (riscv64-unknown-linux-gnu-gcc (Xuantie-900 linux-5.10.4 glibc gcc Toolchain V2.2.4 B-20211227) 10.2.0, GNU ld (GNU Binutils) 2.35) #4 SMP Fri Nov 4 18:23:30 CST 2022
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000050000000-0x0000000053ffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000050000000-0x0000000053ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000050000000-0x0000000053ffffff]
[    0.000000] On node 0 totalpages: 16384
[    0.000000]   DMA32 zone: 224 pages used for memmap
[    0.000000]   DMA32 zone: 0 pages reserved
[    0.000000]   DMA32 zone: 16384 pages, LIFO batch:3
[    0.000000] software IO TLB: Cannot allocate buffer
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x6
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] riscv: ISA extensions acdfimsuv
[    0.000000] riscv: ELF capabilities acdfimv
[    0.000000] percpu: Embedded 17 pages/cpu s32600 r8192 d28840 u69632
[    0.000000] pcpu-alloc: s32600 r8192 d28840 u69632 alloc=17*4096
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists, mobility grouping off.  Total pages: 16160
[    0.000000] Kernel command line: console=ttyS0,2000000 loglevel=8 earlyprintk earlycon=sbi root=/dev/mtdblock0 ro rootfstype=squashfs
[    0.000000] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.000000] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 53200K/65536K available (3960K kernel code, 2846K rwdata, 2048K rodata, 159K init, 288K bss, 12336K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000]     Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 64 local interrupts mapped
[    0.000000] plic: interrupt-controller@e0000000: mapped 64 interrupts with 1 handlers for 2 contexts.
[    0.000000] random: get_random_bytes called from start_kernel+0x298/0x3a6 with crng_init=0
[    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
[    0.000018] sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns
[    0.000817] Console: colour dummy device 80x25
[    0.001079] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=4000)
[    0.001567] pid_max: default: 32768 minimum: 301
[    0.002186] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.002511] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.007155] ASID allocator initialised with 65536 entries
[    0.007761] rcu: Hierarchical SRCU implementation.
[    0.009096] smp: Bringing up secondary CPUs ...
[    0.009260] smp: Brought up 1 node, 1 CPU
[    0.010347] devtmpfs: initialized
[    0.013165] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.013639] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[    0.015213] NET: Registered protocol family 16
[    0.016370] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.017284] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.018442] i2c-core: driver [dummy] registered
[    0.039257] SCSI subsystem initialized
[    0.041583] clocksource: Switched to clocksource riscv_clocksource
[    0.065101] NET: Registered protocol family 2
[    0.067063] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.067483] TCP established hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.067913] TCP bind hash table entries: 512 (order: 1, 8192 bytes, linear)
[    0.068317] TCP: Hash tables configured (established 512 bind 512)
[    0.069030] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.069451] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.070200] NET: Registered protocol family 1
[    0.072405] workingset: timestamp_bits=62 max_order=14 bucket_order=0
[    0.087194] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.088848] NET: Registered protocol family 38
[    0.089119] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    0.089538] io scheduler mq-deadline registered
[    0.089692] io scheduler kyber registered
[    0.095236] 30002000.serial: ttyS0 at MMIO 0x30002000 (irq = 1, base_baud = 2000000) is a BFLB UART
[    0.095702] printk: console [ttyS0] enabled
[    0.095702] printk: console [ttyS0] enabled
[    0.096154] printk: bootconsole [sbi0] disabled
[    0.096154] printk: bootconsole [sbi0] disabled
[    0.126643] brd: module loaded
[    0.152643] loop: module loaded
[    0.154170] physmap-flash 58500000.xip_flash: physmap platform flash device: [mem 0x58500000-0x588fffff]
[    0.155635] 1 fixed-partitions partitions found on MTD device xip-flash.0
[    0.156067] Creating 1 MTD partitions on "xip-flash.0":
[    0.156406] 0x000000000000-0x000000280000 : "rootfs"
[    0.161139] mousedev: PS/2 mouse device common for all mice
[    0.162078] i2c /dev entries driver
[    0.162621] i2c-core: driver [i2c-slave-eeprom] registered
[    0.164022] [perf] T-HEAD C900 PMU probed
[    0.166703] NET: Registered protocol family 10
[    0.169036] Segment Routing with IPv6
[    0.169554] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.171437] NET: Registered protocol family 17
[    0.171810] Key type dns_resolver registered
[    0.172435] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[    0.185215] VFS: Mounted root (squashfs filesystem) readonly on device 31:0.
[    0.192650] devtmpfs: mounted
[    0.193663] Freeing unused kernel memory: 156K
[    0.218686] Run /sbin/init as init process
[    0.218957]   with arguments:
[    0.219151]     /sbin/init
[    0.219330]     earlyprintk
[    0.219513]   with environment:
[    0.219716]     HOME=/
[    0.219875]     TERM=linux
********************************
 Exec rcS
********************************
********mount all********
mount: according to /proc/mounts, porc is already mounted on /proc
mount: according to /proc/mounts, devtmpfs is already mounted on /dev
mount: mounting devpts on /dev/pts failed: No such file or directory
This may take some time ...
mount: mounting sysfs on /sys failed: Device or resource busy
--------Start Local Services--------
********************************
********************************

Linux login: root
login[40]: root login on 'ttyS0'
Processing /etc/profile ...
Set search library path in /etc/profile
Set user path in /etc/profile
id: unknown ID 0
Welcome to Linux
[@Linux root]#

You can log in as "root" with no password, and have a poke about.
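
If you fancy sanity-checking a few numbers from that boot log, the arithmetic is easy to reproduce. This Python sketch uses only addresses that appear in the log above; the 4 KiB page size is my assumption, although it is consistent with the "alloc=17*4096" line.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, consistent with "alloc=17*4096"


def span_bytes(start, last):
    """Size of a memory range whose last address is inclusive."""
    return last - start + 1


# "node 0: [mem 0x0000000050000000-0x0000000053ffffff]"
ram_bytes = span_bytes(0x50000000, 0x53FFFFFF)
ram_kib = ram_bytes // 1024         # compare with "Memory: 53200K/65536K"
ram_pages = ram_bytes // PAGE_SIZE  # compare with "On node 0 totalpages: 16384"

# "0x000000000000-0x000000280000 : rootfs" (the end address is exclusive)
rootfs_bytes = 0x280000 - 0x0

print(f"RAM: {ram_bytes // (1024 * 1024)} MiB, {ram_kib} KiB, {ram_pages} pages")
print(f"rootfs partition: {rootfs_bytes // 1024} KiB")
```

In other words, this $11 machine is running Linux in 64 MiB of RAM with a root filesystem partition of 2.5 MiB.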

Getting Lower

This hardware doesn't just run Linux. As much as I love Linux/BSD, I find messing around with the bare metal a lot more interesting. Bouffalo also provides an SDK which helps you build FreeRTOS applications. We'll talk about that next time.

If you've been following along, or just looking closely, you'll have noticed the name "OpenSBI" cropping up in the boot logs. This is a fascinating part of the RISC-V world and stands for "Open Source Supervisor Binary Interface". You'll see above that it identifies the platform it's running on at start-up:

Platform Name          : T-HEAD Xuantie c910
Platform HART Features : RV64ACDFIMSUVX
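
For anyone who wants to check their reading of that second line, here is a hedged Python sketch that splits the string into its parts. My understanding is that S and U here indicate the supervisor and user privilege modes, and that X flags non-standard extensions; the function name is made up for illustration.

```python
import re


def split_hart_features(features):
    """Split e.g. 'RV64ACDFIMSUVX' into (xlen, extensions, modes, nonstandard)."""
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", features)
    if match is None:
        raise ValueError(f"unexpected feature string: {features!r}")
    xlen = int(match.group(1))
    letters = match.group(2)
    modes = [c for c in letters if c in "SU"]           # privilege modes
    nonstandard = "X" in letters                         # vendor extensions
    extensions = [c for c in letters if c not in "SUX"]  # ISA letters proper
    return xlen, extensions, modes, nonstandard


xlen, extensions, modes, nonstandard = split_hart_features("RV64ACDFIMSUVX")
print(xlen, "".join(extensions), "".join(modes), nonstandard)
```

The extension letters that come out (ACDFIMV) line up with the "ELF capabilities acdfimv" line the kernel prints a little later in the boot log.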

Hopefully, those names, numbers, and letters should mean something to you at this stage. We'll cover OpenSBI too. Also, because everyone seems to think the only hope for the future of the universe is Rust, we'll get some Rust code running on bare-metal RISC-V too. Thanks for reading.