I/O Device Driver

At this point, you have finished reading all the kernel code of egos-2000 under the grass directory. You have also finished cpu_intr.c and cpu_mmu.c under the earth directory which handle interrupts, exceptions, and virtual memory.

In this project, you will read dev_tty.c and dev_disk.c under the earth directory. They contain the driver code for the terminal (keyboard input & screen output) and disk devices. The two files together have fewer than 170 lines of code, but they give good examples of an important concept to learn in this project, memory-mapped I/O.

I/O bus and device

An I/O bus connects various devices with the CPU. There are different types of I/O buses, and we introduce three of them which connect the CPU to the terminal and disk devices. In general, computers need to read keyboard input and print characters on a screen. These functions used to be handled by a terminal device separate from the computer's main body. This photo of the VT100 video terminal was taken at the Computer History Museum.

[Picture: the VT100 video terminal]

UART and terminal

In egos-2000, a terminal is connected to the CPU using Universal Asynchronous Receiver/Transmitter (UART). UART involves only two hardware pins on the CPU -- one for receiving bytes and the other for sending (i.e., transmitting) bytes.

When pressing a key on the keyboard, the terminal sends the corresponding character as a byte through UART to the CPU, and the operating system reads this byte from the Receiver hardware pin. When the operating system prints a character, it sends out a byte through the CPU's Transmitter hardware pin, and UART will pass the byte to the terminal. Asynchronous in UART means that electrical signals on the two hardware pins do not wait for each other.

The code below shows how egos-2000 connects to a terminal device using UART.

#define UART_BASE   0x10000000UL
#define LINE_STATUS 5UL

void uart_getc(char* c) {
    while (!(REGB(UART_BASE, LINE_STATUS) & (1 << 0)));
    *c = REGW(UART_BASE, 0) & 0xFF;
}

void uart_putc(char c) {
    while (!(REGB(UART_BASE, LINE_STATUS) & (1 << 5)));
    REGW(UART_BASE, 0) = c;
}

The CPU uses the special memory address 0x10000000 to communicate with the terminal. When a byte is received, the UART hardware sets bit#0 of the line status register at 0x10000005 to 1. After detecting such a byte, uart_getc reads it from memory address 0x10000000. When sending a byte, uart_putc waits for UART to be idle (i.e., UART finishes sending the previous byte) by checking bit#5 of the line status register. After UART is ready, uart_putc writes to address 0x10000000 the byte to be printed on the terminal screen.
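The polling pattern above can be tried out on an ordinary computer by pointing the register macros at a plain array instead of a real device. The sketch below is only an illustration. The REGW and REGB definitions are an assumption about how such macros are commonly written (the actual definitions are in library/egos.h and may differ), and fake_uart, fake_uart_device_receives, and fake_uart_try_getc are hypothetical names that do not exist in egos-2000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed macro definitions: access a 32-bit word or a single byte at base + offset. */
#define REGW(base, offset) (*(volatile uint32_t*)((uintptr_t)(base) + (offset)))
#define REGB(base, offset) (*(volatile uint8_t*)((uintptr_t)(base) + (offset)))

#define LINE_STATUS 5UL

/* Hypothetical stand-in for the UART region: 8 bytes of ordinary memory. */
static uint32_t fake_uart[2];
#define FAKE_UART_BASE ((uintptr_t)fake_uart)

/* Plays the role of the hardware: latch a byte and raise bit#0 of line status. */
void fake_uart_device_receives(char c) {
    REGW(FAKE_UART_BASE, 0) = (uint8_t)c;
    REGB(FAKE_UART_BASE, LINE_STATUS) |= (1 << 0);
}

/* Non-blocking variant of uart_getc: returns 1 if a byte was read, 0 otherwise. */
int fake_uart_try_getc(char* c) {
    assert(c != 0); /* the caller must pass a valid pointer */
    if (!(REGB(FAKE_UART_BASE, LINE_STATUS) & (1 << 0))) return 0;
    *c = REGW(FAKE_UART_BASE, 0) & 0xFF;
    /* In this model the driver clears the flag; a real UART does this itself. */
    REGB(FAKE_UART_BASE, LINE_STATUS) &= ~(1 << 0);
    return 1;
}
```

The driver side never calls the device directly. It only loads from and stores to addresses, which is the essence of memory-mapped I/O.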

The code looks simple because most of the complexities are handled by the hardware. For example, when running egos-2000 on a RISC-V board, and using the screen command on macOS as the terminal, the UART/USB bridge chip labeled below will convert the electrical signals between UART and Universal Serial Bus (USB) so that egos-2000 does not have to handle the complexity of USB. In many operating systems, the driver code for USB could be a lot more complicated than the code for UART above.

[Picture: a RISC-V board with the UART/USB bridge chip labeled]

This UART driver code is also an example of memory-mapped I/O. Specifically, a hardware manufacturer can define special memory regions used to control I/O devices, and different manufacturers can define different regions. Indeed, egos-2000 can run on both QEMU and RISC-V boards which use different regions for UART. As shown in library/egos.h, RISC-V boards use the memory region starting at 0xF0001000 to control UART. The driver code for RISC-V boards is also slightly different from the code for QEMU in dev_tty.c.

SPI and SD card

A computer typically needs a disk storing blocks of data when the computer is powered off, and egos-2000 uses an SD card as the disk.

On a RISC-V board, an SD card is connected with the CPU using Serial Peripheral Interface (SPI) which has four hardware pins on the CPU, two more pins than UART. The picture from Wikipedia illustrates the four hardware pins.

[Picture: the four SPI hardware pins, from Wikipedia]

Consider the CPU as the SPI Main and the SD card device as the SPI Sub. Both sides have four hardware pins, and their functionalities are described below.

Chip Select (CS) selects the SD card as the device the CPU communicates with, and is used to reset the SD card before starting to use it.
Serial Clock (SCLK) provides clock signals from the CPU (e.g., 20MHz).
Main Out Sub In (MOSI) is used by the CPU to send bytes to the SD card.
Main In Sub Out (MISO) is used by the SD card to send bytes to the CPU.

Similar to UART, the CPU provides memory-mapped I/O regions for communicating with the SD card through SPI. Different from UART, the SPI Main and SPI Sub exchange bytes during communication. The code below explains how it works.

static char spi_exchange(char byte) {
    /* The "exchange" here means sending a byte and then receiving a byte. */
    REGW(SDSPI_BASE, LITEX_SPI_MOSI)    = byte;
    REGW(SDSPI_BASE, LITEX_SPI_CONTROL) = (8 * (1 << 8) | (1));

    while ((REGW(SDSPI_BASE, LITEX_SPI_STATUS) & 1) != 1);
    return (char)(REGW(SDSPI_BASE, LITEX_SPI_MISO) & 0xFF);
}

First, byte is sent out through the MOSI pin. You can ignore LITEX_SPI_CONTROL which is hardware specific. After sending byte, SPI immediately receives a byte from the MISO pin (i.e., from the SD card) as the return value of spi_exchange. The while loop waits for the arrival of a byte just like the loop in uart_getc. The difference is that SPI always receives a byte after sending a byte (i.e., synchronous), while UART is asynchronous.
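One common way to picture an SPI exchange is as two 8-bit shift registers connected in a ring: on every clock cycle, each side shifts one bit out to the other side and shifts one bit in. The sketch below models this idea in plain C. It is a simplified model rather than a description of the controller used by egos-2000 (bit order and clock details vary between devices), and spi_link_model and spi_model_exchange are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t main_shift; /* shift register on the CPU (SPI Main) side */
    uint8_t sub_shift;  /* shift register on the SD card (SPI Sub) side */
} spi_link_model;

/* Run 8 clock cycles. On each cycle, both sides shift out their top bit
 * (MOSI and MISO) and shift in the bit sent by the other side. */
void spi_model_exchange(spi_link_model* link) {
    assert(link != 0);
    for (int cycle = 0; cycle < 8; cycle++) {
        uint8_t mosi = (link->main_shift >> 7) & 1;
        uint8_t miso = (link->sub_shift >> 7) & 1;
        link->main_shift = (uint8_t)((link->main_shift << 1) | miso);
        link->sub_shift  = (uint8_t)((link->sub_shift << 1) | mosi);
    }
}
```

After 8 cycles the two registers have swapped their contents, which is why sending a byte and receiving a byte happen as one operation in this model.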

While spi_exchange uses the MOSI and MISO SPI pins, the code below controls the clock signals sent out to the SCLK pin from the CPU.

#define CPU_CLOCK_RATE 100000000 /* 100MHz */
INFO("Set the SPI clock to 20MHz for the SD card");
REGW(SDSPI_BASE, LITEX_SPI_CLKDIV) = CPU_CLOCK_RATE / 20000000 + 1;

TIP

As an exercise, read the sdspi_init function in dev_disk.c from which you will see how SCLK is controlled during the initialization of an SD card.

An SD card command has 6 bytes, so the operating system can ask the SD card to do a certain task by sending the corresponding 6-byte command using spi_exchange. Given an SD card command, the sdspi_exec_cmd function sends the 6 bytes, and waits for the reply from the SD card until a timeout.

static char sdspi_exec_cmd(char* cmd) {
    /* Send a 6-byte SD card command through the SPI bus. */
    for (uint i = 0; i < 6; i++) spi_exchange(cmd[i]);
    #define TIME_OUT 8000
    for (uint reply, i = 0; i < TIME_OUT; i++)
        if ((reply = spi_exchange(0xFF)) != 0xFF) return reply;

    return 0xFF;
}

Read an SD card block

The SD card command #17 is defined for reading a block. An SD card block is typically 512 bytes -- when reading or writing an SD card, the operating system will read or write a 512-byte block altogether. This is different from the terminal device which reads or writes in the granularity of a single byte.

Given sdspi_exec_cmd, the sdspi_read function below reads a block from the SD card to memory address dst, and the offset argument decides which block should be read. For example, if offset is 0, sdspi_read will read the very first block on the SD card.

static void sdspi_read(uint offset, char* dst) {
    /* Wait until SD card is ready for a new command. */
    while (spi_exchange(0xFF) != 0xFF);

    /* Send a read request with command #17. */
    char* arg = (void*)&offset;
    char reply, cmd17[] = {17 | (1 << 6), arg[3], arg[2], arg[1], arg[0], 0xFF};
    if (reply = sdspi_exec_cmd(cmd17)) FATAL("cmd17 returns non-zero status");

    /* Wait for the 512-byte block, and ignore the 2-byte checksum. */
    while (spi_exchange(0xFF) != 0xFE);
    for (uint i = 0; i < BLOCK_SIZE; i++) dst[i] = spi_exchange(0xFF);
    spi_exchange(0xFF);
    spi_exchange(0xFF);
}

At a high level, the code above proceeds with the following steps.

Wait for the SD card device to be ready for the next command.
Send command #17 to the SD card. Out of the 6 bytes, the 4 bytes in the middle encode a block number (i.e., offset) indicating which block should be read.
Wait for the SD card device to be ready to send back the 512-byte block data.
Receive 512 bytes from the SD card as the block data together with a 2-byte checksum.
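The layout of the 6-byte command in the second step can be made explicit with a small helper. The sketch below is an illustration only: sd_build_cmd is a hypothetical function that is not part of egos-2000. It uses shifts to place the most significant byte of the argument first, which is the same byte order that sdspi_read obtains with arg[3], arg[2], arg[1], arg[0] on a little-endian CPU. The last byte is passed through unchanged (sdspi_read uses 0xFF).

```c
#include <assert.h>
#include <stdint.h>

/* Fill out[0..5] with an SD card command: the command index with bit#6 set,
 * the 32-bit argument with its most significant byte first, and a final byte. */
void sd_build_cmd(uint8_t index, uint32_t arg, uint8_t last_byte, uint8_t out[6]) {
    assert(index < 64); /* the command index must fit in 6 bits */
    out[0] = (uint8_t)(index | (1 << 6));
    out[1] = (uint8_t)(arg >> 24);
    out[2] = (uint8_t)(arg >> 16);
    out[3] = (uint8_t)(arg >> 8);
    out[4] = (uint8_t)arg;
    out[5] = last_byte;
}
```

For example, calling sd_build_cmd(17, 0, 0xFF, cmd) fills cmd with the bytes for reading the very first block.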

At this point, we have shown a concrete example of controlling an SD card device using SPI with a good amount of hardware details of SPI. Beyond this point, we focus on the memory-mapped I/O interface alone because the hardware pins and signals become complicated.

PCIe and plug-and-play

While SPI provides a simple way of using an SD card, there are two constraints.

SPI devices typically need manual setup and cannot be automatically detected.
Reading the 512-byte block data byte-by-byte through spi_exchange is very slow.

To address such problems, we introduce Plug-and-Play and Direct Memory Access (DMA). Plug-and-Play allows the operating system to detect a new device connected with the CPU, and configure memory-mapped I/O regions for the new device. DMA allows the hardware to read and write the memory directly so that an SD card can write the block data into memory without the CPU having to run spi_exchange 512 times in a loop.

Plug-and-Play and DMA are made possible by Peripheral Component Interconnect (PCI) or its successor PCIe. In egos-2000, the memory region starting at 0x30000000 controls PCIe which connects with multiple devices. The operating system controls device #i on PCIe by reading or writing the memory region [0x30000000+0x8000*i, 0x30000000+0x8000*(i+1)). For example, the addr=0x1 in the definition of QEMU_SD_CARD in the Makefile indicates that an SD card is inserted as device #1 on the PCIe bus. Therefore, the memory at 0x30008000 is used to control this SD card (i.e., SDHCI_PCI_ECAM in library/egos.h). The table below is from a wiki page for PCI, and it shows how to interpret the memory at 0x30008000.

[Table: layout of the PCI configuration space, from a wiki page for PCI]

The first 4 bytes at 0x30008000 provide the Vendor ID and Device ID for device #1 on PCIe. Based on the two ID numbers, an operating system learns that device #1 is an SD card. This is essentially how Plug-and-Play works: the operating system periodically reads the 4 bytes at 0x30000000 (device #0), 0x30008000 (device #1), 0x30010000 (device #2), and so on. According to the two ID numbers, the operating system learns whether a device has been plugged in or unplugged at a PCIe address. Later in this project, you will modify QEMU_GRAPHIC in the Makefile, and plug in a VGA device as device #2 on the PCIe bus.
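The address arithmetic and the ID lookup described above can be sketched as follows. The base address and the 0x8000 stride come from the text, while the function names are hypothetical and not taken from egos-2000. The sketch also assumes the usual PCI conventions that the Vendor ID occupies the low 16 bits of the first 32-bit word, the Device ID occupies the high 16 bits, and an empty slot reads back a Vendor ID of 0xFFFF.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_ECAM_BASE   0x30000000UL /* start of the PCIe region in egos-2000 */
#define PCI_DEVICE_SIZE 0x8000UL     /* bytes of configuration space per device */

/* Address of the configuration space of device #device_no. */
uint32_t pci_config_addr(uint32_t device_no) {
    assert(device_no < 32); /* a PCI bus holds at most 32 devices */
    return (uint32_t)(PCI_ECAM_BASE + PCI_DEVICE_SIZE * device_no);
}

/* Split the first 32-bit word of a configuration space into the two IDs. */
uint16_t pci_vendor_id(uint32_t first_word) { return (uint16_t)(first_word & 0xFFFF); }
uint16_t pci_device_id(uint32_t first_word) { return (uint16_t)(first_word >> 16); }

/* Assumed convention: no device answers, so the read returns all ones. */
int pci_slot_is_empty(uint32_t first_word) {
    return pci_vendor_id(first_word) == 0xFFFF;
}
```

A detection loop would read the 32-bit word at pci_config_addr(i) for each i and compare the two IDs against the devices that the operating system knows how to drive.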

TIP

As an exercise, read the first two lines of the sdhci_init function in dev_disk.c. They modify Command (0x4) and Base address #0 (0x10) according to the table above. Understand these two lines of code, especially the Memory Space and Bus Master bits, by reading the wiki page for PCI.

SD host controller interface

The PCIe bus provides an advanced memory-mapped I/O interface for SD cards called the Secure Digital Host Controller Interface (SDHCI). An example of using SDHCI in egos-2000 is the sdhci_read function in dev_disk.c which reads a block with the command #17 just like sdspi_read, but sdhci_read uses the DMA feature in the SDHCI specification.

SDHCI specification

The SD Association maintains the official SDHCI specification just like RISC-V International maintains the ratified ISA specifications of RISC-V. In this project, you will need Chapters 2.1 and 2.2 of the SD Host Controller Simplified Specification. Table 2-1 in Chapter 2.1.1 shows the SDHCI register map, and the screenshot below shows part of this table which is enough to understand the sdhci_read function.

[Picture: part of Table 2-1, the SDHCI register map]

Direct memory access

The first few lines of sdhci_read prepare direct memory access.

/* Prepare DMA (SDMA mode of SDHCI). */
static __attribute__((aligned(BLOCK_SIZE))) char aligned_buf[BLOCK_SIZE];
REGW(SDHCI_BASE, 0x0) = (uint)aligned_buf;
REGW(SDHCI_BASE, 0x4) = (1 << 16) | BLOCK_SIZE;

According to the table above, the address of a 512-byte buffer aligned_buf is written to the SDMA System Address register at 000h. The Block Size register at 004h is set to BLOCK_SIZE, and the 16-bit Block Count register at 006h is set to 1. Together, these values indicate that sdhci_read asks SDHCI to write 1 block of BLOCK_SIZE bytes to the buffer aligned_buf when executing the upcoming SD card read command.
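A single 32-bit write at offset 0x4 covers both 16-bit registers because, on a little-endian CPU, the low half of the word lands at 004h and the high half at 006h. The sketch below spells out this packing. sdhci_block_word and the two unpacking helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack Block Size (low 16 bits, 004h) and Block Count (high 16 bits, 006h)
 * into the 32-bit value written at offset 0x4. */
uint32_t sdhci_block_word(uint16_t block_size, uint16_t block_count) {
    assert(block_size != 0); /* a zero-sized block makes no sense in this sketch */
    return ((uint32_t)block_count << 16) | block_size;
}

/* Recover the two 16-bit fields from the packed word. */
uint16_t sdhci_block_size_of(uint32_t word)  { return (uint16_t)(word & 0xFFFF); }
uint16_t sdhci_block_count_of(uint32_t word) { return (uint16_t)(word >> 16); }
```

With a BLOCK_SIZE of 512 and a count of 1, the packed value is 0x00010200, which matches (1 << 16) | BLOCK_SIZE in the code above.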

TIP

Read the rest of sdhci_read and the sdhci_exec_cmd function by referencing the table above. The code shows how to issue SD card command #17 by writing to the Argument register at 008h, the Transfer Mode register at 00Ch, and the Command register at 00Eh. It further shows how to wait for the SD card command #17 to finish after which the 512-byte block data should be ready in aligned_buf given how we have prepared DMA.

Chapter 2.2 of the specification explains the details of all the registers in Table 2-1. You are encouraged to study the difference between Single Operation DMA (SDMA) and Advanced DMA (ADMA) by reading more of the specification or asking ChatGPT. For example, you will learn that a constraint of SDMA is that the DMA buffer cannot span across multiple memory pages. By aligning the DMA buffer to BLOCK_SIZE, the code above guarantees that all the 512 bytes of aligned_buf are in the same 4KB memory page. With this constraint in mind, we now ask you to implement a better version of the SD card driver.
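The alignment argument can be checked with a few lines of arithmetic. The sketch below assumes 4KB pages as stated above, and crosses_page is a hypothetical helper that is not part of egos-2000. Because 4096 is a multiple of 512, a 512-byte buffer starting at a multiple of 512 always ends before the next page boundary.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, per the text above */

/* Returns 1 if the buffer [addr, addr + len) touches more than one page. */
int crosses_page(uint32_t addr, uint32_t len) {
    assert(len > 0);
    uint32_t first_page = addr / SKETCH_PAGE_SIZE;
    uint32_t last_page  = (addr + len - 1) / SKETCH_PAGE_SIZE;
    return first_page != last_page;
}
```

A 512-byte buffer at 0x1E00 ends at 0x1FFF and stays within one page, while the same buffer at the unaligned address 0x1F00 would end at 0x20FF and cross into the next page. This matters once you transfer several blocks at a time, since a larger buffer needs more care to satisfy the same constraint.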

Read and write multiple blocks

Our driver code uses the SD card command #17 to read a single block, and the disk_read function simply calls sdhci_read or sdspi_read multiple times within a loop.

The SD card standard provides commands #18 and #25 for reading and writing consecutive blocks altogether. Your job is to replace the loop in disk_read and implement disk_write with your own SD card driver using SD commands #18 and #25. The details of commands #18 and #25 can be found in this blog, and you can certainly find other materials about the two SD card commands online.

Start with the driver for SDHCI which runs on QEMU. After you finish, egos-2000 should be able to run normally. It is very useful to write unit tests for your driver code. For example, instead of booting egos-2000, you can run your unit tests right after the call to disk_init() in boot() of earth/boot.c. The test code should write and read the 4MB SD card multiple times, and check whether the driver behaves as expected. If you have a RISC-V board, you can further implement the driver code for the SPI bus.
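One possible shape for such a unit test is a write-then-read-back check over a range of blocks. The sketch below runs against an in-memory array so that it is self-contained. Everything in it is hypothetical: fake_disk_write and fake_disk_read stand in for your own driver functions, and their signatures are an assumption that may not match disk_read and disk_write in egos-2000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NBLOCKS    8192 /* 4MB / 512 bytes */

/* In-memory stand-in for the SD card. */
static uint8_t fake_disk[NBLOCKS * BLOCK_SIZE];

void fake_disk_write(uint32_t block_no, uint32_t nblocks, const uint8_t* src) {
    assert(block_no + nblocks <= NBLOCKS);
    memcpy(&fake_disk[(size_t)block_no * BLOCK_SIZE], src, (size_t)nblocks * BLOCK_SIZE);
}

void fake_disk_read(uint32_t block_no, uint32_t nblocks, uint8_t* dst) {
    assert(block_no + nblocks <= NBLOCKS);
    memcpy(dst, &fake_disk[(size_t)block_no * BLOCK_SIZE], (size_t)nblocks * BLOCK_SIZE);
}

/* Write a seed-dependent pattern to nblocks consecutive blocks, read them
 * back, and return 1 if the bytes read are identical to the bytes written. */
int disk_roundtrip_ok(uint32_t block_no, uint32_t nblocks, uint8_t seed) {
    static uint8_t out[8 * BLOCK_SIZE], in[8 * BLOCK_SIZE];
    assert(nblocks >= 1 && nblocks <= 8);
    for (uint32_t i = 0; i < nblocks * BLOCK_SIZE; i++) out[i] = (uint8_t)(seed + i);
    fake_disk_write(block_no, nblocks, out);
    memset(in, 0, sizeof(in));
    fake_disk_read(block_no, nblocks, in);
    return memcmp(out, in, (size_t)nblocks * BLOCK_SIZE) == 0;
}
```

In a real test, the two fake functions would be replaced by calls into your driver, and it is worth covering the first block, the last blocks of the card, and different block counts with different seeds.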

Plug in a VGA device on PCIe

To learn more about device drivers, you will now plug in a VGA device on the PCIe bus, and write the driver code initializing this device. The goal is to ensure apps/user/video_demo.c can work on QEMU as shown in this screenshot.

[Picture: screenshot of video_demo running on QEMU]

To start with, update QEMU_GRAPHIC in the Makefile to -device VGA,addr=0x2 -serial mon:stdio, which plugs in a VGA device as device #2 on PCIe. The rest of your code should be in boot() of earth/boot.c. From QEMU's documentation for this VGA device, you can find the "PCI spec" of this device as follows.

PCI Region 0: Framebuffer memory, 16 MB in size (by default).
PCI Region 2: MMIO bar, 4096 bytes in size (QEMU 1.3+)

This means that you need to initialize both base address #0 and base address #2 of device #2's PCI configuration. The first address has been defined for you as VIDEO_FRAME_BASE in library/egos.h, and it has been used by video_demo to set the RGB value of each pixel.

Your job is to define the address for base address #2, and initialize the memory-mapped I/O region according to the "MMIO area spec" section of QEMU's documentation. The "vga ioports" and "bochs dispi interface registers" in this section are not well explained there, so you will need to search for more information yourself. The comments in boot() provide some hints and guidance.

Your driver code only runs on QEMU because the VGA/HDMI device on the RISC-V boards does not require any driver code to run video_demo. If you have a board such as the $35 Tang Nano 20K, and you are more interested in designing a Graphical User Interface (GUI) for egos-2000, you can directly run your GUI design on your board without worrying about the VGA driver at all. You simply need a monitor and a VGA or HDMI cable.

Accomplishments

You have learned memory-mapped I/O and three types of I/O bus by reading the device driver code in egos-2000. The SDHCI driver gives a concrete example of direct memory access. You have also written some driver code for an SD card and a VGA display device.
