System Call & Protection

You have seen how an operating system manages the life cycles of threads (or processes, as we don't distinguish the two right now), and schedules multiple threads in a preemptive way using timer interrupts. This project helps you understand two more things:

  • how threads invoke system calls in order to communicate with each other;
  • how an operating system protects its memory so that malicious threads cannot corrupt the system calls by modifying the code or data of the operating system.

System calls and memory protection rely on exception handling which is similar to interrupt handling. We thus start by introducing exception handling.

Exception handling

An exception happens if something goes wrong when the CPU executes an instruction. For example, an exception would happen when an instruction tries to access the memory at an invalid address. Instead of ignoring the problem and proceeding to the next instruction, the CPU automatically jumps to a special function called an exception handler, just like how the CPU jumps to the interrupt handler when receiving a timer interrupt. Indeed, we will use the same function to handle interrupts and exceptions, and then use the so-called mcause CSR to identify what caused the CPU to jump to the handler function.

The mcause CSR

Below are the screenshots of Table 22 and Table 23 from this document which describe the mcause CSR. You can further read chapter 3.1.15 of the RISC-V reference manual. When an exception or interrupt happens, the CPU will set the value of mcause before jumping to the handler function.

[Figure: Table 22 and Table 23, which describe the mcause CSR]

For example, mcause is set to 0x80000007 when the CPU receives a timer interrupt. Bit#31 of mcause is set to 1 because a timer interrupt is an interrupt. The "exception code" bits are set to 0000000111 because 7 is the code for the machine timer interrupt. When the CPU meets an illegal instruction (i.e., the 4 bytes pointed to by the program counter cannot be decoded into a CPU instruction), mcause will be set to 0x2 before the CPU jumps to the handler function.
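The decoding described above can be sketched in a few lines of C. This is a host-side illustration rather than code from egos-2000: the names `struct trap_cause` and `decode_mcause` are hypothetical, and the 10-bit mask simply mirrors the "exception code" width used in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two pieces of information packed in mcause. */
struct trap_cause {
    int is_interrupt;  /* bit#31: 1 for an interrupt, 0 for an exception */
    uint32_t code;     /* low 10 bits: the interrupt or exception code   */
};

/* Split a raw 32-bit mcause value into its two fields. */
struct trap_cause decode_mcause(uint32_t mcause) {
    struct trap_cause c;
    c.is_interrupt = (int)((mcause >> 31) & 1u);
    c.code = mcause & 0x3FFu;
    return c;
}

/* The two examples from the text, written as checks. */
void decode_mcause_examples(void) {
    struct trap_cause timer = decode_mcause(0x80000007u);
    assert(timer.is_interrupt == 1 && timer.code == 7);

    struct trap_cause illegal = decode_mcause(0x2u);
    assert(illegal.is_interrupt == 0 && illegal.code == 2);
}
```

Keeping the interrupt bit and the code separate matters because the same code means different things in the two tables: code 7 is the machine timer interrupt when bit#31 is set, but a different exception when it is clear.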

TIP

Exceptions are different from interrupts. Exceptions are triggered by CPU instructions that go wrong during execution. Interrupts are triggered by devices outside of the CPU, such as a timer, a disk, or a network interface controller. The mcause CSR helps the operating system understand which particular interrupt or exception needs to be handled.

The ecall exception

Most exceptions happen due to something wrong such as invalid memory access. However, RISC-V provides a special instruction, ecall, which triggers the so-called environment call exception intentionally (see exception #8 and #11 in Table 23). This instruction is for system calls: if a thread invokes ecall, the control flow will transfer to the operating system which will serve a system call for this thread.

In egos-2000, ecall is used in library/syscall/syscall.c. Specifically, asm("ecall") will raise an environment call exception and then the CPU will jump to trap_entry, as you have seen in P2. trap_entry calls kernel_entry which further calls excp_entry, and excp_entry handles system calls within the if statement using this condition: (id >= EXCP_ID_ECALL_U && id <= EXCP_ID_ECALL_M). EXCP_ID_ECALL_U and EXCP_ID_ECALL_M are defined as 8 and 11 respectively according to Table 23. Take a look at how mcause is read and used in trap_entry and kernel_entry.

TIP

The U-mode and M-mode in Table 23 stand for user mode and machine mode. We will touch on these privilege modes very soon when we start to explain memory protection.

A sketch of the "kernel"

We have been using the word "kernel" since P2, but we have never explained what an OS kernel is. With the knowledge of mcause, we can show you a sketch of the "kernel".

c
void kernel() {
    unsigned int mcause_val, id;
    asm("csrr %0, mcause" : "=r"(mcause_val));
    id = mcause_val & 0x3FF;            /* the "exception code" bits   */
    if (mcause_val & (1u << 31)) {      /* bit#31 is set: an interrupt */
        if (id == 7) proc_yield();      /* machine timer interrupt     */
    } else {                            /* bit#31 is clear: exception  */
        if (id >= 8 && id <= 11) handle_system_call();
        if (id == 1 || id == 5 || id == 7) handle_memory_access_fault();
    }
}

The code above sketches the core of an operating system (a.k.a. the kernel):

  • handle thread scheduling upon a timer interrupt
  • handle system calls when a thread invokes ecall
  • handle other exceptions such as memory access faults

We call it a sketch because a complete operating system needs to handle all the interrupts and exceptions, while the 3 bullets above are probably the most important ones. You have seen thread scheduling in P2, and P3 will give you hands-on experience with the last two bullets from this kernel sketch.

Inter-process communication

In egos-2000, there are only 2 types of system calls which are designed for inter-process communication (i.e., sending and receiving messages). We now introduce the system call interface for applications and then explain what happens within the OS kernel.

Application-side interface

The code below is from library/syscall/syscall.h and it defines the data structures for system calls in egos-2000.

c
enum syscall_type {
    SYS_UNUSED,
    SYS_RECV, /* 1 */
    SYS_SEND, /* 2 */
};

struct syscall {
    enum syscall_type type; /* SYS_SEND or SYS_RECV */
    int sender;             /* sender process ID    */
    int receiver;           /* receiver process ID  */
    char content[SYSCALL_MSG_LEN];
    enum { PENDING, DONE } status;
};

The content field holds the message being sent or received. Say process A wants to send a message to process B through SYS_SEND. This system call may not succeed immediately: it will only succeed after process B invokes the SYS_RECV system call with process A as the sender (i.e., process B is ready to receive a message from process A). For this reason, until process B invokes the SYS_RECV system call, the SYS_SEND system call by process A stays in status PENDING instead of DONE.

TIP

In other words, egos-2000 implements a blocking version of inter-process communication such that a system call would only return after a message has been successfully sent or received. It is certainly possible to implement a non-blocking version such that system calls return immediately. egos-2000 implements the blocking version for simplicity of the code.
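The blocking behavior described above can be illustrated with a small host-side model. This is a simplified sketch, not the code of proc_try_send in egos-2000: `struct ipc_call`, `try_send`, `ipc_demo` and `MSG_LEN` are hypothetical names, and the sender and receiver process IDs are passed in explicitly because this sketch has no process table.

```c
#include <assert.h>
#include <string.h>

#define MSG_LEN 32  /* hypothetical; stands in for SYSCALL_MSG_LEN */

enum ipc_type   { IPC_NONE, IPC_RECV, IPC_SEND };
enum ipc_status { IPC_PENDING, IPC_DONE };

struct ipc_call {
    enum ipc_type type;
    int sender;                /* for a recv: whom we expect a message from */
    int receiver;              /* for a send: whom the message is meant for */
    char content[MSG_LEN];
    enum ipc_status status;
};

/* Attempt a pending send from sender_pid. It succeeds only if the target
 * process has a pending recv that names sender_pid as the expected sender.
 * Returns 1 if the message was delivered, 0 if the send must stay pending. */
int try_send(int sender_pid, struct ipc_call *send,
             int receiver_pid, struct ipc_call *recv) {
    if (send->type != IPC_SEND || send->status != IPC_PENDING) return 0;
    if (recv->type != IPC_RECV || recv->status != IPC_PENDING) return 0;
    if (send->receiver != receiver_pid || recv->sender != sender_pid) return 0;

    memcpy(recv->content, send->content, MSG_LEN);
    send->status = IPC_DONE;
    recv->status = IPC_DONE;
    return 1;
}

/* Process 1 sends "ping" to process 2, which is already waiting for it. */
void ipc_demo(void) {
    struct ipc_call send = { IPC_SEND, 0, 2, "ping", IPC_PENDING };
    struct ipc_call recv = { IPC_RECV, 1, 0, "", IPC_PENDING };
    assert(try_send(1, &send, 2, &recv) == 1);
    assert(strcmp(recv.content, "ping") == 0);
}
```

If the receiver has not yet issued its recv, `try_send` returns 0 and both calls keep their current status, which is why a scheduler would need to retry a pending call later.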

With struct syscall in mind, the system call interface sys_send and sys_recv defined in library/syscall/syscall.c should be easy to understand.

c
static struct syscall* sc = (struct syscall*)SYSCALL_ARG;

void sys_send(int receiver, char* msg, uint size) {
    sc->type = SYS_SEND;
    sc->receiver = receiver;
    memcpy(sc->content, msg, size);
    asm("ecall");
}

void sys_recv(int from, int* sender, char* buf, uint size) {
    sc->type   = SYS_RECV;
    sc->sender = from;
    asm("ecall");
    memcpy(buf, sc->content, size);
    if (sender) *sender = sc->sender;
}

Again, each ecall in the code above will trigger an environment call exception, and the CPU will jump to the exception handler (i.e., trap_entry) as soon as ecall is executed.

Kernel-side handling

As we have seen in P2, the exception handler function in egos-2000, trap_entry, will call kernel_entry which further calls excp_entry for the environment call exception. We now explain the following if-statement for system calls.

c
if (id >= EXCP_ID_ECALL_U && id <= EXCP_ID_ECALL_M) {
    /* Copy the system call arguments from user space to the kernel */
    uint syscall_paddr = earth->mmu_translate(curr_pid, SYSCALL_ARG);
    memcpy(&proc_set[curr_proc_idx].syscall, (void*)syscall_paddr,
            sizeof(struct syscall));

    proc_set[curr_proc_idx].mepc += 4;
    proc_set[curr_proc_idx].syscall.status = PENDING;
    proc_set_pending(curr_pid);
    proc_try_syscall(&proc_set[curr_proc_idx]);
    proc_yield();
    return;
}

First of all, proc_set[curr_proc_idx] is the struct process of the current process which has just invoked the system call.

  • Note that this process initialized the struct syscall data structure at memory address SYSCALL_ARG. Line#3 and #4 copy this data structure into the kernel. You can ignore the earth->mmu_translate for now and we will explain it in P4.

  • Recall that mepc stands for the program counter when the exception occurs, and the value of mepc is read into proc_set[curr_proc_idx].mepc in function kernel_entry. Therefore, line#7 says that, after the system call is done, the kernel should return to the instruction right after ecall in this process (i.e., skip the 4-byte ecall instruction).

  • Line#8 sets the system call status to PENDING and line#9 sets the process status to PROC_PENDING_SYSCALL. Line#10 attempts to handle the SYS_SEND or SYS_RECV for the current process and line#11 finds the next process to schedule.

  • You can also find proc_try_syscall in proc_yield because the scheduler will attempt a pending system call repeatedly until it succeeds.

Read proc_try_syscall, proc_try_send and proc_try_recv yourself. There are only 39 lines of code, but they gracefully handle inter-process communication. At a high level, if a process has a pending system call, proc_try_syscall will attempt the system call again and set the process status to PROC_RUNNABLE if the attempt succeeds.

TIP

At this point, you have finished reading grass/kernel.c, including kernel_entry, intr_entry, excp_entry, proc_yield, proc_try_syscall, proc_try_send and proc_try_recv.

Introduce process sleep

After getting familiar with the system call control flow, we now ask you to use system calls to introduce process sleep. As shown in library/syscall/servers.h, the GPID_PROCESS process in egos-2000 accepts 3 message types for spawning and terminating processes. Your job is to add a fourth message type PROC_SLEEP to struct proc_request so that the process which sends this message to GPID_PROCESS will sleep for a certain amount of time before being scheduled again. Start from a fresh copy of egos-2000 and add the following code as a new file apps/user/sleep.c.

c
#include "app.h"

int main() {
  const uint usec_cnt = 5000000;
  printf("Start to sleep for %d microseconds\n\r", usec_cnt);
  sleep(usec_cnt);
  printf("Woke up again after %d microseconds\n\r", usec_cnt);
}

Then run make qemu and sleep will be automatically added as a user command:

shell
> make qemu
...
 /home/yunhao sleep
Start to sleep for 5000000 microseconds
Woke up again after 5000000 microseconds

However, you will see the second line of printing immediately after the first line because the sleep function in library/syscall/servers.c has not been implemented. The goal is to see the second line 5 seconds after the first line when you complete the steps below.

  1. Update the struct proc_request and the sleep function mentioned above so that this sleep function sends a PROC_SLEEP message to the GPID_PROCESS process.

  2. In apps/system/sys_proc.c, add a case for PROC_SLEEP and put a debug printing there temporarily, so you know that GPID_PROCESS succeeds in receiving the message.

  3. Add the proc_sleep function in grass/kernel.c to the grass interface (struct grass in library/egos.h) and initialize it in the grass_entry function of grass/init.c, just like the proc_alloc and proc_free functions.

  4. Invoke grass->proc_sleep in the PROC_SLEEP case you have just added in step 2. Then add a debug printing in the proc_sleep function of grass/kernel.c, so you know that proc_sleep is called by GPID_PROCESS with the correct pid and usec arguments.

  5. Implement this proc_sleep function which should put process pid into sleep for usec microseconds. This involves a few modifications to the kernel.

    • Add one or more fields to struct process and initialize them in proc_alloc().
    • Modify such fields for process pid in proc_sleep(). In addition to argument usec, you also need mtime_get() which returns the clock time in microseconds (on QEMU).
    • Modify the for-loop in proc_yield() and schedule a process only if it is not sleeping according to the fields in struct process and the clock time from mtime_get().
  6. After implementing the proc_sleep function, the kernel could meet a situation where no process can be scheduled. You will need some code within the if(CORE_IDLE) block of proc_yield() according to the comments there.

  7. Remove the debug printings. Run sleep again in the egos-2000 shell and you should see the Woke up ... printing 5 seconds after the first line of printing.
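The scheduling idea behind steps 5 and 6 can be modeled on a host machine before touching the kernel. The sketch below is one possible design, not the egos-2000 implementation: `struct proc`, the `wakeup_time` field, `sleep_until`, `is_sleeping`, `pick_next` and `NPROC` are all hypothetical, and `now` plays the role of the value returned by mtime_get(). It assumes that `now` and `duration` are expressed in the same time unit.

```c
#include <assert.h>
#include <stdint.h>

#define NPROC 4  /* hypothetical table size for this sketch */

enum status { UNUSED, RUNNABLE };

struct proc {
    int pid;
    enum status status;
    uint64_t wakeup_time;  /* hypothetical field; 0 means "never slept" */
};

/* Record the earliest time at which p may be scheduled again. */
void sleep_until(struct proc *p, uint64_t now, uint64_t duration) {
    p->wakeup_time = now + duration;
}

/* A process is asleep while the clock has not reached its wakeup time. */
int is_sleeping(const struct proc *p, uint64_t now) {
    return now < p->wakeup_time;
}

/* Round-robin search starting right after index curr. Returns the index
 * of the next process to run, or -1 if every process is unused or asleep
 * (the situation in which a core has nothing to schedule). */
int pick_next(const struct proc *set, int curr, uint64_t now) {
    for (int i = 1; i <= NPROC; i++) {
        int idx = (curr + i) % NPROC;
        if (set[idx].status == RUNNABLE && !is_sleeping(&set[idx], now))
            return idx;
    }
    return -1;
}

/* One runnable process that sleeps and later wakes up. */
void sleep_demo(void) {
    struct proc set[NPROC] = { {1, RUNNABLE, 0} };
    sleep_until(&set[0], 100, 50);
    assert(pick_next(set, 0, 120) == -1);  /* still asleep: nothing to run */
    assert(pick_next(set, 0, 150) == 0);   /* wakeup time reached          */
}
```

Storing an absolute wakeup time, rather than a countdown, means the scheduler only has to compare against the current clock and never has to update the field on every timer interrupt.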

Protect the OS memory

So far, all the code we have seen runs in the so-called machine mode, which means that the code can freely access the memory. However, user applications should not be able to freely read or write the memory. Otherwise, a malicious application can corrupt the memory of the kernel and cause damage. At a high level, we now ask you to do 3 things:

  • specify the memory region that code in the user mode is allowed to access
  • run the code of all user applications in the user mode instead of machine mode
  • terminate a user application if it triggers an exception by trying to read or write a memory address outside of the regions allowed

Setup a PMP region

Read through chapter 3.7 of the RISC-V reference manual for Physical Memory Protection (PMP) and then write your code in earth/cpu_mmu.c:

c
void mmu_init() {
    /* Setup a PMP region for the whole 4GB address space */
    asm("csrw pmpaddr0, %0" : : "r"(0x40000000));
    asm("csrw pmpcfg0, %0" : : "r"(0xF));

    /* Student's code goes here (System Call & Protection). */

    /* Setup PMP NAPOT region 0x80400000 - 0x80800000 as r/w/x */

    /* Student's code ends here. */
    ...
}

TIP

Your code should overwrite the two CSRs pmpaddr0 and pmpcfg0, so the 4GB region no longer takes effect and it is replaced by the 4MB region [0x80400000, 0x80800000). As a result, you will now only be able to choose software TLB when booting egos-2000, but that's OK.

Specifically, code running in the user mode cannot access any memory region by default. After you finish the code above, code running in the user mode will be able to access one and only one memory region, [0x80400000, 0x80800000), which contains the code, data, heap and stack of the current process (i.e., everything a user application needs).
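To make the PMP encodings concrete, here is a host-side sketch of the arithmetic, based on the NAPOT and TOR rules in the PMP chapter of the RISC-V privileged specification. It only computes values on plain integers and does not write any CSR; the helper names `napot_encode`, `napot_size`, `napot_base` and `pmp_demo` are hypothetical. It assumes a 32-bit system where pmpaddr holds bits 33:2 of the address.

```c
#include <assert.h>
#include <stdint.h>

/* pmpcfg bit fields: permissions in bits 0-2, address mode A in bits 3-4 */
#define PMP_R     0x01u
#define PMP_W     0x02u
#define PMP_X     0x04u
#define PMP_TOR   0x08u  /* A = 1: top of range                      */
#define PMP_NAPOT 0x18u  /* A = 3: naturally aligned power-of-2 size */

/* Encode a NAPOT region. size must be a power of two >= 8 bytes and base
 * must be aligned to size. A region of 2^(k+3) bytes is marked by k
 * trailing one bits in pmpaddr. */
uint32_t napot_encode(uint32_t base, uint32_t size) {
    assert(size >= 8 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0);
    return (base >> 2) | ((size >> 3) - 1);
}

/* Recover the region size in bytes from a NAPOT pmpaddr value. */
uint64_t napot_size(uint32_t pmpaddr) {
    unsigned ones = 0;
    while (pmpaddr & 1u) { ones++; pmpaddr >>= 1; }
    return (uint64_t)1 << (ones + 3);
}

/* Recover the region base address from a NAPOT pmpaddr value. */
uint64_t napot_base(uint32_t pmpaddr) {
    uint64_t size = napot_size(pmpaddr);
    return ((uint64_t)pmpaddr << 2) & ~(size - 1);
}

/* Check the two configurations discussed in this section. */
void pmp_demo(void) {
    /* TOR with pmpaddr0 = 0x40000000 covers [0, 0x40000000 << 2) = 4GB */
    assert(((uint64_t)0x40000000u << 2) == 0x100000000ull);
    assert((PMP_TOR | PMP_R | PMP_W | PMP_X) == 0xFu);

    /* NAPOT round trip for the 4MB region [0x80400000, 0x80800000) */
    uint32_t addr = napot_encode(0x80400000u, 0x400000u);
    assert(napot_base(addr) == 0x80400000u);
    assert(napot_size(addr) == 0x400000u);
}
```

Decoding a value back into a base and size, as `pmp_demo` does, is a cheap way to check an encoding on your own machine before booting with it.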

However, PMP won't take effect if we still run everything in the machine mode, so we need to switch privilege modes when switching the CPU context from the kernel back to a user application process.

Switch privilege modes

In short, you need to update mstatus.MPP in proc_yield:

c
void proc_yield() {
    ...
    /* Student's code goes here (Protection | Multicore & Locks). */

    /* Modify mstatus.MPP to enter machine or user mode after mret. */

    /* Student's code ends here. */
    ...
}

Recall that mstatus.MPP stands for bit#11 and bit#12 of the mstatus CSR:

[Figure: bit layout of the mstatus CSR]

You will need to set these bits as 11 if the next process scheduled is a kernel process (i.e., pid<GPID_USER_START) and set them as 00 for all the other processes. In RISC-V, 0 stands for user mode and 3 (i.e., 11 in binary) stands for machine mode. To see how it works, we need to introduce more about what happens when entering and exiting the kernel.

Upon an interrupt or exception, the CPU enters the kernel and it automatically switches the privilege mode to machine mode right before jumping to the handler function trap_entry. This allows the kernel to run in the machine mode and thus access the memory freely.

Upon executing mret in grass/kernel.s, the CPU exits the kernel and mret will switch the privilege mode according to mstatus.MPP. Therefore, if we set mstatus.MPP to 00 in proc_yield, the application code will run in the user mode after this mret.
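The bit manipulation on mstatus.MPP can be sketched on a plain integer as follows. This is an illustration under assumptions, not the code expected in proc_yield: the macro and function names are hypothetical, and on real hardware the value would be read from and written back to the mstatus CSR with csrr and csrw.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_MPP_SHIFT 11                        /* MPP is bit#11 and bit#12 */
#define MSTATUS_MPP_MASK  (3u << MSTATUS_MPP_SHIFT)
#define MODE_USER         0u                        /* 00 in binary */
#define MODE_MACHINE      3u                        /* 11 in binary */

/* Return mstatus with its MPP field replaced by mode; other bits are kept. */
uint32_t mstatus_with_mpp(uint32_t mstatus, uint32_t mode) {
    assert(mode == MODE_USER || mode == MODE_MACHINE);
    return (mstatus & ~MSTATUS_MPP_MASK) | (mode << MSTATUS_MPP_SHIFT);
}

/* Extract the MPP field, i.e., the mode that mret would switch to. */
uint32_t mpp_of(uint32_t mstatus) {
    return (mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT;
}

/* Switching between the two modes leaves all other bits untouched. */
void mpp_demo(void) {
    uint32_t m = 0x1888u;                  /* MPP = 11, plus unrelated bits */
    m = mstatus_with_mpp(m, MODE_USER);
    assert(m == 0x0088u && mpp_of(m) == MODE_USER);
    m = mstatus_with_mpp(m, MODE_MACHINE);
    assert(m == 0x1888u && mpp_of(m) == MODE_MACHINE);
}
```

Clearing the field with the mask before setting it is what makes the helper safe to call regardless of the previous mode; a plain OR could never move MPP from 11 back to 00.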

Kill malicious applications

To test if you correctly set the PMP region and switch privilege modes, we have provided 2 malicious applications crash1 and crash2 under apps/user. The malicious applications would halt the whole operating system by corrupting the memory.

shell
>  make qemu
...
[CRITICAL] Choose a memory translation mechanism:
Enter 0: page tables
Enter 1: software TLB
[INFO] Software translation is chosen
...
[CRITICAL] Welcome to the egos-2000 shell!
 /home/yunhao crash1
_sbrk: heap grows too large
[FATAL] excp_entry: kernel got exception 7

Note that this FATAL happens at the end of function excp_entry. Your final task in P3 is to implement the following part of excp_entry.

c
static void excp_entry(uint id) {
    ...
    /* Student's code goes here (System Call & Protection). */

    /* Kill the process if curr_pid is a user application. */

    /* Student's code ends here. */
    FATAL("excp_entry: kernel got exception %d", id);
}

After excp_entry kills the malicious applications gracefully, you should see the following.

shell
# Make sure to choose software TLB
>  make qemu
...
> /home/yunhao crash1
_sbrk: heap grows too large
[INFO] process 6 terminated with exception 7
> /home/yunhao crash2
[INFO] process 7 terminated with exception 7
> /home/yunhao

In other words, memory protection should now work: the malicious applications run in the user mode and trigger memory exceptions when trying to corrupt the memory, and the kernel kills these applications when handling the exceptions.

Accomplishments

In terms of OS concepts, you have learned about exception handling, system calls, privilege modes and inter-process communication. In terms of code, you have read everything under grass and library/syscall. The grass directory is the "kernel" part of egos-2000, i.e., the core logic of an operating system.

You will read earth/cpu_mmu.c in P4. You will read earth/dev_tty.c and earth/dev_disk.c in P5. You will read everything under library/file in P6. Then you will have finished reading essentially all the code of egos-2000. We are halfway there!