System Call & Protection
You have seen how an operating system manages the life cycles of threads (or processes, as we don't distinguish the two right now), and schedules multiple threads in a preemptive way using timer interrupts. This project helps you understand two more things:
- how threads invoke system calls in order to communicate with each other;
- how an operating system protects its memory so that malicious threads cannot corrupt the system calls by modifying the code or data of the operating system.
System calls and memory protection rely on exception handling, which is similar to interrupt handling. We thus start by introducing exception handling.
Exception handling
An exception happens if something goes wrong when the CPU executes an instruction. For example, an exception would happen when an instruction tries to access the memory at an invalid address. Instead of ignoring the problem and proceeding to the next instruction, the CPU automatically jumps to a special function called an exception handler, just as it jumps to the interrupt handler when receiving a timer interrupt. Indeed, we will use the same function to handle interrupts and exceptions, and then use the so-called mcause CSR to identify what caused the CPU to jump to the handler function.
The mcause CSR
Below are the screenshots of Table 22 and Table 23 from this document which describe the mcause CSR. You can further read chapter 3.1.15 of the RISC-V reference manual. When an exception or interrupt happens, the CPU will set the value of mcause before jumping to the handler function.
For example, mcause is set to 0x80000007 when the CPU receives a timer interrupt. Bit #31 of mcause is set to 1 because a timer interrupt is an interrupt. The "exception code" bits are set to 0000000111 because 7 is the code for the machine timer interrupt. When the CPU meets an illegal instruction (i.e., the 4 bytes pointed to by the program counter cannot be decoded into a CPU instruction), mcause will be set to 0x2 before the CPU jumps to the handler function.
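To make the two examples above concrete, here is a minimal sketch of how an mcause value can be split into its two parts. It assumes a 32-bit CPU and a 10-bit exception code field (matching the 10-digit bit string above). The macro and function names are made up for illustration and are not part of egos-2000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 tells interrupts (1) apart from exceptions (0) on a 32-bit CPU. */
#define MCAUSE_INTERRUPT_BIT (1u << 31)
/* Assumption: the exception code occupies the low 10 bits. */
#define MCAUSE_CODE_MASK 0x3FFu

/* Hypothetical helper: true if mcause describes an interrupt. */
static bool mcause_is_interrupt(uint32_t mcause) {
    return (mcause & MCAUSE_INTERRUPT_BIT) != 0;
}

/* Hypothetical helper: extract the interrupt or exception code. */
static uint32_t mcause_code(uint32_t mcause) {
    return mcause & MCAUSE_CODE_MASK;
}
```

With these helpers, 0x80000007 decodes to an interrupt with code 7 (machine timer), and 0x2 decodes to an exception with code 2 (illegal instruction).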
TIP
Exceptions are different from interrupts. Exceptions are triggered by the instruction that the CPU is executing when something goes wrong. Interrupts are triggered by devices outside of the CPU, such as a timer, a disk, or a network interface controller. The mcause CSR helps the operating system understand which particular interrupt or exception needs to be handled.
The ecall exception
Most exceptions happen because something goes wrong, such as an invalid memory access. However, RISC-V provides a special instruction, ecall, which intentionally triggers the so-called environment call exception (see exceptions #8 and #11 in Table 23). This instruction is for system calls: if a thread invokes ecall, the control flow will transfer to the operating system, which will serve a system call for this thread.
In egos-2000, ecall is used in library/syscall/syscall.c. Specifically, asm("ecall") will raise an environment call exception and then the CPU will jump to trap_entry, as you have seen in P2. trap_entry calls kernel_entry, which further calls excp_entry, and excp_entry handles system calls within the if statement using this condition: (id >= EXCP_ID_ECALL_U && id <= EXCP_ID_ECALL_M). EXCP_ID_ECALL_U and EXCP_ID_ECALL_M are defined as 8 and 11 respectively according to Table 23. Take a look at how mcause has been read and used in trap_entry and kernel_entry.
TIP
The U-mode and M-mode in Table 23 stand for user mode and machine mode. We will touch on these privilege modes very soon when we start to explain memory protection.
A sketch of the "kernel"
We have been using the word "kernel" since P2, but we have never explained what an OS kernel is. With the knowledge of mcause, we can show you a sketch of the "kernel".
void kernel() {
    int mcause_val, id;
    asm("csrr %0, mcause" : "=r"(mcause_val));
    id = mcause_val & 0x3FF;
    if (mcause_val & (1 << 31)) {
        if (id == 7) proc_yield();
    } else {
        if (id >= 8 && id <= 11) handle_system_call();
        if (id == 1 || id == 5 || id == 7) handle_memory_access_fault();
    }
}
The code above sketches the core of an operating system (a.k.a. the kernel):
- handle thread scheduling upon a timer interrupt
- handle system calls when a thread invokes ecall
- handle other exceptions such as memory access faults
We call it a sketch because a complete operating system needs to handle all the interrupts and exceptions, while the 3 bullets above are probably the most important ones. You have seen thread scheduling in P2, and P3 will give you hands-on experience with the last two bullets from this kernel sketch.
Inter-process communication
In egos-2000, there are only 2 types of system calls, both designed for inter-process communication (i.e., sending and receiving messages). We now introduce the system call interface for applications and then explain what happens within the OS kernel.
Application-side interface
The code below is from library/syscall/syscall.h and it defines the data structures for system calls in egos-2000.
enum syscall_type {
    SYS_UNUSED,
    SYS_RECV, /* 1 */
    SYS_SEND, /* 2 */
};
struct syscall {
    enum syscall_type type; /* SYS_SEND or SYS_RECV */
    int sender;             /* sender process ID */
    int receiver;           /* receiver process ID */
    char content[SYSCALL_MSG_LEN];
    enum { PENDING, DONE } status;
};
The content field holds the message being sent or received. Suppose process A wants to send a message to process B through SYS_SEND. This system call may not succeed immediately. It will only succeed after process B invokes the SYS_RECV system call with process A as the sender (i.e., process B is ready to receive a message from process A). For this reason, before process B invokes the SYS_RECV system call, the SYS_SEND system call by process A would be in status PENDING instead of DONE.
TIP
In other words, egos-2000 implements a blocking version of inter-process communication such that a system call would only return after a message has been successfully sent or received. It is certainly possible to implement a non-blocking version such that system calls return immediately. egos-2000 implements the blocking version for simplicity of the code.
With struct syscall in mind, the system call interface sys_send and sys_recv defined in library/syscall/syscall.c should be easy to understand.
static struct syscall* sc = (struct syscall*)SYSCALL_ARG;
void sys_send(int receiver, char* msg, uint size) {
    sc->type = SYS_SEND;
    sc->receiver = receiver;
    memcpy(sc->content, msg, size);
    asm("ecall");
}
void sys_recv(int from, int* sender, char* buf, uint size) {
    sc->type = SYS_RECV;
    sc->sender = from;
    asm("ecall");
    memcpy(buf, sc->content, size);
    if (sender) *sender = sc->sender;
}
Again, the ecall in the code above will trigger an environment call exception and then the CPU will jump to the exception handler (i.e., trap_entry) right after ecall.
Kernel-side handling
As we have seen in P2, the exception handler function in egos-2000, trap_entry, will call kernel_entry which further calls excp_entry for the environment call exception. We now explain the following if-statement for system calls.
if (id >= EXCP_ID_ECALL_U && id <= EXCP_ID_ECALL_M) {
    /* Copy the system call arguments from user space to the kernel */
    uint syscall_paddr = earth->mmu_translate(curr_pid, SYSCALL_ARG);
    memcpy(&proc_set[curr_proc_idx].syscall, (void*)syscall_paddr,
           sizeof(struct syscall));

    proc_set[curr_proc_idx].mepc += 4;
    proc_set[curr_proc_idx].syscall.status = PENDING;
    proc_set_pending(curr_pid);
    proc_try_syscall(&proc_set[curr_proc_idx]);
    proc_yield();
    return;
}
First of all, proc_set[curr_proc_idx] is the struct process of the current process which has just invoked the system call.
- Note that this process initialized the struct syscall data structure at memory address SYSCALL_ARG. Lines #3 and #4 (the mmu_translate and memcpy calls) copy this data structure into the kernel. You can ignore earth->mmu_translate for now; we will explain it in P4.
- Recall that mepc stands for the program counter when the exception occurs, and the value of mepc is read into proc_set[curr_proc_idx].mepc in function kernel_entry. Therefore, line #7 (mepc += 4) says that, after the system call is done, the kernel should return to the instruction right after ecall in this process (i.e., skip the 4-byte ecall instruction).
- Line #8 sets the system call status to PENDING and line #9 sets the process status to PROC_PENDING_SYSCALL. Line #10 attempts to handle the SYS_SEND or SYS_RECV for the current process and line #11 finds the next process to schedule.
- You can also find proc_try_syscall in proc_yield because the scheduler will attempt a pending system call repeatedly until it succeeds.
Read proc_try_syscall, proc_try_send and proc_try_recv yourself. There are only 39 lines of code, but they gracefully handle inter-process communication. At a high level, if a process has a pending system call, proc_try_syscall will attempt the system call again and set the process status to PROC_RUNNABLE if the attempt succeeds.
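Before reading the real functions, it may help to see the blocking rendezvous in isolation. The following is a toy model written for this explanation, not the code in grass/kernel.c: all names (toy_proc, toy_try_send, MSG_LEN) are hypothetical, and it only models the rule stated earlier that a send succeeds once the receiver is waiting in a receive that names this sender.

```c
#include <stdbool.h>
#include <string.h>

#define MSG_LEN 32 /* hypothetical; stands in for SYSCALL_MSG_LEN */

enum toy_type { TOY_NONE, TOY_RECV, TOY_SEND };

/* Hypothetical, simplified stand-in for a process with a system call. */
struct toy_proc {
    int pid;
    enum toy_type type; /* the system call this process has issued */
    int peer;           /* receiver for TOY_SEND, expected sender for TOY_RECV */
    char content[MSG_LEN];
    bool pending;       /* true while the system call has not completed */
};

/* Try to complete a pending send. It succeeds only if the receiver is
 * itself pending in a receive that expects this sender. */
static bool toy_try_send(struct toy_proc *sender, struct toy_proc *procs, int n) {
    for (int i = 0; i < n; i++) {
        struct toy_proc *r = &procs[i];
        if (r->pid != sender->peer) continue;
        if (!r->pending || r->type != TOY_RECV) return false;
        if (r->peer != sender->pid) return false;
        memcpy(r->content, sender->content, MSG_LEN);
        r->pending = false;      /* the receive is done */
        sender->pending = false; /* the send is done */
        return true;
    }
    return false; /* no process with the requested receiver pid */
}
```

A scheduler in this toy model would call toy_try_send again each time it considers the sender, which mirrors the idea of retrying a pending system call until it succeeds.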
TIP
At this point, you have finished reading grass/kernel.c, including kernel_entry, intr_entry, excp_entry, proc_yield, proc_try_syscall, proc_try_send and proc_try_recv.
Introduce process sleep
After getting familiar with the system call control flow, we now ask you to use system calls to introduce process sleep. As shown in library/syscall/servers.h, the GPID_PROCESS process in egos-2000 accepts 3 message types for spawning and terminating processes. Your job is to add a fourth message type PROC_SLEEP to struct proc_request so that the process which sends this message to GPID_PROCESS will sleep for a certain amount of time before being scheduled again. Start from a fresh copy of egos-2000 and add the following code as a new file apps/user/sleep.c.
#include "app.h"
int main() {
    const uint usec_cnt = 5000000;
    printf("Start to sleep for %d microseconds\n\r", usec_cnt);
    sleep(usec_cnt);
    printf("Woke up again after %d microseconds\n\r", usec_cnt);
}
Then run make qemu and sleep will be automatically added as a user command:
> make qemu
...
➜ /home/yunhao sleep
Start to sleep for 5000000 microseconds
Woke up again after 5000000 microseconds
However, you will see the second line printed immediately after the first line because the sleep function in library/syscall/servers.c has not been implemented. The goal is to see the second line 5 seconds after the first line once you complete the steps below.
1. Update the struct proc_request and the sleep function mentioned above so that this sleep function sends a PROC_SLEEP message to the GPID_PROCESS process.
2. In apps/system/sys_proc.c, add a case for PROC_SLEEP and put a debug print statement there temporarily, so you know that GPID_PROCESS succeeds in receiving the message.
3. Add the proc_sleep function in grass/kernel.c to the grass interface (struct grass in library/egos.h) and initialize it in the grass_entry function of grass/init.c, just like the proc_alloc and proc_free functions.
4. Invoke grass->proc_sleep in the PROC_SLEEP case you have just added in step 2. Then add a debug print statement in the proc_sleep function of grass/kernel.c, so you know that proc_sleep is called by GPID_PROCESS with the correct pid and usec arguments.
5. Implement this proc_sleep function, which should put process pid to sleep for usec microseconds. This involves a few modifications to the kernel.
   - Add one or more fields to struct process and initialize them in proc_alloc().
   - Modify such fields for process pid in proc_sleep(). In addition to the argument usec, you also need mtime_get() which returns the clock time in milliseconds (on QEMU).
   - Modify the for-loop in proc_yield() and schedule a process only if it is not sleeping according to the fields in struct process and the clock time from mtime_get().
6. After implementing the proc_sleep function, the kernel could meet a situation where no process can be scheduled. You will need some code within the if(CORE_IDLE) block of proc_yield() according to the comments there.
7. Remove the debug print statements. Run sleep again in the egos-2000 shell and you should see the Woke up ... line 5 seconds after the first line.
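The bookkeeping behind these steps can be sketched independently of the kernel. The snippet below is only one possible shape, written as a host-side toy: the struct, the field name wakeup_time, and both function names are hypothetical, and the clock value is passed in as a parameter instead of being read from mtime_get(). It also assumes the sleep duration has already been converted to the same unit as the clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the extra state a process needs in order to sleep. */
struct toy_process {
    int pid;
    uint64_t wakeup_time; /* clock time at which the process may run again; 0 = not sleeping */
};

/* Record when the process may be scheduled again.
 * Assumption: `now` and `duration` use the same time unit. */
static void toy_sleep(struct toy_process *p, uint64_t now, uint64_t duration) {
    p->wakeup_time = now + duration;
}

/* A scheduler loop would skip any process for which this returns true. */
static bool toy_is_sleeping(const struct toy_process *p, uint64_t now) {
    return now < p->wakeup_time;
}
```

Storing an absolute wake-up time, as opposed to a remaining duration, means the scheduler only needs one comparison per process and never has to update the field while the process sleeps.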
Protect the OS memory
So far, all the code we have seen runs in the so-called machine mode, which means that the code can freely access the memory. However, user applications should not be able to freely read or write the memory. Otherwise, a malicious application can corrupt the memory of the kernel and cause damage. At a high level, we now ask you to do 3 things:
- specify the memory region that code in the user mode is allowed to access
- run the code of all user applications in the user mode instead of machine mode
- terminate a user application if it triggers an exception by trying to read or write a memory address outside of the regions allowed
Set up a PMP region
Read through chapter 3.7 of the RISC-V reference manual for Physical Memory Protection (PMP) and then write your code in earth/cpu_mmu.c:
void mmu_init() {
    /* Setup a PMP region for the whole 4GB address space */
    asm("csrw pmpaddr0, %0" : : "r"(0x40000000));
    asm("csrw pmpcfg0, %0" : : "r"(0xF));
    /* Student's code goes here (System Call & Protection). */
    /* Setup PMP NAPOT region 0x80400000 - 0x80800000 as r/w/x */
    /* Student's code ends here. */
    ...
}
TIP
Your code should overwrite the two CSRs pmpaddr0 and pmpcfg0, so the 4GB region no longer takes effect and it is replaced by the 4MB region [0x80400000, 0x80800000). As a result, you will now only be able to choose software TLB when booting egos-2000, but that's OK.
Specifically, code running in the user mode cannot access any memory region by default. After you finish the code above, code running in the user mode will be able to access one and only one memory region, [0x80400000, 0x80800000), which contains the code, data, heap and stack of the current process (i.e., everything a user application needs).
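If the NAPOT encoding in chapter 3.7 feels abstract, the sketch below works through it on a host machine for an unrelated example region (64KB starting at 0x20000000, chosen arbitrarily). The helper name pmp_napot_addr and the PMP_* macro names are made up for this illustration; the bit values follow the pmpcfg layout in the RISC-V privileged specification, and a 32-bit CPU is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* pmpcfg bits: permissions and the address-matching mode (A field, bits 3-4). */
#define PMP_R     0x01u
#define PMP_W     0x02u
#define PMP_X     0x04u
#define PMP_TOR   0x08u /* A = 1: top of range */
#define PMP_NAPOT 0x18u /* A = 3: naturally aligned power-of-two region */

/* Hypothetical helper: compute the pmpaddr value for a NAPOT region.
 * `size` must be a power of two of at least 8 bytes, and `base` must be
 * aligned to `size`. pmpaddr stores the address shifted right by 2, and
 * the number of trailing 1 bits encodes the region size. */
static uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
    assert(size >= 8 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0);
    return (base >> 2) | ((size >> 3) - 1);
}
```

For the example region, pmp_napot_addr(0x20000000, 0x10000) yields 0x08001FFF. The same macros also explain the existing code in mmu_init: 0xF is PMP_R | PMP_W | PMP_X | PMP_TOR, and a TOR region with pmpaddr0 = 0x40000000 covers every address below 0x40000000 << 2, i.e., the whole 4GB address space.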
However, PMP won't take effect if we still run everything in the machine mode, so we need to switch privilege modes when switching the CPU context from the kernel back to a user application process.
Switch privilege modes
In short, you need to update mstatus.MPP in proc_yield:
void proc_yield() {
    ...
    /* Student's code goes here (Protection | Multicore & Locks). */
    /* Modify mstatus.MPP to enter machine or user mode after mret. */
    /* Student's code ends here. */
    ...
}
Recall that mstatus.MPP stands for bit #11 and bit #12 of the mstatus CSR.
You will need to set these bits to 11 if the next process scheduled is a kernel process (i.e., pid<GPID_USER_START) and set them to 00 for all the other processes. In RISC-V, 0 stands for user mode and 3 (i.e., 11 in binary) stands for machine mode. To see how it works, we need to introduce more about what happens when entering and exiting the kernel.
Upon an interrupt or exception, the CPU enters the kernel and it automatically switches the privilege mode to machine mode right before jumping to the handler function trap_entry. This allows the kernel to run in the machine mode and thus access the memory freely.
Upon executing mret in grass/kernel.s, the CPU exits the kernel and mret will switch the privilege mode according to mstatus.MPP. Therefore, if we set mstatus.MPP to 00 in proc_yield, the application code will run in the user mode after this mret.
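The bit manipulation itself can be tried out on a host machine before touching the kernel. The sketch below only shows how a 32-bit mstatus value changes when bits #11 and #12 are replaced; the macro and function names are made up for illustration, and reading or writing the real CSR (e.g., with csrr and csrw) is left out.

```c
#include <stdint.h>

#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (3u << MSTATUS_MPP_SHIFT) /* bits #11 and #12 */
#define PRIV_USER    0u /* 00 in binary */
#define PRIV_MACHINE 3u /* 11 in binary */

/* Hypothetical helper: return `mstatus` with its MPP field replaced by
 * `mode`, leaving every other bit unchanged. */
static uint32_t mstatus_with_mpp(uint32_t mstatus, uint32_t mode) {
    return (mstatus & ~MSTATUS_MPP_MASK) |
           ((mode << MSTATUS_MPP_SHIFT) & MSTATUS_MPP_MASK);
}
```

Clearing the field before setting it matters: a plain bitwise OR could never change MPP from 11 back to 00.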
Kill malicious applications
To test if you correctly set the PMP region and switch privilege modes, we have provided 2 malicious applications crash1 and crash2 under apps/user. The malicious applications would halt the whole operating system by corrupting the memory.
> make qemu
...
[CRITICAL] Choose a memory translation mechanism:
Enter 0: page tables
Enter 1: software TLB
[INFO] Software translation is chosen
...
[CRITICAL] Welcome to the egos-2000 shell!
➜ /home/yunhao crash1
_sbrk: heap grows too large
[FATAL] excp_entry: kernel got exception 7
Note that this FATAL happens at the end of function excp_entry. Your final task in P3 is to implement the following part of excp_entry.
static void excp_entry(uint id) {
    ...
    /* Student's code goes here (System Call & Protection). */
    /* Kill the process if curr_pid is a user application. */
    /* Student's code ends here. */
    FATAL("excp_entry: kernel got exception %d", id);
}
After excp_entry kills the malicious applications gracefully, you should see the following.
# Make sure to choose software TLB
> make qemu
...
> /home/yunhao crash1
_sbrk: heap grows too large
[INFO] process 6 terminated with exception 7
> /home/yunhao crash2
[INFO] process 7 terminated with exception 7
> /home/yunhao
In other words, with memory protection working, the malicious applications run in the user mode and trigger memory exceptions when trying to corrupt the memory, and the kernel kills them when handling these exceptions.
Accomplishments
In terms of OS concepts, you have learned about exception handling, system calls, privilege modes and inter-process communication. In terms of code, you have read everything under grass and library/syscall. The grass directory is the "kernel" part of egos-2000, i.e., the core logic of an operating system.
You will read earth/cpu_mmu.c in P4, earth/dev_tty.c and earth/dev_disk.c in P5, and everything under library/file in P6. Then you will have finished reading essentially all the code of egos-2000. We are halfway there!