System Call & Protection
So far, you have learned how an operating system manages the life cycle of threads (or processes, as we don't distinguish the two right now) and schedules multiple threads in a preemptive fashion. This project helps you understand two more things:
- how threads invoke system calls in order to communicate with each other
- how an operating system protects its memory so that threads cannot corrupt the system calls by modifying the code or data of the operating system
System calls and memory protection rely on exception handling, which is very similar to the concept of interrupt handling from P2. We thus start by introducing exception handling.
Exception handling
An exception happens if something goes wrong when the CPU executes an instruction. For example, an exception would happen when an instruction tries to access the memory at an invalid address. Instead of ignoring the problem and proceeding to the next instruction, the CPU automatically jumps to a special function called an exception handler, just like how the CPU jumps to the interrupt handler when receiving a timer interrupt. Indeed, we will use the same function to handle both interrupts and exceptions and use the mcause CSR, explained below, to identify what has caused the CPU to call the handler function.
The mcause CSR
Below is a screenshot of Table 22 and Table 23 from this CPU document which describe an important CSR, mcause, designed for exception and interrupt handling. Upon an exception or interrupt, the CPU sets the value of mcause before jumping to the handler function.
For example, mcause is set to 0x80000007 when the CPU receives a timer interrupt. Bit#31 of mcause is set to 1 because a timer interrupt is an interrupt, and the lower bits are set to 7 because 7 is the code identifying the machine timer interrupt. Similarly, if the CPU encounters an illegal instruction (i.e., the 4 bytes pointed to by the program counter cannot be decoded into a CPU instruction), mcause will be set to 0x2 before the CPU jumps to the handler function.
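The two example values above can be checked with a few lines of C. This is a minimal sketch, assuming a 32-bit mcause; the helper names mcause_is_interrupt and mcause_code are hypothetical and do not come from egos-2000.

```c
#include <stdbool.h>
#include <stdint.h>

/* On a 32-bit CPU, bit 31 of mcause is 1 for interrupts and 0 for exceptions. */
#define MCAUSE_INTR_BIT 0x80000000u

/* Hypothetical helper: was the handler entered because of an interrupt? */
bool mcause_is_interrupt(uint32_t mcause) {
    return (mcause & MCAUSE_INTR_BIT) != 0;
}

/* Hypothetical helper: the interrupt or exception code in the lower bits. */
uint32_t mcause_code(uint32_t mcause) {
    return mcause & 0x3FFu;
}
```

With these helpers, 0x80000007 decodes to an interrupt with code 7, and 0x2 decodes to an exception with code 2.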
TIP
Exceptions are different from interrupts. Exceptions are triggered by CPU instructions that cause something to go wrong. Interrupts are triggered by devices outside of the CPU, such as a timer, a disk, or a network interface controller. The similarity is that both need to be handled by the operating system, and the mcause CSR helps the operating system see what needs to be handled.
The ecall exception
Most exceptions happen due to something wrong such as invalid memory access. When the code of a thread triggers an exception, the operating system typically terminates this thread and prints out an error message accordingly.
However, RISC-V provides a special instruction, ecall, which will trigger the environment call exception intentionally (i.e., exception #8 or #11 in Table 23). This is the CPU instruction for system calls: when a thread invokes ecall and triggers this exception, the operating system would serve a system call for this thread instead of terminating it.
In egos-2000, this happens in library/syscall/syscall.c. Specifically, asm("ecall") in the code will raise an environment call exception. The CPU then sets the program counter to trap_entry as we have discussed in P2. trap_entry further calls kernel_entry, and kernel_entry calls excp_entry, where system calls are handled within the if statement with condition (id >= EXCP_ID_ECALL_U && id <= EXCP_ID_ECALL_M). EXCP_ID_ECALL_U and EXCP_ID_ECALL_M are defined as 8 and 11 respectively according to Table 23. Take a look at how mcause has been read and used in trap_entry and kernel_entry.
TIP
The U-mode and M-mode in Table 23 stand for user mode and machine mode. We will touch on these privilege modes very soon when we start to explain memory protection.
A sketch of the "kernel"
We have been using the word "kernel" since P2, but we have never explained what an OS kernel is. With the knowledge of mcause, we are ready to show you a sketch of the "kernel".
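void kernel() {
    unsigned int mcause_val, id;
    asm("csrr %0, mcause" : "=r"(mcause_val));
    id = mcause_val & 0x3FF;
    if (mcause_val & (1u << 31)) {
        /* Interrupt: 7 is the machine timer interrupt */
        if (id == 7) proc_yield();
    } else {
        /* Exception: 8 to 11 are environment calls */
        if (id >= 8 && id <= 11) handle_system_call();
        /* 1, 5 and 7 are instruction, load and store access faults */
        if (id == 1 || id == 5 || id == 7) handle_memory_access_fault();
    }
}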
The code above sketches the core of an operating system:
- handle thread scheduling upon a timer interrupt
- handle system calls when a thread invokes ecall
- handle other exceptions such as memory access faults
We call it a sketch because a complete operating system needs to handle all the interrupts and exceptions, while the 3 bullets above are the most important ones to handle. You have played with thread scheduling in P2, and P3 will give you hands-on experience with the last two aspects of an OS kernel.
Inter-process communication
In egos-2000, there are only 2 types of system calls which are designed for inter-process communication (i.e., sending and receiving messages). We now introduce the system call interface for applications and then explain what happens within the OS kernel.
Application-side interface
The code below is from library/syscall/syscall.h and it defines the data structures for system calls in egos-2000.
enum syscall_type {
    SYS_UNUSED,
    SYS_RECV, /* 1 */
    SYS_SEND, /* 2 */
};

struct syscall {
    enum syscall_type type; /* SYS_SEND or SYS_RECV */
    int sender;             /* sender process ID */
    int receiver;           /* receiver process ID */
    char content[SYSCALL_MSG_LEN];
    enum {PENDING, DONE} status;
};
The content field holds the message being sent or received. Say process A wants to send a message to process B through SYS_SEND. This system call may not succeed immediately. It will only succeed after process B invokes the SYS_RECV system call with sender being process A, meaning that process B is ready to receive a message from process A. For this reason, before process B invokes the SYS_RECV system call, the SYS_SEND system call by process A is in status PENDING instead of DONE.
TIP
In other words, egos-2000 implements a blocking version of inter-process communication such that a system call would only return after a message has been successfully sent or received. It is certainly possible to implement a non-blocking version such that system calls return immediately. egos-2000 implements the blocking version just for simplicity of the code.
With struct syscall in mind, the system call interface sys_send and sys_recv defined in library/syscall/syscall.c should be easy to understand.
static struct syscall* sc = (struct syscall*)SYSCALL_ARG;

void sys_send(int receiver, char* msg, uint size) {
    sc->type = SYS_SEND;
    sc->receiver = receiver;
    memcpy(sc->content, msg, size);
    asm("ecall");
}

void sys_recv(int from, int* sender, char* buf, uint size) {
    sc->sender = from;
    sc->type = SYS_RECV;
    asm("ecall");
    memcpy(buf, sc->content, size);
    if (sender) *sender = sc->sender;
}
Again, the ecall instructions in the code above will trigger an environment call exception, so the CPU will jump to the exception handler right after the ecall instruction.
Kernel-side handling
As we have seen in P2, the exception handler function in egos-2000, trap_entry, will call kernel_entry which further calls excp_entry for the environment call exception. We now explain the first few lines of excp_entry.
if (id >= EXCP_ID_ECALL_U && id <= EXCP_ID_ECALL_M) {
    proc_set[curr_proc_idx].mepc += 4;
    memcpy(&proc_set[curr_proc_idx].syscall, (void*)SYSCALL_ARG, sizeof(struct syscall));
    proc_set[curr_proc_idx].syscall.status = PENDING;
    proc_try_syscall(&proc_set[curr_proc_idx]);
    proc_yield();
    return;
}
First of all, proc_set[curr_proc_idx] is the struct process in the PCB representing the current process which has just invoked the system call.
- Recall that mepc stands for the program counter when the exception occurs, and the value of mepc is read into proc_set[curr_proc_idx].mepc in function kernel_entry. Therefore, line#2 says that, after the system call is done, the kernel should return to the instruction right after ecall in this process (i.e., skip the 4-byte instruction ecall).
- Note that the process initialized the struct syscall data structure at memory address SYSCALL_ARG. Line#3 copies this data structure into the PCB and line#4 sets the system call status as PENDING. Line#5 tries to handle the system call and, as we just mentioned, the system call may not succeed immediately. Line#6 finds the next process to schedule just like what you have learned in P2.
- You can also find proc_try_syscall in proc_yield because the scheduler will attempt a pending system call repeatedly until it succeeds.
Please read proc_try_syscall, proc_try_send and proc_try_recv yourself. There are only 40 lines of code, but they gracefully handle the message passing between processes. At a high level, if a process has a pending system call, proc_try_syscall will attempt the system call and set the process status as PROC_RUNNABLE or PROC_PENDING_SYSCALL according to whether the attempt succeeds or not.
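To make the blocking behavior concrete before you read the real code, here is a toy model of a send that only completes once the receiver is waiting for that sender. It is a simplified sketch, not the egos-2000 implementation: every name in it (toy_msg_proc, toy_try_send, the TOY_ constants) is hypothetical, and it ignores details the real kernel must handle.

```c
#include <string.h>

#define TOY_MSG_LEN 32

enum toy_status { TOY_RUNNABLE, TOY_PENDING_SEND, TOY_PENDING_RECV };

/* Hypothetical, stripped-down process record for illustration only. */
struct toy_msg_proc {
    int pid;
    enum toy_status status;
    int peer;                   /* receiver of a send, or expected sender of a recv */
    char content[TOY_MSG_LEN];
};

/* Try to complete a pending send from src to dst.
 * Returns 1 and marks both processes runnable if dst is waiting for src.
 * Returns 0 and leaves both processes untouched otherwise. */
int toy_try_send(struct toy_msg_proc *src, struct toy_msg_proc *dst) {
    if (src->status != TOY_PENDING_SEND || src->peer != dst->pid) return 0;
    if (dst->status != TOY_PENDING_RECV || dst->peer != src->pid) return 0;

    memcpy(dst->content, src->content, TOY_MSG_LEN);
    src->status = TOY_RUNNABLE;
    dst->status = TOY_RUNNABLE;
    return 1;
}
```

In this model, a scheduler that calls toy_try_send every time it considers the sender will see the call fail until the receiver has posted its receive, which mirrors why a pending system call is retried until it succeeds.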
TIP
At this point, you have finished reading grass/kernel.c, including kernel_entry, intr_entry, excp_entry, proc_yield, proc_try_syscall, proc_try_send and proc_try_recv.
Syscall for process sleep
After getting familiar with the system call control flow, we now ask you to use system calls to enable process sleep. As shown in library/syscall/servers.h, the GPID_PROCESS process in egos-2000 accepts 3 message types for spawning and terminating processes. Your job is to add a fourth message type PROC_SLEEP to struct proc_request so that the process that sends this message to GPID_PROCESS will sleep for a certain number of clock ticks before being scheduled again. Specifically, you will
- Start with a fresh copy of egos-2000.
- Modify struct process to record how many clock ticks a process needs to wait before it finishes sleeping. Initialize this counter as 0 in proc_alloc.
- Modify proc_yield or other parts of the scheduler to update the counters upon a timer interrupt and only schedule threads that no longer need to sleep.
- Add a function proc_sleep(pid, nticks) in both grass/kernel.c and struct grass, so applications can invoke grass->proc_sleep and set the counter for pid in the PCB. Note that struct grass is initialized in grass/init.c.
- Modify apps/system/sys_process.c to handle the new PROC_SLEEP message by calling grass->proc_sleep with pid being sender.
- Add a helper function sleep(nticks) in library/syscall/servers.c which prepares a PROC_SLEEP message containing nticks and sends it to process GPID_PROCESS. Other functions in library/syscall/servers.c do similar things for different system servers.
- Lastly, add an application apps/user/sleep.c to test the sleep helper function.
#include "app.h"

const int nticks = 1000; /* You may adjust this number. */

int main() {
    printf("Start to sleep for %d ticks\n\r", nticks);
    sleep(nticks);
    printf("Wake up after sleeping for %d ticks\n\r", nticks);
}
- You should not encounter a situation where no process can be scheduled. The reason is that GPID_TERMINAL should always be able to run and it never calls sleep.
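The counter idea in the steps above can be tried out in isolation before touching the kernel. The following is a standalone toy model, assuming one counter per process that is decremented on every simulated timer tick; the names toy_sleeper, toy_tick and toy_pick_next are hypothetical, and how this fits into proc_yield is left to you.

```c
/* Hypothetical, stripped-down process record for illustration only. */
struct toy_sleeper {
    int pid;
    unsigned sleep_ticks;   /* 0 means the process is not sleeping */
};

/* Called once per simulated timer interrupt: count down every sleeper. */
void toy_tick(struct toy_sleeper procs[], int n) {
    for (int i = 0; i < n; i++)
        if (procs[i].sleep_ticks > 0) procs[i].sleep_ticks--;
}

/* Round-robin search starting right after index curr. Returns the index of
 * the next process that is not sleeping, or -1 if every process is asleep. */
int toy_pick_next(const struct toy_sleeper procs[], int n, int curr) {
    for (int step = 1; step <= n; step++) {
        int idx = (curr + step) % n;
        if (procs[idx].sleep_ticks == 0) return idx;
    }
    return -1;
}
```

The -1 case corresponds to the situation described in the last bullet: it should not arise in egos-2000 as long as one process, such as GPID_TERMINAL, never sleeps.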
TIP
At this point, you should have a full picture of system calls from applications to helper functions in library/syscall/servers.c, to the OS kernel, and lastly to the system servers in apps/system.
Protect the OS memory
So far, all the code we have seen runs in the so-called machine mode, which means that the code can freely access the memory. However, user applications should not be able to freely read/write the memory. Otherwise, a malicious application could corrupt the memory of the kernel and cause damage. At a high level, we now ask you to do 3 things.
- specify the memory region that code in the user mode is allowed to read/write
- run the code of all user applications in the user mode instead of the machine mode
- terminate a user application if it triggers an exception by trying to read/write an address outside of the memory region allowed for the user mode
You will only touch two privilege modes in P3 and you will learn about a third one called the supervisor mode in P4. In terms of privilege, machine > supervisor > user.
Set up a PMP region
Read through chapter 3.6 of the RISC-V reference manual for Physical Memory Protection (PMP) and then write your code in earth/cpu_mmu.c:
void mmu_init() {
    ...
    /* Student's code goes here (PMP memory protection). */
    /* Setup PMP NAPOT region 0x80400000 - 0x80800000 as r/w/x */
    /* Student's code ends here. */
    ...
}
Specifically, code running in the user mode cannot access any memory region by default. After you finish the code in mmu_init, code running in the user mode will be able to read/write one and only one memory region, [0x80400000, 0x80800000), which holds the code, data, heap and stack of the currently running process.
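The NAPOT address encoding is the part of the PMP chapter that tends to need a worked example. The sketch below computes the value that would go into a pmpaddr CSR for a naturally aligned power-of-two region, following the encoding in the RISC-V privileged specification (the CSR holds the address shifted right by 2, with the low bits set to ones to encode the size). The helper name pmp_napot_addr and the PMP_ macros are hypothetical, and writing the results into the pmpaddr and pmpcfg CSRs is left to you.

```c
#include <stdint.h>

/* Bits of one 8-bit pmpcfg entry. */
#define PMP_R     0x01u   /* read permission */
#define PMP_W     0x02u   /* write permission */
#define PMP_X     0x04u   /* execute permission */
#define PMP_NAPOT 0x18u   /* A field (bits 4:3) set to 3, meaning NAPOT */

/* Hypothetical helper: pmpaddr value for a NAPOT region.
 * Assumes size is a power of two that is at least 8,
 * and base is a multiple of size. */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
    return (base >> 2) | ((size >> 3) - 1);
}
```

For the region [0x80400000, 0x80800000), the size is 0x400000 (4MB), so the address value is 0x20100000 | 0x7FFFF = 0x2017FFFF, and a read/write/execute NAPOT configuration byte is 0x1F.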
However, PMP won't take effect if we still run everything in the machine mode, so we need to switch privilege modes when switching the CPU context from the kernel back to a user application.
Switch privilege modes
In short, you need to update mstatus.MPP in proc_yield:
void proc_yield() {
    ...
    /* Student's code goes here (PMP, page table translation, and multi-core). */
    /* Modify mstatus.MPP to enter machine or user mode during mret
     * depending on whether curr_pid is a grass server or a user app
     */
    /* Student's code ends here. */
    ...
}
Recall that mstatus.MPP stands for bit#11 and bit#12 of the mstatus CSR.
You will need to set these bits as 0b11 if the next process scheduled is a kernel process (i.e., pid<GPID_USER_START) and set them as 0b00 for all the other processes. In RISC-V, 0b00 stands for user mode and 0b11 stands for machine mode. To see why it works, let us review what happens when entering and exiting the kernel.
Upon an interrupt or exception, the CPU enters the kernel and it automatically switches the privilege mode to machine mode right before jumping to the handler function trap_entry. This allows the kernel to run in the machine mode and thus access the memory freely.
Upon executing mret in grass/kernel.s, the CPU exits the kernel and mret will switch the privilege mode according to mstatus.MPP. Therefore, if we set mstatus.MPP to 0b00 in proc_yield, the application code will run in the user mode after executing mret.
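The bit manipulation on mstatus.MPP can be practiced on a plain integer before applying it to the real CSR. This is a minimal sketch, assuming a 32-bit mstatus value; the helper name mstatus_with_mpp and the macros are hypothetical, and reading and writing the actual CSR in proc_yield is left to you.

```c
#include <stdint.h>

#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (0x3u << MSTATUS_MPP_SHIFT)   /* bit#11 and bit#12 */
#define PRIV_USER         0x0u                          /* 0b00 */
#define PRIV_MACHINE      0x3u                          /* 0b11 */

/* Hypothetical helper: return mstatus with the MPP field replaced by mode,
 * leaving every other bit unchanged. */
uint32_t mstatus_with_mpp(uint32_t mstatus, uint32_t mode) {
    return (mstatus & ~MSTATUS_MPP_MASK) | ((mode & 0x3u) << MSTATUS_MPP_SHIFT);
}
```

Clearing the field first and then setting it matters: a plain OR can turn 0b00 into 0b11 but can never turn 0b11 back into 0b00.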
Kill malicious applications
To test if you correctly set the PMP region and switch privilege modes, we have provided 2 malicious applications crash1 and crash2 under apps/user. The malicious applications would halt the whole operating system by corrupting the memory.
# Make sure to choose software TLB
> make qemu
...
[CRITICAL] Choose a memory translation mechanism:
Enter 0: page tables
Enter 1: software TLB
[INFO] Software translation is chosen
...
[CRITICAL] Welcome to the egos-2000 shell!
➜ /home/yunhao crash1
_sbrk: heap grows too large
[FATAL] excp_entry: kernel got exception 7
Note that the FATAL happens at the end of function excp_entry. The final coding task in this project is to implement the following part of excp_entry.
static void excp_entry(uint id) {
    ...
    /* Student's code goes here (system call and memory exception). */
    /* Kill the process if curr_pid is a user application */
    /* Student's code ends here. */
    FATAL("excp_entry: kernel got exception %d", id);
}
After excp_entry kills the malicious applications gracefully, you should see the following.
# Make sure to choose software TLB
> make qemu
...
> /home/yunhao crash1
_sbrk: heap grows too large
[INFO] process 6 terminated with exception 7
> /home/yunhao crash2
[INFO] process 7 terminated with exception 7
> /home/yunhao
In other words, memory protection should work: the malicious applications run in the user mode and trigger memory exceptions when trying to corrupt the memory, and the kernel kills them when handling these exceptions.
Accomplishments
In terms of OS concepts, you have learned about exception handling, system calls, privilege modes and inter-process communication. In terms of code, you have read everything under grass and library/syscall. The grass directory is the "kernel" part of egos-2000, i.e., the core logic of an operating system.
You will read earth/cpu_mmu.c in P4. You will read earth/dev_tty.c and earth/dev_disk.c in P5. You will read everything under library/file in P6. Then you will have finished reading essentially all the code of egos-2000. We are halfway there!