Preemptive Scheduler ​
This project helps you understand the scheduler of egos-2000. Most of the egos-2000 scheduler logic is in the proc_yield
function from grass/kernel.c
which, similar to the thread_yield
function in P1, chooses the next thread to be scheduled when the current thread yields. However, egos-2000 has a preemptive scheduler, meaning that the current thread can be forced to yield even if it never invokes the yield function directly. This is made possible by timer interrupts, so we start by introducing what a timer is.
The screenshot above is a timer in macOS. You can set the timer with an initial value (e.g., 15 seconds) and then start the timer. After this time period elapses, the timer will raise an interrupt (e.g., play a ringtone). This project helps you understand how to do something similar with a RISC-V CPU and, instead of playing ringtones, the CPU will set the program counter to the code of a special function when the period elapses. This special function is called an interrupt handler, which will further invoke proc_yield
, switching the context of the CPU to a different thread in egos-2000.
TIP
We do not distinguish thread and process for now, so you can simply view the struct process
and proc_yield
from egos-2000 as the struct thread
and thread_yield
from P1. We will start to distinguish the two when introducing virtual memory in P4.
Timer interrupt ​
We start by explaining the following demo program which prints Got a timer interrupt.
repeatedly every QUANTUM
time units (e.g., microseconds).
#define QUANTUM 50000

/* mtime_get and mtimecmp_set come from earth/cpu_intr.c. */
void handler();

int main() {
asm("csrw mtvec, %0" ::"r"(handler));
mtimecmp_set(mtime_get() + QUANTUM);
int mstatus_val, mie_val;
asm("csrr %0, mstatus" : "=r"(mstatus_val));
asm("csrr %0, mie" : "=r"(mie_val));
asm("csrw mstatus, %0" ::"r"(mstatus_val | 0x8));
asm("csrw mie, %0" ::"r"(mie_val | 0x80));
/* When receiving a timer interrupt, the CPU behaves as if a
call to handler() is inserted into the while loop below. */
while(1);
}
void handler() {
printf("Got a timer interrupt.");
mtimecmp_set(mtime_get() + QUANTUM);
/* When invoking mret, the CPU behaves as if handler()
is returning to the while(1) in main(), i.e., the
program counter when receiving the timer interrupt. */
asm("mret");
}
Register the handler ​
The first line of the main function registers function handler
as the interrupt handler. The C language syntax might look a bit scary, but its job is actually very simple.
First, mtvec
is a Control and Status Register (CSR), meaning that it has special purposes unlike the general-purpose registers. The special purpose for mtvec
is to hold the starting address of the interrupt handler. In other words, the CPU would set the program counter to the value held by mtvec
when receiving an interrupt. You can learn more about mtvec
in chapter 3.1.12 of the RISC-V reference manual.
Second, csrw
and csrr
are two CPU instructions for writing and reading a CSR. In this case, the value written to mtvec
is handler
, a function pointer holding the address of the first instruction in function handler
. Suppose the first instruction of function handler
is at address 0x8000002C
; then we could directly write asm("csrw mtvec, %0" ::"r"(0x8000002C))
instead. By using handler
, we ask the compiler to figure out this value for us. Specifically, this line of C code would likely be compiled to the following assembly:
lui a5,0x80000 # a5 is now 0x8000_0000
addi a5,a5,44 # a5 is now 0x8000_002C
csrw mtvec,a5 # write the value of a5 to mtvec
The compiler can also choose registers and instructions other than a5
, lui
and addi
as long as the address of handler
's first instruction is written to mtvec
.
Set a time period ​
The second line of the main function sets a time period of QUANTUM
just like we set a 15-second period in the screenshot. You can find function mtimecmp_set
and mtime_get
in earth/cpu_intr.c
of egos-2000. They are copy-pasted from the CPU reference manual, and their job is to read or write 8-byte values at special memory addresses.
Specifically, mtime
and mtimecmp
represent two special memory addresses. mtime_get
reads 8 bytes from mtime
representing how many time units have elapsed since the CPU was powered on. Hence, this 8-byte value at mtime
is automatically incremented by the CPU as time proceeds, and the return value of mtime_get
depends on when this function is invoked. When the value at mtime
equals the value at mtimecmp
, a timer interrupt will be raised, which is why this line of code sets a period of QUANTUM
for the timer.
TIP
You may regard the timer as a simple device controlled by mtime
and mtimecmp
. In addition to mtime
and mtimecmp
, there exist other special memory regions that are used to control various devices connected to the CPU. We will learn about another such region in P5 in which we will use memory-mapped I/O to control an SD card device.
Enable timer interrupt ​
The main function then reads two CSRs, mstatus
and mie
, into variables mstatus_val
and mie_val
using the csrr
instruction. It then writes back to the two CSRs, setting bit#3 of mstatus
and bit#7 of mie
to 1 using the bitwise OR operation because 0x8
is 1<<3
and 0x80
is 1<<7
. According to Figures 3.6 and 3.12 of the RISC-V reference manual, bit#3 of mstatus
is called mstatus.MIE
where MIE
stands for Machine Interrupt Enable, and bit#7 of mie
is mie.MTIE
where MTIE
stands for Machine Timer Interrupt Enable.
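If you prefer named constants over magic numbers, the two bits could be written as below. This is a readability sketch: the macro names are ours, and egos-2000 itself uses the raw hex values.

```c
/* Bit positions from the RISC-V privileged specification. */
#define MSTATUS_MIE (1 << 3)  /* mstatus.MIE: machine interrupt enable */
#define MIE_MTIE    (1 << 7)  /* mie.MTIE: machine timer interrupt enable */
```

The demo's two csrw lines would then write `mstatus_val | MSTATUS_MIE` and `mie_val | MIE_MTIE`.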
For the purpose of P2, it is enough to know that setting the mstatus.MIE
and mie.MTIE
bits will enable timer interrupts on the CPU, while setting either of them to 0 disables timer interrupts. We recommend scanning through chapters 3.1.6 and 3.1.14 of the RISC-V reference manual for mstatus
and mie
, but don't feel obliged to understand everything. We will touch the other bits of these CSRs in the next few projects, especially mstatus
.
After enabling timer interrupts, the main function enters an infinite loop. When receiving a timer interrupt, the CPU would behave as if a function call to handler
is inserted into this loop, and the handler
function would reset the timer with QUANTUM
again. Therefore, we should expect to see Got a timer interrupt.
on the screen periodically and infinitely.
Use of mepc
and mret
​
When handling a timer interrupt, the CPU would assign the value of its program counter to the mepc
CSR right before jumping to handler
, so mepc
should point to an instruction for while(1);
when handler
starts to run. When handler
invokes mret
, the CPU assigns the value of mepc
back to the program counter, so the CPU would resume the while loop.
The use of mepc
and mret
has two implications:
- Normal function calls use the ra register and the ret instruction for similar purposes. By using mepc and mret instead, the CPU leaves the value of register ra unchanged when it invokes the handler function.
- Right before the CPU jumps to handler, interrupts are automatically disabled. When handler invokes mret, interrupts are automatically re-enabled. In other words, we typically want to handle one interrupt without being disrupted by another interrupt.
Below is a screenshot of chapter 8.2.1 of this CPU document, which describes more details.
TIP
Take the time to read this screenshot of the CPU document. It is important for understanding how the OS works, and we will revisit this screenshot in P3, especially the two bullets about privilege mode. Also, think about why the first implication above (the one about not changing ra) is important.
Get started ​
You will start by reading earth/cpu_intr.c
and grass/kernel.s
from the main
branch of egos-2000. By default, function trap_entry
is set as the interrupt handler, and here is the high-level control flow during a timer interrupt:
- The CPU sets its program counter to trap_entry.
- trap_entry switches the stack pointer to the kernel stack (i.e., 0x80400000).
- It saves the context of the current process, just like SAVE_ALL_REGISTERS in P1.
- It calls function kernel_entry, which eventually calls proc_yield.
Spend some time reading the control flow from trap_entry
to proc_yield
. For the rest of P2, we will focus on proc_yield
since it is the key function that you need to modify.
Process life cycle ​
The life cycle of a process in egos-2000 starts with (1) being allocated by proc_alloc
in grass/process.c
, then (2) being scheduled to run multiple times by proc_yield
in grass/kernel.c
, and finally (3) being freed by proc_free
in grass/process.c
.
The life cycle of a process is reflected by proc_status
defined in grass/process.h
.
enum proc_status {
PROC_UNUSED,
PROC_LOADING, /* wait for loading code&data into process memory */
PROC_READY, /* finish loading code&data and wait for first running */
PROC_RUNNING,
PROC_RUNNABLE,
PROC_PENDING_SYSCALL /* wait for certain system calls to finish */
};
From the scheduler's perspective, proc_status
can be classified as:
- Only the currently running process has status PROC_RUNNING.
- Processes with status PROC_READY or PROC_RUNNABLE can be scheduled later.
- When a process terminates, its status is set to PROC_UNUSED.
- Processes with status PROC_LOADING or PROC_PENDING_SYSCALL cannot be scheduled.
Process control block ​
Similar to the TCB in P1, a process control block (PCB) is a data structure maintaining the metadata of each process. Again, there is no need to distinguish between threads and processes at this point. In egos-2000, the PCBs are essentially an array of struct process
.
struct process {
int pid;
struct syscall syscall;
enum proc_status status;
uint mepc, saved_register[SAVED_REGISTER_NUM];
/* Student's code goes here (preemptive scheduler)
* Create data structures that hold scheduling information. */
/* Student's code ends here. */
};
In addition to an array of struct process
, curr_proc_idx
defined in grass/kernel.c
is the index in the PCB array for the currently running process. We further provide curr_pid
and curr_status
for accessing the process ID and status of the current process.
In P2, you need to add new fields in struct process
in order to collect scheduling metrics and save necessary information for a better scheduler, which we will explain next.
Collect scheduling metrics ​
You will measure and calculate four scheduling metrics for each process and print them out when a process terminates.
- turnaround time: the time between process creation and exit.
- response time: the time between process creation and the first time being scheduled.
- CPU time: the accumulated time that the process is actually running on the CPU.
- the number of schedules: an integer indicating how many times the process's status is set to
PROC_RUNNING
by the scheduler before the process exits.
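As a sketch, per-process fields for these metrics might look like the following. All names here are hypothetical (the assignment leaves the exact fields to you), and the raw measurements come from mtime_get:

```c
/* Hypothetical scheduling metadata to embed in struct process. */
struct sched_info {
    unsigned long long alloc_time;  /* mtime when proc_alloc ran */
    unsigned long long first_sched; /* mtime when first set PROC_RUNNING */
    unsigned long long last_sched;  /* mtime at the most recent schedule */
    unsigned long long cpu_time;    /* accumulated running time */
    unsigned int nschedules;        /* times set to PROC_RUNNING */
};

/* Computed at proc_free time, in raw mtime units: */
unsigned long long turnaround(const struct sched_info *s,
                              unsigned long long now) {
    return now - s->alloc_time;
}
unsigned long long response(const struct sched_info *s) {
    return s->first_sched - s->alloc_time;
}
```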
For the first three, you can measure time by calling mtime_get
twice. The difference of the two return values is the time elapsed. Recall that we have introduced mtime_get
previously for setting timer interrupts. Below are some useful hints.
- Measure time in the proc_alloc and proc_free functions from grass/process.c, since these mark when a process is created and when it terminates. Add new fields in struct process for recording the measurements, and make sure to use the correct types for these fields.
- Measure time or increment a counter at the right places in proc_yield in order to keep track of the four scheduling metrics. Print out the metrics in proc_free.
- When finished coding, you should see something like:
➜ /home/yunhao ls
./ ../ README
[INFO] proc 5 died after 2 yields, turnaround time: 0.29, response time: 0.23, cputime: 0.03
➜ /home/yunhao echo Hello, World!
Hello, World!
[INFO] proc 6 died after 1 yields, turnaround time: 0.14, response time: 0.13, cputime: 0.01
- Print out the raw measurement divided by QUANTUM: the turnaround time: 0.29 above means a duration of 0.29*QUANTUM. Since egos-2000 does not provide formatted output for floating-point numbers, you can calculate two integers separately and use "%d.%d" in printf for real numbers like 0.29.
- The numbers for turnaround/response/CPU time could vary on different machines and environments. You may also see a different number of yields, but try to explain what you have printed out and convince yourself that those numbers are reasonable.
Multilevel feedback queue ​
The current scheduler is inefficient for latency-sensitive applications. To see this, create file apps/user/loop.c
with the following code.
#include "app.h"
int main() {
INFO("Start to run an infinite loop.");
while(1);
}
Then do make qemu
and run loop
in the background using &
just like in any other shell.
> make qemu
...
[CRITICAL] Welcome to the egos-2000 shell!
➜ /home/yunhao loop &
[INFO] process 5 running in the background
[INFO] Start to run an infinite loop.
➜ /home/yunhao ls
# You will wait for a while to see the result of ls
To see why ls
becomes much slower, print out the process schedule and you will see the following processes being scheduled before ls
starts to run:
- GPID_TERMINAL reads your input string ls and sends it to GPID_SHELL;
- GPID_SHELL parses the command and sends it to GPID_PROCESS;
- GPID_PROCESS exchanges a few messages with GPID_FILE and finds the executable file for ls;
- GPID_PROCESS then allocates a new process, loads the executable file for ls into the process memory, and marks this new process as PROC_READY;
- At some point, proc_yield schedules this new process as the next one to run.
According to library/syscall/servers.h
, the pids of the processes mentioned above are 1, 2, 3, and 4. They are called system processes (or system servers) because they are launched by the OS and provide essential OS services. Other processes such as loop
and ls
are instead user processes.
TIP
You could take a quick detour at this point and read the code of the four system servers in apps/system
. In particular, start with sys_process.c
which is the first process spawned in egos-2000. This first process will spawn GPID_TERMINAL
, GPID_FILE
and GPID_SHELL
one-by-one.
The reason that ls
becomes much slower above is that the loop
process is scheduled many times amid the scheduling of system processes. When loop is scheduled, it keeps running for a full QUANTUM of time until the next timer interrupt. For a better user experience, the scheduler could prioritize processes that are involved in human interaction, and one way to achieve this is called a Multilevel Feedback Queue (MLFQ). In this project, you will implement a simple MLFQ scheduler:
- There are three queues representing high/mid/low priorities.
- The scheduler always runs processes from the highest-priority non-empty queue; within each queue, processes run in a round-robin fashion. When a new process is created, it is placed in the high-priority queue.
- Once a process has used up a QUANTUM of CPU time, the scheduler downgrades it to a lower-priority queue. Therefore, the high-priority queue holds processes with CPU time in [0, QUANTUM); the mid-priority queue holds processes with CPU time in [QUANTUM, QUANTUM*2); and the low-priority queue holds all the other processes.
Given this simple intuition, below are some implementation details.
- All three queues hold processes with status PROC_RUNNABLE or PROC_READY. Processes with status PROC_PENDING_SYSCALL should not be in a queue.
- The current proc_yield iterates through proc_set and invokes proc_try_syscall for processes that are waiting for system calls to complete. You need to keep this logic and, if a process becomes PROC_RUNNABLE after proc_try_syscall, put it back into one of the three queues. In general, be careful with process status when updating the queues.
- proc_set holds all the struct process while the queues only hold pointers to proc_set elements. For example, when enqueuing the currently running process to one of the queues, you should enqueue proc_set+curr_proc_idx.
- Modify proc_yield to find the next process to run based on the MLFQ algorithm. You also need to maintain the MLFQ data structures in proc_alloc and proc_free.
- Keep the system processes (i.e., pid<GPID_USER_START) in the high-priority queue.
- If all the queues are empty, keep running the current process if it is runnable. Otherwise, do FATAL("No more process can be scheduled.");. This is unlikely to happen, though.
The code below sketches how you may implement a simple MLFQ data structure.
/* Implement a queue as a ring buffer with buffer[head_idx] as the first
element in the queue (if size!=0) */
struct queue {
uint size, head_idx, tail_idx;
struct process * buffer[MAX_NPROCESS];
};
/* For enqueue, do q->buffer[q->tail_idx] = p and then set q->tail_idx to
   (q->tail_idx+1)%MAX_NPROCESS, so that a zero-initialized queue starts
   out empty and consistent (head_idx == tail_idx == 0). Update
   q->head_idx similarly for dequeue, and update q->size accordingly. */
void enqueue( struct queue * q, struct process * p );
struct process * dequeue( struct queue * q );
/* Define MLFQ as a global variable, so it is initialized as all 0s */
struct queue MLFQ[3];
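One possible implementation of this skeleton, plus a picker that scans the queues from high to low priority. Here mlfq_pick is our name, not part of egos-2000, and struct process is a minimal stand-in for the real one. Writing at tail_idx before advancing it keeps the zero-initialized empty queue consistent:

```c
#include <assert.h>
#define MAX_NPROCESS 16        /* assumption; match egos-2000's value */

struct process { int pid; };   /* stand-in for the real struct */

struct queue {
    unsigned int size, head_idx, tail_idx;
    struct process *buffer[MAX_NPROCESS];
};

/* Append p at the tail of the ring buffer. */
void enqueue(struct queue *q, struct process *p) {
    assert(q->size < MAX_NPROCESS);
    q->buffer[q->tail_idx] = p;
    q->tail_idx = (q->tail_idx + 1) % MAX_NPROCESS;
    q->size++;
}

/* Remove and return the first element of the ring buffer. */
struct process *dequeue(struct queue *q) {
    assert(q->size > 0);
    struct process *p = q->buffer[q->head_idx];
    q->head_idx = (q->head_idx + 1) % MAX_NPROCESS;
    q->size--;
    return p;
}

/* Pick the next process: scan from the high-priority queue down. */
struct process *mlfq_pick(struct queue mlfq[3]) {
    for (int level = 0; level < 3; level++)
        if (mlfq[level].size > 0) return dequeue(&mlfq[level]);
    return 0; /* all queues empty: caller keeps the current process */
}
```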
When finished, you should see ls
running much faster:
➜ /home/yunhao loop &
[INFO] process 5 running in the background
[INFO] Start to run an infinite loop.
➜ /home/yunhao ls
# You should see the result of ls immediately
Accomplishments ​
You have started to learn about control and status registers, which are the key CPU support for operating systems. You will touch on more CSRs in later projects. You have also gained a precise understanding of timer interrupts and, combined with what you learned in P1, you now understand how preemptive scheduling works.