Preemptive Scheduler

This project helps you understand the scheduler of egos-2000. Most of the egos-2000 scheduler logic is in the proc_yield function from grass/kernel.c which, similar to the thread_yield function in P1, chooses the next thread to be scheduled when the current thread yields. However, egos-2000 has a preemptive scheduler, meaning that the current thread is forced to yield even if it never invokes the yield function directly. This is made possible by timer interrupts, so we start by introducing what a timer is.

[Screenshot: a timer app in macOS]

The screenshot above is a timer in macOS. You can set the timer with an initial value (e.g., 15 seconds) and then start it. After this time period elapses, the timer raises an "interrupt" (e.g., plays a ringtone). This project helps you understand how to do something similar with a RISC-V CPU: instead of playing ringtones, the CPU sets the program counter to the code of a special function when the period elapses. This special function is called an interrupt handler, and it will further invoke proc_yield, switching the context of the CPU to a different thread in egos-2000.

TIP

We do not distinguish thread and process for now, so you can simply regard the struct process and proc_yield from egos-2000 as the struct thread and thread_yield from P1. We start to distinguish the two when introducing virtual memory in P4.

Timer interrupt

We start by explaining the following demo program, which prints Got a timer interrupt. every QUANTUM time units (e.g., microseconds).

c
#define QUANTUM 1000000

void handler(); /* Declare handler before taking its address in main. */

int main() {
    /* Register the handler. */
    asm("csrw mtvec, %0" ::"r"(handler));

    /* Set a timer period. */
    mtimecmp_set(mtime_get() + QUANTUM);

    /* Enable timer interrupt. */
    asm("csrs mie, %0" ::"r"(0x80));
    asm("csrs mstatus, %0" ::"r"(0x8));

    /* When receiving a timer interrupt, the CPU behaves as if a
       call to handler() is inserted into the while loop below. */
    while(1);
}

void handler() {
    printf("Got a timer interrupt.");
    mtimecmp_set(mtime_get() + QUANTUM);
    /* When invoking mret, the CPU behaves as if handler()
       is returning to the while loop in main() above. */
    asm("mret");
}

Register the handler

The first line of the main function registers function handler as the interrupt handler. The C language syntax might look a bit scary, but its job is actually very simple.

First, mtvec is a Control and Status Register (CSR), meaning that it has special purposes unlike the general-purpose registers. The special purpose of mtvec is to hold the starting address of the interrupt handler. In other words, the CPU would set the program counter to the value held by mtvec when receiving an interrupt. You can learn more about mtvec in chapter 3.1.7 of the RISC-V reference manual.

Second, csrw is the CPU instruction for writing a CSR and, in this case, the value written to the mtvec CSR is the address of the first instruction in function handler. Say this address is 0x8000002C; we could directly write asm("csrw mtvec, %0" ::"r"(0x8000002C)) as well. By using the function pointer handler, we let the compiler figure out this address for us. This line of C code would likely be compiled to the following assembly:

asm
lui     a5,0x80000  # a5 is now 0x8000_0000
addi    a5,a5,44    # a5 is now 0x8000_002C
csrw    mtvec,a5    # write the value of a5 to mtvec

The compiler can also choose registers and instructions other than a5, lui and addi as long as the address of handler's first instruction is written to mtvec.

Set a timer period

The second line of the main function sets a timer period of QUANTUM just like we set a 15-second period in the screenshot. You can find function mtimecmp_set and mtime_get in earth/cpu_intr.c of egos-2000. They are copy-pasted from the CPU reference manual and their job is to read or write 8-byte values at special memory addresses.

Specifically, mtime and mtimecmp represent two special memory addresses. mtime_get reads 8 bytes from mtime representing how many time units have elapsed since the CPU was powered on. Hence, this 8-byte value at mtime is automatically incremented by the CPU as time proceeds, and the return value of mtime_get depends on when this function is invoked. When the value at mtime equals the value at mtimecmp, a timer interrupt will be raised, which explains why this line of code sets a timer period of QUANTUM.
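
To build intuition, here is a minimal sketch of what the two functions do. The CLINT addresses below (0x200BFF8 for mtime, 0x2004000 for mtimecmp) are an assumption matching QEMU's CLINT layout, and on a 32-bit CPU real code must read and write the two 4-byte halves carefully; the authoritative version is in earth/cpu_intr.c.

c
/* A sketch, assuming QEMU's CLINT addresses; see earth/cpu_intr.c
   for the authoritative egos-2000 version. */
#define CLINT_MTIME    0x200BFF8UL
#define CLINT_MTIMECMP 0x2004000UL

unsigned long long mtime_get() {
    /* Read the 8-byte counter that the CPU keeps incrementing. */
    return *(volatile unsigned long long*)CLINT_MTIME;
}

void mtimecmp_set(unsigned long long time) {
    /* A timer interrupt is raised once mtime reaches this value. */
    *(volatile unsigned long long*)CLINT_MTIMECMP = time;
}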

TIP

You may regard the timer as a simple device controlled by mtime and mtimecmp. In addition to mtime and mtimecmp, there exist other special memory regions that are used to control various devices connected to the CPU. You will learn about another such region in P5 in which we will use memory-mapped I/O to control an SD card.

Enable timer interrupt

The main function then sets bit#3 of mstatus and bit#7 of mie to 1 (i.e., 0x8 is 1<<3, 0x80 is 1<<7) with the CSR set bit instruction csrs. According to Figure 7 and Figure 16 from the RISC-V reference manual, bit#3 of mstatus is called mstatus.MIE where MIE stands for Machine Interrupt Enable, and bit#7 of mie is called mie.MTIE where MTIE stands for Machine Timer Interrupt Enable.

[Figures: bit layouts of the mstatus and mie CSRs, from the RISC-V reference manual]

For the purpose of P2, it is enough to know that setting both the mstatus.MIE and mie.MTIE bits enables timer interrupts on the CPU, while setting either of them to 0 disables timer interrupts. We recommend scanning through chapters 3.1.6 and 3.1.9 of the RISC-V reference manual for mstatus and mie, but don't feel obliged to understand everything. We will explain the other bits of these CSRs in the next few projects.

After enabling timer interrupts, the main function enters an infinite loop. When receiving a timer interrupt, the CPU behaves as if a function call to handler were inserted into this loop, and the handler function resets the timer with QUANTUM again. Therefore, we expect to see Got a timer interrupt. printed periodically, forever.

Usage of mepc and mret

Right before handling a timer interrupt, the CPU assigns the value of its program counter to a CSR called mepc, so mepc should point to an instruction of while(1); when handler starts to run. Conversely, when handler invokes mret, the CPU assigns the value of mepc back to the program counter, so the CPU resumes the while loop. Moreover, right before the CPU jumps to handler, interrupts are automatically disabled; when handler invokes mret, interrupts are automatically re-enabled. In other words, we typically want to handle one interrupt without being disrupted by another interrupt.
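
To see this in action, the demo's handler could read mepc with the csrr instruction before returning; the printed address should fall within the while loop of main. A sketch based on the demo program above:

c
void handler() {
    unsigned int mepc;
    /* csrr reads a CSR into a general-purpose register. */
    asm("csrr %0, mepc" : "=r"(mepc));
    printf("Got a timer interrupt at pc=0x%x.", mepc);
    mtimecmp_set(mtime_get() + QUANTUM);
    asm("mret");
}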

Below is a screenshot of chapter 8.2.1 in this CPU document which describes the details.

[Screenshot: chapter 8.2.1 of the CPU reference manual, describing trap entry and the mret instruction]

TIP

Take your time to read the screenshot above. It is a very important part of how operating systems work. We will revisit this screenshot in P3, especially the bullets about privilege mode.

Process lifecycle and priority

The goal of this project is to help you understand preemptive scheduling, the application of timer interrupts to process scheduling. As mentioned at the beginning, we do not distinguish thread and process for now and will use the two terms interchangeably.

The timer interrupt code of egos-2000 is in earth/cpu_intr.c, which is essentially the same as the timer demo program above except that it registers trap_entry as the handler function. The trap_entry function, defined in grass/kernel.s, is the starting point of process scheduling.

Save and restore registers

At first glance, most of the code in trap_entry looks similar to the SAVE_ALL_REGISTERS and RESTORE_ALL_REGISTERS in P1. Indeed, the high-level goal is the same: save the value of all general-purpose registers on the stack when a thread yields, and restore the value of such registers when switching back to this thread.

asm
trap_entry:
    csrw mscratch, sp
    li sp, 0x80400000
    addi sp, sp, -128
    SAVE_ALL_REGISTERS    /* Save registers for the current thread. */
    call kernel_entry     /* Pick the next thread. */
    RESTORE_ALL_REGISTERS /* Restore registers for the next thread. */
    mret

There are two differences. First, egos-2000 saves the registers on the operating system's stack, which starts at 0x80400000. Specifically, the 128-byte region below 0x80400000 is used to save the registers, as indicated by the li and addi instructions above. Since these instructions modify register sp, the old value of sp is first written to a CSR called mscratch and later read back in the SAVE_ALL_REGISTERS part, right before being saved on the stack.
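
The sketch below illustrates this dance with two of the stores spelled out. The offsets are hypothetical, and the real SAVE_ALL_REGISTERS macro in grass/kernel.s is the authoritative version.

asm
csrw  mscratch, sp    # park the old sp before modifying it
li    sp, 0x80400000  # switch to the operating system's stack
addi  sp, sp, -128    # reserve the 128-byte save area
sw    t0, 16(sp)      # save t0 first because it is about to be clobbered
csrr  t0, mscratch    # read the old sp back into t0
sw    t0, 4(sp)       # save the old sp (offsets here are hypothetical)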

The second difference is that trap_entry ends with mret instead of ret. In P1, threads explicitly call thread_yield, which further calls ctx_switch, so the value of register ra saved on the stack by ctx_switch is an address within thread_yield. With this value of ra, ctx_switch can return to thread_yield simply by executing ret. In P2, threads are preempted by timer interrupts instead of calling any function voluntarily, so mepc and mret are used to save and restore the program counter.

The kernel_entry invoked above is defined in grass/kernel.c and does three things:

  • Copy mepc and the saved registers of the current thread into its process control block, a data structure similar to the thread control block in P1.

  • Pick the next thread by calling proc_yield, a function similar to thread_yield in P1.

  • Restore mepc and the saved registers for the next thread from its process control block.
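
Putting the three steps together, kernel_entry has roughly the following shape. This is only a sketch: the curr_proc pointer and the exact save-area layout are assumptions, and grass/kernel.c is the authoritative version.

c
/* A sketch of kernel_entry; curr_proc is an assumed pointer to the
   process control block of the current thread. */
void kernel_entry() {
    uint* save_area = (uint*)(0x80400000 - 128);

    /* 1. Copy mepc and the saved registers into the current PCB. */
    asm("csrr %0, mepc" : "=r"(curr_proc->mepc));
    memcpy(curr_proc->saved_registers, save_area,
           SAVED_REGISTER_NUM * sizeof(uint));

    /* 2. Pick the next thread; proc_yield may change curr_proc. */
    proc_yield();

    /* 3. Restore mepc and the saved registers for the next thread. */
    asm("csrw mepc, %0" ::"r"(curr_proc->mepc));
    memcpy(save_area, curr_proc->saved_registers,
           SAVED_REGISTER_NUM * sizeof(uint));
}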

Process control block

In addition to the mepc and saved_registers shown below, the process control block of egos-2000 is defined in grass/process.h and contains a few other fields:

  • a process identifier pid
  • a data structure for system calls which will be explained in P3
  • the status of a process which is closely related to the lifecycle of a process
c
struct process {
    int pid;
    struct syscall syscall;
    enum proc_status status;
    uint mepc, saved_registers[SAVED_REGISTER_NUM];
};

The lifecycle of a process is illustrated by the enum proc_status in grass/process.h.

c
enum proc_status {
    PROC_UNUSED,
    PROC_LOADING,
    PROC_READY,
    PROC_RUNNING,
    PROC_RUNNABLE,
    PROC_PENDING_SYSCALL  /* We will explain this one in P3. */
};

Specifically, the status field of a struct process goes through the following stages:

  1. When creating a new process, the status is set from PROC_UNUSED to PROC_LOADING. This happens in the proc_alloc function in grass/process.c.
  2. The elf_load function in library/elf/elf.c loads the code and data for the process from disk into memory, and then the status is set from PROC_LOADING to PROC_READY. We will explain the details of elf_load in P4 and you can ignore it for now.
  3. After loading the code and data, the proc_yield in grass/kernel.c will schedule this new process and set the status from PROC_READY to PROC_RUNNING.
  4. Upon a timer interrupt, the control flow enters proc_yield again and, if it picks another process to run next, the status of the current process is set from PROC_RUNNING to PROC_RUNNABLE. The status then switches back and forth between PROC_RUNNABLE and PROC_RUNNING because proc_yield executes upon every timer interrupt.
  5. When a process terminates, the proc_free function in grass/process.c will be called and proc_free sets the status back to PROC_UNUSED.

These bullets help you understand the lifecycle of a process. Your first task in this project is to collect some statistics about the lifecycle of every process.

Collect lifecycle statistics

Add new fields in struct process and update these fields at the right places, so you can collect the following statistics for every process.

  • turnaround time: the time between process creation and termination
  • response time: the time between process creation and the first time scheduled
  • CPU time: the accumulated time that the process is actually running on the CPU
  • number of timer interrupts encountered: how many times the CPU control flow transfers from the process code to trap_entry due to a timer interrupt

Process creation, termination, and first schedule correspond to bullets 1, 5, and 3 in the explanation of process lifecycles above. To measure time, call the mtime_get function twice; the difference between the two return values is the time elapsed in microseconds (on QEMU). Note that mtime_get is the same function previously used by the timer demo program.
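
For concreteness, the added fields could look like the sketch below; the field names are our own choice, not prescribed by egos-2000. Turnaround time is then mtime_get() - created_time when proc_free runs, and response time is first_sched_time - created_time.

c
/* Candidate fields for struct process (names are hypothetical).
   All values come from mtime_get() and are microseconds on QEMU. */
ulonglong created_time;      /* set in proc_alloc (bullet 1)           */
ulonglong first_sched_time;  /* set at the first PROC_READY -> RUNNING */
ulonglong last_sched_time;   /* set on every switch to PROC_RUNNING    */
ulonglong cpu_time;          /* accumulated when the process stops     */
uint      nr_timer_intr;     /* incremented per timer interrupt        */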

Print the statistics when a process terminates. Note that ms below means milliseconds (1000 microseconds), and you can assume that none of the printed numbers overflow the int type (i.e., you can use %d within printf or INFO).

shell
 /home/yunhao echo Hello, World!
Hello, World!
[INFO] process 6 terminated after 0 timer interrupts, turnaround time: 174ms, response time: 170ms, CPU time: 1ms
 /home/yunhao ls
./     ../     README
[INFO] process 7 terminated after 0 timer interrupts, turnaround time: 233ms, response time: 215ms, CPU time: 1ms
 /home/yunhao loop 500 silent
[INFO] process 8 terminated after 12 timer interrupts, turnaround time: 1515ms, response time: 233ms, CPU time: 1280ms
 /home/yunhao loop 500
loop #0
loop #1
...
loop #499
[INFO] process 9 terminated after 0 timer interrupts, turnaround time: 2269ms, response time: 240ms, CPU time: 1670ms

TIP

Try to explain the different statistics between loop 500 and loop 500 silent. For example, why does loop 500 encounter no timer interrupts while loop 500 silent encounters 12, even though their CPU times are close? Note that QEMU can run faster or slower on different machines, so your time statistics may differ from the above. The statistics above were printed by running egos-2000 on a MacBook Air (M2, 2022) with QEMU v7.2.5 released by xPack.

Multilevel feedback queue

Your final task in this project is to implement the notion of process priority in the scheduler, which is also known as the multilevel feedback queue (MLFQ) scheduler. Specifically, you need to implement a variant of MLFQ with simple rules:

  1. There are 5 levels, level #0 to #4. All processes start at level #0 when created.
  2. Processes at level #i can run on the CPU for (i+1)*100 milliseconds. For example, if a process at level #0 has run on the CPU for more than 100ms, it will move to level #1. After running for another 200ms, it will move to level #2. Since there are 5 levels, all processes at level #4 will remain at this level without moving further.
  3. The scheduler in proc_yield should always choose a runnable process with the lowest level number (i.e., highest priority) as the next process.
  4. Every 10 seconds or so, the scheduler in proc_yield should move all processes to level #0. Also, if there is keyboard input for the shell, move GPID_SHELL to level #0.

Some related definitions have been provided in egos-2000:

c
#define MLFQ_NLEVELS          5
#define MLFQ_RESET_PERIOD     10000000         /* 10 seconds */
#define MLFQ_LEVEL_RUNTIME(x) (((x) + 1) * 100000) /* e.g., 100ms for level 0 */

void mlfq_update_level(struct process* p, ulonglong runtime) {
    /* Your code goes here. */
}

void mlfq_reset_level() {
    if (!earth->tty_input_empty()) {
       /* Your code goes here. */
    }
    static ulonglong MLFQ_last_reset_time = 0;
    /* Your code goes here. */
}

You can implement the MLFQ scheduler by implementing the two functions above and then calling them in proc_yield. Specifically, add new fields to struct process that record the level of a process and its remaining runtime at that level. Assuming that a process p has just run for another runtime microseconds, mlfq_update_level updates these fields according to rule #2 above, as sketched below. When calling this function, reuse your time measurement for CPU time.
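
A sketch of mlfq_update_level, assuming two hypothetical fields level and level_runtime_left (the latter initialized to MLFQ_LEVEL_RUNTIME(0) at process creation):

c
void mlfq_update_level(struct process* p, ulonglong runtime) {
    /* Move p down one level at a time, carrying leftover runtime. */
    while (p->level < MLFQ_NLEVELS - 1 && runtime >= p->level_runtime_left) {
        runtime -= p->level_runtime_left;
        p->level++;
        p->level_runtime_left = MLFQ_LEVEL_RUNTIME(p->level);
    }
    /* Processes at the last level stay there (rule #2). */
    if (p->level < MLFQ_NLEVELS - 1) p->level_runtime_left -= runtime;
}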

And the mlfq_reset_level function moves processes to level #0 according to rule #4 above, as sketched after this paragraph. Use earth->tty_input_empty() to check whether the shell has new keyboard input, and use the MLFQ_last_reset_time variable to check whether an MLFQ_RESET_PERIOD has elapsed since the last time all processes were moved to level #0.
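
A corresponding sketch of mlfq_reset_level; here proc_set and MAX_NPROCESS are assumed to mirror grass/kernel.c, GPID_SHELL identifies the shell, and level_runtime_left is the hypothetical field from the previous sketch:

c
void mlfq_reset_level() {
    if (!earth->tty_input_empty())
        /* Keyboard input: boost the shell back to level #0. */
        for (uint i = 0; i < MAX_NPROCESS; i++)
            if (proc_set[i].pid == GPID_SHELL) proc_set[i].level = 0;

    static ulonglong MLFQ_last_reset_time = 0;
    if (mtime_get() - MLFQ_last_reset_time >= MLFQ_RESET_PERIOD) {
        /* Periodic reset: move every process back to level #0. */
        for (uint i = 0; i < MAX_NPROCESS; i++) {
            proc_set[i].level = 0;
            proc_set[i].level_runtime_left = MLFQ_LEVEL_RUNTIME(0);
        }
        MLFQ_last_reset_time = mtime_get();
    }
}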

Lastly, you need to modify the for loop in proc_yield() according to rule #3 above; one possible shape is sketched below. Make sure to call proc_try_syscall() for processes with status PROC_PENDING_SYSCALL, just like the current code does. After you finish, test your code with the shell commands below.
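
In this sketch, curr_proc_idx is an assumed index of the current process in proc_set; the real loop in grass/kernel.c may be structured differently:

c
/* Scan all processes, starting after the current one so that
   processes at the same level still take turns (round-robin). */
int next = -1;
for (uint i = 1; i <= MAX_NPROCESS; i++) {
    struct process* p = &proc_set[(curr_proc_idx + i) % MAX_NPROCESS];
    if (p->status == PROC_PENDING_SYSCALL) proc_try_syscall(p);
    if ((p->status == PROC_READY || p->status == PROC_RUNNABLE) &&
        (next == -1 || p->level < proc_set[next].level))
        next = (curr_proc_idx + i) % MAX_NPROCESS;
}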

shell
> make qemu
...
[CRITICAL] Welcome to the egos-2000 shell!
 /home/yunhao loop &
[INFO] process 6 running in the background
 /home/yunhao loop &
[INFO] process 7 running in the background
 /home/yunhao ls
./     ../     README
 /home/yunhao killall
 /home/yunhao ls
./     ../     README

You can run the commands above both with the current scheduler and with your MLFQ scheduler. With the current scheduler, the first ls takes a long time because it is interleaved with the background loops; after killall, ls becomes fast again. With your MLFQ scheduler, the two background loops should quickly move to lower priority, so the first ls should be as fast as the second one. You can confirm this improvement from the turnaround time printed when ls terminates.

Accomplishments

You have started to learn about control and status registers, which are the key CPU support for operating systems. You will touch more CSRs in later projects. You have also finished reading earth/cpu_intr.c, grass/kernel.s, and half of grass/kernel.c in egos-2000.

"... any person ... any study."