
Preemptive Scheduler ​

This project helps you understand the scheduler of egos-2000. Most of the egos-2000 scheduler logic is in the proc_yield function from grass/kernel.c which, similar to the thread_yield function in P1, chooses the next thread to be scheduled when the current thread yields. However, egos-2000 has a preemptive scheduler, meaning that the current thread is forced to yield even if it does not directly invoke the yield function. This is made possible by timer interrupts, so we start by introducing what a timer is.

[Screenshot: a timer in macOS]

The screenshot above shows a timer in macOS. You can set the timer with an initial value (e.g., 15 seconds) and then start it. After this time period elapses, the timer will raise an interrupt (e.g., play a ringtone). This project helps you understand how to do something similar with a RISC-V CPU: instead of playing ringtones, the CPU will set the program counter to the code of a special function when the period elapses. This special function is called an interrupt handler, and it will further invoke proc_yield, switching the context of the CPU to a different thread in egos-2000.

TIP

We do not distinguish thread and process for now, so you can simply view the struct process and proc_yield from egos-2000 as the struct thread and thread_yield from P1. We will start to distinguish the two when introducing virtual memory in P4.

Timer interrupt ​

We start by explaining the following demo program which prints Got a timer interrupt. repeatedly every QUANTUM time units (e.g., microseconds).

c
#define QUANTUM 50000

void handler();  /* declare handler before its use in main */

int main() {
    asm("csrw mtvec, %0" ::"r"(handler));
    mtimecmp_set(mtime_get() + QUANTUM);

    int mstatus_val, mie_val;
    asm("csrr %0, mstatus" : "=r"(mstatus_val));
    asm("csrr %0, mie" : "=r"(mie_val));
    asm("csrw mstatus, %0" ::"r"(mstatus_val | 0x8));
    asm("csrw mie, %0" ::"r"(mie_val | 0x80));

    /* When receiving a timer interrupt, the CPU behaves as if a
       call to handler() is inserted into the while loop below. */
    while(1);
}

void handler() {
    printf("Got a timer interrupt.");
    mtimecmp_set(mtime_get() + QUANTUM);
    /* When invoking mret, the CPU behaves as if handler()
       is returning to the while(1) in main(), i.e., the
       program counter when receiving the timer interrupt. */
    asm("mret");
}

Register the handler ​

The first line of the main function registers function handler as the interrupt handler. The C language syntax might look a bit scary, but its job is actually very simple.

First, mtvec is a Control and Status Register (CSR), meaning that it has special purposes unlike the general-purpose registers. The special purpose of mtvec is to hold the starting address of the interrupt handler. In other words, the CPU sets the program counter to the value held by mtvec when receiving an interrupt. You can learn more about mtvec in chapter 3.1.12 of the RISC-V reference manual.

Second, csrw and csrr are two CPU instructions for writing and reading a CSR. In this case, the value written to mtvec is handler, a function pointer holding the address of the first instruction in function handler. Suppose the first instruction of function handler is at address 0x8000002C; we could directly do asm("csrw mtvec, %0" ::"r"(0x8000002C)) as well. By using handler, we ask the compiler to figure out this value for us. Specifically, this line of C code would likely be compiled to the following assembly:

asm
lui     a5,0x80000  # a5 is now 0x8000_0000
addi    a5,a5,44    # a5 is now 0x8000_002C
csrw    mtvec,a5    # write the value of a5 to mtvec

The compiler can also choose registers and instructions other than a5, lui and addi as long as the address of handler's first instruction is written to mtvec.

Set a time period ​

The second line of the main function sets a time period of QUANTUM, just like we set a 15-second period in the screenshot. You can find the functions mtimecmp_set and mtime_get in earth/cpu_intr.c of egos-2000. They are copy-pasted from the CPU reference manual, and their job is to read or write 8-byte values at special memory addresses.

Specifically, mtime and mtimecmp represent two special memory addresses. mtime_get reads 8 bytes from mtime representing how many time units have elapsed since the CPU was powered on. Hence, this 8-byte value at mtime is automatically incremented by the CPU as time proceeds, and the return value of mtime_get depends on when this function is invoked. When the value at mtime becomes greater than or equal to the value at mtimecmp, a timer interrupt is raised, which is why this line of code sets a period of QUANTUM for the timer.

TIP

You may regard the timer as a simple device controlled by mtime and mtimecmp. In addition to mtime and mtimecmp, there exist other special memory regions that are used to control various devices connected to the CPU. We will learn about another such region in P5 in which we will use memory-mapped I/O to control an SD card device.

Enable timer interrupt ​

The main function then reads two CSRs, mstatus and mie, into variables mstatus_val and mie_val using the csrr instruction. It then writes to the two CSRs, setting bit#3 of mstatus and bit#7 of mie to 1 with the bitwise OR operation, because 0x8 is 1<<3 and 0x80 is 1<<7. According to Figures 3.6 and 3.12 from the RISC-V reference manual, bit#3 of mstatus is called mstatus.MIE where MIE stands for Machine Interrupt Enable, and bit#7 of mie is mie.MTIE where MTIE stands for Machine Timer Interrupt Enable.

[Figures 3.6 and 3.12 from the RISC-V reference manual: the mstatus and mie register layouts]

For the purpose of P2, it is enough to know that setting both the mstatus.MIE and mie.MTIE bits enables timer interrupts on the CPU, while setting either of them to 0 disables timer interrupts. We recommend scanning through chapters 3.1.6 and 3.1.14 of the RISC-V reference manual for mstatus and mie, but don't feel obliged to understand everything. We will cover other bits of these CSRs, especially in mstatus, in the next few projects.

After enabling timer interrupts, the main function enters an infinite loop. When receiving a timer interrupt, the CPU would behave as if a function call to handler is inserted into this loop, and the handler function would reset the timer with QUANTUM again. Therefore, we should expect to see Got a timer interrupt. on the screen periodically and infinitely.

Use of mepc and mret ​

When handling a timer interrupt, the CPU would assign the value of its program counter to the mepc CSR right before jumping to handler, so mepc should point to an instruction for while(1); when handler starts to run. When handler invokes mret, the CPU assigns the value of mepc back to the program counter, so the CPU would resume the while loop.

The use of mepc and mret has two implications:

  1. Normal function calls use the ra register and the ret instruction for similar purposes. By using mepc and mret instead, the CPU does not change the value of register ra when it invokes the handler function.

  2. Right before the CPU jumps to handler, interrupts are automatically disabled. When handler invokes mret, interrupts are automatically re-enabled. In other words, we typically want to handle one interrupt without being disrupted by another interrupt.

Below is a screenshot of chapter 8.2.1 of this CPU document which describes more details.

[Screenshot: chapter 8.2.1 of the CPU document]

TIP

Take the time to read this screenshot of the CPU document. It is important for understanding how an OS works, and we will revisit this screenshot in P3, especially the two bullets about privilege mode. Also, think about why the first implication above (the one about not changing ra) is important.

Get started ​

You will start by reading earth/cpu_intr.c and grass/kernel.s from the main branch of egos-2000. By default, function trap_entry is set as the interrupt handler, and here is the high-level control flow during a timer interrupt:

  1. The CPU sets its program counter to trap_entry.
  2. Switch the stack pointer to the kernel stack (i.e., 0x80400000).
  3. Save the context of the current process just like the SAVE_ALL_REGISTERS in P1.
  4. Call function kernel_entry which eventually calls proc_yield.

Spend some time reading the control flow from trap_entry to proc_yield. For the rest of P2, we will focus on proc_yield since it is the key function that you need to modify.

Process life cycle ​

The life cycle of a process in egos-2000 starts with (1) being allocated by proc_alloc in grass/process.c, then (2) being scheduled to run multiple times by proc_yield in grass/kernel.c, and finally (3) being freed by proc_free in grass/process.c.

The life cycle of a process is reflected by proc_status defined in grass/process.h.

c
enum proc_status {
    PROC_UNUSED,
    PROC_LOADING, /* wait for loading code&data into process memory */
    PROC_READY, /* finish loading code&data and wait for first running */
    PROC_RUNNING,
    PROC_RUNNABLE,
    PROC_PENDING_SYSCALL /* wait for certain system calls to finish */
};

From the scheduler's perspective, proc_status can be classified as:

  • Only the currently running process has status PROC_RUNNING.
  • Processes with status PROC_READY or PROC_RUNNABLE can be scheduled later.
  • When a process terminates, its status is set to PROC_UNUSED.
  • Processes with status PROC_LOADING or PROC_PENDING_SYSCALL cannot be scheduled.

Process control block ​

Similar to the TCB in P1, a process control block (or PCB) is a data structure maintaining the metadata of each process. Again, there is no need to distinguish thread and process at this point. In egos-2000, the PCBs are essentially kept as an array of struct process.

c
struct process{
    int pid;
    struct syscall syscall;
    enum proc_status status;
    uint mepc, saved_register[SAVED_REGISTER_NUM];

    /* Student's code goes here (preemptive scheduler)
     * Create data structures that hold scheduling information. */

    /* Student's code ends here. */
};

In addition to an array of struct process, curr_proc_idx defined in grass/kernel.c is the index in the PCB array for the currently running process. We further provide curr_pid and curr_status for accessing the process ID and status of the current process.

In P2, you need to add new fields in struct process in order to collect scheduling metrics and save necessary information for a better scheduler, which we will explain next.

Collect scheduling metrics ​

You will measure and calculate four scheduling metrics for each process and print them out when a process terminates.

  1. turnaround time: the time between process creation and exit.
  2. response time: the time between process creation and the first time being scheduled.
  3. CPU time: the accumulated time that the process is actually running on the CPU.
  4. the number of schedules: an integer indicating how many times the process's status is set to PROC_RUNNING by the scheduler before the process exits.

For the first three, you can measure time by calling mtime_get twice. The difference of the two return values is the time elapsed. Recall that we have introduced mtime_get previously for setting timer interrupts. Below are some useful hints.

  • Measure time in the proc_alloc and proc_free functions from grass/process.c, since these mark when a process is created and terminated. Add new fields in struct process for recording the measurements and make sure to use the correct types for these fields.

  • Measure time or increment some counter at the right places of proc_yield in order to keep track of the four scheduling metrics. Print out the metrics in proc_free.

  • When finished coding, you should see something like

shell
➜ /home/yunhao ls
./     ../     README
[INFO] proc 5 died after 2 yields, turnaround time: 0.29, response time: 0.23, cputime: 0.03
➜ /home/yunhao echo Hello, World!
Hello, World!
[INFO] proc 6 died after 1 yields, turnaround time: 0.14, response time: 0.13, cputime: 0.01
  • Print out the raw measurement divided by QUANTUM: turnaround time: 0.29 above means a duration of 0.29*QUANTUM. Since egos-2000 does not provide formatted output for floating-point numbers, you can calculate two integers separately and use "%d.%d" in printf for real numbers like 0.29.

  • The numbers for turnaround/response/CPU time could vary on different machines and environments. You may also see a different number of yields, but try to explain what you have printed out and convince yourself that those numbers are reasonable.

Multilevel feedback queue ​

The current scheduler is inefficient for latency-sensitive applications. To see this, create file apps/user/loop.c with the following code.

c
#include "app.h"

int main() {
    INFO("Start to run an infinite loop.");
    while(1);
}

Then do make qemu and run loop in the background using & just like in any other shell.

shell
> make qemu
...
[CRITICAL] Welcome to the egos-2000 shell!
➜ /home/yunhao loop &
[INFO] process 5 running in the background
[INFO] Start to run an infinite loop.
➜ /home/yunhao ls
# You will wait for a while to see the result of ls

To see why ls becomes much slower, print out the process schedule and you will see the following processes being scheduled before ls starts to run:

  1. GPID_TERMINAL reads your input string ls and sends it to GPID_SHELL;
  2. GPID_SHELL parses the command and sends the command to GPID_PROCESS;
  3. GPID_PROCESS exchanges a few messages with GPID_FILE and finds the executable file for ls; GPID_PROCESS then allocates a new process, loads the executable file for ls into the process memory, and marks this new process as PROC_READY;
  4. At some point, proc_yield schedules this new process as the next one to run.

According to library/syscall/servers.h, the pids of the processes mentioned above are 1, 2, 3 and 4. They are called system processes (or system servers) because they are launched by the OS and provide essential OS services. Other processes such as loop and ls are instead user processes.

TIP

You could take a quick detour at this point and read the code of the four system servers in apps/system. In particular, start with sys_process.c which is the first process spawned in egos-2000. This first process will spawn GPID_TERMINAL, GPID_FILE and GPID_SHELL one-by-one.

The reason that ls becomes much slower is that the loop process is scheduled many times amid the scheduling of the system processes. And whenever loop is scheduled, it continues to run for a full QUANTUM of time until a timer interrupt. For a better user experience, the scheduler could prioritize processes that are involved in human interaction, and one way to achieve this is called a Multilevel Feedback Queue (MLFQ). In this project, you will implement a simple MLFQ scheduler:

  1. There are three queues representing high/mid/low priorities.
  2. The scheduler will always run processes in higher-priority queues; within each queue, processes run in a round-robin fashion. When a new process is created, it is placed in the high-priority queue.
  3. Once a process has used up a QUANTUM of CPU time, the scheduler downgrades the process into a lower-priority queue. Therefore, the high-priority queue holds processes with CPU time [0, QUANTUM); the mid-priority queue holds processes with CPU time [QUANTUM, QUANTUM*2); and the low-priority queue holds all the other processes.

Given this simple intuition, below are some implementation details.

  1. All three queues hold processes with status PROC_RUNNABLE or PROC_READY. Processes with status PROC_PENDING_SYSCALL should not be in any queue.
  2. The current proc_yield iterates through proc_set and invokes proc_try_syscall for processes that are waiting for system calls to complete. You need to keep this logic and, if a process becomes PROC_RUNNABLE after proc_try_syscall, put it back into one of the three queues. In general, be careful with process status when updating the queues.
  3. proc_set holds all the struct process entries while the queues only hold pointers to proc_set elements. For example, when enqueueing the currently running process into one of the queues, you should enqueue proc_set+curr_proc_idx.
  4. Modify proc_yield and find the next process to run based on the MLFQ algorithm. You also need to maintain the MLFQ data structures in proc_alloc and proc_free.
  5. Keep the system processes (i.e., pid<GPID_USER_START) in the high-priority queue.
  6. If all the queues are empty, keep running the current process if it is runnable. Otherwise, do FATAL("No more process can be scheduled.");. This is unlikely to happen though.

The code below sketches how you may implement a simple MLFQ data structure.

c
/* Implement a queue as a ring buffer with buffer[head_idx] as the first 
   element in the queue (if size!=0) */
struct queue {
    uint size, head_idx, tail_idx;
    struct process * buffer[MAX_NPROCESS];
};

/* For enqueue, store p at q->buffer[q->tail_idx] and then advance
   q->tail_idx to (q->tail_idx+1)%MAX_NPROCESS. For dequeue, return
   q->buffer[q->head_idx] and advance q->head_idx similarly.
   Remember to update q->size accordingly. */
void enqueue( struct queue * q, struct process * p );
struct process * dequeue( struct queue * q );

/* Define MLFQ as a global variable, so it is initialized as all 0s */
struct queue MLFQ[3];

When finished, you should see ls running much faster:

shell
➜ /home/yunhao loop &
[INFO] process 5 running in the background
[INFO] Start to run an infinite loop.
➜ /home/yunhao ls
# You should see the result of ls immediately

Accomplishments ​

You have started to learn about control and status registers, which are key CPU support for operating systems; you will encounter more CSRs in later projects. You have also gained a precise understanding of timer interrupts and, combined with what you learned in P1, you now know how preemptive scheduling works.

"... any person ... any study."