Cooperative Threads

This project helps you understand the following C program.

void child(void* arg) {
    printf("Child thread is running.\n\r");
}

int main() {
    thread_init();
    thread_create(child, NULL);
    printf("Main thread is running.\n\r");
    thread_exit();
}

Recall that you have implemented printf in P0. In P1, you will implement the thread_* functions used by this program and then you shall see its output:

txt

# Possible output #1
Child thread is running.
Main thread is running.

# Possible output #2
Main thread is running.
Child thread is running.

The two possibilities demonstrate the concepts of threads and concurrency. Consider a thread as executing a function. Initially, there is only one thread running function main; After main invokes thread_create, another thread is created running function child. Let's call them the main thread and the child thread. Concurrency means that the program does not enforce any ordering between the two threads. Therefore, if the child thread does the printing first, we should observe the first possible output above. Otherwise, if the main thread does the printing first, we should observe the second possibility.

TIP

We will explain the meaning of cooperative later. The key is that, everything in P1 will only use the three types of CPU instructions explained in P0 (i.e., integer computation, control transfer and memory access). Again, we have been seriously striving to help you gain a precise understanding of OS concepts (e.g., threads) by connecting the concepts with the exact CPU instructions.

Get started

You will start from the thread branch of egos-2000 which provides printf and _sbrk for malloc just like P0. Feel free to incorporate your P0 solution for better debug printing. This branch also adds a new source file context.s, which contains two important helper functions ctx_start and ctx_switch:

asm

ctx_start:
    addi sp,sp,-128
    SAVE_ALL_REGISTERS
    sw sp,0(a0)
    mv sp,a1
    call ctx_entry

ctx_switch:
    addi sp,sp,-128
    SAVE_ALL_REGISTERS
    sw sp,0(a0)
    mv sp,a1
    RESTORE_ALL_REGISTERS
    addi sp,sp,128
    ret

The SAVE_ALL_REGISTERS and RESTORE_ALL_REGISTERS are assembly macros with simple definitions in context.s. Next, we explain how to use ctx_start to start the execution of a thread and use ctx_switch to transfer the control flow to a thread that has already been executing. After the explanations, you should be able to implement the thread_* functions and then run programs with multiple threads.

Context switch

We explain the usage of ctx_start and ctx_switch using one possible execution of the program shown at the beginning, in which

The main thread calls thread_create;
The child thread starts execution, does the printing, and exits;
The main thread returns from thread_create, does the printing, and exits.

The picture below illustrates the memory layout during the three steps, which contain the four types of memory regions explained in P0: code, data, heap and stack.

Failed to load picture

What is context?

In step #1, thread_create is preparing to launch the child thread, and an important part of the preparation is to allocate a piece of memory for the stack of the child thread, shown as the shaded area. In other words, we now have two stacks in the memory, which is the key feature of multithreaded programs: one stack for each thread.

The sp in the picture indicates the stack pointer. At step #1, the stack pointer points to the stack of the main thread, and the program counter points to the code for thread_create. Therefore, we say that the CPU is in the context of the main thread. The move from step #1 to step #2 is an example of context switch: the stack pointer now points to the stack of the child thread and the program counter points to printf. Therefore, in step #2, the CPU is in the context of the child thread.

The control flow

Here are some details about the control flow from step #1 to step #2, and then to step #3.

thread_create invokes ctx_start, and the ctx_start region in the picture should be exactly 128 bytes because the first instruction in ctx_start is addi sp,sp,-128. The number is 128=32*4 because a 32-bit RISC-V CPU core has 32 4-byte registers and SAVE_ALL_REGISTERS is saving the value of all registers on the stack.
The sw sp,0(a0) in ctx_start corresponds to the saved pointer in the picture, where a0 is a memory address specified by the first argument of ctx_start. By saving sp at a special memory address, we can swich the CPU context back later.
The mv sp,a1 in ctx_start moves the stack pointer to a memory address specified by a1, the second argument of ctx_start, shown as the init sp in the picture.
call ctx_entry invokes function ctx_entry which you will implement yourself later. Intuitively, ctx_entry finds that the child thread is the next one to run and therefore it invokes function child. After child returns, ctx_entry will invoke thread_exit.
Lastly, thread_exit invokes ctx_switch which switches the CPU context back to the main thread. The assembly code of ctx_switch is similar to ctx_start, so please try to figure out yourself what is happening during ctx_switch.
After ctx_start and thread_create both return, the main thread continues to invoke printf and does the printing as shown in step #3 of the picture.

Some details are still missing: How does thread_create decide the two arguments that it passes to ctx_start? How does ctx_entry know that child is the function to invoke? These questions lead to the concept of Thread Control Block (TCB).

Thread control block

Thread control block is a data structure maintaining the metadata of each thread, including

thread id: a unique identifier of a thread
status: the thread’s current execution state
stack pointer: value of register sp before switching to another thread
entry function: the “main” function of a thread (e.g., child function for the child thread)
entry function arguments: argument(s) of the thread’s “main” function
other metadata that could be useful

In P1, you can implement the thread control block as follows.

struct thread {
    int id;
    void* sp;
    ...                // Design the rest of struct thread yourself.
};
struct thread TCB[32]; // Assume at most 32 threads for simplicity.
int current_idx;       // TCB[current_idx] is the currently running thread.

The code below shows how to use the TCB for ctx_start in thread_create.

void * child_stack = malloc(16 * 1024);
ctx_start(&TCB[current_idx].sp, child_stack + 16 * 1024);

TCB[current_idx].sp, as a pointer in a 32-bit program, occupies 4 bytes in the memory. &TCB[current_idx].sp is the address of these 4 bytes, and the sw sp,0(a0) instruction in ctx_start stores the 4 bytes of register sp into TCB[current_idx].sp. Unlike int* which stores the address of an integer, void* is used when we simply need to store a 32-bit memory address without any type information. In this case, we simply need to store the address shown by the text saved in step #2 of the picture above.

Note that malloc returns the lowest address of the allocated memory, so child_stack + 16 * 1024 gives us the highest address shown as init sp in the picture. As the second argument of ctx_start, this value is stored in register a1 and thus mv sp,a1 moves the stack pointer from the main thread's stack to the child thread's stack.

Try to figure out yourself how to store and use the entry function in struct thread. You will need the concept of function pointer in C. In short, a function pointer is simply 4 bytes storing the address of a function's first instruction.

Thread yield

Before you start to implement functions thread_init, thread_create and thread_exit, let us introduce the last function you would implement for multithreading: thread_yield. In the 3-step execution that we discussed previously, the CPU context would only switch back to the main thread after the child thread exits. However, if the child thread yields, the child thread will give up the CPU control and thus the main thread can resume execution earlier. Below is an example of two threads yielding to each other.

void child(void* arg) {
    for (int i = 0; i < 10; i++) {
        printf("%s is in for loop i=%d\n\r", arg, i);
        thread_yield();
    }
}

int main() {
    thread_init();
    thread_create(child, "Child thread");
    for (int i = 0; i < 10; i++) {
        printf("Main thread is in for loop i=%d\n\r", i);
        thread_yield();
    }
    thread_exit();
}

In this example, each thread does more printing and, since they yield to each other in the loop, we could see interleaving printings from the two threads. Intuitively, the key of an OS is to insert yields into different tasks so that different tasks can interleave. In P1, we invoke thread_yield manually in different threads just like this example, while we will see how to insert yields automatically by the CPU in P2. Such difference is why P1 is called cooperative multithreading, while P2 is doing preemptive multithreading.

Now, you can start to implement the TCB and the following four functions. After you finish, you should be able to run the example programs above and see the expected output.

// Initialize the TCB data structures
void thread_init();

// Create a new thread with a 16KB stack
void thread_create(void (*entry)(void *arg), void *arg);

// Yield to another thread (if any)
void thread_yield();

// Terminate the current thread
void thread_exit();

Conditional variable

You can further test your multithreading code by implementing a synchronization primitive called conditional variable, which is widely used in real-world software practice. Consider a conditional variable as a C variable of type struct cv with two associated operations: wait and signal, shown as cv_wait and cv_signal below.

struct cv {
    // Design this struct yourself.
};

void cv_wait(struct cv* condition);
void cv_signal(struct cv* condition);

The cv_wait function puts the current thread into sleep because certain condition is false. When other threads find that the condition becomes true, they will invoke cv_signal with the same conditional variable in order to wake up the sleeping thread. Below is an example of what the certain condition could be for a conditional variable.

Producer/consumer

A classic problem in computer science is the bounded-buffer producer/consumer problem. In the example below, we have a buffer with a fixed size of 3. A program may spawn many threads, each being a producer or a consumer (i.e., running produce or consume as the entry function). Producers will put items into the buffer and consumers remove items. The constraint is that producers need to wait if the buffer is full and consumers need to wait if the buffer is empty. Therefore, we have two conditional variables nonempty and nonfull, and the associated conditions are highlighted in the code below.

void* buffer[3];
int count = 0;
int head = 0, tail = 0;
struct cv nonempty, nonfull;

void produce(void* item) {
    for (int i = 0; i < 10; i++) {
        while (count == 3) cv_wait(&nonfull);
        // At this point, the buffer is not full.
        buffer[tail] = item;
        tail = (tail + 1) % 3;
        count += 1;
        cv_signal(&nonempty);
    }
}

void* consume() {
    while (1) {
        while (count == 0) cv_wait(&nonempty);
        // At this point, the buffer is not empty.
        void* result = buffer[head];
        head = (head + 1) % 3;
        count -= 1;
        cv_signal(&nonfull);
    }
}

Your job is to implement struct cv and the two operations, wait and signal. Also, feel free to update your code of the TCB and the thread_* functions. For example, you may add a new status WAITING for a thread that is waiting for a conditional variable. And if no thread inovkes cv_signal on the same conditional variable, your code should not switch back to this WAITING thread. After you finish, try the producer/consumer code above.

Accomplishments

You have finished reading most of grass/kernel.s in egos-2000, which is very similar to ctx_switch. The thread_yield function is similar to proc_yield in grass/kernel.c of egos-2000. You will finish reading grass/kernel.c after P3, in which we will introduce the other key function in this file called proc_try_syscall.

TIP

"Cooperative multitasking was the primary scheduling scheme for 16-bit applications employed by Microsoft Windows before Windows 95 and Windows NT, and by the classic Mac OS."

-- the Wikipedia page for cooperative multitasking.

Cooperative Threads ​

Get started ​

Context switch ​

What is context? ​

The control flow ​

Thread control block ​

Thread yield ​

Conditional variable ​

Producer/consumer ​

Accomplishments ​