Skip to content

Cooperative Threads

This project helps you understand the following C program.

c
void child(void* arg) {
    printf("Child thread is running.\n\r");
}

int main() {
    thread_init();
    thread_create(child, NULL);
    printf("Main thread is running.\n\r");
    thread_exit();
}

You have implemented printf in P0 and you will now implement functions thread_init, thread_create and thread_exit used by this program. Then you shall see its output:

txt
# Possible output #1
Child thread is running.
Main thread is running.

# Possible output #2
Main thread is running.
Child thread is running.

The two possibilities demonstrate the concepts of threads and concurrency. Consider a thread as executing a function. Initially, there is only one thread running function main; After main invokes thread_create, another thread is created running function child. Let's call them the main thread and the child thread. Concurrency means that the program does not enforce any ordering between the two threads. Therefore, if the child thread does the printing first, we will observe the first output above. Otherwise, if the main thread does the printing first, we will observe the second possibility.

TIP

We will explain the meaning of cooperative later. The key here is that, everything in P1 will only use the three types of CPU instructions explained in P0 (i.e., integer computation, control transfer and memory access). Again, we have been seriously striving to help you gain a precise understanding of OS concepts (e.g., threads) by connecting the concepts with the exact CPU instructions.

Get started

You will start from the thread branch of egos-2000 which provides printf and _sbrk for malloc just like P0. Feel free to incorporate your P0 solutions in thread.c for debug printing. thread.c has also listed all the functions that you need to implement in P1.

In addition, queue.h and queue.c are added which sketch some code for the queue data structure and can be very useful when you implement the functions in thread.c. You can decide yourself whether to complete the code in queue.c and use the code in thread.c. Completing the code in queue.c is not mandatory.

Lastly, an important source file is added, namely context.s, which contains two important helper functions ctx_start and ctx_switch:

asm
ctx_start:
    addi sp,sp,-128
    SAVE_ALL_REGISTERS
    sw sp,0(a0)
    mv sp,a1
    call ctx_entry

ctx_switch:
    addi sp,sp,-128
    SAVE_ALL_REGISTERS
    sw sp,0(a0)
    mv sp,a1
    RESTORE_ALL_REGISTERS
    addi sp,sp,128
    ret

The SAVE_ALL_REGISTERS and RESTORE_ALL_REGISTERS are assembly macros with simple definitions in context.s. Next, we explain how to use ctx_start to start the execution of a thread and use ctx_switch to transfer the control flow to a thread that has already been executing. After the explanations, you should be able to implement all the functions used by the multithreaded program above.

and then run programs with multiple threads.

Context switch

We explain the usage of ctx_start and ctx_switch using one possible execution of the program shown at the beginning:

  1. The main thread calls thread_create;
  2. The child thread starts execution, does the printing, and exits;
  3. The main thread returns from thread_create, does the printing, and exits.

The picture below illustrates the memory layout during the three steps, which contain the four types of memory regions explained in P0 (i.e., code, data, heap and stack).

Failed to load picture

What is context?

In step #1, thread_create is preparing to launch the child thread, and an important part of such preparation is to allocate a piece of memory for the stack of the child thread, shown as the shaded region. Therefore, we now have two stacks in the memory which is the key feature of multithreaded programs: one stack for each thread.

The sp in the picture indicates the stack pointer. At step #1, the stack pointer points to the stack of the main thread, and the program counter points to the code of thread_create(). Therefore, we say that the CPU is in the context of the main thread. The move from step #1 to step #2 is an example of context switch: the stack pointer now points to the stack of the child thread and the program counter points to printf. Therefore, in step #2, the CPU is in the context of the child thread.

Control flow details

Here are more details about the control flow from step #1 to step #2, and then to step #3.

  • thread_create invokes ctx_start, and the ctx_start region in the picture should be 128 bytes (i.e., the first instruction in ctx_start is addi sp,sp,-128). This number 128 is 32*4 because 32-bit RISC-V defines 32 4-byte registers and SAVE_ALL_REGISTERS is saving the value of all these registers on the stack.
  • The sw sp,0(a0) in ctx_start saves the 4-byte value of register sp (i.e., the saved in the picture) at memory address a0, the first argument of ctx_start. The purpose of saving sp at this point is to allow us to switch the CPU context back later.
  • The mv sp,a1 in ctx_start moves the stack pointer to a memory address specified by a1, the second argument of ctx_start, shown as init sp in the picture.
  • call ctx_entry invokes the ctx_entry function in thread.c. Intuitively, ctx_entry should find out that the child thread is the one to spawn and therefore it invokes function child. After child returns, ctx_entry should invoke thread_exit.
  • Lastly, thread_exit invokes ctx_switch which switches the CPU context back to the main thread. The assembly code of ctx_switch is similar to ctx_start, so try to figure out yourself what is happening during ctx_switch.
  • After ctx_start and thread_create both return, the main thread continues to invoke printf and does the printing as shown in step #3 of the picture.

A few details are still missing: How does thread_create decide the two arguments that it should pass to ctx_start? How does ctx_entry figure out that child is the function to invoke? These questions lead to the concept of Thread Control Block (TCB).

Thread control block

Thread control block is a data structure maintaining the metadata of each thread, including

  • thread id: a unique identifier of a thread
  • status: the thread’s current execution status
  • stack pointer: the saved value of register sp before switching to another thread
  • entry function: the “main” function of a thread (e.g., child() for the child thread)
  • entry function arguments: the arguments for the thread’s entry function
  • other metadata that could be useful

You can implement the thread control block as follows.

c
struct thread {
    int id;
    void* sp;
    ...                // Design the rest of struct thread yourself.
};
struct thread TCB[32]; // Represent TCB as a simple array. Alternatively,
                       // you can do `queue_t TCB;` and use the queue data
                       // structure functions defined in queue.c.
int current_idx;       // TCB[current_idx] is the currently running thread.

The code below shows how to use the TCB for ctx_start in thread_create.

c
char* child_stack = malloc(16 * 1024);
ctx_start(&TCB[current_idx].sp, child_stack + 16 * 1024);

TCB[current_idx].sp, as a pointer in a 32-bit program, occupies 4 bytes in the memory. &TCB[current_idx].sp is the address of these 4 bytes, and the sw sp,0(a0) instruction in ctx_start stores the 4 bytes of register sp into TCB[current_idx].sp. Unlike int* which stores the address of an integer, void* is used when we simply need to store a 32-bit memory address without any type information. In this case, we simply need to store the address shown by the text saved in step #2 of the picture above.

Note that malloc returns the lowest address of the allocated memory, so child_stack + 16 * 1024 gives us the highest address shown as init sp in the picture. As the second argument of ctx_start, this value is stored in register a1 and thus mv sp,a1 moves the stack pointer from the main thread's stack to the child thread's stack.

Try to figure out yourself how to store and use the entry function in struct thread. You will need the concept of function pointer in C. In short, a function pointer is simply 4 bytes storing the address of a function's first instruction in the code memory region.

Yield

Before you start to implement functions thread_init, thread_create and thread_exit, let us introduce the last function you would implement for multithreading, thread_yield. In the 3-step execution we discussed above, the CPU context would only switch back to the main thread after the child thread exits. However, if the child thread yields, the child thread would give up the CPU control and the main thread could resume execution sooner. Here is an example of two threads yielding to each other.

c
void child(void* arg) {
    for (int i = 0; i < 10; i++) {
        printf("%s is in for loop i=%d\n\r", arg, i);
        thread_yield();
    }
}

int main() {
    thread_init();
    thread_create(child, "Child thread");
    for (int i = 0; i < 10; i++) {
        printf("Main thread is in for loop i=%d\n\r", i);
        thread_yield();
    }
    thread_exit();
}

In this example, both threads do more printing and, because they yield to each other in the loop, we could see interleaving printings from the two threads. Intuitively, the key of an OS is to insert yields into different tasks so that different tasks can interleave. In P1, we invoke thread_yield explicitly in different threads just like the code above and, in P2, we will see how the CPU forces a thread to yield without the thread yielding voluntarily. This difference is why P1 is called cooperative threading while P2 is doing preemptive threading.

You can now start to implement the TCB and the following functions in thread.c. Note that struct thread is defined in thread.h which also contains detailed descriptions of these functions. After you finish, you should be able to run the multithreaded programs above and observe the expected output.

c
void ctx_entry();
void thread_init();
void thread_create(void (*entry)(void *arg), void *arg);
void thread_yield();
void thread_exit();

TIP

When all the threads have terminated and therefore no thread from the TCB could run, your code should call _end() which is defined in thread.s. Your code should also free the stack memory of a thread after this thread terminates. This can be tricky because you should not free the stack memory of the currently running thread when this thread still needs its stack for thread_exit() and ctx_switch(). Moreover, the main thread's stack is not from calling malloc(), so it should not be freed (i.e., when the main thread terminates, do nothing to its stack memory).

Conditional variable

In a multithreaded program, threads may need to wait until certain condition becomes true. For example, in your web browser, a thread that renders the webpage on the screen needs to wait until enough network packets for the webpage have been received. Such waiting is typically handled by a synchronization primitive called conditional variable which is widely used in real-world software practice. Consider a conditional variable as a C variable of type struct cv used by multiple threads with 3 operations:

c
struct cv {
/* Define this data structure yourself */
};

void cv_wait(struct cv* condition);
void cv_signal(struct cv* condition);
void cv_broadcast(struct cv* condition);

Let cond denote a condition that some threads need to wait for and let cond_cv denote a conditional variable defined in the program for cond. A thread could wait for this condition using a loop while(!cond) cv_wait(&cond_cv);. After such a loop, the thread can assume that condition cond is true and thus proceed with its work. If cond is true right before the loop, the thread will skip the loop without calling cv_wait.

If cond is false before the loop, the thread will call cv_wait which makes the thread yield. And if no other thread calls cv_signal or cv_broadcast on &cond_cv, the CPU context will never switch back to the waiting thread. The point is that, if another thread finds cond to become true some time later, this other thread can call cv_signal or cv_broadcast on &cond_cv in order to wake up the waiting thread.

Technically, struct cv should maintain an array or queue of struct thread just like the TCB. Upon cv_wait, the struct thread of the currently running thread is removed from the TCB and added into the conditional variable. This thread will never be switched back to until its struct thread is added back to the TCB. To this end, cv_signal will remove one struct thread from a conditional variable and add it back to the TCB. And cv_broadcast does the same for all struct thread from a conditional variable.

In this project, we only ask you to implement cv_wait and cv_signal so that you can run the following multithreaded program.

Producer-consumer

A classic problem in computer science is the bounded-buffer producer-consumer problem. In the example below, we have a buffer with a fixed size of 3. A program may spawn many threads, each being a producer or a consumer (i.e., running produce or consume as the entry function). Producers will put items into the buffer and consumers remove items. The constraint is that producers need to wait if the buffer is full and consumers need to wait if the buffer is empty. Therefore, we have two conditional variables nonempty and nonfull, and the associated conditions are highlighted in the code below.

c
void* buffer[3];
int count = 0;
int head = 0, tail = 0;
struct cv nonempty, nonfull;

void produce(void* item) {
    for (int i = 0; i < 10; i++) {
        while (count == 3) cv_wait(&nonfull);
        // At this point, the buffer is not full.
        buffer[tail] = item;
        tail = (tail + 1) % 3;
        count += 1;
        cv_signal(&nonempty);
    }
}

void* consume() {
    while (1) {
        while (count == 0) cv_wait(&nonempty);
        // At this point, the buffer is not empty.
        void* result = buffer[head];
        head = (head + 1) % 3;
        count -= 1;
        cv_signal(&nonfull);
    }
}

Your job is to implement struct cv and the two operations, wait and signal. You may need to modify the code for TCB and the thread_* functions (e.g., if the current thread invokes thread_yield() in cv_wait(), do not add its struct thread back to the TCB). However, you should not modify the function interface defined in thread.h. After you finish, you can try the above code and add some printing to see what happened.

Accomplishments

You have finished reading most of grass/kernel.s in egos-2000 which is very similar to ctx_switch. The proc_yield function in grass/kernel.c is similar to thread_yield. You will read and modify proc_yield in P2 and finish reading the rest of grass/kernel.c in P3. Indeed, you will finish reading everything under grass after P3.

TIP

Here is a quote from the Wikipedia page for cooperative multitasking, "Cooperative multitasking was the primary scheduling scheme for 16-bit applications employed by Microsoft Windows before Windows 95 and Windows NT, and by the classic Mac OS."

"... any person ... any study."