Cooperative Threads
This project helps you understand the following C program.
void child(void* arg) {
printf("Child thread is running.\n\r");
}
int main() {
thread_init();
thread_create(child, NULL);
printf("Main thread is running.\n\r");
thread_exit();
}
Recall that you have implemented printf
in P0. In P1, you will implement the thread_*
functions used by this program and then you shall see its output:
# Possible output #1
Child thread is running.
Main thread is running.
# Possible output #2
Main thread is running.
Child thread is running.
The two possibilities demonstrate the concepts of threads and concurrency. Consider a thread as executing a function. Initially, there is only one thread running function main
; After main
invokes thread_create
, another thread is created running function child
. Let's call them the main thread and the child thread. Concurrency means that the program does not enforce any ordering between the two threads. Therefore, if the child thread does the printing first, we should observe the first possible output above. Otherwise, if the main thread does the printing first, we should observe the second possibility.
TIP
We will explain the meaning of cooperative later. The key is that, everything in P1 will only use the three types of CPU instructions explained in P0 (i.e., integer computation, control transfer and memory access). Again, we have been seriously striving to help you gain a precise understanding of OS concepts (e.g., threads) by connecting the concepts with the exact CPU instructions.
Get started
You will start from the thread
branch of egos-2000 which provides printf
and _sbrk
for malloc
just like P0. Feel free to incorporate your P0 solution for better debug printing. This branch also adds a new source file context.s
, which contains two important helper functions ctx_start
and ctx_switch
:
ctx_start:
addi sp,sp,-128
SAVE_ALL_REGISTERS
sw sp,0(a0)
mv sp,a1
call ctx_entry
ctx_switch:
addi sp,sp,-128
SAVE_ALL_REGISTERS
sw sp,0(a0)
mv sp,a1
RESTORE_ALL_REGISTERS
addi sp,sp,128
ret
The SAVE_ALL_REGISTERS
and RESTORE_ALL_REGISTERS
are assembly macros with simple definitions in context.s
. Next, we explain how to use ctx_start
to start the execution of a thread and use ctx_switch
to transfer the control flow to a thread that has already been executing. After the explanations, you should be able to implement the thread_*
functions and then run programs with multiple threads.
Context switch
We explain the usage of ctx_start
and ctx_switch
using one possible execution of the program shown at the beginning, in which
- The main thread calls
thread_create
; - The child thread starts execution, does the printing, and exits;
- The main thread returns from
thread_create
, does the printing, and exits.
The picture below illustrates the memory layout during the three steps, which contain the four types of memory regions explained in P0: code, data, heap and stack.
What is context?
In step #1, thread_create
is preparing to launch the child thread, and an important part of the preparation is to allocate a piece of memory for the stack of the child thread, shown as the shaded area. In other words, we now have two stacks in the memory, which is the key feature of multithreaded programs: one stack for each thread.
The sp
in the picture indicates the stack pointer. At step #1, the stack pointer points to the stack of the main thread, and the program counter points to the code for thread_create
. Therefore, we say that the CPU is in the context of the main thread. The move from step #1 to step #2 is an example of context switch: the stack pointer now points to the stack of the child thread and the program counter points to printf
. Therefore, in step #2, the CPU is in the context of the child thread.
The control flow
Here are some details about the control flow from step #1 to step #2, and then to step #3.
thread_create
invokesctx_start
, and thectx_start
region in the picture should be exactly 128 bytes because the first instruction inctx_start
isaddi sp,sp,-128
. The number is128=32*4
because a 32-bit RISC-V CPU core has 32 4-byte registers andSAVE_ALL_REGISTERS
is saving the value of all registers on the stack.- The
sw sp,0(a0)
inctx_start
corresponds to thesaved
pointer in the picture, wherea0
is a memory address specified by the first argument ofctx_start
. By savingsp
at a special memory address, we can swich the CPU context back later. - The
mv sp,a1
inctx_start
moves the stack pointer to a memory address specified bya1
, the second argument ofctx_start
, shown as theinit sp
in the picture. call ctx_entry
invokes functionctx_entry
which you will implement yourself later. Intuitively,ctx_entry
finds that the child thread is the next one to run and therefore it invokes functionchild
. Afterchild
returns,ctx_entry
will invokethread_exit
.- Lastly,
thread_exit
invokesctx_switch
which switches the CPU context back to the main thread. The assembly code ofctx_switch
is similar toctx_start
, so please try to figure out yourself what is happening duringctx_switch
. - After
ctx_start
andthread_create
both return, the main thread continues to invokeprintf
and does the printing as shown in step #3 of the picture.
Some details are still missing: How does thread_create
decide the two arguments that it passes to ctx_start
? How does ctx_entry
know that child
is the function to invoke? These questions lead to the concept of Thread Control Block (TCB).
Thread control block
Thread control block is a data structure maintaining the metadata of each thread, including
- thread id: a unique identifier of a thread
- status: the thread’s current execution state
- stack pointer: value of register
sp
before switching to another thread - entry function: the “main” function of a thread (e.g.,
child
function for the child thread) - entry function arguments: argument(s) of the thread’s “main” function
- other metadata that could be useful
In P1, you can implement the thread control block as follows.
struct thread {
int id;
void* sp;
... // Design the rest of struct thread yourself.
};
struct thread TCB[32]; // Assume at most 32 threads for simplicity.
int current_idx; // TCB[current_idx] is the currently running thread.
The code below shows how to use the TCB for ctx_start
in thread_create
.
void * child_stack = malloc(16 * 1024);
ctx_start(&TCB[current_idx].sp, child_stack + 16 * 1024);
TCB[current_idx].sp
, as a pointer in a 32-bit program, occupies 4 bytes in the memory. &TCB[current_idx].sp
is the address of these 4 bytes, and the sw sp,0(a0)
instruction in ctx_start
stores the 4 bytes of register sp
into TCB[current_idx].sp
. Unlike int*
which stores the address of an integer, void*
is used when we simply need to store a 32-bit memory address without any type information. In this case, we simply need to store the address shown by the text saved
in step #2 of the picture above.
Note that malloc
returns the lowest address of the allocated memory, so child_stack + 16 * 1024
gives us the highest address shown as init sp
in the picture. As the second argument of ctx_start
, this value is stored in register a1
and thus mv sp,a1
moves the stack pointer from the main thread's stack to the child thread's stack.
Try to figure out yourself how to store and use the entry function in struct thread
. You will need the concept of function pointer in C. In short, a function pointer is simply 4 bytes storing the address of a function's first instruction.
Thread yield
Before you start to implement functions thread_init
, thread_create
and thread_exit
, let us introduce the last function you would implement for multithreading: thread_yield
. In the 3-step execution that we discussed previously, the CPU context would only switch back to the main thread after the child thread exits. However, if the child thread yields, the child thread will give up the CPU control and thus the main thread can resume execution earlier. Below is an example of two threads yielding to each other.
void child(void* arg) {
for (int i = 0; i < 10; i++) {
printf("%s is in for loop i=%d\n\r", arg, i);
thread_yield();
}
}
int main() {
thread_init();
thread_create(child, "Child thread");
for (int i = 0; i < 10; i++) {
printf("Main thread is in for loop i=%d\n\r", i);
thread_yield();
}
thread_exit();
}
In this example, each thread does more printing and, since they yield to each other in the loop, we could see interleaving printings from the two threads. Intuitively, the key of an OS is to insert yields into different tasks so that different tasks can interleave. In P1, we invoke thread_yield
manually in different threads just like this example, while we will see how to insert yields automatically by the CPU in P2. Such difference is why P1 is called cooperative multithreading, while P2 is doing preemptive multithreading.
Now, you can start to implement the TCB and the following four functions. After you finish, you should be able to run the example programs above and see the expected output.
// Initialize the TCB data structures
void thread_init();
// Create a new thread with a 16KB stack
void thread_create(void (*entry)(void *arg), void *arg);
// Yield to another thread (if any)
void thread_yield();
// Terminate the current thread
void thread_exit();
Conditional variable
You can further test your multithreading code by implementing a synchronization primitive called conditional variable, which is widely used in real-world software practice. Consider a conditional variable as a C variable of type struct cv
with two associated operations: wait and signal, shown as cv_wait
and cv_signal
below.
struct cv {
// Design this struct yourself.
};
void cv_wait(struct cv* condition);
void cv_signal(struct cv* condition);
The cv_wait
function puts the current thread into sleep because certain condition is false. When other threads find that the condition becomes true, they will invoke cv_signal
with the same conditional variable in order to wake up the sleeping thread. Below is an example of what the certain condition could be for a conditional variable.
Producer/consumer
A classic problem in computer science is the bounded-buffer producer/consumer problem. In the example below, we have a buffer with a fixed size of 3. A program may spawn many threads, each being a producer or a consumer (i.e., running produce
or consume
as the entry function). Producers will put items into the buffer and consumers remove items. The constraint is that producers need to wait if the buffer is full and consumers need to wait if the buffer is empty. Therefore, we have two conditional variables nonempty
and nonfull
, and the associated conditions are highlighted in the code below.
void* buffer[3];
int count = 0;
int head = 0, tail = 0;
struct cv nonempty, nonfull;
void produce(void* item) {
for (int i = 0; i < 10; i++) {
while (count == 3) cv_wait(&nonfull);
// At this point, the buffer is not full.
buffer[tail] = item;
tail = (tail + 1) % 3;
count += 1;
cv_signal(&nonempty);
}
}
void* consume() {
while (1) {
while (count == 0) cv_wait(&nonempty);
// At this point, the buffer is not empty.
void* result = buffer[head];
head = (head + 1) % 3;
count -= 1;
cv_signal(&nonfull);
}
}
Your job is to implement struct cv
and the two operations, wait and signal. Also, feel free to update your code of the TCB and the thread_*
functions. For example, you may add a new status WAITING
for a thread that is waiting for a conditional variable. And if no thread inovkes cv_signal
on the same conditional variable, your code should not switch back to this WAITING
thread. After you finish, try the producer/consumer code above.
Accomplishments
You have finished reading most of grass/kernel.s
in egos-2000, which is very similar to ctx_switch
. The thread_yield
function is similar to proc_yield
in grass/kernel.c
of egos-2000. You will finish reading grass/kernel.c
after P3, in which we will introduce the other key function in this file called proc_try_syscall
.
TIP
"Cooperative multitasking was the primary scheduling scheme for 16-bit applications employed by Microsoft Windows before Windows 95 and Windows NT, and by the classic Mac OS."
-- the Wikipedia page for cooperative multitasking.