Cooperative Threads
This project helps you understand the following C program.
void child(void* arg) {
printf("Child thread is running.\n\r");
}
int main() {
thread_init();
thread_create(child, NULL);
printf("Main thread is running.\n\r");
thread_exit();
}
You have implemented printf
in P0 and you will now implement functions thread_init
, thread_create
and thread_exit
used by this program. Then you shall see its output:
# Possible output #1
Child thread is running.
Main thread is running.
# Possible output #2
Main thread is running.
Child thread is running.
The two possibilities demonstrate the concepts of threads and concurrency. Consider a thread as executing a function. Initially, there is only one thread running function main
; After main
invokes thread_create
, another thread is created running function child
. Let's call them the main thread and the child thread. Concurrency means that the program does not enforce any ordering between the two threads. Therefore, if the child thread does the printing first, we will observe the first output above. Otherwise, if the main thread does the printing first, we will observe the second possibility.
TIP
We will explain the meaning of cooperative later. The key here is that, everything in P1 will only use the three types of CPU instructions explained in P0 (i.e., integer computation, control transfer and memory access). Again, we have been seriously striving to help you gain a precise understanding of OS concepts (e.g., threads) by connecting the concepts with the exact CPU instructions.
Get started
You will start from the thread
branch of egos-2000 which provides printf
and _sbrk
for malloc
just like P0. Feel free to incorporate your P0 solutions in thread.c
for debug printing. thread.c
has also listed all the functions that you need to implement in P1.
In addition, queue.h
and queue.c
are added which sketch some code for the queue data structure and can be very useful when you implement the functions in thread.c
. You can decide yourself whether to complete the code in queue.c
and use the code in thread.c
. Completing the code in queue.c
is not mandatory.
Lastly, an important source file is added, namely context.s
, which contains two important helper functions ctx_start
and ctx_switch
:
ctx_start:
addi sp,sp,-128
SAVE_ALL_REGISTERS
sw sp,0(a0)
mv sp,a1
call ctx_entry
ctx_switch:
addi sp,sp,-128
SAVE_ALL_REGISTERS
sw sp,0(a0)
mv sp,a1
RESTORE_ALL_REGISTERS
addi sp,sp,128
ret
The SAVE_ALL_REGISTERS
and RESTORE_ALL_REGISTERS
are assembly macros with simple definitions in context.s
. Next, we explain how to use ctx_start
to start the execution of a thread and use ctx_switch
to transfer the control flow to a thread that has already been executing. After the explanations, you should be able to implement all the functions used by the multithreaded program above.
and then run programs with multiple threads.
Context switch
We explain the usage of ctx_start
and ctx_switch
using one possible execution of the program shown at the beginning:
- The main thread calls
thread_create
; - The child thread starts execution, does the printing, and exits;
- The main thread returns from
thread_create
, does the printing, and exits.
The picture below illustrates the memory layout during the three steps, which contain the four types of memory regions explained in P0 (i.e., code, data, heap and stack).
What is context?
In step #1, thread_create
is preparing to launch the child thread, and an important part of such preparation is to allocate a piece of memory for the stack of the child thread, shown as the shaded region. Therefore, we now have two stacks in the memory which is the key feature of multithreaded programs: one stack for each thread.
The sp
in the picture indicates the stack pointer. At step #1, the stack pointer points to the stack of the main thread, and the program counter points to the code of thread_create()
. Therefore, we say that the CPU is in the context of the main thread. The move from step #1 to step #2 is an example of context switch: the stack pointer now points to the stack of the child thread and the program counter points to printf
. Therefore, in step #2, the CPU is in the context of the child thread.
Control flow details
Here are more details about the control flow from step #1 to step #2, and then to step #3.
thread_create
invokesctx_start
, and thectx_start
region in the picture should be 128 bytes (i.e., the first instruction inctx_start
isaddi sp,sp,-128
). This number 128 is32*4
because 32-bit RISC-V defines 32 4-byte registers andSAVE_ALL_REGISTERS
is saving the value of all these registers on the stack.- The
sw sp,0(a0)
inctx_start
saves the 4-byte value of registersp
(i.e., thesaved
in the picture) at memory addressa0
, the first argument ofctx_start
. The purpose of savingsp
at this point is to allow us to switch the CPU context back later. - The
mv sp,a1
inctx_start
moves the stack pointer to a memory address specified bya1
, the second argument ofctx_start
, shown asinit sp
in the picture. call ctx_entry
invokes thectx_entry
function inthread.c
. Intuitively,ctx_entry
should find out that the child thread is the one to spawn and therefore it invokes functionchild
. Afterchild
returns,ctx_entry
should invokethread_exit
.- Lastly,
thread_exit
invokesctx_switch
which switches the CPU context back to the main thread. The assembly code ofctx_switch
is similar toctx_start
, so try to figure out yourself what is happening duringctx_switch
. - After
ctx_start
andthread_create
both return, the main thread continues to invokeprintf
and does the printing as shown in step #3 of the picture.
A few details are still missing: How does thread_create
decide the two arguments that it should pass to ctx_start
? How does ctx_entry
figure out that child
is the function to invoke? These questions lead to the concept of Thread Control Block (TCB).
Thread control block
Thread control block is a data structure maintaining the metadata of each thread, including
- thread id: a unique identifier of a thread
- status: the thread’s current execution status
- stack pointer: the saved value of register
sp
before switching to another thread - entry function: the “main” function of a thread (e.g.,
child()
for the child thread) - entry function arguments: the arguments for the thread’s entry function
- other metadata that could be useful
You can implement the thread control block as follows.
struct thread {
int id;
void* sp;
... // Design the rest of struct thread yourself.
};
struct thread TCB[32]; // Represent TCB as a simple array. Alternatively,
// you can do `queue_t TCB;` and use the queue data
// structure functions defined in queue.c.
int current_idx; // TCB[current_idx] is the currently running thread.
The code below shows how to use the TCB for ctx_start
in thread_create
.
char* child_stack = malloc(16 * 1024);
ctx_start(&TCB[current_idx].sp, child_stack + 16 * 1024);
TCB[current_idx].sp
, as a pointer in a 32-bit program, occupies 4 bytes in the memory. &TCB[current_idx].sp
is the address of these 4 bytes, and the sw sp,0(a0)
instruction in ctx_start
stores the 4 bytes of register sp
into TCB[current_idx].sp
. Unlike int*
which stores the address of an integer, void*
is used when we simply need to store a 32-bit memory address without any type information. In this case, we simply need to store the address shown by the text saved
in step #2 of the picture above.
Note that malloc
returns the lowest address of the allocated memory, so child_stack + 16 * 1024
gives us the highest address shown as init sp
in the picture. As the second argument of ctx_start
, this value is stored in register a1
and thus mv sp,a1
moves the stack pointer from the main thread's stack to the child thread's stack.
Try to figure out yourself how to store and use the entry function in struct thread
. You will need the concept of function pointer in C. In short, a function pointer is simply 4 bytes storing the address of a function's first instruction in the code memory region.
Yield
Before you start to implement functions thread_init
, thread_create
and thread_exit
, let us introduce the last function you would implement for multithreading, thread_yield
. In the 3-step execution we discussed above, the CPU context would only switch back to the main thread after the child thread exits. However, if the child thread yields, the child thread would give up the CPU control and the main thread could resume execution sooner. Here is an example of two threads yielding to each other.
void child(void* arg) {
for (int i = 0; i < 10; i++) {
printf("%s is in for loop i=%d\n\r", arg, i);
thread_yield();
}
}
int main() {
thread_init();
thread_create(child, "Child thread");
for (int i = 0; i < 10; i++) {
printf("Main thread is in for loop i=%d\n\r", i);
thread_yield();
}
thread_exit();
}
In this example, both threads do more printing and, because they yield to each other in the loop, we could see interleaving printings from the two threads. Intuitively, the key of an OS is to insert yields into different tasks so that different tasks can interleave. In P1, we invoke thread_yield
explicitly in different threads just like the code above and, in P2, we will see how the CPU forces a thread to yield without the thread yielding voluntarily. This difference is why P1 is called cooperative threading while P2 is doing preemptive threading.
You can now start to implement the TCB and the following functions in thread.c
. Note that struct thread
is defined in thread.h
which also contains detailed descriptions of these functions. After you finish, you should be able to run the multithreaded programs above and observe the expected output.
void ctx_entry();
void thread_init();
void thread_create(void (*entry)(void *arg), void *arg);
void thread_yield();
void thread_exit();
TIP
When all the threads have terminated and therefore no thread from the TCB could run, your code should call _end()
which is defined in thread.s
. Your code should also free the stack memory of a thread after this thread terminates. This can be tricky because you should not free the stack memory of the currently running thread when this thread still needs its stack for thread_exit()
and ctx_switch()
. Moreover, the main thread's stack is not from calling malloc()
, so it should not be freed (i.e., when the main thread terminates, do nothing to its stack memory).
Conditional variable
In a multithreaded program, threads may need to wait until certain condition becomes true. For example, in your web browser, a thread that renders the webpage on the screen needs to wait until enough network packets for the webpage have been received. Such waiting is typically handled by a synchronization primitive called conditional variable which is widely used in real-world software practice. Consider a conditional variable as a C variable of type struct cv
used by multiple threads with 3 operations:
struct cv {
/* Define this data structure yourself */
};
void cv_wait(struct cv* condition);
void cv_signal(struct cv* condition);
void cv_broadcast(struct cv* condition);
Let cond
denote a condition that some threads need to wait for and let cond_cv
denote a conditional variable defined in the program for cond
. A thread could wait for this condition using a loop while(!cond) cv_wait(&cond_cv);
. After such a loop, the thread can assume that condition cond
is true and thus proceed with its work. If cond
is true right before the loop, the thread will skip the loop without calling cv_wait
.
If cond
is false before the loop, the thread will call cv_wait
which makes the thread yield. And if no other thread calls cv_signal
or cv_broadcast
on &cond_cv
, the CPU context will never switch back to the waiting thread. The point is that, if another thread finds cond
to become true some time later, this other thread can call cv_signal
or cv_broadcast
on &cond_cv
in order to wake up the waiting thread.
Technically, struct cv
should maintain an array or queue of struct thread
just like the TCB. Upon cv_wait
, the struct thread
of the currently running thread is removed from the TCB and added into the conditional variable. This thread will never be switched back to until its struct thread
is added back to the TCB. To this end, cv_signal
will remove one struct thread
from a conditional variable and add it back to the TCB. And cv_broadcast
does the same for all struct thread
from a conditional variable.
In this project, we only ask you to implement cv_wait
and cv_signal
so that you can run the following multithreaded program.
Producer-consumer
A classic problem in computer science is the bounded-buffer producer-consumer problem. In the example below, we have a buffer with a fixed size of 3. A program may spawn many threads, each being a producer or a consumer (i.e., running produce
or consume
as the entry function). Producers will put items into the buffer and consumers remove items. The constraint is that producers need to wait if the buffer is full and consumers need to wait if the buffer is empty. Therefore, we have two conditional variables nonempty
and nonfull
, and the associated conditions are highlighted in the code below.
void* buffer[3];
int count = 0;
int head = 0, tail = 0;
struct cv nonempty, nonfull;
void produce(void* item) {
for (int i = 0; i < 10; i++) {
while (count == 3) cv_wait(&nonfull);
// At this point, the buffer is not full.
buffer[tail] = item;
tail = (tail + 1) % 3;
count += 1;
cv_signal(&nonempty);
}
}
void* consume() {
while (1) {
while (count == 0) cv_wait(&nonempty);
// At this point, the buffer is not empty.
void* result = buffer[head];
head = (head + 1) % 3;
count -= 1;
cv_signal(&nonfull);
}
}
Your job is to implement struct cv
and the two operations, wait and signal. You may need to modify the code for TCB and the thread_*
functions (e.g., if the current thread invokes thread_yield()
in cv_wait()
, do not add its struct thread
back to the TCB). However, you should not modify the function interface defined in thread.h
. After you finish, you can try the above code and add some printing to see what happened.
Accomplishments
You have finished reading most of grass/kernel.s
in egos-2000 which is very similar to ctx_switch
. The proc_yield
function in grass/kernel.c
is similar to thread_yield
. You will read and modify proc_yield
in P2 and finish reading the rest of grass/kernel.c
in P3. Indeed, you will finish reading everything under grass
after P3.
TIP
Here is a quote from the Wikipedia page for cooperative multitasking, "Cooperative multitasking was the primary scheduling scheme for 16-bit applications employed by Microsoft Windows before Windows 95 and Windows NT, and by the classic Mac OS."