Virtual Memory
An important goal of virtual memory is to make sure that a user process can only access its own memory. In particular, it should not be allowed to access the kernel memory or the memory of other processes. In P3, you have learned about PMP, which could be a building block for virtual memory. We thus start by introducing a simple implementation of virtual memory that combines PMP with a software TLB, a scheme that takes only 20 lines of code.
This simple implementation helps you understand precisely the interface and functionalities of virtual memory. We then introduce page table translation, a more complex but more widely used form of CPU support for virtual memory. Your job is to implement virtual memory using page table translation and provide the same functionalities as PMP + software TLB.
TIP
We will use process instead of thread throughout this project. Typically, threads can access each other's memory, just like the threads you implemented in P1. In contrast, a process typically cannot access the memory of another process, and this isolation is enforced by virtual memory. This project will demonstrate the difference precisely using concrete data structures.
Software TLB
Consider two processes that wish to use the same memory region. For example, all the processes in egos-2000 use 0x80800000 as the top of their stack (i.e., APPS_STACK_TOP in library/egos.h). Virtual memory means that the operating system creates an illusion for each process that it can exclusively use the memory below 0x80800000 for its stack.
For example, suppose two processes both use the load instruction with 0x807FFFFC as the address to read the highest 4 bytes of their stack. To provide the illusion, this 0x807FFFFC will be translated into two different addresses when accessing the memory. In other words, even though the two processes execute exactly the same instruction, the CPU actually reads different parts of the memory and returns different 4 bytes. This makes sense because different processes likely put different 4 bytes at the top of their stack.
The illusion here is that a process only sees that it reads address 0x807FFFFC, while the operating system knows the translated address actually used to access the memory. We call 0x807FFFFC the virtual address in this case and, for every process, it is translated into a physical address for the actual memory access.
Address translation
Let's read a data structure that records the translation from virtual to physical addresses.
/* library/egos.h */
#define PAGE_SIZE       4096
#define RAM_END         0x81000000
#define APPS_PAGES_BASE 0x80800000

/* earth/cpu_mmu.c */
#define APPS_PAGES_CNT (RAM_END - APPS_PAGES_BASE) / PAGE_SIZE

struct page_info {
    int  use;       /* Is this page free or allocated? */
    int  pid;       /* Which process owns this page? */
    uint vpage_no;  /* Which virtual page of pid maps to this physical page? */
} page_info_table[APPS_PAGES_CNT];
In egos-2000, the 8MB region of [APPS_PAGES_BASE, RAM_END) holds the code, data, heap and stack of all processes. Therefore, if a process reads a virtual address like 0x807FFFFC, it will be translated into some physical address in this 8MB region. This 8MB region is split into 2048 pages (i.e., APPS_PAGES_CNT=2048) and each page is 4KB (i.e., PAGE_SIZE).
There is a struct page_info for each of the 2048 pages recording translation information. Suppose the operating system allocates page #6 (i.e., [0x80806000, 0x80807000)) for the stack top of a process with pid=7.
The virtual address region [0x807FF000, 0x80800000) of this process directly translates to physical address region [0x80806000, 0x80807000). In other words, 0x807FFXXX will be translated to physical address 0x80806XXX for any XXX in range [0, PAGE_SIZE). To record such a translation in the data structure, page_info_table[6].pid should be set to 7 and page_info_table[6].vpage_no should be set to 0x807FF. We call 0x807FF the virtual page number, 0x80806 the physical page number, 6 the physical page ID, and lastly XXX the offset.
TIP
To make sure you understand, here is a simple exercise. Suppose the operating system allocates page #1995 out of the 2048 pages for the stack top of a process with pid=12, and suppose this process reads virtual address 0x807FF234 in its stack. What are the virtual page number, physical page number, physical page ID, and offset for this address translation?
You should now have a concrete understanding of how the same virtual address (e.g., one in the stack) for different processes can be translated into different physical addresses.
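Neither the CPU nor the operating system performs a per-access lookup in the software TLB scheme (as explained later, the translation is realized by copying pages). Still, the following hypothetical helper, which is not part of egos-2000, illustrates the arithmetic relating a virtual address to a physical address through page_info_table:

/* Hypothetical helper (not in egos-2000): find the physical address
   corresponding to virtual address "va" of process "pid". */
uint translate(int pid, uint va) {
    uint vpage_no = va / PAGE_SIZE;  /* e.g., 0x807FF for va = 0x807FFFFC */
    uint offset   = va % PAGE_SIZE;  /* e.g., 0xFFC */
    for (uint i = 0; i < APPS_PAGES_CNT; i++)
        if (page_info_table[i].use && page_info_table[i].pid == pid &&
            page_info_table[i].vpage_no == vpage_no)
            /* Physical page ID i starts at APPS_PAGES_BASE + i * PAGE_SIZE. */
            return APPS_PAGES_BASE + i * PAGE_SIZE + offset;
    FATAL("translate: virtual address is not mapped for this process");
}

With the earlier example (page #6 holding virtual page 0x807FF of pid=7), translate(7, 0x807FFFFC) would return 0x80806FFC.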
The mmu_map interface
The mmu_map interface in struct earth is responsible for recording such translations, and soft_tlb_map is an implementation of this interface. The translation example above can be done by calling earth->mmu_map(7, 0x807FF, 6).
/* library/egos.h */
struct earth {
    ...
    void (*mmu_map)(int pid, uint vpage_no, uint ppage_id);
    ...
};

/* earth/cpu_mmu.c */
void soft_tlb_map(int pid, uint vpage_no, uint ppage_id) {
    page_info_table[ppage_id].pid      = pid;
    page_info_table[ppage_id].vpage_no = vpage_no;
}
...
earth->mmu_map = soft_tlb_map; /* when earth->translation is SOFT_TLB */
...
Next, let's read how earth->mmu_map is used in library/elf/elf.c, where the operating system initializes the memory for every new process. At a high level, when spawning a process, the operating system needs to allocate some pages out of the 2048 for the code, data, heap and stack of the new process, which leads to the mmu_alloc function.
/* earth/cpu_mmu.c */
uint mmu_alloc() {
    for (uint i = 0; i < APPS_PAGES_CNT; i++)
        if (!page_info_table[i].use) {
            page_info_table[i].use = 1;
            return i;
        }
    FATAL("mmu_alloc: no more free memory");
}

/* library/elf/elf.c */
void elf_load(int pid, elf_reader reader, int argc, void** argv) {
    ...
    ppage_id = earth->mmu_alloc();
    earth->mmu_map(pid, VPAGE_NO, ppage_id);
    /* This pattern appears 6 times in the code for different VPAGE_NO */
    ...
}
The mmu_alloc function returns a physical page ID, and VPAGE_NO stands for the virtual page number. For code, data and heap, VPAGE_NO starts at 0x80400 and the number of pages allocated/mapped depends on the size of the application binary. For the arguments of main() and system call arguments, two pages are allocated/mapped with VPAGE_NO 0x80600 and 0x80601. For the stack, we map two pages with VPAGE_NO being 0x807FE and 0x807FF, and we assume for simplicity that processes use at most 8KB of stack memory. The picture below illustrates elf_load for an application with 4 pages of code/data/heap.
Note that the "physical pages" area above simply shows one possibility of how mmu_alloc picks the physical page ID out of [0, 2048) for each page allocation.
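For instance, mapping the two stack pages could follow a pattern like the sketch below; the actual code in elf_load differs in variable names and structure:

/* Sketch: allocate and map the two stack pages of process "pid".
   Virtual pages 0x807FE and 0x807FF cover [0x807FE000, 0x80800000). */
for (uint vpage_no = 0x807FE; vpage_no <= 0x807FF; vpage_no++) {
    uint ppage_id = earth->mmu_alloc();      /* pick any free physical page */
    earth->mmu_map(pid, vpage_no, ppage_id); /* record the translation */
}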
TIP
There are a few other places referencing these numbers. The ORIGIN = 0x80400000 in library/elf/app.lds indicates that the compiler puts the application binary code starting at virtual page 0x80400. The li sp,0x80800000 in apps/app.s indicates that all applications use virtual page 0x807FF for the top of the stack. The APPS_ARG and SYSCALL_ARG in library/egos.h correspond to virtual pages 0x80600 and 0x80601 respectively.
The mmu_switch interface
As you may have noticed, proc_yield invokes earth->mmu_switch for the process which runs next, and soft_tlb_switch is a simple implementation of earth->mmu_switch. When switching from process A to process B, soft_tlb_switch does two simple things.
First, it copies all the pages mapped for process A back to the buffer of 2048 pages. Second, it copies all the pages mapped for process B from that buffer into [0x80400000, 0x80800000).
Note that, in the two pictures above, all the areas indicate physical memory. In other words, we use the 12MB physical memory region [0x80400000, 0x81000000) to create an illusion that every process owns a 4MB virtual memory region [0x80400000, 0x80800000).
Whenever proc_yield picks a process to run next, soft_tlb_switch makes sure that the process can access its own bytes when invoking a load or store instruction with an address in region [0x80400000, 0x80800000). Moreover, recall that you could set a PMP region for [0x80400000, 0x80800000) in P3. As a result, if a process is running in user mode, it cannot access the memory pages of other processes which, as shown above, are kept in memory region [0x80800000, 0x81000000). This achieves the important goal of memory isolation described at the beginning of this project.
The code of soft_tlb_switch contains two simple loops that query the page_info_table array. Read the code and make sure you can relate it to the pictures above.
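To check your understanding against the code, here is a minimal sketch of what the two loops could look like, assuming memcpy is available; the actual soft_tlb_switch in earth/cpu_mmu.c may differ in details, and curr_vm_pid is a hypothetical variable remembering which process currently occupies [0x80400000, 0x80800000):

static int curr_vm_pid = -1; /* hypothetical bookkeeping of the resident process */

void soft_tlb_switch(int pid) {
    if (pid == curr_vm_pid) return;

    /* Copy the pages of the current process from [0x80400000, 0x80800000)
       back to their physical pages in [0x80800000, 0x81000000). */
    for (uint i = 0; i < APPS_PAGES_CNT; i++)
        if (page_info_table[i].use && page_info_table[i].pid == curr_vm_pid)
            memcpy((void*)(APPS_PAGES_BASE + i * PAGE_SIZE),
                   (void*)(page_info_table[i].vpage_no * PAGE_SIZE), PAGE_SIZE);

    /* Copy the pages of process "pid" into [0x80400000, 0x80800000). */
    for (uint i = 0; i < APPS_PAGES_CNT; i++)
        if (page_info_table[i].use && page_info_table[i].pid == pid)
            memcpy((void*)(page_info_table[i].vpage_no * PAGE_SIZE),
                   (void*)(APPS_PAGES_BASE + i * PAGE_SIZE), PAGE_SIZE);

    curr_vm_pid = pid;
}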
Page table translation
You have seen an implementation of virtual memory using PMP and a software TLB (i.e., the memory copying conducted by soft_tlb_switch). However, frequent memory copying is the performance bottleneck of many programs, including operating systems. To avoid this bottleneck, we introduce page table translation, the CPU support that leads to a more efficient implementation of the virtual memory interface, mmu_map and mmu_switch.
Take the time to read Chapters 12.1.11 and 12.3 of the RISC-V Instruction Set Manual. These chapters describe a page table translation scheme in RISC-V called Sv32. Next, we will only sketch the key concepts of Sv32 and give an example of translation.
Root and leaf page tables
In Sv32, a page table holds 1024 entries and each entry holds 4 bytes. Therefore, a page table occupies exactly 4KB of memory (i.e., one page). Page tables further form a two-level tree data structure with one root page table and several leaf page tables. The picture below shows an example of such a tree data structure in memory with 3 page tables.
When tracing this tree data structure, the CPU first reads a CSR called satp which stores the address of the root page table. The picture shows a case where 1022 entries in the root page table are empty and 2 entries are non-empty, each holding the address of a leaf page table. When we say "the address" of a page table, we mean the physical address which the CPU actually uses to access the memory, not a virtual address belonging to a process.
Translation details
At a high level, the operating system is in charge of allocating the pages for page tables, initializing the page table entries, and writing to the satp CSR. When a process is running and reading/writing the memory, the CPU is in charge of reading satp and the page tables in order to translate memory addresses.
Below is an example of how the operating system could initialize the page tables such that the CPU will translate virtual address 0x807F_EXXX into physical address 0x808A_DXXX for a process. Again, the XXX here stands for any offset in range [0, PAGE_SIZE).
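Before walking through the steps, note how Sv32 splits a 32-bit virtual address: the most significant 10 bits are VPN[1], the next 10 bits are VPN[0], and the lowest 12 bits are the offset. A minimal sketch using 0x807FE123 as one concrete instance of 0x807F_EXXX:

uint va     = 0x807FE123;         /* one instance of 0x807F_EXXX */
uint vpn1   = va >> 22;           /* 0x201: index into the root page table */
uint vpn0   = (va >> 12) & 0x3FF; /* 0x3FE: index into the leaf page table */
uint offset = va & 0xFFF;         /* 0x123: copied unchanged into the physical address */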
Initialize the satp CSR
As the first step, the operating system invokes mmu_alloc to allocate a page for the root page table. As shown above, this page is 0x80EF_1000 (i.e., the return value of mmu_alloc is (0x80EF1000-0x80800000)/PAGE_SIZE, the physical page ID). Since the page number of 0x80EF_1000 is 0x80EF1, we can write value 0x80080EF1 into the satp CSR in order to help the CPU find this root page table, according to Figure 63 of the CPU manual.
Note that the MODE bit is set to 1, meaning that page table translation is enabled. The ASID field is a performance optimization which we can ignore for now. Lastly, we put 0x80EF1 into the PPN field of satp.
TIP
You may have noticed that 0x80EF1 has 20 bits but the PPN field of satp holds 22 bits. You can explore a bit about why this is the case.
Initialize the root page table
As the second step, we need to find an entry in the root page table and initialize it using the address of the leaf page table. To find the entry, read Figure 65 of the CPU manual.
Specifically, we wish to translate virtual address 0x807F_EXXX and, according to Figure 65, VPN[1] of address 0x807F_EXXX is 0b1000000001 or 0x201 (i.e., the most significant ten bits). Therefore, in the picture, we see that entry #513 out of the 1024 in the root page table is selected for translating 0x807F_EXXX. To see how to initialize this entry, read Figures 66 and 67 of the CPU manual.
Note that the leaf page table is at physical address 0x80B1_5000, so the PPN is 0x80B15. According to Figure 67, we should use 0x80B15 as bit #10 to bit #31 of the page table entry, so we can set the page table entry to 0x202C541F (i.e., (0x80B15<<10) | 0x1F), where the 0x1F means that we are setting the V, R, W, X and U bits to 1. Setting these bits to 1 means that we allow code running in user mode to read, write or execute this page of memory after translation. We can now use page tables to control memory access instead of PMP. To summarize, we have done the following things so far.
asm("csrw satp, %0" ::"r"(0x80080EF1));
uint *root = (uint*)0x80EF1000; /* decided by earth->mmu_alloc */
root[513] = 0x202C541F;
The CPU can now find the leaf page table when translating virtual address 0x807F_EXXX.
Initialize the leaf page table
Figure 65 also tells us that VPN[0] of 0x807F_EXXX is 0b1111111110 or 0x3FE. As a result, we will initialize entry #1022 (i.e., 0x3FE) of the leaf page table and, intuitively, we want to initialize this entry with physical address 0x808A_D000 to complete the translation. Again, according to Figure 67, we should use the PPN 0x808AD as bit #10 to bit #31 of this page table entry. We thus write 0x2022B41F (i.e., (0x808AD<<10) | 0x1F) into the entry.
uint *leaf = (uint*)0x80B15000; /* decided by earth->mmu_alloc */
leaf[1022] = 0x2022B41F;
At this point, the operating system has finished the initialization and the CPU can now translate virtual address 0x807F_EXXX into physical address 0x808A_DXXX.
Get started
You can now use page table translation to implement virtual memory in earth/cpu_mmu.c. As before, start from a fresh copy of egos-2000.
Identity map and S-mode
Read the setup_identity_region and pagetable_identity_map functions, which set up an identity map. Identity map means that a virtual address is translated into a physical address of the same value (e.g., mapping 0x807F_EXXX to 0x807F_EXXX). These two functions help you understand how to set up page table entries with C code.
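As a rough illustration (not the exact code of setup_identity_region), identity-mapping one page only requires that the PPN written into the leaf entry equals the virtual page number itself; the flag bits below follow the 0x1F example from the translation walkthrough above:

/* Sketch: identity-map virtual page "vpn" in leaf page table "leaf". */
void identity_map_one_page(uint* leaf, uint vpn) {
    leaf[vpn & 0x3FF] = (vpn << 10) | 0x1F; /* PPN == VPN; V, R, W, X, U set */
}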
Then read the page_table_map and page_table_switch functions. You can see that they simply call soft_tlb_map and soft_tlb_switch. In other words, when you choose "page tables" as shown below, egos-2000 will enable page table translation with an identity map, and still use the software TLB to conduct address translation.
> cd egos-2000
> make qemu
...
[CRITICAL] Choose a memory translation mechanism:
Enter 0: page tables
Enter 1: software TLB
[INFO] Page table translation is chosen
...
At this point, the CPU is conducting page table translation, but in a pretty useless way. Still, this can help us debug whether we have entered the correct privilege mode for page table translation. Specifically, you can find the following code in earth/boot.c.
uint mstatus, M_MODE = 3, S_MODE = 1; /* U_MODE = 0 */
uint GRASS_MODE = (earth->translation == SOFT_TLB) ? M_MODE : S_MODE;
asm("csrr %0, mstatus" : "=r"(mstatus));
asm("csrw mstatus, %0" ::"r"((mstatus & ~(3 << 11)) |
(GRASS_MODE << 11) | (1 << 18)));
...
asm("mret");
Recall that you have used mstatus and mret to switch privilege modes in P3. The code here does something similar, but switches the CPU to the supervisor mode (or S-mode). The mret above is the turning point: if "page tables" is chosen in egos-2000, all the code before this mret accesses the physical memory directly and all the code after this mret uses page table translation for memory access. In other words, page table translation only applies to code in S-mode or U-mode, not M-mode.
TIP
This actually triggers a minor issue: timer interrupts will still bring the CPU to M-mode, but the kernel may still wish to enjoy page table translation. The solution is given at the end of grass/kernel.s. Specifically, if we set the mstatus.MPRV bit to 1, the CPU will do page table translation in M-mode for all memory accesses as if the CPU were under privilege mode mstatus.MPP.
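For reference, MPRV is bit 17 of mstatus. A minimal C sketch of setting it would look like the following, although the actual egos-2000 code does this in assembly at the end of grass/kernel.s:

uint mstatus_val;
asm("csrr %0, mstatus" : "=r"(mstatus_val));
asm("csrw mstatus, %0" ::"r"(mstatus_val | (1 << 17))); /* set mstatus.MPRV */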
Page tables per process
After understanding the existing code for page table translation in egos-2000, your job is to create one set of page tables for every process. The comments in page_table_map give detailed guidance. Roughly speaking, you should use the identity map for the kernel and all the system processes while, for user processes, you should only map the very few pages that belong to each process. Therefore, user processes will enjoy address translation while they won't be able to access the memory of the kernel or other processes.
In addition, you need to update the page_table_switch function to modify the satp CSR with the address of the root page table of process pid. Recall that this function is invoked by proc_yield every time the kernel switches the running process.
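For example, assuming you maintain a hypothetical per-process array root_ppn[] holding the physical page number of each root page table (filled in by your page_table_map), page_table_switch could be a sketch like:

/* Sketch: point satp at the root page table of process "pid". */
void page_table_switch(int pid) {
    /* MODE = 1 in bit 31 enables Sv32; root_ppn[pid] goes into the PPN field. */
    asm("csrw satp, %0" ::"r"((1u << 31) | root_ppn[pid]));
}

Depending on your design, you may also need to flush stale translations (e.g., with an sfence.vma instruction) when changing satp.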
TIP
In real-world operating systems like Linux, the difference between process and thread is more at the conceptual level than at the data structure level. Specifically, if two processes share the same root page table, we can call them two threads in one process since they share the memory.
Memory overhead
We have never talked about how to free page tables when a process terminates. The key is that the proc_free function in grass/process.c invokes earth->mmu_free(pid).
For all page tables allocated for process pid, if we set the pid field of the corresponding page_info_table entries correctly, these page tables will be freed by mmu_free just like the other pages of the terminated process. Nothing special needs to be done.
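Conceptually, mmu_free only needs to scan page_info_table and release every page whose pid field matches the terminated process; a minimal sketch (the actual function in earth/cpu_mmu.c may differ):

/* Sketch: release all pages, including page tables, owned by "pid". */
void mmu_free(int pid) {
    for (uint i = 0; i < APPS_PAGES_CNT; i++)
        if (page_info_table[i].use && page_info_table[i].pid == pid)
            page_info_table[i].use = 0; /* the page is free for mmu_alloc again */
}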
However, given that page table translation incurs memory overhead, we ask you to print out this overhead in mmu_free as follows.
> /home/yunhao echo Hello, World!
Hello, World!
[INFO] mmu_free released 11 pages (2 page tables) for process 6
Make sure you can explain the numbers that you print out in mmu_free.
Helper functions
Most of your modifications should be in the four functions we mentioned: page_table_map, page_table_switch, pagetable_identity_map and setup_identity_region. However, it can certainly be useful to write helper functions like the one below. In general, feel free to modify or add anything in earth/cpu_mmu.c.
/* This function walks the page table. Specifically, it returns the
   **address** of the entry in the root page table that is used when
   translating virtual address "va".
   If alloc is non-zero, allocate the pages if necessary.
   If alloc is zero, FATAL when the entry cannot be found. */
uint* walk(uint* root, uint* va, int alloc);
Accomplishments
You have finished reading library/elf and earth/cpu_mmu.c in egos-2000. Through the code, you have gained a precise understanding of how to create the virtual memory illusion for different processes. You have also learned the difference between a thread and a process, which is closely related to virtual memory (specifically, the root page table).