Skip to content

Virtual Memory

An important goal of virtual memory is to make sure that a user process can only access its own memory. In particular, it should not be allowed to access the kernel memory or memory of another process. In P3, you have learned about PMP and hence we start by introducing a simple implementation of virtual memory that combines PMP with a scheme called software TLB which takes only 20 lines of code without any special CPU support.

This simple implementation helps you understand precisely the interface and functionalities of virtual memory. We then introduce page table translation, a more complex but also more widely-used CPU support for virtual memory. Your job is to implement virtual memory using page table translation and provide the same functionalities as PMP + software TLB.

TIP

We will use process instead of thread throughout this project. Typically, threads can access the memory of each other just like what you have implemented in P1. In contrast, a process typically cannot access the memory of another process and such isolation is enforced by virtual memory. This project will demonstrate such difference precisely with page table translation.

Software TLB

Consider two processes and they wish to use the same memory region. For example, all the processes in egos-2000 use 0x80800000 as the top of their stack (i.e., APPS_STACK_TOP in library/egos.h). Virtual memory means that the operating system creates an illusion to a process that it can exclusively use the memory below 0x80800000 for its stack.

For example, suppose two processes both use the load instruction with 0x807FFFFC as the address to read the highest 4 bytes of their stack. To provide the illusion, this 0x807FFFFC will be translated into two different addresses when accessing the memory. In other words, even though the two processes execute exactly the same instruction, the CPU can actually read different parts of the memory and return different 4 bytes. This makes sense because it is likely that different processes put different 4 bytes at the top of their stack.

The illusion here is that a process can only see that it reads address 0x807FFFFC while the operating system knows the translated address being used to actually access the memory. We call 0x807FFFFC the virtual address in this case and, for every process, it is translated into a physical address for the actual memory access.

Address translation

Let's read a data structure that records the translation from virtual to physical addresses.

c
/* library/egos.h */
#define PAGE_SIZE         4096
#define RAM_END           0x81000000
#define APPS_PAGES_BASE   0x80800000

/* earth/cpu_mmu.c */
#define APPS_PAGES_CNT (RAM_END - APPS_PAGES_BASE) / PAGE_SIZE
struct page_info {
    int  use;      /* Is this page free or allocated? */
    int  pid;      /* Which process owns this page? */
    uint vpage_no; /* Which virtual page of pid maps to this physial page? */
} page_info_table[APPS_PAGES_CNT];

In egos-2000, the 8MB region of [APPS_PAGES_BASE, RAM_END) holds the code, data, heap and stack of all processes. Therefore, if a process reads a virtual address like 0x807FFFFC, it will be translated into some physical address in this 8MB region. This 8MB is splitted into 2048 pages (i.e., APPS_PAGES_CNT=2048) and each page is 4KB (i.e., PAGE_SIZE).

There is a struct page_info for each of the 2048 pages recording translation information. Suppose the operating system allocates page #6 (i.e., [0x80806000, 0x80807000)) for the stack top of a process with pid=7.

  • The virtual address region [0x807FF000, 0x80800000) of this process directly translates to physical address region [0x80806000, 0x80807000). In other words, 0x807FFXXX will be translated to physical address 0x80806XXX for any XXX in range [0, PAGE_SIZE).

  • To record such translation in the data structure, page_info_table[6].pid should be set to 7 and page_info_table[6].vpage_no should be set to 0x807FF. We call 0x807FF the virtual page number, call 0x80806 the physical page number, call 6 the physical page ID, and lastly call XXX the offset within the page.

TIP

To make sure you understand, here is a simple exercise. Suppose the operating system allocates page #1995 out of the 2048 pages for the stack top of a process with pid=12. And suppose this process reads virtual address 0x807FF234 in it stack. What is the virtual page number, physical page number, physical page ID, and offset respectively for the address translation?

You should now have a concrete understanding of how the same virtual address (e.g., one in the stack) for different processes can be translated into different physical addresses.

The mmu_map interface

The mmu_map interface in struct earth is responsible for recording such translations and soft_tlb_map is an implementation of this interface. The translation example above can be done by calling earth->mmu_map(7, 0x807FF, 6).

c
/* library/egos.h */
struct earth {
    ...
    void (*mmu_map)(int pid, uint vpage_no, uint ppage_id);
    ...
};

/* earth/cpu_mmu.c */
void soft_tlb_map(int pid, uint vpage_no, uint ppage_id) {
    page_info_table[ppage_id].pid      = pid;
    page_info_table[ppage_id].vpage_no = vpage_no;
}
...
earth->mmu_map = soft_tlb_map;  /* when earth->translation is SOFT_TLB */
...

Next, let's read how earth->mmu_map is used in library/elf/elf.c where the operating system initializes the memory for every new process. Intuitively, when spawning a process, the operating system needs to allocate some pages from the 2048 for the code, data, heap and stack of the new process, leading to the mmu_alloc function.

c
/* earth/cpu_mmu.c */
uint mmu_alloc() {
    for (uint i = 0; i < APPS_PAGES_CNT; i++)
        if (!page_info_table[i].use) {
            page_info_table[i].use = 1;
            return i;
        }
    FATAL("mmu_alloc: no more free memory");
}

/* library/elf/elf.c */
void elf_load(int pid, elf_reader reader, int argc, void** argv) {
   ...
   ppage_id = earth->mmu_alloc();
   earth->mmu_map(pid, VPAGE_NO, ppage_id);
   /* This pattern appears 5 times in the code for different VPAGE_NO */
   ...
}

The mmu_alloc function returns a physical page ID. VPAGE_NO stands for the virtual page number. For code, data and heap, VPAGE_NO starts at 0x80400 and the number of pages allocated/mapped depends on the size of the application binary. For arguments of main() and system call arguments, two pages are allocated/mapped with VPAGE_NO 0x80600 and 0x80601. For stack, we map two pages with VPAGE_NO being 0x807FE and 0x807FF, and we assume that processes use at most 8KB of stack memory just for simplicity. The picture below illustrates elf_load for an application with 4 pages of code, data and heap.

Failed to load picture

Note that the "physical pages" area simply shows one possibility of how mmu_alloc picks the physical page ID out of [0, 2048) for each of the 8 page allocation.

TIP

There are a few other places referencing these numbers. The ORIGIN = 0x80400000 in library/elf/app.lds indicates that the compiler puts the application binary code starting at virtual page 0x80400. The li sp,0x80800000 in apps/app.s indicates that all applications use virtual page 0x807FF for the top of stack. The APPS_ARG and SYSCALL_ARG in library/egos.h corresponds to the virtual pages 0x80600 and 0x80601 respectively.

The mmu_switch interface

As you may have noticed, proc_yield invokes earth->mmu_switch for the process which runs next. And soft_tlb_switch is a simple implementation of earth->mmu_switch. When switching from process A to process B, soft_tlb_switch does two simple things.

First, it copies all the pages mapped for process A back to the buffer of 2048 pages.

Failed to load picture

Second, it copies all the pages mapped for process B back to [0x80400000, 0x80800000).

Failed to load picture

Note that, in the two pictures above, all the areas indicate physical memory. In other words, we use the 12MB physical memory region [0x80400000, 0x81000000) to create an illusion that every process owns memory region [0x80400000, 0x80800000) exclusively.

Whenever proc_yield picks a process to run next, soft_tlb_switch makes sure that the process can access its own bytes when invoking a load or store instruction with an address in region [0x80400000, 0x80800000). Moreover, recall that you could set a PMP region for [0x80400000, 0x80800000) in P3. As a result, if a process is running in the user mode, this process cannot access the memory pages of other processes which, as shown above, have been kept in memory region [0x80800000, 0x81000000). This achieves the important goal of memory isolation described at the beginning of this handout.

Function soft_tlb_switch has two simple loops which query page_info_table. Read this function yourself and make sure that you can connect the code with the pictures above.

Page table translation

You have seen an implementation of virtual memory using PMP and software TLB (i.e., the memory copying conducted by soft_tlb_switch). However, frequent memory copying is the performance bottleneck of many programs including operating systems. To avoid such bottleneck, we introduce page table translation, the CPU support leading to a more efficient implementation of the virtual memory interface (mmu_map and mmu_switch).

Take the time and read Chapter 12.1.11 and 12.3 of the RISC-V Instruction Set Manual. These chapters describe a page table translation scheme in RISC-V called Sv32. Next, we will only sketch the key concepts of Sv32 and give an example of translation.

Root and leaf page tables

In Sv32, a page table holds 1024 entries where each entry holds 4 bytes. Therefore, a page table occupies exactly 4KB of memory (i.e., one page). Page tables further form a two-level tree data structure with one root page table and several leaf page tables. The picture below shows an example of such tree data structure in the memory with 3 page tables.

Failed to load picture

When tracing this tree data structure, the CPU first reads a CSR called satp which stores the address of the root page table. The picture shows a case where 1022 entries in the root page table are empty and 2 entries are non-empty, holding the address of a leaf page table. When we say "the address" of a page table, we mean the physical address which the CPU actually uses to access the memory instead of a virtual address belonging to a process.

Translation details

On the high-level, the operating system is in charge of allocating the pages for page tables, initializing the page table entries, and writing to the satp CSR. When a process is running and reading/writing the memory, the CPU is in charge of reading satp and the page tables in order to translate memory addresses.

Below is an example of how the operating system could initialize the page tables such that the CPU will translate virtual address 0x807F_EXXX into physical address 0x808A_DXXX for a process. Again, the XXX here stands for any offset in range [0, PAGE_SIZE).

Failed to load picture

Initialize the satp CSR

As the first step, the operating system invokes mmu_alloc and allocates a page for the root page table. As shown above, this page is 0x80EF_1000 (i.e., the return value of mmu_alloc is 0x80EF1-0x80800, the physical page ID). Given the physical page number 0x80EF1, we can write value 0x80080EF1 into the satp CSR in order to help the CPU find this root page table, according to Figure 63 of the RISC-V Manual.

Failed to load picture

Note that the MODE bit is set to 1 meaning that page table translation is enabled. The ASID field is for performance optimization which we can ignore for now. Lastly, we put 0x80EF1 into the PPN field of satp.

TIP

You may have noticed that 0x80EF1 has 20 bits but the PPN field of satp holds 22 bits. You can explore yourself about why PPN holds 22 bits in 32-bit RISC-V.

Initialize the root page table

As the second step, we need to find an entry in the root page table and initialize it using the address of the leaf page table. To find the entry, read Figure 65 of the RISC-V Manual.

Failed to load picture

Specifically, we wish to translate virtual address 0x807F_EXXX and, according to Figure 65, VPN[1] in address 0x807F_EXXX is 0b1000000001 or 0x201 (i.e., the most significant ten bits). Therefore, in the picture above, we see that entry #513 from the 1024 in the root page table is used to translate address 0x807F_EXXX. To see how this entry should be initialized, read Figure 66 & 67 of the RISC-V Manual.

Failed to load picture

Note that the leaf page table is at physical address 0x80B1_5000, so the PPN is 0x80B15. According to Figure 67, we should use 0x80B15 as bit#10 to bit#31 of the page table entry, so we can set the page table entry as 0x202C541F (i.e., (0x80B15<<10) | 0x1F) where the 0x1F here means that we are setting the V, R, W, X and U bits to 1. Setting these bits to 1 means that we allow code running in the user mode to read, write or execute this page of memory after translation. We can now use page tables to control memory access instead of PMP. To summarize, we have done the following things by far.

c
asm("csrw satp, %0" ::"r"(0x80080EF1));
uint *root = (uint*)0x80EF1000; /* decided by earth->mmu_alloc */
root[513]  = 0x202C541F;

The CPU can now find the leaf page table when translating virtual address 0x807F_EXXX.

Initialize the leaf page table

Figure 65 also tells us that the VPN[0] of 0x807F_EXXX is 0b1111111110 or 0x3FE. As a result, we will initialize entry #1022 (i.e., 0x3FE) of the leaf page table and, intuitively, we want to initialize this entry with physical address 0x808A_D000 to complete the translation. According to Figure 67, we should use the PPN 0x808AD as bit #10 to bit #31 of this entry. We thus write 0x2022B41F (i.e., (0x808AD<<10) | 0x1F) into entry #1022.

c
uint *leaf = (uint*)0x80B15000; /* decided by earth->mmu_alloc */
leaf[1022] = 0x2022B41F;

At this point, the operating system has finished the initialization such that the CPU will now translate virtual address 0x807F_EXXX into physical address 0x808A_DXXX.

Get started

You can now use page table translation to implement virtual memory in earth/cpu_mmu.c. Like before, start from a fresh copy of egos-2000.

Identity mapping

Read the setup_identity_region and pagetable_identity_map functions which setup an identity mapping. Identity mapping means that a virtual address is translated into a physical address of the same value (e.g., map 0x807F_EXXX to 0x807F_EXXX). These two functions help you understand how to setup page table entries with C code.

Then read the page_table_map and page_table_switch functions. You can see that they simply call soft_tlb_map and soft_tlb_switch. Hence, when you choose "page tables" as shown below, egos-2000 will enable page table translation with an identity mapping, and still use software TLB to conduct address translation.

shell
> cd egos-2000
> make qemu
...
[CRITICAL] Choose a memory translation mechanism:
Enter 0: page tables
Enter 1: software TLB
[INFO] Page table translation is chosen
...

With identity mapping, the CPU will conduct page table translations in a pretty useless way. Still, if we make mistakes in the identity mapping code, there could be memory exceptions after page table translation starts to take effect. In particular, page table translation starts to take effect in egos-2000 after the mret in grass/init.c.

c
uint mstatus, M_MODE = 3, U_MODE = 0;
uint GRASS_MODE = (earth->translation == SOFT_TLB) ? M_MODE : U_MODE;
asm("csrr %0, mstatus" : "=r"(mstatus));
mstatus = (mstatus & ~(3 << 11)) | (GRASS_MODE << 11);
asm("csrw mstatus, %0" ::"r"(mstatus));
...
asm("mret");

Code running in the machine mode always accesses the memory with physical addresses. If earth->translation == PAGE_TABLE, the CPU will switch to user mode after mret and run the code of the first process, GPID_PROCESS. Since we have enabled Sv32 in the satp CSR and page table translation certainly takes effects in user mode, page table translation starts to take effect right after this mret. In general, all processes should run in user mode with page table translation taking effect, and the kernel should always run in machine mode using physical addresses.

TIP

There actually exists a third privilege mode called supervisor mode (S_MODE = 1) and the s in satp stands for supervisor mode, just like the m in mstatus stands for machine mode. Code in machine mode can access the CSRs for supervisor mode, but not vice versa. We are not using this supervisor mode because of simplicity -- you only need to reason about two privilege modes.

Page tables per process

After understanding the existing code for page table translation in egos-2000, your job is to create one set of page tables for every process. Read the comments in page_table_map. When invoking page_table_map for process pid for the first time, you should initialize the root and leaf page tables for pid. For system processes, this means that you should setup an identity mapping just like before. For user processes, setup an identity mapping only for page SYSCALL_ARG + PAGE_SIZE which is the address of workdir in apps/app.h. All the processes in egos-2000 share the same work directory, which is the directory shown in the shell. Save the address of the root page table in pid_to_pagetable_base[pid], and modify the page table entries based on the arguments of page_table_map.

In addition, you need to update function page_table_switch by modifying the satp CSR with the address of the root page table of process pid. Recall that this function is invoked by proc_yield for context switch. Lastly, implement the page_table_translate function which returns the physical address mapped from vaddr for process pid.

It is possible to reuse functions setup_identity_region and pagetable_identity_map as is, but feel free to modify them if you find it necessary. After you finish, you should be able to boot up egos-2000 normally and run commands like cat README or echo Hello!.

TIP

In real-world operating systems like Linux, the difference between process and thread is more on the conceptual level instead of the code level. Specifically, if two processes share the same root page table, we can call them two threads within one process since they share the memory.

Kill malicious applications

Recall that, in P3, your code for memory protection can prevent malicious applications like crash1 and crash2 from halting the operating system. You can do the same using page table translation. Terminate the current process in excp_entry() of grass/kernel.c if it raises an exception, and make sure that the OS proceeds normally after crash1/crash2.

> make qemu
...
[CRITICAL] Choose a memory translation mechanism:
Enter 0: page tables
Enter 1: software TLB
[INFO] Page table translation is chosen
...
[CRITICAL] Welcome to the egos-2000 shell!
➜ /home/yunhao crash1
_sbrk: heap grows too large
[INFO] process 6 terminated with exception 15
➜ /home/yunhao crash2
[INFO] process 7 terminated with exception 15
...

Memory overhead

We have never talked about how to free page tables when a process terminates. The key is that the proc_free function in grass/process.c invokes earth->mmu_free(pid).

For all page tables allocated for process pid, if we set the pid field of the corresponding page_info_table entries correctly, the page tables should be freed by mmu_free just like the other pages of the terminated process. Nothing special needs to be done.

However, given that page table translation incurs memory overhead, we ask you to print out such overhead in mmu_free as follows.

> make qemu
...
➜ /home/yunhao echo Hello, World!
Hello, World!
[INFO] mmu_free released 11 pages (2 page tables) for process 6

Make sure you can explain the numbers that you print out in mmu_free.

Accomplishments

You have finished reading library/elf and earth/cpu_mmu.c in egos-2000. Through the code, you have gained a precise understanding of how to create the virtual memory illusion for different processes. You have also learned the difference of thread and process which is closely related to virtual memory (specifically, the root page table).

"... any person ... any study."