Hello, World! ​
This project helps you understand the following C program.
int main() {
printf("Hello, World!");
return 0;
}
Specifically, you will learn about 3 types of CPU instructions (integer computation, control transfer, memory access) and 4 types of memory regions (code, data, heap, stack), which are fundamental to every computer program.
Prerequisites: an entry-level systems course equivalent to Cornell CS3410 or CMU 15-213 is typically required before learning OS. This project aims at revisiting the concepts of CPU instructions and memory regions instead of taking the place of an entry-level course.
Get started ​
Follow the instructions of step1 and step2 in USAGES.md. After you can compile egos-2000 and run it on QEMU, switch to the hello
branch by git checkout hello
. Run this branch by make qemu
and then you will see
-------- Compile Hello, World! --------
riscv64-unknown-elf-gcc -march=rv32ima -mabi=ilp32 -Wl,--gc-sections -ffunction-sections -fdata-sections -fdiagnostics-show-option hello.s hello.c -Thello.lds -nostdlib -lc -lgcc -o hello.elf
riscv64-unknown-elf-objdump --source --all-headers --demangle --line-numbers --wide hello.elf > hello.lst
riscv64-unknown-elf-objcopy -O binary hello.elf hello.bin
-------- Run Hello-World on QEMU --------
qemu-system-riscv32 -nographic -machine sifive_u -smp 2 -bios hello.bin
Hello, World!
The code in this branch simply prints Hello, World!
to the screen and then stalls. You can exit QEMU by typing ctrl+a
and then x
, after which you will see QEMU: Terminated
.
Next, we explain the code in this hello
branch which gives concrete examples of CPU instructions (integer computation, control transfer, memory access) and memory regions (code, data, heap, stack). After the explanation, you will start from this code and implement the printf
function as an exercise.
Instructions and memory ​
Before main()
​
Let's start with the first two lines of hello.lds
.
OUTPUT_ARCH("riscv")
ENTRY(_start)
The first line says that the output architecture for this computer program is RISC-V. This book will focus on RISC-V since it is widely used for education, but there also exist other influential CPU architectures such as ARM and Intel x86. The second line says that the entry point of this program is _start
which is defined in hello.s
. We will move our attention to hello.s
and ignore the rest of hello.lds
for now.
According to _start
in hello.s
, the first two CPU instructions in this program are
_start:
csrr t0, mhartid
bnez t0, _end
You could regard a hart
as a CPU core. Say a CPU has 4 cores, the mhartid
would be 0, 1, 2 and 3. The first instruction loads the mhartid
to t0
, a general-purpose register. This book assumes a 32-bit RISC-V CPU, which means that each general-purpose register holds 4 bytes and each CPU core holds 31 general-purpose registers.
Note that mhartid
is a Control and Status Register (CSR) which is not a general-purpose register. You will meet more CSRs later in this book and, for now, it is enough to know that the csrr
instruction moves the value of mhartid
into t0
.
The second instruction, bnez
, stands for branch-if-not-equal-to-zero and it is an example of control transfer instructions. Therefore, only core #0 will execute the instruction next to bnez
while all the other cores will jump to _end
and stall there. There are two instructions next to bnez
which core#0 will execute:
li sp, 0x80400000
call main
The li
instruction loads an integer, 0x80400000
in this case, to the stack pointer register sp
, which is another general-purpose register. li
is an example of integer computation instructions because it modifies a register with an integer. Other examples include addition, multiplication, bitwise shift, bitwise AND/OR/XOR operations, etc.
After setting sp
, the call
instruction transfers the control to main()
in hello.c
. The call
instruction does two things: (1) point the program counter to the first instruction of main()
; (2) set the ra
register to the address of call _end
in hello.s
, which is the instruction next to call main
. Intuitively, sp
specifies the memory that main()
can use and ra
specifies the return address, i.e., the program counter after main()
returns. We make this intuition concrete by reading the code of main()
.
In main()
​
The C code of main()
in hello.c
is simple:
int main() {
char* msg = "Hello, World!\n\r";
terminal_write(msg, 15);
return 0;
}
After make qemu
, you can find the CPU instructions for main()
in hello.lst
:
80000084 <main>:
main():
80000084: fe010113 addi sp,sp,-32
80000088: 00112e23 sw ra,28(sp)
8000008c: 00812c23 sw s0,24(sp)
80000090: 02010413 addi s0,sp,32
80000094: 800007b7 lui a5,0x80000
80000098: 0c878793 addi a5,a5,200 # 800000c8 <main+0x44>
8000009c: fef42623 sw a5,-20(s0)
800000a0: 00f00593 li a1,15
800000a4: fec42503 lw a0,-20(s0)
800000a8: f71ff0ef jal ra,80000018 <terminal_write>
800000ac: 00000793 li a5,0
800000b0: 00078513 mv a0,a5
800000b4: 01c12083 lw ra,28(sp)
800000b8: 01812403 lw s0,24(sp)
800000bc: 02010113 addi sp,sp,32
800000c0: 00008067 ret
There are 16 instructions for main
starting at memory address 0x80000084
and each of them occupies 4 bytes of memory. Therefore, the call main
instruction in hello.s
sets the program counter to 0x80000084
which points to an addi
instruction. The binary form of this addi
instruction is 0xfe010113
and such binary encoding is explained in the RISC-V Instruction Set Manual. For our purpose, we can ignore the binary encodings and focus on the instructions themselves. Let's start by reading the 7 highlighted instructions.
Common structure ​
The two addi
instructions at address 0x80000084
and 0x800000bc
add integers -32
and 32
to the sp
register, which indicates that the 32-byte memory region [0x80400000 - 32, 0x80400000)
is used for main()
. The ret
at the end will cause a control transfer to the instruction pointed by ra
, i.e., the call _end
in hello.s
as we explained earlier. Such usage of sp
and ra
is the common structure of every function in C.
Note that ra
should still hold the address of call _end
when executing ret
, which is ensured by these two instructions:
80000088: 00112e23 sw ra,28(sp)
...
800000b4: 01c12083 lw ra,28(sp)
28(sp)
means adding 28 to sp
as a memory address. Since sp
equals to 0x80400000 - 32
, the sw
instruction stores the 4 bytes of ra
into memory region [0x80400000 - 4, 0x80400000)
and the lw
instruction loads 4 bytes from the region back to ra
. As long as this region is unmodified, ra
can be freely modified during the execution of main()
.
These are examples of memory access instructions and another example is lb/sb
which loads/stores one byte from/to the memory while lw/sw
loads/stores a word of 4 bytes.
TIP
By this point, we have seen examples of all 3 types of CPU instructions (integer computation, control transfer, memory access) which constitute the user-level instructions of RISC-V. P1 is called User-level Threads because you will learn about threads solely based on these instructions. After P1, you will learn about more instructions that support an operating system.
First line ​
Next, we explain why the first line of main()
:
char* msg = "Hello, World!\n\r";
is compiled to the following instructions.
80000090: 02010413 addi s0,sp,32
80000094: 800007b7 lui a5,0x80000
80000098: 0c878793 addi a5,a5,200 # 800000c8 <main+0x44>
8000009c: fef42623 sw a5,-20(s0)
lui
stands for load-upper-immediate which means that the upper 20 bits of a5
are set to 0x80000
while the lower 12 bits are set to 0. Therefore, these 4 instructions set s0
to 0x80400000
, set a5
to 0x800000c8
, and stores the 4-byte value 0x800000c8
to memory address 0x80400000 - 20
. This means two things:
The compiler puts the
"Hello, World!\n\r"
string at memory address0x800000c8
.The compiler decides that
msg
, as a local variable ofmain()
, occupies the 4 bytes of memory starting at0x80400000 - 20
in the 32-bytemain()
stack.
TIP
This is a good point to revise the concept of pointers. In C, a char
is simply one byte storing the ASCII encoding of a character. For example, the ASCII encoding for "H" and "e" are 72 and 101 respectively so that memory address 0x800000c8
holds 72 and 0x800000c9
holds 101. The type of msg
, namely char*
, is called char pointer, meaning that variable msg
holds the memory address of a char. This book targets 32-bit CPUs so that every memory address is represented by 4 bytes (i.e., all pointers hold 4 bytes). One can dereference the msg
pointer with syntax *msg
, e.g., *msg == 'H'
and *(msg + 1) == 'e'
.
Second and third line ​
Next, we explain why the last two lines of main()
:
terminal_write(msg, 15);
return 0;
are compiled to the following instructions.
800000a0: 00f00593 li a1,15
800000a4: fec42503 lw a0,-20(s0)
800000a8: f71ff0ef jal ra,80000018 <terminal_write>
800000ac: 00000793 li a5,0
800000b0: 00078513 mv a0,a5
In short, registers a0
and a1
hold the first two arguments when calling a function, and a0
further holds the return value. Therefore, a1
is set to 15 and a0
is set to 0x800000c8
by loading 4 bytes from address 0x80400000 - 20
. jal
is an alias of call
and the first instruction of terminal_write
is at address 0x80000018
. After terminal_write
returns, a0
is set to 0 as the return value of main()
. It is possible to do li a0,0
directly without touching a5
if the compiler did more optimizations.
In terminal_write()
​
For terminal_write()
, we focus on explaining the two lines of code below and you could read all its instructions in hello.lst
as an exercise.
while ( *(int*)(0x10010000UL) & (1 << 31) );
*(int*)(0x10010000UL) = str[i];
The first line waits for the terminal to become ready for printing a character. The second line prints a character by writing the ASCII code to a special memory address. At this point, it is enough to know that writing to this special address will print a character on the screen. When you reach P6, you will learn more about the architectural support for device I/O.
*(int*)(0x10010000UL)
converts 0x10010000
to an int pointer and then dereferences the pointer, so that it stands for the 4 bytes starting at memory address 0x10010000
. The first line above would read these 4 bytes while the second line writes to these 4 bytes.
We have examined the two functions in this program. We now review the memory regions touched by the CPU instructions.
Code, data and stack ​
Let's start with the following part of hello.lst
:
Sections:
Idx Name Size VMA LMA File off Algn Flags
0 .init 00000014 80000000 80000000 00001000 2**3 CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .text 000000ac 80000018 80000018 00001018 2**3 CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .rodata 00000010 800000c8 800000c8 000010c8 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
A section is simply a memory region specified by a starting address and its size.
The .init
section starts at 0x80000000
and holds all the instructions in hello.s
. The RISC-V machine emulated by QEMU uses 0x80000000
to initialize its program counter. The .text
section holds the instructions for terminal_write
starting at 0x80000018
and the instructions for main
starting at 0x80000084
. Together, they form the code region of this program, [0x80000000, 0x800000c8)
.
The 16 bytes starting at 0x800000c8
form the data region of this program. Typically, the data region would contain 3 sections:
.rodata
holds read-only data such as the"Hello, World!\n\r"
string..data
holds global variables with non-zero initial values..bss
holds global variables with zero as their initial value.
This simple program does not have any global variables so we only see .rodata
. The end of .rodata
is 0x800000d8
, which is the start of the heap region. You will learn about this region later in the dynamic memory allocation section.
Lastly, the stack region starts at 0x80400000
and grows down. Specifically, the stack for main
is [0x80400000 - 32, 0x80400000)
and the stack for terminal_write
is the 48-byte region [0x80400000 - 32 - 48, 0x80400000 - 32)
. The choice of 0x80400000
is not mandatory and it is possible to set sp
to some different value in hello.s
.
Formatted output ​
The terminal_write
function takes a string and prints it on the screen, but we may further wish to print integers, pointers in hexadecimal format, etc. We thus introduce the printf
function which converts a format into a string and prints it on the screen.
printf("%d is a number", 1234);
printf("%s is a string", "abcd");
printf("%s-%d is awesome!", "egos", 2000);
These examples show that printf
takes a string as the first argument which has special meanings: %d
and %s
mean placeholders for integers and strings. For example, the last printf
would print "egos-2000 is awesome!" according to the 3 arguments.
Now, uncomment the printf
from the main
function of hello.c
as well as the code block containing format_to_str
and printf
. Do make qemu
and then you will see:
Hello, World!
egos-2000 is awesome!
Let's start by reading the first 3 lines of this code block:
#include <string.h> // for strlen() and strcat()
#include <stdlib.h> // for itoa()
#include <stdarg.h> // for va_start(), va_end() and va_arg()
Intuitively, the code is including part of the C standard library. For example, strlen
takes a string and returns its length, and the code of strlen
is provided by the string.h
library in the C compiler so that you don't need to write it yourself. Similarly, strcat
in this library concatenates two strings into a single string and itoa
in stdlib.h
converts an integer to a string in ASCII encoding (i.e., integer-to-ASCII).
Lastly, stdarg.h
provides va_start
, va_end
and va_arg
, where va
means variable argument (i.e., the argument list of printf
does not have a fixed length so that different invocations of printf
can take different numbers of arguments). In format_to_str
, the va_arg(args, int)
reads an int
-type argument from the list and the itoa
converts this integer to a string, appending to the end of the out
string.
TIP
Please do read the assembly code of format_to_str
and printf
in hello.lst
. You can skip library functions like strlen
or strcat
, and focus on instructions within these two functions. You will get a precise understanding by connecting the C code with the CPU instructions, which verbal explanations cannot give to you. For example, you will see how va_arg
is compiled to a few load and store instructions without doing any function calls.
After gaining a precise understanding of format_to_str
and printf
, you need to modify format_to_str
so that the code below works correctly.
printf("%c is character $", '$');
printf("%c is character 0", (char)48);
printf("%x is integer 1234 in hexadecimal", 1234);
printf("%u is the maximum of unsigned int", (unsigned int)0xFFFFFFFF);
printf("%p is the hexadecimal address of the hello-world string", msg);
Dynamic memory allocation ​
The char buf[512]
in printf
allocates 512 bytes on the stack of printf
to hold the output string generated by format_to_str
. However, the output string can be longer than 512 bytes and, during compile time, the length of the output string is usually unknown. This leads to the need of dynamic memory allocation:
int printf(const char* format, ...) {
/* TODO: calculate the length of the output string as buf_len */
char *buf = malloc(buf_len);
...
terminal_write(buf, strlen(buf));
free(buf);
/* the memory region returned by malloc() is no longer needed */
return 0;
}
In this last part of P0, you need to implement the TODO
above which initializes an integer variable buf_len
as the argument of malloc
, a function provided by stdlib.h
. After malloc
, the buf
pointer holds an address such that the buf_len
bytes starting at this address can be freely used. After invoking free
, this region cannot be used any more.
Heap ​
Before explaining more about malloc
and free
, we explain the heap memory region with the code below which you need to uncomment from hello.c
.
extern char __heap_start, __heap_max;
static char* brk = &__heap_start;
char* _sbrk(int size) {
if (brk + size > (char*)&__heap_max) {
terminal_write("_sbrk: heap grows too large\r\n", 29);
return NULL;
}
char* old_brk = brk;
brk += size;
return old_brk;
}
__heap_start
and __heap_max
are defined in hello.lds
: __heap_start
is the end of the data region (i.e., the end of the .bss
section) while __heap_max
is hard-coded to 0x80200000
, just like the initial stack pointer is hard-coded to 0x80400000
in hello.s
.
The purpose here is to ensure that the heap and stack do not overlap when the heap grows up and the stack grows down, which spawns the concept of break pointer (i.e., the brk
variable above pointing to the end of the heap). As long as the break pointer is lower than the stack pointer, the stack and the heap do not overlap.
In this project, we simply assume that sp
is never lower than 0x80200000
(i.e., the stack is never larger than 2MB) and the brk
variable is never greater than 0x80200000
.
Malloc and free ​
The initial heap size is 0 since brk
is initialized as __heap_start
and the heap region is defined as [brk, __heap_start)
. Say malloc(512)
is the first time of invoking malloc
. Three things will happen within malloc
:
- Invoke
_sbrk
with an argument greater than512
, say4096
(i.e., 4KB); - With the return value of
_sbrk
, initialize some data structures in the 4KB heap region; - With the data structures, find
512
bytes in the heap region and return its address.
Say malloc(32)
is invoked next. It is likely that malloc
can find 32 bytes using the data structures without having to invoke _sbrk
again and increase the heap. During free
, the data structures are updated so that the freed memory could be reused by a later malloc
. For example, multiple invocations of printf
may use the same memory region to hold the output string (i.e., malloc
returns the same pointer multiple times).
The data structure operations for malloc
and free
are implemented in the stdlib.h
library from the C compiler so that you don't have to know the details, just like you don't have to know the details of itoa
or strcat
. However, you are more than welcome to design your own data structures and write your own malloc/free
functions.
Accomplishments ​
You have finished reading library/libc
in egos-2000. You also get the intuition behind library/elf
which initializes the 4 memory regions for a program, given the executable file of this program in the ELF format.