Hello, World! ​
This project helps you understand the following C program.
int main() {
printf("Hello, World!");
return 0;
}
Specifically, you will learn about 3 types of CPU instructions (integer computation, control transfer, memory access) and 4 types of memory regions (code, data, heap, stack), which are fundamental to every computer program.
Prerequisites: an entry-level systems course equivalent to Cornell CS3410 or CMU 15-213 is typically required before learning OS. This project aims at revisiting the concepts of CPU instructions and memory regions instead of taking the place of an entry-level course.
Get started ​
Follow the instructions of step1 and step2 in USAGES.md. After you can compile egos-2000 and run it on QEMU, switch to the hello
branch by git checkout hello
. Run this branch by make qemu
and then you will see
-------- Compile Hello, World! --------
riscv-none-elf-gcc -march=rv32ima_zicsr -mabi=ilp32 -Wl,--gc-sections -ffunction-sections -fdata-sections -fdiagnostics-show-option hello.s hello.c -Thello.lds -nostdlib -lc -lgcc -o hello.elf
riscv-none-elf-objdump --source --all-headers --demangle --line-numbers --wide hello.elf > hello.lst
riscv-none-elf-objcopy -O binary hello.elf hello.bin
-------- Run Hello-World on QEMU --------
qemu-system-riscv32 -nographic -machine virt -smp 1 -bios hello.bin
Hello, World!
The code in this branch simply prints Hello, World!
to the screen and then stalls. You can exit QEMU by typing ctrl+a
and then x
, after which you will see QEMU: Terminated
.
Next, we explain the code in this hello
branch which gives concrete examples of CPU instructions (integer computation, control transfer, memory access) and memory regions (code, data, heap, stack). After the explanation, you will start from this code and implement the printf
function as an exercise.
Instructions and memory ​
Before main()
​
Let's start with the first two lines of hello.lds
.
OUTPUT_ARCH("riscv")
ENTRY(_start)
The first line says that the output architecture for this computer program is RISC-V. This book will focus on RISC-V since it is widely used for education, but there also exist other influential CPU architectures such as ARM and Intel x86. The second line says that the entry point of this program is _start
, defined in hello.s
. We will move our attention to hello.s
and ignore the rest of hello.lds
for now.
According to _start
in hello.s
, the first two CPU instructions in this program are
li sp, 0x80400000
call main
The li
instruction loads the 4-byte integer 0x80400000
to the stack pointer register sp
. li
is an example of integer computation instructions because it modifies a register with an integer. Other examples of integer computation include addition, multiplication, division, bitwise shift, bitwise AND/OR/XOR operations, etc.
After setting sp
, the call
instruction transfers the control to main()
in hello.c
. The call
instruction is an example of control transfer instructions, and it does two things: (1) point the program counter to the first instruction of main()
; (2) set the ra
register to the address of call _end
in hello.s
, which is the instruction next to call main
. Intuitively, sp
specifies the memory that main()
can use on the stack and ra
specifies the return address, i.e., the program counter after main()
returns. We make this intuition concrete by reading the code of main()
.
In main()
​
The C code of main()
in hello.c
is simple:
int main() {
char* msg = "Hello, World!\n\r";
terminal_write(msg, 15);
return 0;
}
After make
, you can find the CPU instructions for main()
in hello.lst
:
80000078 <main>:
main():
80000078: fe010113 addi sp,sp,-32
8000007c: 00112e23 sw ra,28(sp)
80000080: 00812c23 sw s0,24(sp)
80000084: 02010413 addi s0,sp,32
80000088: 800007b7 lui a5,0x80000
8000008c: 0b878793 addi a5,a5,184 # 800000b8 <main+0x40>
80000090: fef42623 sw a5,-20(s0)
80000094: 00f00593 li a1,15
80000098: fec42503 lw a0,-20(s0)
8000009c: f75ff0ef jal 80000010 <terminal_write>
800000a0: 00000793 li a5,0
800000a4: 00078513 mv a0,a5
800000a8: 01c12083 lw ra,28(sp)
800000ac: 01812403 lw s0,24(sp)
800000b0: 02010113 addi sp,sp,32
800000b4: 00008067 ret
There are 16 instructions for main
starting at memory address 0x80000078
and each of them occupies 4 bytes of memory. Therefore, the call main
instruction in hello.s
sets the program counter to 0x80000078
which points to an addi
instruction. The binary form of this addi
instruction is 0xfe010113
and such binary encoding is explained in the RISC-V Instruction Set Manual. For our purpose, we can ignore the binary encodings and focus on the instructions themselves. Let's start by reading the 7 highlighted instructions.
Common structure ​
The two addi
instructions at address 0x80000078
and 0x800000b0
add integers -32
and 32
to the sp
register, which indicates that the 32-byte memory region [0x80400000 - 32, 0x80400000)
is used for main()
. The ret
at the end will cause a control transfer to the instruction pointed by ra
, i.e., the call _end
in hello.s
as we explained earlier. Such usage of sp
and ra
is the common structure of every function in C.
Note that ra
should still hold the address of call _end
when executing ret
, which is ensured by these two instructions:
8000007c: 00112e23 sw ra,28(sp)
...
800000a8: 01c12083 lw ra,28(sp)
28(sp)
means adding 28 to sp
as a memory address. Since sp
equals to 0x80400000 - 32
, the sw
instruction stores the 4 bytes of ra
into memory region [0x80400000 - 4, 0x80400000)
and the lw
loads 4 bytes from the region back to ra
. As long as this 4-byte region is unmodified, register ra
can be freely modified during the execution of main()
.
These are examples of memory access instructions and another example is lb/sb
which loads/stores one byte from/to the memory while lw/sw
loads/stores a word of 4 bytes.
TIP
By this point, we have seen examples of all the 3 types of CPU instructions (integer computation, control transfer, memory access) which constitute the user-level instructions of RISC-V. P0 and P1 will only use user-level instructions. Starting from P2, you will see more instructions and registers that support an operating system.
First line of main()
​
char* msg = "Hello, World!\n\r";
is compiled to the following instructions.
80000084: 02010413 addi s0,sp,32
80000088: 800007b7 lui a5,0x80000
8000008c: 0b878793 addi a5,a5,184 # 800000b8 <main+0x40>
80000090: fef42623 sw a5,-20(s0)
lui
stands for load-upper-immediate which means that the upper 20 bits of a5
are set to 0x80000
while the lower 12 bits are set to 0. Therefore, these 4 instructions (1) set s0
to 0x80400000
; (2) set a5
to 0x800000b8
; (3) store the 4-byte of 0x800000c8
at memory address 0x80400000 - 20
. This means two things:
The compiler puts the
"Hello, World!\n\r"
string at memory address0x800000b8
.The compiler decides that
msg
, as a local variable ofmain()
, occupies the 4 bytes of memory starting at0x80400000 - 20
in the 32-bytemain()
stack.
TIP
This is a good point to revise the concept of pointers. In C, a char
is simply one byte storing the ASCII encoding of a character. For example, the ASCII encoding for "H" and "e" are 72 and 101 respectively so that memory address 0x800000b8
holds 72 and 0x800000b9
holds 101. The type of msg
, namely char*
, is called char pointer, meaning that variable msg
holds the memory address of a char. This book targets 32-bit CPUs so that every memory address is represented by 4 bytes (i.e., all pointers hold 4 bytes). One can dereference the msg
pointer with syntax *msg
, e.g., *msg == 'H'
and *(msg + 1) == 'e'
.
Second and third line ​
terminal_write(msg, 15);
return 0;
are compiled to the following instructions.
80000094: 00f00593 li a1,15
80000098: fec42503 lw a0,-20(s0)
8000009c: f75ff0ef jal 80000010 <terminal_write>
800000a0: 00000793 li a5,0
800000a4: 00078513 mv a0,a5
In short, registers a0
and a1
hold the first two arguments when calling a function, and a0
further holds the return value. Therefore, a1
is set to 15 and a0
is set to 0x800000b8
by loading 4 bytes from address 0x80400000 - 20
. jal
is an alias of call
and the first instruction of terminal_write
is at address 0x80000010
. After terminal_write
returns, a0
is set to 0 as the return value of main()
. It is possible to do li a0,0
directly without touching a5
if the compiler did more optimizations.
In terminal_write()
​
For terminal_write()
, we only explain the C code and you could read all its instructions in hello.lst
as an exercise.
void terminal_write(const char *str, int len) {
for (int i = 0; i < len; i++) {
*(char*)(0x10000000UL) = str[i];
}
}
At this point, it is enough to know that writing a byte to the special address 0x10000000
will print a character on the screen. P5 will introduce the architectural support for device I/O.
*(char*)(0x10000000UL)
converts 0x10000000
into a char pointer and then dereferences the pointer, so it stands for the byte at memory address 0x10010000
.
Code, data and stack ​
We have examined the two functions in this program. We now review the memory regions touched by the CPU instructions. Let's start with the following part of hello.lst
:
Sections:
Idx Name Size VMA LMA File off Algn Flags
0 .init 0000000c 80000000 80000000 00001000 2**3 CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .text 000000a8 80000010 80000010 00001010 2**3 CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .rodata 00000010 800000b8 800000b8 000010b8 2**3 CONTENTS, ALLOC, LOAD, READONLY, DATA
A section is simply a memory region specified by a starting address and its size.
The .init
section starts at 0x80000000
and holds the three instructions in hello.s
. By default, 0x80000000
is used to initialize the program counter in QEMU. The .text
section holds instructions for terminal_write()
and main()
starting at 0x80000010
and ending at 0x800000b8
. Together, .init
and .text
form the code region of this program.
The 16 bytes starting at 0x800000b8
form the data region of this program. The data region would typically contain 3 sections:
.rodata
holds read-only data such as the"Hello, World!\n\r"
string..data
holds global variables with non-zero initial values..bss
holds global variables with zero as their initial value.
This simple program does not have any global variables, so we only see .rodata
. The end of .rodata
is 0x800000c8
, which is the start of the heap region. You will learn about heap later in the dynamic memory allocation section.
Lastly, the stack region starts at 0x80400000
and grows down. Specifically, the stack for main
is [0x80400000 - 32, 0x80400000)
and the stack for terminal_write
is the 48-byte region [0x80400000 - 32 - 48, 0x80400000 - 32)
. The choice of 0x80400000
is not mandatory and it is possible to set sp
with a different value in hello.s
.
Formatted output ​
The terminal_write
function takes a string and prints it on the screen, but we may further wish to print integers, pointers in hexadecimal format, etc. We thus introduce the printf
function which converts a format into a string and prints it on the screen.
printf("%d is a number", 1234);
printf("%s is a string", "abcd");
printf("%s-%d is awesome!", "egos", 2000);
These examples show that printf
takes a string as the first argument which has special meanings: %d
and %s
mean placeholders for integers and strings. For example, the last printf
would print "egos-2000 is awesome!" according to the 3 arguments.
Uncomment the printf
from the main
function of hello.c
, and uncomment functions format_to_str
and printf
. Do make qemu
again and you will see:
Hello, World!
egos-2000 is awesome!
Let's start by reading the first 3 lines of this code block:
#include <string.h> // for strlen() and strcat()
#include <stdlib.h> // for itoa()
#include <stdarg.h> // for va_start(), va_end() and va_arg()
Intuitively, the code is including part of the C standard library. For example, strlen
takes a string and returns its length, and the code of strlen
is provided by the string.h
library in the C compiler, so you don't need to write it yourself. Similarly, strcat
in the string.h
library concatenates two strings into a single one. And the itoa
in stdlib.h
converts an integer to a string in ASCII encoding (i.e., integer-to-ASCII).
Lastly, stdarg.h
provides va_start
, va_end
and va_arg
, where va
means variable argument (i.e., the argument list of printf
does not have a fixed length so that different invocations of printf
can take different numbers of arguments). In format_to_str
, the va_arg(args, int)
reads an int
-type argument from the list and the itoa
converts the integer to a string, appending to the end of the out
string.
TIP
Please do read the assembly code of format_to_str
and printf
in hello.lst
. You can skip library functions like strlen
or strcat
, and focus on instructions within these two functions. You will get a precise understanding by connecting the C code with the CPU instructions, which verbal explanations cannot give to you. For example, you will see how va_arg
is compiled to a few load and store instructions without doing any function calls.
After gaining a precise understanding of format_to_str
and printf
, you need to modify format_to_str
so that the code below works correctly.
printf("%c is character $", '$');
printf("%c is character 0", (char)48);
printf("%x is integer 1234 in hexadecimal", 1234);
printf("%u is the maximum of unsigned int", (unsigned int)0xFFFFFFFF);
printf("%p is the hexadecimal address of the hello-world string", msg);
printf("%lu is the maximum of unsigned long long", 0xFFFFFFFFFFFFFFFFUL);
Dynamic memory allocation ​
The char buf[512]
in printf
allocates 512 bytes on the stack of printf
to hold the output string generated by format_to_str
. However, the output string can be longer than 512 bytes and, during compile time, the length of the output string is usually unknown. This leads to the need of dynamic memory allocation:
int printf(const char* format, ...) {
/* TODO: calculate the length of the output string as buf_len */
char *buf = malloc(buf_len);
...
terminal_write(buf, strlen(buf));
free(buf);
/* the memory region returned by malloc() is no longer needed */
return 0;
}
In this last part of P0, you need to implement the TODO
above which initializes an integer variable buf_len
as the argument of malloc
, a function provided by stdlib.h
. After malloc
, the buf
pointer holds an address such that the buf_len
bytes starting at this address can be freely used. After invoking free
, this region cannot be used any more.
Heap ​
Before explaining more about malloc
and free
, we explain the heap memory region with the code below which you need to uncomment from hello.c
.
extern char __heap_start, __heap_max;
static char* brk = &__heap_start;
char* _sbrk(int size) {
if (brk + size > (char*)&__heap_max) {
terminal_write("_sbrk: heap grows too large\r\n", 29);
return NULL;
}
char* old_brk = brk;
brk += size;
return old_brk;
}
__heap_start
and __heap_max
are defined in hello.lds
: __heap_start
is the end of the data region (i.e., the end of the .bss
section) while __heap_max
is hard-coded to 0x80200000
, just like the initial stack pointer is hard-coded to 0x80400000
.
The purpose here is to ensure that the heap and stack do not overlap when the heap grows up and the stack grows down, which spawns the concept of break pointer (i.e., the brk
variable above pointing to the end of the heap). As long as the break pointer is lower than the stack pointer, the stack and the heap do not overlap. We simply assume that the stack pointer sp
is never lower than 0x80200000
(i.e., stack size is never larger than 2MB) and the brk
variable is never greater than 0x80200000
.
Malloc and free ​
The initial heap size is 0 since brk
is initialized as __heap_start
and the heap region is defined as [brk, __heap_start)
. Say malloc(512)
is the first time of invoking malloc
. Three things will happen within malloc
:
- Invoke
_sbrk
with an argument greater than512
, say4096
(i.e., 4KB); - With the return value of
_sbrk
, initialize some data structures in the 4KB region; - With the data structures, find
512
bytes in this heap region and return its address.
Say malloc(32)
is invoked next. It is likely that malloc
can find 32 bytes using the data structures without having to invoke _sbrk
again and increase the heap. Upon free
, the data structures are updated so that the freed memory could be reused by a later malloc
. For example, multiple invocations of printf
may use the same memory region to hold the output string (i.e., malloc
returns the same pointer multiple times).
The data structure operations for malloc
and free
are implemented in the stdlib.h
library from the C compiler so that you don't have to know the details, just like you don't have to know the details of itoa
or strcat
. However, you are more than welcome to design your own data structures and write your own malloc
and free
functions.
Accomplishments ​
You have finished reading library/libc
in egos-2000. You also get the intuition behind library/elf
which initializes the 4 memory regions for a program, given the executable file of this program in the ELF format.