Skip to content

Hello, World! ​

This project helps you understand the following C program.

c
int main() {
    printf("Hello, World!");
    return 0;
}

Specifically, you will learn about 3 types of CPU instructions (integer computation, control transfer, memory access) and 4 types of memory regions (code, data, heap, stack), which are fundamental to every computer program.

Prerequisites: an entry-level systems course equivalent to Cornell CS3410 or CMU 15-213 is typically required before learning OS. This project aims at revisiting the concepts of CPU instructions and memory regions instead of taking the place of an entry-level course.

Get started ​

Follow the instructions of step1 and step2 in USAGES.md. After you can compile egos-2000 and run it on QEMU, switch to the hello branch by git checkout hello. Run this branch by make qemu and then you will see

-------- Compile Hello, World! --------
riscv64-unknown-elf-gcc -march=rv32ima -mabi=ilp32 -Wl,--gc-sections -ffunction-sections -fdata-sections -fdiagnostics-show-option hello.s hello.c -Thello.lds -nostdlib -lc -lgcc -o hello.elf
riscv64-unknown-elf-objdump --source --all-headers --demangle --line-numbers --wide hello.elf > hello.lst
riscv64-unknown-elf-objcopy -O binary hello.elf hello.bin
-------- Run Hello-World on QEMU --------
qemu-system-riscv32 -nographic -machine sifive_u -smp 2 -bios hello.bin
Hello, World!

The code in this branch simply prints Hello, World! to the screen and then stalls. You can exit QEMU by typing ctrl+a and then x, after which you will see QEMU: Terminated.

Next, we explain the code in this hello branch which gives concrete examples of CPU instructions (integer computation, control transfer, memory access) and memory regions (code, data, heap, stack). After the explanation, you will start from this code and implement the printf function as an exercise.

Instructions and memory ​

Before main() ​

Let's start with the first two lines of hello.lds.

lds
OUTPUT_ARCH("riscv")
ENTRY(_start)

The first line says that the output architecture for this computer program is RISC-V. This book will focus on RISC-V since it is widely used for education, but there also exist other influential CPU architectures such as ARM and Intel x86. The second line says that the entry point of this program is _start which is defined in hello.s. We will move our attention to hello.s and ignore the rest of hello.lds for now.

According to _start in hello.s, the first two CPU instructions in this program are

s
_start:
    csrr t0, mhartid
    bnez t0, _end

You could regard a hart as a CPU core. Say a CPU has 4 cores, the mhartid would be 0, 1, 2 and 3. The first instruction loads the mhartid to t0, a general-purpose register. This book assumes a 32-bit RISC-V CPU, which means that each general-purpose register holds 4 bytes and each CPU core holds 31 general-purpose registers.

Note that mhartid is a Control and Status Register (CSR) which is not a general-purpose register. You will meet more CSRs later in this book and, for now, it is enough to know that the csrr instruction moves the value of mhartid into t0.

The second instruction, bnez, stands for branch-if-not-equal-to-zero and it is an example of control transfer instructions. Therefore, only core #0 will execute the instruction next to bnez while all the other cores will jump to _end and stall there. There are two instructions next to bnez which core#0 will execute:

s
    li sp, 0x80400000
    call main

The li instruction loads an integer, 0x80400000 in this case, to the stack pointer register sp, which is another general-purpose register. li is an example of integer computation instructions because it modifies a register with an integer. Other examples include addition, multiplication, bitwise shift, bitwise AND/OR/XOR operations, etc.

After setting sp, the call instruction transfers the control to main() in hello.c. The call instruction does two things: (1) point the program counter to the first instruction of main(); (2) set the ra register to the address of call _end in hello.s, which is the instruction next to call main. Intuitively, sp specifies the memory that main() can use and ra specifies the return address, i.e., the program counter after main() returns. We make this intuition concrete by reading the code of main().

In main() ​

The C code of main() in hello.c is simple:

c
int main() {
    char* msg = "Hello, World!\n\r";
    terminal_write(msg, 15);

    return 0;
}

After make qemu, you can find the CPU instructions for main() in hello.lst:

lst
80000084 <main>:
main():
80000084:   fe010113            addi    sp,sp,-32
80000088:   00112e23            sw  ra,28(sp)
8000008c:   00812c23            sw  s0,24(sp)
80000090:   02010413            addi    s0,sp,32
80000094:   800007b7            lui a5,0x80000
80000098:   0c878793            addi    a5,a5,200 # 800000c8 <main+0x44>
8000009c:   fef42623            sw  a5,-20(s0)
800000a0:   00f00593            li  a1,15
800000a4:   fec42503            lw  a0,-20(s0)
800000a8:   f71ff0ef            jal ra,80000018 <terminal_write>
800000ac:   00000793            li  a5,0
800000b0:   00078513            mv  a0,a5
800000b4:   01c12083            lw  ra,28(sp)
800000b8:   01812403            lw  s0,24(sp)
800000bc:   02010113            addi    sp,sp,32
800000c0:   00008067            ret

There are 16 instructions for main starting at memory address 0x80000084 and each of them occupies 4 bytes of memory. Therefore, the call main instruction in hello.s sets the program counter to 0x80000084 which points to an addi instruction. The binary form of this addi instruction is 0xfe010113 and such binary encoding is explained in the RISC-V Instruction Set Manual. For our purpose, we can ignore the binary encodings and focus on the instructions themselves. Let's start by reading the 7 highlighted instructions.

Common structure ​

The two addi instructions at address 0x80000084 and 0x800000bc add integers -32 and 32 to the sp register, which indicates that the 32-byte memory region [0x80400000 - 32, 0x80400000) is used for main(). The ret at the end will cause a control transfer to the instruction pointed by ra, i.e., the call _end in hello.s as we explained earlier. Such usage of sp and ra is the common structure of every function in C.

Note that ra should still hold the address of call _end when executing ret, which is ensured by these two instructions:

lst
80000088:   00112e23            sw  ra,28(sp)
...
800000b4:   01c12083            lw  ra,28(sp)

28(sp) means adding 28 to sp as a memory address. Since sp equals to 0x80400000 - 32, the sw instruction stores the 4 bytes of ra into memory region [0x80400000 - 4, 0x80400000) and the lw instruction loads 4 bytes from the region back to ra. As long as this region is unmodified, ra can be freely modified during the execution of main().

These are examples of memory access instructions and another example is lb/sb which loads/stores one byte from/to the memory while lw/sw loads/stores a word of 4 bytes.

TIP

By this point, we have seen examples of all 3 types of CPU instructions (integer computation, control transfer, memory access) which constitute the user-level instructions of RISC-V. P1 is called User-level Threads because you will learn about threads solely based on these instructions. After P1, you will learn about more instructions that support an operating system.

First line ​

Next, we explain why the first line of main():

c
    char* msg = "Hello, World!\n\r";

is compiled to the following instructions.

lst
80000090:   02010413            addi    s0,sp,32
80000094:   800007b7            lui a5,0x80000
80000098:   0c878793            addi    a5,a5,200 # 800000c8 <main+0x44>
8000009c:   fef42623            sw  a5,-20(s0)

lui stands for load-upper-immediate which means that the upper 20 bits of a5 are set to 0x80000 while the lower 12 bits are set to 0. Therefore, these 4 instructions set s0 to 0x80400000, set a5 to 0x800000c8, and stores the 4-byte value 0x800000c8 to memory address 0x80400000 - 20. This means two things:

  1. The compiler puts the "Hello, World!\n\r" string at memory address 0x800000c8.

  2. The compiler decides that msg, as a local variable of main(), occupies the 4 bytes of memory starting at 0x80400000 - 20 in the 32-byte main() stack.

TIP

This is a good point to revise the concept of pointers. In C, a char is simply one byte storing the ASCII encoding of a character. For example, the ASCII encoding for "H" and "e" are 72 and 101 respectively so that memory address 0x800000c8 holds 72 and 0x800000c9 holds 101. The type of msg, namely char*, is called char pointer, meaning that variable msg holds the memory address of a char. This book targets 32-bit CPUs so that every memory address is represented by 4 bytes (i.e., all pointers hold 4 bytes). One can dereference the msg pointer with syntax *msg, e.g., *msg == 'H' and *(msg + 1) == 'e'.

Second and third line ​

Next, we explain why the last two lines of main():

c
    terminal_write(msg, 15);
    return 0;

are compiled to the following instructions.

lst
800000a0:   00f00593            li  a1,15
800000a4:   fec42503            lw  a0,-20(s0)
800000a8:   f71ff0ef            jal ra,80000018 <terminal_write>
800000ac:   00000793            li  a5,0
800000b0:   00078513            mv  a0,a5

In short, registers a0 and a1 hold the first two arguments when calling a function, and a0 further holds the return value. Therefore, a1 is set to 15 and a0 is set to 0x800000c8 by loading 4 bytes from address 0x80400000 - 20. jal is an alias of call and the first instruction of terminal_write is at address 0x80000018. After terminal_write returns, a0 is set to 0 as the return value of main(). It is possible to do li a0,0 directly without touching a5 if the compiler did more optimizations.

In terminal_write() ​

For terminal_write(), we focus on explaining the two lines of code below and you could read all its instructions in hello.lst as an exercise.

c
        while ( *(int*)(0x10010000UL) & (1 << 31) );
        *(int*)(0x10010000UL) = str[i];

The first line waits for the terminal to become ready for printing a character. The second line prints a character by writing the ASCII code to a special memory address. At this point, it is enough to know that writing to this special address will print a character on the screen. When you reach P6, you will learn more about the architectural support for device I/O.

*(int*)(0x10010000UL) converts 0x10010000 to an int pointer and then dereferences the pointer, so that it stands for the 4 bytes starting at memory address 0x10010000. The first line above would read these 4 bytes while the second line writes to these 4 bytes.

We have examined the two functions in this program. We now review the memory regions touched by the CPU instructions.

Code, data and stack ​

Let's start with the following part of hello.lst:

lst
Sections:
Idx Name              Size      VMA       LMA       File off  Algn  Flags
  0 .init             00000014  80000000  80000000  00001000  2**3  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text             000000ac  80000018  80000018  00001018  2**3  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata           00000010  800000c8  800000c8  000010c8  2**3  CONTENTS, ALLOC, LOAD, READONLY, DATA

A section is simply a memory region specified by a starting address and its size.

The .init section starts at 0x80000000 and holds all the instructions in hello.s. The RISC-V machine emulated by QEMU uses 0x80000000 to initialize its program counter. The .text section holds the instructions for terminal_write starting at 0x80000018 and the instructions for main starting at 0x80000084. Together, they form the code region of this program, [0x80000000, 0x800000c8).

The 16 bytes starting at 0x800000c8 form the data region of this program. Typically, the data region would contain 3 sections:

  1. .rodata holds read-only data such as the "Hello, World!\n\r" string.
  2. .data holds global variables with non-zero initial values.
  3. .bss holds global variables with zero as their initial value.

This simple program does not have any global variables so we only see .rodata. The end of .rodata is 0x800000d8, which is the start of the heap region. You will learn about this region later in the dynamic memory allocation section.

Lastly, the stack region starts at 0x80400000 and grows down. Specifically, the stack for main is [0x80400000 - 32, 0x80400000) and the stack for terminal_write is the 48-byte region [0x80400000 - 32 - 48, 0x80400000 - 32). The choice of 0x80400000 is not mandatory and it is possible to set sp to some different value in hello.s.

Formatted output ​

The terminal_write function takes a string and prints it on the screen, but we may further wish to print integers, pointers in hexadecimal format, etc. We thus introduce the printf function which converts a format into a string and prints it on the screen.

c
printf("%d is a number", 1234);
printf("%s is a string", "abcd");
printf("%s-%d is awesome!", "egos", 2000);

These examples show that printf takes a string as the first argument which has special meanings: %d and %s mean placeholders for integers and strings. For example, the last printf would print "egos-2000 is awesome!" according to the 3 arguments.

Now, uncomment the printf from the main function of hello.c as well as the code block containing format_to_str and printf. Do make qemu and then you will see:

Hello, World!
egos-2000 is awesome!

Let's start by reading the first 3 lines of this code block:

c
#include <string.h>  // for strlen() and strcat()
#include <stdlib.h>  // for itoa()
#include <stdarg.h>  // for va_start(), va_end() and va_arg()

Intuitively, the code is including part of the C standard library. For example, strlen takes a string and returns its length, and the code of strlen is provided by the string.h library in the C compiler so that you don't need to write it yourself. Similarly, strcat in this library concatenates two strings into a single string and itoa in stdlib.h converts an integer to a string in ASCII encoding (i.e., integer-to-ASCII).

Lastly, stdarg.h provides va_start, va_end and va_arg, where va means variable argument (i.e., the argument list of printf does not have a fixed length so that different invocations of printf can take different numbers of arguments). In format_to_str, the va_arg(args, int) reads an int-type argument from the list and the itoa converts this integer to a string, appending to the end of the out string.

TIP

Please do read the assembly code of format_to_str and printf in hello.lst. You can skip library functions like strlen or strcat, and focus on instructions within these two functions. You will get a precise understanding by connecting the C code with the CPU instructions, which verbal explanations cannot give to you. For example, you will see how va_arg is compiled to a few load and store instructions without doing any function calls.

After gaining a precise understanding of format_to_str and printf, you need to modify format_to_str so that the code below works correctly.

c
printf("%c is character $", '$');
printf("%c is character 0", (char)48);
printf("%x is integer 1234 in hexadecimal", 1234);
printf("%u is the maximum of unsigned int", (unsigned int)0xFFFFFFFF);
printf("%p is the hexadecimal address of the hello-world string", msg);

Dynamic memory allocation ​

The char buf[512] in printf allocates 512 bytes on the stack of printf to hold the output string generated by format_to_str. However, the output string can be longer than 512 bytes and, during compile time, the length of the output string is usually unknown. This leads to the need of dynamic memory allocation:

c
int printf(const char* format, ...) {
    /* TODO: calculate the length of the output string as buf_len */
    char *buf = malloc(buf_len);
    ...
    terminal_write(buf, strlen(buf));
    free(buf);
    /* the memory region returned by malloc() is no longer needed */
    return 0;
}

In this last part of P0, you need to implement the TODO above which initializes an integer variable buf_len as the argument of malloc, a function provided by stdlib.h. After malloc, the buf pointer holds an address such that the buf_len bytes starting at this address can be freely used. After invoking free, this region cannot be used any more.

Heap ​

Before explaining more about malloc and free, we explain the heap memory region with the code below which you need to uncomment from hello.c.

c
extern char __heap_start, __heap_max;
static char* brk = &__heap_start;
char* _sbrk(int size) {
    if (brk + size > (char*)&__heap_max) {
        terminal_write("_sbrk: heap grows too large\r\n", 29);
        return NULL;
    }

    char* old_brk = brk;
    brk += size;
    return old_brk;
}

__heap_start and __heap_max are defined in hello.lds: __heap_start is the end of the data region (i.e., the end of the .bss section) while __heap_max is hard-coded to 0x80200000, just like the initial stack pointer is hard-coded to 0x80400000 in hello.s.

The purpose here is to ensure that the heap and stack do not overlap when the heap grows up and the stack grows down, which spawns the concept of break pointer (i.e., the brk variable above pointing to the end of the heap). As long as the break pointer is lower than the stack pointer, the stack and the heap do not overlap.

In this project, we simply assume that sp is never lower than 0x80200000 (i.e., the stack is never larger than 2MB) and the brk variable is never greater than 0x80200000.

Malloc and free ​

The initial heap size is 0 since brk is initialized as __heap_start and the heap region is defined as [brk, __heap_start). Say malloc(512) is the first time of invoking malloc. Three things will happen within malloc:

  1. Invoke _sbrk with an argument greater than 512, say 4096 (i.e., 4KB);
  2. With the return value of _sbrk, initialize some data structures in the 4KB heap region;
  3. With the data structures, find 512 bytes in the heap region and return its address.

Say malloc(32) is invoked next. It is likely that malloc can find 32 bytes using the data structures without having to invoke _sbrk again and increase the heap. During free, the data structures are updated so that the freed memory could be reused by a later malloc. For example, multiple invocations of printf may use the same memory region to hold the output string (i.e., malloc returns the same pointer multiple times).

The data structure operations for malloc and free are implemented in the stdlib.h library from the C compiler so that you don't have to know the details, just like you don't have to know the details of itoa or strcat. However, you are more than welcome to design your own data structures and write your own malloc/free functions.

Accomplishments ​

You have finished reading library/libc in egos-2000. You also get the intuition behind library/elf which initializes the 4 memory regions for a program, given the executable file of this program in the ELF format.

"... any person ... any study."