Developing RiscV OS with a hex editor That means that I dont have a compiler (thus no C or higher support), no assembler (so can't write in assembly), I litterly have to build it all myself. I only have 2 tools:
- hex editor, for creating the os binary, and writing this documentation.
qemu
for running the os. Line used:qemu-system-riscv32 -M virt -m 2048 -serial mon:stdio -smp 4 -bios hellos
.- I plan to use the next qemu devices:
-device VGA -drive file=mydisk.img,format=raw,if=none,id=drive0 -device virtio-blk,drive=drive0 -device e1000 -device virtio-keyboard -device virtio-mouse -device virtio-sound
(append it to the qemu line).
- I plan to use the next qemu devices:
The only things that separates me from the first days of computers, is that:
- I have today's knowledge (I can search the web, but not copy anything).
- RiscV is a newer architecture, and I use qemu so I don't touch real hardware.
Other than that - I want to replicate the feeling of a first day computer scientist. I won't even enjoy the benifits of git, and just use this repo as a way to share my progress.
Create a stable and fast os, with graphics, network driver, build a basic browser, and be able to open youtube and watch a video (you can suggest which in the repo discussions).
My main idea is to write an assembler in opcode-bytes, run it, then assemble a compiler (written in assembly text) and run it, and then write the big os in that high level compiled language. Then program a file system, files, shell, simple cli-ide, graphics, real ide, and then write all the programs and drivers needed to achive the goal.
Here I'll write my OS versions, from first to current.
hellos
is Hello World OS. You'll see that I'll give a name to every OS that I'm going to develop, until reaching os_zero
- The complete one with the youtube.
I managed to write a single character to UART (output), 'T', using a "putc" assembly function. Calling twice makes a "TT" print. That's awesome. I'll call this os ttos
.
I added a partial_os
folder to store working but not complete os-versions, which are a critical part of my way of getting to the actual next os-version).
That's my way of saving versions, as I can't use a git-versioning until I develop one. You can look at each file in this folder as a backup disk, used to store only the current os on it.
I decided to go 32bit as I think that'll be enough for our end-goal.
I decided about a rough "memory regions" partition (all hexadecimal numbers. X:Y
means addresses from X
up to Y
, not including Y
):
- Stack:
FFFFFFF8
and downwords. The Ram is80000000:100000000
- Fast syscalls:
80001000:80001800
(note thatgp=80001000
), so that'sgp:gp+800
.putc
is atgp+0
. I'll use the shortcut sysX forgp+X
, soputc
issys0
. - Global data:
80000800:80001000
, sogp-800:gp
. I'll use the shortcutg_X
forgp-1000+X
. - Boot code: At
80000000
, jumps to the first code. - First code: At
80002000
.
I decided I'm gonna allow myself to use the "Text" window on my hex editor, and not the 16-byte-wide window, for writing this README. It's still not a markdown editor, and I justified this Text window for being ok because I basically do the same writing on my notebook. The only benefit here is that it's backed up, and That's fine by me.
Currently my first code is still in the Boot code region. My goal for hellos
is a First code at its region, that reads a string "Hello, World!\n\0" (from Global data maybe) and prints it (maybe as another sys
).
I wrote a small Boot code, that sets gp, sp
, saves a0, a1, a2
as globals, and calls the First code. Has a stub that prints "\nFIN\n" if the First code returns. I called this os (with First code that just returns) - finos
.
I added two syscalls, puts
(prints a string), and exit
, and called them both from the First code to print "Hello, World!\n" (stored as a global). And it worked!
hellos
is done!
putc(c: char_a0) -> None
-sys0
(usest0,t1
, keepsa0
).puts(s: pointer_a0) -> None
-sys20
(keep all regs).exit() -> NO_RETURN
-sys70
(infinite loop).boot_a0
-g_FF4
.boot_a1
-g_FF8
.boot_a2
-g_FFC
.- Rule: each
sys
is aligned to0x10
, and eachglobal
is aligned to0x4
.
Unused memory will always be marked as 'A'
bytes.
- Wrote the first os in big endien.
- I initialy failed to encode 5 out of 10 opcodes. Encoding by hand is HARD.
- I got wrong the loading address of the bios binary. (thought 0x1000, was 0x80000000). I tried writing an elf format in the hex editor, and it took me trial and error + debugging to understand what is needed - a non-elf binary file (first opcode in offset 0), that's loaded into address 0x80000000. At first I was afraid to use debugger, as it felt like cheating, but it was to understand qemu's ABI.
I decided to add the -smp 4
(4 cores) support now, so the Boot code need to hande the other cores too.
Currently adding this flag prints the "Hello, World!\n" string four times, which is great. I want to change it so that only the first core will run, and the rest will "sleep".
I improved the current Boot code: It now sets different stack for each process, based on a0
(initialized at boot with the hardware thread id). If it's the 0 thread then jump to the First code.
If it's another thread id, then jump to a wfi
which afterwards jumps to the address found in multicore_boot_address
(initialized to the wfi
address).
It allows us to control when to "release" the other threads, and what code they should execute.
multi_hellos
is done!
multicore_boot_address -
g_FF0`.
This version will be able to get input. echos will get string as input and will echo it.
I decided that I need to be more organized, and that I'll allow myself writing multiple "notebooks" (yeah, I'm addressing these ".md" files as "Notebooks"). I created the bare_asm.md for the actual assembly I coded in memory, and the memory regions.
I implemented getc
that inputs a byte, and availc
that returns wether an input is available to be read.
I wrote a small program that inputs a char, increments it and prints it (charos
).
I added a First code that prints "Enter your name: ", calls gets(buf, 16), and then prints "Hello {buf}!\n".
While encoding the gets
function, I saw that I misencoded the addi sp, sp, 0x14
at the end of putc
. Fixed.
I created a working echos! It prints "Hello {buf}!", but it doesn't print any of the characters live (like, it doesn't show anything until you press "Enter". I call it ghost_echos
.
I added a print to every char (and replaced \r with a \n print).
echos
is done!
getc() -> char_a0
-sys80
(usest0,t1
).availc() -> bool_a0
-sysA0
(keep all regs) - NOT CHECKED.gets(out_s: pointer_a0, max_len: a1) -> None
-sysB0
(keep all regs).
- I implemented
gets
to stop only of '\n', but my keybord actually only generated '\r', so I needed to add another check.
I want to be able to print registers, numbers.
This os version version will be dedicated to print 4-byte hexadecimal values - thus hexos
.
Of course, I managed to find a bug in gets
from the previous version - a1
was stored on the a0
stack spot.
I implemented the hex print, and it works perfectly.
hexos
is done!
putx(hex: a0) -> None
-sys130
(keep all regs).
I think that before going further, I must have some testing support, and "testing routine". Thus - testos
.
In this operation system version I'm going to implement multiple testing functions (That will print errors on test failures),
and test every existing sysX
function.
Adding testing capabilities this early, will hopefully lead me to keep my entire operation system tested at every phase of the process. That's exciting! So I have to do a good job in this part.
My 0x800 space after $gp
is running out, so I'm gonna start using it as a "jumpt to" table, in the next way:
sysXX0 will now only take 12 bytes (+4 more reserved): sw x1, FFC(x2)
, lui x1 0x?????
, jalr x0, 0x???(x1)
. The jumped address will be responsible for doing the addi sp, sp, FFC
part.
Note that the current 7 sys funcs already used 0x1A0
bytes.
Making this change now will let me use the next 0x660
bytes as 102 more functions. That will probably be enough for the assembler functionality, and if not, we could find a future solution for that.
Ok, so let's create the first function - assert_ret
.
I need to find a place for the function's body. I'll designate the next section for this part: 0x80004000-0x8000FFFF.
I implemented the assert_ret
!
I also a tests_failure
global, that initialized with 1 but set to 0 on any test failed (by the testing functions). It gets printed at the end of the First code.
It took time to encode this function. Hopefully the next will be easier.
I proceeded to implement both put_regs_test_values()
and store_all_regs()
, and tested them together.
First code is now:
- 80002000-80002010: Check
assert_ret
. - 80002010-8000203C: Check
put_regs_test_values, store_all_regs
. - 8000203C-80002058: Print the
tests_success
boolean, and finish.
I'll have to have some kind of documentation + order of the "First code" section, as now the data there might get lost.
I want this testos
to also define exactly how the "First code" shouls look like.
My first idea is to have it "First code" to lie in addresses 80002000-80004000, and it will be responsible for setting up the tests and running them, and afterwords to jump to the assembler code in 0x80010000. Each test in the first code will be documented in a separate document "binary_tests.md".
I decided to have all of my tests automatic - means that I won't have to manually change the testing-opcode-bytes each time to make sure all cases work. It actually only matters for testing the testing functions. I decided of having two "atomic" testing functions, which are as simple as they can be:
assert_test_success
- For making sure the globaltests_success
in on. Prints if not, and anyway sets it on again.assert_test_failure
- For making sure the globaltests_success
in off. Prints if not, and anyway sets it on again.
I really see myself digging too deep here, as why didn't I called the previous testing functions "atomic"? Well, here I took extra measures. Not sure if it was really needed, but it's better checked in that way.
My goal now is to write a documented First code that tests all 4 testing functions (of course, to implement the 4th too). This os version will be called tested_test_funcsos
.
I completed testing the assert_ret
, and the put+store
combination, for both sanity and failure, with the new atomic functions!
I completed implementing the validate_all_regs_unchanged
, and testing it, with 8 different tests!
I also decided and modified put_regs_test_values
to store s1=1
for my ease of not setting it again every time I call sys1D0 or sys1E0.
Now all of my testing functions are implemented, documented and tested.
The other sysXXX functions aren't tested, but I still stop for a second to celebrate, and call the current os: The tested_test_funcsos
.
I continue with jumping to 0x80002400, the start of the sys tests. I decided that when this part ends, it will jump to 0x80010000, the start of the actual code to run (the assembler).
I implemented the output tests, and the input tests!
All of my binary functions are now fully tested (except the three ATOMIC functions).
Also, Each piece of binary-encoded assembly is documented in the bare_asm.md
or binary_tests.md
notebook.
This part was the longest to implement yet. I'm so glad to declare:
testos
is done!
assert_test_success() -> None
-sys1D0
JUMPER (modifies a0, relies ons1==1
).assert_test_failure() -> None
-sys1E0
JUMPER (modifies a0, relies ons1==1
).assert_ret(ret_val: a0, expected: a1, string_testname: a2) -> None
-sys1A0
JUMPER (keep all regs).put_regs_test_values() -> None
-sys1B0
JUMPER (modifies all regs).store_all_regs() -> None
-sys1C0
JUMPER (keep all regs except sp).validate_all_regs_unchanged(regs_mask: a0) -> None
-sys1F0
JUMPER (modifies the a-regs, and sp).testing_mask
-g_FEC
tests_success
-g_FE8
test_print_on_failure
-g_FE4
testing_tests_success
-g_FE0
.gets_tests_buffers
-g_FB0
(0x30 bytes).
allocos
is dynamic allocation os.
In this os I'm gonna implement a very simple dynamic allocation functionality: brk
, sbrk
. Of course, they'll be tested.
The allocation will have a specific memory chunk to allocate from it: 0x81000000 -> 0x81FFFFFF.
This dynamic heap will be called binary_heap, as it's in the binary section (as I'm still coding in binary).
To test these functions I'll use some test data. I see that if I'll keep using the globals spot for tests data, I might finish it very fast. So I need another spot that's reserved just for that:
- I'll reserve the 8000E800-8000F800 page just for big test-data (buffers, parametrized arrays, etc.).
- I'll reserve 0x400 stack frame ready to use by any test, without having to mess with modifying sp.
I coded brk
and manually tested the function.
I designed my first "parametrized" test, the brk
test (described in the binary_tests.md
notebook).
After a few pushbacks, it works as a charm! I'll call this half-version, of brk
implemented and tested - brkos
.
After that, implementing sbrk
was easy, and for its parametrized tests I basically copy pasted the "brk parametrized test" and just modified the parameters. That's awesome.
allocos
is done!
brk() -> None
-sys200
JUMPER.sbrk() -> None
-sys210
JUMPER.binary_heap_segment_end
-g_FAC
.binary_heap_segment_start
-g_FA8
.binary_heap_top
-g_FA4
- Initialized with "segment_start", and represent the first unallocated byte in the binary_heap.
- I
btl
instead ofbltu
. - Gave
assert_ret
a2-param
the first-word of a string, instead of the pointer to the string.
In this operation system I'm going to implement a dictionary data structure (256-bins hashmap) using hand-coded assembly, Thus - dictos
.
It will be very helpful for writing an assembler in the future version, asos
.
The dictionary will only have "Init, Get, Insert", and no delete.
For it's dynamic nature, it could use the sbrk()
as node allocations for the bins. When we finish with the dict, we can "free" the memory with brk(before_dict)
, assuming no other allocations.
I decided to create the asm_cheatsheet.md
notebook, for writing common encodings. I hope that it won't make more bugs, but instead save time and bugs.
I implemented strncmp
, write a test to it, and luckily it fails on v6
. I'm not sure why yet, and I think it's the test's fault.
It was indeed the tests fault! As I encoded the parameters in big endien.. Anyway, It's good that I both write the test AND gives it a call from "main" at the same time (from address 80010000), so that I could test it withput the test's implementation.
I implemented and tested both DICT_initialize
and DICT_calculate_key
. That's the first part of creating this hash dict. Yay!
For the next part I'll need to show the low-level details of what the dict looks like:
- The dict is just 256 pointers (one for each possible hash value of
DICT_calculate_key
). - The pointers are the linked-list head of all the previously inserted (string,n,value) triplets.
- A node in the list is 16 bytes, which is 4 parts of 4 bytes as following:
[ char* next; u32 value; u32 len; char* string ]
. - An empty list has no nodes in it and the dict bin pointer is just NULL. A non empty list ends with a node that have
next == NULL
. DICT_insert
allocates a new node bysbrk(16)
, and no other function allocates/deallocates memory.- There are no "deletions" in this dict. The most you can do is
DICT_get
a node and modify it'slen
toFFFFFFFF
, so that it won't get found.
I took some time and wrote the text-assembly for each of the next 3 DICT functions, and planned on paper how I would test each function. It will take some time.
Let's remember what's it all for - After the dict functions are implemented and tested, I will be able to write an assembler. And after testing it and every opcode, I will be able to just write textual-assembly and make my life much easier. Then, I will be able to implement an actual compiler, and after testing it I will be able to write C-style code and actually start writing an OS, drivers, real programs much easier. Even just an assembler will be enough. Just be patient. Today we focus on writing some dict functions.
DICT_get_by_bin
and DICT_get
are implemented and tested. One to go.
At this point, after implementing DICT_insert
but before testing it, I started thinking that the hash isn't strong enough. I decided to change it.
I'll modify the "character hash" to be SBOX[str[i]] ^ (0x53*i)
, where SBOX
is the Rijndael S-box (The AES lookup table).
It's possible to do it thanks to the extra 3-op space I left for the adjacent sys funcs.
Now I have to modify the tests of DICT_calculate_key
and DICT_get
.
I updated it all to the new hash. Now I just need to test DICT_insert
.
I implemented the HUGE Dict_insert
test, and it works!
dictos
is done!
strncmp(str1: a0, str2: a1, n: a2) -> a0
-sys220
JUMPER.DICT_initialize(buffer: a0) -> None
-sys230
JUMPER.DICT_calculate_key(str: a0, n: a1) -> a0
-sys240
JUMPER.DICT_get_by_bin(bin: a0, str: a1, n: a2) -> a0
-sys250
JUMPER.DICT_get(dict: a0, str: a1, n: a2) -> a0, a1
-sys260
JUMPER.DICT_insert(dict: a0, str: a1, n: a2, value: a3) -> a0
-sys270
JUMPER.s_box_ptr
-g_FA0
.
- I
lw
instead oflb
. - I encoded big endien number in the
strncmp
parametrized test, in both the length-param and the expected-result.
asos
is assembly os (see next os version). This version will implement the skeleton for the assembler, thus skeleton_asos
.
As you'll read in the following part, I decided to split it into 3 parts, and the first will be asos_helpers
, thus this will be the part's name.
This version will setup all the code functions, and a parsing mechanism that sends assembly-text-line into the correct handler,
but the actual handlers will just "write" some 4-byte ascii message related to the op
(with the last byte being \n for easy reading. Maybe the 3 bytes will be the "return address to the sys_XXX
function. Maybe "oB7\n" for sys_B70).
The handlers will have a sys_XXX
address (40-50 funcs), but they'll just each call the "unimplemented handler" (another sys_YYY
) with the correct op string parameter.
Each of the assembler mechanism, functions, will be unit-tested.
The philosophy behind asos
is to be SIMPLE, yet enough:
- I don't have to support encoding instructions that I won't use (I could always write those intructions by
.data
). - Should run fast, and be fully tested.
- Should be generic, and simple to use.
- Should print extensive errors on every problem in every line of code.
- Should be fast to develop, and easy to extend (up to extra 20 future ops/directives).
The end-goal is to be able to compile "C" code into rv32g, that should be enugh to write a standard library and an os, and drivers and gui and so on, which means the next riscv extensions:
- M - Multiplication/division
- A - Atomic
- F - single-precision floating-point
- D - double-precision floating-point
- Zicsr - CSR instructions
- Zifencei - Intruction-fetch fence
Ops to be supported by the assembler - All the rv32im ops (+assembly directives):
- lui, auipc, jal, jalr, beq, bne, blt, bge, bltu, bgeu
- lb, lh, lw, lbu, lhu, sb, sh, sw
- addi, slti, sltiu, xori, ori, andi, slli, srli, srai
- add, sub, sll, slt, sltu, xor, srl, sra, or, and
- mul, mulh, mulhsu, mulhu, div, divu, rem, remu
- fence.tso, fence, pause, ecall, ebreak, fence.i
- sret, mret, wfi
- csrrw, csrrs, csrrc, csrrwi, csrrsi, csrrci
- amoor, amoand, amoswap, lr, sc
- li, la PCREL, mv, neg, not, j PCREL, jr, call PCREL, ret, nop, sys (call sys),
- 🏷️
- .op HEX, .data N HEXBYTES, .ascii CHARS, .org ADDRESS, .align BYTESIZE, .end
- Extra space for 20 future instructions.
Ops that won't be supported, but could be implemented using .data
:
- Rest of the the privileged instructions.
- Most of Atomic extension
- Both single/double floating-point extensions.
So that's a total of 82 supported assembly directives, besides labels, and the extra reserved 18.
I'll have to focus so much here. 82 ops is a lot, defenetly for hand-coding each, and testing each.
I think that this version will compile each of these to "jump" to a function that'll print the IP of the unimplemented op, and it's kind?
Now I need to come up with the main functions of this part. It might come in handy to use some fixed registers for some fixed purposes, like "labels dict", "ops dict", "current op", "write op", "end_pointer", ...
I think that I'll allocate the next 6 registers for "fixed purposes": s2-s7:
s2
-global_text_pointer
(tptr
).s3
-current_binary_offset
.s4
-labels_dict
.
This version requires many decisions. After some thought, here they are:
- All the binary encoded code will be BEFORE the first textual-assembly. Writing assembly will never be before some encoded blob. If there is not enough code space until 0x80010000, I'll make the cut above it.
- The result of the assembler is a binary file with a new format:
- It has a small header
[ u32 magic="os0". u32 entry_point_offset. u32 import_table_offset. u32 export_table_offset. u32 binary_name_offset. u32 flags=0. ]
. import_table
points to au32 length
followed bylength
entries ofu32 str_ptr
, which will be replaced by the address of the external function named as the pointed string (which isf"{binary_name}.{func_name}"
).export_table
points to au32 length
followed bylength
entries of[ u32 str_ptr. u32 func_offset ]
- These are the external functions offered to other binaries, can be imported by importingf"{binary_name}.{func_name}"
(func_name
is the string thatstr_ptr
points to).
- It has a small header
- The opt dict will be built from this structure:
[ u32 asm_func. u32 size_func. u8 op_name_length. u8[7] op_name ]
.- These will be saved in some free 0x800 space in memory, and the
op_dict
can use direct pointers into it (its values are[ u32 asm_func. u32 size_func ]
). - The actual
asm_func
s will be in a JUMPER table of size 0x800 (max 128 ops, so some are reserved). - In the skeleton os version I'll implement all
size_func
s but not anyasm_func
s - Those will point to thedefault_op_handler()
.
- These will be saved in some free 0x800 space in memory, and the
- White spaces are
' '
,'\t'
,','
. Line ends in'\r'
,'\n'
,'#'
(comment start). - Helper global_pointer
sys
funcs (only those allowed to modift the text pointer:skip_ws(), advance_to_next_line(), skip_until_ws(), skip_until_label_end(), advance_1()
. - General helper funcs:
is_line_end(), calc_label(str, n, relative?)->int_val, default_op_handler(), print_line_error(), addvance_binary_ptr_4(), addvance_binary_ptr_8(), print_and_exit()
. - Main assembler funcs:
asm_mem_file(), asm_line(), label_pass(), label_pass_line()
. - What the "main" will do? (Need
load_ops_dict(), load_mem_bin_to_memory()
functions for it):- Print "tests success"
- Loads the ops_dict.
- Executes
asm_mem_file()
. - (Optional) Loads "system funcs" exports to memory.
- Loads the file to memory.
- Jumo to the loaded entry address.
I just undertand how this part is HUGE. I'll need to split it somehow. Split idea:
asos_helpers
- Implement and test theHelper global_pointer
and theGeneral helper funcs
functions.asos_origins
- Implement and test theMain assembler funcs
functions.asos_skeleton
- Implement the main, "default" ops_dict, and make it all work. unit test it and also parametrize-test the entire process.
Fuck. That's gonna be a long process.
asm(src_address: a0, dst_address: a1, src_len: a2, dst_len: a3) -> a0
- Returns 0 on success - NOT IMPLEMENTED, NOT TESTED.labels_pass(src_address: a0, src_len: a1, labels_dict: a2) -> a0
- Returns 0 on success - NOT IMPLEMENTED, NOT TESTED.build_ops_dict(???) -> ???
- NOT IMPLEMENTED, NOT TESTED.asm_pass(???) -> ???
- NOT IMPLEMENTED, NOT TESTED.
This part will be another step forward into a working assembler. In this part I'll implement and test the main textual parsing functions (but not the op handlers themselves).
In this part I'm gonna make the entire assembly process work! It won't assemble to the actual desired ops, but all the ops will be parsed in their desired op handler jumper functions.
asos
is assembly os.
This version will be an actual assembler, written byte by byte, to assemble a piece of memory as text, into assembly binary, and execute it. It will allow me to continue much more easily after it's done, and very tested.
In this step we will implement each of the ops handlers, that the previous part pointed them to the default op handler.