---
title: "AcadiaOS 0.1.0"
date: 2023-12-06
---

For the last six months or so I've been periodically working on developing a
hobby operating system. A couple of weeks ago I decided that I should finally
aim to cut a "release." This very early release doesn't include much user
functionality: you can navigate a filesystem in a primitive manner and execute
binaries. The following image shows just about everything the OS can do. (The
black window is the OS running in QEMU, and the larger gray window is debug
output sent to COM1.)

![AcadiaOS in action](images/acadiaos-0.1.0.png)

While there isn't much to do as a user, there are a lot of building blocks under
the hood that I spent the last six months learning about and working on.

## What I knew going into this

Frankly, not a lot.

I took an OS class in college, but while it covered OS fundamentals, the
projects were based on writing modules for the Linux kernel rather than working
on our own barebones kernel and OS. So while I vaguely knew how things like
process scheduling, interrupts, and memory management worked, I had no
experience getting down to the brass tacks of actually implementing them.

Over the previous couple of years I had spent some time writing a small kernel
to start learning some of these things. However, since I used it as a testing
ground for learning with no real design goals or long-term plan, it was kind of
a mess. I had gotten to user space with some primitive syscalls, but it was
memory issues and page faults galore. So I decided to "reboot" things earlier
this year.

## Design Goals

I decided I wanted to write a microkernel-based OS because I figured the more of
my messy code I could move to user space, the better. And also because that's
what OS nerds do.
I'm not too concerned about the performance cost of extra syscalls, because by
god this thing isn't gonna be very performant anyway.

Additionally, I wanted to try making the system capability-based. Trying a new
permission model was appealing to me because I've always felt the Unix-style one
was a bit clunky. After spending some time reading about seL4 and digging into
the Zircon interface, I had a (very) rough idea of how these systems work. I
have no illusions that my OS will ever be "secure," but I find the model
interesting.

## References and Resources

Over the course of this project I used a lot of resources, not least of which
were the OSDev.org [wiki](https://wiki.osdev.org) and
[forums](https://forum.osdev.org). The resources provided there were invaluable,
but the biggest lesson I learned since my first time around writing a kernel was
to rely on specs more than on other people's code samples and tutorials.

For the low-level stuff I spent a lot of time digging through Intel's and AMD's
monstrous programming manuals. For instance, the wiki was helpful for learning
that the `iret` instruction is a good way to jump to user space for the first
time, but from there I used the programming manuals to understand exactly how
that instruction works rather than just copying code from somewhere. I had a
similar experience with initializing the GDT for 64-bit mode. There are a lot of
random claims out there about exactly how you have to set it up, so it was much
more efficient to just dig through the AMD64 spec, however dry it may be.

As I worked my way up the stack, I used the SATA and AHCI specs as well. They
pose the additional complication of splitting things up across multiple specs,
so you have to go back and forth between them in non-obvious ways. Hey, at least
they don't try to charge you thousands of dollars for the spec like PCI does.
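
As a small illustration of that back-and-forth: an AHCI command table (AHCI spec) begins with a command FIS whose byte layout is defined in the SATA spec, and the opcode inside it comes from the ATA command set. The sketch below builds a "Register Host to Device" FIS for a READ DMA EXT command. It is hypothetical and is not Denali's code. The struct, field names, and `make_read_dma_ext` helper are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Register Host-to-Device FIS from the SATA spec (20 bytes). In AHCI it is
// copied into the command FIS (CFIS) area at the start of a command table.
// Field names are illustrative, not taken from Denali.
struct FisRegH2D {
  uint8_t fis_type;            // 0x27 identifies a Register H2D FIS
  uint8_t pm_and_c;            // bits 0-3: port multiplier port, bit 7: command flag
  uint8_t command;             // ATA command opcode (ATA command set spec)
  uint8_t feature_lo;
  uint8_t lba0, lba1, lba2;    // LBA bits 0-23
  uint8_t device;              // bit 6 selects LBA addressing
  uint8_t lba3, lba4, lba5;    // LBA bits 24-47
  uint8_t feature_hi;
  uint8_t count_lo, count_hi;  // number of sectors to transfer
  uint8_t icc;
  uint8_t control;
  uint8_t reserved[4];
};
static_assert(sizeof(FisRegH2D) == 20, "Register H2D FIS is 20 bytes");

constexpr uint8_t kFisTypeRegH2D = 0x27;
constexpr uint8_t kAtaReadDmaExt = 0x25;  // READ DMA EXT (48-bit LBA)

// Hypothetical helper: build the FIS for a 48-bit LBA DMA read.
FisRegH2D make_read_dma_ext(uint64_t lba, uint16_t sectors) {
  assert(lba < (uint64_t{1} << 48));  // READ DMA EXT addresses only 48 bits
  FisRegH2D fis{};                    // zero every field, including reserved
  fis.fis_type = kFisTypeRegH2D;
  fis.pm_and_c = 1u << 7;  // this FIS carries a command, not a control update
  fis.command = kAtaReadDmaExt;
  fis.device = 1u << 6;    // LBA mode
  fis.lba0 = static_cast<uint8_t>(lba);
  fis.lba1 = static_cast<uint8_t>(lba >> 8);
  fis.lba2 = static_cast<uint8_t>(lba >> 16);
  fis.lba3 = static_cast<uint8_t>(lba >> 24);
  fis.lba4 = static_cast<uint8_t>(lba >> 32);
  fis.lba5 = static_cast<uint8_t>(lba >> 40);
  fis.count_lo = static_cast<uint8_t>(sectors);
  fis.count_hi = static_cast<uint8_t>(sectors >> 8);
  return fis;
}
```

For example, `make_read_dma_ext(2048, 8)` would describe an 8-sector read starting at LBA 2048. A driver still has to fill in the AHCI command header and PRDT, which are defined in the AHCI spec, before issuing the command on the port.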

I also found that when you need an example of how to do something specific, it
can be far better to look at an existing operating system's approach to help
contextualize a specification. Andreas Kling's SerenityOS was invaluable for
some low-level x86 details. I also referenced the Zircon microkernel to figure
out how to use C++ templates to downcast capability pointers to their specific
object types without relying on RTTI (run-time type information).

## Kernel Implementation Details

OK, enough about high-level information, ambitions, and goals. Let's discuss
what the actual system can do at this point. I named the kernel Zion because it
is another place I love, and it is also kind of fun to think of the operating
system as everything from (A)cadia down to (Z)ion.

This section will frequently reference the source code, which is available on my
self-hosted [gitea](https://gitea.tiramisu.one) or mirrored to
[GitHub](https://github.com/dgalbraith33/acadia).

### Low-level x86-64 stuff

Because I found setting up paging, the higher-half kernel, and getting to long
mode to be a pain the first time around, I decided to boot the kernel with the
[limine bootloader](https://github.com/limine-bootloader/limine) instead of GRUB
this time, so I could focus on slightly higher-level things. I have ambitions to
make the kernel more bootloader-agnostic in the future, but for now it is
tightly coupled to the limine protocol.

On top of the things mentioned above, we use the limine protocol to:

* Get a map of physical memory.
* Set up a higher-half direct map of memory.
* Find the RSDP.
* Get a VGA framebuffer from UEFI.
* Load the three init programs that are needed to bootstrap the VFS.

Following boot, we immediately initialize the global descriptor table (GDT) and
interrupt descriptor table (IDT).
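
For reference, here is a minimal sketch of a long-mode GDT laid out for `syscall`/`sysret`. It assumes the common ordering that the IA32_STAR MSR implies: null, kernel code, kernel data, a "32-bit" user code slot, user data, then 64-bit user code. The helper and constant names are hypothetical, not taken from Zion. The descriptor encoding follows the AMD64/Intel manuals. A real kernel also needs a TSS descriptor, which is omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Flat segment descriptor: base 0, limit 0xFFFFF. Long mode ignores base and
// limit for code/data segments, but filling them in is harmless.
constexpr uint64_t make_descriptor(uint8_t access, uint8_t flags) {
  const uint64_t limit = 0xFFFFF;
  return (limit & 0xFFFF)                   // limit bits 0-15
         | (uint64_t{access} << 40)         // P, DPL, S, type
         | (((limit >> 16) & 0xF) << 48)    // limit bits 16-19
         | ((uint64_t{flags} & 0xF) << 52); // G, D/B, L, AVL
}

constexpr uint8_t kAccessKernelCode = 0x9A;  // present, DPL 0, code, readable
constexpr uint8_t kAccessKernelData = 0x92;  // present, DPL 0, data, writable
constexpr uint8_t kAccessUserCode = 0xFA;    // present, DPL 3, code, readable
constexpr uint8_t kAccessUserData = 0xF2;    // present, DPL 3, data, writable
constexpr uint8_t kFlagsCode64 = 0xA;        // G=1, L=1 (64-bit code)
constexpr uint8_t kFlagsData = 0xC;          // G=1, D/B=1

constexpr uint16_t kKernelCode = 0x08;
constexpr uint16_t kKernelData = 0x10;
constexpr uint16_t kUserCode32 = 0x18;  // sysret's 32-bit slot
constexpr uint16_t kUserData = 0x20;
constexpr uint16_t kUserCode64 = 0x28;

// The ordering constraints syscall/sysret impose on the selectors.
static_assert(kKernelCode + 8 == kKernelData, "syscall: SS = CS + 8");
static_assert(kUserCode32 + 8 == kUserData, "sysret: SS = STAR[63:48] + 8");
static_assert(kUserCode32 + 16 == kUserCode64, "sysret: CS = STAR[63:48] + 16");

struct Gdt {
  uint64_t entries[6];
};

Gdt build_gdt() {
  const uint64_t user_code = make_descriptor(kAccessUserCode, kFlagsCode64);
  return Gdt{{
      0,  // mandatory null descriptor
      make_descriptor(kAccessKernelCode, kFlagsCode64),
      make_descriptor(kAccessKernelData, kFlagsData),
      user_code,  // 0x18: no 32-bit support, so duplicate the 64-bit segment
      make_descriptor(kAccessUserData, kFlagsData),
      user_code,  // 0x28
  }};
}

// IA32_STAR: bits 47:32 give the syscall CS (SS = +8); bits 63:48 give the
// base for sysret (64-bit CS = +16, SS = +8), with RPL forced to 3.
constexpr uint64_t make_star() {
  return (uint64_t{kUserCode32} << 48) | (uint64_t{kKernelCode} << 32);
}
constexpr uint16_t sysret64_cs(uint64_t star) {
  return static_cast<uint16_t>(((star >> 48) + 16) | 3);
}
constexpr uint16_t sysret_ss(uint64_t star) {
  return static_cast<uint16_t>(((star >> 48) + 8) | 3);
}
```

A kernel would then load the table with `lgdt`, reload the segment registers, and write `make_star()` to IA32_STAR (MSR 0xC0000081) with `wrmsr`. The `static_assert`s document why the duplicated user code segment has to sit exactly where it does.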
The **GDT** is mostly irrelevant in x86-64 long mode; however, it was
interesting getting it to work with the `sysret` instruction, which expects two
user-space code segment descriptors (one for 32-bit code and one for 64-bit
code) so that a 64-bit OS can return to either. Right now the system doesn't
support 32-bit code (and likely never will), so we just duplicate the 64-bit
code segment.

The **IDT** is fairly straightforward and barebones for now. I add more
debugging information to the fault handlers as I run into faults where it would
be useful. One of the biggest improvements was setting up a separate kernel
stack for page faults and general protection faults. That way, if I break memory
related to the current stack frame, I still get useful debugging information
rather than an immediate triple fault. I also recently added some very sloppy
stack unwinding code so I can more easily find the context that a fault occurred
in.

Finally, we initialize the **APIC** in a rudimentary fashion. Its timer is used
to trigger scheduling events, and we map PCI and PS/2 keyboard interrupts to the
appropriate vectors in the IDT.

### Memory management

Memory management seems to be one of those areas where every time I make
progress on something, I discover about four more things I'll have to do down
the line. I'm somewhat happy with the progress I've made so far, but I still
have a lot to read up on and learn, especially relating to caching policies for
mapped pages.

For **physical memory management**, I maintain the available memory regions in
two separate linked lists. One list contains single pages for when those are
requested; the other contains the large memory regions, which are populated
during initialization. This design allows us to easily reuse freed pages (using
the list of single pages) while still efficiently finding large blocks for
things like memory-mapped I/O (using the list of large regions).

The one catch is that to build these linked lists we need an available heap.
And to have an available heap, we need to be able to allocate a physical memory
region for it (and its necessary paging structures). To accommodate this, we
initialize a temporary physical memory manager that just takes a hardcoded
number of pages from the first memory region and doles them out in sequence.
Right now that number is hardcoded to exactly the number of pages needed, so if
I change something that causes more pages to be allocated early in boot, it is
obvious because things break.

For **virtual memory management**, I keep the higher-half (kernel) mappings
identical in each address space. Most of the kernel mappings are already
available from the bootloader, but some are added for heaps and additional
stacks. For user memory, we maintain a tree of the mapped-in objects to ensure
that none intersect. Right now the tree is inefficient because it isn't
self-balancing and most objects are inserted in ascending order (i.e., it is
essentially a linked list).

For user-space memory, we wait until the memory is accessed and generates a page
fault before actually mapping it in. To map it in, we check each level of the
paging structures through the higher-half direct map (rather than using a
recursive page table mapping) to ensure it exists, allocating a new table if
necessary. All physical pages used for paging structures are freed when the
process exits.

For **kernel heap management**, I wrote a
[slab allocator](https://en.wikipedia.org/wiki/Slab_allocation) for relatively
small allocations (currently up to 128 bytes). I plan on raising that limit and
adding a buddy allocator for larger allocations in the future, but for now there
is no need: almost all of the allocations are 128 bytes or less! The few larger
allocations are currently done using a linear allocator.

### Scheduling

Right now the scheduling process is very straightforward.
Each runnable thread is kept in
an intrusive linked list and scheduled for a single time slice in round-robin
fashion.

Threads can block on other threads, semaphores, or mutexes. When this happens,
they are flagged as blocked and moved to an intrusive linked list on that
object, which is responsible for rescheduling those threads once the relevant
state changes.

The context switching code simply dumps all of the registers onto the stack and
then writes the stack pointer into the thread structure. It also writes the SSE
registers to space allocated in the thread structure. I believe this code could
be made more efficient by only pushing callee-saved registers and by using the
x86 feature that lets you lazily save the SSE registers only once they are used.
However, for now I prefer this code to be reliable rather than efficient
(because it scares me and is a PITA to debug).

Finally, there are definitely critical sections in the kernel code that are not
currently mutex-protected. It is on the TODO list to do a good audit of this in
preparation for SMP (AcadiaOS 0.2, anyone?).

### Interface

Most system calls the kernel provides either (a) create and return a capability
or (b) operate on an existing capability. Capabilities can be duplicated and/or
transmitted to other processes using IPC.

For syscalls that operate on an existing capability, the kernel checks that the
capability exists, that it is of the correct type, and that the caller has the
correct permissions on it. Only then does it act on the request.

The kernel provides APIs to:

* Manage processes and threads.
* Synchronize threads using mutexes and semaphores.
* Allocate memory and map it into an address space.
* Communicate with other processes using Endpoints, Ports, and Channels.
* Register IRQ handlers.
* Manage capabilities.
* Print debug information to the VM output.

### IPC

Interprocess communication can be done using Endpoints, Ports, or Channels.
**Endpoints** are like servers that can be called and provide a response. For
each call, a "ReplyPort" capability is generated; the server sends its response
to it, and the caller waits on it for that response. **Ports** are simply
one-way streams of messages that don't expect a response. Example uses include
process initialization information and IRQ handlers. **Channels** provide
bidirectional message passing; I haven't found a use for them yet and will
probably replace them in the future with a byte-stream interface.

Messages passed on these interfaces consist of two parts: a byte array and an
array of capabilities. Each capability passed is removed from the sending
process and passed along to whichever process receives the request.

I'm fairly happy with these interfaces so far and was able to build a user-space
IDL (Yunq) on top of them to facilitate message and capability passing. However,
I'm not sure how well they handle certain failure cases. For instance, since
endpoints aren't "owned" by a specific process, it is impossible to tell if you
are "shouting into the void" at a process that has crashed or is no longer
listening to that endpoint.

## User Space Programs

There are a few user-space programs that run on the system:

* **Yellowstone**: The init process that starts all others and maintains a
  registry of endpoints. (Because Yellowstone was first.)
* **Denali**: A basic AHCI driver to read from disk. (D for disk.)
* **VictoriaFallS**: A VFS server with a super simple read-only ext2
  implementation. (I couldn't resist because it has VFS in it.)
* **Teton**: A terminal application with a lightweight shell in it (which should
  eventually be split out). (T for terminal.)
* **Voyageurs**: A PS/2 keyboard driver that I intend to grow into the USB
  driver. (Idk, bytes traveling over USB are making a voyage, I guess.)

These programs are all bare-bones versions of what they could be in the future.
I hope to describe them in more detail in the future, but for now the
initialization process works like this:

1. The Yellowstone, Denali, and VictoriaFallS binaries are loaded into memory as
   modules by the bootloader.
2. The kernel loads and starts the Yellowstone process, passing it memory
   capabilities for the Denali and VictoriaFallS binaries.
3. Yellowstone starts Denali and waits for it to register itself.
4. Yellowstone reads the GPT, starts VictoriaFallS on the correct partition, and
   waits for it to register itself.
5. Yellowstone then reads the `/init.txt` file from the disk and starts each
   process specified in it (one per line) in succession.

## Yunq IDL

As I began writing system services, I found that a huge speed bump was writing
the client and server classes for each service. I started by just passing
structs as byte arrays and hardcoding whether or not the process expected to
receive a capability with the call. This approach worked but was painful and led
to me dreading each new service I added to the system (not how it should be for
a microkernel architecture!). Additionally, I found myself avoiding repeated
fields and string fields, since those couldn't be passed in a single struct.

It was clear I needed some sort of IDL to handle this, but for months I waffled
as I tried to figure out how to incorporate an existing one into the system.
That didn't work for two reasons. First, we need a way to pass capabilities with
the messages. These need to be side-channeled because the kernel can't just
treat them as another string of bytes (they have to be moved into the receiving
process's capability space). Second, existing serialization libraries tend to
have dependencies, so porting them would require porting those dependencies
first. Granted, some of them only require super basic things like a libc
implementation, but we don't even have that yet. All that to say, I ended up
writing my own.
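
To illustrate why capabilities need a side channel, here is a hypothetical sketch of how an IDL-generated serializer might lay out a message. Plain data and length-prefixed strings go into the byte array. Each capability is appended to the separate capability array, and only its index is written in-band. The `Message` and `OpenFileResponse` types and the wire layout are all assumptions for illustration. This is not Yunq's actual generated code or format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using CapHandle = uint64_t;  // hypothetical process-local capability id

// What the kernel transfers: raw bytes, plus capabilities that the kernel
// moves into the receiver's capability space (so their ids may change).
struct Message {
  std::vector<uint8_t> bytes;
  std::vector<CapHandle> caps;
};

// Hypothetical example message with a string field and a capability field.
struct OpenFileResponse {
  std::string path;
  uint64_t size;
  CapHandle memory;  // e.g. a memory capability for the file's contents
};

namespace {
void write_u64(std::vector<uint8_t>& out, uint64_t v) {
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

uint64_t read_u64(const std::vector<uint8_t>& in, std::size_t& offset) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v |= uint64_t{in.at(offset + i)} << (8 * i);
  offset += 8;
  return v;
}
}  // namespace

Message serialize(const OpenFileResponse& r) {
  Message m;
  write_u64(m.bytes, r.path.size());  // length-prefixed string
  m.bytes.insert(m.bytes.end(), r.path.begin(), r.path.end());
  write_u64(m.bytes, r.size);
  write_u64(m.bytes, m.caps.size());  // in-band: index into caps, not the id
  m.caps.push_back(r.memory);         // out-of-band: the capability itself
  return m;
}

OpenFileResponse deserialize(const Message& m) {
  OpenFileResponse r;
  std::size_t offset = 0;
  const uint64_t len = read_u64(m.bytes, offset);
  assert(offset + len <= m.bytes.size());
  r.path.assign(reinterpret_cast<const char*>(m.bytes.data()) + offset, len);
  offset += len;
  r.size = read_u64(m.bytes, offset);
  r.memory = m.caps.at(read_u64(m.bytes, offset));
  return r;
}
```

Writing an index rather than the raw capability id is the important design choice. The kernel is free to hand the receiver different ids for the transferred capabilities, and the byte array remains valid either way.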

I was pleasantly surprised by how straightforward it ended up being. I think it
took me about three coding sessions to get the basic parsing and codegen going
for the language. It still doesn't have all of the features I planned for it
(like nested messages), but it works super well for setting up new services
quickly and easily. Currently the implementation is in Python because I wanted
to get something working quickly, but I'll probably reimplement it in a compiled
language in the future, with a focus on better error messages.

## Closing thoughts

Overall, I'm very pleased with how this project has turned out. I feel like I've
definitely accomplished my goal of learning more about how operating systems are
actually implemented. It has been cool to pull back the curtain and see some of
the simple primitives that underlie the complex features of an operating system.

I aim to continue with this project, this time without throwing out the code as
I did earlier this year. I'm happy with the base and look to iterate on it,
hopefully building something more useful in the future but definitely learning
more along the way.