---
title: "AcadiaOS 0.1.0"
date: 2023-12-06
---

For the last six months or so I've been periodically working on a hobby
operating system. A couple of weeks ago I decided that I should finally aim to
cut a "release." This very early release doesn't include much user
functionality: you can navigate a filesystem in a primitive manner and execute
binaries. The following image shows just about everything the OS can do. (The
black window is the OS running in QEMU, and the larger gray window is debug
output sent to COM1.)

![AcadiaOS in action](images/acadiaos-0.1.0.png)

While there isn't much to do as a user, there are a lot of building blocks
underneath that I spent the last six months learning about and working on.

## What I knew going into this

Frankly, not a lot.

I took an OS class in college, but while it covered OS fundamentals, the
projects involved writing modules for the Linux kernel rather than building our
own barebones kernel and OS. So while I vaguely knew how things like process
scheduling, interrupts, and memory management worked, I had no experience
getting down to the brass tacks of actually implementing them.

Over the previous couple of years I had spent some time writing a small kernel
to start learning some of these things. However, since I used it as a testing
ground for learning with no real design goals or long-term plan, it was kind of
a mess. I had gotten to user space with some primitive syscalls, but it was
memory issues and page faults galore. So I decided to "reboot" things earlier
this year.

## Design Goals

I decided I wanted to write a microkernel-based OS because I figured the more
of my messy code I could move to user space, the better. And also because
that's what OS nerds do. I'm not too concerned about the performance cost of
extra syscalls because, by god, this thing isn't gonna be very performant
anyway.

Additionally, I wanted to try to make the system capability-based. Trying a new
permission model was appealing to me because I've always felt the Unix-style one
was a bit clunky. After spending some time reading about seL4 and digging into
the Zircon interface, I had a (very) rough idea of how these systems work. I
have no illusions that my OS will ever be "secure," but I find the model
interesting.

## References and Resources

Over the course of this project I used a lot of resources, not least of which
were the OSDev.org [wiki](https://wiki.osdev.org) and
[forums](https://forum.osdev.org). The resources provided there were invaluable,
but the biggest lesson I've learned since my first time around writing a kernel
is to rely on specs more than on others' code samples and tutorials.

For the low-level stuff I spent a lot of time digging through Intel's and AMD's
monstrous programming manuals. It was helpful to use the wiki to learn, for
instance, that the "iret" instruction is a good way to jump to user space for
the first time, but from there I used the programming manuals to understand
exactly how that instruction works rather than just copying code from somewhere.
I had a similar experience initializing the GDT for 64-bit mode. There are a lot
of random claims out there about exactly how you have to set it up, so it was
much more efficient to just go dig through the AMD64 spec, however dry it may
be.

As I worked my way up the stack, I used the SATA and AHCI specs as well. They
pose the additional complication of splitting things up across multiple specs,
so you have to go back and forth a lot in non-obvious ways. Hey, at least they
don't try to charge you thousands of dollars for the spec like PCI does.

I also found that when I needed an example of how to do something specific, it
could be far better to look at an existing operating system's approach to help
contextualize a specification. Andreas Kling's SerenityOS was invaluable for
some low-level x86 things. I also referenced the Zircon microkernel to figure
out how to use C++ templates to downcast capability pointers to their specific
object types without relying on RTTI (run-time type information).

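If you haven't seen this trick before, here is a minimal sketch of the general pattern. It assumes each kernel object reports a type tag through a virtual method, and that a traits template maps each C++ class to its tag. All of the names (`KernelObject`, `ObjectType`, `KernelObjectTag`, `DowncastTo`) are hypothetical. This is not Zircon's or Zion's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical set of kernel object kinds.
enum class ObjectType : uint8_t { kProcess, kThread, kPort };

// Base class for everything a capability can point at. Each subclass reports
// its own type tag, so no typeid/dynamic_cast (i.e. no RTTI) is needed.
class KernelObject {
 public:
  virtual ~KernelObject() = default;
  virtual ObjectType TypeTag() const = 0;
};

// Traits template mapping a concrete class to its tag. Only specializations
// are defined, so downcasting to an unknown type fails at compile time.
template <typename T>
struct KernelObjectTag;

class Port : public KernelObject {
 public:
  ObjectType TypeTag() const override { return ObjectType::kPort; }
};

template <>
struct KernelObjectTag<Port> {
  static constexpr ObjectType type = ObjectType::kPort;
};

class Thread : public KernelObject {
 public:
  ObjectType TypeTag() const override { return ObjectType::kThread; }
};

template <>
struct KernelObjectTag<Thread> {
  static constexpr ObjectType type = ObjectType::kThread;
};

// Checked downcast: returns nullptr if `obj` is not actually a T.
template <typename T>
T* DowncastTo(KernelObject* obj) {
  if (obj == nullptr || obj->TypeTag() != KernelObjectTag<T>::type) {
    return nullptr;
  }
  return static_cast<T*>(obj);
}
```

`KernelObjectTag` is only specialized for real kernel object types. As a result, something like `DowncastTo<int>(obj)` fails to compile instead of quietly succeeding, and the whole pattern builds fine with `-fno-rtti`.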
## Kernel Implementation Details

OK, enough about high-level information, ambitions, and goals. Let's discuss
what the actual system can do at this point. I named the kernel Zion because it
is another place I love, and it is also kind of fun to think of the operating
system as everything from (A)cadia down to (Z)ion.

This section will frequently reference the source code, which is available on
my self-hosted [Gitea](https://gitea.tiramisu.one) instance and mirrored to
[GitHub](https://github.com/dgalbraith33/acadia).

### Low-level x86-64 stuff

Because I found setting up paging, the higher-half kernel, and getting to long
mode to be a pain the first time around, I decided to use the [limine
bootloader](https://github.com/limine-bootloader/limine) to start the kernel
this time instead of GRUB, so I could focus on slightly higher-level things. I
have ambitions to make the kernel more bootloader-agnostic in the future, but
for now it is tightly coupled to the limine protocol.

On top of the things mentioned above, we use the limine protocol to:

* Get a map of physical memory.
* Set up a higher-half direct map of memory.
* Find the RSDP (the ACPI Root System Description Pointer).
* Get a VGA framebuffer from UEFI.
* Load the three init programs that are needed to bootstrap the VFS.

Following boot we immediately initialize the global descriptor table (GDT) and
interrupt descriptor table (IDT). The **GDT** is mostly irrelevant on x86-64,
but it was interesting getting it to work with the sysret instruction. Sysret
expects two user-space code segment descriptors (one for returning to 32-bit
code and one for returning to 64-bit code) at fixed offsets around the user
data segment. Right now the system doesn't support 32-bit code (and likely never
will), so we just duplicate the 64-bit code segment.

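Here is a sketch of one GDT layout that satisfies the `sysret` constraint. It assumes the common flat descriptor encodings and a kernel code selector of `0x08`. The slot order follows the selector arithmetic that the Intel and AMD manuals give for SYSCALL/SYSRET. The specific selector values and names are my assumption, not necessarily what Zion uses.

```cpp
#include <cassert>
#include <cstdint>

// Flat 64-bit descriptors. In long mode the CPU ignores base/limit for code
// and data segments, so only the access byte and the L/D flags matter.
constexpr uint64_t kNull       = 0;
constexpr uint64_t kKernelCode = 0x00AF9A000000FFFF;  // DPL 0, L=1
constexpr uint64_t kKernelData = 0x00CF92000000FFFF;  // DPL 0
constexpr uint64_t kUserCode64 = 0x00AFFA000000FFFF;  // DPL 3, L=1
constexpr uint64_t kUserData   = 0x00CFF2000000FFFF;  // DPL 3

// SYSRET derives its selectors from IA32_STAR[63:48] ("base"):
//   32-bit return: CS = base,      SS = base + 8
//   64-bit return: CS = base + 16, SS = base + 8
// so the user slots must be ordered code32, data, code64. With no 32-bit
// support, the code32 slot just holds another copy of the 64-bit descriptor.
constexpr uint64_t kGdt[] = {
    kNull,        // 0x00
    kKernelCode,  // 0x08  SYSCALL CS (IA32_STAR[47:32])
    kKernelData,  // 0x10  SYSCALL SS (= CS + 8)
    kUserCode64,  // 0x18  placeholder for the 32-bit user code slot
    kUserData,    // 0x20
    kUserCode64,  // 0x28
};

constexpr uint16_t kSyscallBase = 0x08;
constexpr uint16_t kSysretBase = 0x18;

// Value for IA32_STAR (MSR 0xC0000081); bits 31:0 are unused in long mode.
constexpr uint64_t StarMsrValue() {
  return (static_cast<uint64_t>(kSysretBase) << 48) |
         (static_cast<uint64_t>(kSyscallBase) << 32);
}

// Selectors a 64-bit SYSRET loads; the CPU forces their RPL to 3.
constexpr uint16_t SysretCs64() { return (kSysretBase + 16) | 3; }
constexpr uint16_t SysretSs() { return (kSysretBase + 8) | 3; }
```

With this layout a `syscall` lands on CS = 0x08 and SS = 0x10. A 64-bit `sysret` loads CS = 0x2B and SS = 0x23, which index the user code and user data descriptors.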
The **IDT** is fairly straightforward and barebones for now. I gradually add
more debugging information to the fault handlers as I run into faults where it
would be useful. One of the biggest improvements was setting up a separate
kernel stack for Page Faults and General Protection Faults. That way, if I
corrupt the memory around the current stack frame, I get useful debugging
information rather than an immediate triple fault. I also recently added some
very sloppy stack-unwinding code so I can more easily find the context that the
fault occurred in.

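On x86-64, the usual mechanism for giving specific faults their own stack is the Interrupt Stack Table (IST). Each IDT gate can name one of seven stack pointers stored in the TSS. Assuming that is how the separate stack is wired up here, the gate encoding looks roughly like this. `MakeGate` is a hypothetical helper, and TSS setup is omitted.

```cpp
#include <cassert>
#include <cstdint>

// One 16-byte long-mode IDT gate descriptor.
struct IdtGate {
  uint16_t offset_low;
  uint16_t selector;   // kernel code segment selector
  uint8_t ist;         // bits 0-2: IST index (0 = stay on the current stack)
  uint8_t type_attr;   // 0x8E = present, DPL 0, 64-bit interrupt gate
  uint16_t offset_mid;
  uint32_t offset_high;
  uint32_t reserved;
};
static_assert(sizeof(IdtGate) == 16, "IDT gates must be 16 bytes");

constexpr uint8_t kGeneralProtectionVector = 13;
constexpr uint8_t kPageFaultVector = 14;

// Builds a gate for the handler at address `handler`. A nonzero `ist_index`
// (1-7) makes the CPU load RSP from the matching TSS.IST slot before pushing
// the fault frame, so the handler runs on a known-good stack even if the
// faulting code's stack is corrupt.
IdtGate MakeGate(uint64_t handler, uint16_t code_selector, uint8_t ist_index) {
  assert(ist_index <= 7);
  IdtGate gate{};
  gate.offset_low = static_cast<uint16_t>(handler & 0xFFFF);
  gate.selector = code_selector;
  gate.ist = ist_index & 0x7;
  gate.type_attr = 0x8E;
  gate.offset_mid = static_cast<uint16_t>((handler >> 16) & 0xFFFF);
  gate.offset_high = static_cast<uint32_t>(handler >> 32);
  gate.reserved = 0;
  return gate;
}
```

For example, installing `MakeGate(page_fault_handler_address, 0x08, 1)` at `kPageFaultVector` would switch to whatever stack TSS.IST1 points at. Here `page_fault_handler_address` is a placeholder.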
Finally, we initialize the **APIC** in a rudimentary fashion. The timer is used
to trigger scheduling events, and we map PCI and PS/2 keyboard interrupts to the
appropriate vectors in the IDT.

### Memory management

Memory management seems to be one of those areas where every time I make
progress on something, I discover about four more things I'll have to do down
the line. I'm somewhat happy with the progress I've made so far, but I still
have a lot to read up on and learn, especially relating to caching policies for
mapped pages.

For **physical memory management** I maintain the available memory regions in
two separate linked lists. One list contains single pages for when those are
requested; the other contains the large memory regions, which are populated
during initialization. This design lets us easily reuse freed pages (using the
list of single pages) while still efficiently finding large contiguous blocks
for things like memory-mapped IO (using the list of large regions).

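Here is a minimal sketch of the two-list design, assuming 4 KiB pages. It uses `std::forward_list` purely for brevity. In a kernel these would be the kernel's own list types backed by the kernel heap, which is the chicken-and-egg problem the post turns to next. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <forward_list>

constexpr uint64_t kPageSize = 0x1000;

// A contiguous run of free physical memory.
struct MemoryRegion {
  uint64_t base;
  uint64_t num_pages;
};

class PhysicalMemoryManager {
 public:
  // Called during init with each usable region from the bootloader's map.
  void AddRegion(uint64_t base, uint64_t num_pages) {
    large_regions_.push_front({base, num_pages});
  }

  // Single pages: reuse a freed page if possible, else carve one off a region.
  uint64_t AllocatePage() {
    if (!free_pages_.empty()) {
      uint64_t page = free_pages_.front();
      free_pages_.pop_front();
      return page;
    }
    return AllocateContiguous(1);
  }

  // Contiguous runs always come from the large regions, so they are never
  // fragmented by individually freed pages.
  uint64_t AllocateContiguous(uint64_t num_pages) {
    for (MemoryRegion& region : large_regions_) {
      if (region.num_pages >= num_pages) {
        uint64_t base = region.base;
        region.base += num_pages * kPageSize;
        region.num_pages -= num_pages;
        return base;
      }
    }
    assert(false && "out of physical memory");
    return 0;
  }

  // Freed pages go on the single-page list for cheap reuse.
  void FreePage(uint64_t page) { free_pages_.push_front(page); }

 private:
  std::forward_list<uint64_t> free_pages_;
  std::forward_list<MemoryRegion> large_regions_;
};
```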
The one catch is that to build these linked lists we need an available heap.
And to have an available heap, we need to be able to allocate a physical memory
region for it (and its necessary paging structures). To accommodate this, we
initialize a temporary physical memory manager that just takes a hardcoded
number of pages from the first memory region and doles them out in sequence.
Right now I hardcode exactly the number of pages it needs. This means that if I
change something that causes more pages to be allocated during this early phase,
it is obvious because things break.

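The bootstrap allocator can be as simple as a bump pointer with a fixed budget. In this sketch, the 8-page budget and all of the names are hypothetical. The point is that exceeding the budget fails loudly (here via `assert`, where a kernel would panic). It does not silently hand out memory that the real allocator also thinks is free.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 0x1000;

// Early-boot allocator used only until the kernel heap (and with it the real
// physical memory manager) is available. It reserves a fixed number of pages
// at the start of the first usable region and hands them out in order.
class BootstrapPhysicalAllocator {
 public:
  // Deliberately exact: if early init starts allocating more pages than
  // expected, the assert fires instead of the problem going unnoticed.
  static constexpr uint64_t kPagesNeeded = 8;  // hypothetical value

  explicit BootstrapPhysicalAllocator(uint64_t first_region_base)
      : base_(first_region_base) {}

  uint64_t AllocatePage() {
    assert(next_page_ < kPagesNeeded && "bootstrap page budget exhausted");
    return base_ + kPageSize * next_page_++;
  }

  // The real manager must start handing out memory after this address.
  uint64_t FirstUnreservedAddress() const {
    return base_ + kPageSize * kPagesNeeded;
  }

 private:
  uint64_t base_;
  uint64_t next_page_ = 0;
};
```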
For **virtual memory management** I keep the higher-half (kernel) mappings
identical in each address space. Most of the kernel mappings are already
available from the bootloader, but some are added for heaps and additional
stacks. For user memory we maintain a tree of the mapped-in objects to ensure
that none of them intersect. Right now the tree is inefficient because it
doesn't self-balance and most objects are inserted in ascending order (i.e., it
is essentially a linked list).

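Here is a sketch of the non-overlap check. It assumes each mapping is tracked as a half-open range `[start, end)` keyed by its start address. It swaps in `std::map` for the hand-rolled tree. That container is a balanced tree (a red-black tree in the major standard library implementations), which sidesteps the ascending-insert problem. The class name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Tracks which user virtual ranges [start, end) are mapped, keyed by start.
class AddressSpaceMappings {
 public:
  // Returns false (and inserts nothing) if [start, start + size) overlaps an
  // existing mapping.
  bool Insert(uint64_t start, uint64_t size) {
    assert(size > 0);
    uint64_t end = start + size;
    auto next = mappings_.lower_bound(start);  // first mapping with key >= start
    if (next != mappings_.end() && next->first < end) {
      return false;  // overlaps a mapping that starts at or after `start`
    }
    if (next != mappings_.begin() && std::prev(next)->second > start) {
      return false;  // the previous mapping extends into our range
    }
    mappings_.emplace(start, end);
    return true;
  }

  // Checks whether `addr` falls inside any mapping (e.g. on a page fault).
  bool Contains(uint64_t addr) const {
    auto it = mappings_.upper_bound(addr);  // first mapping starting after addr
    if (it == mappings_.begin()) return false;
    return addr < std::prev(it)->second;
  }

 private:
  std::map<uint64_t, uint64_t> mappings_;  // start -> end (exclusive)
};
```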
For user-space memory we wait until the memory is accessed and generates a page
fault before actually mapping it in. To map it in, we walk each level of the
paging structures through the higher-half direct map (rather than using a
recursive page table mapping) to ensure it exists, allocating a new page table
if necessary. All physical pages used for paging structures are freed when the
process exits.

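Here is a sketch of that walk, assuming standard 4-level paging with 4 KiB pages. Each table is reached through the higher-half direct map, so physical address `p` is readable at `hhdm_offset + p`. `WalkToPte` and its `alloc_zeroed_page` callback are hypothetical. Real code would also need to handle huge pages and TLB invalidation.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPresent = 1 << 0;
constexpr uint64_t kWritable = 1 << 1;
constexpr uint64_t kUser = 1 << 2;
constexpr uint64_t kAddressMask = 0x000FFFFFFFFFF000;

// Index into the table at `level` for `vaddr`:
// level 4 = PML4 (bits 47:39), 3 = PDPT, 2 = PD, 1 = PT (bits 20:12).
constexpr uint64_t TableIndex(uint64_t vaddr, int level) {
  return (vaddr >> (12 + 9 * (level - 1))) & 0x1FF;
}

// Walks the tables for `vaddr`, allocating any missing intermediate table,
// and returns a pointer to the final page-table entry. `alloc_zeroed_page`
// returns the physical address of a zeroed, 4 KiB-aligned page.
uint64_t* WalkToPte(uint64_t pml4_phys, uint64_t vaddr, uint64_t hhdm_offset,
                    uint64_t (*alloc_zeroed_page)()) {
  uint64_t table_phys = pml4_phys;
  for (int level = 4; level > 1; --level) {
    auto* table = reinterpret_cast<uint64_t*>(hhdm_offset + table_phys);
    uint64_t& entry = table[TableIndex(vaddr, level)];
    if (!(entry & kPresent)) {
      entry = alloc_zeroed_page() | kPresent | kWritable | kUser;
    }
    table_phys = entry & kAddressMask;
  }
  auto* pt = reinterpret_cast<uint64_t*>(hhdm_offset + table_phys);
  return &pt[TableIndex(vaddr, 1)];
}
```

A page fault handler would call `WalkToPte` for the faulting address and then fill in the returned entry with the backing physical page.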
For **kernel heap management** I wrote a
[slab allocator](https://en.wikipedia.org/wiki/Slab_allocation) for relatively
small allocations (currently up to 128 bytes). I plan on raising that limit and
adding a buddy allocator for larger allocations in the future, but for now
there is little need: almost all of the kernel's allocations are 128 bytes or
less! The few larger allocations are currently handled by a linear allocator.

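Here is a minimal sketch of a single slab, in which the free slots form an intrusive free list stored inside the slots themselves. The power-of-two size classes from 8 to 128 bytes and all of the names are assumptions for illustration. A full slab allocator would keep a list of slabs per size class and grab a fresh page when one fills up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// One slab: a block of memory carved into equal-sized slots. Free slots are
// linked through their own storage, so allocate and free are both O(1) with
// no extra metadata.
class Slab {
 public:
  // `memory` must be aligned to at least alignof(void*).
  Slab(void* memory, size_t memory_size, size_t slot_size)
      : slot_size_(slot_size) {
    assert(slot_size >= sizeof(FreeSlot) && slot_size % alignof(FreeSlot) == 0);
    auto* bytes = static_cast<uint8_t*>(memory);
    size_t num_slots = memory_size / slot_size;
    // Push slots in reverse so the first allocation returns the lowest address.
    for (size_t i = num_slots; i > 0; --i) {
      free_list_ = new (bytes + (i - 1) * slot_size) FreeSlot{free_list_};
    }
  }

  void* Allocate() {
    if (free_list_ == nullptr) return nullptr;  // caller needs a new slab
    FreeSlot* slot = free_list_;
    free_list_ = slot->next;
    return slot;
  }

  void Free(void* ptr) { free_list_ = new (ptr) FreeSlot{free_list_}; }

  size_t slot_size() const { return slot_size_; }

 private:
  struct FreeSlot {
    FreeSlot* next;
  };
  size_t slot_size_;
  FreeSlot* free_list_ = nullptr;
};

// Smallest size class that fits `request`, or 0 if it is too large for slabs.
constexpr size_t SlabSizeClass(size_t request) {
  for (size_t size_class = 8; size_class <= 128; size_class *= 2) {
    if (request <= size_class) return size_class;
  }
  return 0;
}
```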
### Scheduling

Right now the scheduling process is very straightforward. Each runnable thread
is kept in an intrusive linked list and scheduled for a single time slice in a
round-robin fashion.

Threads can block on other threads, semaphores, or mutexes. When this happens,
they are flagged as blocked and moved to an intrusive linked list on that
object, which is responsible for rescheduling those threads once the relevant
state changes.

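The run queue and the per-object wait lists can share one intrusive `next` pointer, since a thread is only ever on one of them at a time. Below is a single-core sketch with hypothetical names. It does no locking, which matches the caveat about unprotected critical sections further down.

```cpp
#include <cassert>

struct Thread {
  enum class State { kRunnable, kBlocked };
  explicit Thread(int id) : id(id) {}
  int id;
  State state = State::kRunnable;
  Thread* next = nullptr;  // intrusive link: run queue OR one wait list
};

// FIFO of threads linked through Thread::next; never allocates.
class ThreadQueue {
 public:
  void PushBack(Thread* t) {
    t->next = nullptr;
    if (tail_ != nullptr) tail_->next = t; else head_ = t;
    tail_ = t;
  }
  Thread* PopFront() {
    Thread* t = head_;
    if (t != nullptr) {
      head_ = t->next;
      if (head_ == nullptr) tail_ = nullptr;
      t->next = nullptr;
    }
    return t;
  }

 private:
  Thread* head_ = nullptr;
  Thread* tail_ = nullptr;
};

class Scheduler {
 public:
  void Enqueue(Thread* t) { run_queue_.PushBack(t); }

  // Round robin: at the end of a time slice the current thread goes to the
  // back of the queue (unless it blocked) and the front thread runs next.
  Thread* Yield(Thread* current) {
    if (current != nullptr && current->state == Thread::State::kRunnable) {
      run_queue_.PushBack(current);
    }
    return run_queue_.PopFront();
  }

 private:
  ThreadQueue run_queue_;
};

// Counting semaphore that owns the threads blocked on it and hands them back
// to the scheduler when signaled.
class Semaphore {
 public:
  Semaphore(Scheduler& scheduler, int count)
      : scheduler_(scheduler), count_(count) {}

  // Returns true if `t` may proceed, false if it was blocked.
  bool Wait(Thread* t) {
    if (count_ > 0) {
      --count_;
      return true;
    }
    t->state = Thread::State::kBlocked;
    waiters_.PushBack(t);
    return false;
  }

  void Signal() {
    if (Thread* t = waiters_.PopFront()) {
      t->state = Thread::State::kRunnable;  // count goes straight to the waiter
      scheduler_.Enqueue(t);
    } else {
      ++count_;
    }
  }

 private:
  Scheduler& scheduler_;
  int count_;
  ThreadQueue waiters_;
};
```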
The context-switching code simply dumps all of the registers onto the stack and
then writes the stack pointer into the thread structure. It also writes the SSE
registers to space allocated in the thread structure. I believe this code could
be made more efficient by only pushing callee-saved registers and by using the
x86 feature (the CR0.TS flag) that lets you lazily save the SSE registers only
once they are used. However, for now I prefer this code to be reliable rather
than efficient (because it scares me and is a PITA to debug).

Finally, there are definitely critical sections in the kernel code that are not
currently mutex-protected. Doing a good audit of this is on the TODO list in
preparation for SMP (AcadiaOS 0.2, anyone?).

### Interface

Most system calls the kernel provides either (a) create and return a capability
or (b) operate on an existing capability. Capabilities can be duplicated and/or
transmitted to other processes using IPC.

For syscalls that operate on an existing capability, the kernel checks that the
capability exists, that it is of the correct type, and that the caller has the
correct permissions on it. Only then does it act on the request.

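Here is a sketch of those three checks. It assumes a per-process table that maps capability ids to a type tag, an object pointer, and a permission bitmask. The permission bits, status codes, and class names are hypothetical, not Zion's actual syscall ABI.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

enum class ObjectType : uint8_t { kProcess, kThread, kMemoryObject, kPort };

// Hypothetical permission bits attached to each capability.
enum Permission : uint64_t {
  kRead = 1 << 0,
  kWrite = 1 << 1,
  kTransmit = 1 << 2,
};

enum class Status { kOk, kCapNotFound, kWrongType, kPermissionDenied };

struct Capability {
  ObjectType type;
  void* object;          // the kernel object this capability refers to
  uint64_t permissions;  // bitwise OR of Permission values
};

class CapabilityTable {
 public:
  uint64_t Add(Capability cap) {
    uint64_t id = next_id_++;
    caps_.emplace(id, cap);
    return id;
  }

  // The checks every capability-taking syscall performs, in order: does the
  // capability exist, is it the right kind of object, and does the caller
  // hold all of the required permissions on it?
  Status Check(uint64_t cap_id, ObjectType expected_type,
               uint64_t required_permissions, Capability** out) {
    auto it = caps_.find(cap_id);
    if (it == caps_.end()) return Status::kCapNotFound;
    if (it->second.type != expected_type) return Status::kWrongType;
    if ((it->second.permissions & required_permissions) != required_permissions) {
      return Status::kPermissionDenied;
    }
    *out = &it->second;
    return Status::kOk;
  }

 private:
  uint64_t next_id_ = 1;
  std::unordered_map<uint64_t, Capability> caps_;
};
```

A syscall handler would call `Check` first. Only after it returns `Status::kOk` would the handler touch `cap->object`, downcasting it to the concrete type.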
The kernel provides APIs to:

* Manage processes and threads.
* Synchronize threads using mutexes and semaphores.
* Allocate memory and map it into an address space.
* Communicate with other processes using Endpoints, Ports, and Channels.
* Register IRQ handlers.
* Manage capabilities.
* Print debug information to the VM output.

### IPC

Interprocess communication can be done using Endpoints, Ports, or Channels.
**Endpoints** are like servers that can be called and provide a response. For
each call, a "ReplyPort" capability is generated. The server sends its response
to that port, and the caller waits on it. **Ports** are simply one-way streams
of messages that don't expect a response. Example uses include passing process
initialization information and delivering IRQs to their handlers. **Channels**
provide bidirectional message passing. I haven't found a use for them yet and
will probably replace them with a byte-stream interface in the future.

Messages passed on these interfaces consist of two parts: a byte array and an
array of capabilities. Each capability passed is removed from the sending
process and handed to whichever process receives the message.

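Here is a sketch of those move semantics, assuming a process's capability space is just a map from ids to kernel objects. Validation happens before anything is removed, so a bad id leaves the sender untouched. The receiver gets fresh ids because ids are only meaningful within one process. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Minimal stand-in for a per-process capability space: id -> kernel object.
struct Process {
  uint64_t next_cap_id = 1;
  std::unordered_map<uint64_t, void*> caps;

  uint64_t AddCap(void* object) {
    uint64_t id = next_cap_id++;
    caps.emplace(id, object);
    return id;
  }
};

// A message in flight: bytes are copied, capabilities are carried as objects.
struct Message {
  std::vector<uint8_t> bytes;
  std::vector<void*> cap_objects;
};

// Sender side: copy the bytes and *move* each capability out of the sender's
// table. Fails without side effects on a missing or duplicated id.
bool TakeMessage(Process& sender, const std::vector<uint8_t>& bytes,
                 const std::vector<uint64_t>& cap_ids, Message* out) {
  std::unordered_set<uint64_t> seen;
  for (uint64_t id : cap_ids) {
    if (sender.caps.count(id) == 0 || !seen.insert(id).second) return false;
  }
  out->bytes = bytes;
  out->cap_objects.clear();
  for (uint64_t id : cap_ids) {
    auto it = sender.caps.find(id);
    out->cap_objects.push_back(it->second);
    sender.caps.erase(it);
  }
  return true;
}

// Receiver side: install each capability under a fresh id in its own table.
std::vector<uint64_t> DeliverMessage(Process& receiver, const Message& msg) {
  std::vector<uint64_t> new_ids;
  for (void* object : msg.cap_objects) {
    new_ids.push_back(receiver.AddCap(object));
  }
  return new_ids;
}
```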
I'm fairly happy with these interfaces so far and was able to build a
user-space IDL (Yunq) on top of them to facilitate message and capability
passing. However, I'm worried about how well they handle certain situations.
For instance, since endpoints aren't "owned" by a specific process, it is
impossible to tell if you are "shouting into the void" at a process that has
crashed or isn't listening on that endpoint anymore.

## User Space Programs

There are a few user-space programs that run on the system:

* **Yellowstone**: The init process that starts all others and maintains a
  registry of endpoints. (Because Yellowstone was first.)
* **Denali**: A basic AHCI driver to read from disk. (D for disk.)
* **VictoriaFallS**: A VFS server with a super simple read-only ext2
  implementation. (I couldn't resist because it has VFS in it.)
* **Teton**: A terminal application with a lightweight shell in it (the two
  should eventually be split). (T for terminal.)
* **Voyageurs**: A PS/2 keyboard driver that is intended to become the USB
  driver. (Idk, bytes traveling over USB are making a voyage, I guess.)

These programs are all bare-bones versions of what they could become. I hope to
describe them in more detail in the future, but for now the initialization
process works like this:

1. The Yellowstone, Denali, and VictoriaFallS binaries are loaded into memory as
   modules by the bootloader.
2. The kernel loads and starts the Yellowstone process, passing it memory
   capabilities for the Denali and VictoriaFallS binaries.
3. Yellowstone starts Denali and waits for it to register itself.
4. Yellowstone reads the GPT, then starts VictoriaFallS on the correct
   partition and waits for it to register itself.
5. Yellowstone then reads the /init.txt file from the disk and starts each
   process specified (one per line) in succession.

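Step 5 only needs a trivial parser. Here is a sketch, assuming `/init.txt` holds one program path per line. Skipping blank lines and stripping `\r` are my additions, and the paths in the example are made up.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Splits the contents of /init.txt into one program path per line, skipping
// blank lines and tolerating "\r\n" line endings.
std::vector<std::string> ParseInitFile(std::string_view contents) {
  std::vector<std::string> programs;
  while (!contents.empty()) {
    size_t newline = contents.find('\n');
    std::string_view line = contents.substr(0, newline);
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    if (!line.empty()) programs.emplace_back(line);
    if (newline == std::string_view::npos) break;
    contents.remove_prefix(newline + 1);
  }
  return programs;
}
```

For example, `ParseInitFile("/bin/voyageurs\n/bin/teton\n")` yields `{"/bin/voyageurs", "/bin/teton"}` (hypothetical paths), which Yellowstone would then launch in order.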
## Yunq IDL

As I began writing system services, I found that a huge speed bump was creating
client and server classes for each service. I started by just passing structs as
byte arrays and hardcoding whether or not the process expected to receive a
capability with the call. This approach worked but was painful and led to me
dreading each new service I added to the system (not how it should be for a
microkernel architecture!). Additionally, I ended up avoiding things like
repeated fields or string fields, which weren't possible to pass in a single
struct.

It was clear I needed some sort of IDL to handle this, but for months I waffled
on it as I tried to figure out how to incorporate an existing one into the
system. That didn't work for two reasons. First, we need a way to pass
capabilities with the messages. These need to be side-channeled because the
kernel can't just treat them as another string of bytes (they have to be moved
into the other process's capability space). Second, existing serialization
libraries tend to have dependencies, so porting them would require porting those
dependencies first. Granted, some of them only require super basic things like a
libc implementation, but we don't even have that yet. All that to say, I ended
up writing my own.

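To illustrate why capabilities end up side-channeled, here is a sketch of how generated code might serialize a message with one string field and one capability field. The layout and every name are hypothetical, and this is not Yunq's actual wire format. A fixed header of offsets and lengths comes first, then the variable-length data. Each capability is replaced in the bytes by an index into a separate array.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message with a string field and a capability field.
struct OpenFileResponse {
  std::string path;
  uint64_t memory_cap;  // capability id in the *current* process
};

struct WireMessage {
  std::vector<uint8_t> bytes;
  std::vector<uint64_t> caps;  // handed to the kernel separately
};

namespace {
void PutU64(std::vector<uint8_t>& buf, size_t offset, uint64_t value) {
  std::memcpy(buf.data() + offset, &value, sizeof(value));
}
uint64_t GetU64(const std::vector<uint8_t>& buf, size_t offset) {
  uint64_t value;
  std::memcpy(&value, buf.data() + offset, sizeof(value));
  return value;
}
constexpr size_t kHeaderSize = 3 * sizeof(uint64_t);
}  // namespace

// Header: [path offset][path length][cap index], then the string bytes.
WireMessage Serialize(const OpenFileResponse& msg) {
  WireMessage wire;
  wire.bytes.resize(kHeaderSize + msg.path.size());
  PutU64(wire.bytes, 0, kHeaderSize);
  PutU64(wire.bytes, 8, msg.path.size());
  PutU64(wire.bytes, 16, wire.caps.size());  // index of the cap appended next
  wire.caps.push_back(msg.memory_cap);
  std::memcpy(wire.bytes.data() + kHeaderSize, msg.path.data(), msg.path.size());
  return wire;
}

// By the time this runs, the kernel has rewritten `caps` into ids that are
// valid in the receiver's capability space.
OpenFileResponse Deserialize(const WireMessage& wire) {
  // A real server must reject malformed input rather than assert.
  assert(wire.bytes.size() >= kHeaderSize);
  uint64_t offset = GetU64(wire.bytes, 0);
  uint64_t length = GetU64(wire.bytes, 8);
  assert(offset <= wire.bytes.size() && length <= wire.bytes.size() - offset);
  OpenFileResponse msg;
  msg.path.assign(reinterpret_cast<const char*>(wire.bytes.data()) + offset,
                  length);
  msg.memory_cap = wire.caps.at(GetU64(wire.bytes, 16));
  return msg;
}
```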
I was pleasantly surprised with how straightforward it ended up being. I think
it took me about three coding sessions to get the basic parsing and codegen
going for the language. It still doesn't have all of the features I planned for
it (like nested messages), but it works super well for setting up new services
quickly and easily. Currently the implementation is in Python because I wanted
to get something working quickly, but I'll probably reimplement it in a compiled
language in the future with a focus on better error information.

## Closing thoughts

Overall, I'm very pleased with how this project has turned out. I feel like I've
definitely accomplished my goal of learning more about how operating systems are
actually implemented. It has been cool to pull back the curtain and see some of
the simple primitives that underlie the complex features of an operating system.

I aim to continue with this project, without throwing out the code again as I
did earlier this year. I'm happy with the base and plan to iterate on it,
hopefully building something more useful in the future but definitely learning
more along the way.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>AcadiaOS 0.1.0</title>
<link rel="stylesheet" href="/css/styles.css">
</head>
<body>
<div class="container">
<h1 class="page-title">AcadiaOS 0.1.0</h1>
<div class="date">Published 2023-12-06</div>
<p>For the last six months or so I’ve been periodically working on a
hobby operating system. A couple of weeks ago I decided that I should
finally aim to cut a “release.” This very early release doesn’t
include much user functionality: you can navigate a filesystem in a
primitive manner and execute binaries. The following image shows just
about everything the OS can do. (The black window is the OS running in
QEMU, and the larger gray window is debug output sent to COM1.)</p>
<figure>
<img src="images/acadiaos-0.1.0.png" alt="AcadiaOS in action" />
<figcaption aria-hidden="true">AcadiaOS in action</figcaption>
</figure>
<p>While there isn’t much to do as a user, there are a lot of
building blocks underneath that I spent the last six months learning
about and working on.</p>
<h2 id="what-i-knew-going-into-this">What I knew going into
this</h2>
<p>Frankly, not a lot.</p>
<p>I took an OS class in college, but while it covered OS
fundamentals, the projects involved writing modules for the Linux
kernel rather than building our own barebones kernel and OS. So while
I vaguely knew how things like process scheduling, interrupts, and
memory management worked, I had no experience getting down to the
brass tacks of actually implementing them.</p>
<p>Over the previous couple of years I had spent some time writing a
small kernel to start learning some of these things. However, since
I used it as a testing ground for learning with no real design goals
or long-term plan, it was kind of a mess. I had gotten to user space
with some primitive syscalls, but it was memory issues and page
faults galore. So I decided to “reboot” things earlier this
year.</p>
<h2 id="design-goals">Design Goals</h2>
<p>I decided I wanted to write a microkernel-based OS because I
figured the more of my messy code I could move to user space, the
better. And also because that’s what OS nerds do. I’m not too
concerned about the performance cost of extra syscalls because, by
god, this thing isn’t gonna be very performant anyway.</p>
<p>Additionally, I wanted to try to make the system
capability-based. Trying a new permission model was appealing to me
because I’ve always felt the Unix-style one was a bit clunky. After
spending some time reading about seL4 and digging into the Zircon
interface, I had a (very) rough idea of how these systems work. I
have no illusions that my OS will ever be “secure,” but I find the
model interesting.</p>
<h2 id="references-and-resources">References and Resources</h2>
<p>Over the course of this project I used a lot of resources, not
least of which were the OSDev.org <a
href="https://wiki.osdev.org">wiki</a> and <a
href="https://forum.osdev.org">forums</a>. The resources provided
there were invaluable, but the biggest lesson I’ve learned since my
first time around writing a kernel is to rely on specs more than on
others’ code samples and tutorials.</p>
<p>For the low-level stuff I spent a lot of time digging through
Intel’s and AMD’s monstrous programming manuals. It was helpful to use
the wiki to learn, for instance, that the “iret” instruction is a
good way to jump to user space for the first time, but from there I
used the programming manuals to understand exactly how that
instruction works rather than just copying code from somewhere. I
had a similar experience initializing the GDT for 64-bit mode. There
are a lot of random claims out there about exactly how you have to
set it up, so it was much more efficient to just go dig through the
AMD64 spec, however dry it may be.</p>
<p>As I worked my way up the stack, I used the SATA and AHCI specs
as well. They pose the additional complication of splitting things
up across multiple specs, so you have to go back and forth a lot in
non-obvious ways. Hey, at least they don’t try to charge you
thousands of dollars for the spec like PCI does.</p>
<p>I also found that when I needed an example of how to do something
specific, it could be far better to look at an existing operating
system’s approach to help contextualize a specification. Andreas
Kling’s SerenityOS was invaluable for some low-level x86 things. I
also referenced the Zircon microkernel to figure out how to use C++
templates to downcast capability pointers to their specific object
types without relying on RTTI (run-time type information).</p>
<h2 id="kernel-implementation-details">Kernel Implementation
Details</h2>
<p>OK, enough about high-level information, ambitions, and goals.
Let’s discuss what the actual system can do at this point. I named
the kernel Zion because it is another place I love, and it is also
kind of fun to think of the operating system as everything from
(A)cadia down to (Z)ion.</p>
<p>This section will frequently reference the source code, which is
available on my self-hosted <a
href="https://gitea.tiramisu.one">Gitea</a> instance and mirrored to <a
href="https://github.com/dgalbraith33/acadia">GitHub</a>.</p>
<h3 id="low-level-x86-64-stuff">Low-level x86-64 stuff</h3>
<p>Because I found setting up paging, the higher-half kernel, and
getting to long mode to be a pain the first time around, I decided
to use the <a
href="https://github.com/limine-bootloader/limine">limine
bootloader</a> to start the kernel this time instead of GRUB, so I
could focus on slightly higher-level things. I have ambitions to make
the kernel more bootloader-agnostic in the future, but for now it is
tightly coupled to the limine protocol.</p>
<p>On top of the things mentioned above, we use the limine protocol
to:</p>
<ul>
<li>Get a map of physical memory.</li>
<li>Set up a higher-half direct map of memory.</li>
<li>Find the RSDP (the ACPI Root System Description Pointer).</li>
<li>Get a VGA framebuffer from UEFI.</li>
<li>Load the three init programs that are needed to bootstrap the
VFS.</li>
</ul>
<p>Following boot we immediately initialize the global descriptor
table (GDT) and interrupt descriptor table (IDT). The
<strong>GDT</strong> is mostly irrelevant on x86-64, but it was
interesting getting it to work with the sysret instruction. Sysret
expects two user-space code segment descriptors (one for returning to
32-bit code and one for returning to 64-bit code) at fixed offsets
around the user data segment. Right now the system doesn’t support
32-bit code (and likely never will), so we just duplicate the 64-bit
code segment.</p>
<p>The <strong>IDT</strong> is fairly straightforward and barebones
for now. I gradually add more debugging information to the fault
handlers as I run into faults where it would be useful. One of the
biggest improvements was setting up a separate kernel stack for Page
Faults and General Protection Faults. That way, if I corrupt the
memory around the current stack frame, I get useful debugging
information rather than an immediate triple fault. I also recently
added some very sloppy stack-unwinding code so I can more easily find
the context that the fault occurred in.</p>
<p>Finally, we initialize the <strong>APIC</strong> in a
rudimentary fashion. The timer is used to trigger scheduling events,
and we map PCI and PS/2 keyboard interrupts to the appropriate
vectors in the IDT.</p>
<h3 id="memory-management">Memory management</h3>
<p>Memory management seems to be one of those areas where every time
I make progress on something, I discover about four more things I’ll
have to do down the line. I’m somewhat happy with the progress I’ve
made so far, but I still have a lot to read up on and learn,
especially relating to caching policies for mapped pages.</p>
<p>For <strong>physical memory management</strong> I maintain the
available memory regions in two separate linked lists. One list
contains single pages for when those are requested; the other
contains the large memory regions, which are populated during
initialization. This design lets us easily reuse freed pages (using
the list of single pages) while still efficiently finding large
contiguous blocks for things like memory-mapped IO (using the list of
large regions).</p>
<p>The one catch is that to build these linked lists we need an
available heap. And to have an available heap, we need to be able to
allocate a physical memory region for it (and its necessary paging
structures). To accommodate this, we initialize a temporary physical
memory manager that just takes a hardcoded number of pages from the
first memory region and doles them out in sequence. Right now I
hardcode exactly the number of pages it needs. This means that if I
change something that causes more pages to be allocated during this
early phase, it is obvious because things break.</p>
<p>For <strong>virtual memory management</strong> I keep the higher-half
(kernel) mappings identical in each address space. Most of the
kernel mappings are already available from the bootloader, but some
are added for heaps and additional stacks. For user memory we
maintain a tree of the mapped-in objects to ensure that none of them
intersect. Right now the tree is inefficient because it doesn’t
self-balance and most objects are inserted in ascending order
(i.e., it is essentially a linked list).</p>
<p>For user-space memory we wait until the memory is accessed and
generates a page fault before actually mapping it in. To map it in,
we walk each level of the paging structures through the higher-half
direct map (rather than using a recursive page table mapping) to
ensure it exists, allocating a new page table if necessary. All
physical pages used for paging structures are freed when the process
exits.</p>
<p>For <strong>kernel heap management</strong> I wrote a <a
href="https://en.wikipedia.org/wiki/Slab_allocation">slab allocator</a>
for relatively small allocations (currently up to 128 bytes). I plan
on raising that limit and adding a buddy allocator for larger
allocations in the future, but for now there is little need: almost
all of the kernel’s allocations are 128 bytes or less! The few larger
allocations are currently handled by a linear allocator.</p>
<h3 id="scheduling">Scheduling</h3>
|
||||||
|
<p>Right now the scheduling process is very straight forward. Each
|
||||||
|
runnable thread is kept in an intrusive linked list and scheduled
|
||||||
|
for a single time slice in a round robin fashion.</p>
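<p>Round-robin scheduling over an intrusive list can be sketched in a few lines. This C++ sketch uses hypothetical names. The key property is that the link pointer lives inside the thread structure itself, so moving a thread on or off the queue never allocates.</p>

```cpp
#include <cassert>

// Hypothetical thread structure carrying its own run-queue link.
struct Thread {
  int id;
  Thread* next = nullptr;
};

// Intrusive FIFO of runnable threads.
class RunQueue {
 public:
  void PushBack(Thread* t) {
    t->next = nullptr;
    if (tail_) tail_->next = t; else head_ = t;
    tail_ = t;
  }

  Thread* PopFront() {
    Thread* t = head_;
    if (t) {
      head_ = t->next;
      if (!head_) tail_ = nullptr;
      t->next = nullptr;
    }
    return t;
  }

 private:
  Thread* head_ = nullptr;
  Thread* tail_ = nullptr;
};

// Called when the current thread's time slice expires: it goes to the back
// of the queue and the thread at the front runs next.
Thread* Preempt(RunQueue& queue, Thread* current) {
  Thread* next = queue.PopFront();
  if (!next) return current;  // nothing else is runnable
  queue.PushBack(current);
  return next;
}
```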
<p>Threads can block on other threads, semaphores, or mutexes. When
this happens they are flagged as blocked and moved to an intrusive
linked list on that object, which is responsible for scheduling those
threads again once the relevant state changes.</p>
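<p>The pattern of an object owning its blocked threads can be illustrated with a counting semaphore. This is a hypothetical C++ sketch, not the kernel’s actual semaphore. <code>Wait</code> parks the caller on the semaphore’s own intrusive list, and <code>Signal</code> moves one waiter back onto the run queue.</p>

```cpp
#include <cassert>

enum class ThreadState { kRunnable, kBlocked };

// Hypothetical thread with a single intrusive link. A thread is on at most
// one list at a time (the run queue or one object's wait list).
struct Thread {
  int id;
  ThreadState state = ThreadState::kRunnable;
  Thread* next = nullptr;
};

// Minimal intrusive FIFO used for both the run queue and wait lists.
struct ThreadList {
  Thread* head = nullptr;
  Thread* tail = nullptr;

  void PushBack(Thread* t) {
    t->next = nullptr;
    if (tail) tail->next = t; else head = t;
    tail = t;
  }

  Thread* PopFront() {
    Thread* t = head;
    if (t) {
      head = t->next;
      if (!head) tail = nullptr;
      t->next = nullptr;
    }
    return t;
  }
};

class Semaphore {
 public:
  explicit Semaphore(int count) : count_(count) {}

  // Returns true if the caller may continue; false if it was blocked and
  // the scheduler must switch to another thread.
  bool Wait(Thread* caller) {
    if (count_ > 0) {
      --count_;
      return true;
    }
    caller->state = ThreadState::kBlocked;
    waiters_.PushBack(caller);
    return false;
  }

  // Wakes the longest-waiting thread, handing it the unit directly;
  // otherwise the count is incremented for a future Wait.
  void Signal(ThreadList& run_queue) {
    if (Thread* t = waiters_.PopFront()) {
      t->state = ThreadState::kRunnable;
      run_queue.PushBack(t);
    } else {
      ++count_;
    }
  }

 private:
  int count_;
  ThreadList waiters_;  // threads blocked on this semaphore
};
```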
<p>The context switching code simply pushes all of the registers onto
the stack and then writes the stack pointer into the thread
structure. It also saves the SSE registers to space allocated in the
thread structure. I believe this code could be made more
efficient by only pushing callee-saved registers and using the x86
feature that allows you to lazily save the SSE registers only once
they are used. However, for now I prefer this code to be more reliable
than efficient (because it scares me and is a PITA to debug).</p>
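<p>The state such a context switch preserves can be described with a few structures. This C++ sketch assumes the SSE state is saved with <code>FXSAVE</code>, which writes a 512-byte image and requires a 16-byte-aligned buffer. The push order and all names are hypothetical, and the actual push/pop sequence would be written in assembly.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// FXSAVE/FXRSTOR store/load a 512-byte x87/SSE image and require a
// 16-byte-aligned buffer.
struct alignas(16) FxsaveArea {
  std::uint8_t bytes[512];
};

// General-purpose registers as they sit on the stack if a hypothetical
// switch routine pushes rax first and r15 last: r15 ends up at the lowest
// address, i.e. where the saved stack pointer points.
struct SavedRegisters {
  std::uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
  std::uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
};

// Hypothetical per-thread context: since every register is pushed onto the
// thread's own stack, only the resulting stack pointer is stored here,
// plus a pointer to the separately allocated SSE save area.
struct ThreadContext {
  std::uint64_t saved_rsp = 0;
  FxsaveArea* sse_state = nullptr;
};

static_assert(sizeof(FxsaveArea) == 512, "FXSAVE image is 512 bytes");
static_assert(alignof(FxsaveArea) == 16, "FXSAVE needs 16-byte alignment");
static_assert(sizeof(SavedRegisters) == 15 * 8, "all 16 GPRs except rsp");

// Interprets a saved stack pointer as the block of pushed registers.
const SavedRegisters* RegistersFromSavedRsp(std::uint64_t rsp) {
  return reinterpret_cast<const SavedRegisters*>(rsp);
}

ThreadContext MakeContext() {
  ThreadContext ctx;
  ctx.sse_state = new FxsaveArea{};  // C++17 aligned new honours alignas(16)
  return ctx;
}
```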
<p>Finally, there are definitely critical sections in the kernel
code that are not mutex protected currently. It is on the TODO list
to do a good audit of this in preparation for SMP (AcadiaOS 0.2
anyone?).</p>
<h3 id="interface">Interface</h3>
<p>Most system calls the kernel provides either (a) create and
return a capability or (b) operate on an existing capability.
Capabilities can be duplicated and/or transmitted to other processes
using IPC.</p>
<p>For syscalls that operate on an existing capability, the kernel
checks that the capability exists, that it is of the correct type,
and that the caller has the correct permissions on it. Only then
does it act on the request.</p>
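<p>The three checks can be sketched as a lookup in a per-process capability table. Everything here is hypothetical C++ for illustration. The object types, permission bits, and status codes are not specified in this post. The sketch only shows the order of the checks: existence, then type, then permissions.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical capability model for illustration only.
enum class ObjectType { kProcess, kThread, kMemory, kEndpoint, kPort };

enum Permission : std::uint64_t {
  kRead = 1 << 0,
  kWrite = 1 << 1,
  kDuplicate = 1 << 2,
  kTransmit = 1 << 3,
};

enum class Status { kOk, kNotFound, kWrongType, kPermissionDenied };

struct Capability {
  ObjectType type;
  std::uint64_t permissions;  // bitwise OR of Permission values
  void* object;               // the kernel object this capability refers to
};

class CapabilityTable {
 public:
  std::uint64_t Add(Capability cap) {
    std::uint64_t id = next_id_++;
    caps_[id] = cap;
    return id;
  }

  // Runs the checks a syscall performs before acting on a capability.
  Status Validate(std::uint64_t id, ObjectType expected, std::uint64_t required,
                  Capability** out) {
    auto it = caps_.find(id);
    if (it == caps_.end()) return Status::kNotFound;
    if (it->second.type != expected) return Status::kWrongType;
    if ((it->second.permissions & required) != required) return Status::kPermissionDenied;
    *out = &it->second;
    return Status::kOk;
  }

 private:
  std::unordered_map<std::uint64_t, Capability> caps_;
  std::uint64_t next_id_ = 1;
};
```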
<p>The kernel provides APIs to:</p>
<ul>
<li>Manage processes and threads.</li>
<li>Synchronize threads using mutexes and semaphores.</li>
<li>Allocate memory and map it into an address space.</li>
<li>Communicate with other processes using Endpoints, Ports, and
Channels.</li>
<li>Register IRQ handlers.</li>
<li>Manage capabilities.</li>
<li>Print debug information to the VM output.</li>
</ul>
<h3 id="ipc">IPC</h3>
<p>Interprocess communication can be done using Endpoints, Ports, or
Channels. <strong>Endpoints</strong> are like servers that can be
called and provide a response. For each call a “ReplyPort”
capability is generated that the caller can wait on for a response
and that the server can send its response to. <strong>Ports</strong> are
simply one-way streams of messages that don’t expect a response.
Example uses include passing process initialization information and
IRQ handlers. <strong>Channels</strong> are for bidirectional message
passing; I haven’t found a use for them yet and will probably replace
them in the future with a byte-stream interface.</p>
<p>Messages that are passed on these interfaces consist of two parts:
a byte array and an array of capabilities. Each capability passed
is removed from the sending process and passed along to whichever
process receives the message.</p>
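<p>The move semantics for capabilities can be sketched like this. In this hypothetical C++ model a capability space is a map from capability id to kernel object, and the receiver gets freshly numbered ids. The real kernel’s data structures and id allocation may differ.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical per-process capability space: capability id -> kernel
// object (represented here by an integer object id).
using CapSpace = std::map<std::uint64_t, int>;

struct Message {
  std::vector<std::uint8_t> bytes;
  std::vector<std::uint64_t> caps;  // ids valid in the current holder's space
};

// Moves every capability named in msg.caps from the sender's space into the
// receiver's, rewriting the ids so they are valid for the receiver.
// Validates everything first, so a failed send leaves the sender unchanged.
bool TransferCapabilities(Message& msg, CapSpace& sender, CapSpace& receiver,
                          std::uint64_t& next_receiver_id) {
  std::set<std::uint64_t> seen;
  for (std::uint64_t id : msg.caps) {
    if (sender.count(id) == 0 || !seen.insert(id).second) return false;
  }
  for (std::uint64_t& id : msg.caps) {
    int object = sender.at(id);
    sender.erase(id);                       // the sender loses the capability
    std::uint64_t new_id = next_receiver_id++;
    receiver[new_id] = object;              // the receiver gains it
    id = new_id;
  }
  return true;
}
```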
<p>I’m fairly happy with these interfaces so far and was able to
build a user-space IDL (Yunq) on top of them to facilitate message
and capability passing. However, I’m concerned about their ability
to handle certain failure cases. For instance, since endpoints aren’t
“owned” by a specific process, it is impossible to tell if you are
“shouting into the void” at a process that has crashed or isn’t
listening to the specific endpoint anymore.</p>
<h2 id="user-space-programs">User Space Programs</h2>
<p>There are a few user-space programs that run on the
system:</p>
<ul>
<li><strong>Yellowstone</strong>: The init process that starts all
others and maintains a registry of endpoints. (Because Yellowstone
was first).</li>
<li><strong>Denali</strong>: A basic AHCI driver to read from disk.
(D for disk).</li>
<li><strong>VictoriaFallS</strong>: A VFS server with a super simple
read-only ext2 implementation. (I couldn’t resist because it has VFS
in it).</li>
<li><strong>Teton</strong>: A terminal application with a
lightweight shell in it (should eventually be split). (T for
terminal).</li>
<li><strong>Voyageurs</strong>: PS/2 keyboard driver with the intent
of becoming the USB driver. (Idk, bytes traveling over USB are making
a voyage I guess).</li>
</ul>
<p>These programs are all bare-bones versions of what they could be
in the future. I hope to describe them in further detail later,
but for now the initialization process works like this:</p>
<ol type="1">
<li>Yellowstone, Denali, and VictoriaFallS binaries are loaded into
memory as modules by the bootloader.</li>
<li>The kernel loads and starts the Yellowstone process, passing it
memory capabilities for the Denali and VictoriaFallS binaries.</li>
<li>Yellowstone starts Denali and waits for it to register
itself.</li>
<li>Yellowstone reads the GPT, then starts VictoriaFallS on the
correct partition and waits for it to register itself.</li>
<li>Yellowstone then reads the /init.txt file from the disk and
starts each process specified (one per line) in succession.</li>
</ol>
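<p>The last step, reading <code>/init.txt</code> and starting one program per line, might look roughly like this. It is a hypothetical C++ sketch. <code>start_process</code> stands in for however Yellowstone actually spawns a binary, and skipping blank lines and trailing <code>\r</code> characters is an assumption.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Splits an init file into one program path per line, skipping empty lines
// and stripping a trailing '\r' from each line.
std::vector<std::string> ParseInitFile(std::string_view contents) {
  std::vector<std::string> paths;
  std::size_t start = 0;
  while (start <= contents.size()) {
    std::size_t end = contents.find('\n', start);
    if (end == std::string_view::npos) end = contents.size();
    std::string_view line = contents.substr(start, end - start);
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    if (!line.empty()) paths.emplace_back(line);
    start = end + 1;
  }
  return paths;
}

// Starts each listed program in order; returns how many started.
// `start_process` is a hypothetical hook for spawning a binary.
int StartInitPrograms(std::string_view contents,
                      const std::function<bool(const std::string&)>& start_process) {
  int started = 0;
  for (const std::string& path : ParseInitFile(contents)) {
    if (start_process(path)) ++started;
  }
  return started;
}
```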
<h2 id="yunq-idl">Yunq IDL</h2>
<p>As I began writing system services, I found that a huge speed bump was
creating client and server classes for each service. I started by
just passing structs as a byte array and hardcoding whether or not
the process expected to receive a capability with the call. This
approach worked but was painful and led to me dreading each new
service I added to the system (not how it should be for a
microkernel architecture!). Additionally, I avoided things like
repeated fields or string fields that weren’t possible to pass in a
single struct.</p>
<p>It was clear I needed some sort of IDL to handle this, but for
months I waffled on it as I tried to figure out how to incorporate
an existing one into the system. That didn’t work for two reasons.
First, we need a way to pass capabilities with the messages. These
need to be side-channeled because the kernel can’t just treat
them as another string of bytes (they have to be moved into the
other process’s capability space). Second, existing serialization
libraries tend to have dependencies, so porting them would require
porting those dependencies first. Granted, some of them just require
super basic things like a libc implementation, but we don’t
even have that yet. All that to say, I ended up writing my own.</p>
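<p>The side-channel idea can be sketched with a hand-written serializer of the kind an IDL compiler might generate. The message type, field layout, and helper names below are hypothetical C++ and are not Yunq’s actual wire format. Strings are encoded as an offset and length into the byte array, while a capability field stores only an index into a separate capability list that the kernel moves between processes.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// What is handed to the kernel: plain bytes plus a separate capability list.
struct SerializedMessage {
  std::vector<std::uint8_t> bytes;
  std::vector<std::uint64_t> caps;
};

// Hypothetical request with a string field and a capability field.
struct OpenFileRequest {
  std::string path;
  std::uint64_t memory_cap;  // capability id, meaningful only to its holder
};

void AppendU64(std::vector<std::uint8_t>& out, std::uint64_t v) {
  std::uint8_t buf[8];
  std::memcpy(buf, &v, sizeof(v));
  out.insert(out.end(), buf, buf + 8);
}

std::uint64_t ReadU64(const std::vector<std::uint8_t>& in, std::size_t offset) {
  std::uint64_t v;
  std::memcpy(&v, in.data() + offset, sizeof(v));
  return v;
}

// Assumed layout: a fixed header [path offset][path length][cap index],
// followed by the variable-length string data.
SerializedMessage Serialize(const OpenFileRequest& req) {
  SerializedMessage msg;
  constexpr std::uint64_t kHeaderSize = 3 * 8;
  AppendU64(msg.bytes, kHeaderSize);
  AppendU64(msg.bytes, req.path.size());
  AppendU64(msg.bytes, msg.caps.size());  // index; the cap itself goes out of band
  msg.caps.push_back(req.memory_cap);
  msg.bytes.insert(msg.bytes.end(), req.path.begin(), req.path.end());
  return msg;
}

// By the time the receiver runs this, the kernel has replaced each entry of
// msg.caps with an id valid in the receiver's capability space.
OpenFileRequest Deserialize(const SerializedMessage& msg) {
  std::uint64_t offset = ReadU64(msg.bytes, 0);
  std::uint64_t length = ReadU64(msg.bytes, 8);
  std::uint64_t cap_index = ReadU64(msg.bytes, 16);
  OpenFileRequest req;
  req.path.assign(reinterpret_cast<const char*>(msg.bytes.data() + offset), length);
  req.memory_cap = msg.caps.at(cap_index);
  return req;
}
```

<p>Because the capability never appears in the byte array, the kernel can treat the bytes as opaque and only needs to rewrite the capability list when it moves the entries.</p>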
<p>I was pleasantly surprised with how straightforward it ended up
being. I think it took me about 3 coding sessions to get the basic
parsing and codegen going for the language. It still doesn’t have
all of the features I planned for it (like nested messages), but it
works super well for setting up new services quickly and easily.
Currently the implementation is in Python because I wanted to get
something working quickly, but I’ll probably reimplement it in a
compiled language in the future with a focus on better error
reporting.</p>
<h2 id="closing-thoughts">Closing thoughts</h2>
<p>Overall, I’m very pleased with how this project has turned out. I
feel like I’ve definitely accomplished my goal to learn more about
how operating systems are actually implemented. It has been cool to
be able to pull back the curtain and see some of the simple
primitives that underlie the complex features of an operating
system.</p>
<p>I aim to continue with this project, without throwing
out the code again as I did earlier this year. I’m happy with the
base and look to iterate on it, hopefully building something more
useful in the future but definitely learning more along the way.</p>
</div>
</body>
</html>
@@ -7,6 +7,10 @@ body {
margin: auto;
}

.date {
font-style: italic;
}

img {
max-width: 100%;
max-height: 500px;