Windows Heap Exploitation (Win2K SP0 through WinXP SP2)
Original CanSecWest 04 Presentation: Matt Conover & Oded Horovitz
XP SP2 Additions added/presented, Matt Conover @ SyScan 2004
Agenda
• “Practical” Windows heap internals
• How to exploit Win2K – WinXP SP1 heap overflows
• 3rd party (me) assessment of WinXP SP2 improvements
• How to exploit WinXP SP2 heap overflows
• Summary
Windows Heap Internals
Many heaps can coexist in one process (normally 2-3)
[Diagram: PEB fields pointing at the process heaps]
PEB 0x0010: Default Heap
PEB 0x0080: Heaps Count
PEB 0x0090: Heap List
Heaps shown: Default Heap and 2nd Heap, with heap addresses 0x70000 and 0x170000
Windows Heap Internals
Important heap structures
• Segments
• Lookaside List
• Segment List
• Free Lists
• Virtual Allocation list
Windows Heap Internals
Introduction to Free Lists
• 128 doubly-linked lists of free chunks (from 8 bytes to 1024 bytes)
• Chunk size is table row index * 8 bytes
• Entry [0] is a variable-sized free list containing buffers of 1KB <= size < 512KB, sorted in ascending order
[Diagram: FreeLists table, rows 0-6. Row 0 holds chunks of 1400, 2000, 2000 and 2408 bytes; two 16-byte chunks and two 48-byte chunks hang off their matching rows]
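The row rule above can be sketched in a few lines. This is only a model of the arithmetic on the slide, not heap code: the function name and constants are illustrative, and it assumes the size passed in is the chunk size the slide talks about.

```python
FREELIST_COUNT = 128               # rows in the FreeLists table
GRANULARITY = 8                    # bytes per size unit
VIRTUAL_ALLOC_LIMIT = 512 * 1024   # sizes at or above this are not on the heap


def free_list_index(chunk_size):
    """Return the FreeLists row that would hold a free chunk of chunk_size bytes."""
    if chunk_size < GRANULARITY or chunk_size >= VIRTUAL_ALLOC_LIMIT:
        raise ValueError("size is not served from the free lists")
    index = chunk_size // GRANULARITY
    # Rows 1..127 hold one exact size each; 1 KB and larger share row 0.
    return index if index < FREELIST_COUNT else 0
```

With the sizes from the diagram, 16 and 48 land on rows 2 and 6, while 1400 through 2408 all fall into row 0.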
Windows Heap Internals
Lookaside Table
• Used for “fast” allocates and deallocates when available
• Starts empty
• 128 singly-linked lists of busy chunks (free but left marked as busy)
[Diagram: Lookaside table, rows 0-6, with one 16-byte chunk and two 48-byte chunks on their matching rows]
Windows Heap Internals
Why have lookasides at all? Speed!
• Singly-linked
• Used to quickly allocate or deallocate
• No coalescing (leads to fragmentation)
• So the lookaside lists “fill up” quickly (4 entries)
Windows Heap Internals
Basic chunk structure – 8 Bytes
Header fields (byte ruler 0-8): Previous chunk size, Self Size, Segment Index, Flags, Unused bytes, Tag index (Debug)
[Diagram arrow: overflow direction]
Flags:
0x01 – Busy
0x02 – Extra present
0x04 – Fill pattern
0x08 – Virtual Alloc
0x10 – Last entry
0x20 – FFU1
0x40 – FFU2
0x80 – No coalesce
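As a reading aid, the flag byte can be decoded with a small lookup table. The bit values and names are the ones listed on the slide; the helper name is made up for this sketch.

```python
CHUNK_FLAGS = {
    0x01: "Busy",
    0x02: "Extra present",
    0x04: "Fill pattern",
    0x08: "Virtual Alloc",
    0x10: "Last entry",
    0x20: "FFU1",
    0x40: "FFU2",
    0x80: "No coalesce",
}


def decode_flags(flags):
    """Return the names of all flag bits set in a chunk header's Flags byte."""
    return [name for bit, name in CHUNK_FLAGS.items() if flags & bit]
```

For example, a Flags byte of 0x11 decodes to a busy chunk that is also the last entry of its segment.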
Windows Heap Internals
Free chunk structure – 16 Bytes
Header fields (byte ruler 0-8): Previous chunk size, Self Size, Segment Index, Flags, Unused bytes, Tag index (Debug)
Followed by two list pointers: Next chunk, Previous chunk
Windows Heap Internals
Allocation algorithm (high level)
• If size >= 512K, virtual memory is used (not on heap)
• If < 1K, first check the Lookaside lists. If there are no free entries on the Lookaside, check the matching free list
• If >= 1K or no matching entry was found, use the heap cache (not discussed in this presentation)
• If >= 1K and no free entry in the heap cache, use FreeLists[0] (the variable sized free list)
• If still can’t find any free entry, extend heap as needed
Windows Heap Internals
Allocate algorithm – FreeLists[0]
• This is usually what happens for chunk sizes > 1K
• FreeLists[0] is sorted from smallest to biggest
• Check FreeLists[0]->Blink (the biggest block) to see if it is big enough
• Then return the smallest free entry from FreeLists[0] that fulfills the request, like this:
while (Entry->Size < NeededSize)
    Entry = Entry->Flink;
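The same search can be modeled on a plain sorted list. This is a sketch under simplifying assumptions: the list of sizes stands in for the doubly-linked FreeLists[0], its last element plays the role of FreeLists[0]->Blink, and the function name is hypothetical.

```python
def allocate_from_freelist0(sorted_sizes, needed_size):
    """Pick the smallest chunk on an ascending FreeLists[0] that fits, or None."""
    # The tail of the list is the biggest block; if it is too small, give up early.
    if not sorted_sizes or sorted_sizes[-1] < needed_size:
        return None
    # Walk forward (Entry = Entry->Flink) until an entry is large enough.
    for size in sorted_sizes:
        if size >= needed_size:
            return size
    return None
```

The early check on the biggest block is what lets the allocator skip the walk entirely when nothing on the list can satisfy the request.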
Windows Heap Internals
Allocate algorithm – Virtual Allocate
• Used when ChunkSize > VirtualAlloc threshold (508K)
• Virtual allocate header is placed on the beginning of the buffer
• Buffer is added to busy list of virtually allocated buffers (this is what Halvar’s VirtualAlloc overwrite is faking)
Windows Heap Internals
Free Algorithm (high level)
• If the chunk < 512K, it is returned to a lookaside or free list
• If the chunk < 1K, put it on the lookaside (can only hold 4 entries)
• If the chunk < 1K and the lookaside is full, put it on the free list
• If the chunk > 1K put it on heap cache (if present) or FreeLists[0]
Windows Heap Internals
Free Algorithm – Free to Lookaside
Free buffer to Lookaside list only if:
• The lookaside is available (e.g., present and unlocked)
• Requested size is < 1K (to fit the table)
• Lookaside is not “full” yet (no more than 3 entries already)
To add an entry to the Lookaside:
• Put to the head of Lookaside
• Point to former head of Lookaside
• Keep the buffer flags set to busy (to prevent coalescing)
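A minimal model of one lookaside list, assuming only what the slides state: entries go on and come off at the head, and the list refuses a fifth entry. The class and method names are illustrative, and a Python list stands in for the singly-linked chain.

```python
MAX_DEPTH = 4  # a lookaside list holds at most 4 entries


class Lookaside:
    def __init__(self):
        self.entries = []  # entries[-1] is the current list head

    def push(self, chunk):
        """Free a chunk to the lookaside; False means the caller must use the free list."""
        if len(self.entries) >= MAX_DEPTH:
            return False
        self.entries.append(chunk)  # new head, linked in front of the former head
        return True

    def pop(self):
        """Allocate from the head of the lookaside, or None when it is empty."""
        return self.entries.pop() if self.entries else None
```

Because a full lookaside rejects further frees, filling it first is how an attacker steers a later free onto the free list, where coalescing happens.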
Windows Heap Internals
Free Algorithm – Coalesce
[Diagram: three adjacent buffers A, B, C, with B being freed]
Step 1: Buffer free
Step 2: Buffer removed from free list (A + B coalesced)
Step 3: Buffer removed from free list (A + B + C coalesced)
Step 4: Buffer placed back on the free list
Windows Heap Internals
Free Algorithm – Coalesce
Where coalesce cannot happen:
• Chunk to be freed is virtually allocated
• Chunk to be freed will be put on Lookaside
• Chunk to be coalesced with is busy
• Highest bit in chunk flags is set
…
Windows Heap Internals
Free Algorithm – Coalesce (cont)
Where coalesce cannot happen:
• Chunk to be freed is first: no backward coalesce
• Chunk to be freed is last: no forward coalesce
• The size of the coalesced chunk would be >= 508K
Windows Heap Internals
Summary – Questions?
Just remember:
• Lookasides are allocated from and freed to before free lists
• FreeLists[0] is mainly used for 1K <= ChunkSize < 512K
• Coalescing only happens for entries going onto FreeList, not lookaside list
• Entries on a certain lookaside will stay there until they are allocated from
Heap Exploitation: Basic Terms
4-byte Overwrite
Able to overwrite any arbitrary 32-bit address (WhereTo) with an arbitrary 32-bit value (WithWhat)
4-to-n-byte Overwrite
Using a 4-byte overwrite to indirectly cause an overwrite of an arbitrary n bytes
Arbitrary Memory Overwrite Explained
Coalesce-On-Free 4-byte Overwrite
• Utilize coalescing algorithms of the heap
• This is the method first discussed by Oded and me at CSW04. It is our preferred method for reliable heap exploitation on all versions < XPSP2
• Just make sure to fill the Lookaside[ChunkSize] (put 4 entries on heap) before freeing a chunk of ChunkSize to ensure coalescing
• Arbitrary overwrite happens when the overflowed buffer gets freed
[Diagram: fake chunk header written from the overflow start: Index < 64, Flags != 1, Fake Flink (WithWhat), Fake Blink (WhereTo)]
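Why a fake Flink/Blink pair yields a 4-byte overwrite can be seen by modeling the unchecked unlink on a dictionary of address to value. This is a sketch, not ntdll code: the memory model and function name are assumptions, with Flink at offset 0 and Blink at offset 4 as in the free chunk layout.

```python
def unlink(memory, flink, blink):
    """Unchecked list removal as done before XP SP2, on a dict of address -> dword."""
    memory[blink] = flink       # Blink->Flink = Flink, i.e. *WhereTo = WithWhat
    memory[flink + 4] = blink   # Flink->Blink = Blink, a side-effect write at WithWhat + 4


def four_byte_overwrite(where_to, with_what):
    """Return the memory writes caused by unlinking a fake chunk."""
    memory = {}
    unlink(memory, flink=with_what, blink=where_to)
    return memory
```

Note the second write: the value used as WithWhat must itself point at writable memory, because WhereTo is stored at WithWhat + 4.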
Arbitrary Memory Overwrite
Lookaside List Head Overwrite: 4-to-n-byte overwrite
• What we want to do is overwrite a Lookaside list head and then allocate from it
• We must be the first one to allocate that size
• We will get a chunk back pointing to whatever location in memory we want
• Use this to overwrite a function pointer or put the shellcode at a known writable location
Arbitrary Memory Overwrite
Lookaside List Head Overwrite: How To
• Use the Coalesce-on-Free Overwrite, with these values:
  • FakeChunk.Blink = &Lookaside[ChunkSize] where ChunkSize is a pretty infrequently allocated size
  • FakeChunk.Flink = what we want a pointer to
• To calculate the FakeChunk.Blink value:
  • LookasideTable = HeapBase + 0x688
  • Index = (ChunkSize/8)+1
  • FakeChunk.Blink = LookasideTable + Index * EntrySize (0x30)
• Set FakeChunk.Flags = 0x20, FakeChunk.Index = 1-63, FakeChunk.PreviousSize = 1, FakeChunk.Size = 1
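The Blink calculation is plain arithmetic and can be written out directly. The offsets 0x688 and 0x30 are the ones given on the slide; the function name and the example heap base are illustrative.

```python
LOOKASIDE_TABLE_OFFSET = 0x688  # LookasideTable = HeapBase + 0x688
LOOKASIDE_ENTRY_SIZE = 0x30     # EntrySize


def lookaside_entry_address(heap_base, chunk_size):
    """Address of the lookaside list head for chunk_size, per the slide's formula."""
    table = heap_base + LOOKASIDE_TABLE_OFFSET
    index = chunk_size // 8 + 1
    return table + index * LOOKASIDE_ENTRY_SIZE
```

For a heap based at 0x70000 and a chunk size of 0x200 bytes, the index is 65 and the fake Blink comes out as 0x712B8.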
Exploitation Made Simple
• Overwrite PEB lock routine to point to PEB space
• Put shellcode into PEB space
• Then cause the PEB lock routine to execute
[Diagram: PEB page with the PEB header, the PEB lock/unlock function pointers at 0x7ffdf020 and 0x7ffdf024, and ~1k of payload space labeled 0x7ffdf130]
Exploitation Made Simple
Win2K through WinXP SP1 in a single attempt:
• First 4-byte overwrite: Blink = 0x7ffdf020, Flink = 0x7ffdf154
• 4-to-n-byte overwrite: Blink = &Lookaside[(n/8)+1], Flink = 0x7ffdf154
• Be the first to allocate n bytes (cause HeapAlloc(n)): put your shellcode into the returned buffer
• All done! Either wait, or cause a crash immediately. For example, do a 4-byte overwrite with Blink = 0xABABABAB
Exploitation Made Simple
Forcing Shellcode To Run
• Most applications (read: everyone but MSSQL) don’t specially handle access violations
• An access violation results in ExitProcess() being called
• Once the process attempts to exit, ExitProcess() is called
• The first thing ExitProcess() does is call the PEB lock routine
• Thus, causing crash = instant shellcode execution
• Nice
Exploitation Made Simple
Demo
Heap Exploitation
Questions?
The technique we just covered is very reliable, succeeding almost every time on Win2K (all service packs) and WinXP (before SP2)
On to XP SP2…
XP Service Pack 2
Effects on Heap Exploitation
New low fragmentation heap for chunks < 16K
PEB “shuffling” (aka randomization)
New security cookie in each heap chunk
Safe unlinking: (usually) stops 4-byte overwrites
XP Service Pack 2
PEB Randomization
In theory, it could have a big impact on heap exploitation – though not in reality
Prior to XP SP2, it used to always be at the highest page available (0x7ffdf000)
The first (and ONLY the first) TEB is also randomized
They seem to never be below 0x7ffd4000
XP Service Pack 2
PEB Randomization – Does it make any difference?
Not much, randomization is definitely a misnomer
If 2 threads are present:
• We can write to 0x7ffdf000-0x7ffdffff, and
• 2 other pages between 0x7ffd4000-0x7ffdefff
If 3 threads are present:
• 0x7ffde000-0x7ffdffff
• 2 other pages between 0x7ffd4000-0x7ffdefff
…
If 11 threads are present:
• 100% success, no empty pages
XP Service Pack 2
PEB Randomization – Summary
Provides little protection for…
Any application that has m workers per n connections (IIS? Exchange?)
Any service in dllhost/services/svchost or any other “active” surrogate process
XP Service Pack 2
Heap header cookie
Current header (byte ruler 0-8): Previous chunk size, Self Size, Segment Index, Flags, Unused bytes, Tag index (Debug)
XP SP2 header: Previous chunk size, Self Size, New Cookie, Flags, Unused bytes, Segment Index
*reminder: overflow direction
XP Service Pack 2
Heap header cookie calculation
If ((AddressOfChunkHeader / 8) XOR Chunk->Cookie XOR Heap->Cookie != 0) CORRUPT
Since the cookie has only 8 bits, it has 2^8 = 256 possible keys
A random guess at the security cookie is right, on average, once in every 256 attempts
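The check and the 1-in-256 figure can be reproduced with a small model. It assumes, since the cookie is a single byte, that only the low 8 bits of the XOR take part in the comparison; the function names and sample values are illustrative.

```python
def expected_cookie(chunk_header_address, heap_cookie):
    """Cookie byte that makes the header check pass for this chunk address."""
    return ((chunk_header_address // 8) ^ heap_cookie) & 0xFF


def is_corrupt(chunk_header_address, chunk_cookie, heap_cookie):
    """Model of the slide's check, restricted to the 8 cookie bits (an assumption)."""
    return (((chunk_header_address // 8) ^ chunk_cookie ^ heap_cookie) & 0xFF) != 0


def passing_guesses(chunk_header_address, heap_cookie):
    """Count how many of the 256 possible cookie bytes survive the check."""
    return sum(
        not is_corrupt(chunk_header_address, guess, heap_cookie)
        for guess in range(256)
    )
```

Exactly one byte value passes for any given chunk address, which is where the 1-in-256 success rate of a blind guess comes from.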
XP Service Pack 2
On the normal WinXP SP2 system, corrupting a chunk will do nothing
Since we only overwrite the Flink/Blink of the chunk, we corrupt no other chunks
Thus we can keep trying until we run out of memory
XP Service Pack 2
Summary so far…
At this point, we see that, given enough time, we can trivially defeat all the other protection mechanisms.
On to “safe” unlinking…
XP Service Pack 2
Safe Unlinking
Safe unlinking means that RemoveListEntry(B) will make this check:
(B->Flink)->Blink == B && (B->Blink)->Flink == B
In other words:
C->Blink == B && A->Flink == B
Can it be evaded? Yes, in one particular case.
[Diagram: list entries A, B, C, with B the header to free]
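The check itself can be tried out on a toy list. In this sketch memory is a dictionary of address to value, Flink sits at offset 0 and Blink at offset 4, and the addresses A, B and C are arbitrary placeholders, not real heap addresses.

```python
A, B, C = 0x1000, 0x2000, 0x3000  # hypothetical list entry addresses


def make_list():
    """Circular doubly-linked list A <-> B <-> C as a dict of address -> dword."""
    return {A: B, A + 4: C, B: C, B + 4: A, C: A, C + 4: B}


def passes_safe_unlink_check(memory, entry):
    """(entry->Flink)->Blink == entry && (entry->Blink)->Flink == entry"""
    flink = memory[entry]
    blink = memory[entry + 4]
    return memory.get(flink + 4) == entry and memory.get(blink) == entry
```

An intact B passes, while a B whose pointers were replaced with the classic WithWhat/WhereTo pair fails, because neither target points back at B.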
XP Service Pack 2
UnSafe-Unlinking FreeList Overwrite Technique
p = HeapAlloc(n);
FillLookaside(n);
HeapFree(p);
EmptyLookaside(n);
Overwrite p[0] (somewhere on the heap) with:
p->Flags = Busy (to prevent accidental coalescing)
p->Flink = (BYTE *)&ListHead[(n/8)+1] - 4
p->Blink = (BYTE *)&ListHead[(n/8)+1] + 4
HeapAlloc(n); // defeats safe unlinking (ignore result)
p = HeapAlloc(n); // defeats safe unlinking
// p now points to &ListHead[(n/8)].Blink
XP Service Pack 2
Defeating Safe Unlinking (before overwrite)
[Diagram: ListHead[n] ([0] Flink, [4] Blink) linked to FreeChunk ([0] Flink, [4] Blink); neighbors ListHead[n-1] ([4] Blink) and ListHead[n+1] ([0] Flink)]
XP Service Pack 2
Defeating Safe Unlinking: Step 1 (Overwrite)
[Diagram: ListHead[n] ([0] Flink, [4] Blink) and FreeChunk ([0] Flink, [4] Blink) after the overwrite; neighbors ListHead[n-1] ([4] Blink) and ListHead[n+1] ([0] Flink)]
Now call HeapAlloc(n) to unlink FreeChunk from ListHead
FreeChunk->Blink->Flink == *(*(FreeChunk+4)+0)
FreeChunk->Flink->Blink == *(*(FreeChunk+0)+4)
Both point to FreeChunk, unlink proceeds!
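The reason both comparisons still succeed can be checked on a toy memory model. This is a sketch under stated assumptions: FreeChunk is the only entry on its list, so the list head's Flink and Blink both point at it, and the two addresses below are made up for illustration.

```python
LIST_HEAD = 0x00070180   # hypothetical address of the list head's Flink field
FREE_CHUNK = 0x00071008  # hypothetical address of the free chunk's Flink field


def overwritten_state():
    """List head still points at FreeChunk; FreeChunk's pointers are attacker data."""
    return {
        LIST_HEAD: FREE_CHUNK,           # ListHead.Flink
        LIST_HEAD + 4: FREE_CHUNK,       # ListHead.Blink
        FREE_CHUNK: LIST_HEAD - 4,       # overwritten Flink
        FREE_CHUNK + 4: LIST_HEAD + 4,   # overwritten Blink
    }


def safe_unlink(memory, chunk):
    """Perform the unlink only if the safe-unlinking check passes."""
    flink, blink = memory[chunk], memory[chunk + 4]
    if memory.get(flink + 4) != chunk or memory.get(blink) != chunk:
        return False
    memory[blink] = flink        # Blink->Flink = Flink
    memory[flink + 4] = blink    # Flink->Blink = Blink
    return True
```

Flink + 4 and Blink + 0 both resolve to fields of the list head, which still hold FreeChunk, so the check passes; after the unlink the head's Flink is LIST_HEAD + 4, meaning the next chunk handed out from this list lies inside the list head table itself.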
XP Service Pack 2
Defeating Safe Unlinking: Step 2 (1st alloc)
[Diagram: ListHead[n] ([0] Flink, [4] Blink) with FreeChunk unlinked; neighbors ListHead[n-1] ([4] Blink) and ListHead[n+1] ([0] Flink)]
FreeChunk->Blink->Flink = FreeChunk->Flink
FreeChunk->Flink->Blink = FreeChunk->Blink
Returns pointer to previous FreeChunk
XP Service Pack 2
Defeating Safe Unlinking: Step 3 (2nd alloc)
[Diagram: ListHead[n] ([0] Flink, [4] Blink); neighbors ListHead[n-1] ([4] Blink) and ListHead[n+1] ([0] Flink)]
Returns pointer to &ListHead[n-1].Blink
Now the FreeLists point to whatever data the user puts in them
XP Service Pack 2
Questions?
XP Service Pack 2
Unsafe-Unlinking FreeList Overwrite Technique
• For vulnerabilities where you can control the allocation size, safe unlinking can be evaded.
• But is this reliable? Hardly.
…
XP Service Pack 2
Unsafe-Unlinking FreeList Overwrite Technique (cont)
We have to flood the heap with this repeating 8 byte sequence:
[FreeListHead-4][FreeListHead+4]
And hope the Chunk’s Flink/Blink pair is within the range we can overflow
But there is an even easier method…
XP Service Pack 2
Chunk-on-Lookaside Overwrite Technique
• In fact on XP SP2, there is an even easier method
• Lookaside lists take precedence over free lists
• This is quite convenient because…
• Lookaside lists (singly linked) are easier to exploit than the free lists (doubly linked)
XP Service Pack 2
Chunk-on-Lookaside Overwrites
HeapAlloc checks the lookaside before the free list
There is no check to see if the cookie was overwritten since it was freed
It is a singly-linked list, thus the safe unlinking check doesn’t apply
Result: a clean exploitation technique (albeit with brute-forcing required)
XP Service Pack 2
Chunk-on-Lookaside Overwrites (Technique Summary)
// We need at least 2 entries on lookaside
a_n[0] = HeapAlloc(n)
a_n[1] = HeapAlloc(n)
HeapFree(a_n[1])
HeapFree(a_n[0])
Overwrite a_n[0] (somewhere on the heap) with:
a_n[0].Flags = Busy (to prevent accidental coalescing)
a_n[0].Flink = AddressWeWant
HeapAlloc(n) // discard, this returns a_n[0]
p = HeapAlloc(n)
p now points to AddressWeWant
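The sequence above can be replayed on a toy singly-linked list. This is only a model: memory is a dictionary of address to value, the list head and chunk addresses are made up, and the pop performs no validation, which mirrors the absence of a cookie or link check on this path.

```python
LOOKASIDE_HEAD = 0x00070700  # hypothetical address of the lookaside list head
CHUNK_A0 = 0x00071000        # hypothetical chunk addresses
CHUNK_A1 = 0x00071040


def lookaside_push(memory, chunk):
    memory[chunk] = memory[LOOKASIDE_HEAD]   # chunk->Flink = former head
    memory[LOOKASIDE_HEAD] = chunk           # head = chunk


def lookaside_pop(memory):
    chunk = memory[LOOKASIDE_HEAD]
    if chunk:
        memory[LOOKASIDE_HEAD] = memory.get(chunk, 0)  # head = chunk->Flink, unchecked
    return chunk


def run_overwrite(address_we_want):
    memory = {LOOKASIDE_HEAD: 0}
    lookaside_push(memory, CHUNK_A1)       # HeapFree(a_n[1])
    lookaside_push(memory, CHUNK_A0)       # HeapFree(a_n[0])
    memory[CHUNK_A0] = address_we_want     # overflow rewrites a_n[0].Flink
    first = lookaside_pop(memory)          # HeapAlloc(n): returns a_n[0]
    second = lookaside_pop(memory)         # HeapAlloc(n): returns AddressWeWant
    return first, second
```

The first allocation hands back the legitimate chunk and promotes the forged Flink to list head; the second allocation returns the attacker-chosen address.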
XP Service Pack 2
Chunk-on-Lookaside Overwrite - Success rate?
Requires overwriting a chunk already freed to the lookaside
If an attacker overflows a buffer repeatedly, how often will he/she need to before succeeding?
XP Service Pack 2
Chunk-on-Lookaside Overwrite – Empirical results
• 64K heap with 1 segment
• All chunk sizes between 8-1024 bytes
• Max overflow size = 1016 bytes
• Random number of allocs between 10-1000
• Free probability of 50%
Took an average of 84 allocations to be within overflow range
It will take at least 2 overwrites (one to overwrite a function pointer, one to place shellcode)
XP Service Pack 2
Chunk-on-Lookaside Overwrite – Empirical results
• Application specific function pointer and writable location for shellcode: 84*2 = 168 attempts to execute shellcode
• Using PEB lock routine + PEB space (application generic): 84*2*12 = 2,016 attempts to execute shellcode
• The 12 is for the 12 possible locations of the PEB due to PEB randomization
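The attempt counts are simple products, and the factor of 12 matches the number of 4 KB pages in the 0x7ffd4000-0x7ffdffff range mentioned earlier. The sketch below just redoes that arithmetic; the function names and the 4 KB page size are assumptions of the example.

```python
PAGE_SIZE = 0x1000            # assumed 4 KB pages
PEB_RANGE_START = 0x7FFD4000  # lowest address the PEB/TEB pages were seen at
PEB_RANGE_END = 0x7FFDFFFF    # end of the highest page


def candidate_peb_pages():
    """Number of pages the randomized PEB can occupy."""
    return (PEB_RANGE_END + 1 - PEB_RANGE_START) // PAGE_SIZE


def expected_attempts(allocs_to_reach=84, overwrites=2, peb_locations=1):
    """Attempts needed: allocations to get in range, times overwrites, times PEB guesses."""
    return allocs_to_reach * overwrites * peb_locations
```

With an application-specific target the product is 168; guessing among all candidate PEB pages raises it to 2,016.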
XP Service Pack 2
Chunk-on-Lookaside Overwrite – Summary
• A non-application-specific heap exploit will take 2000+ attempts to succeed reliably
• But now ask yourself… how long does it take to generate 2000 heap overwrite attempts?
• Let’s be overly conservative and assume 5 minutes
• That will really slow down a worm…
• But will it help you if someone is specifically trying to hack your machine?
XP Service Pack 2
Low Fragmentation Heap (LFH)
• Looks really solid… kudos to its author
• Uses 32-bit cookie
• Obscures address of Lookaside list heads:
ChunkSizes = *((DWORD *)Chunk) // (ChunkSize<<16|PrevChunkSize)
pLookasideEntry = (DWORD)Chunk / 8
pLookasideEntry ^= Lookaside->Key
pLookasideEntry ^= ChunkSizes
pLookasideEntry ^= RtlpLFHKey
• …
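The XOR chain from the slide can be mirrored in a few lines to show why the result is unpredictable without the keys. This is a model of the arithmetic only: the function name, the 32-bit mask, and every sample value are assumptions of the sketch.

```python
MASK32 = 0xFFFFFFFF


def obscure(chunk_address, chunk_size, prev_chunk_size, lookaside_key, lfh_key):
    """Apply the slide's XOR chain to a chunk address."""
    chunk_sizes = ((chunk_size << 16) | prev_chunk_size) & MASK32
    value = chunk_address // 8
    value ^= lookaside_key   # Lookaside->Key
    value ^= chunk_sizes     # first DWORD of the chunk header
    value ^= lfh_key         # RtlpLFHKey, random per process
    return value & MASK32
```

XOR is its own inverse, so anyone holding both keys and the size fields can undo the chain, while an attacker who cannot read the 32-bit RtlpLFHKey faces 2^32 possibilities instead of 256.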
XP Service Pack 2
Low Fragmentation Heap (LFH)
• The RtlpLFHKey is a “show stopper”:
push eax
call _RtlRandomEx@4
mov _RtlpLFHKey, eax
lea eax, [ebp+var_4]
push eax
call _RtlRandomEx@4
imul eax, _RtlpLFHKey
push esi
mov _RtlpLFHKey, eax
• …
XP Service Pack 2
Low Fragmentation Heap (LFH)
• Must be enabled manually (via NTDLL!RtlSetHeapInformation or KERNEL32!HeapSetInformation)
• It is used for chunks < 16K
• It is not used by anything on XP SP2 Professional
• What irony
Summary
Win2K – WinXP SP1
Fixed heap base and fixed PEB allow for writing very stable exploits
Overwriting FreeList/Lookaside list heads gives us the ability to overwrite any writable address with 1K of data
Summary
WinXP SP2
• Decreases reliability (more bruteforcing is necessary)
• But with enough time, exploitation will still succeed
• XP SP2 will really slow worm propagation, but not help a targeted victim
...
Summary
WinXP SP2
• Heap corruption handling is weak
• PEB randomization is weak
• Safe unlinking is evadable
• Non-LFH cookie checks are weak
• LFH looks good
Summary
Solutions
• Use low fragmentation heap by default
  • Just be sure it is the lowest address on the heap
• Expand PEB randomization over 1MB or so
  • Most machines have 1GB+ RAM these days
• Inform user if heap corruption exceeds a threshold
  • If I have an application with 50 corrupt chunks in 60 seconds, I want to know someone is owning me
• Check security cookies on allocation also
Summary
The eventual death of 4 byte overwrites…
Whether an attacker can predict the ChunkSize/PrevSize or not, he/she won’t be able to predict a larger security cookie (like LFH has).
Heap exploits will focus more on attacking application data on the heap (not the heap itself)