31c3 presentation - virtual machine introspection
TRANSCRIPT
Agenda
1. Motivation
2. What is VMI + Xen + demos
– Isolation
– Interpretation
– Interposition
3. Cloud security + open problems
– Kernel code integrity
4. Conclusion
Our motivation
● Malware collection
● Malware analysis
● Intrusion detection
● Intrusion prevention
● Stealthy debugging
● Cloud security
● Mobile security
Common approaches
● In-guest agents
– Easy to implement
– No isolation
● Network monitor
– Has isolation
– Limited or no context
● Scan VM disk and memory
– Has isolation and context
– Passive view
What is VMI
● Reconstruct high-level state information from the hardware view
– Similar concept to deep packet inspection
● Isolation
– Increased resiliency
● Interpretation
– Complete view of the system
● Interposition
– Control over the execution flow
Isolation
● Reconstruct state without executing code in the guest
– Avoid in-guest hooks
● Tampering is hard(er)
– Hypervisor-based isolation
– Increased trust in the code
● Performance gain
– No “anti-virus storm”
Interpretation
● Understand virtual hardware
– Heavy focus on memory
– vCPU, disk & network
● Reconstruction of state is hard and error-prone
– Complexity
– Requires lots of reverse engineering for Windows
● What data can we trust?
Paging woes
● x86, x86+PS, x86+PAE, IA32E
● Bits 9-11 of the PTE are available bits for the OS to use as it sees fit
● Paging is different for Windows and Linux!
– if (entry & (1 << 11)) and not (entry & (1 << 10))
● Accurate reconstruction is complex
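The Windows-specific bit check from the slide above can be sketched as a small classifier. This is an illustrative sketch, not the talk's actual tooling: bit 0 is the hardware present bit, and Windows repurposes the OS-available bits 10 (Prototype) and 11 (Transition) in non-present PTEs.

```python
# Sketch: classifying a 64-bit page table entry the way a Windows-aware
# introspection tool might. Bit 0 = hardware present; in non-present
# entries Windows reuses bit 10 (Prototype) and bit 11 (Transition).

def classify_pte(entry):
    if entry & 1:
        return "valid"        # hardware-present mapping
    if (entry & (1 << 11)) and not (entry & (1 << 10)):
        return "transition"   # page still in RAM, just unmapped
    if entry & (1 << 10):
        return "prototype"    # shared-memory prototype PTE
    return "paged-out"        # need the swap file to resolve this one

print(classify_pte(0x1A3))      # valid
print(classify_pte(1 << 11))    # transition
print(classify_pte(1 << 10))    # prototype
```

A Linux-aware tool would interpret the same available bits differently, which is exactly why reconstruction is OS-specific.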
Paging woes
● Paged out memory
– Find and interpret the swap file on disk
– Semantics of the swap file defined by the OS
– More complexity
● Xen to the rescue
– Inject page faults via the hypervisor!
– Memory pages can be transparently brought back
– Takes time
– Available starting with Xen 4.5
Paging woes
● Forensics tools need to scan for potential page tables

    overlay = {'VOLATILITY_MAGIC': [ None, {
        'DTBSignature' : [ None, ['VolatilityMagic',
            dict(value = "\x03\x00\x1b\x00") ]], …

● This is actually a signature for a process
● With VMI we can just use the CR3 from the vCPU
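The scan that the `DTBSignature` overlay implies can be sketched in a few lines. This is a simplified illustration (the buffer and offsets are invented), but it shows the core fragility: any stray occurrence of the same four bytes is a false positive, whereas the vCPU's CR3 needs no scanning at all.

```python
# Toy signature scan: walk a memory buffer looking for the 4-byte
# pattern Volatility uses to locate the System process. Every hit
# still needs further validation -- the bytes alone prove nothing.

SIG = b"\x03\x00\x1b\x00"

def scan_for_signature(memory, sig=SIG):
    hits, off = [], memory.find(sig)
    while off != -1:
        hits.append(off)
        off = memory.find(sig, off + 1)
    return hits

# The second occurrence is just noise that happens to match.
mem = b"\x00" * 64 + SIG + b"\x00" * 64 + SIG
print(scan_for_signature(mem))  # [64, 132]
```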
Mapping the kernel
● Requires debug data
● Microsoft gives easy access to it
– Proprietary format
– Has been reverse engineered
– Rekall nicely dumps it into JSON format
● Linux is more problematic
– No cross-distro central repository available
– Fedora DarkServer is a step in the right direction
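Consuming such a JSON profile is straightforward once you have it. The snippet below is a minimal sketch: the layout mimics, in simplified form, how Rekall-style profiles map struct fields to offsets, and the offsets themselves are invented for illustration.

```python
import json

# Simplified, illustrative stand-in for a Rekall-style JSON profile:
# each struct maps to [size, {field: [offset, type-descriptor]}].
profile_json = '''
{
  "$STRUCTS": {
    "_EPROCESS": [2176, {
      "UniqueProcessId": [384, ["Pointer"]],
      "ImageFileName":   [736, ["String"]]
    }]
  }
}
'''

def field_offset(profile, struct, field):
    size, fields = profile["$STRUCTS"][struct]
    return fields[field][0]

profile = json.loads(profile_json)
print(field_offset(profile, "_EPROCESS", "ImageFileName"))  # 736
```

With offsets like these, a VMI tool can read fields out of guest memory directly instead of scanning for them.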
Scanning woes
● Scanning for the kernel, processes, files, etc.
– Relying on pool tag headers
– 4-byte description (KDBG, Proc, File, etc.)
– Meta-information about type of kernel heap allocation
● Partial structures, old structures, false positives
– Memory doesn't get reset to zero on free– More heuristics needed to validate hits– More heuristics = more complexity = more fragile
Anti-forensics
● 2012: One-byte Modification for Breaking Memory Forensic Analysis
● 2014: ADD - Complicating Memory Forensics Through Memory Disarray
Fundamental problems with trusting data!
● Scanning for weak signatures
● Inconsistent memory state
Interposition
● Coherent view into the system
● Avoid scanning
– KDBG scan can be avoided: vCPU0 FS/GS → _KPCR → Kernel!
– Heap allocations can be monitored directly
● Native support built into Xen, no custom patching!
– Designed for debugging
– Relatively unknown and undocumented feature
Tracing on Xen
● 4 types of events currently supported (Intel only)
– MOV-TO-CR0/CR3/CR4
– Debugging breakpoints (INT3)
– Extended Page Table (EPT) violations
– Monitor Trap Flag (MTF) singlestepping
● The hardware would support many more
– See full list in Intel SDM 3c 25.1.3
LibVMI – http://libvmi.com
● Hypervisor agnostic C library
– Xen, KVM & raw memory dump support
– x86, x86+PS, x86+PAE, IA32E, ARM
– Windows and Linux guest support
– Python interface
– Write code once, deploy on all supported hypervisors
– Read & write memory
– Wrapper around Xen events!
– LGPL
Malware files in memory
● Write-caching buffers writes to the disk
● Temporary files only ever present in memory!
– Common for malware droppers
● Have to catch the delete event or the memory could get recycled
● Interposition is critical
DRAKVUF – http://drakvuf.com
● Agentless dynamic malware analyzer
– GPLv2
– Built on Xen, LibVMI, Volatility & Rekall
● No in-guest agents = no in-guest artifacts
– Malware sample started externally
– Monitor both user- and kernel-mode malware
– Extract ephemeral artifacts
VMIDBG
● Fresh out of the oven!
– GDB integration!
– https://github.com/Zentific/vmidbg
● Stealthy debugging
– Currently supports Linux
– WinDBG integration next!
– Check the readme ;)
Cloud security
● Separate security policy for each cloud user
● Start live monitoring before compromise happens
– Baseline of integrity
● Monitor and protect critical components & paths
– IDT, GDT, SSDT, in-line hooks, etc.
● Trust data that is essential
– If modifying it means it crashes the system, malware (probably) won’t touch it
But wait..
● Can we really trust any data?
● Hardware reports incomplete trap information
– Read-modify-write (fixed in software in Xen 4.5)
● The Tagged Translation Lookaside Buffer!
– Intel VPID & AMD ASID (2008)
– Entries only visible from within the same context
– The page tables don’t necessarily represent what translation the guest actually uses
The tagged TLB
● Unique tag for each vCPU so cached entries survive VMEXIT/VMENTRY
● A rootkit in the guest could muck with the page tables after a translation is cached
– Only in-guest code can access the cache!
● Effectiveness varies based on hypervisor and OS
– Xen assigns new tag on each guest MOV-TO-CR3
– Windows 7 frequently flushes global pages
Cloud security
● No need to move everything outside
● Secure in-guest agents
– Better performance, better visibility
– Hybrid approach
– Hardware support coming: Intel #VE
● Alternative approaches
– Reduce the size of the guest system
– MirageOS, NetBSD rumpkernels, OSv
Secure in-guest kernel
● Blacklist approach
– Deny malicious changes
– Need to enumerate all possible things that could go wrong
● Whitelist approach
– Allow only verified changes
– Need to enumerate all valid changes that we want to allow
Kernel Code Integrity
● Linux employs run-time code self-patching
● Need to differentiate between legitimate and malicious changes
● Load-time patching (architecture specific)
– Load-time patching can be validated during load time
● Run-time patching, e.g. for hardware hot-plugging
– Run-time patching has to be validated continuously
Kernel run-time patching – Examples
● SMP Locks – vCPU hot-plug
– Locks needed only if more than one vCPU is present
– Could be used to replace entire functions
● Jump Labels
– In a certain software state some branches are very unlikely
– Jumps are patched out if a function is not required
– The code must be consistent with the system’s state
Simple Validation Approaches
● Lock the kernel
– Deny all changes to the code at run-time
– Disables legitimate run-time patching
● Hash the kernel
– White-list all known kernel states
– Number of hashes to maintain grows unbounded
– Separating code from data is non-trivial
– Weird memory optimizations
● Linux kernel has portions of its large code pages also mapped into user-space processes
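The unbounded-growth problem with hashing can be shown in a few lines. This sketch uses an invented stand-in for a kernel code page: a single legitimately patched byte produces a brand-new hash, so every valid patch state at every patchable location needs its own whitelist entry.

```python
import hashlib

# Sketch of the "hash the kernel" approach and its weakness.

def page_hash(page):
    return hashlib.sha256(page).hexdigest()

code_page = b"\x90" * 4096            # stand-in for a kernel code page
whitelist = {page_hash(code_page)}

patched = bytearray(code_page)
patched[0] = 0xE9                     # one legitimately patched byte...
assert page_hash(bytes(patched)) not in whitelist  # ...is a new hash

# Accepting the patch means growing the whitelist -- once per valid
# combination of patch states, across all patchable locations.
whitelist.add(page_hash(bytes(patched)))
print(len(whitelist))  # 2
```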
Trap and validate using VMI
● Only allow patching at predefined code locations
– For these locations the patching mechanism is known
– Patches can be retraced and understood
– The patch must match the system’s state
● Code patching is not an atomic operation
– The system needs to be aware of the intermediate states
● Trap write events to kernel code
– Validate that the current change is not malicious
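The trap-and-validate idea above can be sketched as a lookup table. The addresses and byte sequences below are invented for illustration: each predefined patch site lists every legitimate state it may hold, including the intermediate states of a non-atomic patch; any trapped write that does not match is flagged.

```python
# Sketch: whitelist of predefined kernel patch sites. Each site lists
# all byte sequences it may legitimately hold (final AND intermediate
# states, since patching is not atomic). Values are illustrative only.

ALLOWED_PATCHES = {
    0xFFFFFFFF8100A000: {
        b"\x0f\x1f\x44\x00\x00",  # 5-byte NOP (branch patched out)
        b"\xe9\x10\x02\x00\x00",  # jmp rel32 (branch enabled)
        b"\xcc\x1f\x44\x00\x00",  # INT3-first intermediate state
    },
}

def validate_write(addr, new_bytes):
    """Return True iff a trapped write matches a known patch state."""
    allowed = ALLOWED_PATCHES.get(addr)
    return allowed is not None and bytes(new_bytes) in allowed

print(validate_write(0xFFFFFFFF8100A000, b"\xe9\x10\x02\x00\x00"))  # True
print(validate_write(0xFFFFFFFF8100A000, b"\x90\x90\x90\x90\x90"))  # False
```

In a real deployment the write trap itself would come from the hypervisor (e.g. an EPT violation on the kernel's code pages), with this validation running outside the guest.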
Summary
● VMI supports a wide spectrum of applications
– Isolation, Interpretation, Interposition
– Balance depends on your use-case
– Pure VMI is not a requirement for all cases
– Hardware support is improving
● Tools are open-source!
– LibVMI http://libvmi.com
– DRAKVUF http://drakvuf.com
● Find us
– #libvmi on Freenode | @tklengyel | [email protected]
Acknowledgement
● Zentific: Steve, Matt & Russ
● LibVMI: Bryan & Tony
● Volatility: MHL, moyix, Andrew & gleeda
● Rekall: Scudette
● DARPA: Mudge
● TUM: George, Sebastian & Jonas
Food for thought
Run-time trust may “require deep changes to the way we construct systems, as we would need a way to tie its low-level implementation to a high-level semantic description of its runtime behavior in a way that is verifiable (it would do no good if programs could claim malicious behavior was something innocent). We can summarize this research path by asking whether we can ensure that a program
1) says what it does; 2) does what it says; and 3) can prove it.”
Brendan Dolan-Gavitt
Appendix
● https://randomascii.wordpress.com/2013/02/20/symbols-on-linux-part-three-linux-versus-windows/
● https://archive.org/details/ShmooCon2014_ADD_Complicating_Memory_Forensics_Through_Memory_Disarray
● http://volatility-labs.blogspot.com/2014/02/add-next-big-threat-to-memory.html
● http://www.rekall-forensic.com/posts/2014-02-19-profiles.html
● http://www.rekall-forensic.com/posts/2014-02-21-do-we-need-kdbg.html
● http://www.rekall-forensic.com/posts/2014-10-25-pagefile.html
Appendix
● http://www.sec.in.tum.de/assets/Uploads/acsacmmfkittel.pdf
● http://www.sec.in.tum.de/assets/Uploads/scalability-fidelity-stealth.pdf
● http://www.sec.in.tum.de/assets/Uploads/pitfalls-virtual-machine.pdf
● http://www.sec.in.tum.de/assets/Uploads/lengyelshcis2.pdf
● http://www.sec.in.tum.de/assets/Uploads/virtual-machine-introspection.pdf
● http://xenproject.org/help/presentations-and-videos/video/latest/xpus14-vm-security-on-the-outside-looking-in-by-steven-maresca-of-zentific.html