using algorithms to brute force algorithms...a journey through time and namespace
Post on 25-Jul-2015
Using Algorithms to Brute Force Algorithms
… a journey through time and namespace
Anthony Kasza Bsides Chicago 2015
Audience Participation: Answer a question, win a prize
Audience Participation: What is an algorithm?
algorithm noun Word used by programmers when they do not want to explain what they did.
[12]
Outline
- Background
- Malware Communications and Botnet Architectures
- Analyzing Domain Generation Algorithms
- Ramnit
- Ramnit’s DGA
- Brute Force Identification of Ramnit DGA Seeds
- Results
- Graphs
- Applications and Improvements
Me Anthony Kasza Security Researcher: OpenDNS @anthonykasza github.com/anthonykasza
Background
Malware Communications
Let’s pretend… We all just compromised 10k hosts for our botnet.
What do we do now? Have our malware phone home.
Botnets are resilient, cloud-based, often distributed remote administration systems. [10]
Audience Participation: Name a malware
Malware Communications: IP
- Open socket
- Beacon to IP address
- Easy to set up
- Easy to take down
[Diagram: three Client/Implant hosts and one C2 Server]
Malware Communications: P2P
- Open socket
- Beacon to super node peer(s)
- Very resilient
- Peer consensus issues
- Complex to set up
[Diagram: four Super nodes and three Client/Implant hosts] [9]
Malware Communications: DNS
- Open socket
- Issue DNS query
- Open socket
- Beacon to IP address
- Relatively easy to set up
- Relatively easy to take down
[Diagram: three Client/Implant hosts, a DNS Resolver, and a C2 Server]
Audience Participation: Name a botnet that uses DNS
Malware Communications: DNS Resiliency Tricks
- Fast Flux: DNS A records change quickly
- Double Flux: DNS A and NS records change quickly
- Domain Generation Algorithms (DGA): C2 domain names are generated dynamically by a deterministic function within the implant at run time. Samples are "strings proof": the C2 domain names never appear as literal strings in the binary.
How To DGA
[Flowchart, labels only: Start; Client; DGA with inputs Date, Seed, Lexicon, and TLD set; Hash/PRNG; String; Domain name; query; NXD; A; connect to IP; End]
Example DGA Output
vfxlsatformalisticirekb[.]com
rd0ee55073a3776810962c124f02a99424[.]ws
croialotvvnfliyjmvt[.]ru
yxjsibeugmmj[.]in
osghqrdmlyhh[.]net
easebrainjobmarket[.]com
Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

Workflow: [3]
1. Operator registers the domain “just in time” before the implant generates it
2. Registrar ensures the domain is inserted into the DNS
3. Implant generates and resolves the domain
4. Implant connects to the C2 IPv4
5. Repeat: the operator is constantly registering domain names

[Diagram: Client/Implant, Operator, Registrar, DNS Resolver, C2 Server]
Audience Participation: Name a malware that uses a DGA
Malware that uses a DGA Banjori DirCrypt Dyre GameoverZeus Hesperbot Matsnu Necurs Pushdo Pykspa Qakbot Ramnit Shiotob
Simda/Shiz Symmi TinyBanker Bedep Emotet Gozi Nymaim Suppobox Urlzone VolatileCedar Cryptolocker Conficker
Murofet BankPatch Bobax Ramdo Flashback Kelihos Rovnix Torpig Many more…
[5]
Each DGA is a Special Snowflake
- Conficker.C: generated 50k names per day
- Pushdo: DGA as a backup if the C2 domain went down
- Kelihos: DGA as a backup if the P2P network went down
- newGOZ DGA domains…
  - registered through a few common registrars
  - typically registered 1 hr before the algorithm would generate them
  - changed NS domains but reused NS IPv4s
[4] [11]
DGA Domain Query Periods
Family    Query period
Dyre      ~1 day
Ramnit    N/A
Matsnu    ~2 weeks
Pykspa    ~3 weeks
Bedep     ~1 week
Generalized DGA pseudo code…

    for i in domain_set_size:
        domain = generate_domain(date, magic)
        resolve domain
        if domain resolves:
            contact domain
            StopIteration

    def generate_domain(date, magic):
        domain = ''
        for i in lexicon_item_count:
            item = random_select(lexicon, magic)
            domain = domain + item
        domain = domain + random_select(tld_set, magic)
        return domain
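The generalized pseudo code can be turned into a small runnable sketch. This is an illustration only, not any real family's algorithm: the SHA-256 transformation, the letter lexicon, the TLD set, the counts, and the names `generate_domain_set` and `first_resolving` are all assumptions made for the example.

```python
import hashlib
from datetime import date

# Every constant below is an assumption chosen for illustration.
LEXICON = "abcdefghijklmnopqrstuvwxyz"   # lexicon: single letters
TLD_SET = [".com", ".net", ".org"]       # TLD set
LEXICON_ITEM_COUNT = 12                  # lexicon items per name
DOMAIN_SET_SIZE = 5                      # names generated per day


def generate_domain(day, magic, index):
    """Derive one name deterministically from the date, the salt and a counter."""
    material = f"{day.isoformat()}|{magic}|{index}".encode()
    digest = hashlib.sha256(material).digest()  # 32 pseudo-random bytes
    label = "".join(LEXICON[b % len(LEXICON)] for b in digest[:LEXICON_ITEM_COUNT])
    tld = TLD_SET[digest[LEXICON_ITEM_COUNT] % len(TLD_SET)]
    return label + tld


def generate_domain_set(day, magic):
    """All names an implant with this salt would try on the given day."""
    return [generate_domain(day, magic, i) for i in range(DOMAIN_SET_SIZE)]


def first_resolving(domains, resolves):
    """Return the first name the supplied resolver check accepts, else None."""
    for domain in domains:
        if resolves(domain):
            return domain
    return None


if __name__ == "__main__":
    names = generate_domain_set(date(2015, 7, 25), magic=1234)
    # Stand-in resolver: pretend the operator registered only the third name.
    print(first_resolving(names, lambda name: name == names[2]))
```

Because the function is deterministic, anyone who knows the date and the salt (the shared secret) can compute the same list as the implant, which is what both the operator and the defender rely on.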
Generalized Algorithms Analyses

Inputs
- Domain set size: how many domains to generate
- Date: today's date
- Seed: a number used to ignite a PRNG
- Salt: a magic number or campaign ID
- Lexicon: a set of letters, n-grams, or words
- Lexicon item count: number of items to use from the lexicon
- TLD set: all possible TLDs

Functions
- MD*, SHA*, etc.: some hash
- PRNG: random numbers
- Bitwise math: xor, shl/shr, mod, b64, ASCII to hex

Outputs
- Names to contact: these are often regex-able due to properties of the transformation function
An Algorithm Taxonomy from Inputs

Group | Lexicon | Domain set size | Salt/Seed | Date | Examples
A     | Letters | Yes             | Yes       | Yes  | Necurs, GOZ, Symmi, Tinba, Pykspa
B     | Letters | Yes             | Yes       | No   | Ramnit, DirCrypt, VolatileCedar, Ramdo
C.i   | Letters | Yes             | No        | Yes  | Conficker, Dyre, Cryptolocker, Pushdo, Qakbot
C.ii  | Words   | Yes             | No        | Yes  | Matsnu, Rovnix
Enter Ramnit
Audience Participation: Tell me anything about Ramnit
Ramnit Malware
- Worm/RAT
- Emerged 2010
- “Borrowed” features from Zeus source 2011
- Spread via exploit kits (EK), social media, bundled software, etc.
- Uses a DGA
[7]
Ramnit DGA Pseudo Code [1]

    class RandInt:
        # LCG PRNG, random uint32
        def __init__(self, seed):
            self.seed = seed

        def rand_int_modulus(self, modulus):
            ix = self.seed
            # integer division; the result is masked to 32 bits
            ix = (16807*(ix % 127773) - 2836*(ix // 127773)) & 0xFFFFFFFF
            self.seed = ix
            return ix % modulus

    def ramnit_domains(seed, domain_set_size):   # seed = ?, domain_set_size = ?
        r = RandInt(seed)
        for _ in range(domain_set_size):
            seed_a = r.seed
            domain_length = r.rand_int_modulus(12) + 8   # domain_length = {8,19}
            seed_b = r.seed
            domain = ''
            for _ in range(domain_length):
                domain += chr(ord('a') + r.rand_int_modulus(25))   # lexicon = [a-y]
            domain += '.com'                             # tld_set = ['.com']
            m = seed_a*seed_b
            r.seed = (m + m//(2**32)) % 2**32
            yield domain
Ramnit DGA Pseudo Code

Unknowns
1. Linear congruential generator’s seed
2. How many times this loop occurs

[Flowchart, labels only: Client; DGA with inputs Seed (uint32) and Lexicon [a-y]{8,19}; LCG PRNG; string + ".com"; Domain Name; query; NXD; A; connect to IP]
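The lexicon and length range on this slide mean every Ramnit name has the shape `[a-y]{8,19}` followed by `.com`, so a regular expression works as a cheap prefilter over query logs. A minimal sketch, where the helper name `looks_like_ramnit` is hypothetical:

```python
import re

# Shape taken from the slide: 8 to 19 letters from a-y, then ".com".
RAMNIT_SHAPE = re.compile(r"[a-y]{8,19}\.com")


def looks_like_ramnit(name):
    """True if the name has the Ramnit DGA's shape (necessary, not sufficient)."""
    # Normalize case and drop a trailing root dot before matching.
    return RAMNIT_SHAPE.fullmatch(name.lower().rstrip(".")) is not None


if __name__ == "__main__":
    for name in ["abcdefgh.com", "abcdefgz.com", "abcdefgh.net", "short.com"]:
        print(name, looks_like_ramnit(name))
```

A match only says the name could have come from the algorithm. Plenty of legitimate `.com` names have the same shape, so the filter narrows the search rather than confirming an infection.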
Brute Forcing Ramnit DGA Seeds
Inputs: domain_set_size, seed, tld_set, lexicon
Outputs: names
I. Iterate over the seed space (2^32) and identify candidate seeds
II. Find each candidate seed’s domain_set_size and generate its domains
III. Determine the minimum set of seeds that produces all domains (overlap in LCG output) [2]
Step 1: Identify Candidate Seeds
1. Seed the Ramnit DGA with every uint32 value (0 to 2^32 - 1)
2. Generate the first domain from each seed
   - 27 hours on an AWS c3.8xlarge
   - 24 processes, each with its own CPU core and a portion of the seed space
   - Resulting seed and domain tuples sorted and merged
3. Scan OpenDNS query logs and find which domains received at least one query
4. Seeds that generated domains that received queries are candidate seeds
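Step 1 can be sketched as follows. This is not the code used for the talk: `partition`, `scan_range`, and `candidate_seeds` are hypothetical helper names, the set of queried names stands in for the OpenDNS query logs, and the chunks are processed sequentially here, whereas the talk handed each portion of the seed space to its own process.

```python
MASK32 = 0xFFFFFFFF


def lcg_next(seed):
    """One step of the LCG from the Ramnit pseudo code, masked to uint32."""
    return (16807 * (seed % 127773) - 2836 * (seed // 127773)) & MASK32


def first_domain(seed):
    """First name the Ramnit DGA pseudo code emits for a given seed."""
    seed = lcg_next(seed)
    length = seed % 12 + 8                       # 8..19 characters
    chars = []
    for _ in range(length):
        seed = lcg_next(seed)
        chars.append(chr(ord("a") + seed % 25))  # letters a..y
    return "".join(chars) + ".com"


def partition(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, near-equal ranges."""
    size, extra = divmod(stop - start, parts)
    ranges, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def scan_range(lo, hi, queried):
    """(seed, domain) tuples in [lo, hi) whose first name appears in the logs."""
    hits = []
    for seed in range(lo, hi):
        domain = first_domain(seed)
        if domain in queried:
            hits.append((seed, domain))
    return hits


def candidate_seeds(queried, start=0, stop=2**32, parts=24):
    """Scan every chunk (one per worker in the talk) and merge sorted results."""
    hits = []
    for lo, hi in partition(start, stop, parts):
        hits.extend(scan_range(lo, hi, queried))
    return sorted(hits)
```

For a quick trial, pass a small range such as `candidate_seeds(queried, 0, 100000, 4)`. A pure-Python scan of the full 2^32 space is slow, which is consistent with the 27-hour run time quoted for the talk's much larger machine.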
Audience Participation: Which are candidate seeds?
Candidate Seeds Example
seed1, domain1
seed2, domain1
seed3, domain1
seed4, domain1
Step 2: Find Seeds’ Domain Set Size
1. Observe the domain’s hourly query counts for the previous two weeks*
2. For each candidate seed, generate the next domain
3. Compare the next domain’s query pattern to the seed’s composite query pattern
   If they are similar:
   1. Merge the pattern into the seed’s composite query pattern
   2. Increment the seed’s domain set size
   3. Go to 1
   Otherwise:
   1. Exit
* A query pattern is a vector with each position representing an hourly count of DNS queries
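The loop in Step 2 can be sketched as below. The slides do not say how similarity was measured or how patterns were merged, so the cosine similarity, the 0.8 threshold, the element-wise sum used for merging, and the `query_pattern` lookup are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def domain_set_size(domains, query_pattern, threshold=0.8, max_size=10000):
    """Count how many consecutive names from one seed share a query pattern.

    `domains` yields the seed's names in generation order (at least one name);
    `query_pattern(name)` returns that name's vector of hourly query counts.
    """
    domains = iter(domains)
    composite = list(query_pattern(next(domains)))  # first name was already seen in the logs
    size = 1
    for domain in domains:
        if size >= max_size:
            break
        pattern = query_pattern(domain)
        if cosine(pattern, composite) < threshold:
            break                                   # pattern diverges: stop counting
        composite = [c + p for c, p in zip(composite, pattern)]  # merge by summing
        size += 1
    return size


if __name__ == "__main__":
    # Hypothetical hourly counts for four generated names.
    patterns = {"d1": [5, 0, 3, 0], "d2": [4, 0, 2, 0], "d3": [6, 0, 3, 1]}
    lookup = lambda name: patterns.get(name, [0, 0, 0, 0])
    print(domain_set_size(["d1", "d2", "d3", "d4"], lookup))  # 3
```

The `max_size` cap is a safety net for the sketch, so that a seed whose names all look alike cannot loop indefinitely.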
Audience Participation: What is this seed’s domain set size?
Seeds’ Domain Set Size Example
seed1, domain1
seed1, domain2
seed1, domain3
seed1, domain4
Step 3: Minimum Seed Set for Domain Coverage
1. For each seed and its associated domain set…
2. Remove all domain sets that are a subset of other domain sets
3. The minimum seed set for domain coverage remains
Seeds that remain aren’t necessarily “in the wild”
They are seeds that generate all domains “in the wild”
Audience Participation: Which seeds would be eliminated?
Minimum Seed Set Example
seed1: domain1, domain2
seed2: domain1, domain2, domain3
seed3: domain3, domain4
seed4: domain1, domain2, domain3, domain4
seed5: domain5
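The subset-removal rule from Step 3 can be sketched and applied to the example seeds on this slide. The function name `minimum_seed_set` is hypothetical, and keeping the first of two identical domain sets is an assumption the slides do not address.

```python
def minimum_seed_set(domain_sets):
    """Drop every seed whose domain set is contained in another seed's set."""
    items = [(seed, frozenset(domains)) for seed, domains in domain_sets.items()]
    kept = {}
    for i, (seed, domains) in enumerate(items):
        covered = False
        for j, (_, other) in enumerate(items):
            if i == j:
                continue
            # Covered by a strict superset, or by an identical set seen earlier.
            if domains < other or (domains == other and j < i):
                covered = True
                break
        if not covered:
            kept[seed] = set(domains)
    return kept


if __name__ == "__main__":
    example = {
        "seed1": {"domain1", "domain2"},
        "seed2": {"domain1", "domain2", "domain3"},
        "seed3": {"domain3", "domain4"},
        "seed4": {"domain1", "domain2", "domain3", "domain4"},
        "seed5": {"domain5"},
    }
    print(sorted(minimum_seed_set(example)))  # ['seed4', 'seed5']
```

This follows the slide's rule literally. It removes redundant seeds but is not a general set-cover solver, so in less tidy data it may keep more seeds than the true minimum.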
Brute Forcing Algorithm Weaknesses
1. The first domain from each seed is used to locate candidate seeds
2. No queries on that day means the seed is ignored
3. Point-in-time analysis
4. DGAs collide with legitimate domain names
   - 1 million monkeys typing in 1 million address bars will eventually browse to 4chan
Results
Results: Seeds, Domains, Clients
29 seeds, 3924 domains
- Seeds confirmed by Symantec’s report
I found some seeds not listed in Symantec’s report
- Not a big deal due to overlaps in Ramnit DGA’s LCG seeds
I found some domains not listed in Symantec’s report
- Bigger deal if Symantec is serious about takedowns
[7] [8]
Audience Participation: Was anyone here involved in the Ramnit takedown?
Results: Patterns in Domain Queries by Seed
Results: Patterns in Domain Queries
1. Locate IPv4s that queried each domain
2. Create a graph of seed -> domains -> client IPv4s
3. Count connected components (I found two)
[Diagram: graph with a row of seed nodes (S), a row of domain nodes (D), and a row of client nodes (C)]
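Counting connected components in the seed -> domains -> client graph can be sketched with a small union-find. The helper names and the sample data are hypothetical (the domains and documentation-range IPv4 addresses are placeholders, not observed values), and the talk does not say which graph tooling was actually used.

```python
from collections import defaultdict


def build_edges(seed_domains, domain_clients):
    """Edges seed -> domain and domain -> client, with typed node labels."""
    edges = []
    for seed, domains in seed_domains.items():
        for domain in domains:
            edges.append((("seed", seed), ("domain", domain)))
            for client in domain_clients.get(domain, ()):
                edges.append((("domain", domain), ("client", client)))
    return edges


def connected_components(edges):
    """Group the nodes of an undirected graph into components via union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())


if __name__ == "__main__":
    seed_domains = {1: ["a.example", "b.example"], 2: ["b.example"], 3: ["c.example"]}
    domain_clients = {
        "a.example": ["192.0.2.1"],
        "b.example": ["192.0.2.1", "192.0.2.2"],
        "c.example": ["198.51.100.7"],
    }
    components = connected_components(build_edges(seed_domains, domain_clients))
    print(len(components))  # 2
```

Seeds that land in the same component share domains or querying clients, which is one way to group seeds that plausibly belong to the same set of infected hosts.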
Results: Patterns in Domain Queries by IPv4 Groups
Applications and Improvements
Generalize the framework for use with all DGA implementations
- Currently working with more than just Ramnit
Vigilant monitoring instead of point-in-time search
- Ramdo seeds can be updated by the C2 server
- Even if you RE the algorithm, you don't have the seed unique to each compromised system
Combine with other DGA detection techniques
- Co-occurrences and lexical features
[6]
Conclusion
Why should you care?
- Many malware families are using DGAs
- This is a new way to identify new badness
- Know the shared secret, find all the C2 domains
- Not all DGAs are created equal
- Some are more difficult to track than others
- Malware authors are people too
- 3:30, “The Life and Times of an APT Malware Author”
Audience Participation: Are there any questions?
Thanks
BsidesChicago, OpenDNS, Johannes Bader, Daniel Plohmann, John Bambenek, Thomas Mathew, Dhia Mahjoub, Steve Mckinney
References
[1] http://johannesbader.ch/2014/12/the-dga-of-ramnit/
[2] https://labs.opendns.com/2015/02/18/at-high-noon-algorithms-do-battle/
[3] http://www.cc.gatech.edu/~ynadji3/docs/pubs/pleiades2012.pdf
[4] http://www.slideshare.net/OpenDNS/shmoocon-2015-presentation
[5] https://github.com/Andrewaeva/DGA
[6] http://blogs.technet.com/b/mmpc/archive/2014/04/08/msrt-april-2014-ramdo.aspx
[7] http://www.symantec.com/connect/blogs/ramnit-cybercrime-group-hit-major-law-enforcement-operation
[8] http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/w32-ramnit-analysis.pdf
[9] http://www.malwaretech.com/2013/12/peer-to-peer-botnets-for-beginners.html
[10] http://en.wikipedia.org/wiki/Botnet
[11] http://commons.wikimedia.org/wiki/File:Snowflake-black.png
[12] Somewhere on Twitter