Using Algorithms to Brute Force Algorithms … a journey through time and namespace

Post on 25-Jul-2015

Anthony Kasza, BSides Chicago 2015

Audience Participation: Answer a question, win a prize

Audience Participation: What is an algorithm?

algorithm (noun): Word used by programmers when they do not want to explain what they did. [12]

Outline

- Background
- Malware Communications and Botnet Architectures
- Analyzing Domain Generation Algorithms
- Ramnit
- Ramnit's DGA
- Brute Force Identification of Ramnit DGA Seeds
- Results
- Graphs
- Applications and Improvements

Me

- Anthony Kasza
- Security Researcher: OpenDNS
- @anthonykasza
- github.com/anthonykasza

Background

Malware Communications

Let's pretend… we all just compromised 10k hosts for our botnet. What do we do now? Have our malware phone home.

Botnets are resilient, cloud-based, often distributed remote administration systems. [10]

Audience Participation: Name a malware

Malware Communications: IP

- Open socket
- Beacon to IP address
- Easy to set up
- Easy to take down

[Diagram: three clients, each running an implant, and a single C2 server]

Malware Communications: P2P

- Open socket
- Beacon to super node peer(s)
- Very resilient
- Peer consensus issues
- Complex to set up

[Diagram: three clients, each running an implant, and four super nodes] [9]

Malware Communications: DNS

- Open socket, issue DNS query
- Open socket, beacon to the returned IP address
- Relatively easy to set up
- Relatively easy to take down

[Diagram: three clients, each running an implant, a DNS resolver, and a C2 server]

Audience Participation: Name a botnet that uses DNS

Malware Communications: DNS Resiliency Tricks

- Fast Flux: DNS A records change quickly
- Double Flux: DNS A and NS records change quickly
- Domain Generation Algorithms (DGA): C2 domain names are generated dynamically by a deterministic function within the implant at run time. Samples are "strings proof": the C2 names never appear as literal strings in the binary.

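How To DGA

[Flow diagram: Start → the DGA on the client feeds a date and a seed into a hash/PRNG, builds a string from the lexicon, and appends a TLD from the TLD set to form a domain name → query the name. NXD: no usable answer. A: connect to IP → End.]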

Example DGA Output

- vfxlsatformalisticirekb[.]com
- rd0ee55073a3776810962c124f02a99424[.]ws
- croialotvvnfliyjmvt[.]ru
- yxjsibeugmmj[.]in
- osghqrdmlyhh[.]net
- easebrainjobmarket[.]com

Malware Communications: DGA

- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

1. Operator registers the domain "just in time" before the implant generates it
2. Registrar ensures the domain is inserted into the DNS
3. Implant generates and resolves the domain
4. Implant connects to C2 IPv4
5. Repeat: the operator is constantly registering domain names

[Diagram: client with implant, operator, registrar, DNS resolver, and C2 server] [3]

Audience Participation: Name a malware that uses a DGA

Malware that uses a DGA

Banjori, DirCrypt, Dyre, GameoverZeus, Hesperbot, Matsnu, Necurs, Pushdo, Pykspa, Qakbot, Ramnit, Shiotob, Simda/Shiz, Symmi, TinyBanker, Bedep, Emotet, Gozi, Nymaim, Suppobox, Urlzone, VolatileCedar, Cryptolocker, Conficker, Murofet, BankPatch, Bobax, Ramdo, Flashback, Kelihos, Rovnix, Torpig, and many more… [5]

Each DGA Is a Special Snowflake

- Conficker.C: generated 50k names per day
- Pushdo: DGA as a backup if the C2 domain went down
- Kelihos: DGA as a backup if the P2P network went down

newGOZ DGA domains…

- were registered through a few common registrars
- were typically registered 1 hour before the algorithm would generate them
- changed NS domains but reused NS IPv4s

[4] [11]

DGA Domain Query Periods

Family    Query period
Dyre      ~1 day
Ramnit    N/A
Matsnu    ~2 weeks
Pykspa    ~3 weeks
Bedep     ~1 week

Generalized DGA pseudo code…

    for i in range(domain_set_size):
        domain = generate_domain(date, magic)
        resolve domain
        if domain resolves:
            contact domain
            break                      # stop iterating

    def generate_domain(date, magic):
        domain = ''
        for i in range(lexicon_item_count):
            item = random_select(lexicon, magic)
            domain = domain + item
        domain = domain + random_select(tld_set, magic)
        return domain
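For analysts who want something executable to experiment with, the generalized pseudo code can be sketched roughly as follows. This is a toy under stated assumptions, not any real family's algorithm: the function name `generate_domains`, the SHA-256 mixing of date and salt, and the default sizes are all hypothetical choices made for illustration.

```python
import hashlib
import random
from datetime import date


def generate_domains(day, magic, lexicon, tlds, domain_set_size=5, item_count=10):
    """Toy DGA: deterministic names from a date and a salt (illustrative only)."""
    # Assumption: date and salt are mixed with SHA-256 to seed a PRNG.
    material = f"{day.isoformat()}:{magic}".encode()
    rng = random.Random(int.from_bytes(hashlib.sha256(material).digest(), "big"))
    domains = []
    for _ in range(domain_set_size):
        # Pick item_count lexicon items, then one TLD from the TLD list.
        label = "".join(rng.choice(lexicon) for _ in range(item_count))
        domains.append(label + rng.choice(tlds))
    return domains


# Implant and operator share (date, magic), so both derive the same list.
todays_names = generate_domains(date(2015, 5, 1), "campaign-7",
                                "abcdefghijklmnopqrstuvwxyz", [".com", ".net"])
```

Because every input is shared, a defender who recovers the same inputs can precompute the same names, which is the property the rest of the talk relies on.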

Generalized Algorithm Analysis

Inputs
- Domain set size: how many domains to generate
- Date: today's date
- Seed: a number used to ignite a PRNG
- Salt: a magic number or campaign ID
- Lexicon: a set of letters, n-grams, or words
- Lexicon item count: number of items to use from the lexicon
- TLD set: all possible TLDs

Functions
- MD*, SHA*, etc.: some hash
- PRNG: random numbers
- Bitwise math: xor, shl/shr, mod, b64, ASCII to hex

Outputs
- Names to contact: these are often regex-able due to properties of the transformation function

An Algorithm Taxonomy from Inputs

Group   Lexicon   Domain set size   Salt/Seed   Date   Examples
A       Letters   Yes               Yes         Yes    Necurs, GOZ, Symmi, Tinba, Pykspa
B       Letters   Yes               Yes         No     Ramnit, DirCrypt, VolatileCedar, Ramdo
C.i     Letters   Yes               No          Yes    Conficker, Dyre, Cryptolocker, Pushdo, Qakbot
C.ii    Words     Yes               No          Yes    Matsnu, Rovnix

Enter Ramnit

Audience Participation: Tell me anything about Ramnit

Ramnit Malware [7]

- Worm/RAT
- Emerged 2010
- "Borrowed" features from Zeus source in 2011
- Spread via exploit kits (EK), social media, bundled software, etc.
- Uses a DGA

Ramnit DGA Pseudo Code [1]

    class RandInt:  # LCG PRNG, random uint32
        def __init__(self, seed):
            self.seed = seed

        def rand_int_modulus(self, modulus):
            ix = self.seed
            # Integer division; the result is masked to 32 bits.
            ix = (16807*(ix % 127773) - 2836*(ix // 127773)) & 0xFFFFFFFF
            self.seed = ix
            return ix % modulus

    def ramnit_dga(seed, domain_set_size):    # seed = ?, domain_set_size = ?
        r = RandInt(seed)
        for i in range(domain_set_size):
            seed_a = r.seed
            domain_length = r.rand_int_modulus(12) + 8    # domain_length = 8..19
            seed_b = r.seed
            domain = ''
            for j in range(domain_length):
                char = chr(ord('a') + r.rand_int_modulus(25))    # lexicon = [a-y]
                domain += char
            domain += ".com"    # tld_set = [".com"]
            m = seed_a*seed_b
            r.seed = (m + m//(2**32)) % 2**32    # reseed before the next domain
            yield domain

Ramnit DGA Pseudo Code: Unknowns

[Flow diagram: the DGA on the client feeds a uint32 seed into the LCG PRNG, builds a string from the lexicon [a-y]{8,19}, and appends ".com" to form the domain name → query the name. NXD: no usable answer. A: connect to IP.]

Unknowns
1. The linear congruential generator's seed
2. How many times this loop occurs
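Since every Ramnit name is drawn from the lexicon [a-y] with a length of 8 to 19 plus ".com", the output shape can be checked with a regular expression. The sketch below is only a coarse pre-filter; the helper name `matches_ramnit_shape` is hypothetical, and plenty of legitimate names fit the same shape.

```python
import re

# Shape taken from the slide: 8 to 19 letters from a-y, then ".com".
RAMNIT_SHAPE = re.compile(r"[a-y]{8,19}\.com")


def matches_ramnit_shape(name):
    """Return True if a queried name has the shape Ramnit's DGA can emit."""
    # Normalize case and strip a trailing root dot before matching.
    return RAMNIT_SHAPE.fullmatch(name.lower().rstrip(".")) is not None


print(matches_ramnit_shape("aaaaaaaa.com"))   # right shape
print(matches_ramnit_shape("zzzzzzzzz.com"))  # 'z' is outside the lexicon
print(matches_ramnit_shape("wikipedia.com"))  # legitimate name, same shape
```

A match means only that the name could have been generated; it takes a known seed to show that it actually was.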

Brute Forcing Ramnit DGA Seeds [2]

Inputs: domain_set_size, seed, tld_set, lexicon
Outputs: names

I. Iterate over the seed space (2^32) and identify candidate seeds
II. Find each candidate seed's domain_set_size and generate its domains
III. Determine the minimum set of seeds that produces all domains (overlap in LCG output)

Step 1: Identify Candidate Seeds

1. Seed the Ramnit DGA with every value from 0 to 2^32 - 1
2. Generate the first domain from each seed
   - 27 hours on an AWS c3.8xlarge
   - 24 processes, each with its own CPU core and a portion of the seed space
   - Resulting seed and domain tuples sorted and merged
3. Scan OpenDNS query logs and find which domains received at least one query
4. Seeds which generated domains that received queries are candidate seeds
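The steps above can be sketched in a few functions. This is a minimal illustration, not the tooling used for the talk: the helper names (`first_domain`, `split_seed_space`, `candidate_seeds`) are hypothetical, the PRNG follows the slide's pseudo code, and the query log is stood in for by a plain set of domain names.

```python
class RandInt:
    """PRNG as given in the slide's Ramnit pseudo code."""

    def __init__(self, seed):
        self.seed = seed

    def rand_int_modulus(self, modulus):
        ix = self.seed
        ix = (16807 * (ix % 127773) - 2836 * (ix // 127773)) & 0xFFFFFFFF
        self.seed = ix
        return ix % modulus


def first_domain(seed):
    """Generate only the first domain a given seed would produce."""
    r = RandInt(seed)
    length = r.rand_int_modulus(12) + 8
    letters = [chr(ord("a") + r.rand_int_modulus(25)) for _ in range(length)]
    return "".join(letters) + ".com"


def split_seed_space(workers, total=2**32):
    """Cut the seed space into at most `workers` contiguous ranges."""
    step = -(-total // workers)  # ceiling division
    return [range(start, min(start + step, total))
            for start in range(0, total, step)]


def candidate_seeds(seeds, queried_domains):
    """Keep (seed, domain) pairs whose first domain was seen in query logs."""
    hits = []
    for seed in seeds:
        domain = first_domain(seed)
        if domain in queried_domains:
            hits.append((seed, domain))
    return hits
```

Each range from `split_seed_space(24)` could be handed to its own worker process, and the per-worker results merged afterwards. A pure-Python loop like this would likely be far slower than the 27 hours quoted above, so treat it as a readable reference rather than a production scanner.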

Audience Participation: Which are candidate seeds?

Candidate Seeds Example

- seed1, domain1
- seed2, domain1
- seed3, domain1
- seed4, domain1

Step 2: Find Seeds' Domain Set Size

1. Observe the domain's hourly query counts for the previous two weeks*
2. For each candidate seed, generate the next domain
3. Compare the new domain's query pattern to the seed's composite query pattern

If they are similar:
1. Merge the pattern into the seed's composite query pattern
2. Increment the seed's domain set size
3. Go to 1

Otherwise:
1. Exit

* A vector with each position representing an hourly count of DNS queries
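The loop above might look like the following sketch. The slides do not say how "similar" is measured, so cosine similarity and the 0.8 threshold are assumptions chosen purely for illustration, as are the function names and the use of a dictionary in place of real query logs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length hourly count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def domain_set_size(domains, hourly_counts, threshold=0.8):
    """Count how many consecutive domains from one seed share a query pattern.

    domains: the seed's generated domains, in generation order.
    hourly_counts: domain -> vector of hourly DNS query counts.
    threshold: assumed similarity cut-off (not from the talk).
    """
    composite = None
    size = 0
    for domain in domains:
        pattern = hourly_counts.get(domain)
        if pattern is None:
            break  # never queried: the domain set ends here
        if composite is None:
            composite = list(pattern)  # first domain starts the composite
        elif cosine(composite, pattern) >= threshold:
            # Similar enough: merge into the composite pattern.
            composite = [c + p for c, p in zip(composite, pattern)]
        else:
            break  # pattern diverges: stop growing the set
        size += 1
    return size


counts = {
    "d1": [5, 0, 3, 0],
    "d2": [4, 0, 2, 0],
    "d3": [6, 1, 3, 0],
    "d4": [0, 9, 0, 7],  # queried at different hours than d1..d3
}
print(domain_set_size(["d1", "d2", "d3", "d4"], counts))  # prints 3
```

Summing vectors into the composite weights busy domains more heavily; averaging or normalizing first would be an equally defensible choice.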

Audience Participation: What is this seed’s domain set size?

Seeds' Domain Set Size Example

- seed1, domain1
- seed1, domain2
- seed1, domain3
- seed1, domain4

Step 3: Minimum Seed Set for Domain Coverage

1. For each seed and its associated domain set…
2. Remove all domain sets that are subsets of other domain sets
3. The minimum seed set for domain coverage remains

Seeds that remain aren't necessarily "in the wild". They are seeds that generate all domains "in the wild".

Audience Participation: Which seeds would be eliminated?

Minimum Seed Set Example

- seed1: domain1, domain2
- seed2: domain1, domain2, domain3
- seed3: domain3, domain4
- seed4: domain1, domain2, domain3, domain4
- seed5: domain5
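The subset-removal rule from Step 3 can be sketched in a few lines and run against the example above. The function name `minimum_seed_set` is hypothetical, and when two seeds produce identical domain sets this sketch simply keeps whichever sorts first.

```python
def minimum_seed_set(domain_sets):
    """Drop every seed whose domain set is a subset of a kept seed's set.

    domain_sets: seed -> set of domains generated by that seed.
    """
    kept = {}
    # Visit larger sets first so each subset meets its superset.
    ordered = sorted(domain_sets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for seed, domains in ordered:
        if not any(domains <= other for other in kept.values()):
            kept[seed] = domains
    return kept


example = {
    "seed1": {"domain1", "domain2"},
    "seed2": {"domain1", "domain2", "domain3"},
    "seed3": {"domain3", "domain4"},
    "seed4": {"domain1", "domain2", "domain3", "domain4"},
    "seed5": {"domain5"},
}
print(sorted(minimum_seed_set(example)))  # prints ['seed4', 'seed5']
```

Here seed1, seed2, and seed3 are eliminated because seed4 already generates all of their domains, while seed5 stays because nothing else covers domain5.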

Brute Forcing Algorithm Weaknesses

1. The first domain from each seed is used to locate candidate seeds
2. No queries on that day means the seed is ignored
3. Point-in-time analysis
4. DGAs collide with legitimate domain names
   - 1 million monkeys typing in 1 million address bars will eventually browse to 4chan

Results

Results: Seeds, Domains, Clients

29 seeds, 3924 domains
- Seeds confirmed by Symantec's report

I found some seeds not listed in Symantec's report
- Not a big deal due to overlaps in Ramnit DGA's LCG seeds

I found some domains not listed in Symantec's report
- Bigger deal if Symantec is serious about takedowns

[7] [8]

Audience Participation: Was anyone here involved in the Ramnit takedown?

Results: Patterns in Domain Queries by Seed

Results: Patterns in Domain Queries

1. Locate IPv4s that queried each domain
2. Create a graph of seed -> domains -> client IPv4s
3. Count connected components (I found two)

[Diagram: three-layer graph with seeds (S) on top, domains (D) in the middle, and client IPv4s (C) at the bottom]
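Counting connected components in the seed -> domains -> client graph needs no special library. The sketch below uses a plain adjacency map and a depth-first walk; the function name, the tuple-tagged nodes, and the sample data (documentation-range IP addresses) are illustrative assumptions, not data from the talk.

```python
from collections import defaultdict


def connected_components(seed_domains, domain_clients):
    """Group seeds, domains, and client IPs into connected components.

    seed_domains: seed -> iterable of domains it generates.
    domain_clients: domain -> iterable of client IPv4s that queried it.
    """
    graph = defaultdict(set)
    # Nodes are tagged tuples so a seed and a domain can never collide.
    for seed, domains in seed_domains.items():
        for domain in domains:
            graph[("seed", seed)].add(("domain", domain))
            graph[("domain", domain)].add(("seed", seed))
    for domain, clients in domain_clients.items():
        for client in clients:
            graph[("domain", domain)].add(("client", client))
            graph[("client", client)].add(("domain", domain))

    seen, components = set(), []
    for node in list(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:  # iterative depth-first search
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(graph[current] - seen)
        components.append(component)
    return components


seeds = {"s1": ["a.com", "b.com"], "s2": ["b.com"], "s3": ["c.com"]}
clients = {
    "a.com": ["192.0.2.1"],
    "b.com": ["192.0.2.1", "192.0.2.2"],
    "c.com": ["198.51.100.7"],
}
print(len(connected_components(seeds, clients)))  # prints 2
```

In this toy data s1 and s2 land in one component because they share b.com, while s3 and its lone client form a second, separate component.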

Results: Patterns in Domain Queries by IPv4 Groups

Applications and Improvements

Generalize framework for use with all DGA implementations
- Currently working with more than just Ramnit

Vigilant monitoring instead of point-in-time search
- Ramdo seeds are able to be updated by the C2 server
- Even if you RE the algorithm, you don't have the seed unique to each compromised system

Combine with other DGA detection techniques
- Co-occurrences and lexical features

[6]

Conclusion

Why should you care?
- Many malware families are using DGAs
- This is a new way to identify new badness
  - Know the shared secret, find all the C2 domains
- Not all DGAs are created equal
  - Some are more difficult to track than others
- Malware authors are people too
  - 3:30, "The Life and Times of an APT Malware Author"

Audience Participation: Are there any questions?

Thanks

- BSides Chicago
- OpenDNS
- Johannes Bader
- Daniel Plohmann
- John Bambenek
- Thomas Mathew
- Dhia Mahjoub
- Steve Mckinney

References

[1] http://johannesbader.ch/2014/12/the-dga-of-ramnit/
[2] https://labs.opendns.com/2015/02/18/at-high-noon-algorithms-do-battle/
[3] http://www.cc.gatech.edu/~ynadji3/docs/pubs/pleiades2012.pdf
[4] http://www.slideshare.net/OpenDNS/shmoocon-2015-presentation
[5] https://github.com/Andrewaeva/DGA
[6] http://blogs.technet.com/b/mmpc/archive/2014/04/08/msrt-april-2014-ramdo.aspx
[7] http://www.symantec.com/connect/blogs/ramnit-cybercrime-group-hit-major-law-enforcement-operation
[8] http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/w32-ramnit-analysis.pdf
[9] http://www.malwaretech.com/2013/12/peer-to-peer-botnets-for-beginners.html
[10] http://en.wikipedia.org/wiki/Botnet
[11] http://commons.wikimedia.org/wiki/File:Snowflake-black.png
[12] Somewhere on Twitter
