using algorithms to brute force algorithms...a journey through time and namespace

Using Algorithms to Brute Force Algorithms … a journey through time and namespace Anthony Kasza Bsides Chicago 2015

Upload: opendns

Post on 25-Jul-2015

Page 1: Using Algorithms to Brute Force Algorithms...A Journey Through Time and Namespace

Using Algorithms to Brute Force Algorithms

… a journey through time and namespace

Anthony Kasza, BSides Chicago 2015

Page 2

Audience Participation: Answer a question, win a prize

Page 3

Audience Participation: What is an algorithm?

Page 4

algorithm (noun): Word used by programmers when they do not want to explain what they did.

[12]  

Page 5

Outline
- Background
  - Malware Communications and Botnet Architectures
  - Analyzing Domain Generation Algorithms
- Ramnit
  - Ramnit’s DGA
  - Brute Force Identification of Ramnit DGA Seeds
- Results
  - Graphs
- Applications and Improvements

Page 6

Me
- Anthony Kasza
- Security Researcher, OpenDNS
- @anthonykasza
- github.com/anthonykasza

Page 7

Background

Page 8

Malware Communications
Let’s pretend… we all just compromised 10k hosts for our botnet

[10]  

Page 9

Malware Communications
Let’s pretend… we all just compromised 10k hosts for our botnet
What do we do now?

[10]  

Page 10

Malware Communications
Let’s pretend… we all just compromised 10k hosts for our botnet
What do we do now?
Have our malware phone home

[10]  

Page 11

Malware Communications
Let’s pretend… we all just compromised 10k hosts for our botnet
What do we do now?
Have our malware phone home
Botnets are resilient, cloud-based, often distributed remote administration systems [10]

Page 12

Audience Participation: Name a malware

Page 13

Malware Communications: IP
- Open socket
- Beacon to IP address
- Easy to set up
- Easy to take down

[Diagram: three client implants beaconing directly to a C2 server]

Page 14

Malware Communications: P2P
- Open socket
- Beacon to super node peer(s)
- Very resilient
- Peer consensus issues
- Complex to set up

[Diagram: three client implants beaconing to a network of four super nodes] [9]

Page 15

Malware Communications: DNS
- Open socket
- Issue DNS query

[Diagram: three client implants, a DNS resolver, and a C2 server]

Page 16

Malware Communications: DNS
- Open socket, issue DNS query
- Open socket, beacon to the returned IP address
- Relatively easy to set up
- Relatively easy to take down

[Diagram: three client implants resolve the C2 name through a DNS resolver, then beacon to the C2 server]

Page 17

Audience Participation: Name a botnet that uses DNS

Page 18

Malware Communications: DNS Resiliency Tricks

- Fast Flux: DNS A records change quickly
- Double Flux: DNS A and NS records change quickly
- Domain Generation Algorithms (DGA): C2 domain names are generated dynamically at run time by a deterministic function within the implant. Samples are "strings-proof": the C2 domains never appear as literal strings in the binary.

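Page 19

How To DGA

[Flowchart: Start → inside the client, the DGA feeds the date and a seed into a hash/PRNG, builds a string from the lexicon, and appends a TLD from the TLD set to form a domain name → query the name; NXDOMAIN (NXD) loops back to generate the next name, while an A record leads to "connect to IP" → End]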

Page 20

Example DGA Output
vfxlsatformalisticirekb[.]com
rd0ee55073a3776810962c124f02a99424[.]ws
croialotvvnfliyjmvt[.]ru
yxjsibeugmmj[.]in
osghqrdmlyhh[.]net
easebrainjobmarket[.]com

Page 21

Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

Operator registers the domain "just in time", before the implant generates it [3]

[Diagram: Operator, Registrar, DNS Resolver, C2 Server, and a client implant]

Page 22

Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

Registrar ensures the domain is inserted into the DNS [3]

[Diagram: Operator, Registrar, DNS Resolver, C2 Server, and a client implant]

Page 23

Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

Implant generates and resolves the domain [3]

[Diagram: Operator, Registrar, DNS Resolver, C2 Server, and a client implant]

Page 24

Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

Implant connects to the C2 server's IPv4 address [3]

[Diagram: Operator, Registrar, DNS Resolver, C2 Server, and a client implant]

Page 25

Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date

Repeat: the operator is constantly registering domain names [3]

[Diagram: Operator, Registrar, DNS Resolver, C2 Server, and a client implant]

Page 26

Audience Participation: Name a malware that uses a DGA

Page 27

Malware that uses a DGA
Banjori, DirCrypt, Dyre, GameoverZeus, Hesperbot, Matsnu, Necurs, Pushdo, Pykspa, Qakbot, Ramnit, Shiotob, Simda/Shiz, Symmi, TinyBanker, Bedep, Emotet, Gozi, Nymaim, Suppobox, Urlzone, VolatileCedar, Cryptolocker, Conficker, Murofet, BankPatch, Bobax, Ramdo, Flashback, Kelihos, Rovnix, Torpig, many more…

[5]  

Page 28

Each DGA Is a Special Snowflake
- Conficker.C: generated 50k names per day
- Pushdo: DGA as a backup if the C2 domain went down
- Kelihos: DGA as a backup if the P2P network went down

newGOZ DGA domains…
- were registered through a few common registrars
- were typically registered 1 hour before the algorithm would generate them
- changed NS domains but reused NS IPv4 addresses

[4] [11]

Page 29

DGA Domain Query Periods

Family    Query period
Dyre      ~1 day
Ramnit    N/A
Matsnu    ~2 weeks
Pykspa    ~3 weeks
Bedep     ~1 week

Page 30

Generalized DGA pseudo code…

for i in range(domain_set_size):
    domain = generate_domain(date, magic)
    resolve domain
    if domain resolves:
        contact domain
        break  # stop at the first name that resolves

def generate_domain(date, magic):
    # random_select draws from a PRNG seeded by date and magic, so each call
    # continues the same deterministic sequence
    domain = ''
    for i in range(lexicon_item_count):
        item = random_select(lexicon, magic)
        domain = domain + item
    domain = domain + random_select(tld_set, magic)
    return domain
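To make the generalized loop concrete, here is a toy Python sketch. Everything specific in it is an assumption for illustration, not any real family's algorithm: `generate_domains` and `find_c2` are hypothetical names, the PRNG is Python's `random.Random` seeded from the date and a salt, and the lexicon and TLD set are made up. A Python set of "registered" names stands in for DNS resolution, so nothing touches the network.

```python
import datetime
import random
import string

LEXICON = string.ascii_lowercase    # hypothetical lexicon: single letters
TLD_SET = [".com", ".net", ".ru"]   # hypothetical TLD set


def generate_domains(date, salt, domain_set_size, lexicon_item_count=12):
    """Toy DGA (hypothetical): the same (date, salt) pair always yields the same names."""
    # The PRNG is seeded from the date and the salt (the shared secret), so an
    # operator running this same code knows which names to register today.
    rng = random.Random(f"{date.isoformat()}:{salt}")
    domains = []
    for _ in range(domain_set_size):
        label = "".join(rng.choice(LEXICON) for _ in range(lexicon_item_count))
        domains.append(label + rng.choice(TLD_SET))
    return domains


def find_c2(date, salt, domain_set_size, registered):
    """Try the names in order; return the first that 'resolves', else None."""
    for domain in generate_domains(date, salt, domain_set_size):
        if domain in registered:   # stand-in for a DNS answer with an A record
            return domain          # the implant would contact this name and stop
    return None                    # every name came back NXDOMAIN


today = datetime.date(2015, 7, 25)
names = generate_domains(today, salt=1234, domain_set_size=5)
registered = {names[3]}            # operator registered one name "just in time"
print(find_c2(today, 1234, 5, registered) == names[3])  # True
```

Changing the date changes the whole list, which is why date-based families force the operator (and a defender) to recompute names every period.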

Page 31

Generalized Algorithm Analysis

Inputs
- Domain set size: how many domains to generate
- Date: today's date
- Seed: a number used to initialize a PRNG
- Salt: a magic number or campaign ID
- Lexicon: a set of letters, n-grams, or words
- Lexicon items count: number of items to use from the lexicon
- TLD set: all possible TLDs

Functions
- MD*, SHA*, etc.: some hash
- PRNG: random numbers
- Bitwise math: xor, shl/shr, mod, b64, ASCII to hex

Outputs
- Names to contact: these are often regex-able due to properties of the transformation function
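Because the transformation function fixes the alphabet, length range, and TLD, the output can often be matched with a single regular expression and used as a cheap prefilter over query logs. A minimal Python sketch, assuming a DGA that emits 8 to 19 letters from a-y under .com (the shape of Ramnit's names, covered later); the pattern name `DGA_NAME` and the query log entries are made up. A legitimate name can match too, so a hit is only a candidate.

```python
import re

# Hypothetical pattern for a DGA that emits 8-19 letters from a-y under .com
# (the shape of Ramnit's names); other families need their own pattern.
DGA_NAME = re.compile(r"[a-y]{8,19}\.com")

query_log = [
    "google.com",          # label too short
    "bsideschicago.com",   # matches: a legitimate name collides with the pattern
    "wikipedia.org",       # wrong TLD
    "abcdefghijkl.com",    # matches
    "amazonzzz.com",       # contains "z", outside the lexicon
]

candidates = [name for name in query_log if DGA_NAME.fullmatch(name)]
print(candidates)  # ['bsideschicago.com', 'abcdefghijkl.com']
```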

Page 32

An Algorithm Taxonomy from Inputs

Group   Lexicon   Domain set size   Salt/Seed   Date   Examples
A       Letters   Yes               Yes         Yes    Necurs, GOZ, Symmi, Tinba, Pykspa
B       Letters   Yes               Yes         No     Ramnit, DirCrypt, VolatileCedar, Ramdo
C.i     Letters   Yes               No          Yes    Conficker, Dyre, Cryptolocker, Pushdo, Qakbot
C.ii    Words     Yes               No          Yes    Matsnu, Rovnix

Page 33

Enter Ramnit

Page 34

Audience Participation: Tell me anything about Ramnit

Page 35

Ramnit Malware
- Worm/RAT
- Emerged in 2010
- “Borrowed” features from the Zeus source code in 2011
- Spread via exploit kits (EK), social media, bundled software, etc.
- Uses a DGA

[7]  

Page 36

Ramnit DGA Pseudo Code

class RandInt:  # LCG PRNG, random uint32
    def __init__(self, seed):
        self.seed = seed

    def rand_int_modulus(self, modulus):
        ix = self.seed
        # Park-Miller multiplier 16807 via Schrage's method, masked to 32 bits
        ix = (16807*(ix % 127773) - 2836*(ix // 127773)) & 0xFFFFFFFF
        self.seed = ix
        return ix % modulus

def get_domains(seed, domain_set_size):  # seed = ?, domain_set_size = ?
    r = RandInt(seed)
    for i in range(domain_set_size):
        seed_a = r.seed
        domain_length = r.rand_int_modulus(12) + 8  # domain_length = {8,19}
        seed_b = r.seed
        domain = ''
        for j in range(domain_length):
            char = chr(ord('a') + r.rand_int_modulus(25))  # lexicon = [a-y]
            domain += char
        domain += ".com"  # tld_set = [".com"]
        m = seed_a*seed_b
        r.seed = (m + m//(2**32)) % 2**32
        yield domain

[1]  

Page 38

Ramnit DGA Pseudo Code

[Flowchart: inside the client, the DGA feeds a uint32 seed into an LCG PRNG, builds a string from the lexicon [a-y]{8,19}, and appends ".com" to form the domain name → query; NXDOMAIN (NXD) moves on to the next name, an A record leads to "connect to IP"]

Page 39

Ramnit DGA Pseudo Code

Unknowns
1. The linear congruential generator’s seed
2. How many times this loop occurs (the domain set size)

[Flowchart: inside the client, the DGA feeds a uint32 seed into an LCG PRNG, builds a string from the lexicon [a-y]{8,19}, and appends ".com" to form the domain name → query; NXDOMAIN (NXD) moves on to the next name, an A record leads to "connect to IP"]

Page 40

Brute Forcing Ramnit DGA Seeds
Inputs: domain_set_size, seed, tld_set, lexicon
Outputs: names

I. Iterate over the seed space (2^32) and identify candidate seeds
II. Find each candidate seed’s domain_set_size and generate its domains
III. Determine the minimum set of seeds that produces all domains (the LCG output overlaps)

[2]
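Step III matters because Ramnit's generator is a chain: the PRNG state left after one domain is the seed of the next, so the tail of any seed's domain list is the complete list of another seed. The Python 3 sketch below re-implements the Ramnit DGA pseudo code from earlier (after [1]; `//` replaces Python 2's integer `/`) and checks that property. `ramnit_domains` is a hypothetical name and the starting seed is arbitrary.

```python
def rand_int_modulus(state, modulus):
    """One step of Ramnit's LCG; returns (new_state, value % modulus)."""
    ix = (16807 * (state % 127773) - 2836 * (state // 127773)) & 0xFFFFFFFF
    return ix, ix % modulus


def ramnit_domains(seed, domain_set_size):
    """Yield (chain_seed, domain): chain_seed is the PRNG state the domain starts from."""
    state = seed
    for _ in range(domain_set_size):
        seed_a = state
        state, length = rand_int_modulus(state, 12)
        seed_b = state
        chars = []
        for _ in range(length + 8):                 # 8..19 characters
            state, offset = rand_int_modulus(state, 25)
            chars.append(chr(ord("a") + offset))    # lexicon [a-y]
        m = seed_a * seed_b
        state = (m + m // 2**32) % 2**32            # seed for the next domain
        yield seed_a, "".join(chars) + ".com"


chain = list(ramnit_domains(0x12345678, 6))   # arbitrary example seed
second_seed = chain[1][0]
tail = list(ramnit_domains(second_seed, 5))
# The seed that starts at the second domain reproduces the rest of the chain,
# so its domain set is a subset of the first seed's set.
print(tail == chain[1:])  # True
```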

Page 41

Step 1: Identify Candidate Seeds
1. Seed the Ramnit DGA with every value from 0 to 2^32 - 1
2. Generate the first domain from each seed
   - 27 hours on an AWS c3.8xlarge
   - 24 processes, each with its own CPU core and a portion of the seed space
   - Resulting seed and domain tuples sorted and merged
3. Scan OpenDNS query logs and find which domains received at least one query
4. Seeds which generated domains that received queries are candidate seeds
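Step 1 might be sketched in Python as follows. Assumptions: `split_seed_space` shows one way to give each of 24 worker processes a contiguous slice of the 2^32 seeds, `first_domain` re-implements only the first iteration of the Ramnit DGA, and a small Python set stands in for the OpenDNS query logs. All three names are hypothetical. The full sweep took 27 hours on a c3.8xlarge; the usage below only scans seeds 0 to 1999.

```python
SEED_SPACE = 2**32   # every possible uint32 seed
WORKERS = 24         # one process per CPU core, as on the slide


def split_seed_space(workers=WORKERS, space=SEED_SPACE):
    """Give each worker a contiguous slice of [0, space)."""
    step = -(-space // workers)  # ceiling division so no seed is dropped
    return [range(start, min(start + step, space)) for start in range(0, space, step)]


def rand_int_modulus(state, modulus):
    """One step of Ramnit's LCG; returns (new_state, value % modulus)."""
    ix = (16807 * (state % 127773) - 2836 * (state // 127773)) & 0xFFFFFFFF
    return ix, ix % modulus


def first_domain(seed):
    """First name the Ramnit DGA generates from `seed` (8-19 letters a-y, .com)."""
    state, length = rand_int_modulus(seed, 12)
    chars = []
    for _ in range(length + 8):
        state, offset = rand_int_modulus(state, 25)
        chars.append(chr(ord("a") + offset))
    return "".join(chars) + ".com"


def candidate_seeds(seeds, queried_domains):
    """(seed, domain) pairs whose first domain received at least one query."""
    hits = []
    for seed in seeds:
        domain = first_domain(seed)
        if domain in queried_domains:
            hits.append((seed, domain))
    return hits


# Tiny stand-in for the query logs: the first domains of seeds 42 and 1000
# plus an unrelated name. A real run would give each worker its own slice.
queried = {first_domain(42), first_domain(1000), "example.com"}
hits = candidate_seeds(range(2000), queried)
print(hits)
```

Each slice from `split_seed_space()` could be handed to its own process (for example with `multiprocessing.Pool`), and the per-worker results sorted and merged as the slide describes.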

Page 42

Audience Participation: Which are candidate seeds?

Page 43

Candidate Seeds Example
seed1, domain1
seed2, domain1
seed3, domain1
seed4, domain1

Page 44

Step 2: Find Seeds’ Domain Set Size
1. Observe the domain’s hourly query counts for the previous two weeks*
2. For each candidate seed, generate the next domain
3. Compare the query pattern of the domain from step 2 to the seed’s composite query pattern
   If they are similar:
   1. Merge the pattern into the seed’s composite query pattern
   2. Increment the seed’s domain set size
   3. Goto 1
   Otherwise:
   1. Exit

* A vector with each position representing an hourly count of DNS queries
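A Python sketch of Step 2. The slide does not say how "similar" is measured, so cosine similarity over the hourly vectors with a 0.9 threshold is an assumption here, as are the hypothetical names `domain_set_size` and `hourly_counts` and the synthetic query counts. Each vector covers the previous two weeks, i.e. 14 x 24 = 336 hourly buckets.

```python
import math

HOURS = 14 * 24  # two weeks of hourly buckets


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def domain_set_size(domains, hourly_counts, threshold=0.9):
    """Walk a seed's domains in DGA order; count how many share one query pattern.

    domains: the seed's generated names, the already-queried first domain first.
    hourly_counts: domain -> list of HOURS query counts (hypothetical log summary).
    threshold: assumed similarity cut-off, not a value from the talk.
    """
    names = iter(domains)
    composite = list(hourly_counts.get(next(names), [0] * HOURS))
    size = 1
    for name in names:
        pattern = hourly_counts.get(name, [0] * HOURS)
        if cosine(composite, pattern) < threshold:
            break                                              # pattern diverges: exit
        composite = [c + p for c, p in zip(composite, pattern)]  # merge into composite
        size += 1
    return size


# Synthetic example: domains 1-3 are queried during the same daytime hours,
# domain4 only at night, so the seed's domain set size comes out as 3.
day = [5 if h % 24 in range(8, 18) else 0 for h in range(HOURS)]
night = [5 if h % 24 < 4 else 0 for h in range(HOURS)]
counts = {"domain1": day, "domain2": [2 * x for x in day], "domain3": day, "domain4": night}
print(domain_set_size(["domain1", "domain2", "domain3", "domain4"], counts))  # 3
```

Cosine similarity ignores overall volume, so a domain queried twice as often but at the same hours still counts as matching; a different metric would be a reasonable alternative.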

Page 45

Audience Participation: What is this seed’s domain set size?

Page 46

Seeds’ Domain Set Size Example
seed1, domain1
seed1, domain2
seed1, domain3
seed1, domain4

Page 47

Step 3: Minimum Seed Set for Domain Coverage
1. Take each seed and its associated domain set
2. Remove all domain sets that are subsets of other domain sets
3. The minimum seed set for domain coverage remains

Seeds that remain aren’t necessarily “in the wild”; they are seeds that generate all domains “in the wild”.

Page 48

Audience Participation: Which seeds would be eliminated?

Page 49

Minimum Seed Set Example
seed1: domain1, domain2
seed2: domain1, domain2, domain3
seed3: domain3, domain4
seed4: domain1, domain2, domain3, domain4
seed5: domain5
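The subset elimination of Step 3, applied to the example above, as a Python sketch (`minimum_seed_set` is a hypothetical name). Sets are visited largest first, so when two seeds produce identical sets only one survives. This only drops sets contained in a single other set; a set covered by the union of several others is kept, so the result is not a true minimum set cover (an NP-hard problem in general).

```python
def minimum_seed_set(domain_sets):
    """Drop every seed whose domain set is contained in a kept seed's set."""
    kept = {}
    # Largest sets first; ties broken by seed name for a deterministic result.
    for seed, domains in sorted(domain_sets.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if not any(domains <= other for other in kept.values()):
            kept[seed] = domains
    return kept


example = {
    "seed1": {"domain1", "domain2"},
    "seed2": {"domain1", "domain2", "domain3"},
    "seed3": {"domain3", "domain4"},
    "seed4": {"domain1", "domain2", "domain3", "domain4"},
    "seed5": {"domain5"},
}
print(sorted(minimum_seed_set(example)))  # ['seed4', 'seed5']
```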

Page 50

Brute Forcing Algorithm Weaknesses
1. The first domain from each seed is used to locate candidate seeds
2. No queries on that day means the seed is ignored
3. Point-in-time analysis
4. DGAs collide with legitimate domain names
   - 1 million monkeys typing in 1 million address bars will eventually browse to 4chan

Page 51

Results

Page 52

Results: Seeds, Domains, Clients
29 seeds, 3924 domains
- Seeds confirmed by Symantec’s report

I found some seeds not listed in Symantec’s report
- Not a big deal, due to overlaps in the Ramnit DGA’s LCG seeds

I found some domains not listed in Symantec’s report
- A bigger deal if Symantec is serious about takedowns

[7] [8]

Page 53

Audience Participation: Was anyone here involved in the Ramnit takedown?

Page 54

Results: Patterns in Domain Queries by Seed

Page 55

Results: Patterns in Domain Queries
1. Locate IPv4 addresses that queried each domain
2. Create a graph of seed -> domains -> client IPv4s
3. Count connected components (I found two)

[Diagram: seed (S) nodes linked to domain (D) nodes, which are linked to client (C) nodes]
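Counting connected components of the seed -> domain -> client graph can be done with a union-find, sketched below in Python. The function name `count_components` and the edges are invented for illustration (documentation-range IPs, not the talk's data); the `S:`/`D:`/`C:` prefixes just keep node types apart.

```python
def count_components(edges):
    """Number of connected components in an undirected graph given as edge pairs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b              # union the two trees
    return len({find(node) for node in list(parent)})


# Hypothetical seed -> domain -> client edges.
edges = [
    ("S:seed1", "D:domain1"), ("S:seed1", "D:domain2"),
    ("D:domain1", "C:192.0.2.1"), ("D:domain2", "C:192.0.2.2"),
    ("S:seed2", "D:domain3"), ("D:domain3", "C:192.0.2.3"),
]
print(count_components(edges))  # 2
```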

Page 56

Results: Patterns in Domain Queries by IPv4 Groups

Page 57

Applications and Improvements
- Generalize the framework for use with all DGA implementations
  - Currently working with more than just Ramnit
- Vigilant monitoring instead of point-in-time search
  - Ramdo seeds can be updated by the C2 server
  - Even if you reverse engineer (RE) the algorithm, you don't have the seed unique to each compromised system
- Combine with other DGA detection techniques
  - Co-occurrences and lexical features [6]
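As one example of a lexical feature, the Shannon entropy of a name's characters is a commonly used signal. This Python sketch is an illustrative assumption, not the technique used in the talk; `label_entropy` is a hypothetical name, and entropy alone separates DGA names from legitimate ones only weakly, so it would be one input among several.

```python
import math
from collections import Counter


def label_entropy(domain):
    """Shannon entropy, in bits per character, of the domain's leftmost label."""
    label = domain.split(".")[0]
    total = len(label)
    counts = Counter(label)
    # Sum of -p * log2(p) over the character frequencies p.
    return sum(-p * math.log2(p) for p in (n / total for n in counts.values()))


for name in ["opendns.com", "abcdefghijkl.com", "aaaaaaaa.com"]:
    print(name, round(label_entropy(name), 2))
```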

Page 58

Conclusion
Why should you care?
- Many malware families are using DGAs
- This is a new way to identify new badness
- Know the shared secret, find all the C2 domains
- Not all DGAs are created equal
- Some are more difficult to track than others
- Malware authors are people too
- 3:30, “The Life and Times of an APT Malware Author”

Page 59

Audience Participation: Are there any questions?

Page 60

Thanks
BSides Chicago
OpenDNS
Johannes Bader, Daniel Plohmann, John Bambenek, Thomas Mathew, Dhia Mahjoub, Steve Mckinney

Page 61

References
[1] http://johannesbader.ch/2014/12/the-dga-of-ramnit/
[2] https://labs.opendns.com/2015/02/18/at-high-noon-algorithms-do-battle/
[3] http://www.cc.gatech.edu/~ynadji3/docs/pubs/pleiades2012.pdf
[4] http://www.slideshare.net/OpenDNS/shmoocon-2015-presentation
[5] https://github.com/Andrewaeva/DGA
[6] http://blogs.technet.com/b/mmpc/archive/2014/04/08/msrt-april-2014-ramdo.aspx
[7] http://www.symantec.com/connect/blogs/ramnit-cybercrime-group-hit-major-law-enforcement-operation
[8] http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/w32-ramnit-analysis.pdf
[9] http://www.malwaretech.com/2013/12/peer-to-peer-botnets-for-beginners.html
[10] http://en.wikipedia.org/wiki/Botnet
[11] http://commons.wikimedia.org/wiki/File:Snowflake-black.png
[12] Somewhere on Twitter