Libor Mořkovský - Recognizing Malware

Post on 16-Apr-2017

  • Recognizing malware

    Libor Mořkovský

  • Computer virus

    [Image: bacterial cell; based on work by Anderson Brito]

  • Computer virus

    Inserting code into files is never good.

    [Diagram labels: executable file, entry point]

  • Malware

    How do you recognize a thief?

    images courtesy of Looking Glass Studios, Twentieth Century Fox, Paramount Pictures

  • Malware

    completely different behaviors are considered bad

    we need a judge to decide who crossed the line

  • Malware | Many faces

    unlike real thieves, malware can be duplicated

    not only duplicated, but also modified

    all this is done by machines

    too much work to judge each one manually

  • Finding similar files

    [Figure: scatter plot of files; axes MDS1 and MDS2; point classes: CLEAN, MALWARE, QUERY, UNKNOWN]

  • Finding similar files

    need a file representation

    need a distance function

  • Finding similar files | File vector

    each executable file is represented by a feature vector

    the PE format is complex, so we keep exactly one version of the extractor code (C++)

    the vector comprises static and dynamic features of PE executables; the exact content is proprietary

    Database record

    One record = constant vector of over 100 attributes (the file fingerprint)

    Each attribute has a data type and a semantic

    Attribute               Data type        Semantic
    sha256                  32-byte array    CHECKSUM
    pe_sect_cnt             uint16_t         VALUE
    pe_sect_rawoff_entry    uint32_t         OFFSET
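The record layout on this slide can be sketched as a small typed schema. This is only an illustration: the three attribute names, data types, and semantics come from the slide, while the `Attribute` class, the `make_record` helper, and the range checks are assumptions made for the example, since the real vector (over 100 attributes) is proprietary.

```python
from dataclasses import dataclass
from enum import Enum


class Semantic(Enum):
    CHECKSUM = "CHECKSUM"
    VALUE = "VALUE"
    OFFSET = "OFFSET"


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str      # type name as listed on the slide
    semantic: Semantic


# Only the three attributes shown on the slide; the real schema is not public.
SCHEMA = (
    Attribute("sha256", "32 byte array", Semantic.CHECKSUM),
    Attribute("pe_sect_cnt", "uint16_t", Semantic.VALUE),
    Attribute("pe_sect_rawoff_entry", "uint32_t", Semantic.OFFSET),
)


def make_record(sha256: bytes, pe_sect_cnt: int, pe_sect_rawoff_entry: int) -> tuple:
    """Check the ranges implied by the data types and return a fixed-length record."""
    if len(sha256) != 32:
        raise ValueError("sha256 must be exactly 32 bytes")
    if not 0 <= pe_sect_cnt < 2**16:
        raise ValueError("pe_sect_cnt does not fit in uint16_t")
    if not 0 <= pe_sect_rawoff_entry < 2**32:
        raise ValueError("pe_sect_rawoff_entry does not fit in uint32_t")
    return (sha256, pe_sect_cnt, pe_sect_rawoff_entry)
```

Keeping the semantic next to the data type is what later allows a distance operator to be chosen per attribute rather than per file.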

  • Finding similar files | Distance

    sum of partial distances

    each distance operator assigned manually

    weights assigned manually to equalize contribution

    Nearest neighbor query

    Compound distance function: data type and semantic determine the partial distance function

    Data type        Semantic    Partial distance function
    32-byte array    CHECKSUM    RETURN_ZERO
    uint16_t         VALUE       EQUAL_RET32
    uint32_t         OFFSET      LOG

    Each partial distance function = one kernel function

    Over 100 kernels for every NN query

    Intermediate results kept in the Scratchpad
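A compound distance of this kind might look like the sketch below. The operator names (`RETURN_ZERO`, `EQUAL_RET32`, `LOG`) are from the slide, but their bodies are guesses from the names alone: the fixed penalty of 32, the base-2 logarithm, and the unit weights are all assumptions, not the production definitions.

```python
import math


def return_zero(a, b):
    # CHECKSUM: assumed to contribute nothing to similarity.
    return 0.0


def equal_ret32(a, b):
    # VALUE: assumed to be 0 on a match and a fixed penalty of 32 otherwise.
    return 0.0 if a == b else 32.0


def log_dist(a, b):
    # OFFSET: assumed to be a log-scaled absolute difference.
    return math.log2(1 + abs(a - b))


# (data type, semantic) selects the partial distance function, as in the table.
KERNELS = {
    ("32 byte array", "CHECKSUM"): return_zero,
    ("uint16_t", "VALUE"): equal_ret32,
    ("uint32_t", "OFFSET"): log_dist,
}

# (name, data type, semantic, weight); the weights here are placeholders.
SCHEMA = [
    ("sha256", "32 byte array", "CHECKSUM", 1.0),
    ("pe_sect_cnt", "uint16_t", "VALUE", 1.0),
    ("pe_sect_rawoff_entry", "uint32_t", "OFFSET", 1.0),
]


def compound_distance(x, y, schema=SCHEMA):
    """Weighted sum of per-attribute partial distances."""
    total = 0.0
    for (_name, dtype, semantic, weight), a, b in zip(schema, x, y):
        total += weight * KERNELS[(dtype, semantic)](a, b)
    return total
```

For example, two records that differ only in `pe_sect_rawoff_entry` (1024 versus 2047) are at distance `log2(1024) = 10` under these assumed kernels.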

  • Finding similar files | Data

    ~60 M data points

    sparse and well separated (in many cases)

  • Finding similar files | Implementation

    we started with GPUs

    their high memory throughput allows naive implementation and rapid prototyping

    column-oriented database
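The naive, column-oriented scan can be illustrated in plain Python: each attribute is stored as its own column, one kernel walks one column, and partial distances accumulate in a scratchpad with one slot per stored file. The column contents, kernel bodies, and function names are assumptions for the example; on a GPU each kernel would process its column in parallel instead of in a loop.

```python
import math

# Column-oriented store: one list per attribute, all of the same length.
COLUMNS = {
    "pe_sect_cnt": [4, 4, 7, 5],
    "pe_sect_rawoff_entry": [1024, 2047, 512, 1024],
}


def kernel_equal_ret32(column, q, scratch):
    # Assumed behavior: add a fixed penalty of 32 where the values differ.
    for i, v in enumerate(column):
        if v != q:
            scratch[i] += 32.0


def kernel_log(column, q, scratch):
    # Assumed behavior: add a log-scaled absolute difference.
    for i, v in enumerate(column):
        scratch[i] += math.log2(1 + abs(v - q))


KERNEL_FOR = {
    "pe_sect_cnt": kernel_equal_ret32,
    "pe_sect_rawoff_entry": kernel_log,
}


def nearest(columns, query):
    """Brute-force nearest neighbor: run one kernel per column, then take the minimum."""
    n = len(next(iter(columns.values())))
    scratch = [0.0] * n              # intermediate results, one slot per row
    for name, column in columns.items():
        KERNEL_FOR[name](column, query[name], scratch)
    best = min(range(n), key=scratch.__getitem__)
    return best, scratch[best]
```

The layout matters because each kernel reads a single contiguous column, which is the access pattern that benefits from high memory throughput.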

  • Classification | Requirements

    transparency: find easily what is responsible for a mistake

    tractability: fix the problem quickly

  • Classification | Algorithm

    Instance-based classifier.
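A minimal instance-based classifier, assuming a single nearest neighbor and a distance cutoff (the slides do not state either), could look like this. It returns the stored instance that decided the label, which is one way to get the transparency asked for in the requirements: a wrong verdict points directly at the sample responsible. The `manhattan` distance and the `KNOWN` data are hypothetical stand-ins.

```python
def classify(query, instances, distance, max_distance):
    """Return (label, evidence), where evidence is the stored instance that decided."""
    best = None
    best_d = float("inf")
    for features, label in instances:
        d = distance(query, features)
        if d < best_d:
            best, best_d = (features, label), d
    if best is None or best_d > max_distance:
        return "UNKNOWN", None       # nothing similar enough is stored
    return best[1], best[0]


def manhattan(a, b):
    # Stand-in distance for the example only.
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical labeled instances: (feature vector, label).
KNOWN = [((4, 1024), "CLEAN"), ((9, 70000), "MALWARE")]
```

With these values, `classify((4, 1030), KNOWN, manhattan, 100)` returns the `CLEAN` label together with the instance `(4, 1024)`, while a query far from every stored instance comes back as `UNKNOWN`.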

  • Classification | Optimizations

    scaling and HW problems with GPUs

    we invested in algorithmic optimizations:

    VP-tree, distance bounded search

    hand optimized distance function (assembly)

    CPU version is ~100x faster
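As a rough sketch of the VP-tree with distance-bounded search named on this slide: each node stores a vantage point and a radius, and the search skips a subtree when the triangle inequality shows it cannot contain anything closer than the current bound. The function names and the random vantage-point choice are assumptions, and the pruning is only valid for a distance that satisfies the triangle inequality; the talk does not say how the production distance handles this.

```python
import random


class VPNode:
    __slots__ = ("point", "radius", "inside", "outside")

    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside


def build(points, distance, rng=None):
    """Build a VP-tree; 'inside' holds points closer to the vantage point than radius."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = sorted(((distance(vp, p), p) for p in points), key=lambda t: t[0])
    radius = dists[len(dists) // 2][0]          # median distance
    inside = [p for d, p in dists if d < radius]
    outside = [p for d, p in dists if d >= radius]
    return VPNode(vp, radius, build(inside, distance, rng), build(outside, distance, rng))


def nearest_within(root, query, distance, bound):
    """Return (dist, point) of the nearest point with dist <= bound, or None."""
    best = [bound, None]

    def visit(node):
        if node is None:
            return
        d = distance(query, node.point)
        if d <= best[0]:
            best[0], best[1] = d, node.point    # tighten the bound
        if d < node.radius:
            visit(node.inside)
            if d + best[0] >= node.radius:      # outside may still hold a hit
                visit(node.outside)
        else:
            visit(node.outside)
            if d - best[0] < node.radius:       # inside may still hold a hit
                visit(node.inside)

    visit(root)
    return None if best[1] is None else (best[0], best[1])
```

Starting the search with a finite bound is what makes it distance bounded: a query with no stored point within `bound` returns `None`, often after visiting only a small part of the tree. That matches the note on the data slide that the points are sparse and well separated in many cases.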

  • Classification | Deployment

    [Diagram: deployment data flow. Recovered labels: Avast users; FileRep; Medusa; Scavenger; File SHA and user id; File prevalence; File classification; File fingerprint; Generic detections; File classifications and Evo-gen detections; Threats; Set updates]

  • Rule generator

    detect more variants in the wild

    (our) rule is a conjunction of several conditions

    known as Win32:Evo-Gen

    completely different optimization problem than classification - still uses the GPU
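A rule that is a conjunction of conditions can be sketched as a list of attribute tests that must all hold. The actual Evo-Gen rule format is not described in the slides, so the tuple layout, the operator set, and the example thresholds below are purely hypothetical; only the attribute names are taken from the earlier table.

```python
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}


def matches(rule, record):
    """A rule is a list of (attribute, op, value); every condition must hold."""
    return all(OPS[op](record[attr], value) for attr, op, value in rule)


# Hypothetical rule: exactly 4 sections and an entry offset in a fixed range.
EXAMPLE_RULE = [
    ("pe_sect_cnt", "==", 4),
    ("pe_sect_rawoff_entry", ">=", 1024),
    ("pe_sect_rawoff_entry", "<=", 4096),
]
```

Unlike the nearest-neighbor classifier, which compares a file against stored samples, a rule like this can match variants that were never seen, which is where generating good rules becomes its own optimization problem.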

  • Q&A