CarolinaCon Presentation on Streaming Analytics

CarolinaCon 11: One Step Closer to the Matrix: Machine Learning and Augmented Reality in Streaming Data, by Rob Weiss and John Eberhardt


Posted on 16-Jul-2015

Page 1: CarolinaCon Presentation on Streaming Analytics

CarolinaCon 11

One Step Closer to the Matrix: Machine Learning and Augmented Reality in Streaming Data

Rob Weiss, John Eberhardt

What’s the Story?

• Rob and John have been working together for years
• Rob is a Network Engineer and Hacker
• John is a Data Scientist and Architect

• Two Great Tastes that Taste Great Together
• Different perspectives bring new answers

• Rob and John are interested in how to create a paradigm shift in user interaction with data and network security
• We are also probably slightly insane

The Defender’s Challenge

• The attacker has an inherent advantage – no rules!
• So the defense problem is asymmetric
• Classical methods fail more rapidly as computing power becomes cheaper and more readily available
• The Fortress or “Big Walls” security model is outdated and, frankly, ineffective
• Qualified people are in short supply
• Can we crowdsource network defense?

How We Got Started

• A research project in a galaxy far, far away
• We started modeling zero day attacks
• We combined machine learning and streaming analytics to detect novel patterns statistically
• It worked well enough, but there were limitations
• Not sensitive enough
• Not specific enough
• Proprietary software limited flexibility
• It still required a pretty sophisticated operator – and those are in short supply
• So . . .

Taking a Different Approach

• Could we do for raw data what GUIs did for computers and revolutionize human interaction with data?

• Complex streaming analytics are not tractable to the human

• The “last mile” requires a user interface that creates flow for the human analyst out of data

• Harness the power of metaphor to explain complex concepts to the human analyst (e.g. Windows)

• Streaming Analytics + Streaming User Experience = “Data Looming”

• Can we really make a prosthetic for the brain?

What? Don’t Flip Out . . .

Data Looming

• Can you point out every individual thread and show me how it is woven? Probably not.

• Can you tell me what it is? I sure hope so!

Watch threads on a loom – to the naked eye, the loom is too complex and moving too quickly for you to pick out the details, but you can quickly see when the overall pattern changes – usually within very few iterations. A simple, intuitive, scalable visualization of streaming analytics allows the human analyst to connect the “last mile” of disconnected events and is at the heart of what we are doing – merging complex streaming analytics with the sparse pattern detection capabilities of the human brain.

Pattern Recognition is For the Birds

A child can learn to recognize this pattern in 15 seconds, but a computer still can’t.

#1 - Eagle #2 - Swan #3 - ????

Getting to The Big Idea

Zero Day Work

William Gibson’s Neuromancer

The Matrix

John Maeda’s Simplicity by Design

Open Source

Network Expertise

Data Science Expertise

Crowdsourcing

Hacktastic Innovation Explosion!!!

How I Did It by Victor Frankenstein

• Accelerate data analysis by extending streaming analytics to broader groups of less skilled human analysts

• Combine the speed, precision and recall of a computer, through an immersive interface, with the inherent sparse pattern recognition capabilities of the human brain

• Streaming Analytics allow for rapid, real time adjudication of data and make the user experience dynamic

• An immersive user experience makes complex analytics data “real” to the human and enables experiential learning

• Combining them in a single environment enables sparse pattern recognition in dynamic systems

How I Did It Continued (Abby Normal)

• Data: Streaming data from sensors, collectors, files, etc.

• Platform: Streaming analytics process and analyze these data, including attribution to the real world

• Visual Language Construct: Integrates streaming data, streaming analytics, and streaming user experience in a pluggable architecture

• Streaming User Experience: Immersive 3-D user experience allows analysts to interact directly with streaming data and analytics
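
To make the "pluggable architecture" idea concrete, here is a minimal Python sketch. It is an illustration only, not the presenters' implementation: the names Event, Pipeline, tag_high_port, and keep_high_port are hypothetical, and the real platform described in this talk is built on Kafka and Storm rather than in-process generators.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """One record from a sensor or collector (hypothetical shape)."""
    source: str
    payload: dict


# A plugin is any callable that maps one event stream to another.
Plugin = Callable[[Iterator[Event]], Iterator[Event]]


class Pipeline:
    """Chains plugins lazily so events flow through one at a time."""

    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def register(self, plugin: Plugin) -> "Pipeline":
        # "Bring your own" analytics: anything with the Plugin shape fits.
        self._plugins.append(plugin)
        return self

    def run(self, events: Iterable[Event]) -> Iterator[Event]:
        stream: Iterator[Event] = iter(events)
        for plugin in self._plugins:
            stream = plugin(stream)
        return stream


def tag_high_port(stream: Iterator[Event]) -> Iterator[Event]:
    """Example analytic: mark events whose destination port is >= 1024."""
    for ev in stream:
        port = ev.payload.get("dst_port", 0)
        yield Event(ev.source, {**ev.payload, "high_port": port >= 1024})


def keep_high_port(stream: Iterator[Event]) -> Iterator[Event]:
    """Example filter: pass along only the events tagged as high-port."""
    return (ev for ev in stream if ev.payload.get("high_port"))


pipeline = Pipeline().register(tag_high_port).register(keep_high_port)
```

Because each stage only sees an iterator, a new analytic or a new front end can be swapped in without touching the other stages, which is the property the slide attributes to the Visual Language Construct.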

Architecture (Meet the Architect)

Data Sensor (N+1)

Data Collector (N+1)

Kafka

Zookeeper

Kafka Queue

Nimbus

Worker Node

Storm

Trident-ML

Analytics Platform

Visual Language Construct

Streaming User Experience

Analytics and Countermeasures

Game Players

Design Principles

Principle: Open Source Components
Enables: Supports integration of streaming analytics and immersive user experience to create a dynamic feedback loop – rapidly adapt the platform from lessons learned from human experience

Principle: Streaming Analytics
Enables: Accelerating analytics to keep pace with data collection (facilitating high collection rate)

Principle: Immersive Streaming User Experience
Enables: Extending the user interface to allow broader groups of analysts to use sophisticated analytics (addressing the recruiting challenge)

Principle: Pluggable Architecture
Enables: “Bring your own” tools and analytics supports crowdsourcing and allows for aggressive exploitation of new analytics and user experience paradigms

Larry Bird: Network Defender of the Future

A basketball player can watch your network. When an attack occurs, our player can quickly identify the pattern shift using the same brain computation as when the player identifies a shift in the offensive strategy of the opposing basketball team. Think about this as a data prosthetic for the human brain.

Enough of Us Talking at You

• Fight fire with fire – crowdsource all comers and create an asymmetric defense

• Align economic incentives, human behaviors, and defense objectives

• Do for data what GUIs did for computers – make it accessible!

• This isn’t about technology . . . it’s about revolutionizing the way humans interact with data to enable a game-changing leap forward

Innovation Is Often Strange

But Wait, There’s More!

Altamira Technologies Corporation 2014

Demo Concept

Concept
• Normal work environment – “normal” patterns give way to aberrations
• This behavior is focused on network data, but could easily be any other streaming data

Design
• Analytics cluster traffic based on source and destination port patterns over time using k-means clustering
• Cubes represent nodes on the network; streaming spheres represent packets
• Colors represent the behavior of nodes / packets based upon traffic – Green is a client, Blue is a Server, Yellow is “undetermined behavior”

                  Green (client)   Blue (server)   Yellow (??)
Source Centroid   54760            1001            5066
Dest Centroid     791              54518           5511
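
As a rough illustration of how the centroids in the table could drive the coloring, the Python sketch below assigns a (source port, destination port) pair to its nearest centroid and shows a textbook running-mean centroid update for streaming data. The centroid values come from the slide; the function names, the use of plain Euclidean distance, and the update rule are assumptions made for this example and may differ from what Trident-ML does in the demo.

```python
import math

# Centroids copied from the slide's table: (source port, destination port).
CENTROIDS = {
    "green (client)": (54760.0, 791.0),
    "blue (server)": (1001.0, 54518.0),
    "yellow (undetermined)": (5066.0, 5511.0),
}


def nearest_label(src_port, dst_port, centroids=CENTROIDS):
    """Return the name of the centroid closest to the observed port pair."""
    point = (float(src_port), float(dst_port))
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


def online_update(centroid, point, count):
    """Sequential k-means step: move the centroid toward a new point.

    `count` is the number of points assigned to this centroid so far,
    including the new one, so the centroid stays the running mean.
    """
    cx, cy = centroid
    px, py = point
    return (cx + (px - cx) / count, cy + (py - cy) / count)


if __name__ == "__main__":
    # An ephemeral source port talking to a low destination port looks like a client.
    print(nearest_label(51000, 443))
    # A low source port answering a high destination port looks like a server.
    print(nearest_label(80, 49152))
```

The pattern in the table matches this intuition: the green centroid pairs a high source port with a low destination port, and the blue centroid is the reverse.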

Questions I Can Ask

• Is a given node on the network behaving as expected?
• Watch the node colors - they should be consistent in a normal network: some white nodes, a lot of green (client) nodes, and some blue (server) nodes. What happens over time?

• Does my use of source and destination ports mark me out as a client or server? Does my role appear consistent or change?
• The node colors indicate what they are – watch the colors of the nodes – machines should have clear and consistent roles

• Is my pattern of nodes that I am interacting with consistent? Am I interacting with different partners?
• Watch the stream patterns – machines should interact with consistent groups

• Do my behaviors adhere to regular time cycles? Can I apply time cycles to any of the above (e.g., a workday)?
• Watch the patterns change as cyclical time progresses in our “workday”
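
Two of the questions above (does a node's role change, and does it keep talking to the same partners) can also be checked numerically. The Python sketch below is one possible way to do that, assuming each node already has one role label and one set of partner addresses per time window; the function names and the choice of Jaccard similarity are illustrative assumptions, not something stated in the talk.

```python
def role_changes(labels):
    """List (window_index, old_role, new_role) wherever the role flips.

    `labels` holds one role label per consecutive time window; the index
    reported is the window in which the new role first appears.
    """
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(labels, labels[1:]), start=1)
        if old != new
    ]


def partner_overlap(previous, current):
    """Jaccard similarity of two partner sets (1.0 means identical partners)."""
    previous, current = set(previous), set(current)
    union = previous | current
    if not union:
        return 1.0  # no partners in either window: nothing changed
    return len(previous & current) / len(union)


if __name__ == "__main__":
    # A node that turns from client to server in the third window.
    print(role_changes(["green", "green", "blue", "blue"]))
    # Half-changed partner list between two windows.
    print(partner_overlap({"10.0.0.1", "10.0.0.2"}, {"10.0.0.2", "10.0.0.3"}))
```

A role flip or a low overlap score is only a prompt for a closer look; the talk's premise is that the human analyst, not a threshold, makes the final call.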

DEMO TIME!

About Rob and John

• Rob Weiss is a senior systems engineer at G2 (www.g2-inc.com) with over 24 years of experience in government and commercial markets. He started with Legos and is now a tool builder and problem solver. He currently runs the Altamira Red Team and performs information security research, looking for hard problems to solve. Twitter: @3XPlo1T2

• John Eberhardt is a Data Scientist at 3E Services (www.3eservicesllc.com) with 20 years of quantitative problem solving and a penchant for trying to decipher symbolism in obscure 16th century literature. John has experience in analytical problem solving in healthcare, life sciences, security, financial services, consumer products, and transportation. Twitter: @JohnSEberhardt3

Repositories

• Apache Storm: https://github.com/apache/storm
• Trident-ML: https://github.com/pmerienne/trident-ml
• Rob Weiss: https://github.com/j105rob

Squiggly (probably won’t use this)

• A self-organizing system consists of groups A, B, and C interacting

• Hence, the current state of A is {A|B,C}

• They influence each other {B|A,C}, {C|A,B} which means the system is described by f{{A|B,C},{B|A,C},{C|A,B}}

• However these groups are neither unitary nor static, which means at any given time they can have sub-attributes {Ai...An}, {Bi...Bn}, {Ci...Cn} that are unknown

• So now the system is described by f{{Ai | {Bi...Bn}, {Ci...Cn}},{Bi |{Ai...An}, {Ci...Cn}},{Ci |{Ai...An}, {Bi...Bn}}}

• How do you solve this NP-hard problem?