the cia’s “grand challenges” with big data from structure:data 2013

Post on 27-Jan-2015

108 Views

Category:

Technology

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

Presentation from Ira "Gus" Hunt, CIA #dataconf More at http://event.gigaom.com/structuredata/

TRANSCRIPT

1

!

Ira A. (Gus) Hunt Chief Technology Officer

Beyond Big Data

Riding the Technology Wave

Our Mission

We are the nation's first line of defense. We accomplish what others cannot accomplish and go where others cannot go. We carry out our mission by:

Collecting information that reveals the plans, intentions and capabilities of our adversaries and provides the basis for decision and action. Producing timely analysis that provides insight, warning and opportunity to the President and decisionmakers charged with protecting and advancing America's interests. Conducting covert action at the direction of the President to preempt threats or achieve US policy objectives.

2  

3  

4 Big Bets

–  Acquire, federate, secure and exploit. Grow the haystack, magnify the needles. Revolutionize Big Data Exploitation

Accelerate Operational Excellence

Serve CIA by supporting the IC

Drive Performance through Talent Management

–  Assume a leadership role in IC activities that matter to CIA; Build to share

–  Innovate IT operations and run IT like a business.

–  Focus on continuous learning and diversity of thought, experience, background

1  

4  

2  3  

6 Key Technology Enablers

–  World-class abilities to discover patterns, correlate information, understand plans and intentions, and find and identify operational targets in a sea of data. Big Data analytics as a service

Advanced Mission Analytics—Analytics as a Service

Enterprise Widgets and Services

Security as a Service

Data Harbor—Data as a Service

–  One environment, all data, protected and secure.--ubiquitous encryption, enterprise authentication, audit, DRM, secure ID propagation, and Gold Version C&A.

–  A customizable, integrated and adaptive webtop that lets analysts, ops officers, and targeters to “have it their way”. Personalization in context.

–  An ultra-high performance data environment that enables CIA missions to acquire, federate, and position and securely exploit huge volumes data. Data in context.

1  

4  Cloud Computing—Infrastructure as a Service

–  Capacity ahead of demand. Large scale, elastic, commodity hosting, storage, and compute 5  

–  Immediate, secure and appropriate access to people, data and tools from anywhere at anytime Secure Mobility 0  

6

It’s a

Big Data

World

Google > 100 PB

> 1T indexed URLs > 3 million servers

> 7.2B page-views/day

7

8

FaceBook > 1 billion users

> 300PB; +> 500TB/day > 35% of world’s photographs

9

YouTube > 1000PB

+>72 hours/minute >37 million hours/year > 4 billion views/day

10

World Population > 7,057,065,162

11

Twitter > 124B tweets/year

> 390M/day ~4500/sec

12

Global Text Messages > 6.1T per year

> 193,000 per second > 876 per person per year

13

US Cell Calls > 2.2 T minutes/year

> 19 minutes / person / day (uncompressed < 1 YouTube/year)

14

3 Driving Forces

Social

15

Mobile

Cloud

16

Big Data

+ + =

17

+ + Increases the velocity of

innovation

18

Accelerates social Change

+ +

19

20

Altered the Flow

of Information

+ +

3 Emerging

Forces

Nano

22

Bio

Sensors

23

Microphone Image 3-axis accelerometer Touch Light Proximity Geolocation

Mobile Sensor Platform

Communicator, Tricorder, Transporter

24

Pacemaker Blood sugar tester Insulin controller Health monitor Exercise coach Remote tune-ups Early warning system

Mobile Health Platform

25

Identity by 3-axis accelerometer Gender (71%) Height--tall or short (80%) Weight--heavy or light (80%) You by your gait (100%)

Mobile Sensor Platform

Actitracker—Android App

26

The inanimate becomes sentient

+ + + +

+ =

27

Smarter Planet

Cars drive themselves

Machines know your needs

+ + + +

+ =

28

Drive radical efficiencies Enhance social engagement Improve information sharing Enables global reach Green (automatic routing) Improve our health Stop/prevent crime …

+ + + +

+ =

2  

3  

Sensors are Really Big

Sensors are unbounded 1  

Sensors are indiscriminate

Sensors are promiscuous

2  

3  

The Internet of Things is Bigger

Everything is Connected 1  

Everything is a Sensor

Everything Communicates

31

That’s the

Really Big Data

Challenge of the future

32

Why We Care

33

Why We Care

34

Why We Care

35

Why We Care

2  3  

Impact of Big Data

Know what we know

Discover the gaps in our knowledge

More effective use of expensive or long lead collection assets

1  

4  Focus targeting to fill the gaps

Better global coverage to limit surprise 5  Enhance understanding and improve analysis 6  

37

Implications

2  3  

4 Rules of Big Data

It’s the data…

Power to the people

Context, context, context

1  

4  Latency breeds contempt

- Apologies to James Carville

- Apologies to the Black Panthers

- Apologies to Aesop

- Apologies to Lord Harold Samuel

39

It’s the Data…

Data vs Tools—A History Lesson

•  Sophisticated tools without the data are useless

•  Mediocre tools with the data are frustrating

•  Analysts will always opt for frustration over futility, if that is their only option

2  

3  

Our Job Leverage the Big Data world

Find the Information that Matters

Connect the Dots

Understand the Plans of our Adversaries Safeguard our national security

1  

4  

The Problem

42

2  

3  

Our Problem: Which 5K

Don’t know the future value of data

We cannot connect dots we don’t have

Traditional, requirements driven, collection fails in the Big Data world

- Can’t task for data you don’t know you do need - The few cannot know the needs of the many - Global Coverage requires Global Data

1  

2  3  

Characteristics of Big Data

More is always better

Signal to noise only gets worse

Requirements are usually hindsight

1  

4  Enumeration not modeling

45

•  Analysts and operators are not data engineers •  Need insight and understanding •  Ask a question and get a coherent answer •  Cannot know what data sets contain

information of value to them •  Imbue data services and tools with those

smarts •  Smart Data, smart tools, smarter intelligence

Data as a Service

46

Power to the People

47

•  Analytics and tools are hard to use •  Specialists are required to derive value •  Skilled people are in short supply •  Algorithms are dense and arcane •  Require a lot of hand curation •  Built for business not for intelligence

Today

48

New Fields of Expertise

Data Scientist Information Engineer

Data Science

Data science combines elements from many fields:

Math Statistics Data Engineering Pattern Recognition and Learning Advanced Computing Visualization Uncertainty Modeling Data Warehousing High performance computing

* Wikipedia

*

50

The power of big data can only be fully realized

when it is in the hands of the average user

Big Data Democracy Wins

Tomorrow

•  Elegant, powerful and easy to use tools and visualizations

•  Machines to do more of the heavy lifting

•  Intelligent systems that learn from the user

•  Correlation not search

•  “Curiosity layer”– machines that are curious on your behalf

52

People

Places

Organizations

Time

Events

Concepts

Things

7 Universal Constructs for Analytics

53

User Built Recipes

Keep it Simple

•  Data Scientists focus on hard problems

•  Build reusable components that anyone can apply—Recipes

•  Share them widely—Apps Store/Apps Mall—Recipe Book

•  Let users assemble components their way •  Experiment and fail quickly to succeed faster

55

Latency Breeds Contempt

Its All About Speed

•  Hadoop/Map Reduce—batch •  Flexible, powerful, slow

•  Equivalent of Real-Time Map/Reduce •  Flexible, powerful and fast •  Demel, Caffeine, Impala, Apache Drill, Spanner…

•  Recursive Streams processing w/

complex analytics

•  In-memory—peta-scale RAM architectures •  Distributed, in-memory analytics

Tectonic Technology Shifts

Traditional Processing Data on SAN

Move Data to Question Backup

Vertical scaling Capacity after demand

DR Size to peak load

Tape SAN Disk

RAM limited

Mass Analytics/Big Data Data at processor Move Question to Data Replication management Horizontal scaling Capacity ahead of demand COOP Dynamic/elastic provisioning SAN Disk SSD Peta-scale RAM

New Computing Architectures

•  Data close to compute •  Power at the edge •  Optical Computing/Optical Bus •  End of the motherboard—shared pools of

everything •  Software defined everything—compute,

storage, networking, data center •  Network is the bottleneck and constraint

59

Context, Context, Context

Everything in Your Frame of Reference

•  Widgets—Webtop in context to business

•  Schema on Read—Data in context to your question

•  User assembled analytics—answers in context to your questions

•  Elastic computing—computing in context to your demand

61

Closing Thoughts

62

High Noon in the

Information Age

63

It is nearly within our grasp to compute on all human

generated information

64

FaceBook > 1 billion users

> 35% of all photographs

65

The inanimate is rapidly becoming sentient

Smarter Planet

Cars drive themselves

Machines know your needs

66

3rd Wave of Computing

Cognitive Machines

Watson

67

Moving faster than government can keep up The legal system is woefully behind What are your rights? Who owns your data? Driving the pace of social change Exponentially increasing cyber threats

+ + + +

+ =

68

top related