Big Data Driven: Official Statistics

© 2011 IBM Corporation Information Management Big Data Driven: Official Statistics Amish Patel, Big Data Leader for Government, Europe [email protected]

Upload: chavi

Post on 14-Jan-2016

DESCRIPTION

Big Data Driven: Official Statistics. Amish Patel, Big Data Leader for Government, Europe [email protected]. Agenda: Drivers for leveraging Big Data; Implications of Big Data on Official Statistics; Challenges & Opportunities; Industrialisation and Collaborative model.

TRANSCRIPT

Page 1: Big Data Driven: Official Statistics

© 2011 IBM Corporation

Information Management

Big Data Driven: Official Statistics

Amish Patel, Big Data Leader for Government, Europe [email protected]

Page 2: Big Data Driven: Official Statistics

Agenda

Drivers for leveraging Big Data

Implications of Big Data on Official Statistics
– Challenges & Opportunities
– Industrialisation and Collaborative model
– New products and indicators

Page 3: Big Data Driven: Official Statistics

Drivers for leveraging big data

Page 4: Big Data Driven: Official Statistics

Page 5: Big Data Driven: Official Statistics

The Big Data Conundrum

[Figure: data AVAILABLE to an organization versus data an organization can PROCESS]

The economies of deletion have changed…
– Leading us into new opportunities and challenges

The percentage of available data an enterprise can analyze is decreasing as the data available to that enterprise grows
– Quite simply, this means as enterprises, we are getting “more naive” about our business over time

Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
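The conundrum above can be illustrated with a small calculation. The sketch below is not from the presentation: the starting volumes and growth rates are hypothetical, chosen only to show how the analyzable share falls when available data grows faster than processing capacity.

```python
def analyzable_share(available_tb: float, capacity_tb: float) -> float:
    """Fraction of available data that can be analyzed, capped at 1.0."""
    if available_tb <= 0:
        raise ValueError("available_tb must be positive")
    return min(1.0, capacity_tb / available_tb)


def project_shares(available_tb, capacity_tb, data_growth, capacity_growth, years):
    """Analyzable share per year when both quantities grow at fixed annual rates."""
    shares = []
    for _ in range(years + 1):
        shares.append(analyzable_share(available_tb, capacity_tb))
        available_tb *= 1 + data_growth
        capacity_tb *= 1 + capacity_growth
    return shares


# Hypothetical figures (assumptions, not data from the slides):
# 100 TB available, 50 TB processable, data +50% a year, capacity +20% a year.
shares = project_shares(100.0, 50.0, data_growth=0.5, capacity_growth=0.2, years=3)
for year, share in enumerate(shares):
    print(f"year {year}: {share:.1%} of available data can be analyzed")
```

Under these assumed rates the share shrinks every year even though capacity keeps growing, which is the "more naive over time" effect the slide describes.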

Page 6: Big Data Driven: Official Statistics

Implications Of Big Data On Official Statistics

Page 7: Big Data Driven: Official Statistics

Challenges & Opportunities

1. Impact on Policy and Development issues

2. Methodological: bridging the gaps by combining multiple data sources

3. Technology (processing and storage)

4. Security/Privacy

5. Governance

6. Financial

Page 8: Big Data Driven: Official Statistics

1. Impact On Policy And Development Issues

Example: Leveraging Big Data for Currency of National Statistics

Page 9: Big Data Driven: Official Statistics

2. Methodological

Example: Bridging the gaps by combining multiple data sources

Page 10: Big Data Driven: Official Statistics

3. Technology – Processing and Storage

Example: Storage is key to your Infrastructure

Smarter Storage

• Designed for data: deliver insights in seconds through systems built to process a variety of data at scale
• Incorporates cloud technologies to improve service quality, speed of delivery and efficiency
• Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements

Self-Optimizing, Cloud Agile, Efficient by Design

Page 11: Big Data Driven: Official Statistics

Data Footprint Reduction

                        Active Data      Backup Data
Real-time Compression   40-80% (Best)    40-80%
Data Deduplication      20-30%           80-95% (Best)

• Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
– Short patterns for frequent data
– Longer patterns for infrequent data
– Can achieve 40 to 80 percent reduction in storage capacity

• Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data
– Store only one unique instance of the data
– Redundant data replaced with pointer
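Both techniques above can be sketched in a few lines. This is an illustrative sketch only, not the storage product's implementation: Huffman coding stands in for "short patterns for frequent data", and a content-hash store stands in for "one unique instance plus pointers". The function and class names and the 4-byte chunk size are assumptions made for the example.

```python
import hashlib
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte_value: code_length_in_bits} for a Huffman code over data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie_breaker, {symbol: depth_so_far})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


class DedupStore:
    """Keeps each unique chunk once; a file is a list of pointers (digests)."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes (the single stored instance)
        self.files = {}   # file name -> list of digests (the pointers)

    def put(self, name: str, data: bytes, chunk_size: int = 4) -> None:
        pointers = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if unseen
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


lengths = huffman_code_lengths(b"aaaaaaaabbbc")
print({chr(sym): bits for sym, bits in sorted(lengths.items())})

store = DedupStore()
store.put("backup_monday", b"ABCDABCDABCD")
store.put("backup_tuesday", b"ABCDXYZ!")
print(store.stored_bytes(), "bytes stored for 20 bytes written")
```

The frequent byte gets the shortest code, and the repeated chunk is stored once. Real savings depend entirely on the data, so the percentages on the slide should not be read off this toy input.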

Page 12: Big Data Driven: Official Statistics

Storage Tiers – A trade-off between performance and cost

[Figure: storage tiers (server; cache, flash and solid-state drives; hard disk drives; tape; cloud) arranged between faster performance and lower cost]

Technologies allow us to place and move data to the appropriate storage tier to balance between performance and cost
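The placement idea above can be sketched as a simple rule that maps how often a dataset is read to a tier. This is a minimal sketch, not how any particular tiering product decides: the thresholds, the `Dataset` type and the dataset names are hypothetical, and the cloud tier is left out because its position depends on the deployment.

```python
from dataclasses import dataclass

# Tiers ordered fastest and most expensive first.
# The reads-per-day thresholds are illustrative assumptions.
TIERS = [
    ("flash/SSD", 100.0),  # hot data: at least 100 reads per day
    ("hard disk", 1.0),    # warm data: at least 1 read per day
    ("tape", 0.0),         # cold data: everything else
]


@dataclass
class Dataset:
    name: str
    reads_per_day: float


def choose_tier(dataset: Dataset) -> str:
    """Pick the first (fastest) tier whose threshold the dataset meets."""
    for tier, min_reads in TIERS:
        if dataset.reads_per_day >= min_reads:
            return tier
    return TIERS[-1][0]


for ds in [Dataset("current_quarter_survey", 500.0),
           Dataset("last_year_survey", 5.0),
           Dataset("census_archive", 0.01)]:
    print(f"{ds.name}: {choose_tier(ds)}")
```

Re-running the rule as access patterns change is what moves data between tiers over time.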

Page 13: Big Data Driven: Official Statistics

4. Security/Privacy

Need real-time data activity monitoring for security & compliance

• Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
• Database infrastructure scanning for missing patches, mis-configured privileges and other vulnerabilities
• Data protection compliance automation

[Figure labels: Collector Appliance; Host-based Probes (S-TAPs); Data Repositories (databases, warehouses, file shares, Big Data)]

Key Characteristics

• Single integrated appliance
• Non-invasive, non-disruptive, cross-platform architecture
• Dynamically scalable
• SOD (separation of duties) enforcement for DBA access
• Auto-discover sensitive resources and data
• Detect or block unauthorized & suspicious activity
• Granular, real-time policies: who, what, when, how
• 100% visibility including local DBA access
• Minimal performance impact
• Does not rely on resident logs that can easily be erased by attackers or rogue insiders
• No environment changes
• Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
• Growing integration with broader security and compliance management vision
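A policy keyed on who, what, when and how might look roughly like the sketch below. It is a toy rule set written for illustration, not the monitoring appliance's policy language or API. The table name, the account name, the office-hours window and the three outcomes are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    who: str        # database account that issued the statement
    what: str       # object touched, for example a table name
    when: datetime  # time the statement was observed
    how: str        # statement type, for example SELECT or DROP


SENSITIVE_OBJECTS = {"census_person"}  # hypothetical sensitive table
PRIVILEGED_USERS = {"dba_admin"}       # hypothetical DBA account


def evaluate(event: Event) -> str:
    """Return 'block', 'alert' or 'allow' for one observed event."""
    sensitive = event.what in SENSITIVE_OBJECTS
    if sensitive and event.how in {"DROP", "TRUNCATE"}:
        return "block"  # destructive statement on sensitive data
    if sensitive and event.who in PRIVILEGED_USERS:
        return "alert"  # separation of duties: DBA touching the data itself
    if sensitive and not 8 <= event.when.hour < 18:
        return "alert"  # access outside the assumed office hours
    return "allow"


night_read = Event("app_user", "census_person", datetime(2013, 1, 7, 23, 0), "SELECT")
print(evaluate(night_read))
```

Because each event is judged as it is observed, the decision does not depend on database logs that a privileged user could later erase.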

Page 14: Big Data Driven: Official Statistics

5. Governance

Vision for information integration & governance

[Figure: traditional sources and new sources brought together through information integration, governance and context accumulation]

Traditional sources (structured, repeatable, linear), shown with the data warehouse:
• Internal app data
• Transaction data
• ERP data
• Mainframe data
• OLTP system data

New sources (unstructured, exploratory, iterative), shown with Hadoop and Streams:
• Web logs
• Social data
• Text & images
• Sensor data
• RFID

Systems Of Record and Systems Of Engagement
• Traditional approach (systems of record): structured, analytical, logical
• New approach (systems of engagement): creative, holistic thought, intuition

Page 15: Big Data Driven: Official Statistics

Governance concerns for big data customers

• How do I integrate and link my big data environment with my current one?
• How do I create a trusted view of my customers and products for big data?
• Is a governed and auditable archive possible with big data?
• How do I cleanse and validate the results of my big data analysis?
• How do I protect data in a big data environment?

Agile. Simple. Trusted Information.

Page 16: Big Data Driven: Official Statistics

Governance in an exploratory Big Data environment

1. Ensure trust & compliance
• Lineage of data as it enters and leaves the big data system
• Secure the big data systems from breaches
• Create masked dev and test analytics clusters

2. Accelerate time to value
• High performance data provisioning
• Integrated data integration and stream analytics platform

3. Lower total cost of ownership
• Simplified tooling to improve productivity of developers and testers
• Automated system security
• Complete visibility into the data movement and lifecycle

Diagram callouts:
• High performance and high quality data loads
• Secured BigInsights to prevent any data breaches
• Create privatized data in real time or on the cluster to ensure data protection
• Low cost historical archive loaded to Hadoop for exploratory analytics
• Integration for improved segmentation of analytical data sources
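One common way to produce the masked or "privatized" records mentioned above is keyed pseudonymization. The sketch below replaces sensitive fields with a truncated HMAC so that the same input always maps to the same token and records can still be joined. It is an assumption about how masking could be done, not a description of the product's method. The field names, the sample record and the key are hypothetical.

```python
import hashlib
import hmac


def mask_value(value: str, key: bytes) -> str:
    """Replace a value with a keyed, deterministic 16-hex-character token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; this raises collision risk


def mask_record(record: dict, sensitive_fields: set, key: bytes) -> dict:
    """Return a copy of record with the sensitive fields masked."""
    return {
        field: mask_value(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical record and key, for illustration only.
secret_key = b"replace-with-a-managed-secret"
record = {"name": "Jane Doe", "postcode": "1234AB", "age_band": "30-39"}
masked = mask_record(record, {"name", "postcode"}, secret_key)
print(masked)
```

Masking of this kind is pseudonymization rather than anonymization: anyone holding the key can recompute the tokens, so the key needs the same protection as the original data.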

Page 17: Big Data Driven: Official Statistics

6. Financial

Engagement Model: Invest and define; Motivate and educate; Incubate and evaluate

Business Model: Citizens-Pay; NS-Pay; Businesses-Pay
• To private company for value-added services to citizens
• Pay to private company for inexpensive services
• Typically cloud-based
• Services free or discounted
• Funded by other parts of the business
• Can be non-profit organisations

[Figure labels: Information (catalogue and datasets); Link data; Link data, aggregate data; Increase value of open-data; NS; NS co-invests; Accelerate evolution of ecosystem; Services built & maintained by community on top of open-data]

Page 18: Big Data Driven: Official Statistics

Industrialisation and Collaborative Model

Leverage City Forward model for National Statistics

Page 19: Big Data Driven: Official Statistics

Impact on Everyday Life

• How safe is my neighborhood?
• Which career is right for me?
• What type of education do I need?

Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org

Page 20: Big Data Driven: Official Statistics

New Products and Indicators

Evolving beyond statistics to predictive analytics, sharing complementary datasets with private sector and citizens

Examples:

Predictive models for healthcare cost reduction and outcome optimisation

Epidemic outbreak surveillance – hotspots, progression waves

Aligning public services (federal, regional and city level) to existing and predictive demographic data

Page 21: Big Data Driven: Official Statistics

Example: Traffic Management for Sustainability and Efficiency

Multimodal Data Streams
– GPS
– Cell phones (location tracking)
– Public transport (bus, docking)
– Pollution measurements
– Weather conditions (including road conditions)
– Optical traffic flow detectors
– Travel time data based on plate recognition
– Induction loop detector data
– Accidents in network as they are being recorded
– Road closures (road work, etc.)
– Still pictures from road cameras

Real Time Traffic Monitoring & Information

(Multimodal) Travel Planner

[Figure labels: GPS data streams; real-time transformation logic; real-time geo mapping; real-time speed & heading estimation; real-time aggregates & statistics; storage adapters; data warehouse; web server; Google Earth; offline statistical analysis; interactive visualization]
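The speed and heading estimation step named in the diagram can be sketched from two consecutive GPS fixes, using the haversine distance and the initial great-circle bearing. This is a minimal sketch under assumptions: the `(timestamp_s, lat, lon)` fix format and the sample coordinates are hypothetical, and a production pipeline would also filter GPS noise and map-match positions to roads.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def speed_and_heading(fix_a, fix_b):
    """Each fix is (timestamp_s, lat, lon). Returns (speed_kmh, heading_deg)."""
    (t1, lat1, lon1), (t2, lat2, lon2) = fix_a, fix_b
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    speed_kmh = haversine_m(lat1, lon1, lat2, lon2) / dt * 3.6
    return speed_kmh, bearing_deg(lat1, lon1, lat2, lon2)


# Two hypothetical fixes ten seconds apart, moving due north.
speed, heading = speed_and_heading((0, 52.000, 5.000), (10, 52.001, 5.000))
print(f"{speed:.1f} km/h, heading {heading:.0f} degrees")
```

Per-vehicle estimates like these are the raw input for the aggregates and statistics stage, for example average speed per road segment per minute.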

Page 22: Big Data Driven: Official Statistics

Thank You

Page 23: Big Data Driven: Official Statistics

www.sendsteps.com

Prepare to react; keep your phone ready!

TXT
1. Text to +316 4250 0030
2. Type Session <space> WS2 <space> your answer

Internet
1. Go to sendc.com
2. Log in with Session
3. Type WS2 <space> your answer

Posting messages is anonymous. No additional charge per message.

Page 24: Big Data Driven: Official Statistics

What kind of use case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?

Internet: go to sendc.com and log in with Session. Type WS2 <space> your answer.

TXT: send to 06 4250 0030. Type Session <space> WS2 <space> your answer.