Big Data Driven: Official Statistics
Amish Patel, Big Data Leader for Government, Europe [email protected]
Agenda: Drivers for leveraging Big Data; Implications of Big Data on Official Statistics; Challenges & Opportunities; Industrialisation and Collaborative model
© 2011 IBM Corporation
Information Management
Big Data Driven: Official Statistics
Amish Patel, Big Data Leader for Government, Europe
[email protected]
Agenda
Drivers for leveraging Big Data
Implications of Big Data on Official Statistics
– Challenges & Opportunities
– Industrialisation and Collaborative model
– New products and indicators
Drivers for leveraging big data
The Big Data Conundrum
[Chart labels: Data AVAILABLE to an organization; Data an organization can PROCESS]
The economies of deletion have changed
– Leading us into new opportunities and challenges
The percentage of available data an enterprise can analyze decreases as the data available to that enterprise grows
– Quite simply, this means that as enterprises, we are getting “more naive” about our business over time
Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
Implications Of Big Data On Official Statistics
Challenges & Opportunities
1. Impact on Policy and Development issues
2. Methodological: bridging the gaps by combining multiple data sources
3. Technology (processing and storage)
4. Security/Privacy
5. Governance
6. Financial
1. Impact On Policy And Development Issues
Example: Leveraging Big Data for Currency of National Statistics
2. Methodological
Example: Bridging the gaps by combining multiple data sources
3. Technology – Processing and Storage
Example: Storage is key to your Infrastructure
Smarter Storage
– Efficient by Design
– Self-Optimizing
– Cloud Agile
Designed for data: Deliver insights in seconds through systems built to process a variety of data at scale
Incorporates cloud technologies to improve service quality, speed of delivery and efficiency
Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements
Data Footprint Reduction

                         Active Data      Backup Data
Real-time Compression    40-80% (Best)    40-80%
Data Deduplication       20-30%           80-95% (Best)

• Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
– Short patterns for frequent data
– Longer patterns for infrequent data
– Can achieve 40 to 80 percent reduction in storage capacity
• Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data
– Store only one unique instance of the data
– Redundant data replaced with pointer
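
The two techniques on this slide can be illustrated with a minimal Python sketch. It is not the storage product's implementation: it assumes SHA-256 digests as the "pointers" for deduplication, and it uses the standard-library zlib (DEFLATE, which includes Huffman coding, where frequent symbols get shorter codes) as a stand-in for real-time compression. The sample chunks and the function names are hypothetical.

```python
import hashlib
import zlib


def deduplicate(chunks):
    """Keep one stored copy per unique chunk; return the store and a pointer list."""
    store = {}      # digest -> single stored instance of that chunk
    pointers = []   # one digest per input chunk, in the original order
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # redundant chunks are not stored again
        pointers.append(digest)           # they are represented by a pointer only
    return store, pointers


def restore(store, pointers):
    """Rebuild the original chunk sequence by following the pointers."""
    return [store[digest] for digest in pointers]


def reduction(original_size, reduced_size):
    """Fraction of capacity saved, e.g. 0.5 means a 50 percent reduction."""
    return 1.0 - reduced_size / original_size


# Hypothetical sample data: three chunks, two of them identical.
chunks = [b"census-record-A" * 64, b"census-record-B" * 64, b"census-record-A" * 64]

store, pointers = deduplicate(chunks)
raw_size = sum(len(c) for c in chunks)
dedup_size = sum(len(c) for c in store.values())   # ignores the small pointer overhead

raw = b"".join(chunks)
compressed = zlib.compress(raw)   # DEFLATE: LZ77 matching plus Huffman coding

print(f"deduplication reduction: {reduction(raw_size, dedup_size):.0%}")
print(f"compression reduction:   {reduction(len(raw), len(compressed)):.0%}")
```

The ratios printed for this toy input say nothing about the 40-80% and 80-95% figures quoted on the slide; real savings depend on how repetitive the stored data is.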
Storage Tiers – A trade-off between performance and cost
[Diagram: Server above the storage tiers; axis labels: Faster Performance, Lower Cost]
– Cache, Flash and Solid-State Drives
– Hard Disk Drives
– Tape
– Cloud
Technologies allow us to place and move data to the appropriate storage tier to balance between performance and cost
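
A placement policy of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule (days since last access), the thresholds of 7 and 90 days, the three tier names and the dataset names are all hypothetical and do not come from the slides.

```python
def choose_tier(days_since_last_access, hot_days=7, warm_days=90):
    """Pick a tier from access recency (hypothetical thresholds)."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access <= hot_days:
        return "flash"   # fastest, most expensive tier
    if days_since_last_access <= warm_days:
        return "disk"    # middle tier
    return "tape"        # slowest, cheapest tier


def plan_moves(datasets):
    """Return {name: target_tier} for datasets whose tier should change.

    datasets maps a name to (current_tier, days_since_last_access).
    """
    moves = {}
    for name, (current_tier, idle_days) in datasets.items():
        target = choose_tier(idle_days)
        if target != current_tier:
            moves[name] = target
    return moves


# Hypothetical catalogue of datasets and their current placement.
catalogue = {
    "monthly_cpi": ("flash", 2),
    "census_2001": ("flash", 400),
    "labour_survey": ("disk", 30),
}
print(plan_moves(catalogue))
```

Only datasets whose current tier differs from the computed target appear in the plan, so a scheduler could apply the moves incrementally.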
4. Security/Privacy
Need real-time data activity monitoring for security & compliance
Key Characteristics
• Single Integrated Appliance
• Non-invasive/disruptive, cross-platform architecture
• Dynamically scalable
• SOD enforcement for DBA access
• Auto discover sensitive resources and data
• Detect or block unauthorized & suspicious activity
• Granular, real-time policies
– Who, what, when, how
• 100% visibility including local DBA access
• Minimal performance impact
• Does not rely on resident logs that can easily be erased by attackers, rogue insiders
• No environment changes
• Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
• Growing integration with broader security and compliance management vision
Capabilities
• Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
• Database infrastructure scanning for missing patches, mis-configured privileges and other vulnerabilities
• Data protection compliance automation
[Diagram labels: Collector Appliance; Host-based Probes (S-TAPs); Data Repositories (databases, warehouses, file shares, Big Data)]
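
The "who, what, when, how" idea behind granular, real-time policies can be sketched as a small rule evaluator. This is a hypothetical illustration, not the API of the monitoring appliance described above: the event fields, the table and client names, the business-hours window and the three verdicts are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    who: str        # database user performing the action
    what: str       # object being accessed, e.g. a table name
    when: datetime  # time of the access
    how: str        # client program used for the access


# Hypothetical policy inputs.
SENSITIVE_TABLES = {"respondent_pii"}
APPROVED_CLIENTS = {"stats_pipeline"}


def evaluate(event):
    """Return 'allow', 'alert' or 'block' for a single access event."""
    if event.what not in SENSITIVE_TABLES:
        return "allow"
    if event.how not in APPROVED_CLIENTS:
        return "block"   # sensitive data reached through an unapproved client
    if not 8 <= event.when.hour < 18:
        return "alert"   # approved client, but outside assumed business hours
    return "allow"


event = AccessEvent("dba_01", "respondent_pii", datetime(2013, 3, 1, 23, 15), "sql_console")
print(evaluate(event))
```

The same rules apply to every user, including privileged ones, which is the point of monitoring DBA activity rather than trusting it.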
5. Governance
Vision for information integration & governance
Traditional Sources (Data Warehouse): structured, repeatable, linear
– Internal App Data
– Transaction Data
– ERP data
– Mainframe Data
– OLTP System Data
New Sources (Hadoop, Streams): unstructured, exploratory, iterative
– Web Logs
– Social Data
– Text & Images
– Sensor Data
– RFID
Information Integration, Governance & Context Accumulation across Data Warehouse, Hadoop and Streams, Traditional Sources and New Sources
Systems Of Record and Systems Of Engagement
– Traditional Approach (Systems of Record): structured, analytical, logical
– New Approach (Systems Of Engagement): creative, holistic thought, intuition
Governance concerns for big data customers
– How do I integrate and link my big data environment with my current one?
– How do I create a trusted view of my customers and products for big data?
– Is a governed and auditable archive possible with big data?
– How do I cleanse and validate the results of my big data analysis?
– How do I protect data in a big data environment?
Agile. Simple. Trusted Information.
Governance in an exploratory Big Data environment
1. Ensure trust & compliance
• Lineage of data as it enters and leaves the big data system
• Secure the big data systems from breaches
• Create masked dev and test analytics clusters
2. Accelerate time to value
• High performance data provisioning
• Integrated data integration and stream analytics platform
3. Lower total cost of ownership
• Simplified tooling to improve productivity of developers and testers
• Automated system security
• Complete visibility into the data movement and lifecycle
[Diagram labels]
– High performance and high quality data loads
– Secured BigInsights to prevent any data breaches
– Create privatized data in real time or on the cluster to ensure data protection
– Low cost historical archive loaded to Hadoop for exploratory analytics
– Integration for improved segmentation of analytical data sources
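
One possible way to create masked data for dev and test clusters is deterministic pseudonymisation with a keyed hash, sketched below. This is an assumption chosen for illustration, not the masking method of the products named on the slide; the field names, the key and the 16-character token length are hypothetical. The same input always yields the same token, so masked datasets can still be joined, but keyed hashing alone is pseudonymisation rather than full anonymisation.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"   # hypothetical key; keep it out of source control
SENSITIVE_FIELDS = {"name", "national_id"}      # hypothetical field names


def mask_value(value, key=SECRET_KEY):
    """Replace a value with a deterministic token derived from HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]   # shortened token; the full digest is 64 hex characters


def mask_record(record, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive fields replaced by tokens."""
    return {
        field: mask_value(str(value)) if field in sensitive else value
        for field, value in record.items()
    }


record = {"name": "Alice Example", "national_id": "AB123456", "region": "North"}
masked = mask_record(record)
print(masked)
```

Non-sensitive fields such as the region pass through unchanged, so the masked copy remains usable for exploratory analytics.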
6. Financial
[Diagram labels: Business Model; Engagement Model; NS]
Invest and define; Motivate and educate; Incubate and evaluate
Citizens-Pay; NS-Pay; Businesses-Pay
• To private Company for value-added services to citizens
• Pay to private Company for inexpensive services
• Typically cloud-based
• Services free or discounted
• Funded by other parts of the business
• Can be non-profit organisations
Information (catalogue and datasets)
NS co-invests: Accelerate evolution of ecosystem
Services built & maintained by community on top of open-data
Link data; Link data, aggregate data
Increase value of open-data
Industrialisation and Collaborative Model
Leverage City Forward model for National Statistics
Impact on Everyday Life
– How safe is my neighborhood?
– Which career is right for me?
– What type of education do I need?
Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org
New Products and Indicators
Evolving beyond statistics to predictive analytics, sharing complementary datasets with private sector and citizens
Examples:
Predictive models for healthcare cost reduction and outcome optimisation
Epidemic outbreak surveillance – hotspots, progression waves
Aligning public services (federal, regional and city level) to existing and predictive demographic data
Example: Traffic Management for Sustainability and Efficiency
Multimodal Data Streams
– GPS
– Cell-phones (location tracking)
– Public Transport (bus, docking)
– Pollution measurements
– Weather Conditions (including road conditions)
– Optical traffic flow detectors
– Travel time data based on plate recognition
– Induction loop detector data
– Accidents in network as they are being recorded
– Road closures (road work, etc)
– Still pictures from road cameras
Applications
– Real Time Traffic Monitoring & Information
– (Multimodal) Travel Planner
[Pipeline diagram labels: GPS Data Streams; Real Time Transformation Logic; Real Time Geo Mapping; Real Time Speed & Heading Estimation; Real Time Aggregates & Statistics; Storage adapters; Data Warehouse; Web Server; Google Earth; Offline statistical analysis; Interactive visualization]
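
The speed and heading estimation step in the pipeline can be sketched from two consecutive GPS fixes. The slides do not say how the stage is implemented; this example assumes the haversine great-circle distance and the standard initial-bearing formula, and the fix layout (timestamp in seconds, latitude, longitude) is a hypothetical choice.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def heading_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees clockwise from north, in the range [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def speed_and_heading(fix_a, fix_b):
    """Estimate speed (m/s) and heading (degrees) from two (t_seconds, lat, lon) fixes."""
    t1, lat1, lon1 = fix_a
    t2, lat2, lon2 = fix_b
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    return haversine_m(lat1, lon1, lat2, lon2) / dt, heading_deg(lat1, lon1, lat2, lon2)


# Two hypothetical fixes one minute apart, moving east along the equator.
speed, heading = speed_and_heading((0, 0.0, 0.0), (60, 0.0, 0.01))
print(f"speed: {speed:.2f} m/s, heading: {heading:.1f} degrees")
```

In a streaming setting this function would be applied to each vehicle's latest pair of fixes, with the results feeding the aggregates and statistics stage.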
Thank You
www.sendsteps.com
Prepare to react; keep your phone ready!
TXT
1. Text to +316 4250 0030
2. Type Session <space> WS2 <space> your answer
Internet
1. Go to sendc.com
2. Log in with Session
3. Type WS2 <space> your answer
Posting messages is anonymous. No additional charge per message.
What kind of use case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?
Internet: Go to sendc.com and log in with Session, then type WS2 <space> your answer
TXT: Send to 06 4250 0030, typing Session <space> WS2 <space> your answer