TURNING INFORMATION CHAOS INTO RELIABLE DATA
Nannette Kelly - Northrop Grumman
Roderick McLean - Lockheed Martin
William Patrick, Sr. – Northrop Grumman
Brian Keller – Booz Allen Hamilton
February 27, 2014
Information Overload
• Data creation/delivery exceeding standard management tools
• Volume, variety, velocity, and variability
• Interesting facts:
– Every 6 hours, the NSA gathers as much data as is stored in the entire Library of Congress
– Facebook’s photo collection has over 140 billion photos
– In 2012, 2.5 quintillion bytes of data were created every day, with 90% of the world’s data created in the last two years alone
– Twitter averages 500 million tweets per day
Analysis Evolution
Data → Collection → Processing → Analysis → Actionable Information
From: Business Intelligence
• Derived from stable, fixed sources
• Fixed types
• Serial
• Reporting
To: Data Analytics
• Derived from diverse, dynamic sources
• Varied types
• Iterative
• Pattern Analysis
Big Data
Business Relevance
• Provides customer/environmental insights
• Establishes a competitive advantage
• Shapes marketing strategies
• Reduces uncertainty
• Enables optimization
• Improves decision making
• Increases productivity
Provides Critical Information to Drive Positive Business Outcomes
Ref. Turn Information into a Strategic Asset - SAP
Data Management Framework
[Framework diagram labels: Strategy, Objectives, Organization, Process/Methods, Controls, Data Architecture, Applications, System]
1. Define objectives; confirm data strategy alignment to business strategy
2. Define process/data owners, roles, and responsibilities
3. Define data usage in analysis, process control, and business management. Establish processes to monitor and ensure data quality.
4. Develop data structures to address company-wide requirements
5. Select, design, and implement software applications to accomplish strategic objectives
Strategy
• Define key business objectives or problems to solve
• Clarify data required for strategic choices
• Identify what’s required to establish a competitive advantage
Acquire, Grow, Retain Customers
Create New Business Models
Improve IT Economics
Manage Risks
Optimize Operations and Reduce Fraud
Transform Financial Processes
Ref. IBM Use Cases (IBMbigdatahub.com)
Controls
• Proactively secure data and comply with privacy regulations
• Understand retention requirements
• Incorporate Data Quality Management and define quality metrics
• Document organization roles/responsibilities
• Define data reporting, access and latency requirements
• Establish analytics-driven business processes
• Fight bureaucracy and organizational silos
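As an illustration of "define quality metrics", the sketch below computes two commonly used measures, completeness and uniqueness, for one field of a record set. The metric choice, the `field_quality` helper, and the sample customer records are assumptions made for this example; the slides do not prescribe specific metrics.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    completeness: float  # share of records with a non-missing value
    uniqueness: float    # share of distinct values among non-missing ones


def field_quality(records, field):
    """Compute two illustrative quality metrics for one field."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / len(values) if values else 0.0
    uniqueness = len(set(present)) / len(present) if present else 0.0
    return QualityReport(completeness, uniqueness)


# Hypothetical customer records; field names are illustrative only.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]
report = field_quality(customers, "email")
```

Tracking such numbers per field over time is one possible way to monitor data quality against a target agreed with the data owner.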
Data Architecture
• Categorize data and usage
– Content format: structured, semi-structured, or unstructured
– Type: transactional, metadata
– Analysis: real-time or batch
– Processing methodology: predictive analysis, analytical, query/reporting
– Data source: web, machine generated, data entry, etc.
• Define data structures to support cross-business needs
• Document data definitions
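The categories above could be recorded in a small data catalog so that every data set carries its documented definition and classification. This is a minimal sketch; the class names, the `by_format` helper, and the two catalog entries are hypothetical and only mirror the categories listed on the slide.

```python
from dataclasses import dataclass
from enum import Enum


class ContentFormat(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class AnalysisMode(Enum):
    REAL_TIME = "real-time"
    BATCH = "batch"


@dataclass(frozen=True)
class DataSetDefinition:
    name: str
    definition: str              # documented business meaning
    content_format: ContentFormat
    data_type: str               # e.g. "transactional" or "metadata"
    analysis: AnalysisMode
    processing: str              # e.g. "predictive analysis", "query/reporting"
    source: str                  # e.g. "web", "machine generated", "data entry"


# Hypothetical catalog entries for illustration.
catalog = [
    DataSetDefinition("orders", "Confirmed customer orders",
                      ContentFormat.STRUCTURED, "transactional",
                      AnalysisMode.BATCH, "query/reporting", "data entry"),
    DataSetDefinition("web_clickstream", "Page events from the public site",
                      ContentFormat.SEMI_STRUCTURED, "transactional",
                      AnalysisMode.REAL_TIME, "predictive analysis", "web"),
]


def by_format(entries, fmt):
    """Return the names of data sets with the given content format."""
    return [d.name for d in entries if d.content_format is fmt]


semi_structured = by_format(catalog, ContentFormat.SEMI_STRUCTURED)
```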
Applications
Acquisition
• Web Crawlers
• Social Media
• Network Logs
• Sensor Networks
• SAP
Data Management
• Flat Files
• Relational Databases
• Hadoop/NoSQL
• MongoDB
Analytics
• R
• Python
• SQL
• MapReduce/Hive/Pig
Visualization
• Jpg/png
• BI (Spotfire, Jaspersoft)
• Web Apps (ext-js, d3.js)
Various Toolsets are Available to Fulfill Data Intelligence Needs
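To show how a few of these toolsets can chain together, the sketch below loads a flat file into a relational database and summarizes it with SQL, using only Python's standard library. The in-memory CSV content, the `traffic` table, and its column names are assumptions invented for this example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract standing in for an acquired network log.
flat_file = io.StringIO(
    "host,bytes_sent\n"
    "web01,1200\n"
    "web02,300\n"
    "web01,800\n"
)

# Data management: load the rows into an in-memory relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (host TEXT, bytes_sent INTEGER)")
rows = [(r["host"], int(r["bytes_sent"])) for r in csv.DictReader(flat_file)]
conn.executemany("INSERT INTO traffic VALUES (?, ?)", rows)

# Analytics: aggregate with plain SQL.
totals = conn.execute(
    "SELECT host, SUM(bytes_sent) FROM traffic GROUP BY host ORDER BY host"
).fetchall()
conn.close()
```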
http://datacommunitydc.org/blog/2013/05/stepping-up-to-big-data-with-r-and-python/
Various Choices Available to Implement Analytics…
[Comparison chart: approaches (Java, Hive, Pig, Commercial Tools, Streaming frameworks) rated on Ease of Learning, Availability on Systems, and Analysis Flexibility]
Streaming: Also works outside of Hadoop with no code changes!
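The streaming remark can be illustrated with a word count written as a line-oriented mapper and reducer. This is a sketch under assumptions: the function names and sample lines are invented here, and a streaming job would feed each stage from standard input rather than from a Python list. Because each stage only consumes and emits text lines, the same logic can be chained locally without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(records):
    """Sum counts per word; records must arrive sorted by key,
    which the shuffle phase provides in a streaming job."""
    parsed = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local run: sorted() stands in for the shuffle/sort between stages.
sample = ["big data big insight", "data quality"]
counts = list(reducer(sorted(mapper(sample))))
```

On a shell, the equivalent local pipeline would be two small scripts joined by `sort`, each reading standard input and writing standard output.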
Implementation Methodology
[Process diagram; labels grouped by type]
Phases: Business Discovery, Data Discovery, Build Decision, Architect, Design, Deliver, Train, Support
Motivation/Constraints
• Legal & Compliance Regulations
• Security Concerns
• Budget, Resource Reductions
• Market Pressures & Mission Expansion
• New/Changing Operational Reqmts
• Organization’s Culture
• Existing Data Architecture Limitations
Discovery
• Problem Statement
• Pain Points
• What does Customer seek to accomplish?
• What data is available to work with?
• Where is data located?
• What architecture to support data?
• What additional data is required?
• What type of analytics used, needed?
Build, Architect, Design
• Existing Tools, Custom Code
• Tools/Product Selection
• Infrastructure Architecture
• Data Architecture
• Data Ecosystem
• Ingest Data Process
• Data Mining, Scientist Techniques
• Data Analytics
• Visualization
• Predictive Modeling
• Infrastructure
• Analytics, Visualization, Presentation
Continuous Improvement
• Data Exploration
• Action Planning
• Result Evaluation
Summary
• Begin with the end in mind
• Incorporate controls to drive data quality
• Protect the data