Data Analytics in the Real World (May 2016)
Master's in Computer Application
Systems Engineer, General Manager & Technical Director
Senior Consultant, Development Director, Innovation & Research Director
Chief Technology Officer
Led 13 new products and features across 30+ products
Data Driven, Multi-tier, Social Media, Mobile, Cloud, Analytics
Agile, User-Centered Design, Lean Startup, Mindfulness
India
USA
Challenges for Data Analytics in the Real World
Technological
Rapidly evolving Technology Stack
Shift towards Open Source to contain costs
Shift from one standard way of doing things to contextual, use-case-driven choices
Shift from on-prem solutions to cloud and hybrid cloud models
New types of access & usage patterns
Real-time, On-demand, Exploratory, Internet of Things
Two different types of projects
Production: bread & butter
Experimental: high unknowns; you don't know what you don't know
Organizational & Cultural
ROI: lead time to the first set of outcomes
Data cleansing & ingestion: 80-90% of the effort
Lack of domain expertise; not asking, or solving for, the right questions
Learning curve: crucial for a successful project rollout
Data-driven decision making is still new
Comfort level with high unknowns
Test-driven approach: A/B testing
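The A/B testing bullet can be made concrete with a significance check on conversion rates. This is a minimal sketch assuming a pooled two-proportion z-test, one standard choice that the slide does not specify; `two_proportion_z_test` and the conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B with a pooled two-proportion z-test.

    Returns (z, two_sided_p_value). Assumes independent samples large enough
    for the normal approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: variant A converts 200 of 5000 users, variant B 260 of 5000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be noise; the decision threshold (for example 0.05) should be fixed before the experiment starts.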
Architectural Patterns & Solutions
Lambda Architecture
Real-time speed layer + Batch Processing layer + Serving Layer
Edge Analytics – Internet of Things
Distributed analytics closer to the data source
Data Center as a Computer
Cluster computing, dynamic workloads
Blockchain
Distributed ledger, internet of value
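The Lambda Architecture bullet can be illustrated with a toy page-view counter. This is a minimal sketch, assuming an in-memory list stands in for the immutable master dataset; `batch_layer`, `SpeedLayer`, and `serving_layer` are hypothetical names, not an API from lambda-architecture.net or any framework. The batch layer recomputes its view from the full log, the speed layer incrementally counts events the last batch run has not seen, and the serving layer answers queries by merging the two.

```python
from collections import Counter

# Master dataset: immutable, append-only log of (timestamp, page) events.
master_log = [(1, "home"), (2, "about"), (3, "home"), (4, "pricing")]

def batch_layer(log, upto):
    """Recompute a batch view from scratch over all events with timestamp <= upto."""
    return Counter(page for ts, page in log if ts <= upto)

class SpeedLayer:
    """Incrementally maintains a real-time view of events newer than the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, ts, page):
        self.view[page] += 1

    def reset(self):
        # Called once the batch layer has absorbed these events.
        self.view.clear()

def serving_layer(batch_view, realtime_view, page):
    """Answer a query by merging the batch and real-time views."""
    return batch_view[page] + realtime_view[page]

batch_view = batch_layer(master_log, upto=4)
speed = SpeedLayer()
for event in [(5, "home"), (6, "pricing")]:  # events arriving after the batch run
    master_log.append(event)
    speed.on_event(*event)

print(serving_layer(batch_view, speed.view, "home"))  # prints 3
```

When the next batch run covers the newer events, the speed layer's view is discarded, so any approximation in the real-time path is periodically replaced by an exact recomputation.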
Edge Analytics
Cloudlets with Edge Analytics
Use cases: Video, IoT, Automotive
Source: CMU
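One common motivation for analytics closer to the source is cutting the volume of raw data sent upstream. The sketch below assumes a sensor device that reduces each window of readings to a small summary plus threshold-crossing anomalies before forwarding; `edge_summarize`, the sample window, and the threshold are hypothetical, not taken from the CMU cloudlet paper.

```python
from statistics import mean

def edge_summarize(readings, threshold):
    """Summarize a window of raw sensor readings on the edge device.

    Only this compact summary (plus any readings above the threshold) is
    forwarded upstream, instead of every raw sample.
    """
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "anomalies": [r for r in readings if r > threshold],
    }

window = [21.0, 21.4, 20.9, 35.2, 21.1]  # hypothetical temperature samples
summary = edge_summarize(window, threshold=30.0)
print(summary)
```

For video or automotive workloads the same idea applies with heavier processing (for example, running detection locally and uploading only events), which is where nearby cloudlets help.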
Client Server Era
Small Apps, Big Servers
Statically partitioned
Cloud Era
Big Apps, Small Servers, Microservices
Elastically partitioned
Data Center as a Computer
Source: Andreessen Horowitz
Dynamic Workloads: Resource Utilization
Distributed Systems Kernel
General-purpose, dynamically shared cluster for multiple workloads
When resources become idle, they can be reused by other schedulers
Source: Apache Mesos
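The reuse of idle resources can be modeled with a toy shared pool. This is a simplified sketch, not the Mesos API: `Cluster`, `offer`, `release`, and the framework names are hypothetical, only CPUs are modeled, and frameworks always accept what is offered (in Mesos, the master makes resource offers that framework schedulers may accept or decline).

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Toy shared cluster: a pool of CPUs that several framework schedulers draw from."""
    free_cpus: int
    allocations: dict = field(default_factory=dict)

    def offer(self, framework, wanted):
        """Offer up to `wanted` idle CPUs to a framework; it takes what is available."""
        granted = min(wanted, self.free_cpus)
        self.free_cpus -= granted
        self.allocations[framework] = self.allocations.get(framework, 0) + granted
        return granted

    def release(self, framework, cpus):
        """A framework finishes work; its CPUs become idle and return to the shared pool."""
        self.allocations[framework] -= cpus
        self.free_cpus += cpus

cluster = Cluster(free_cpus=8)
cluster.offer("batch_job", 6)              # batch workload takes 6 CPUs
print(cluster.offer("streaming_job", 4))   # only 2 idle CPUs remain: prints 2
cluster.release("batch_job", 6)            # batch workload completes
print(cluster.offer("streaming_job", 2))   # freed CPUs reused by another scheduler: prints 2
```

Compared with the statically partitioned model, no CPU sits reserved for a workload that is not using it.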
Blockchain
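Decentralized ledger: the protocol underlying the Bitcoin cryptocurrency
Hash chain: each block holds a timestamp + cryptographic hash of the previous block (linking it to all prior blocks) + data, with transactions summarized in a Merkle tree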
Open + Trust + Secure
Data Integration, provenance, privacy
Internet of Value
Source: Economist.com
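The block structure can be sketched with Python's standard `hashlib`. This is a simplified illustration, assuming single SHA-256 over hex strings and a JSON-encoded header; Bitcoin itself uses double SHA-256 over binary-serialized headers, and `make_block`, `merkle_root`, and `verify_chain` are hypothetical names. Because each header commits to both the previous block's hash and the Merkle root of its transactions, altering any past transaction breaks verification.

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    """Hash transactions pairwise, level by level, until a single root hash remains."""
    level = [sha256(tx.encode()) for tx in transactions] or [sha256(b"")]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode()) for i in range(0, len(level), 2)]
    return level[0]

def header_hash(header):
    return sha256(json.dumps(header, sort_keys=True).encode())

def make_block(prev_hash, timestamp, transactions):
    """Build a block whose header commits to the previous block and its own transactions."""
    header = {"prev_hash": prev_hash, "timestamp": timestamp,
              "merkle_root": merkle_root(transactions)}
    return {"header": header, "hash": header_hash(header), "transactions": transactions}

def verify_chain(chain):
    """Check each block's Merkle root, its own hash, and its link to the previous block."""
    for i, block in enumerate(chain):
        header = block["header"]
        if header["merkle_root"] != merkle_root(block["transactions"]):
            return False
        if block["hash"] != header_hash(header):
            return False
        if i > 0 and header["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block("0" * 64, 1000, ["alice->bob:5"])
block2 = make_block(genesis["hash"], 1010, ["bob->carol:2", "carol->dave:1"])
chain = [genesis, block2]
print(verify_chain(chain))  # prints True
genesis["transactions"][0] = "alice->bob:500"  # tamper with history
print(verify_chain(chain))  # prints False
```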
Key Takeaways
Continuous Learning
Interpersonal Skills: "Yes, and"
Data-driven experimental approach
Contextual, use-case-driven technology stack
Automation for rapid iterations and reproducible results
Meditation
Resources
Lambda Architecture: http://lambda-architecture.net
Edge Analytics: https://www.cs.cmu.edu/~satya/docdir/satya-edge2015.pdf
Apache Mesos Whitepaper: https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
Bitcoin Whitepaper: https://bitcoin.org/bitcoin.pdf