stream processingfiles.meetup.com/18978602/pune_apex_meetup_03feb_2016_strea… · •open source...
Stream Processing: Key Driver for Enabling Instant Insights on Big Data
Pritesh Maker
• Background
  • Presently leading Engineering at DataTorrent
  • Over a decade of experience in Data Management technologies including Data Integration, Data Virtualization and Data Quality & Profiling
  • Past roles include leading Engineering at Informatica for their Big Data Management products and core Data Engine
  • Interested in All Things Data!
• Education
  • BS in Computer Science from University of Texas at Austin
  • MBA from Haas School of Business, University of California at Berkeley
• Connect with me
  • LinkedIn: https://www.linkedin.com/in/priteshmaker
  • Twitter: @priteshmaker
Why is Stream Processing Vital?
Traditional Analytics – Data at Rest
[Diagram. Source data: MS Queues, events, XML files, databases, sensor data, social. Feeds (Feed 1, Feed 2, … Feed n) pass through Extract, Transform, Load, with an optional staging area, into enterprise repositories (RDBMS, EDW, NoSQL). Analyze and Visualize stages use business analytics, business intelligence, and visualization tools.]
Next Generation – Data in Motion
• Organizations need to react to changing business conditions in real time
  • Faster decision making across all industries
  • Few companies outside of financial markets, telecom & utilities have experience with streaming
• Newer data sources – like sensors, social media feeds
  • Higher Volume and Greater Velocity
  • More unstructured and semi-structured data
• Democratization of technologies
  • Open Source Projects
  • Large Scale Compute & Storage – Hadoop, NoSQL
  • Streaming Technologies – Apex, Spark, Storm etc.
  • Real-time dashboards and alert notification systems
• Beyond niche use cases
  • Broad applicability but needs more adoption
Stream vs. Batch Processing Pipelines
[Diagram. Stream Processing Data Pipeline, stage labels: Ingest, Archive, Transform, Normalize, Analyze, Action, Visualize/Persist. Batch Processing Data Pipeline: Extract → Transform → Load → Analyze → Action.]
Stream Processing
• Continuous processing on data as it flows through a system
• Allows users to act on events instantaneously via alerts
• Processing related to time (event time vs. processing time)
  • Real-Time – the difference between event time and processing time is negligible
Enables your Data In Motion Architecture
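The event time vs. processing time distinction on this slide can be illustrated with a small sketch. This is not taken from the talk: the `Event` class, the `processing_lag` and `is_real_time` helpers, and the one-second tolerance are all hypothetical choices made for illustration, since the slide only says the difference should be "negligible".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    name: str
    event_time: datetime  # when the event happened at the source


def processing_lag(event: Event, processing_time: datetime) -> timedelta:
    """Return how far processing trails the moment the event occurred."""
    return processing_time - event.event_time


def is_real_time(event: Event, processing_time: datetime,
                 tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """Treat processing as real-time when the lag is within the tolerance.

    The one-second default is an assumption; what counts as negligible
    depends on the application.
    """
    return processing_lag(event, processing_time) <= tolerance


occurred = datetime(2016, 2, 3, 18, 0, 0, tzinfo=timezone.utc)
swipe = Event("card_swipe", occurred)

# Processed 200 ms after it happened: within tolerance.
fast = is_real_time(swipe, occurred + timedelta(milliseconds=200))
# Processed five minutes later: no longer real-time under this tolerance.
slow = is_real_time(swipe, occurred + timedelta(minutes=5))
```

A fraud alert might need a tolerance of milliseconds, while a dashboard refresh could accept several seconds, so the threshold is best treated as a per-application setting.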
Big Data Application Types
[Chart. Axis label: Data velocity. Labels shown: Data Discovery, IoT, Fraud, CDR, CDC, Reporting, SQL, Operations, SQL on Streams, Streaming Discovery, Ad Hoc Query, Batch Processing, Stream Processing.]
Sample Streaming Analytics Patterns
Preprocessing
• Filtering events
• Transforming attributes
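As a rough sketch of the Preprocessing pattern, the two steps can be chained as Python generators so that each event is handled as it arrives. The field names (`amount`, `currency`) and the `amount_cents` derived attribute are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def filter_events(events: Iterable[Event], min_amount: float) -> Iterator[Event]:
    """Drop events whose (hypothetical) 'amount' field is below min_amount."""
    for event in events:
        if float(event["amount"]) >= min_amount:
            yield event


def transform_attributes(events: Iterable[Event]) -> Iterator[Event]:
    """Normalize 'currency' to upper case and add a derived 'amount_cents'."""
    for event in events:
        yield {
            **event,
            "currency": str(event["currency"]).upper(),
            "amount_cents": round(float(event["amount"]) * 100),
        }


raw = [
    {"id": 1, "amount": 0.5, "currency": "usd"},
    {"id": 2, "amount": 12.25, "currency": "inr"},
    {"id": 3, "amount": 99.0, "currency": "Usd"},
]

# Events flow through the filter first, then the transformation.
cleaned = list(transform_attributes(filter_events(raw, min_amount=1.0)))
```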
Alerts & Thresholds
• Based on complex conditions
Computing within Windows
• Aggregations
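One common way to compute aggregations within windows is a tumbling (fixed, non-overlapping) window. The sketch below assumes events are `(timestamp_in_seconds, value)` pairs and sums a finite list for simplicity; a streaming engine would instead emit each result as its window closes. The function name and the sample readings are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_sums(events: Iterable[Tuple[int, float]],
                         window_seconds: int) -> Dict[int, float]:
    """Sum values per fixed, non-overlapping window, keyed by window start."""
    sums: Dict[int, float] = defaultdict(float)
    for timestamp, value in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        sums[window_start] += value
    return dict(sums)


readings = [(0, 1.0), (4, 2.0), (10, 5.0), (19, 1.5), (20, 3.0)]
totals = tumbling_window_sums(readings, window_seconds=10)
# totals == {0: 3.0, 10: 6.5, 20: 3.0}
```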
Combining Event Streams
• Correlation
• Error detection
Enrichment
• Looking up databases and reference data
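A minimal sketch of the Enrichment pattern, assuming the reference data fits in an in-memory dictionary. The `CUSTOMER_TIERS` table, the `customer_id` key, and the `tier` attribute are hypothetical; a production pipeline would more likely query a database or a cache.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]

# Hypothetical reference data standing in for a database table or cache.
CUSTOMER_TIERS: Dict[str, str] = {"c-1": "gold", "c-2": "silver"}


def enrich(events: Iterable[Event], reference: Dict[str, str],
           default: str = "unknown") -> Iterator[Event]:
    """Attach a 'tier' attribute looked up by each event's 'customer_id'."""
    for event in events:
        tier = reference.get(event["customer_id"], default)
        yield {**event, "tier": tier}


orders = [
    {"order": 100, "customer_id": "c-1"},
    {"order": 101, "customer_id": "c-9"},  # not present in the reference data
]

enriched = list(enrich(orders, CUSTOMER_TIERS))
```

Falling back to a default for unknown keys keeps the stream moving; whether to drop, default, or park such events is a design decision this sketch does not settle.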
Temporal Events
• Detecting events within time windows
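Detecting events within a time window can be sketched with a sliding window over timestamps, for example flagging three failed logins within 60 seconds. The function name, the threshold, and the sample data are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Iterable, List


def detect_bursts(timestamps: Iterable[float], threshold: int,
                  window_seconds: float) -> List[float]:
    """Return the timestamps at which `threshold` events fall inside the window.

    Assumes timestamps arrive in non-decreasing order.
    """
    recent: Deque[float] = deque()
    hits: List[float] = []
    for ts in timestamps:
        recent.append(ts)
        # Evict events that are older than the sliding window.
        while ts - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            hits.append(ts)
    return hits


failed_logins = [0, 50, 100, 110, 115, 300]
alerts = detect_bursts(failed_logins, threshold=3, window_seconds=60)
# alerts == [110, 115]
```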
Tracking
• Tracking events over space & time
Trend Detection
• Rise, Fall
• Outliers
Source: https://iwringer.wordpress.com/2015/08/03/patterns-for-streaming-realtime-analytics/
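The Trend Detection pattern above (rise, fall, outliers) could be approximated as follows. This is a deliberately simple sketch, not the method from the cited source: it labels a window by comparing the means of its two halves and flags outliers with a z-score. The threshold of 2.0 and the sample window are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def trend(values: Sequence[float]) -> str:
    """Label a window 'rise', 'fall', or 'flat' by comparing its two halves.

    Requires at least two values.
    """
    half = len(values) // 2
    first, second = mean(values[:half]), mean(values[-half:])
    if second > first:
        return "rise"
    if second < first:
        return "fall"
    return "flat"


def outliers(values: Sequence[float], z_threshold: float = 2.0) -> List[float]:
    """Return values more than z_threshold standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


window = [10, 11, 10, 12, 11, 40, 12, 13]
direction = trend(window)   # "rise"
spikes = outliers(window)   # [40]
```

Note that the single spike of 40 is what drives the "rise" label here, so in practice outliers may need to be removed before the trend is computed.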
Stream Processing Use Cases
Financial Services
• Detect fraudulent activity in real-time
• Risk Analysis
• Deliver personalized products and offerings
• Make decisions in real-time for trading and transactional platforms
Financial services big data fabric
Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application processing.
[Diagram labels: Financial Data, SMTP Logs, Historical; Encrypt, Compliance, Alert on error; Archive; Persistent; Application 1 … Application n.]
Telecom
• Real-time network monitoring and protection
• Quality of service and Customer Satisfaction
• Take action based on users’ location
• Automatic resource allocation and load balancing
Online Advertising
• Dynamic bidding
• Real-time targeting & personalization
• Maximize click-through and conversion rates
• Reporting that can be updated continuously
Online advertising dynamic inventory purchases
High volume, auto-scaling, fault tolerant event stream. Dimensional computing to identify performing ads.
[Diagram labels: Ad Server 1 … Ad Server 800; Fault-Tolerant Flume; In-memory analytic cube; Oracle DB; Real-time Dashboard; Ad Placement Strategy; Campaign Analysis.]
Internet of Things
• Environment monitoring
• Infrastructure management
• Manufacturing
• Energy management
• Public Building & Home automation
• Transportation
IoT secure ingestion and predictive analysis
High performance, multi-customer secure, data ingestion. Complex event processing with historical data for predictive maintenance.
[Diagram labels: Sensor 1, Sensor 2, … Sensor N; Data Governance; Complex Event Process; Predictive maintenance; Persistent; Application 1 … Application n.]
Stream Processing: Conclusion
• Lots of untapped potential!
  • Gives your business a competitive edge!
• Open Source and Big Data technologies
  • Built to address the scale and latency demands
• Broad use cases
  • Across industries and verticals