© 2015 IBM Corporation
Agenda
Introduction to Streams
Use Cases / References / Samples
Demo
IBM InfoSphere Streams for Context-Aware Stream Computing
Experience the power of now: secure, continuous, dynamic
Data, Context-Aware Analytics, Real-Time Action
Acquire: Broadest range of data types
Analyze: Continuous multimodal analytics
Act: Right time, right method
Why InfoSphere Streams?
Top Performance
• Telco: 200 billion messages per day
• Trade application: 5.7 million messages per second, 30 microsecond latency
• Low CPU and memory footprint
Real-Time Analytics
• Text, geospatial, image/video, acoustic, statistical, natural language processing, time series, statistics/mathematics, predictive
Enterprise Ready
• Integration with existing architectures
• Privacy built in
• IBM services and support
Context Awareness
• Allows building context and profiles of entities and correlating streaming data with contextual information
• Look up historical data in databases and Big Data repositories
• Not just looking at each event or a small collection of events independently
Introduction to Stream Processing
Incremental tuple-by-tuple processing
Diagram feeds: FX rate, Internal Crossing, Weather, Exchange, Value Added Feed
Diagram concepts: Stream, Tuple, Operator
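The stream, tuple and operator concepts can be sketched in a few lines of Python. This is an illustration only, not SPL and not the Streams API: the operator names (`source`, `keep_positive`, `moving_average`) and the sample prices are assumptions made for the example. Each operator is a generator, so every tuple passes through the whole graph before the next one is read.

```python
from collections import deque
from typing import Iterable, Iterator


def source(prices: Iterable[float]) -> Iterator[dict]:
    """Source operator: emits one tuple per incoming value."""
    for seq, price in enumerate(prices):
        yield {"seq": seq, "price": price}


def keep_positive(stream: Iterable[dict]) -> Iterator[dict]:
    """Filter operator: forwards only tuples with a positive price."""
    for tup in stream:
        if tup["price"] > 0:
            yield tup


def moving_average(stream: Iterable[dict], size: int) -> Iterator[dict]:
    """Aggregate operator: sliding-window mean, updated on every tuple."""
    window = deque(maxlen=size)
    for tup in stream:
        window.append(tup["price"])
        yield {**tup, "avg": sum(window) / len(window)}


# Operators are chained lazily: nothing is batched or stored up front.
feed = [10.0, -1.0, 20.0, 30.0]  # hypothetical price feed
results = list(moving_average(keep_positive(source(feed)), size=2))
```

The window holds only the last `size` values, so memory stays constant no matter how long the stream runs.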
InfoSphere Streams Application Pattern
Ingest: Sensors, Social, Machine Data, Location, Audio, Video, Text
Prepare: Transform, Filter, Correlate, Aggregate, Enrich
Detect and Predict: Classification, Patterns, Anomalies, Scoring
Decide: Business Rules, Conditional Logic
Act: Notify, Publish, Execute, Visualize
Store: Warehouse, Hadoop, Operational Store, Files
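As a rough sketch of how the Prepare, Detect, Decide and Act stages might chain together, the Python below runs a small text feed through each stage. Everything in it is hypothetical: the sensor names, the `SENSOR_LOCATION` lookup table, the 90.0 threshold and the alert format are assumptions chosen for illustration, not part of the product.

```python
from typing import Iterable, Iterator, List

# Hypothetical reference data used by the Enrich step.
SENSOR_LOCATION = {"s1": "boiler room", "s2": "roof"}


def prepare(raw: Iterable[str]) -> Iterator[dict]:
    """Prepare: transform text records, filter bad ones, enrich with location."""
    for line in raw:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # Filter: drop malformed records
        sensor, value = parts[0].strip(), parts[1].strip()
        try:
            reading = float(value)
        except ValueError:
            continue  # Filter: drop non-numeric readings
        yield {"sensor": sensor, "value": reading,
               "location": SENSOR_LOCATION.get(sensor, "unknown")}


def detect(stream: Iterable[dict], limit: float) -> Iterator[dict]:
    """Detect: flag each tuple whose value exceeds the limit."""
    for tup in stream:
        yield {**tup, "anomaly": tup["value"] > limit}


def decide_and_act(stream: Iterable[dict], alerts: List[str]) -> None:
    """Decide with a simple rule, then act by recording a notification."""
    for tup in stream:
        if tup["anomaly"] and tup["location"] != "unknown":
            alerts.append(f"ALERT {tup['sensor']} at {tup['location']}: {tup['value']}")


alerts: List[str] = []
raw_feed = ["s1, 71.5", "s2, 99.0", "garbage", "s3, 120.0", "s1, 101.2"]
decide_and_act(detect(prepare(raw_feed), limit=90.0), alerts)
```

The reading from `s3` is anomalous but has no known location, so the rule in the Decide step suppresses the alert.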
InfoSphere Streams Overview
Integrated Development Environment: Development and Management
Scale-Out Runtime: Flexibility and Scalability
Analytic Toolkits: Functional and Optimized
Available in the cloud and on premises for flexible deployment
Development Environment
Integrated Development Environment
Development and Management
Streams Processing Language
Visual Composition Tools
Integration with Analytic Tools
Integrated Development Environment
Development and Management
Streaming to Excel
Wrappers for legacy code written in Java, C++, Python, R, and Matlab
Integration with Languages
Integrated Development Environment
Development and Management
Wrappers for legacy code written in Java, C++, R, and Matlab
Monitoring and Debugging Support
Integrated Development Environment
Development and Management
Web-based Monitoring Console
Runtime
Scale-Out Runtime
Flexibility and Scalability
• High-performance clustered runtime
• Large-scale deployment
• RHEL, CentOS, SUSE Linux Enterprise Server
• x86 and Power multicore hardware
• InfiniBand support
• Ethernet support
Streams Runtime Illustrated
Hosts: x86 host (×4)
Optimizing scheduler assigns jobs to hosts, and continually manages resource allocation
Commodity hardware – laptop, blades or high-performance clusters
Operators: Meters, Company Filter, Usage Model, Usage Contract, Text Extract, Season Adjust, Daily Adjust, Temp Action
Streams Runtime Illustrated
Hosts: x86 host (×5)
Optimizing scheduler assigns PEs (processing elements) to hosts, and continually manages resource allocation
Commodity hardware – laptop, blades or high-performance clusters
Dynamically add hosts and jobs
New jobs work with existing jobs
Operators: Meters, Company Filter, Usage Model, Usage Contract, Temp Action, Text Extract, Degree History, Compare History, Store History, Season Adjust, Daily Adjust
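To make the idea of a scheduler assigning PEs to hosts concrete, here is a minimal greedy sketch in Python. It is not the Streams scheduler, whose algorithm the slides do not describe. The `place` function, the host names and the per-PE cost figures are all assumptions for illustration. Each PE goes to whichever host currently carries the least load.

```python
import heapq
from typing import Dict, List, Tuple


def place(pes: Dict[str, int], hosts: List[str]) -> Dict[str, str]:
    """Assign each PE to the currently least-loaded host (greedy)."""
    heap: List[Tuple[int, str]] = [(0, h) for h in hosts]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    # Place the heaviest PEs first so large ones do not pile up on one host.
    for pe, cost in sorted(pes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, host = heapq.heappop(heap)
        placement[pe] = host
        heapq.heappush(heap, (load + cost, host))
    return placement


# Hypothetical relative costs for four of the operators on the slide.
pes = {"Meters": 4, "Company Filter": 2, "Usage Model": 3, "Temp Action": 1}
placement = place(pes, ["host1", "host2"])
```

Calling `place` again with a longer host list approximates the "dynamically add hosts" case, although a real runtime would rebalance without restarting everything.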
Runtime: Advanced Features
Scale-Out Runtime
Flexibility and Scalability
User-Defined Parallelism
Application Resiliency
System High Availability
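One common way to parallelize a stream is to split it into channels by key, so that all tuples for the same key reach the same worker. The Python sketch below shows that idea only. It is not SPL syntax, it runs the channels sequentially rather than in parallel, and the names `channel_for`, `split_by_key` and the meter readings are assumptions for illustration.

```python
import zlib
from typing import Dict, Iterable, List


def channel_for(key: str, width: int) -> int:
    """Stable hash so the same key always lands on the same channel."""
    return zlib.crc32(key.encode("utf-8")) % width


def split_by_key(tuples: Iterable[dict], key: str, width: int) -> List[List[dict]]:
    """Partition tuples into `width` channels by the value of `key`."""
    channels: List[List[dict]] = [[] for _ in range(width)]
    for tup in tuples:
        channels[channel_for(tup[key], width)].append(tup)
    return channels


def count_per_meter(channel: Iterable[dict]) -> Dict[str, int]:
    """Per-channel work: count readings for each meter."""
    counts: Dict[str, int] = {}
    for tup in channel:
        counts[tup["meter"]] = counts.get(tup["meter"], 0) + 1
    return counts


readings = [{"meter": m, "kwh": k}
            for m, k in [("m1", 1.0), ("m2", 2.0), ("m1", 3.0), ("m3", 4.0), ("m2", 5.0)]]
channels = split_by_key(readings, "meter", width=3)

# Each key lives in exactly one channel, so per-channel results merge cleanly.
merged: Dict[str, int] = {}
for channel in channels:
    merged.update(count_per_meter(channel))
```

Because partitioning is by key, stateful operators such as the counter need no coordination between channels.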
Streams Domain
Tooling
Diagram: a Domain (Domain Services, Domain Metadata Catalog) containing two Instances (each with Instance Services and an Instance Metadata Catalog); each Instance spans two Hosts, and each Host runs a Host Controller and two PECs.
Automated System High Availability
“Without specialized HA skills, an administrator can quickly and easily configure Streams to be resilient and use a single console to manage multiple instances with common users and hosts.”
New next-generation architecture
◦ Simpler Setup & Administration
◦ More Secure
◦ More Resilient
◦ More Automatic
◦ More Dynamic
◦ New JMX API
Scenario 1: Management Host Failure
Services are running with an HA Count of 3
A Host failure is detected
If a Service on the Host was the leader, a standby takes over
A replacement service is started
Another Host becomes available and is tagged for management services
The Services are load balanced across the management hosts
Diagram: Resources A, B and C are tagged “Management” and each runs Service A (one leader, two standbys); Resource D, also tagged “Management”, runs Service A (standby).
AUTOMATIC
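The failover steps in Scenario 1 can be sketched as a small state model. This is an assumption-laden illustration, not how Streams implements it: the `ServiceGroup` class is hypothetical, "promote the first surviving standby" stands in for whatever leader election the product uses, and load balancing across hosts is not modeled.

```python
from typing import List, Optional


class ServiceGroup:
    """Replicas of one management service: one leader plus standbys."""

    def __init__(self, hosts: List[str], ha_count: int) -> None:
        self.ha_count = ha_count
        self.replicas: List[str] = list(hosts[:ha_count])  # hosts running a replica
        self.leader: Optional[str] = self.replicas[0] if self.replicas else None

    def host_failed(self, host: str) -> None:
        """Drop the replica on a failed host; promote a standby if it led."""
        if host not in self.replicas:
            return
        self.replicas.remove(host)
        if self.leader == host:
            self.leader = self.replicas[0] if self.replicas else None

    def host_added(self, host: str) -> None:
        """Start a replacement standby while below the HA count."""
        if host not in self.replicas and len(self.replicas) < self.ha_count:
            self.replicas.append(host)
            if self.leader is None:
                self.leader = host


group = ServiceGroup(["A", "B", "C"], ha_count=3)
group.host_failed("A")   # the leader's host fails, a standby takes over
group.host_added("D")    # a new management host restores the HA count
```

After both events the group is back to three replicas, with the former standby on Resource B acting as leader.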
Scenario 2: Application Host Failure
An application's PEs are running across several Hosts
A Host failure is detected
PEs are started on alternative application Hosts
Streams are reconnected
Diagram labels: Resource A, Resource B, Resource C (each tagged “Application”); Source, Op 1, Op 2, Sink 1, Sink 2
AUTOMATIC
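The recovery steps in Scenario 2 might be modeled as follows. This is a sketch under stated assumptions, not the product's recovery logic: the `recover` function, the initial placement, the operator topology in `links` and the "fewest PEs" placement rule are all hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (upstream PE, downstream PE)


def recover(placement: Dict[str, str], hosts: List[str], failed: str,
            links: List[Link]) -> Tuple[Dict[str, str], List[Link]]:
    """Restart PEs from a failed host elsewhere and list streams to reconnect."""
    survivors = [h for h in hosts if h != failed]
    if not survivors:
        raise RuntimeError("no application host left")
    new_placement = dict(placement)
    moved = sorted(pe for pe, host in placement.items() if host == failed)
    for pe in moved:
        # Pick the surviving host currently running the fewest PEs.
        target = min(
            survivors,
            key=lambda h: (sum(1 for x in new_placement.values() if x == h), h),
        )
        new_placement[pe] = target
    # Any stream touching a restarted PE has to be reconnected.
    reconnect = [link for link in links if link[0] in moved or link[1] in moved]
    return new_placement, reconnect


placement = {"Source": "A", "Op 1": "A", "Op 2": "B", "Sink 1": "C", "Sink 2": "B"}
links = [("Source", "Op 1"), ("Op 1", "Op 2"), ("Op 2", "Sink 1"), ("Op 2", "Sink 2")]
new_placement, reconnect = recover(placement, ["A", "B", "C"], failed="B", links=links)
```

The stream between Source and Op 1 is untouched because neither end ran on the failed host.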
Analytic Toolkits
Analytic Toolkits
Functional and Optimized
Toolkits and Accelerators to Speed Up Development
Standard Toolkit
Relational Operators: Filter, Sort, Functor, Join, Punctor, Aggregate
Adapter Operators: FileSource, UDPSource, FileSink, UDPSink, DirectoryScan, Export, TCPSource, Import, TCPSink, MetricsSink
Utility Operators: Custom, Split, Beacon, DeDuplicate, Throttle, Union, Delay, ThreadedSplit, Barrier, DynamicFilter, Pair, Gate, JavaOp, Switch, Parse, Format, Decompress, CharacterTransform
XML Operator: XMLParse
IBM Supported Toolkits: Database, DataStage, Big Data, Data Explorer, Messaging, Internet, Text Analytics, Mining, SPSS, CEP, Time Series, Geospatial, Financial, R, Kafka, MLlib
Open-Source Toolkits: JSON, HTTP/REST, OpenCV, Accumulo, HBase, Documents, ….
User-Defined Toolkits: Extend the language by adding user-defined operators, types, and functions
IBM Streams Community GitHub (http://ibmstreams.github.io/)
◦ 38 Active Open-Source Projects!
◦ More Agile
◦ Decouple from product release cycle
◦ In-sync with open-source ecosystem
◦ streamsx.hbase
◦ streamsx.thrift
◦ streamsx.mongoDB
◦ streamsx.document
◦ streamsx.inet
◦ streamsx.hdfs
◦ streamsx.messaging
◦ streamsx.json
◦ streamsx.bytes
◦ streamsx.datetime
◦ streamsx.math
IBM Streams GitHub - http://ibmstreams.github.io/
StreamsDev - https://developer.ibm.com/streamsdev/
Use Cases / References
University of Ontario Institute of Technology (UOIT) uses big data to improve quality of care for neonatal babies
Need
• Perform real-time analytics on physiological data from neonatal babies
• Continuously correlate data from medical monitors to detect subtle changes and alert hospital staff sooner
• Give caregivers early warning so they can deal with complications proactively
Benefits
• Detects life-threatening conditions 24 hours before symptoms are exhibited
• Lower morbidity and improved patient care
Operations Analysis
Only vendor combining at-rest vehicle data with real-time data-in-use from vehicles for a single, integrated view and analysis within and outside the Hadoop environment
• Predict demand for replacement parts and service
• Monetize telematics data
• Provide driver assistance
Advanced Condition Monitoring
Bharti Airtel reduces billing costs and improves customer satisfaction.
Need
• Could not achieve real-time billing, which required handling billions of Call Detail Records (CDRs) per day and de-duplicating them against 15 days' worth of CDR data
Benefits
• Real-time mediation and analysis of 5 billion CDRs per day
• Data processing time reduced from 12 hours to 1 minute
• Hardware cost reduced to 1/8th
• Proactively address issues (e.g. dropped calls) impacting customer satisfaction
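De-duplicating a stream against a 15-day window can be sketched as below. This is a minimal illustration, not the Bharti Airtel implementation: the `dedupe` function and the `(cdr_id, timestamp)` record shape are assumptions, timestamps are assumed to arrive in non-decreasing order, and a production system handling billions of records per day would need a far more compact structure than an in-memory dictionary.

```python
from collections import OrderedDict
from typing import Iterable, Iterator, Tuple

WINDOW_SECONDS = 15 * 24 * 3600  # 15 days, as on the slide


def dedupe(cdrs: Iterable[Tuple[str, int]],
           window: int = WINDOW_SECONDS) -> Iterator[Tuple[str, int]]:
    """Yield each (cdr_id, timestamp) unless the id was seen within the window.

    Assumes timestamps arrive in non-decreasing order.
    """
    seen: "OrderedDict[str, int]" = OrderedDict()  # cdr_id -> first-seen time
    for cdr_id, ts in cdrs:
        # Evict ids whose first sighting is older than the window.
        while seen:
            oldest_id, oldest_ts = next(iter(seen.items()))
            if ts - oldest_ts <= window:
                break
            del seen[oldest_id]
        if cdr_id in seen:
            continue  # duplicate inside the window: drop it
        seen[cdr_id] = ts
        yield cdr_id, ts


DAY = 24 * 3600
feed = [("a", 0), ("b", DAY), ("a", 2 * DAY), ("a", 16 * DAY), ("b", 16 * DAY)]
unique = list(dedupe(feed))
```

Record `a` is accepted again on day 16 because its first sighting has aged out of the window, while `b` is still inside it and is dropped.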
“IBM Analytics gave us the power to break the big stories first, driving greater fan engagement.”
—Alexandra Willis, Head of Digital and Content, AELTC
Wimbledon 2015
Real-time analytics helps share the moments that matter with tennis fans worldwide
Business challenge
To keep tennis fans engaged with its coverage of The Championships 2015, the AELTC wanted to use real-time match data to attract fans’ attention and encourage them to visit its digital platforms.
Transformation
Real-time analysis of match data instantly notifies the AELTC’s content team about key events such as record serves or career milestones, helping them break news online before competitors can react.
Known to millions of fans simply as “Wimbledon”, The Championships is the oldest of tennis’ four Grand Slams, and one of the world’s highest-profile sporting events. Organized by the All England Lawn Tennis Club (AELTC), it has been a global sporting and cultural institution since 1877.
Business benefits:
Broke news of records and milestones within seconds, faster than competing media
Boosted fans’ engagement with the tournament by sharing the moments that mattered
71 million visits to wimbledon.com proved the success of the digital strategy
Solution components
• IBM Streams
• IBM SPSS® Modeler
• IBM Emerging Technology Services
• IBM Global Business Services®
Media & Entertainment
CenterPoint Energy Advanced Metering System Deployment
Energy companies stand to save millions by using the power of Big Data to predict ice floes
Need
• 25% of the world’s remaining oil reserves are present in the Arctic where the harsh and hostile environment presents significant challenges for energy companies like ConocoPhillips
• High Arctic exploration operations will require a sophisticated solution that applies various algorithms, analytics, and simulated models of satellite and metocean data to detect, track and forecast ice floe trajectories.
Benefits
• Anticipates saving roughly USD300 million per season by reducing drilling mobilization costs
• Estimates savings of USD1 billion per production platform by optimizing design requirements and ice management operations
Only vendor providing a comprehensive Big Data platform supporting the world’s largest oil exploration and production company’s Global Analytics Platform initiative
This company is implementing a multi-step roadmap to gain visibility into trusted information and enable analytics capability across the enterprise. It achieved $2M in IT savings during the initial implementation of the combined Composite* Information Server and PureData for Analytics solution. It is now expanding into a wider set of enabling analytics, covering both data-in-motion and data-at-rest analytics across different business use cases.
* IBM Business Partner Composite Software on InfoSphere Server
Demo