Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SpringOne2GX, Washington, DC
Implementing a Highly Scalable Stock Prediction System with R, Apache Geode and Spring XD
Fred Melo (@fredmelo_br)
William Markito (@william_markito)
About us
Fred Melo
Technical Director for Data
@fredmelo_br
William Markito
Enterprise Architect for GemFire
@william_markito
It's all about DATA
Data Sources
Look for patterns
Prediction
What do we want to build?
"Smart System"
… in our specific case
Trading Data
"Smart System"
Historical Data Repository
Live data becomes historical over time
Historical: learns from historical trends
"What were the medium average price and relative strength readings when the latest failures happened?"
Real-Time: evaluates live data
"According to historical trends, there's an 80% chance this stock's price will go down within the next hour"
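The quoted question relies on two technical indicators. A minimal sketch of how they could be computed, assuming "medium average" means a simple moving average and "relative strength" means the relative strength index (RSI) in its simple-average form. The function names and the sample prices are illustrative and not taken from the talk.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def relative_strength_index(prices: List[float], period: int) -> Optional[float]:
    """RSI over the last `period` price changes, using simple averages."""
    if len(prices) < period + 1:
        return None
    recent = prices[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window (a flat series also lands here)
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


prices = [10.0, 11.0, 12.0, 11.0, 13.0]  # made-up sample data
print(moving_average(prices, 3))
print(relative_strength_index(prices, 4))
```

Both values are computed per incoming price, so they can be attached to each live record before it is scored.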
The ML pipeline data flow
Components: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model, Greenplum DB
Data Temperature: Hot, Cold
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - "Hot" data ages, becoming part of the historical dataset
5 - Re-training is triggered, updating the model with the latest historical data
Simplified demo model
Components: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model
Data Temperature: Hot, Warm
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - Re-training is triggered, updating the model with the latest historical data
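The four steps of the simplified demo model can be sketched as a small in-memory loop. This is a plain-Python illustration only: the `grid` dictionary stands in for the Geode region, the `subscribers` list for deployed applications, and the "model" is a deliberately trivial placeholder that averages the new price with the historical mean. None of these names or formulas come from the talk.

```python
from typing import Callable, Dict, List


class ToyPipeline:
    def __init__(self, retrain_every: int) -> None:
        self.grid: Dict[int, float] = {}      # stand-in for the in-memory grid
        self.history: List[float] = []        # stand-in for the historical data
        self.subscribers: List[Callable[[int, float], None]] = []
        self.retrain_every = retrain_every
        self.model_mean = 0.0                 # placeholder "model": mean of history
        self.retrain_count = 0

    def retrain(self) -> None:
        # Step 4: rebuild the model from the latest historical data.
        if self.history:
            self.model_mean = sum(self.history) / len(self.history)
        self.retrain_count += 1

    def ingest(self, key: int, price: float) -> float:
        self.grid[key] = price                          # step 1: ingest into the grid
        prediction = (price + self.model_mean) / 2.0    # step 2: compare with history
        for notify in self.subscribers:                 # step 3: push the result
            notify(key, prediction)
        self.history.append(price)                      # live data becomes historical
        if len(self.history) % self.retrain_every == 0:
            self.retrain()
        return prediction


pipeline = ToyPipeline(retrain_every=2)
received = []
pipeline.subscribers.append(lambda key, pred: received.append((key, pred)))
for i, price in enumerate([10.0, 12.0, 14.0]):  # made-up prices
    pipeline.ingest(i, price)
print(received)
```

The point of the sketch is the ordering: scoring happens on ingest with the current model, while retraining happens later and only affects subsequent records.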
SpringXD: Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native
[Architecture diagram. Inputs: Real data, Simulator. Stream steps: Split, Enrich, Filter, Transform, Sink. Labels: Indicators, Machine Learning, Predict, Dashboard, with numbered steps 1 to 3. Regions: /Stocks, /TechIndicators, /Predictions]
Eating it in small bites…
SpringXD GemFire
Apache Geode & GemFire Concepts
• Cache
  • Configurable through XML, Java
• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial
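To make the Region and Callbacks bullets concrete, here is a plain-Python sketch of a map that notifies listeners after each put. It is not the Geode API: `ToyRegion` and `add_listener` are hypothetical names, the key and prices are made up, and a real region is distributed across members, which this single-process toy does not attempt.

```python
from typing import Any, Callable, Dict, List, Optional

Listener = Callable[[str, Optional[Any], Any], None]


class ToyRegion:
    """Single-process stand-in for a region: a map plus after-put callbacks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, Any] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: str, value: Any) -> None:
        old = self._data.get(key)
        self._data[key] = value
        # Listeners run after the entry has been stored.
        for listener in self._listeners:
            listener(key, old, value)

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)


events = []
stocks = ToyRegion("/Stocks")
stocks.add_listener(lambda key, old, new: events.append((key, old, new)))
stocks.put("AAPL", 110.0)
stocks.put("AAPL", 112.5)
print(events)
```

The listener receives both the old and the new value, which is what lets downstream code react to changes instead of polling the map.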
Apache Geode & GemFire, why?
• Performance
• Consistency
• Resiliency
Pivotal GemFire High Availability and Fault Tolerance in 6 acts
• Failing data copies are replaced transparently
• Data is replicated to other clusters and sites (WAN)
• Network segmentations ("split brain") are identified and fixed automatically
• Client and cluster disconnections are handled gracefully
• Data is persisted on local disk for ultimate durability
• Failed function executions are restarted automatically
Some interesting cases…
China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second
Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute
* http://pivotal.io/big-data/pivotal-gemfire
Use cases and industries
Indian Railways, population: 1,251,695,616
China Railway Corporation, population: 1,401,586,609
World: ~7,349,000,000
~36% of the world population
Apache Geode & Pivotal GemFire
Pivotal GemFire
• Commercial product available since 2004
• Native clients in Java, C++, C#, REST
• Event Subscriptions and Continuous Queries
• Configurable WAN Gateway between clusters
• Enterprise Support, commercial features
Apache Geode
• Open sourced in April 2015
• Java Native Client, REST
• 98% of GemFire API
• Event subscriptions
• ~30 contributors
• Under Incubation
SpringXD GemFire
SpringXD Basic Concepts
• Streams
• Pipelines
• Sources
• Sinks
• Filters
• Taps
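These concepts map loosely onto a chain of generators. The sketch below is plain Python, not Spring XD: a source yields messages, a filter drops some of them, a tap copies what passes through without altering the main stream, and a sink consumes the result. All function names and the sample messages are illustrative.

```python
from typing import Callable, Iterable, Iterator, List


def source(messages: List[dict]) -> Iterator[dict]:
    """Source: emits messages into the stream."""
    yield from messages


def filter_step(stream: Iterable[dict], keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter: passes on only the messages for which keep() is true."""
    return (m for m in stream if keep(m))


def tap(stream: Iterable[dict], copy_to: List[dict]) -> Iterator[dict]:
    """Tap: records each message on the side and forwards it unchanged."""
    for m in stream:
        copy_to.append(m)
        yield m


def sink(stream: Iterable[dict], store: dict) -> None:
    """Sink: terminal step, writes each message into a keyed store."""
    for m in stream:
        store[m["id"]] = m


tapped: List[dict] = []
region: dict = {}
messages = [{"id": "1", "price": 10.0}, {"id": "2", "price": -1.0}]
# Pipeline: source | filter | sink, with a tap placed after the filter.
sink(tap(filter_step(source(messages), lambda m: m["price"] > 0), tapped), region)
print(region, tapped)
```

Because each step only consumes an iterator and produces another, steps can be reordered or added without touching the others, which is the property the pipe syntax of a stream definition expresses.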
A simple example
twittersearch --consumerKey=XXX --consumerSecret=XXX --query=SpringOne2GX --outputType=application/json | gemfire-json-server --useLocator=true --host=localhost --port=10334 --regionName=tweets --keyExpression=payload.getField('id_str')
twittersearch --query=SpringOne2GX | gemfire-json-server --host=localhost --regionName=tweets
SpringXD GemFire
Apache Spark Concepts
• RDD
• DataFrame
• Driver
• Worker
"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."
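The RDD idea of an immutable, partitioned collection can be illustrated without Spark itself. The following plain-Python sketch is not the Spark API (`ToyRDD` is a hypothetical name): it keeps data in immutable partitions and applies `map` per partition, returning a new collection rather than changing the old one. In Spark the partitions could be processed on different worker nodes; here they are processed in a simple loop, and items are assigned to partitions round-robin purely for brevity.

```python
from typing import Callable, List, Tuple


class ToyRDD:
    """Immutable collection split into partitions (a tuple of tuples)."""

    def __init__(self, partitions: Tuple[tuple, ...]) -> None:
        self._partitions = partitions

    @classmethod
    def parallelize(cls, items: List, num_partitions: int) -> "ToyRDD":
        # Round-robin assignment of items to partitions.
        parts = tuple(tuple(items[i::num_partitions]) for i in range(num_partitions))
        return cls(parts)

    def map(self, fn: Callable) -> "ToyRDD":
        # Each partition is transformed independently of the others.
        return ToyRDD(tuple(tuple(fn(x) for x in part) for part in self._partitions))

    def collect(self) -> List:
        return [x for part in self._partitions for x in part]


prices = ToyRDD.parallelize([1, 2, 3, 4, 5], num_partitions=2)
doubled = prices.map(lambda x: x * 2)
print(sorted(doubled.collect()), sorted(prices.collect()))
```

Note that `prices` is unchanged after the `map` call, which is the immutability the definition refers to.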
Machine Learning Model (e.g. Linear Regression)
Features: price(x), medium avg (x), relative strength (x)
Label: medium avg (x+1)
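The slide's model takes values observed at time x as features and the next medium average as the label. A minimal sketch of that setup, assuming ordinary least squares as the linear regression and using entirely synthetic data, follows. The helper names (`solve`, `fit_linear`, `predict`) are hypothetical, and a real system would more likely rely on a library implementation than on hand-written normal equations.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(features: List[List[float]], labels: List[float]) -> List[float]:
    """Least-squares weights [bias, w1, w2, ...] via the normal equations."""
    rows = [[1.0] + f for f in features]  # prepend the bias term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: List[float], feature: List[float]) -> float:
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature))


# Made-up rows: [price(x), medium avg(x), relative strength(x)]
features = [
    [10.0, 9.0, 40.0],
    [11.0, 10.0, 55.0],
    [12.0, 10.5, 60.0],
    [11.5, 11.0, 45.0],
    [13.0, 11.5, 70.0],
    [12.5, 12.0, 50.0],
]
# Synthetic labels built from a known linear rule, so the fit can be checked.
labels = [0.5 + 0.3 * p + 0.7 * m + 0.01 * r for p, m, r in features]

weights = fit_linear(features, labels)
print(predict(weights, [12.0, 11.0, 52.0]))
```

Because the labels are generated from a known linear rule, the fitted model should reproduce that rule closely; with real market data the fit would of course be noisy.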
Learn more!
https://github.com/Pivotal-Open-Source-Hub/geode-security-samples
https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT
https://github.com/Pivotal-Open-Source-Hub/geode-social-demo
http://pivotal-open-source-hub.github.io/StockInference-Spark/
Thank you
@william_markito @fredmelo_br
Related: Building Highly-Scalable Spring Applications with In-Memory, Distributed Data Grids
by John Blum & Luke Shannon
September 15, 2015 - 10:30 - Salon M
http://pivotal-open-source-hub.github.io/StockInference-Spark/