real-time, geospatial, maps by neil dahlke
TRANSCRIPT
Real-Time, Geospatial, Maps
Neil Dahlke
29 June 2016
Agenda
▪PowerStream
▪Supercar
▪Q&A
▪Drinks
Renewable Energy in the News
BBC: http://www.bbc.com/news/science-environment-36420750
Investment in renewables reached $286 billion worldwide in 2015
Germany Just Got Almost All of Its Power From Renewable Energy
May 15, 2016
Bloomberg: http://www.bloomberg.com/news/articles/2016-05-16/germany-just-got-almost-all-of-its-power-from-renewable-energy
Denmark is aiming for 50% renewable energy sources within the next five years
Independent: http://www.independent.co.uk/environment/germany-just-got-almost-all-of-its-power-from-renewable-energy-a7037851.html
42% of electricity produced from wind turbines in 2015
The Guardian: http://www.theguardian.com/environment/2016/jan/18/denmark-broke-world-record-for-wind-power-in-2015
Portugal Runs for Four Days Straight on Renewable Energy Alone
http://www.theguardian.com/environment/2016/may/18/portugal-runs-for-four-days-straight-on-renewable-energy-alone
22% of electricity provided by wind in 2015
MemSQL PowerStream
Predicting the global health of wind turbines
Sensors
Wind Turbine Wind Farm
MemSQL PowerStream
197,000 wind turbines around the world
1 to 2 million data points per second with MemSQL Streamliner
Simulation Details
▪Data producers (Python programs) push to Kafka
▪1M data points per second from 200k turbines
▪Generated sensor data is based on predetermined turbine failure model
▪Transform models individual turbine (2 components per turbine) failures w/ machine learning, determining:
• How fast is the turbine deteriorating?
• How bad does the turbine get before being repaired?
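A minimal sketch of what such a data producer might look like, assuming a linear wear model with Gaussian noise. The slides do not describe the actual failure model, so `sensor_reading`, `produce`, the `vibration` field, and the wear-rate range are all hypothetical. The Kafka client is replaced by a plain `send` callable so the sketch runs on its own.

```python
import json
import random


def sensor_reading(turbine_id, tick, wear_rate, rng):
    """Build one reading; vibration drifts upward as the turbine wears."""
    # Hypothetical failure model: linear drift plus Gaussian noise.
    vibration = 1.0 + wear_rate * tick + rng.gauss(0, 0.05)
    return {"turbine_id": turbine_id, "tick": tick, "vibration": round(vibration, 4)}


def produce(n_turbines, n_ticks, send, seed=0):
    """Emit one JSON message per turbine per tick through `send`."""
    rng = random.Random(seed)
    # Each turbine gets a fixed wear rate, decided before the run starts.
    wear_rates = [rng.uniform(0.001, 0.01) for _ in range(n_turbines)]
    for tick in range(n_ticks):
        for turbine_id, wear_rate in enumerate(wear_rates):
            send(json.dumps(sensor_reading(turbine_id, tick, wear_rate, rng)))


# Messages are collected in a list here; a real producer would hand
# each message to a Kafka client instead.
messages = []
produce(n_turbines=3, n_ticks=5, send=messages.append)
```

Passing `send` as a parameter keeps the generator independent of any particular Kafka library.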
How does it work?
Demo Architecture and Data Flow
Extract, Transform, Load
REAL-TIME INPUTS
Streamliner
REAL-TIME APPLICATION
▪Simulated sensor data is written to Kafka
▪Streamliner Extractor pulls data from Kafka into Spark
▪Streamliner Transformer then “scores” the failure model (ML algorithm)
• Failure model is scored through performing a regression on incoming sensor data values
▪Streamliner Loader inserts the data into MemSQL
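The extract, transform, load steps above can be sketched in plain Python. This is a stand-in, not the Streamliner API: `extract`, `slope`, `transform`, and `load` are hypothetical names, the message fields are assumed, and the "score" here is simply the least-squares slope of a sensor value over time, one possible reading of "performing a regression".

```python
import json
from collections import defaultdict


def extract(messages):
    """Extract: decode raw JSON strings into dicts."""
    return [json.loads(m) for m in messages]


def slope(points):
    """Least-squares slope of y over x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0  # a single time step carries no trend
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def transform(readings):
    """Transform: score each turbine by the trend of its sensor values."""
    by_turbine = defaultdict(list)
    for r in readings:
        by_turbine[r["turbine_id"]].append((r["tick"], r["vibration"]))
    return {tid: slope(pts) for tid, pts in by_turbine.items()}


def load(scores):
    """Load: shape rows for insertion; the database write itself is omitted."""
    return [(tid, score) for tid, score in sorted(scores.items())]


raw = [
    '{"turbine_id": 1, "tick": 0, "vibration": 1.0}',
    '{"turbine_id": 1, "tick": 1, "vibration": 3.0}',
    '{"turbine_id": 1, "tick": 2, "vibration": 5.0}',
]
rows = load(transform(extract(raw)))
```

A steeper slope would correspond to a turbine that is deteriorating faster.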
Cluster Architecture
Aggregator Nodes
Leaf Nodes
[Diagram: eight node stacks, each labeled Data Producer, Kafka, Spark, MemSQL Agg, MemSQL Leaf; plus ZooKeeper and Spark Master]
Internet-of-Things simulation depicting the health of wind turbines globally.
8 machines (AWS c4.2xlarge instances) at $0.311 per hour per machine, annual cost ~$22,000.
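The annual figure follows from the hourly rate. A quick check, assuming all 8 machines run around the clock for a 365-day year (the slide does not state this):

```python
MACHINES = 8
HOURLY_RATE_USD = 0.311      # per machine, from the slide
HOURS_PER_YEAR = 24 * 365    # assumes continuous operation

annual_cost = MACHINES * HOURLY_RATE_USD * HOURS_PER_YEAR
print(f"${annual_cost:,.0f} per year")
```

This comes to roughly $21,800, which the slide rounds to ~$22,000.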
Visual Layer
▪MemSQL data is rendered in a web UI
• Turbine Health (green, yellow, red)
▪Draw positions of turbines on a MapBox map
• A geospatial query is sent to MemSQL each time the map view is moved
▪Alerts based on predicted turbine health
▪Data points shown on the UI map are all from real-time queries
• Real-time in this case = 1 second interval
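A sketch of how the per-viewport geospatial query might be built. The table `turbines` and its columns `turbine_id`, `health`, and `location` are hypothetical, and the use of MemSQL's `GEOGRAPHY_INTERSECTS` function is an assumption; the slides do not show the real query.

```python
def viewport_to_wkt(west, south, east, north):
    """Return the map viewport as a closed WKT polygon (longitude latitude order)."""
    corners = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


# Hypothetical schema: turbines(turbine_id, health, location),
# where location is a geography point column.
VIEWPORT_QUERY = (
    "SELECT turbine_id, health FROM turbines "
    "WHERE GEOGRAPHY_INTERSECTS(location, %s)"
)

# Rebuilt each time the map view moves; the bounds here are made up.
params = (viewport_to_wkt(-10, 35, 30, 60),)
```

With a MySQL-compatible Python driver, `VIEWPORT_QUERY` and `params` would be passed together to `cursor.execute`, so the polygon is sent as a bound parameter and not spliced into the SQL string.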
Demo
The On-Demand Economy
MemSQL Supercar
Real-time asset tracking and analysis
We live in an on-demand economy
Consumers are conditioned to instant services, like Uber, Stripe, and Airbnb
Where does that leave enterprises?
Racing to meet internal and external expectations for speed and personalization
Batch processing is the enterprise enemy
Enterprises must move from overnight to real-time, intra-day operations
Cluster Architecture
▪One single 16-core machine w/ 64 GB RAM is enough to handle all of the data in real time.
▪That’s really it
[Diagram labels: Data Producer, Kafka, Spark, MemSQL Agg, MemSQL Leaf, ZooKeeper, Spark Master]
Simulation Details
▪NYC Taxi and Limousine Commission Trip Record Data
• Downloads available each year for free
▪Simulation utilizes dataset from New Year’s Eve (one of the busiest days for cabs in NYC)
▪Drivers are assigned pickups and dropoffs from the real data set
▪Routes are replayed over time
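One way to replay a trip over time is to interpolate positions between pickup and dropoff. The slides do not say how routes are actually reconstructed, so the straight-line interpolation, the `trip` dict layout, and the coordinates below are all assumptions made for illustration.

```python
def position_at(pickup, dropoff, start, end, now):
    """Linearly interpolate a (lon, lat) position for time `now`."""
    if now <= start:
        return pickup
    if now >= end:
        return dropoff
    fraction = (now - start) / (end - start)
    lon = pickup[0] + fraction * (dropoff[0] - pickup[0])
    lat = pickup[1] + fraction * (dropoff[1] - pickup[1])
    return (lon, lat)


def replay(trip, step_seconds):
    """Yield (timestamp, lon, lat) samples for one trip at a fixed interval."""
    t = trip["start"]
    while t <= trip["end"]:
        lon, lat = position_at(trip["pickup"], trip["dropoff"], trip["start"], trip["end"], t)
        yield (t, lon, lat)
        t += step_seconds


# Made-up trip: simple coordinates and times in seconds, not real taxi data.
trip = {"pickup": (0.0, 0.0), "dropoff": (4.0, 2.0), "start": 0, "end": 4}
samples = list(replay(trip, step_seconds=1))
```

Each sample would become one message from the simulated driver.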
Demo Architecture and Data Flow
Extract, Transform, Load
REAL-TIME INPUTS
Streamliner
REAL-TIME APPLICATION
▪Simulated driver data is written to Kafka
▪Streamliner Extractor pulls data from Kafka into Spark
▪Streamliner Transformer parses the CSV and transforms it to a Spark DataFrame
▪Streamliner Loader inserts the data into MemSQL
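The CSV-parsing step can be sketched without Spark. This is a plain-Python stand-in, not Streamliner code: the column layout in `COLUMNS` is hypothetical (the slides do not list the fields), and rows are returned as dicts where the real transformer produces a Spark DataFrame.

```python
import csv
import io

# Hypothetical column layout; the real feed's columns are not given in the slides.
COLUMNS = ["driver_id", "timestamp", "lon", "lat"]


def parse_batch(csv_text):
    """Parse CSV lines into typed rows with a WKT point for the location."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS):
        lon, lat = float(record["lon"]), float(record["lat"])
        rows.append({
            "driver_id": int(record["driver_id"]),
            "timestamp": record["timestamp"],
            "location": f"POINT({lon} {lat})",
        })
    return rows


# Made-up sample batch of two driver positions.
batch = "7,2016-01-01 00:00:05,-73.99,40.75\n8,2016-01-01 00:00:05,-73.97,40.76\n"
rows = parse_batch(batch)
```

Storing the position as a WKT point keeps each row ready for a geospatial column on the database side.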
Demo
Q&A
Resources
▪PowerStream blog post
http://blog.memsql.com/powerstream-demo/
▪PowerStream recording
https://youtu.be/DhP324uNZMI?t=589
▪Supercar blog post
http://blog.memsql.com/real-time-geospatial-intelligence-with-supercar/
▪Supercar recording
https://www.youtube.com/watch?v=2txICCLUV-Y
▪Today’s talks will be published soon.
Thank You