![Page 1: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/1.jpg)
Extracting Real-Time Insights from Streaming Data
Roger Barga, GM
Robotics & Autonomous Systems
Amazon Web Services
![Page 2: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/2.jpg)
#Real-time
![Page 3: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/3.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Process Real-Time Streaming Data
Key Requirements?
Ingest Transform Analyze React Persist
![Page 4: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/4.jpg)
Amazon CloudWatch Logs
Amazon S3 Events
AWS Metering
Amazon.com’s Catalog
We All Use Kinesis
![Page 5: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/5.jpg)
At Hearst, the Clickstream is the real-time transmission, collection, and processing of the data generated by Hearst customers across all their devices
Action Data
Scrolling or Mouse movement
Event Data
Highlighting Text, Listening to Audio, Playing Video
Geospatial Data
Lat/Lon, GPS, Movement, Proximity
Sensor Data
Pulse, Gait, Body Temperature
![Page 6: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/6.jpg)
What business is Hearst in?
Magazines: 20 U.S. titles & nearly 300 international titles
![Page 7: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/7.jpg)
What business is Hearst in?
Newspapers: 15 daily & 34 weekly titles
![Page 8: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/8.jpg)
What business is Hearst in?
Broadcasting: 30 television & 2 radio stations
![Page 9: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/9.jpg)
What business is Hearst in?
Business Media: Operates more than 20 business-to-business companies with significant holdings in the auto, electronic, medical and financial industries
![Page 10: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/10.jpg)
What business is Hearst in?
Hearst has over 300 websites world-wide, which results in 1TB of data per day and over 20 billion
pageviews per year.
![Page 11: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/11.jpg)
What business is Hearst in?
“Hearst is in the Data Creation Business”
![Page 12: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/12.jpg)
Buzzing@Hearst
![Page 13: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/13.jpg)
Business Value
Real-Time Reactions: Instant feedback on articles from their audiences
Promoting Popular Content Cross-Channel: Incremental re-syndication of popular articles across properties (e.g. trending newspaper articles can be adopted by magazines)
Authentic Influence: Inform Hearst editors to write articles that are more relevant to their audiences
Understanding Engagement: Inform editors what channels their audiences are leveraging to read Hearst articles
INCREMENTAL REVENUE
25% more page views
15% more visitors
![Page 14: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/14.jpg)
Customer Examples
• 1 billion events per week from connected devices
• Near-real-time home valuation (Zestimates)
• Live clickstream dashboards refreshed under 10ms
• 100 GB/day clickstreams from 250+ sites
• 50 billion daily ad impressions, sub-50 ms responses
• Online stylist processing 10 million events/day
• Migrated data bus from Kafka to Kinesis
• Facilitate communications between 100+ microservices
• Analyze billions of network flows in real-time
• Real-time home recommendations
![Page 15: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/15.jpg)
Stream New Data in Seconds Get actionable insights quickly
Streaming
Ingest video & data as it’s generated
Process data on the fly
Real-time Analytics/ML, alerts, actions
![Page 17: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/17.jpg)
Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of cameras to track the shopping activities of customers. The innovations his team concocted helped influence an AWS service called Kinesis, which allows customers to stream video from multiple devices to the Amazon cloud, where they can process it, analyze it, and use it to further advance their machine learning efforts.
![Page 18: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/18.jpg)
![Page 19: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/19.jpg)
Speed Matters
![Page 20: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/20.jpg)
![Page 21: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/21.jpg)
Kinesis Streaming Services
Robust Random Cut Forest: Summary of a dynamic data stream, highly efficient, wide range of use cases…
![Page 22: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/22.jpg)
![Page 23: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/23.jpg)
Random Cut Tree
Recurse: The cutting stops when each point is isolated.
Range-biased Cut
0 10
5
![Page 24: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/24.jpg)
Random Cut Forest
Each tree built on a random sample.
…
![Page 25: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/25.jpg)
Random Sample of a Stream
Reservoir Sampling [Vitter]
Maintain random sample of 5 points in a stream?
Keep heads with probability 5/6
Discard tails with probability 1/6
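The sampling step on this slide can be sketched in a few lines of Python. This is a minimal illustration of classic reservoir sampling, not the code used inside Kinesis; the function name `reservoir_sample` and its parameters are assumptions made for the example. With a sample of 5 points, the 6th arrival is kept with probability 5/6 and discarded with probability 1/6, matching the numbers above.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)               # the first k items fill the reservoir
        elif rng.random() < k / n:            # keep the n-th item with probability k/n
            sample[rng.randrange(k)] = item   # it replaces a uniformly chosen resident
    return sample

# Usage: keep 5 points out of a stream of 1000 values.
sample = reservoir_sample(range(1000), k=5, rng=random.Random(0))
```

Because each decision uses only the count of items seen so far, the sample can be maintained online without storing the stream.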
![Page 26: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/26.jpg)
Insert – Case I
Start with the Root
If the point falls inside the bounding box
follow the path to the appropriate child
![Page 27: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/27.jpg)
Insert – Case II
Theorem: Insert generates a tree T’ ~ T( )
![Page 28: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/28.jpg)
What is an Outlier?
![Page 29: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/29.jpg)
Anomaly Score: Displacement
A point is an anomaly if its insertion greatly increases the tree size (= sum of path lengths from root to leaves = description length).
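A rough way to see this score in action is sketched below. It is an illustration under stated assumptions, not the production scoring code: the tree is rebuilt from scratch for every trial, the change in tree size is measured by deleting the query point from a tree that contains it (the mirror image of inserting it), and the names `build`, `total_depth`, `delete`, and `displacement` are hypothetical.

```python
import random

def build(points, rng):
    """Random cut tree: leaves are points (tuples), internal nodes are [left, right]."""
    if len(points) == 1:
        return points[0]
    dims = list(range(len(points[0])))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    spans = [h - l for l, h in zip(lo, hi)]
    while True:
        d = rng.choices(dims, weights=spans)[0]   # axis chosen proportional to its range
        cut = rng.uniform(lo[d], hi[d])
        left = [p for p in points if p[d] <= cut]
        right = [p for p in points if p[d] > cut]
        if right:                                 # left is never empty; redraw otherwise
            return [build(left, rng), build(right, rng)]

def total_depth(node, depth=0):
    """Tree size as defined on the slide: sum of root-to-leaf path lengths."""
    if isinstance(node, tuple):
        return depth
    return sum(total_depth(child, depth + 1) for child in node)

def delete(node, point):
    """Remove one leaf and splice its sibling into the parent's place."""
    if isinstance(node, tuple):
        return None if node == point else node
    left, right = (delete(child, point) for child in node)
    if left is None:
        return right
    if right is None:
        return left
    return [left, right]

def displacement(points, query, trees=50, seed=0):
    """Average growth in tree size attributable to the query point."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trees):
        with_query = build(points + [query], rng)
        total += total_depth(with_query) - total_depth(delete(with_query, query))
    return total / trees

data_rng = random.Random(1)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
inlier_score = displacement(cluster, (0.1, 0.0))
outlier_score = displacement(cluster, (8.0, 8.0))
```

In this setup the far-away point tends to be cut off near the root, so removing it shortens the path of nearly every other leaf, while the point inside the cluster sits deep in the tree and displaces only a few neighbours.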
Inlier:
![Page 30: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/30.jpg)
Anomaly Score: Displacement
Outlier
![Page 31: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/31.jpg)
NYC Taxi Ridership
[Figure: numPassengers (0 to 30,000) vs. time, 2014-12-01 to 2014-12-07; days labeled Mon, Tue, Wed, Thu, Fri, Sat; annotated times: 8am, 6pm, 4pm, 11pm, 11am]
Data aggregated every 30 minutes, Shingle Size: 48
¹ Public Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
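The slide does not spell out what a shingle is. As the term is commonly used with Random Cut Forest, a shingle of size 48 turns 48 consecutive readings into one 48-dimensional point, so with 30-minute aggregation each point covers one day. The sketch below shows that transformation; the helper name `shingles` and the synthetic counts are assumptions for illustration, not the real taxi data.

```python
from collections import deque

def shingles(values, size):
    """Yield overlapping windows of `size` consecutive values as tuples."""
    window = deque(maxlen=size)   # the oldest value drops out automatically
    for v in values:
        window.append(v)
        if len(window) == size:
            yield tuple(window)

# Hypothetical half-hourly passenger counts for two days (96 readings).
counts = [100 + (i % 48) for i in range(96)]
points = list(shingles(counts, size=48))
```

Each new reading produces one new point, so a stream of 96 readings yields 49 overlapping 48-dimensional points.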
![Page 32: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/32.jpg)
NYC Taxi Data
[Figure: numPassengers (0 to 45,000) vs. time, 2014-09-16 to 2015-01-29]
![Page 33: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/33.jpg)
NYC Taxi Data
[Figure: numPassengers (0 to 45,000) and Anomaly Score (0 to 100) vs. time, 2014-09-16 to 2015-01-29; annotated events: NYC Marathon, Christmas, New Years, Snowstorm]
![Page 34: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/34.jpg)
Robust Random Cut Forests: Quick Summary
• Random forests (RF) define ensemble models;
• The cut in RCF corresponds to a specific choice of partitioning;
• Take a set of points, compute bounding box;
• Choose an axis with probability proportional to its length (biased), then choose a uniform random cut in range;
• Recurse on both sides.
• Isolation Forests: Partition at random, dimensions unbiased;
• Why RCFs? Can maintain RCF tree distributions efficiently and do so online as data is streaming in. Distance preserving.
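The partitioning steps in this summary can be sketched as a short recursive function. This is a minimal batch construction for illustration only: it assumes the points are distinct, it does not include the online insert and delete operations, and the names `build_tree` and `leaves` and the dict layout of a node are choices made for the example.

```python
import random

def build_tree(points, rng):
    """Cut a list of distinct points recursively until each one is isolated.

    Leaves are the points themselves (tuples); internal nodes are dicts.
    """
    if len(points) == 1:
        return points[0]
    dims = list(range(len(points[0])))
    lo = [min(p[d] for p in points) for d in dims]   # bounding box, lower corner
    hi = [max(p[d] for p in points) for d in dims]   # bounding box, upper corner
    spans = [h - l for l, h in zip(lo, hi)]
    while True:
        # Range-biased choice: an axis is picked with probability
        # proportional to the side length of the bounding box along it.
        d = rng.choices(dims, weights=spans)[0]
        cut = rng.uniform(lo[d], hi[d])              # uniform cut within that range
        left = [p for p in points if p[d] <= cut]
        right = [p for p in points if p[d] > cut]
        if right:                                    # redraw if the cut separated nothing
            break
    return {"dim": d, "cut": cut,
            "left": build_tree(left, rng), "right": build_tree(right, rng)}

def leaves(node):
    """Collect the isolated points at the bottom of the tree."""
    if isinstance(node, dict):
        return leaves(node["left"]) + leaves(node["right"])
    return [node]

rng = random.Random(7)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
tree = build_tree(points, rng)
```

Replacing `weights=spans` with equal weights would give the unbiased dimension choice that the summary attributes to Isolation Forests.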
![Page 35: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/35.jpg)
Proceedings 33rd International Conference on Machine Learning, New York, NY, USA, 2016.
![Page 36: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/36.jpg)
What we are going to cover…
Random Cut Forests
Anomaly Detection
Density Estimation Forecasting
Semi-Supervised Learning
Attribution & Directionality
![Page 37: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/37.jpg)
Explainable/Transparent/Interpretable ML
“If my time-series data with 30 features yields an unusually high anomaly score. How do I explain why this particular point in the time-series is unusual? [..] Ideally I’m looking for some way to visualize “feature importance” for a specific data point.”
--- Robin Meehan, Inasight.com
![Page 38: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/38.jpg)
What is Attribution?
It’s the ratio of the “distance” of the anomaly from normal.
(It’s a distance in space of repeated patterns in the data.)
![Page 39: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/39.jpg)
What is Attribution?
[Figure: y-axis label: Anomaly Score]
![Page 40: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/40.jpg)
NYC Taxi Ridership Data¹
¹ Public Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
![Page 41: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/41.jpg)
[Figure: y-axis label: Anomaly Score]
![Page 42: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/42.jpg)
Explaining Anomalies
![Page 43: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/43.jpg)
The Moving Example
A Fan/Turbine
1000 pts in each blade
Gaussian, for simplicity
Blades designed unequal
Rotate counterclockwise
3 special “query” points
100 trees, 256 points each
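A synthetic data set of this kind could be generated roughly as follows. The slide gives only the outline (three unequal Gaussian blades of 1000 points each, rotating counterclockwise, 100 trees of 256 points), so the blade centers, spreads, and the function name `fan_points` are hypothetical values chosen for illustration.

```python
import math
import random

def fan_points(rotation_deg, rng, pts_per_blade=1000):
    """Sample a three-blade fan rotated counterclockwise by rotation_deg."""
    # Hypothetical blade shapes: (std. dev. along the blade, std. dev. across it).
    blade_shapes = [(1.0, 0.10), (1.2, 0.15), (0.8, 0.05)]   # deliberately unequal
    points = []
    for i, (along, across) in enumerate(blade_shapes):
        theta = math.radians(rotation_deg + 120 * i)          # blades 120 degrees apart
        for _ in range(pts_per_blade):
            r = rng.gauss(3.0, along)    # position along the blade (center 3.0 is assumed)
            s = rng.gauss(0.0, across)   # offset across the blade
            # Rotate the blade-local coordinates (r, s) counterclockwise by theta.
            x = r * math.cos(theta) - s * math.sin(theta)
            y = r * math.sin(theta) + s * math.cos(theta)
            points.append((x, y))
    return points

rng = random.Random(42)
snapshot = fan_points(rotation_deg=90, rng=rng)               # first blade points straight up
# One random sample of 256 points per tree, for 100 trees, as on the slide.
tree_samples = [rng.sample(snapshot, 256) for _ in range(100)]
```

Stepping `rotation_deg` through 0 to 360 and rescoring a fixed query point at each step reproduces the kind of experiment the following slides describe.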
![Page 44: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/44.jpg)
Anomaly Score at P1
What is going on at 90 degrees?
Blade overhead = Not an anomaly
![Page 45: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/45.jpg)
All 3 Blades
![Page 46: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/46.jpg)
Attribution
p1 is far away in x-coord most of the time
But what is happening to y?
x coordinate’s contribution for p1?
![Page 47: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/47.jpg)
Directionality
Sharp transition when the blade moves from above to below at p1! Total score plummets.
Slowly rotating away. Total score remains high.
![Page 48: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/48.jpg)
![Page 49: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/49.jpg)
Forecasting
Exhibit B: Forecasting using RCFs
Not just the next value!
Ability to “see past” anomalies
Auto-detect periodicity …
Forecasting implies missing value imputation!
Wait, this is just a sine wave … it's easy ...
![Page 50: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/50.jpg)
Maybe not…
![Page 51: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/51.jpg)
Realistic Data?
ECG (one lead)
Periodicity unclear …
Shingle Size = 185
![Page 52: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/52.jpg)
Test on hold out data
![Page 53: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/53.jpg)
Paper in preparation! (Sudipto Guha)
![Page 54: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/54.jpg)
Semi-Supervised Learning on Data Streams
One Hour
Amazon.com Orders per Minute
![Page 55: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/55.jpg)
![Page 56: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/56.jpg)
![Page 57: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/57.jpg)
![Page 58: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/58.jpg)
![Page 59: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/59.jpg)
![Page 60: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/60.jpg)
![Page 61: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/61.jpg)
![Page 62: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/62.jpg)
![Page 63: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/63.jpg)
![Page 64: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/64.jpg)
![Page 65: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/65.jpg)
In a long stream…
![Page 66: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/66.jpg)
![Page 67: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/67.jpg)
![Page 68: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/68.jpg)
![Page 69: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/69.jpg)
![Page 70: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/70.jpg)
![Page 71: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/71.jpg)
![Page 72: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/72.jpg)
Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018.
![Page 73: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/73.jpg)
What we saw today…
Random Cut Forests
Anomaly Detection
Density Estimation Forecasting
Semi-Supervised Learning
Attribution & Directionality
Detecting Anomalies in Streaming Graphs
![Page 74: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/74.jpg)
![Page 75: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of](https://reader034.vdocuments.site/reader034/viewer/2022042218/5ec4616d17d06d7cdf35ba85/html5/thumbnails/75.jpg)
Contributors To This Project
Roger Barga, Dhivya Eswaran, Gaurav Ghare, Kapil Chhabra, Sudipto Guha, Shiva Kasiviswanathan, Nina Mishra,
Gourav Roy, Joshua Tokle, and Tal Wagner