![Page 1: Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction April 15 th, 2004 Project Report Magdiel Galán CSE591: DataMining Dr. Huan](https://reader035.vdocuments.site/reader035/viewer/2022062421/56649d565503460f94a346de/html5/thumbnails/1.jpg)
Synthesis of Streaming Data from Multiple Sensors via Embedded Data Extraction
April 15th, 2004 Project Report
Magdiel Galán
CSE591: Data Mining, Dr. Huan Liu, Spring 2004
http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt
Outline
- Problem/Project Description
- Sampling
- Smoothing
- Clustering
- Current Status
- Plans

Project Description
- Synthesis of streaming data from multiple sensors (~100’s) via embedded data extraction for mission-critical applications
- Work in conjunction with Motorola’s Human Interface Lab (ongoing project)
- Simulation Environment

Project Description
- Goal: Develop a driver assistance system that provides feedback, but not control, during unsafe instances
  - Unsafe instances arise from distractions such as cellphones, PDAs, and e-mail
- Why: Targeting a government initiative to create a safer car environment amid the information-age explosion
- How: Develop an intelligent system by mining streaming data from multiple automotive sensors
- Development work is being done on a driving simulator with projection screens and up to 400 parameters/sensors, including video links for eye gaze and foot-pedal movement

Sample Cases
- Case Scenario #1: Passing slow traffic
  - which slowed down due to an accident
  - which you are also rubbernecking at
  - while fidgeting with your radio
- Case Scenario #2: Making a left turn
  - while hearing directions from MapTracker
  - while checking the time because you are late
  - while reaching for the cellphone with an incoming call

Simulation Environment
150° Simulated View
Driving Experience
(Diagram: the driver at the center, surrounded by vehicle sensors and devices: Gas, Engine Temp, Batt, Oil, PDA, Gear Shift, CD, Cell Phone, A/C, Air Bag, Acceleration, Lateral Acc., Sonar Proximity Sensor, Wheel Rotation, Brake Pressure, RPMs, GPS, Internet)

Motivation
- Primary Interest: Robotics
  - Merging of sensors/sensor fusion
    - optical proximity (IR, sonar, radar)
    - location (GPS, visual maps)
    - movement (actuators, rotations)
    - system (battery, temperature, bump switches)
  - Problem: decide the agent’s next best action vs. a goal
  - Not too dissimilar from an automobile environment
- Other Applications: Manufacturing Environment
  - Increase yields and productivity and reduce defects using daily quality-control monitoring data (100’s parameters, 1K’s)
  - Pentium example: Oxide Thickness, Poly Width, Boron Implant Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc.

Stream Data Properties
- Numerical/Continuous
  - Speed
  - Steering/Heading
  - Acceleration (Forward/Lateral)
  - Distance (Lane Edge, Vehicle in Front)
- Categorical
  - Lane Position
  - Gear: P/R/D/OD/L1/L2
  - Headlights On/Off
  - Radio/CD On
  - Incoming Call
- Sampling Rate: 60 Hz

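
One way to picture a single stream instance is as a record that mixes the numerical and categorical attributes listed above. The sketch below is only illustrative: the class name, field names, and units are hypothetical and not taken from the simulator; only the attribute list, the gear values, and the 60 Hz rate come from the slide.

```python
from dataclasses import dataclass
from enum import Enum

SAMPLE_RATE_HZ = 60  # sampling rate stated on the slide


class Gear(Enum):
    """Categorical gear positions listed on the slide."""
    P = "P"
    R = "R"
    D = "D"
    OD = "OD"
    L1 = "L1"
    L2 = "L2"


@dataclass(frozen=True)
class DriveSample:
    """One stream instance; all field names are hypothetical."""
    # Numerical/continuous attributes
    speed_kmh: float
    steering_deg: float
    heading_deg: float
    accel_forward: float
    accel_lateral: float
    dist_lane_edge: float
    dist_vehicle_ahead: float
    # Categorical attributes
    lane_position: str
    gear: Gear
    headlights_on: bool
    radio_cd_on: bool
    incoming_call: bool


def sample_time(index: int) -> float:
    """Elapsed time in seconds of the index-th sample at 60 Hz."""
    return index / SAMPLE_RATE_HZ
```

At this rate, a decision window of 2 seconds holds 120 instances per sensor.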
Critical/Special Conditions
- Left/Right Turn
- Passing/Changing Lanes
- U-Turn
- Reverse
- Tailgating
- Not On Road

Some Warning Signs
- Lane drifting
- Erratic behavior
  - droopy eyes
  - eyes not facing the road
  - foot/pedal movement does not correspond with road conditions
- Incoming call while performing a critical maneuver

Goal
- Identify instances outside normal patterns as an indication of an abnormal situation
  - Hence: need to draw the driver’s attention to the impending situation
- Ultimate Goal: Develop a bootstrapping mechanism that combines driving situation classifiers (e.g., LeftTurn/Passing) with instance selection methods in active learning
  - Bootstrapping: selecting high-utility data for re-training

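Instance Selection Properties
- Selected instances are representative
- Instance selection reduces rows
- Ideal outcome of instance selection: choose a data subset that achieves the same result as the whole data, with little or no deterioration in performance P
- Should be model independent: for any two mining models $M_i$ and $M_j$, the change in performance from using the subset instead of the whole data is about the same, $\Delta P(M_i) \doteq \Delta P(M_j)$
[LM01]
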
Problem #1: Sampling
- Initial step towards instance selection: select a representative subset
- Divide the population into a collection of elements that must cover the whole population without overlapping [GHL01]
  - These are called sampling units

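
The covering, non-overlapping property of sampling units can be sketched as follows. This is a minimal illustration that assumes the units are fixed-size runs of consecutive stream instances; the slides do not say how units were actually formed, and the function name and `unit_size` parameter are hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sampling_units(stream: Iterable[T], unit_size: int) -> Iterator[List[T]]:
    """Split a stream into consecutive units that cover every instance once.

    Assumption: a unit is a fixed-size run of consecutive instances.
    The final unit may be shorter if the stream length is not a multiple
    of unit_size.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    unit: List[T] = []
    for item in stream:
        unit.append(item)
        if len(unit) == unit_size:
            yield unit
            unit = []
    if unit:  # leftover instances form the last, shorter unit
        yield unit
```

Because each instance is appended to exactly one unit, concatenating the units reproduces the original stream, which is the "cover the whole population without overlapping" requirement.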
Sampling Results
Sampling at 10 ms (x-axis: signal duration; y-axis: count)
Problem #2: Smoothing
- Reduce/filter out noise and outliers
- Smoothing techniques used:
  - Bin Median/Rolling Average [LM01]/[D03]
    - Median preferred over mean since it is less sensitive to outliers
  - Thresholding/Bin Boundaries [LM01]/[HK01]
    - 10% offset threshold

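
Both smoothing ideas can be sketched in a few lines. The code below is an illustration under stated assumptions, not the project's implementation: `bin_median` replaces each fixed-size bin with its median, and `threshold_filter` interprets the "10% offset threshold" as holding the last retained value until a new reading differs from it by more than 10%. That interpretation, the function names, and the parameters are all assumptions.

```python
from statistics import median
from typing import List, Sequence


def bin_median(values: Sequence[float], bin_size: int) -> List[float]:
    """Replace each consecutive bin of readings with its median."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [
        median(values[i:i + bin_size])
        for i in range(0, len(values), bin_size)
    ]


def threshold_filter(values: Sequence[float], offset: float = 0.10) -> List[float]:
    """Hold the last retained value until a reading moves past the offset.

    Assumed reading of the slide: a new value is accepted only if it
    differs from the last retained value by more than `offset` (10%)
    of that value; otherwise the retained value is repeated.
    """
    smoothed: List[float] = []
    for v in values:
        if not smoothed:
            smoothed.append(v)
            continue
        ref = smoothed[-1]
        if abs(v - ref) > offset * abs(ref):
            smoothed.append(v)    # significant change: accept new value
        else:
            smoothed.append(ref)  # small fluctuation: hold previous value
    return smoothed
```

For example, in the bin `[10, 11, 95]` the spike of 95 pulls the mean up to about 38.7, while the median stays at 11, which is why the median is the safer bin summary when outliers are present.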
PreSmoothing - RAW Data
x-axis: driving time elapsed in minutes
y-axis: speed(km/h); steering(degrees), heading(degrees)
RAW Data Map/Course
Route Map – starting point at (0,0)
Smoothing Results - Median
x-axis: driving time elapsed in minutes
y-axis: speed(km/h); steering(degrees), heading(degrees)
Smoothing Results - Median
Smoothing Results - Threshold
Smoothing Results - Threshold
Dr. Liu’s Incremental Instance Selection Algorithm
Given: Data streams with instances I
Output: indicative instances
For each data stream
    Do the following incrementally
        Create a profile P for I
        Check new instance i against P
        if i is an outlier of P
            Return i
        else
            Update P with i
    End do

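
The incremental loop above can be sketched for a single numerical stream. The slides do not specify what the profile P contains or how an outlier is decided, so the version below makes assumptions that are labeled in the code: the profile is a running mean and standard deviation (Welford's update), and an instance is an outlier when it lies more than `k` standard deviations from the mean after a short warm-up. The names `RunningProfile`, `indicative_instances`, `k`, and `warmup` are hypothetical.

```python
import math
from typing import Iterable, Iterator


class RunningProfile:
    """Assumed profile P: running mean and variance of the instances seen."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one instance into the profile (Welford's algorithm)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x: float, k: float, warmup: int) -> bool:
        """Assumed test: more than k standard deviations from the mean."""
        if self.n < warmup:
            return False  # too little data to judge
        if self.std == 0.0:
            return x != self.mean
        return abs(x - self.mean) > k * self.std


def indicative_instances(
    stream: Iterable[float], k: float = 3.0, warmup: int = 5
) -> Iterator[float]:
    """Yield outliers of the profile; fold all other instances into it."""
    profile = RunningProfile()
    for x in stream:
        if profile.is_outlier(x, k, warmup):
            yield x            # "Return i"
        else:
            profile.update(x)  # "Update P with i"
```

Outliers are deliberately not folded into the profile, matching the if/else in the pseudocode, so a single spike does not widen what counts as normal. The profile keeps only three numbers per stream, which suits a one-pass, memory-constrained setting.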
Outliers
Problem #3: Clustering
- Why?
  - Data is unclassified
  - Previous results used numerical data on the most significant key parameters
  - Develop clusters exemplifying ALL attributes
  - Select instances that do not belong to a cluster as the triggering mechanism

Stream Clustering Challenges
- Large “unclassified” database
- Fast on-line resolution within a small window
  - 0.5 to 2 or 3 seconds
- One-pass-only restriction (need fast I/O)
- Mix of numerical and categorical data
  - Traditional algorithms do not work well for categorical attributes (remember P/R/D/OD/L1/L2, or CD On)
    - Centroid approach cannot be used
    - Hard to reflect the properties of the neighborhood of the points
- Memory constraints

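
To make the mixed-attribute difficulty concrete, here is one possible sketch of a distance over numerical and categorical attributes, together with a check for instances that fall outside every cluster. This is not the project's method: the slides leave the clustering technique open. The sketch assumes a Gower-style average (range-normalized difference for numerical attributes, simple mismatch for categorical ones) and assumes each cluster is summarized by one representative instance rather than a centroid. All names, the attribute keys, and the `max_distance` cutoff are hypothetical.

```python
from typing import Mapping, Sequence


def mixed_distance(
    a: Mapping[str, object],
    b: Mapping[str, object],
    numeric_ranges: Mapping[str, float],
) -> float:
    """Average per-attribute dissimilarity in [0, 1] (Gower-style, assumed).

    Attributes named in numeric_ranges are compared by absolute difference
    divided by the attribute's range; all others by exact match.
    """
    if not a:
        raise ValueError("instances must have at least one attribute")
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            diff = abs(float(a[key]) - float(b[key]))
            total += min(1.0, diff / span) if span > 0 else 0.0
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def is_unclustered(
    instance: Mapping[str, object],
    representatives: Sequence[Mapping[str, object]],
    numeric_ranges: Mapping[str, float],
    max_distance: float = 0.25,
) -> bool:
    """True if the instance is farther than max_distance from every cluster."""
    return all(
        mixed_distance(instance, rep, numeric_ranges) > max_distance
        for rep in representatives
    )
```

With representatives for, say, highway cruising (`gear` D, about 100 km/h) and reversing (`gear` R, about 5 km/h), an instance reporting `gear` R at 90 km/h matches neither and would be flagged, which is the kind of instance the triggering mechanism is meant to catch.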
Clustering Techniques vs. Streaming Data
- SVM
  - Good at handling multidimensional data
  - Not good: needs classified data, lots of I/O, data in memory
- BIRCH
  - Good at handling multidimensional data and large databases; single scan, linear I/O time
  - Not good: predominantly for “numerical” attributes; order dependent

Clustering Techniques vs. Streaming Data (2)
- CURE (Clustering Using REpresentatives) [D03]
  - Good at handling outliers; hierarchical
  - Not good: random sampling (won’t fit streaming)
- ROCK (RObust Clustering using linKs) [D03]
  - Good at hierarchical clustering for categorical attributes
  - Not good: random sampling for scale-up

My 1st Clustering Attempt…
Move in Reverse
My 1st Clustering Attempt(2)
Zoom Next Page
My 1st Clustering Attempt(3)
Move in Reverse
Current Status/Plans
- This is an ONGOING project
- Cluster technique development
  - Evolve from known methods?
- Generalization of the technique
  - Not just automobile streaming data

References
[LM01] H. Liu, H. Motoda. “Data Reduction via Instance Selection”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library
[GHL01] B. Gu, F. Hu, H. Liu. “Sampling: Knowing Whole From its Part”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library
[HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8 (Data Cleaning, Clustering). Morgan Kaufmann. ASU Library
[D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall. Chps. 3-5 (Mining Techniques, Classification, Clustering). ASU Library