Spatial Data Mining
Hari Agung
Department of Computer Science, FMIPA IPB
2004/09/09
• Motivation and General Description
• Data Mining: Basic Concepts
• Data Mining Techniques
• Spatial Data Mining
• Spatial Data Mining Scenarios in Meteorology and Weather Forecasting
• Conclusions
• Questions & Discussions
Spatial Data Mining
• Spatial Patterns
– Spatial outliers
– Location prediction
– Associations, co-locations
– Hotspots, clustering, trends, …
• Primary Tasks
– Mining Spatial Association Rules
– Spatial Classification and Prediction
– Spatial Data Clustering Analysis
– Spatial Outlier Analysis
• Example: Unusual warming of the Pacific Ocean (El Niño) affects weather in the USA…
Spatial Data Mining Results
• Understanding spatial data, discovering relationships between spatial and nonspatial data, construction of spatial knowledge bases, etc.
• In various forms
– The description of the general weather patterns in a set of geographic regions is a spatial characteristic rule.
– The comparison of two weather patterns in two geographic regions is a spatial discriminant rule.
– A rule like “most cities in Canada are close to the Canada-US border” is a spatial association rule.
• near(x, coast) ∧ southeast(x, USA) ⇒ hurricane(x), (70%)
– Others: spatial clusters, …
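The percentage attached to a spatial association rule is usually read as its confidence: of the objects that satisfy the left-hand side, the fraction that also satisfy the right-hand side. A minimal sketch of that calculation follows. The records, the field names, and the helper `rule_stats` are invented for illustration and do not come from the slides.

```python
# Toy records: each dict is one spatial object with precomputed predicates.
# All values are invented for illustration.
records = [
    {"near_coast": True,  "southeast": True,  "hurricane": True},
    {"near_coast": True,  "southeast": True,  "hurricane": True},
    {"near_coast": True,  "southeast": True,  "hurricane": False},
    {"near_coast": True,  "southeast": False, "hurricane": False},
    {"near_coast": False, "southeast": True,  "hurricane": False},
    {"near_coast": False, "southeast": False, "hurricane": False},
]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    matches_antecedent = [r for r in rows if all(r[p] for p in antecedent)]
    matches_both = [r for r in matches_antecedent if r[consequent]]
    support = len(matches_both) / len(rows)
    confidence = (len(matches_both) / len(matches_antecedent)
                  if matches_antecedent else 0.0)
    return support, confidence


support, confidence = rule_stats(records, ["near_coast", "southeast"], "hurricane")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy set three objects satisfy both antecedent predicates and two of them saw a hurricane, so the confidence is 2/3. In practice the predicates themselves (near, southeast) would first be derived from the geometry.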
What is Spatial Data?
• The data related to objects that occupy space
– traffic, bird habitats, global climate, logistics, ...
• Object types:
– Points, Lines, Polygons, etc.
• Used in/for:
– GIS (Geographic Information Systems)
– Meteorology
– Astronomy
– Environmental studies, etc.
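As a rough illustration of the object types named on this slide, the sketch below represents a point as an `(x, y)` tuple, a line as a list of points, and a polygon as a list of vertices, then derives a few geometric properties. It assumes planar coordinates rather than longitude/latitude, and the function names and sample coordinates are hypothetical.

```python
import math


def line_length(points):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_perimeter(vertices):
    """Perimeter of a closed polygon; the last vertex connects back to the first."""
    return line_length(vertices + vertices[:1])


def polygon_area(vertices):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


station = (2.0, 3.0)                                          # point
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                  # line
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]     # polygon

print(line_length(road), polygon_perimeter(region), polygon_area(region))
```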
Basic Concepts (1)
• Spatial data mining performs the same functions as general data mining, with the end objective of finding patterns in geography, meteorology, etc.
• The main difference: spatial autocorrelation
– the neighbors of a spatial object may influence it and therefore have to be considered as well
• Spatial attributes
– Topological
• adjacency or inclusion information
– Geometric
• position (longitude/latitude), area, perimeter, boundary polygon
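One common way to quantify spatial autocorrelation is Moran's I, which the slides do not name; it is brought in here only as an illustration. The sketch below assumes binary adjacency weights (1 for neighbors, 0 otherwise) and a hypothetical chain of four cells. Values near +1 mean neighbors resemble each other, values near -1 mean they alternate.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights; neighbors[i] lists the indices adjacent to i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of products of deviations over every ordered pair of neighbors.
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    total_weight = sum(len(adj) for adj in neighbors)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four cells in a row: 0-1-2-3 (invented topology).
chain = [[1], [0, 2], [1, 3], [2]]
smooth = morans_i([1, 2, 3, 4], chain)        # gradual trend
alternating = morans_i([1, 4, 1, 4], chain)   # checkerboard-like pattern
print(smooth, alternating)
```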
Basic Concepts (2)
• Spatial neighborhood
– Topological relation
• “intersect”, “overlap”, “disjoint”, …
– Distance relation
• “close_to”, “far_away”, …
– Direction/orientation relation
• “left_of”, “west_of”, …
• Global model might be inconsistent with regional models
[Figure: Global Model vs. Local Model]
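The three kinds of neighborhood relation listed on this slide can be sketched as simple predicates. This is a simplified illustration under stated assumptions: points are planar `(x, y)` tuples, regions are axis-aligned boxes `(xmin, ymin, xmax, ymax)`, and the distance threshold is an arbitrary parameter. Real GIS libraries handle general geometries.

```python
import math


def close_to(a, b, threshold):
    """Distance relation: true when points a and b are within `threshold` units."""
    return math.dist(a, b) <= threshold


def west_of(a, b):
    """Direction relation: true when point a has a smaller x coordinate than b."""
    return a[0] < b[0]


def intersects(box_a, box_b):
    """Topological relation for axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def disjoint(box_a, box_b):
    """Topological relation: the boxes share no point at all."""
    return not intersects(box_a, box_b)


print(close_to((0.0, 0.0), (3.0, 4.0), threshold=5.0))
print(west_of((1.0, 0.0), (2.0, 0.0)))
print(intersects((0, 0, 2, 2), (1, 1, 3, 3)), disjoint((0, 0, 1, 1), (2, 2, 3, 3)))
```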
Applications
• NASA Earth Observing System (EOS): Earth science data
• National Inst. of Justice: crime mapping
• Census Bureau, Dept. of Commerce: census data
• Dept. of Transportation (DOT): traffic data
• National Inst. of Health (NIH): cancer clusters
• …
Meteorological Data Mining
• Motivation
– Many analysis methods must be applied to fast-growing data for climate studies
• Result
– Appropriate presentation instruments (graphs, maps, reports, etc.) must be applied
• Examples
– Spatial outliers can be associated with disastrous natural events such as tornadoes, hurricanes, and forest fires
– Associations between disaster events and certain meteorological observations
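A spatial outlier is a location whose value differs sharply from that of its neighbors, even if the value is unremarkable globally. One simple way to flag such locations, sketched below, is to compare each station with the mean of its neighbors and standardize the differences. The station readings, the chain-shaped neighborhood, and the threshold of 2.0 are all assumptions made for illustration.

```python
import statistics


def spatial_outliers(values, neighbors, z_threshold=2.0):
    """Return indices whose difference from the neighbor mean is unusually large."""
    diffs = [
        values[i] - statistics.mean([values[j] for j in neighbors[i]])
        for i in range(len(values))
    ]
    mu = statistics.mean(diffs)
    sigma = statistics.pstdev(diffs)
    if sigma == 0:
        return []  # every station matches its neighborhood equally well
    return [i for i, d in enumerate(diffs) if abs((d - mu) / sigma) > z_threshold]


# Invented temperature readings for nine stations along a line.
temps = [20.0, 21.0, 20.0, 22.0, 35.0, 21.0, 20.0, 22.0, 21.0]
chain = [[1]] + [[i - 1, i + 1] for i in range(1, 8)] + [[7]]
print(spatial_outliers(temps, chain))
```

Only the station reading 35.0 is flagged. Its two neighbors also differ from their own neighborhoods, but by less than the threshold.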
2004/09/09 Hong Kong Observatory Hong Kong Meteorological Society
Case Studies (1): Astronomy
• SKICAT (SKy Image Cataloging and Analysis Tool) (Caltech, US)
• The Palomar Observatory discovered 22 quasars with the help of data mining
• The Second Palomar Observatory Sky Survey (POSS-II)
– decision tree methods
– classification of galaxies, stars and other stellar objects
• About 3 TB of sky images were analyzed
Case Studies (2): NCAR & UCAR
• National Center for Atmospheric Research (NCAR) & University Corporation for Atmospheric Research (UCAR), US
– http://www.ucar.edu/
• “Automatic Fuzzy Logic-based systems now compete with human forecasts”
• Richard Wagoner, Deputy Director at Research Applications Program (RAP), NCAR
• Intelligent Weather System (IWS)
– Detection and forecast in the areas of en-route turbulence, en-route icing, ceiling/visibility, and convective hazards in the aviation community
– Road winter maintenance, airport operations, and flash flood forecasting
Case Studies (3): CrossGrid (EU)
• Objective
– To develop, implement and exploit new Grid components for interactive compute- and data-intensive applications like flooding crisis team decision support systems, air pollution combined with weather forecasting
• Main tasks in Meteorological applications package
– Data mining for atmospheric circulation patterns
• Find a set of representative prototypes of the atmospheric patterns in a region of interest
– Weather forecasting for maritime applications
– Ocean wave forecasting by models of various complexity
SOM Application for Data Mining: Downscaling Weather Forecasts
• Data
– ERA-15 using a T106L31 model (from 1978 to 1994) with 1.125° resolution
– Terabytes
– Comprises data from approx. 20 variables (such as temperature, humidity, pressure, etc.) at 30 pressure levels of a 360x360-node grid
• Adaptive Competitive Learning
• Sub-grid details escape from numerical models
Dept. of Applied Mathematics
Universidad de Cantabria
Santander, Spain
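The adaptive competitive learning behind a SOM (self-organizing map) can be sketched in a few lines: for each sample, the closest prototype wins and is moved toward the sample, and its neighbors on the map are moved along with it by a smaller amount. The code below is a toy 1-D map over invented 2-D samples, not the CrossGrid implementation. The decay schedule, learning rate, and initial prototypes are assumptions.

```python
import math


def nearest_unit(weights, x):
    """Index of the best-matching unit (closest prototype) for sample x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def quantization_error(data, weights):
    """Mean distance from each sample to its best-matching prototype."""
    return sum(math.dist(x, weights[nearest_unit(weights, x)]) for x in data) / len(data)


def train_som(data, weights, epochs=50, lr0=0.5, radius0=1.0):
    """Train a 1-D map of prototypes and return the updated list."""
    weights = list(weights)
    for t in range(epochs):
        decay = 1.0 - t / epochs            # shrinks learning rate and neighborhood
        lr = lr0 * decay
        radius = max(radius0 * decay, 1e-3)
        for x in data:
            bmu = nearest_unit(weights, x)
            for j, w in enumerate(weights):
                # Units close to the winner on the map are pulled along with it.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                weights[j] = tuple(wi + lr * h * (xi - wi) for wi, xi in zip(w, x))
    return weights


# Invented 2-D samples standing in for two atmospheric regimes.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
initial = [(4.0, 4.0), (5.0, 5.0), (6.0, 6.0)]
prototypes = train_som(samples, initial)
print(quantization_error(samples, initial), quantization_error(samples, prototypes))
```

After training, the prototypes at the two ends of the map sit near the two groups of samples, so the quantization error drops. In a meteorological setting each sample would be a high-dimensional vector of gridded variables, and the trained prototypes would serve as the representative patterns.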
Case Studies (4): Typhoon Image Data Mining
• Objective
– To establish algorithms and database models for the discovery of information and knowledge useful for typhoon analysis and prediction
– Content-based image retrieval technology to search for similar cloud patterns in the past
– Data mining technology to extract spatio-temporal pattern information that is meaningful from the meteorological viewpoint
• Result
– Alignment of Multiple Typhoons, Explore by Projection to 2D Plane, Diurnal Analysis
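Content-based image retrieval generally works by turning each image into a feature vector and ranking archived images by their distance to the query's vector. The sketch below uses a coarse brightness histogram as the feature, which is an assumption chosen for brevity and not the feature set of the typhoon project. The tiny 2x2 "images" and their names are invented.

```python
import math


def brightness_histogram(image, bins=4, max_value=256):
    """Normalized histogram of pixel intensities for a 2-D list of ints in [0, max_value)."""
    counts = [0] * bins
    for row in image:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
    total = sum(counts)
    return tuple(c / total for c in counts)


def most_similar(query, archive, top_k=1):
    """Rank archived images by histogram distance to the query; return their names."""
    q = brightness_histogram(query)
    ranked = sorted(
        archive,
        key=lambda name: math.dist(q, brightness_histogram(archive[name])),
    )
    return ranked[:top_k]


# Hypothetical archive of cloud images (0 = dark sea, 255 = bright cloud top).
archive = {
    "typhoon_a": [[250, 240], [230, 20]],
    "typhoon_b": [[10, 20], [30, 200]],
    "clear_sky": [[5, 10], [0, 15]],
}
query = [[245, 235], [225, 30]]
print(most_similar(query, archive))
```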
Case Studies (6): Rainfall Classification
University of Oklahoma, Norman
• To classify significant and interesting features within a two-dimensional spatial field of meteorological data
– Observed or predicted rainfall
• Data source
– Estimates of hourly accumulated rainfall
– Using radar and rain gauge data
• “Attributes” for classification
– Statistical parameters representing the distribution of rainfall amounts across the region
• Classification Method
– Hierarchical cluster analysis
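The pipeline on this slide, summarizing each rainfall field by statistical parameters and then grouping the fields with hierarchical cluster analysis, can be sketched as follows. The choice of mean, standard deviation, and maximum as attributes, the average-linkage rule, and the four tiny fields are assumptions for illustration. The slides do not say which statistics or linkage the original study used.

```python
import math
import statistics


def field_features(rain):
    """Summarize one rainfall field (hourly totals in mm) as (mean, std dev, max)."""
    return (statistics.mean(rain), statistics.pstdev(rain), max(rain))


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return statistics.mean([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented fields: two with light, even rain and two with an intense local peak.
fields = [
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 2.0],
    [0.0, 30.0, 2.0, 0.0],
    [1.0, 25.0, 0.0, 6.0],
]
features = [field_features(f) for f in fields]
groups = agglomerate(features, n_clusters=2)
print(groups)
```

Stopping the merging at a chosen number of clusters is one way to cut the hierarchy. Recording the merge order instead would give the full dendrogram.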
What can we learn from those scenarios?
• Data mining is a promising approach to meteorological analysis
• Very strong interaction between scientists and the knowledge discovery system is necessary
• The users define features of the meteorological phenomena based on their expert knowledge
• The system extracts the instances of such phenomena
• Then, further analysis of phenomena is possible
Conclusions
• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data mining and other steps
• Data mining can be performed in a variety of information repositories
• Data mining tasks: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.