migue final presentation_v28
DESCRIPTION
Bio-inspired computational techniques applied to the clustering and visualization of spatio-temporal geospatial data
TRANSCRIPT
1
Bio-inspired computational techniques applied to the clustering and visualization
of spatio-temporal geospatial data
Miguel BARRETO-SANZ
June 27, 2011
2
More data has been created since 2005 than in the previous 40,000 years
3
Geospatial data timeline
1972 Landsat 1, first civilian Earth observation satellite
1980 First commercial vendors of Geographical Information Systems (GIS) software
1992 Internet explosion
1993 Launch of the 24th Navstar satellite, completing the Global Positioning System
1997 Tropical Rainfall Measuring Mission (TRMM)
2000 Civilian demand for GPS products
2005 Google Earth
2006 GPS receivers built into cell phones
2010 Social networks, geotagging
4
These data are critical for
decision support, but their
value depends on our ability
to extract useful information
5
Challenges
• High dimensionality
• Large quantity of data
• Unlabeled samples (labeling is an expensive and time-consuming process)
Data sources
• NASA Earth Observatory (information from several missions, e.g. Terra, TRMM, SRTM)
• Worldclim (climate data from weather stations): mean annual temperature (-30.1 to 30.5 ºC), annual precipitation (0 to 12084 mm)
• Derived variables: elevation, slope, aspect, landscape class, moisture, solar radiation, exposure, curvature
6
Spatio-temporal challenges
• Spatio-temporal representations at several levels
• Fuzzy boundaries in geographical space
• Variables and clusters evolve in a temporal context
• Visualization of clusters in geographical and feature space
Time scales: hours, days, months, years
7
Thesis
Topics: clustering, visualization and projection, spatio-temporal data
Methods: SOM, GHSOM, FGHSON, tree-structured SOM component planes
Case studies: Colombia (ecoregions), South America (ecoregions), Colombia (agroecozones, ecoregions)
8
Visualization and projection
9
Visualization by using Self-organizing Maps
Steps: data set, SOM training, visualization
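The SOM training step named on this slide can be illustrated with a minimal sketch. It assumes a rectangular grid, online updates, a Gaussian neighbourhood, and linearly decaying learning rate and radius. None of these settings come from the slides, and the names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the prototype closest to sample x."""
    return min(weights,
               key=lambda pos: sum((a - b) ** 2 for a, b in zip(weights[pos], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM (illustrative settings, not the thesis setup).

    Returns a dict mapping (row, col) -> prototype vector (list of floats).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initialise each prototype as a copy of a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, each sample can be placed on the map with `best_matching_unit(weights, sample)`, which is the starting point for the visualizations on the following slides.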
10
Correlation hunting
Exploration
Similar
Partial correlations
Visualization by using Self-organizing Maps
11
A real-world problem: classification of agro-ecological variables related to productivity in sugar cane cultivation.
Climate variables
• Average Temperature (TempAvg)
• Average Relative Humidity (RHAvg)
• Radiation (Rad)
• Precipitation (Prec)
Soil variables
• Order (Ord)
• Texture (Tex)
• Depth (Dee)
Topographic variables
• Landscape (Ls)
• Slope (Sl)
Other variables
• Water Balance (WB)
• Variety (Var)
Production
Total: 54 variables
12
5 Variables
Classical approach: scatter plot matrix
13
23 Variables
Classical approach: scatter plot matrix
14
54 Variables
Classical approach: scatter plot matrix
15
5 Variables
SOM component planes
16
23 Variables
SOM component planes
17
54 Variables
SOM component planes
18
54 Variables
SOM component planes
19
Correlation Hunting
20
SOM of component planes
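Comparing component planes by eye becomes harder as the number of variables grows. The sketch below shows one simple way to order planes by similarity, assuming trained prototypes are available as a dict from grid position to vector. It uses a global Pearson correlation between planes as a stand-in. It does not reproduce the SOM of component planes or the tree-structured method from the thesis, and the function names are hypothetical.

```python
import math


def component_planes(weights, names):
    """Split SOM prototypes into one plane (dict pos -> value) per variable."""
    return {name: {pos: w[k] for pos, w in weights.items()}
            for k, name in enumerate(names)}


def plane_correlation(plane_a, plane_b):
    """Pearson correlation between two component planes on the same grid."""
    positions = sorted(plane_a)
    a = [plane_a[p] for p in positions]
    b = [plane_b[p] for p in positions]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat plane carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def rank_similar_planes(planes, target):
    """Rank the other variables by |correlation| with the target plane."""
    scores = {name: plane_correlation(planes[target], plane)
              for name, plane in planes.items() if name != target}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy 2x2 map with three variables (made-up values, for illustration only).
toy_weights = {(0, 0): [1.0, 2.0, 5.0], (0, 1): [2.0, 4.0, 3.0],
               (1, 0): [3.0, 6.0, 4.0], (1, 1): [4.0, 8.0, 1.0]}
toy_planes = component_planes(toy_weights, ["TempAvg", "Rad", "Prec"])
toy_ranking = rank_similar_planes(toy_planes, "TempAvg")
```

A global coefficient hides the local and partial correlations that the component planes are meant to expose, so a ranking like this is only a first filter before visual inspection.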
21
Tree-structured SOM component planes
22
54 Variables
Tree-structured SOM component planes
23
Tree-structured SOM component planes
24
Clustering
25
Hierarchical Self-organizing Structures
• They combine the advantages of hierarchical representation and soft competitive learning
• In the state of the art, all the methods are crisp approaches
• In geospatial applications, crisp memberships are not the optimal representation of clusters
26
Real world data and its fuzzy nature
Crisp
Fuzzy
27
One way to tackle this problem is to allow a fuzzy representation in the hierarchical structures
28
Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON)
Breadth growth process
Depth growth process (α-cut)
Hierarchy, fuzzy membership
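The slide names fuzzy membership and an α-cut but gives no formulas. The sketch below assumes fuzzy c-means style memberships with fuzzifier `m`, and treats the α-cut as "keep the samples whose membership in a prototype is at least α". Both are assumptions made for illustration, and `fuzzy_memberships` and `alpha_cut` are hypothetical names, not the FGHSON implementation.

```python
import math


def fuzzy_memberships(sample, prototypes, m=2.0):
    """Fuzzy c-means style memberships of one sample to every prototype."""
    dists = [math.dist(sample, p) for p in prototypes]
    # A sample lying exactly on a prototype belongs fully to it.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def alpha_cut(samples, prototypes, alpha, m=2.0):
    """For each prototype, keep the samples whose membership is >= alpha."""
    selected = [[] for _ in prototypes]
    for s in samples:
        for j, u in enumerate(fuzzy_memberships(s, prototypes, m)):
            if u >= alpha:
                selected[j].append(s)
    return selected
```

With a low α, a sample near a boundary is kept by more than one prototype, so it can appear in several branches of the hierarchy. A crisp assignment would place it in exactly one.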
29
Precipitation
Temperature
Similar Zones
Case study: South America (Cali, Colombia)
30
Case study: South America (Cali, Colombia)
31
Finding the right prototype
Case study: South America (Cali, Colombia)
32
Level 1
33
Level 2
34
Fortaleza Brazil
Cali Colombia
Level 3
35
Spatio-Temporal Clustering
36
Space - Where
Time – When
Spatio-Temporal Clustering
Homologous places for Colombian coffee production: Brazil, Ecuador, East Africa, and New Guinea.
37
Space and time – Where and when
Maize (Zea mays L.): Argentina, United States
Spatio-Temporal Clustering
38
Objective: to find similar environmental zones through time in South America.
In this experiment we look for regions with similar patterns in time windows of three months.
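A minimal sketch of how three-month windows could be turned into feature vectors for clustering. It assumes twelve monthly values of precipitation and temperature per location, and overlapping windows that wrap around the year end. The slides do not say whether the windows overlap, so `step` is left as a parameter, and all names are hypothetical.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_windows(monthly, width=3, step=1):
    """Yield (label, values) for each window over a 12-value monthly series.

    Windows wrap around the year end, so "dec-jan-feb" is included.
    """
    for start in range(0, 12, step):
        idx = [(start + k) % 12 for k in range(width)]
        label = "-".join(MONTHS[i] for i in idx)
        yield label, [monthly[i] for i in idx]


def window_features(precipitation, temperature, width=3, step=1):
    """Concatenate the precipitation and temperature values of each window."""
    temp_windows = dict(three_month_windows(temperature, width, step))
    return {label: prec + temp_windows[label]
            for label, prec in three_month_windows(precipitation, width, step)}
```

Each (location, window) vector can then be fed to the clustering step, so that a location in one window may be grouped with a different location in another window.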
Spatio-Temporal Clustering
39
Spatio-Temporal Clustering
40
Precipitation
Temperature
Zones similar to Cali in the period Jan-Feb-Mar?
Spatio-Temporal Clustering
41
Spatio-Temporal Clustering
42
Conclusions
1. Original contributions
FGHSON
• Capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way.
• It does not require an a priori definition of the number of clusters.
• The algorithm executes self-organizing processes in parallel.
• Only three parameters are needed to set up the algorithm.
43
Conclusions
Tree-structured SOM component planes
• It creates structures that allow visual exploratory data analysis of large high-dimensional datasets.
• Similarities in the behavior of variables can be easily detected (e.g. local correlations, maximal and minimal values, and outliers).
44
Conclusions
2. Test of methodologies for clustering and visualization of georeferenced data
• GHSOM
• SOM
• FGHSON
3. Methodology contributions
• Clustering of spatio-temporal datasets through time by using FGHSON.
45
The COCH project
4. Agroecological knowledge contribution
• In sugar cane productivity
• In sugar cane agro-ecoregionalization
• In Andean blackberry production
Conclusions
46
Questions