Improving the correlation hunting in a large quantity of SOM component planes



Posted on 14-Jun-2015


Page 1

Classification of agro-ecological variables related with productivity in the sugar cane culture.

Miguel BARRETO, Andrés Pérez-Uribe

MINISTERIO DE AGRICULTURA Y DESARROLLO RURAL (Ministry of Agriculture and Rural Development)

asocaña

Page 2

Self-Organizing Maps

A self-organizing map (SOM) can be seen as a data visualization technique that reduces the dimensionality of data by means of a self-organizing clustering algorithm.

Data visualization addresses a basic limitation: humans cannot directly visualize high-dimensional data.

Such techniques improve the understanding of high-dimensional data by displaying it in a low-dimensional space.

A SOM does this by preserving neighborhoods: points that are close to each other in the high-dimensional space are placed close to each other in the low-dimensional map.

From a computational point of view, the SOM is both a projection method, which maps the high-dimensional data space onto a low-dimensional space (dimensionality reduction), and a clustering method, in which similar data samples tend to be mapped to nearby neurons.
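The projection-plus-clustering behavior described above can be illustrated with a minimal sketch of the classic online Kohonen training rule. It is not the authors' implementation: the rectangular grid, the Gaussian neighborhood, the linearly decaying learning rate and radius, the initialization from random samples, and the names `train_som` and `best_matching_unit` are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the grid position (r, c) of the unit whose weight vector is closest to x."""
    return min(codebook, key=lambda u: sum((w - v) ** 2 for w, v in zip(codebook[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM with the online Kohonen rule (illustrative sketch).

    Assumptions: Gaussian neighborhood, learning rate and radius decaying
    linearly, units initialized from randomly chosen samples.
    Returns the codebook as a dict {(r, c): weight_vector}.
    """
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    codebook = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays toward 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = best_matching_unit(codebook, x)
            for (r, c), w in codebook.items():
                # Units near the BMU on the grid move more than distant ones.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook
```

For example, `train_som(data, 5, 5)` on a list of equal-length numeric vectors returns 25 codebook vectors; samples that are close in the input space tend to end up with the same or neighboring BMUs.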

Page 3

Component planes

[Figure: component planes V1, TMAS, P1AS and Ra1AS obtained by slicing the SOM codebook vectors (Vector 1, Vector 2, ..., Vector n).]

To better analyze the relationships between variables and their influence on the outputs of the system, the SOM can be sliced into its so-called component planes: one plane per variable, showing the value of that variable in the codebook vector of every map unit.
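Slicing a trained map into component planes amounts to reading one coordinate of every codebook vector. A minimal sketch, assuming (hypothetically) that the codebook is stored as a dict `{(r, c): weight_vector}`; the function names are illustrative.

```python
def component_plane(codebook, rows, cols, k):
    """Return component plane k: the k-th weight of every unit, laid out on the grid."""
    return [[codebook[(r, c)][k] for c in range(cols)] for r in range(rows)]


def all_component_planes(codebook, rows, cols):
    """Slice the map into one plane per variable (component)."""
    dim = len(codebook[(0, 0)])
    return [component_plane(codebook, rows, cols, k) for k in range(dim)]


# Tiny hand-made 2 x 2 codebook with three variables per unit.
codebook = {(0, 0): [1, 10, 100], (0, 1): [2, 20, 200],
            (1, 0): [3, 30, 300], (1, 1): [4, 40, 400]}
planes = all_component_planes(codebook, 2, 2)
print(planes[1])  # [[10, 20], [30, 40]]
```

Each plane is then drawn as a colored grid, so planes of related variables show similar color patterns.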

Page 4

Example: "Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study" (Junbai Wang et al., 2002).

(a) 42 DLBCL samples; in the SOM color scale, red indicates high expression and blue indicates low expression. (b) The cluster numbers identify the groups of genes each cluster contains.

Page 5

Correlation hunting

The task of organizing similar component planes in order to find correlated components is called correlation hunting.
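One simple way to decide which component planes are "similar" is to flatten each plane and compare planes pairwise with the Pearson correlation coefficient. This is a sketch under assumptions: the plane names, the use of the absolute value (so that inversely related planes also count as similar), and the helper names are illustrative, and Pearson only captures linear relationships.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def rank_plane_pairs(planes):
    """planes: dict name -> flattened component plane.

    Returns (name1, name2, |r|) tuples, most strongly correlated pairs first.
    """
    scored = [(abs(pearson(planes[p], planes[q])), p, q)
              for p, q in combinations(sorted(planes), 2)]
    scored.sort(reverse=True)
    return [(p, q, round(s, 3)) for s, p, q in scored]


planes = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 0, 0, 1]}
print(rank_plane_pairs(planes)[0])  # ('A', 'B', 1.0)
```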

Page 6

Correlation hunting

Here the term correlation covers not only linear correlations but also nonlinear and local or partial correlations between variables.
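Pearson's coefficient misses part of what is meant by correlation here. A hedged sketch of two complements: Spearman's rank correlation, which detects monotonic nonlinear relations (though not non-monotonic ones), and a "local" correlation computed only over a chosen region of the map. The helper names and the toy planes are illustrative assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(a, b):
    """Rank correlation: 1.0 for any strictly increasing relation."""
    return pearson(ranks(a), ranks(b))


def local_pearson(a, b, units):
    """Correlation restricted to a subset of map units (indices into flattened planes)."""
    return pearson([a[i] for i in units], [b[i] for i in units])


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but monotonic
print(round(pearson(x, y), 3), round(spearman(x, y), 3))  # 0.943 1.0

p = [1, 2, 3, 4, 5, 6]
q = [1, 2, 3, 3, 2, 1]  # rises on one half of the map, falls on the other
print(round(pearson(p, q), 3), round(local_pearson(p, q, [0, 1, 2]), 3))  # 0.0 1.0
```

The second example shows why local correlations matter: globally the two planes look unrelated, yet on half of the map they move together perfectly.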

Page 7

Correlation hunting

However, when the number of components is large, it is difficult to determine which planes are similar to each other.
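A quick count shows why visual comparison breaks down: with $n$ component planes there are $n(n-1)/2$ distinct pairs to inspect. Taking the 54 sugar cane variables and the 279 heart disease planes that appear elsewhere in this presentation as examples:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{54}{2} = \frac{54 \cdot 53}{2} = 1431, \qquad
\binom{279}{2} = \frac{279 \cdot 278}{2} = 38\,781
```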

Page 8

Correlation hunting

A second SOM, trained on the component planes themselves, can be used to reorganize the planes for correlation hunting. The main idea is to place correlated components close to each other.
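A minimal sketch of this idea, under stated assumptions: each component plane is flattened and standardized (z-scored) so that planes with the same shape but different units or scales become comparable, and a small SOM is trained on these plane vectors. The compact trainer, the 2 x 2 grid, and the names `zscore` and `organize_planes` are illustrative, not the authors' implementation. With this representation, inversely correlated planes end up far apart rather than together.

```python
import math
import random


def zscore(v):
    """Standardize a flattened plane (assumes the plane is not constant)."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
    return [(x - m) / s for x in v]


def bmu(codebook, x):
    """Grid position of the unit closest to x."""
    return min(codebook, key=lambda u: sum((w - v) ** 2 for w, v in zip(codebook[u], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Compact online SOM: Gaussian neighborhood, linear decay (assumptions)."""
    rng = random.Random(seed)
    codebook = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0, total, step = max(rows, cols) / 2.0, epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled copy of the data
            frac = step / total
            lr, sigma = lr0 * (1 - frac), max(sigma0 * (1 - frac), 0.5)
            br, bc = bmu(codebook, x)
            for (r, c), w in codebook.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook


def organize_planes(planes, rows=2, cols=2):
    """planes: dict name -> flattened component plane. Returns name -> grid cell."""
    names = sorted(planes)
    vectors = [zscore(planes[n]) for n in names]
    codebook = train_som(vectors, rows, cols)
    return {n: bmu(codebook, v) for n, v in zip(names, vectors)}


planes = {"A": [1, 2, 3, 4], "B": [11, 12, 13, 14], "C": [4, 3, 2, 1], "D": [1, 4, 2, 3]}
cells = organize_planes(planes)
```

Here "A" and "B" differ only by an offset, so their z-scored vectors are identical and they always land on the same cell of the new map.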

Page 9

Correlation hunting

An advantage of using a SOM for component plane projection is that the planes can be placed on a regular grid. In addition, similar components are automatically presented in an ordered way. A disadvantage is that deciding how to group the variables is still left to the user.
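When several planes share a BMU, they still need distinct cells to be displayed on the regular grid. One simple, hypothetical way to do this is greedy placement: each plane takes its BMU if it is free, otherwise the nearest free cell. Processing planes in name order and breaking distance ties by row, then column, are arbitrary choices made for this sketch.

```python
def place_on_grid(bmus, rows, cols):
    """Give every component plane its own cell, starting from its BMU.

    bmus: dict plane name -> (r, c) BMU on the component-plane SOM.
    Planes are handled in name order; a plane whose BMU is already taken goes
    to the nearest free cell (ties broken by row, then column).
    """
    if len(bmus) > rows * cols:
        raise ValueError("more planes than grid cells")
    free = {(r, c) for r in range(rows) for c in range(cols)}
    placement = {}
    for name in sorted(bmus):
        br, bc = bmus[name]
        cell = min(free, key=lambda u: ((u[0] - br) ** 2 + (u[1] - bc) ** 2, u))
        placement[name] = cell
        free.remove(cell)
    return placement


print(place_on_grid({"A": (0, 0), "B": (0, 0), "C": (0, 0), "D": (1, 1)}, 2, 2))
# {'A': (0, 0), 'B': (0, 1), 'C': (1, 0), 'D': (1, 1)}
```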

Page 10

More component planes …

Heart disease

279 component planes

This database contains 13 attributes (which have been extracted from a larger set of 75)

Page 11

Clustering of SOM component planes based on the SOM distance matrix

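The U-matrix (unified distance matrix) has been used as an effective cluster distance function. It shows the distance between each map unit and its neighbors, which makes the SOM cluster structure visible: low values correspond to units inside clusters, while high values mark the borders between clusters.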
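A simplified U-matrix can be computed by averaging, for each unit, the distance between its codebook vector and those of its grid neighbors. This sketch assumes a rectangular grid with 4-connected neighbors, at least two units, and a codebook stored as a dict `{(r, c): vector}`; hexagonal grids and the larger U-matrix that also has cells for the distances between adjacent units are common variants not covered here.

```python
import math


def u_matrix(codebook, rows, cols):
    """Average distance from each unit's vector to its 4-connected neighbors' vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r][c] = sum(dist(codebook[(r, c)], codebook[n]) for n in nbrs) / len(nbrs)
    return u


# A 1 x 4 map whose codebook holds two groups: {0, 1} and {5, 6}.
codebook = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [5.0], (0, 3): [6.0]}
print(u_matrix(codebook, 1, 4))  # [[1.0, 2.5, 2.5, 1.0]]
```

The high values in the middle mark the border between the two groups, while the low values at the ends lie inside them.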

Page 12

Using Vellido's algorithm to partition the map

Vellido's algorithm is used to obtain different partitioning levels of the SOM clustering. It first partitions the map into a set of base clusters, whose number equals the number of local minima of the U-matrix; this allows different levels of clustering.
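The slide does not spell out the algorithm's steps, so the following is only a hedged sketch of the base-cluster stage as described: every local minimum of the U-matrix seeds one base cluster, and every other unit is assigned by moving downhill to its lowest neighbor until a minimum is reached. This watershed-style assignment rule is an assumption of the sketch, not necessarily Vellido's exact procedure, and the later merging into coarser levels is not shown.

```python
def base_clusters(u):
    """Label each map unit with the base cluster (U-matrix local minimum) it drains to.

    u: U-matrix as a list of rows (at least two units). Units on a flat
    plateau with no lower neighbor count as separate minima in this sketch.
    Returns (labels grid, number of base clusters).
    """
    rows, cols = len(u), len(u[0])

    def neighbors(r, c):
        steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
        return [(r + dr, c + dc) for dr, dc in steps
                if 0 <= r + dr < rows and 0 <= c + dc < cols]

    def descend(cell):
        # Follow the lowest neighbor downhill until no neighbor is lower.
        while True:
            low = min(neighbors(*cell), key=lambda n: (u[n[0]][n[1]], n))
            if u[low[0]][low[1]] >= u[cell[0]][cell[1]]:
                return cell
            cell = low

    minima = {}  # local minimum -> base cluster id
    labels = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            labels[r][c] = minima.setdefault(descend((r, c)), len(minima))
    return labels, len(minima)


labels, k = base_clusters([[1.0, 2.0, 3.0, 2.0, 1.0]])
print(labels, k)  # [[0, 0, 0, 1, 1]] 2
```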

Page 13

Tree-structured component planes clusters representation
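The tree itself is only shown as a figure. As a hedged illustration of how such a tree can be built, the sketch below merges component planes bottom-up with average linkage. It swaps in a plain correlation distance, 1 - |r|, where the slides use the SOM distance matrix, so it illustrates the tree structure rather than reproducing the authors' method; the names and data are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def build_tree(planes):
    """Average-linkage agglomeration of component planes into a binary tree.

    planes: dict name -> flattened plane. Distance is 1 - |r| (an assumption).
    Returns nested tuples such as ("C", ("A", "B")).
    """
    names = sorted(planes)
    dist = {(a, b): 1.0 - abs(pearson(planes[a], planes[b]))
            for a in names for b in names if a != b}
    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                avg = sum(dist[(a, b)] for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


planes = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5], "C": [4, 1, 3, 2]}
print(build_tree(planes))  # ('C', ('A', 'B'))
```

Cutting such a tree at different heights yields the different clustering levels: near the leaves, only the most strongly related planes are grouped together.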

Page 14

Case study: sugar cane culture

The agricultural productivity of a geographic area depends on many agro-ecological variables, such as soil and terrain characteristics, climatic constraints, human behavior and management.

[Figure: diagram linking soil, management, climate and genotype to productivity.]

Page 15

A new approach

1358 experiments

[Figure: sowing, growing and harvest stages, with soil, management, climate and genotype variables.]

Each agro-ecological event is unique in time and space, but events can share similar characteristics. Finding these similar behaviors makes it possible to discover why and how agro-ecological variables affect crop development and, therefore, agricultural productivity.

Page 16

The variables

Climate variables (continuous data), each recorded after seed (AS) and before harvest (BH):
- Average Temperature (TempAvg)
- Average Relative Humidity (RHAvg)
- Radiation (Rad)
- Precipitation (Prec)

Soil variables:
- Order (Ord): 3 orders (Ord1, Ord2, Ord3); nominal data
- Texture (Tex): ordinal data
- Depth (Dee): ordinal data

Topographic variables:
- Landscape (Ls): 3 landscapes (Ls1, Ls2, Ls3); nominal data
- Slope (Sl): ordinal data

Other variables:
- Water Balance (WB): ordinal data
- Variety (Var): 3 varieties (V1, V2, V3); nominal data
- Production

Total: 54 variables

[Figure: timeline of months 1 to 4 after seed (AS) and months 1 to 4 before harvest (BH).]
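Component names such as V1 and the three orders Ord1 to Ord3 suggest that each nominal variable was expanded into one binary component per category before training. The sketch below shows one way to build such vectors; the integer coding of ordinal levels, the min-max scaling, the field names and the category values are assumptions for illustration.

```python
def encode_records(records, nominal, ordinal):
    """Turn field dicts into numeric component vectors scaled to [0, 1].

    nominal: {field: (prefix, categories)} -> one binary component per category.
    ordinal: {field: ordered levels} -> the level's position (0, 1, 2, ...).
    Any other field is treated as continuous. All records must share the same fields.
    """
    rows = []
    for rec in records:
        row = {}
        for field, value in rec.items():
            if field in nominal:
                prefix, categories = nominal[field]
                for i, cat in enumerate(categories, start=1):
                    row[f"{prefix}{i}"] = 1.0 if value == cat else 0.0
            elif field in ordinal:
                row[field] = float(ordinal[field].index(value))
            else:
                row[field] = float(value)
        rows.append(row)
    names = sorted(rows[0])
    for name in names:
        lo, hi = min(r[name] for r in rows), max(r[name] for r in rows)
        for r in rows:
            # Min-max scaling; a constant component becomes 0.
            r[name] = (r[name] - lo) / (hi - lo) if hi > lo else 0.0
    return names, [[r[name] for name in names] for r in rows]


# Hypothetical records: variety labels and slope levels are placeholders.
records = [{"Var": "a", "Sl": "flat", "Ra1AS": 10.0},
           {"Var": "c", "Sl": "steep", "Ra1AS": 20.0}]
names, vectors = encode_records(records,
                                nominal={"Var": ("V", ["a", "b", "c"])},
                                ordinal={"Sl": ["flat", "moderate", "steep"]})
print(names)    # ['Ra1AS', 'Sl', 'V1', 'V2', 'V3']
print(vectors)  # [[0.0, 0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0, 1.0]]
```

Scaling every component to the same range keeps variables with large numeric units, such as precipitation, from dominating the distances used by the SOM.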

Page 17

Component planes / sugar cane culture

Page 18

Tree-structured component planes clusters representation / sugar cane culture

Page 19

Tree-structured component planes clusters representation / sugar cane culture

Page 20

Classification of agro-ecological variables related with productivity (initial analysis)

[Figure: Ra1BH, Ra1AS and Productivity BMUs. Magnitude of the variables (y-axis) for Ra1AS, Ra1BH and Productivity across the best matching units (x-axis).]

BMUs of the component planes: productivity, radiation 1 month before harvest (Ra1BH) and radiation 1 month after seed (Ra1AS).

Page 21

Conclusions

Visualization of agro-ecological variables is very important but difficult because of the high dimensionality of the data. The SOM algorithm is a powerful technique for dealing with this problem, but it remains a tool for exploratory analysis.

This study presents a methodology to enhance the analysis of component planes. The methodology improves correlation hunting in the component planes through a tree-structured cluster representation based on the SOM distance matrix.

As an example application of the methodology, analyzing the resulting groups of agro-ecological variables and cultivated zones revealed a relationship linking radiation after seed and radiation before harvest with medium-to-high productivity.

We now plan to develop data mining and visualization techniques based on this methodology in order to improve decision support in sugar cane cultivation.

Page 22

The end

Thanks for new ideas and directions to explore!