GeoFIS: Spatial data processing for decision making
Natalia Iglesias
Documentation - Rosario 2019
Contents
● Overview
  ○ Introduction
  ○ Architecture of GeoFIS
  ○ Download and Install
● Starting with GeoFIS
  ○ User Interface
  ○ Vocabulary
  ○ Data Formats
  ○ Data Visualization
● Basic GeoFIS operations
  ○ Filter
  ○ Variogram
  ○ Interpolation
● Advanced GeoFIS operations
  ○ Zoning
  ○ Opportunity Index
  ○ Data Fusion
● Tutorials
  ○ Dataset
  ○ Project creation
  ○ Filter and spatial structure evaluation
  ○ Vector to Raster
  ○ Data Aggregation
  ○ Zoning
● References
  ○ Publications
  ○ FisPro
Overview
Introduction
● GeoFIS is a free and open source software platform for high spatial resolution data processing with a decision support perspective.
● GeoFIS is developed by a group of researchers from several French research and education institutions in the field of agriculture and environment (INRA, Irstea, Montpellier SupAgro).
Architecture of GeoFIS [1]
Download and Install
● https://www.geofis.org/en/install/
  ○ Requirements
    ■ Java installed (version >= 1.8)
    ■ R software (version >= 3.5)
Starting with GeoFIS
User Interface
● The left panel shows the object hierarchy of a project in GeoFIS.
● The right panel shows maps and graphical objects.
● Menu bar.
User Interface
A project is a set of maps. It is saved as an XML file.
User Interface
Add or remove a map in the project. A map can have one or more information layers.
User Interface
● Add an information layer to a map of the project.
User Interface
Six types of operations can be performed on an information layer.
Vocabulary
● A project is a set of maps. It can be saved as an XML file.
● A map is composed of one or more information layers.
● A layer contains data. Each layer can come from a different type of file.
Data Formats
● Input data
  ○ Different types of layers can be added:
    ■ Vector data from ESRI shapefiles: points, lines or polygons.
    ■ Data from CSV files or shapefiles.
    ■ Raster data from GeoTIFF (tif, tiff), World file (jpeg, png, gif) or JPEG 2000 (jp2, j2k, jpeg2000) image files, used as a base map. No operation is available for raster data.
  ○ Watch the video tutorial on importing data (https://www.geofis.org/en/documentation-en/starting-with-geofis/#video-starting-import-data).
Data Formats
● Input data
  ○ CSV file
    ■ The first two columns must be the (x, y) coordinates.
    ■ A coordinate system must be selected.
    ■ Supported delimiters: comma, semicolon, tabulation and space.
    ■ Header: one line. The first row must contain the attribute names.
    ■ The file must contain only numeric data (with a point “.” as decimal separator), without missing values.
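The CSV rules above can be checked before importing a file. The sketch below is a hypothetical pre-import validator written for illustration (it is not part of GeoFIS): it assumes unquoted fields, tries the supported delimiters in a fixed order on the header line, and rejects non-numeric, comma-decimal or missing values.

```python
import math

SUPPORTED_DELIMITERS = (",", ";", "\t", " ")  # comma, semicolon, tab, space

def detect_delimiter(header_line):
    """Return the first supported delimiter found in the header line."""
    for delim in SUPPORTED_DELIMITERS:
        if delim in header_line:
            return delim
    raise ValueError("no supported delimiter found in header")

def read_geofis_csv(text):
    """Parse CSV text and enforce the rules above: one header line, x and y
    first, numeric values only, "." as decimal separator, no missing values."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    delim = detect_delimiter(lines[0])

    def split(line):
        # Runs of spaces count as one separator; other delimiters are single chars.
        parts = line.split() if delim == " " else line.split(delim)
        return [p.strip() for p in parts]

    header = split(lines[0])
    if len(header) < 2:
        raise ValueError("the first two columns must be x and y")
    rows = []
    for n, line in enumerate(lines[1:], start=2):
        fields = split(line)
        if len(fields) != len(header) or "" in fields:
            raise ValueError(f"line {n}: missing value")
        row = []
        for field in fields:
            try:
                value = float(field)  # "3,5" is rejected: only "." is a decimal point
            except ValueError:
                raise ValueError(f"line {n}: non-numeric value {field!r}") from None
            if not math.isfinite(value):  # "nan"/"inf" would hide missing data
                raise ValueError(f"line {n}: missing value")
            row.append(value)
        rows.append(row)
    return header, rows
```

For example, `read_geofis_csv("x;y;PW\n1.0;2.0;3.5\n")` returns the header and one numeric row, while a row such as `1;2;3,5` raises a `ValueError`.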
Data Formats
● Input data
  ○ Restrictions on attribute names (inherited from R software). The first row must contain the attribute names, and:
    ■ The first character must be a letter.
    ■ Other characters can be letters, digits or “underscore” as separator.
    ■ Accented characters are not allowed.
    ■ Each attribute name must be unique.
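The naming rules can be expressed as a regular expression plus a uniqueness check. This is an illustrative sketch of the rules as listed here, not GeoFIS's own validation code; restricting the character classes to ASCII is what rejects accented letters.

```python
import re

# First character: an ASCII letter; then ASCII letters, digits or "_".
# Restricting the classes to ASCII rejects accented characters.
ATTRIBUTE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def attribute_name_problems(names):
    """Return (name, reason) pairs for names breaking the rules above."""
    problems = []
    seen = set()
    for name in names:
        if not ATTRIBUTE_NAME.fullmatch(name):
            problems.append((name, "invalid character or first character"))
        if name in seen:
            problems.append((name, "duplicate"))
        seen.add(name)
    return problems
```

An empty result means the header can be used as is; `EC_Deep` passes, while `2PW`, `Año` or `Crop Load` are reported.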
Data Formats
● Coordinate systems
  ○ There are two types:
    ■ Geographic coordinate systems (unprojected): a reference system using latitude and longitude to define the location of points on the surface of the earth (e.g. WGS84, in degrees).
    ■ Projected coordinate systems: a map projection is the systematic transformation of locations on the earth (latitude/longitude) to planar coordinates (e.g. UTM, in meters).
  ○ Geographic coordinate systems (lat/lon) are good for locating positions on the surface of the earth, but lat/lon coordinates are not efficient for computing distances and areas.
  ○ GeoFIS uses only projected coordinate systems.
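To see why lat/lon is poorly suited to distance computations, the sketch below measures the ground distance covered by the same longitude step at different latitudes with the haversine formula. It assumes a spherical Earth with a mean radius of 6,371 km, a simplification of the WGS84 ellipsoid; it is an illustration, not a coordinate converter.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# The same 0.001 degree step in longitude covers very different ground distances.
for lat in (0, -33, 60):
    print(lat, round(haversine_m(lat, 0, lat, 0.001), 1))
```

The distance shrinks with the cosine of the latitude, so a Euclidean distance computed directly on degrees would be distorted, whereas projected coordinates (e.g. UTM, in meters) can be used directly.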
Data Visualization
● Style: can be applied to a layer to visualize its data.
  ○ The “Default” style.
  ○ The “Geometry” style (vector layers).
    ■ The layer data are displayed according to their geometry (point, polygon, line). You can choose the shape, color, size and label of the data. All data in the layer are displayed the same way.
  ○ The “Palette” style (vector or raster layers).
    ■ The layer data are displayed in different shades of color based on the value of an attribute. You can choose the attribute, the color palette, the number of classes and the classification method (equal intervals, Jenks, equal count).
● The style can be exported to or imported from an SLD file.
● Watch the video tutorial on styles (https://www.geofis.org/en/documentation-en/starting-with-geofis/#video-starting-style).
Data Visualization
● Geometry Style:
  ○ Allows you to change the geometry, size and color of the points. All data in the layer are shown in the same way.
Data Visualization
● Palette Style:
  ○ You can choose the color palette, the number of classes and the classification method for each attribute.
  ○ Class: number of bins into which the attribute values are divided.
  ○ Palette: defines the color scheme. Four types of palettes are available: numerical and sequential palettes follow a gradient; diverging palettes typically span three distinct colors; qualitative palettes consist of easily distinguishable colors.
  ○ Classifier: algorithm used to create the class breaks automatically.
    ■ Equal interval: divides the input values into bins of equal range.
    ■ Jenks: identifies groups of similar values in the data and maximizes the differences between classes.
    ■ Equal count: ensures that the same number of observations falls into each bin.
    ■ Unique Interval: one class per unique value.
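The simpler classifiers can be sketched in a few lines. This is an illustration of the general techniques, not GeoFIS's implementation: the break conventions (classes are [break_i, break_i+1), the last class includes the maximum) are assumptions, and Jenks natural breaks is omitted because it requires an optimisation over break positions.

```python
import bisect

def equal_interval_breaks(values, k):
    """k classes of equal width between the minimum and the maximum."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(k)] + [hi]

def equal_count_breaks(values, k):
    """Breaks taken at sorted positions so each class holds about n/k values."""
    s = sorted(values)
    n = len(s)
    return [s[0]] + [s[(i * n) // k] for i in range(1, k)] + [s[-1]]

def unique_classes(values):
    """One class per distinct value."""
    return sorted(set(values))

def classify(value, breaks):
    """Index of the class [breaks[i], breaks[i+1]); the last class includes the maximum."""
    i = bisect.bisect_right(breaks, value) - 1
    return min(max(i, 0), len(breaks) - 2)
```

With many tied values, equal count can produce repeated breaks and therefore empty classes, which is one reason a palette may show fewer classes than requested.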
Basic GeoFIS operations
Filter
● This operation analyses the distribution of the data using a histogram, $n = \sum_{i=1}^{k} m_i$, where $n$ is the total number of observations and $m_i$ the number of observations in bin $i$, in order to quickly locate and filter out outliers. A new “filtered” layer is generated.
● The number of bins (k) and the break values can be customized. Several choices are possible, including equally spaced bins, bins with an equal number of elements, or manually selected break values.
● R function: hist
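A plain-Python sketch of the idea, assuming equally spaced bins (it is not R's `hist`, which chooses its own breaks by default): the counts $m_i$ always sum to $n$, and filtering keeps only the values inside user-chosen limits. The helper names are hypothetical.

```python
def equal_width_histogram(values, k):
    """Return the k + 1 break values and the counts m_1..m_k of k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        i = int((v - lo) / width) if width else 0
        counts[min(i, k - 1)] += 1  # the maximum falls into the last bin
    return [lo + i * width for i in range(k + 1)], counts

def filter_range(values, low, high):
    """Keep the values inside [low, high], e.g. after spotting outliers."""
    return [v for v in values if low <= v <= high]
```

For instance, with the values `[1, 2, 2, 3, 3, 3, 4, 50]` and 7 bins, all but one observation fall into the first bin; the isolated 50 is a candidate outlier and `filter_range(values, 0, 10)` drops it.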
Semi-variogram
● Describes the degree of spatial dependence of the data.
  ○ The variogram model often needs expert tuning to fit the data set (spatial resolution, shape and size of the study area).
  ○ The variogram model can be saved to a .variogram file (XML) for reuse on new data or for export to other software.
● $\gamma(h) = \frac{1}{2N} \sum_{i=1}^{N} \left[ Z(x_i) - Z(x_i + h) \right]^2$
● The semivariance is calculated for several distances h, where Z(x_i) is the value of the variable at site x_i, Z(x_i + h) is the value at a site separated from x_i by a distance h, and N is the number of pairs separated by that distance.
Semi-variogram
● Selection of variogram parameters:
  ○ When Boundaries is checked, you can change the settings:
    ■ Number of points: defines the number of intervals over which the semivariance is calculated.
    ■ Starting distance: defines the position of the first interval.
    ■ Max distance: defines the maximum distance considered.
  ○ When Cloud is checked, the semivariogram cloud is calculated.
  ○ Once done, click on Compute.
● R package: gstat, R function: variogram
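The binned estimator can be sketched in plain Python. This is an illustration, not gstat's `variogram`: `n_lags`, `start` and `max_dist` are hypothetical names standing for the Number of points, Starting distance and Max distance settings; pairs are binned into equal-width distance intervals and each unordered pair is counted once. Keeping every pair's half squared difference instead of averaging per bin would give the semivariogram cloud.

```python
import math

def empirical_semivariogram(points, values, n_lags, start, max_dist):
    """Return (mean distance, gamma, number of pairs) for each non-empty lag,
    with gamma = sum((z_i - z_j)^2) / (2 N) over the N pairs in the lag."""
    width = (max_dist - start) / n_lags
    sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d < start or d >= max_dist:
                continue  # pair outside [start, max_dist)
            b = min(int((d - start) / width), n_lags - 1)
            sums[b] += (values[i] - values[j]) ** 2
            dist_sums[b] += d
            counts[b] += 1
    return [(dist_sums[b] / counts[b], sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_lags) if counts[b]]
```

On four aligned points with a linear trend (values 0, 1, 2, 3 at spacing 1), the estimate grows as h²/2, the expected behaviour for a trend rather than a stationary process.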
Semi-variogram
● It is possible to choose the variogram model that best fits the data.
● Selection of the variogram model and associated parameters:
  ○ Theoretical model:
    ■ Exponential: similar to spherical, but approaches the sill asymptotically; the practical range is the distance at which 95% of the sill is reached.
    ■ Gaussian: its shape follows a normal (Gaussian) probability distribution curve.
    ■ Linear: spatial variability increases linearly with distance.
    ■ Spherical: (almost) linear near the origin, up to the range at which the phenomenon stabilizes.
● R package: gstat, R function: vgm
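For reference, the four theoretical models can be written as follows, with nugget $c_0$, partial sill $c$ and range parameter $a$. This parametrisation follows the convention commonly used with gstat's `vgm`; that GeoFIS passes exactly these parameters is an assumption. With it, the exponential model reaches about 95% of its sill at $h = 3a$ and the Gaussian model at $h = \sqrt{3}\,a$, since $1 - e^{-3} \approx 0.95$.

```latex
\gamma(0) = 0, \quad \text{and for } h > 0:
\begin{aligned}
\gamma_{\mathrm{sph}}(h) &= \begin{cases}
  c_0 + c\left[\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right] & 0 < h \le a \\[4pt]
  c_0 + c & h > a
\end{cases} \\
\gamma_{\mathrm{exp}}(h) &= c_0 + c\left[1 - e^{-h/a}\right] \\
\gamma_{\mathrm{gau}}(h) &= c_0 + c\left[1 - e^{-(h/a)^2}\right] \\
\gamma_{\mathrm{lin}}(h) &= c_0 + c\,\frac{h}{a}
\end{aligned}
```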
Semi-variogram
● Selection of the variogram model and associated parameters:
  ○ Nugget: the value at which the semivariogram (almost) intercepts the y-axis, i.e. the semivariance at very small distances.
  ○ Partial sill: the value that the semivariogram model attains at the range (the value on the y-axis) is called the sill. The partial sill is the sill minus the nugget.
  ○ Range: the distance at which the model first flattens out.
  ○ RMSE (root mean square error): difference between the proposed model and the observed points; it helps fit the best possible model.
● At any time you can reset the model and its parameters.
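RMSE can be used to compare candidate parameter values. The sketch below is hypothetical: it computes an unweighted RMSE between a spherical model (gstat-style parametrisation, an assumption) and the empirical semivariances, and runs a naive grid search over the range with nugget and partial sill held fixed. GeoFIS's own fitting procedure may differ.

```python
import math

def spherical(h, nugget, psill, rng):
    """Spherical model; gamma(0) = 0 and the nugget applies for h > 0."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def rmse(lags, gammas, model, **params):
    """Unweighted root mean square error between model and empirical points."""
    errors = [(model(h, **params) - g) ** 2 for h, g in zip(lags, gammas)]
    return math.sqrt(sum(errors) / len(errors))

def best_range(lags, gammas, nugget, psill, candidate_ranges):
    """Pick the range with the lowest RMSE, nugget and partial sill fixed."""
    return min(candidate_ranges,
               key=lambda r: rmse(lags, gammas, spherical,
                                  nugget=nugget, psill=psill, rng=r))
```

A grid search keeps the example transparent; a real fit would usually adjust all parameters together, for instance by weighted least squares.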
Interpolation
● GeoFIS can be used to interpolate data with:
  ○ a deterministic method (inverse distance weighting, IDW), which creates surfaces from measured points based on neighbour values;
  ○ a geostatistical method (kriging), which relies on the spatial autocorrelation modelled by the variogram and also provides an estimation variance.
● We observe a property of a phenomenon at a limited number of sample locations and we are interested in the property value at unsampled locations, so we have to predict it there.
● Interpolation is used to convert vector data to raster data.
Interpolation
● Deterministic method (inverse distance weighting, IDW)
  ○ IDW is a simple method for estimating the value at unsampled locations:
    $z^*(s_0) = \frac{\sum_{i=1}^{n} z_i \, d_{0,i}^{-p}}{\sum_{i=1}^{n} d_{0,i}^{-p}}$
    where z* is the estimated value at the unsampled location s_0, z_i is the value at location s_i, d_{0,i} is the distance from s_0 to the i-th data location s_i, and p is the power selected for the inverse distance estimation.
  ○ The search distance determines how many points (n) are used.
  ○ Each measured point has a local influence that diminishes in proportion to the inverse of the distance raised to the power p. If p = 0, there is no decrease with distance. The default value is p = 2, although there is no theoretical justification for this choice.
  ○ Output: square cells, organised in a regular lattice.
● R package: gstat, R function: idw
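The IDW estimator can be sketched directly from its formula. This is not gstat's `idw`: `max_dist` is a hypothetical name for the search distance, and `idw_grid` (also hypothetical) evaluates the estimate at the centre of each square cell of a regular lattice.

```python
import math

def idw(x0, y0, points, values, p=2.0, max_dist=math.inf):
    """Inverse distance weighting estimate z*(s0) at (x0, y0)."""
    num = den = 0.0
    for (x, y), z in zip(points, values):
        d = math.hypot(x - x0, y - y0)
        if d == 0:
            return z  # exact interpolator at a sampled location
        if d > max_dist:
            continue  # point outside the search distance
        w = d ** -p  # with p = 0 every point gets the same weight
        num += w * z
        den += w
    return num / den if den else math.nan  # nan: no point within max_dist

def idw_grid(points, values, xmin, ymin, ncols, nrows, cell, **kw):
    """Estimate at the centre of each square cell of a regular lattice."""
    return [[idw(xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell,
                 points, values, **kw) for c in range(ncols)]
            for r in range(nrows)]
```

With two samples of values 0 and 10 at x = 0 and x = 2, the estimate at x = 0.5 is 5.0 for p = 0 (plain mean) and close to 1.0 for p = 2, because the nearer sample dominates.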
Interpolation
● Geostatistical method (kriging)
  ○ Ordinary kriging is based on the assumption that variation is random and spatially dependent, and that the random process is intrinsically stationary: it has a constant mean, and the variance of the differences between two locations depends only on their separation in distance and direction.
  ○ Grid size: determines the resolution of the estimated map. The ideal choice depends on the application and on the data distribution; a larger grid (more cells) produces more data.
  ○ Border: import the map contour (.shp polygon) or create it from the data. Without a border, a convex hull is used; in this case the grid depends on the data locations.
● R package: gstat, R function: krige
Interpolation
● Geostatistical method (kriging)
  ○ Import the variogram created by the variogram process. The nugget, sill, range and max distance values are filled in automatically.
  ○ Select the number of nearest neighbours.
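The ordinary kriging system can be illustrated with a small, dependency-free sketch. It is not gstat's `krige`: it solves the standard ordinary kriging equations written with semivariances, $\sum_j \lambda_j \gamma(x_i - x_j) + \mu = \gamma(x_i - x_0)$ with $\sum_j \lambda_j = 1$, using only the `n_max` nearest samples (a hypothetical parameter mirroring the number of nearest neighbours). The spherical model and its parameters are assumptions for illustration.

```python
import math

def spherical(h, nugget, psill, rng):
    """Spherical semivariogram; gamma(0) = 0."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(x0, points, values, gamma, n_max=16):
    """Return (estimate, kriging variance, weights) at x0."""
    near = sorted(range(len(points)), key=lambda i: math.dist(points[i], x0))[:n_max]
    n = len(near)
    # Kriging matrix [Gamma 1; 1^T 0] and right-hand side [gamma_0; 1].
    a = [[gamma(math.dist(points[i], points[j])) for j in near] + [1.0] for i in near]
    a.append([1.0] * n + [0.0])
    g0 = [gamma(math.dist(points[i], x0)) for i in near]
    sol = solve(a, g0 + [1.0])
    lam, mu = sol[:n], sol[n]  # weights and Lagrange multiplier
    estimate = sum(l * values[i] for l, i in zip(lam, near))
    variance = sum(l * g for l, g in zip(lam, g0)) + mu
    return estimate, variance, lam
```

At a sampled location the weights collapse onto that sample, so the estimate equals the observation and the kriging variance is zero; at the centre of a square of four samples the weights are all 0.25.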
Advanced GeoFIS operations
Zoning
● Zoning is the process of dividing land into zones. Each type of zone determines a site-specific management of the land.
● GeoFIS uses a segmentation algorithm, inspired by an image-processing region-merging algorithm, to “zone” data layers.
● Segmentation is the process of defining zones inside an image. Segmentation methods can be classified into two main families: contour-based and region-based. The first family is more suitable for object recognition, while the second is useful when there are no well-defined borders, which may well correspond to agricultural data zoning.
● The segmentation algorithm operates on either irregular or gridded (interpolated) data to generate potential management zones.
Zoning
● Algorithm [4]
  ○ Start: one point = one zone
  ○ Iterate on:
    ■ Merge the pair of neighbouring zones that are closest in the attribute space
    ■ Update the zone list and zone neighbours
  ○ Until all zones are merged
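The merging loop can be sketched as follows. This is a simplified illustration of the region-merging idea, not the GeoFIS code: the neighbour lists are assumed to be precomputed (e.g. from the Voronoi tessellation described next), the attribute is univariate, and the loop stops at a requested number of zones instead of recording the whole merge hierarchy.

```python
from statistics import mean

def merge_zones(values, neighbours, n_zones, zone_distance=max):
    """Greedy region merging: one zone per point at start, then repeatedly merge
    the closest pair of neighbouring zones until n_zones zones remain.

    values: one attribute value per point; neighbours: point index -> adjacent
    point indices (assumed to come from a Voronoi tessellation computed elsewhere)."""
    zones = {i: {i} for i in range(len(values))}
    adj = {i: set(neighbours[i]) for i in range(len(values))}

    def dist(a, b):
        # Aggregate all point-to-point distances between the two zones.
        return zone_distance([abs(values[i] - values[j])
                              for i in zones[a] for j in zones[b]])

    while len(zones) > n_zones:
        candidates = [(dist(a, b), a, b) for a in zones for b in adj[a] if a < b]
        if not candidates:
            break  # the remaining zones are not connected to each other
        _, a, b = min(candidates)
        zones[a] |= zones.pop(b)  # merge zone b into zone a
        for c in adj.pop(b):      # zone a inherits the neighbours of b
            adj[c].discard(b)
            if c != a:
                adj[c].add(a)
                adj[a].add(c)
    return sorted(sorted(z) for z in zones.values())
```

Passing `max`, `mean` or `min` as `zone_distance` corresponds to the Maximum, Mean and Minimum zone distance options; only neighbouring zones are ever compared, which keeps the zones spatially connected.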
Zoning
● The input parameters drive the algorithm:
  ○ the border can be used to limit the processed area,
  ○ the neighbourhood relation can be filtered by a minimal common edge length, and
  ○ various distances are used for zone aggregation.
● Border options: convex hull or file. Only data points within the border polygon are processed.
● At start, a Voronoi tessellation is used to convert each data point into a zone and to define the initial neighbourhood.
Zoning
● Neighborhood: all or line segment length. In the latter case, to be considered neighbours, two Voronoi polygons must share an edge of at least the specified minimal length.
Zoning
● The algorithm can be parameterised by different criteria for the distance metric:
  ○ Univariate distance: Euclidean or fuzzy [5, 6]. It computes the distance between two data points in the one-dimensional attribute space.
  ○ Multivariate combination: Euclidean (p = 2) or Minkowski. The combination is needed when the zoning is done according to several attributes. In this case, each univariate (elementary) distance is computed and normalized to the unit interval. These partial distances are then aggregated to yield the distance between two data points in the multidimensional attribute space.
  ○ Zone distance: Minimum, Mean or Maximum. At each step, the algorithm merges the two zones with the minimum zone distance. To compute the distance between two zones, all the data points included in the two zones are considered and the aggregation is done using this parameter. Maximum is the default value.
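The three levels of distance can be chained as in the sketch below. It assumes that each univariate distance is normalised by the attribute range (the text only says "normalized in a unit interval") and that the Minkowski combination is not rescaled afterwards; the fuzzy univariate distance of [5, 6] is not covered. All names are hypothetical.

```python
from statistics import mean

def univariate(a, b, lo, hi):
    """Euclidean distance in one attribute, normalised to [0, 1] by its range."""
    return abs(a - b) / (hi - lo)

def combine(partials, p=2.0):
    """Minkowski combination of partial distances; p = 2 is the Euclidean case."""
    return sum(d ** p for d in partials) ** (1.0 / p)

def point_distance(u, v, ranges, p=2.0):
    """Distance between two data points in the multidimensional attribute space."""
    return combine([univariate(a, b, lo, hi) for a, b, (lo, hi) in zip(u, v, ranges)], p)

def zone_distance(zone_a, zone_b, ranges, how="max", p=2.0):
    """Aggregate all point-to-point distances between two zones."""
    ds = [point_distance(u, v, ranges, p) for u in zone_a for v in zone_b]
    return {"min": min, "mean": mean, "max": max}[how](ds)
```

Choosing "max" makes a merge depend on the most dissimilar pair of points, which is why it corresponds to the prudent option used later in the tutorial.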
Zoning
● A given number of zones can be selected for visualization (Number of zones).
Zoning
● Post-processing allows a final filtering of small zones (according to their area or their number of points).
● When a small zone is included in another one, it is simply merged with its surrounding neighbour. When a small zone shares a border with several neighbours, it is merged with the closest one, according to the between-zone distance.
● Watch the video tutorial.
Zoning
● The zoning algorithm produces a map with several attributes for each zone. This map can be exported as a shapefile:
  ○ Geometry of the zone: a polygon delimiting the zone.
  ○ Attributes (shown here for zone 3):
    ■ Unique identifier of the zone
    ■ Number of points inside the zone
    ■ Area of the zone
    ■ For each attribute: mean and standard deviation
Data Fusion
● Information fusion is done with a specific goal:
  ○ for instance, parameter estimation from various sensors, or risk level evaluation from different sources of information. In the case of spatial data, the objective is to compare different alternatives (locations, sites or zones) according to all the available information.
● Only values on the same scale and with the same meaning can be aggregated. When the data are of the same kind, as in sensor fusion, there is no problem. This is not true in the general case.
● The most popular aggregation operator is the weighted mean, but its modelling power is limited.
Data Fusion
● Steps
  ○ Each information layer is transformed into an expert layer: the numerical attribute is transformed into degree values (from 0 to 1) according to rules, using a fuzzy membership function.
  ○ Expert layers are combined using an aggregation operator: WAM, OWA or FIS.
Data Fusion
● All the attributes must belong to the same layer. An attribute must be selected as an input.
● Select the function that turns raw data into a satisfaction degree. Four types of membership functions (MF) are available:
  ○ Semi-trapezoidal inf: low values are preferred;
  ○ Semi-trapezoidal sup: high values are preferred;
  ○ Trapezoidal: around an interval;
  ○ Triangular: about a value.
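The four membership function shapes can be sketched as piecewise-linear functions returning a satisfaction degree in [0, 1]. The breakpoint names `a`, `b`, `c`, `d` are hypothetical, and the exact parametrisation used by GeoFIS is an assumption; only the shapes follow the list above.

```python
def semi_trapezoidal_inf(x, a, b):
    """Low values preferred: degree 1 up to a, decreasing linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def semi_trapezoidal_sup(x, a, b):
    """High values preferred: degree 0 up to a, increasing linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def trapezoidal(x, a, b, c, d):
    """Around an interval: degree 1 on [b, c], 0 outside ]a, d[, linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def triangular(x, a, b, c):
    """About a value: degree 1 at b only (a trapezoid whose core is a single point)."""
    return trapezoidal(x, a, b, b, c)
```

An expert statement such as "values between 8 and 11 are good" fixes the core [b, c] of a trapezoid; the support limits a and d still have to be chosen to say how quickly satisfaction drops outside it.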
Data Fusion
● Membership function parameters
Data Fusion
● Aggregation
  ○ An aggregation operator is defined for each aggregated variable. Three are currently available:
    ■ WAM: the weights are assigned to the information sources;
    ■ OWA: the weights are assigned to the positions in the ordered distribution;
    ■ FIS: a fuzzy inference system (FIS) including linguistic rules. Linguistic rules are used within fuzzy inference systems for approximate reasoning.
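The difference between WAM and OWA is where the weights go. In the sketch below, WAM weights follow the sources, while OWA weights follow the rank of each degree once the degrees are sorted in decreasing order (the usual OWA convention; the ordering used inside GeoFIS is an assumption). Degrees are the satisfaction values produced by the membership functions.

```python
def wam(degrees, weights):
    """Weighted arithmetic mean: weight k belongs to information source k."""
    return sum(w * d for w, d in zip(weights, degrees)) / sum(weights)

def owa(degrees, weights):
    """Ordered weighted average: weight k belongs to the k-th largest degree."""
    ordered = sorted(degrees, reverse=True)
    return sum(w * d for w, d in zip(weights, ordered)) / sum(weights)
```

With OWA, putting all the weight on the last position gives the minimum (a cautious "all criteria must be satisfied" behaviour), putting it on the first gives the maximum, and equal weights give the plain mean.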
Data Fusion
● FIS aggregation operator:
  ○ Granularity: the number of linguistic terms for each variable.
  ○ Rules are used within the FIS for approximate reasoning. The maximum number of rules is the product of the input granularities.
  ○ Rule conclusions must be in [0, 1] and may be crisp or fuzzy.
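A minimal FIS aggregation can be sketched as a zero-order Takagi-Sugeno system: each input degree is described by evenly spaced triangular linguistic terms (a strong fuzzy partition on [0, 1]), one rule exists per combination of terms, so the rule count is the product of the granularities, and each rule has a crisp conclusion in [0, 1]. Product conjunction and weighted-average defuzzification are assumptions; FisPro-based FIS in GeoFIS may use other operators and fuzzy conclusions.

```python
from itertools import product

def strong_partition(n):
    """n >= 2 triangular terms evenly spaced on [0, 1]; degrees sum to 1."""
    centres = [i / (n - 1) for i in range(n)]
    step = 1.0 / (n - 1)

    def degree(k, x):
        return max(0.0, 1.0 - abs(x - centres[k]) / step)
    return degree

def fis_aggregate(inputs, granularities, conclusions):
    """conclusions maps a tuple of term indices (one per input) to a crisp value."""
    parts = [strong_partition(g) for g in granularities]
    num = den = 0.0
    for rule in product(*[range(g) for g in granularities]):  # all term combinations
        w = 1.0
        for part, k, x in zip(parts, rule, inputs):
            w *= part(k, x)  # rule firing strength (product conjunction)
        num += w * conclusions[rule]
        den += w
    return num / den
```

For two inputs with two terms each (Low, High) there are 2 × 2 = 4 rules; with conclusions 0, 0.5, 0.5 and 1, the output moves smoothly from 0 to 1 as both degrees increase.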
Data Fusion
● The data fusion result can be saved as an ESRI shapefile for use in the zoning process.
Data Fusion
● The data fusion result can be saved as a CSV file for post-processing with R.
Tutorials
Workflow [1]
● Generic flow of data in precision agriculture, with the main processing steps from raw data processing to decision-making.
Process: From raw data to dataset
Outside of GeoFIS
Raw data: Coordinates converter
● WGS 84 to UTM
  ○ R scripts: from/to shapefile, from/to CSV file
  ○ To find the EPSG code: http://epsg.io/
● Convert geographic units
Raw data: Border creation
● Using QGIS
  ○ Step 1: add the data layer.
  ○ Result: data layer added.
Raw data: Border creation
● Step 2: create a layer for the border.
Raw data: Border creation
● Draw the polygon: on the map, add points with a left mouse click and finish with a right click. Set id = 1.
● To finish editing, toggle the editing mode off and save the layer.
● Step 3: create the border and save it to the layer.
Dataset
● CropLoad_2014S (precision viticulture example)
● The data were obtained from various sensors and mapped onto a common grid (more information: https://efficientvineyard.com).
● File: CropLoad_2014S.csv
● Variables
  ○ EC_Deep (apparent electrical conductivity: soil variability)
  ○ PW (pruning weight: vine size)
  ○ Crop Load (ratio of crop weight to pruning weight)
Process: From dataset to information layers
Add Layers
● Data layer
Add Layers
● Data layer added
  ○ Zoom
  ○ Info on point 1601
Add Layers
● Border layer (shapefile)
  ○ Border layer added
Filtering (remove erroneous data)
● Histogram of the variables
Spatial structure evaluation
● Semi-variogram of the variables
Visualization
● Style change on the border
Visualization
● Style change on the data
  ○ Visual data analysis by classifier selection
Vector to Raster
● IDW with and without border
Vector to Raster
● IDW with border and different p values (panels: p = 1, p = 2, p = 3, p = 4)
Vector to Raster
● IDW with border and different max distance values (panels: Dm = 10, Dm = 100, Dm = ∞, Dm = 5.440, Dm = 1000)
Vector to Raster
● Kriging with and without border
Vector to Raster
● Kriging with border and different N min and N max settings (panels: N min = 1, N max = 100; N min = 4, N max = 90; N min = 1, N max = 10)
Process: From information to decision
Question to answer
● Where do I have potential to increase production (yield)?
  ○ Multicriteria decision according to:
    ■ Apparent electrical conductivity (EC_Deep)
    ■ Pruning weight (PW)
    ■ Crop Load
(panels: EC_Deep, PW, CropLoad)
Data fusion
● Input attribute selection and satisfaction degree setting according to expert knowledge
Data fusion
● Expert knowledge:
  ○ 8 mS/m < EC_Deep < 11 mS/m is good -> degree = 1
  ○ PW > 2.75 lb/vine is good -> degree = 1
  ○ 5 < Crop Load < 10 is good -> degree = 1
Interpretation of data fusion using zoning
● Zoning on the fusion WAM layer
  ○ Prudent decision: Zone distance = Maximum
Interpretation of data fusion using zoning
● Zoning on the fusion OWA layer (panels: #zone = 1, 2, 3, 4 and 5)
Interpretation of data fusion using zoning
● Zoning on the fusion FIS layer
Interpretation of data fusion using zoning
● Zoning on the fusion FIS layer with fuzzy distance
Summary
● GeoFIS
  ○ open source
  ○ flexible
  ○ easy to use
  ○ allows introducing expert knowledge
  ○ decision support
  ○ interoperable with other tools
  ○ open to user needs or contributions
References
Publications
● [1] C. Leroux, H. Jones, L. Pichon, S. Guillaume, J. Lamour, J. Taylor, O. Naud, T. Crestey, J. Lablee, and B. Tisseyre, “GeoFIS: an open source, decision-support tool for precision agriculture data,” Agriculture, vol. 8, iss. 6, 2018.
● [2] S. Guillaume, B. Charnomordic, and B. Tisseyre, “Open source software for modelling using agro-environmental georeferenced data,” in IEEE International Conference on Fuzzy Systems, Brisbane, Australia, 2012, pp. 1074-1081.
● [3] S. Guillaume, B. Charnomordic, B. Tisseyre, and J. Taylor, “Soft computing-based decision support tools for spatial data,” International Journal of Computational Intelligence Systems, vol. 6, pp. 18-33, 2013.
● [4] M. Pedroso, J. Taylor, B. Tisseyre, B. Charnomordic, and S. Guillaume, “A segmentation algorithm for the delineation of management zones,” Computers and Electronics in Agriculture, vol. 70, iss. 1, pp. 199-208, 2010.
● [5] S. Guillaume, B. Charnomordic, and P. Loisel, “Fuzzy partitions: a way to integrate expert knowledge into distance calculations,” Information Sciences, vol. 245, pp. 76-95, 2013.
● [6] S. Guillaume and B. Charnomordic, “Fuzzy partition-based distance practical use and implementation,” in IEEE International Conference on Fuzzy Systems, paper F-1136, Hyderabad, India, 2013.
● [7] P. Roudier, B. Tisseyre, H. Poilvé, et al., Precision Agriculture, vol. 12, p. 130, 2011.
FisPro
● https://www.fispro.org/es/
● FisPro is an open source toolbox to design and optimize fuzzy inference systems (FIS). Among fuzzy software products, FisPro stands out because of the interpretability of fuzzy systems automatically learnt from data. Interpretability is guaranteed in each step of the FIS design with FisPro: variable partitioning, rule induction and optimization. FisPro includes several modules: fuzzy partitioning, rule and partition learning, inference and FIS optimization.
● Reference: S. Guillaume and B. Charnomordic, “Learning interpretable fuzzy inference systems with FisPro,” Information Sciences, vol. 181, iss. 20, pp. 4409-4427, 2011, ISSN 0020-0255.