Laboratory of Urban and Regional Planning
Department of Architecture – School of Engineering – University of Patras
University of Thessaly, Department of Planning and Regional Development
Databases and Geographic Information Systems
Vassilis PAPPAS, Associate Professor
Franco-Hellenic Master's programme
“POpulation, DEveloppement, PROspective”
Volos, 2013
Data classification methods in GIS: the most common methods
• DRAWING FEATURES
• DRAWING CATEGORIES OF FEATURES
• DRAWING QUANTITIES OF FEATURES
Categories in Urban Planning: LAND USE, BUILDING CONSTRUCTION TYPE, BUILDING QUALITY, …
Quantities in Urban Planning: POPULATION, LAND VALUES, …
Data Classification methods
A feature layer is a reference to a feature class and has an associated
drawing method.
A layer lets us assign any type of drawing method to a geographic dataset.
Geographic datasets do not contain the instructions for drawing the data; the drawing method belongs to the layer.
BLOCK_ID | AREA | PERIMETER | ADEQUACY | COVER (%) | F.A.R. | P. HEIGHT | P. LAND USE
1 | 3954,227000 | 254,311000 | 20-600 | 40 | 0,6 | 7,5 | gk2
57 | 1896,508000 | 174,281100 | 20-600 | 40 | 0,6 | 7,5 | gk1
167 | 2647,750000 | 212,230700 | 12-300 | 40 | 0,6 | 7,5 | ak
168 | 316,265600 | 71,350790 | 15-500 | 70 | 0,8 | 8,5 | gk
169 | 3702,945000 | 304,357800 | 15-500 | 70 | 0,8 | 8,5 | ak
170 | 846,890600 | 129,475100 | 15-500 | 70 | 0,8 | 8,5 | ak
171 | 1096,961000 | 132,914700 | 15-500 | 70 | 0,8 | 8,5 | ta
172 | 343,187500 | 75,198450 | 15-500 | 70 | 0,8 | 8,5 | gk
242 | 2578,617000 | 209,745900 | 25-1000 | 60 | 0,6 | 10,5 | pk
243 | 4661,258000 | 348,084000 | 10-300 | 60 | 0,6 | 10,5 | ak
244 | 3522,188000 | 341,614400 | 10-300 | 60 | 0,6 | 10,5 | gk
245 | 385,757800 | 78,804500 | 15-500 | 70 | 0,8 | 8,5 | gk
246 | 1125,063000 | 135,727400 | 15-500 | 70 | 0,8 | 8,5 | gk
247 | 3006,898000 | 265,786300 | 15-500 | 70 | 0,8 | 8,5 | xo
252 | 868,085900 | 146,842800 | 10-300 | 60 | 0,6 | 10,5 | gk
253 | 343,929700 | 76,586040 | 10-300 | 60 | 0,6 | 10,5 | ak
254 | 890,703100 | 136,971300 | 10-300 | 60 | 0,6 | 10,5 | ak
257 | 2051,953000 | 219,292200 | 10-300 | 60 | 0,6 | 10,5 | gk
258 | 1912,266000 | 192,092900 | 25-1000 | 60 | 0,6 | 10,5 | ak
259 | 1174,656000 | 137,505000 | 25-1000 | 60 | 0,6 | 10,5 | ak
264 | 1768,484000 | 170,005800 | 15-500 | 70 | 0,8 | 8,5 | gk
265 | 1376,539000 | 150,555600 | 10-300 | 60 | 0,6 | 10,5 | gk
266 | 1320,211000 | 150,161500 | 25-1000 | 60 | 0,6 | 10,5 | ak
292 | 1157,344000 | 141,667200 | 10-300 | 60 | 0,6 | 10,5 | gk
293 | 892,242200 | 121,613000 | 25-1000 | 60 | 0,6 | 10,5 | ak
295 | 755,593800 | 124,848200 | 15-500 | 70 | 0,8 | 8,5 | gk
296 | 730,953100 | 116,225200 | 10-300 | 60 | 0,6 | 10,5 | ak
297 | 2548,492000 | 211,246600 | 25-1000 | 60 | 0,6 | 10,5 | gk
300 | 2910,672000 | 222,256800 | 10-300 | 60 | 0,6 | 10,5 | ak
City of Patras, Greece: part of a digital map and attribute table for the official building census (2001), 41,738 buildings.
Drawing features
Maps present descriptive information about geographic features
using symbols and labels.
• Points – marker symbols (types: character, simple, arrow, picture, multilayer)
• Lines – line symbols (types include cartographic, hash, marker, multilayer)
• Areas – fill symbols (types: simple, line, marker, gradient, picture, multilayer)
The simplest way to draw a feature layer is to draw all the features with the same symbol.
Single symbol: all road axes have the same line symbol, all buildings have the same fill symbol, and all blocks have the same fill symbol.
Unique values: each building has a different fill symbol according to its coverage, and each block has a different fill symbol according to its size.
In practice this map gives us no useful information: almost every feature gets its own symbol, so no pattern emerges.
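To make the point concrete, here is a minimal Python sketch of what a unique-values renderer does: it gives each distinct attribute value its own symbol. As sample data it uses the AREA and P. LAND USE columns of the 29 Patras blocks from the attribute table (decimal commas converted to points). The function name `unique_value_symbols` and the integer symbol ids are illustrative assumptions, not an ArcGIS API.

```python
# AREA and P. LAND USE of the 29 blocks in the Patras attribute table
# (decimal commas converted to decimal points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]
LAND_USE = ["gk2", "gk1", "ak", "gk", "ak", "ak", "ta", "gk", "pk", "ak",
            "gk", "gk", "gk", "xo", "gk", "ak", "ak", "gk", "ak", "ak",
            "gk", "gk", "ak", "gk", "ak", "gk", "ak", "gk", "ak"]


def unique_value_symbols(values):
    """Map each distinct value to a hypothetical symbol id (0, 1, 2, ...)
    in order of first appearance, as a unique-values renderer would."""
    symbols = {}
    for v in values:
        symbols.setdefault(v, len(symbols))
    return symbols


area_symbols = unique_value_symbols(AREAS)
use_symbols = unique_value_symbols(LAND_USE)
print(len(area_symbols))  # 29: one symbol per block
print(len(use_symbols))   # 7
```

Every block area is distinct, so the renderer needs 29 symbols for 29 blocks and the map shows no pattern; the seven land-use codes, by contrast, are few enough to read.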
Group of quantities (quantile): all buildings are grouped into three classes according to their coverage, while all blocks have the same fill symbol as a background.
But how do we define what is “small”, “medium” or “big”?
Data in Urban Planning
Two main types of data:
• Numerical data (quantitative data): population density, building heights, building coverage, etc.
• Textual data (qualitative data): land use, building construction type, building condition, etc.
Grouping Categories in Urban Planning
Classification method: categories are grouped based on their semantics (example: land use).
CODE  LAND USE
000   NOT URBAN LAND USE
  010   VACANT PLOT
  020   ABANDONED BUILDING
  030   UNDER CONSTRUCTION
  040   CULTIVATED LAND
  050   TREE LANDS
  060   FREE LANDS
  070
  080
  090   OTHER NOT URBAN LAND USE
100   RESIDENCE
  110   PRIMARY RESIDENCE
    111   SINGLE FAMILY
    112   SINGLE FAMILY (WITH GARDEN)
    113   DUPLEX FAMILY
    114   DUPLEX FAMILY (WITH GARDEN)
    115   GROUP QUARTERS
    116   MULTI-FAMILY
    117
    118
    119   RESIDENCE, OTHER TYPE
  120   SECONDARY RESIDENCE
    121   COUNTRY HOUSE
    122   COUNTRY MULTI-STOREY HOUSE
    123
Tree-structured coding system
Patterns may be easier to see through generalization, that is, by reducing many categories to a few.
Categories are grouped according to their meaning (semantics) and the coding system used.
A thematic map with fewer than seven (7) categories is easier to read (Mitchell A., 1999).
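Grouping by the tree-structured code can be sketched in a few lines of Python, assuming (as the table suggests) that each digit of the three-digit code is one level of the tree, so a code is generalized by zeroing its trailing digits. The parcel codes below are hypothetical, and `generalize` is an illustrative helper, not part of any GIS package.

```python
from collections import Counter

# Names for some codes of the tree-structured land-use table above.
LAND_USE = {
    "000": "NOT URBAN LAND USE", "010": "VACANT PLOT", "040": "CULTIVATED LAND",
    "100": "RESIDENCE", "110": "PRIMARY RESIDENCE", "111": "SINGLE FAMILY",
    "113": "DUPLEX FAMILY", "116": "MULTI-FAMILY", "120": "SECONDARY RESIDENCE",
    "121": "COUNTRY HOUSE",
}


def generalize(code, level):
    """Replace a 3-digit code by its ancestor in the tree.

    level=1 keeps the first digit (111 -> 100), level=2 keeps the first two
    (111 -> 110), level=3 returns the code unchanged.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    return code[:level] + "0" * (3 - level)


# Hypothetical land-use codes of six parcels, generalized to the top level.
parcels = ["111", "116", "121", "010", "040", "113"]
grouped = Counter(generalize(c, 1) for c in parcels)
for code, count in sorted(grouped.items()):
    print(code, LAND_USE[code], count)
# 000 NOT URBAN LAND USE 2
# 100 RESIDENCE 4
```

Choosing level 2 instead would keep primary and secondary residence apart (110 and 120), so the same codes support maps at several levels of generalization.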
Classification methods for numerical data
As with categories, a thematic map with fewer than seven (7) classes is easier to read (Mitchell A., 1999).
Sturges' practical formula gives a number of classes (k) with good results:
$k = 1 + 3.322\,\log_{10} n$, where n is the number of cases.
How many classes should be used for the 29 records (cases) of the attribute table above?
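Applying the formula is simple arithmetic, sketched below in Python. The sketch assumes k is rounded to the nearest integer; the slide does not state a rounding rule, and some software rounds up instead. Since 3.322 ≈ 1 / log10(2), the formula is equivalent to k = 1 + log2(n).

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule k = 1 + 3.322 * log10(n),
    rounded to the nearest integer (the rounding rule is an assumption)."""
    if n < 1:
        raise ValueError("n must be a positive number of cases")
    return round(1 + 3.322 * math.log10(n))


print(sturges_classes(29))     # 29 blocks in the attribute table -> 6
print(sturges_classes(41738))  # 41,738 buildings in the census -> 16
```

For the 29 blocks the rule suggests about six classes, within the seven-class guideline; for all 41,738 buildings it suggests 16, so the two criteria have to be balanced.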
• A classification method subdivides a set of attribute values into classes according to the desired criteria.
• Classes group features with similar attribute values by assigning them the same symbol.
• Each class has a lower and an upper numeric limit (class breaks: the minimum and maximum of the specific class).
• Changing the classes can produce very different maps, which change the way we read and interpret the specific spatial unit (reference area).
• Apart from the “technocratic” approach of the classification methods that follow, a crucial factor in defining classes is a very good knowledge of the specific spatial variables: their behaviour, distribution and substantive meaning (thematic approach).
Classification method: Manual
We normally use this method to emphasize particular patterns by placing breaks at important threshold values, or to comply with a particular standard that demands certain class breaks.
Map: City of Patras, Greece, part of the central area; population density (inh./ha), manual classification.
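A manual classification needs only a list of class limits chosen by the analyst and a rule that assigns each value to a class. The sketch below uses the block AREA values from the Patras attribute table as stand-in data, because the population densities behind the maps are not given. The limits 1000, 2500 and 5000 are hypothetical thresholds chosen for illustration. Each class includes its upper limit, matching the lower/upper class limits described earlier.

```python
from bisect import bisect_left
from collections import Counter

# Stand-in data: AREA of the 29 Patras blocks (decimal commas -> points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]


def classify(values, upper_limits):
    """Class index per value: class i holds (upper_limits[i-1], upper_limits[i]].
    Values above the last limit get index len(upper_limits)."""
    return [bisect_left(upper_limits, v) for v in values]


# Hypothetical thresholds picked by the analyst (not from the source).
MANUAL_LIMITS = [1000, 2500, 5000]
classes = classify(AREAS, MANUAL_LIMITS)
print(sorted(Counter(classes).items()))  # [(0, 10), (1, 10), (2, 9)]
```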
Classification method: Equal interval
This method divides the attribute range into equally sized classes, and is best
applied to familiar data ranges such as percentages.
We normally use this method to emphasize the amount of an attribute value relative to other values.
Map: City of Patras, Greece, part of the central area; population density (inh./ha), equal interval classification.
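A minimal sketch of equal-interval breaks in Python, again using the block AREA values as stand-in data (the densities shown on the map are not given). The class width is (max - min) / k; k = 4 is an arbitrary illustrative choice, and the function name is hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Stand-in data: AREA of the 29 Patras blocks (decimal commas -> points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]


def equal_interval_limits(values, k):
    """Upper limits of k classes of equal width spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Use max itself as the last limit so rounding cannot leave it unclassified.
    return [lo + width * i for i in range(1, k)] + [hi]


limits = equal_interval_limits(AREAS, 4)
counts = Counter(bisect_left(limits, v) for v in AREAS)
print([round(x, 1) for x in limits])
print([counts[i] for i in range(4)])  # [16, 4, 6, 3]
```

More than half of the blocks fall into the first class because a few large blocks stretch the range, which is why the method works best on familiar ranges such as percentages rather than on skewed values.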
Classification method: Quantile
Each class contains an equal number of features (as nearly as the number of features allows). This method is well suited to linearly distributed data.
Map: City of Patras, Greece, part of the central area; population density (inh./ha), quantile classification.
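A quantile classification can be sketched by ranking the features and cutting the ranking into k equal parts. With 29 blocks and k = 4 the classes cannot be exactly equal, so their sizes differ by at most one. The stand-in data are again the block AREA values, and the function name is illustrative.

```python
from collections import Counter

# Stand-in data: AREA of the 29 Patras blocks (decimal commas -> points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]


def quantile_classes(values, k):
    """Class index per value so that each class holds about n/k features.
    Features are ranked one by one, so equal values may be split across classes."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, idx in enumerate(order):
        classes[idx] = rank * k // n
    return classes


classes = quantile_classes(AREAS, 4)
counts = Counter(classes)
upper = [max(v for v, c in zip(AREAS, classes) if c == i) for i in range(4)]
print([counts[i] for i in range(4)])  # [8, 7, 7, 7]
print(upper)  # [868.0859, 1320.211, 2578.617, 4661.258]
```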
Classification method: Natural Breaks
Classes are based on natural groupings of data values. The data values are first arranged in order; the class breaks are then determined statistically by finding adjacent feature pairs between which there is a relatively large difference in data value, so that the standard deviation within each class is minimized. This is the default classification method in ArcGIS 9.2, where it is called Natural Breaks (Jenks).
Map: City of Patras, Greece, part of the central area; population density (inh./ha), natural breaks classification.
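The idea of minimizing variation within classes can be computed exactly with dynamic programming (Fisher's method), sketched below in Python. It minimizes the sum of squared deviations about the class means, the quantity the Jenks optimization minimizes. This is not ESRI's implementation, whose details may differ; the function name is hypothetical and the block AREA values are stand-in data.

```python
# Stand-in data: AREA of the 29 Patras blocks (decimal commas -> points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]


def natural_breaks(values, k):
    """Upper class limits minimizing the total within-class sum of squared deviations."""
    s = sorted(values)
    n = len(s)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Prefix sums of x and x**2 give any class's squared deviation in O(1).
    p1 = [0.0] * (n + 1)
    p2 = [0.0] * (n + 1)
    for i, x in enumerate(s):
        p1[i + 1] = p1[i] + x
        p2[i + 1] = p2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean for the sorted slice s[i:j]."""
        total = p1[j] - p1[i]
        return (p2[j] - p2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total SSD when the first j sorted values form c classes;
    # start[c][j]: index where the last of those c classes begins.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is s[i:j], never empty
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i
    # Walk back from the full data set to recover each class's upper limit.
    limits, j = [], n
    for c in range(k, 0, -1):
        limits.append(s[j - 1])
        j = start[c][j]
    return limits[::-1]


area_limits = natural_breaks(AREAS, 4)
print(area_limits)
```

The triple loop performs O(k·n²) evaluations: trivial for the 29 blocks, but slow in pure Python for tens of thousands of features.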
Classification method: Geometrical interval
This method creates class ranges based on intervals that follow a geometric sequence defined by a multiplier (and its inverse). It creates these intervals by minimizing the sum of squares of the number of elements per class, which ensures that each interval contains an appropriate number of values and that the intervals are fairly similar. The algorithm was specifically designed to accommodate continuous data, and it produces a result that is visually appealing and cartographically comprehensive.
Map: City of Patras, Greece, part of the central area; population density (inh./ha), geometrical interval classification.
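The sketch below shows only the core idea of this method: class limits in a geometric sequence, with a constant multiplier between consecutive limits. It does not reproduce the optimization step (minimizing the sum of squares of class counts), so it is a simplified stand-in rather than ESRI's Geometrical Interval algorithm. It requires strictly positive values and uses the block AREA values as sample data; the function name is hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Stand-in data: AREA of the 29 Patras blocks (decimal commas -> points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]


def geometric_limits(values, k):
    """Upper limits forming a geometric progression from min to max.
    Plain progression only: no optimization of class counts is attempted."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("a geometric progression needs strictly positive values")
    ratio = (hi / lo) ** (1 / k)  # constant multiplier between successive limits
    return [lo * ratio ** i for i in range(1, k)] + [hi]


limits = geometric_limits(AREAS, 4)
counts = Counter(bisect_left(limits, v) for v in AREAS)
print([round(x) for x in limits])
print([counts[i] for i in range(4)])  # [4, 10, 6, 9]
```

The classes widen as the values grow, so small blocks are split more finely than large ones; compare the equal-interval result, where most blocks crowd into the first class.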
Classification method: Standard Deviation
Use this method to emphasize how much feature values vary from the mean.
Best used on normally distributed data.
Map: City of Patras, Greece, part of the central area; population density (inh./ha), standard deviation classification.
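A simple standard-deviation classification labels each value by how many standard deviations it lies above or below the mean. The sketch assumes breaks at the mean and at whole multiples of the population standard deviation; GIS software may use other intervals (for example half a standard deviation) or center a class on the mean. The block AREA values are stand-in data and the function name is hypothetical.

```python
import math
import statistics
from collections import Counter

# Stand-in data: AREA of the 29 Patras blocks (decimal commas -> points).
AREAS = [3954.227, 1896.508, 2647.75, 316.2656, 3702.945, 846.8906, 1096.961,
         343.1875, 2578.617, 4661.258, 3522.188, 385.7578, 1125.063, 3006.898,
         868.0859, 343.9297, 890.7031, 2051.953, 1912.266, 1174.656, 1768.484,
         1376.539, 1320.211, 1157.344, 892.2422, 755.5938, 730.9531, 2548.492,
         2910.672]


def std_dev_classes(values, interval=1.0):
    """Label per value: floor((value - mean) / (interval * sd)).

    With interval=1.0, label 0 means [mean, mean + sd), -1 means [mean - sd, mean),
    and so on. Uses the population standard deviation (an assumption).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are equal; standard deviation is zero")
    return [math.floor((v - mean) / (interval * sd)) for v in values]


labels = std_dev_classes(AREAS)
print(sorted(Counter(labels).items()))  # [(-2, 4), (-1, 12), (0, 8), (1, 4), (2, 1)]
```

The single block labelled 2 (block 243, the largest) stands out at once, which is the kind of deviation from the mean this method is meant to emphasize.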
References – Bibliography for further reading
• Mitchell A., The ESRI Guide to GIS Analysis, Volume 1: Geographic Patterns & Relationships, ESRI Press, Redlands, USA, 1999.
• Zeiler M., Modeling Our World: The ESRI Guide to Geodatabase Design, ESRI Press, Redlands, USA, 1999.
• Online Help of ArcGIS 9.2 (and 10.1).
• Online Help of ArcView 3.3.