gg3019/gg4027/gg5019 an introduction to geographical information technology and gis geographical...

GG3019/GG4027/GG5019 An Introduction to Geographical Information Technology and GIS: Geographical Information Systems and Geospatial Data Analysis. David R. Green, G12 – 2324, [email protected], www.abdn.ac.uk/geospatial

Upload: claire-burns

Post on 28-Mar-2015

GG3019/GG4027/GG5019

An Introduction to Geographical Information Technology and GIS

Geographical Information Systems and Geospatial Data Analysis

David R. Green

G12 – 2324

[email protected]

www.abdn.ac.uk/geospatial

Error and Uncertainty in GIS

• Error is present at all stages in GIS

• e.g. Data Capture & Data Analysis

• Error is one form of uncertainty

• Missing, incompleteness, mistakes, and quality

Error and Uncertainty in GIS

• Real World

• Conception of spatial phenomena

• Measurement and representation of spatial phenomena

• Analysis of spatial phenomena

Error and Uncertainty in GIS

• CONCEPTION

• Spatial uncertainty

• Vagueness

• Ambiguity

• Scale of Geographic Individuals (zones/units)

Error and Uncertainty in GIS

• MEASUREMENT AND REPRESENTATION

• Accuracy and error

• Measurement error

• Data integration and shared lineage

Error and Uncertainty in GIS

• ANALYSIS

• Spatial analysis and uncertainty

• Aggregation and analysis (ecological fallacy)

• Scale and aggregation = Modifiable Areal Unit Problem = different results

• Visualisation helps to study this problem

Error and Uncertainty in GIS

• The Ecological Fallacy is a situation that can occur when a researcher or analyst makes an inference about an individual based on aggregate data for a group. For example, a researcher might examine the aggregate data on income for a neighbourhood of a city, and discover that the average household income for the residents of that area is $30,000.

• To state that the average income for residents of that area is $30,000 is true and accurate. No problem there. The ecological fallacy can occur when the researcher then states, based on this data, that people living in the area earn about $30,000. This may not be true at all, and may be an ecological fallacy.

• Close examination of the neighbourhood might discover that it is actually composed of two housing estates, one with residents of a lower socio-economic group and one with residents of a higher socio-economic group. Residents of the poorer estate earn on average $10,000, while the more affluent residents average $50,000. When the researcher stated that individuals who live in the area earn $30,000 (the mean), this did not account for the fact that the average in this example is constructed from two disparate groups, and it is likely that not one person earns $30,000.

• Assumptions made about individuals based on aggregate data are vulnerable to the ecological fallacy.

• This does not mean that identifying associations between aggregate figures is necessarily defective, and it doesn't necessarily mean that any inferences drawn about associations between the characteristics of an aggregate population and the characteristics of sub-units within the population are absolutely wrong either. What it does say is that the process of aggregating or disaggregating data may conceal the variations that are not visible at the larger aggregate level, and researchers, analysts and crime mappers should be careful.
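The income example above can be reproduced in a few lines. This is only an illustrative sketch: the estate sizes (50 households each) and the assumption that every household earns exactly its estate's average are hypothetical simplifications, not figures from the source.

```python
from statistics import mean

# Hypothetical household incomes for the two housing estates in the example.
# Every household is assumed to earn exactly its estate's average.
lower_income_estate = [10_000] * 50
higher_income_estate = [50_000] * 50

neighbourhood = lower_income_estate + higher_income_estate
area_mean = mean(neighbourhood)

# How many households actually earn the neighbourhood average?
households_at_mean = sum(1 for income in neighbourhood if income == area_mean)

print(f"Neighbourhood mean income: ${area_mean:,.0f}")
print(f"Households earning exactly the mean: {households_at_mean}")
```

The aggregate figure is correct for the area as a whole, yet it describes none of the individual households, which is the point the slide makes.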

Error and Uncertainty in GIS

http://www.jratcliffe.net/research/ecolfallacy.htm

Error and Uncertainty in GIS

The Modifiable Areal Unit Problem (MAUP) is a potential source of error that can affect spatial studies which utilise aggregate data sources (Unwin, 1996). Geographical data is often aggregated in order to present the results of a study in a more useful context, and spatial objects such as enumeration districts or police beat boundaries are examples of the type of aggregating zones used to show results of some spatial phenomena. These zones are often arbitrary in nature and different areal units can be just as meaningful in displaying the same base level data. For example, it could be argued that enumeration districts containing comparable numbers of houses are better sources of aggregation than police beats (which are often based on ancient parish boundaries in the UK) when displaying burglary rates. Large amounts of source data require a careful choice of aggregating zones to display the spatial variation of the data in a comprehensible manner. It is this variation in acceptable areal solution that generates the term 'modifiable'. Only recently (well, the last 20 years!) has this problem been addressed in the area of spatial crime analysis, where 'the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating.' (Openshaw, 1984 p.3).

Error and Uncertainty in GIS

The MAUP consists of both a scale and an aggregation problem, and the concept of the ecological fallacy should also be considered (Bailey and Gatrell, 1995). The scale problem is relatively well known. It is the variation which can occur when data from one scale of areal units is aggregated into more or fewer areal units. For example, much of the variation in enumeration districts changes or is lost when the data is aggregated to the ward or county level. The aggregation problem is less well known and becomes apparent when faced with the variety of different possible areal units for aggregation. Although geographical studies tend towards aggregating units which share a geographical boundary, it is also possible to aggregate units which are spatially distinct. Restricting aggregation to neighbouring units eases the problem to a small degree but does not get round the large number of possible zonings which remains.
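A small numerical sketch can make the scale and aggregation effects concrete. The burglary counts, household counts, and zone groupings below are hypothetical values chosen for illustration, and `aggregate` is an illustrative helper rather than a function from any GIS package.

```python
# Hypothetical base units along a street: (burglaries, households) per unit.
base_units = [(1, 100), (9, 100), (9, 100), (1, 100),
              (1, 100), (9, 100), (9, 100), (1, 100)]


def aggregate(units, groups):
    """Return the burglary rate per 100 households for each group of unit indices."""
    rates = []
    for group in groups:
        burglaries = sum(units[i][0] for i in group)
        households = sum(units[i][1] for i in group)
        rates.append(100 * burglaries / households)
    return rates


# Base level: every unit is its own zone.
base_rates = aggregate(base_units, [[i] for i in range(len(base_units))])

# Zoning A: neighbouring units paired from the start of the street.
zoning_a = aggregate(base_units, [[0, 1], [2, 3], [4, 5], [6, 7]])

# Zoning B: the same data with the zone boundaries shifted by one unit.
zoning_b = aggregate(base_units, [[0], [1, 2], [3, 4], [5, 6], [7]])

print("Base rates:", base_rates)
print("Zoning A:  ", zoning_a)
print("Zoning B:  ", zoning_b)
```

Zoning A hides all of the base-level variation, while zoning B exaggerates it into alternating low and high zones, although both are built from the same underlying counts.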

Error and Uncertainty in GIS

Data Accuracy and Quality

•The quality of data sources for GIS processing is becoming an ever increasing concern among GIS application specialists

•With many GIS software packages on the commercial market and the accelerating application of GIS technology to problem solving and decision making roles, the quality and reliability of GIS products is coming under closer scrutiny

•Much concern has been raised as to the relative error that may be inherent in GIS processing methodologies

•While research is ongoing, and no finite standards have yet been adopted in the commercial GIS marketplace, several practical recommendations have been identified which help to locate possible error sources, and define the quality of data

Error and Uncertainty in GIS

Three distinct components: data accuracy, quality, and error

Accuracy

• The fundamental issue with respect to data is accuracy. Accuracy is the closeness of results of observations to the true values or values accepted as being true (estimates of the true value). The difference between observed and true (or accepted as being true) values indicates the accuracy of the observations

Basically two types of accuracy exist:

• positional accuracy

• attribute accuracy

Error and Uncertainty in GIS

• Positional accuracy is the expected deviance in the geographic location of an object from its true ground position

•This is what we commonly think of when the term accuracy is discussed

•There are two components to positional accuracy. These are:

•relative accuracy

•absolute accuracy

•Absolute accuracy concerns the accuracy of data elements with respect to a coordinate scheme, e.g. UTM

•Relative accuracy concerns the positioning of map features relative to one another
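The distinction between absolute and relative accuracy can be sketched numerically. In the example below, the coordinates, the uniform (+15 m, -20 m) shift, and the `rmse` helper are all hypothetical, chosen so that every feature is misplaced on the grid while the features keep their positions relative to one another.

```python
import math

# Hypothetical true positions in metres on a projected grid (e.g. UTM).
true_positions = {"A": (1000.0, 1000.0), "B": (1100.0, 1000.0), "C": (1000.0, 1200.0)}

# Mapped positions: every feature is shifted by the same (+15, -20) m offset.
mapped_positions = {k: (x + 15.0, y - 20.0) for k, (x, y) in true_positions.items()}


def rmse(values):
    """Root mean square of a list of errors."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Absolute accuracy: distance of each mapped feature from its true grid position.
absolute_errors = [math.dist(mapped_positions[k], true_positions[k]) for k in true_positions]

# Relative accuracy: error in the distances between pairs of features.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
relative_errors = [
    math.dist(mapped_positions[a], mapped_positions[b])
    - math.dist(true_positions[a], true_positions[b])
    for a, b in pairs
]

print(f"Absolute RMSE: {rmse(absolute_errors):.1f} m")
print(f"Relative RMSE: {rmse(relative_errors):.1f} m")
```

A uniform shift gives a poor absolute accuracy (25 m here) but leaves the relative accuracy untouched, because the distances between features are unchanged.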

Error and Uncertainty in GIS

• Relative accuracy is of much greater concern than absolute accuracy

• For example, most GIS users can live with the fact that their survey coordinates do not coincide exactly with the real world, but the absence of one or two units from e.g. a map may have costly consequences

•Attribute accuracy is as important as positional accuracy. It also reflects estimates of the truth. Interpreting and depicting boundaries and characteristics for forest stands or soil polygons can be exceedingly difficult and subjective

• Also the degree of homogeneity found within such mapped boundaries is not nearly as high in reality as it would appear to be on most maps!

Error and Uncertainty in GIS

Quality

• Quality can simply be defined as the fitness for use for a specific data set. Data that is appropriate for use with one application may not be fit for use with another. It is fully dependent on the scale, accuracy, and extent of the data set, as well as the quality of other data sets to be used.

• Spatial Data Transfer Standards (SDTS) often identify the following components to data quality definitions.

• Lineage

• Positional Accuracy

• Attribute Accuracy

• Logical Consistency

• Completeness

Error and Uncertainty in GIS

Lineage - historical and compilation aspects of the data such as the source of the data; content of the data; data capture specifications; geographic coverage of the data; compilation method of the data, e.g. digitizing versus scanning; transformation methods applied to the data; and the use of any pertinent algorithms during compilation, e.g. linear simplification, feature generalization

Positional Accuracy - This includes consideration of inherent error (source error) and operational error (introduced error)

Attribute Accuracy - This quality component concerns the identification of the reliability, or level of purity (homogeneity), in a data set

Error and Uncertainty in GIS

Logical Consistency - This component is concerned with determining the faithfulness of the data structure for a data set. This typically involves spatial data inconsistencies such as incorrect line intersections, duplicate lines or boundaries, or gaps in lines. These are referred to as spatial or topological errors

Completeness - The final quality component involves a statement about the completeness of the data set. This includes consideration of holes in the data, unclassified areas, and any compilation procedures that may have caused data to be eliminated
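Two of the logical consistency problems named above, duplicate lines and gaps in lines, can be detected with simple bookkeeping. The sketch below is a minimal illustration on hypothetical digitised segments; real GIS topology checks also handle tolerances and line intersections, which are not attempted here.

```python
from collections import Counter

# Hypothetical digitised line segments as pairs of (x, y) vertices.
segments = [
    ((0, 0), (10, 0)),
    ((10, 0), (10, 10)),
    ((10, 10), (0, 10)),
    ((0, 10), (0, 1)),   # stops short of (0, 0): a gap in the boundary
    ((10, 0), (0, 0)),   # same line as the first segment, digitised twice
]


def normalise(segment):
    """Order the two end points so that digitising direction does not matter."""
    return tuple(sorted(segment))


# Duplicate lines: the same pair of end points recorded more than once.
segment_counts = Counter(normalise(s) for s in segments)
duplicates = [s for s, n in segment_counts.items() if n > 1]

# Gaps: end points touched by only one distinct segment (dangling nodes).
node_degree = Counter(pt for s in segment_counts for pt in s)
dangling_nodes = sorted(pt for pt, n in node_degree.items() if n == 1)

print("Duplicate segments:", duplicates)
print("Dangling nodes:", dangling_nodes)
```

Here the boundary should close at (0, 0), so the two dangling nodes mark the two sides of the gap.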

Error and Uncertainty in GIS

• The ease with which geographic data in a GIS can be used at any scale highlights the importance of detailed data quality information.

• Although a data set may not have a specific scale once it is loaded into the GIS database, it was produced with levels of accuracy and resolution that make it appropriate for use only at certain scales, and in combination with data of similar scales.

• Error - Two sources of error:

• Inherent

• Operational

• Both contribute to the reduction in quality of the products that are generated by geographic information systems.

Error and Uncertainty in GIS

• Inherent error is the error present in source documents and data

• Operational error is the amount of error produced through the data capture and manipulation functions of a GIS

• Possible sources of operational errors include :

* Mislabelling of areas on thematic maps

* Misplacement of horizontal (positional) boundaries

* Human error in digitizing

* Classification error

* GIS algorithm inaccuracies

* Human bias

• While error will always exist in any scientific process, the aim within GIS processing should be to identify existing error in data sources and minimize the amount of error added during processing

Error and Uncertainty in GIS

• Because of cost constraints it is often more appropriate to manage error than attempt to eliminate it!

• There is a trade-off between reducing the level of error in a database and the cost to create and maintain the database

• An awareness of the error status of different data sets will allow the user to make a subjective statement on the quality and reliability of a product derived from GIS processing

• The validity of any decisions based on a GIS product is directly related to the quality and reliability rating of the product

• Depending upon the level of error inherent in the source data, and the error operationally produced through data capture and manipulation, GIS products may possess significant amounts of error

Error and Uncertainty in GIS

• One of the major problems currently existing within GIS is the aura of accuracy surrounding digital geographic data

• Often hardcopy map sources include a map reliability rating or confidence rating in the map legend

• This rating helps the user in determining the fitness for use for the map

• However, rarely is this information encoded in the digital conversion process

• Often, because GIS data is in digital form and can be represented with high precision, it is considered to be totally accurate

Error and Uncertainty in GIS

• In reality, a buffer exists around each feature which represents the zone within which the actual position of the feature may lie

• For example, data captured at the 1:20,000 scale commonly has a positional accuracy of +/- 20 metres

• This means the actual location of features may vary 20 metres in either direction from the identified position of the feature on the map

• Considering that the use of GIS commonly involves the integration of several data sets, usually at different scales and quality, one can easily see how errors can be propagated during processing
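To give a feel for how positional uncertainty builds up when layers are combined, the sketch below overlays the ±20 metre layer from the example with a second layer. The ±50 metre tolerance of the second layer is hypothetical, and the root-sum-square rule is a common rule of thumb that assumes independent random errors; it is an assumption made here, not something stated in the slides.

```python
import math


def combined_uncertainty(*tolerances_m):
    """Return (worst case, root-sum-square) positional uncertainty in metres.

    The worst case simply adds the tolerances. The root-sum-square figure
    assumes the errors in each layer are independent and random.
    """
    worst_case = sum(tolerances_m)
    root_sum_square = math.sqrt(sum(t ** 2 for t in tolerances_m))
    return worst_case, root_sum_square


# 1:20,000 layer (+/- 20 m) overlaid with a hypothetical +/- 50 m layer.
worst, rss = combined_uncertainty(20.0, 50.0)
print(f"Worst case: +/- {worst:.0f} m")
print(f"Root-sum-square: +/- {rss:.1f} m")
```

Under either rule the combined product is less certain than the better of the two inputs, which is why mixing scales and qualities needs care.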

Error and Uncertainty in GIS

Example of areas of uncertainty for overlaying data

Several comments and guidelines on the recognition and assessment of error in GIS processing have been promoted in papers on the subject

• There is a need for developing error statements for data contained within geographic information systems (Vitek et al, 1984)

• The integration of data from different sources and in different original formats (e.g. points, lines, and areas), at different original scales, and possessing inherent errors can yield a product of questionable accuracy (Vitek et al, 1984)

• The accuracy of a GIS-derived product is dependent on characteristics inherent in the source products, and on user requirements, such as scale of the desired output products and the method and resolution of data encoding (Marble, Peuquet, 1983)

Error and Uncertainty in GIS

• Any GIS output product can only be as accurate as the least accurate data theme involved in the analysis (Newcomer, Szajgin, 1984).

• Accuracy of the data decreases as spatial resolution becomes coarser (Walsh et al, 1987).

• As the number of layers in an analysis increases, the number of possible opportunities for error increases
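The effect of adding layers can be illustrated with a simple probability sketch. It rests on a simplifying assumption that is not in the slides: each layer is correct at a given location with a fixed probability, and errors in different layers are independent. The function name and the 0.9 accuracy figure are hypothetical.

```python
import math


def overlay_accuracy_bounds(layer_accuracies):
    """Return (upper bound, independent-errors estimate) for an overlay.

    The upper bound is the accuracy of the least accurate layer. The second
    value is the chance that every layer is correct at a location if the
    layers' errors are independent (an assumption made for illustration).
    """
    upper_bound = min(layer_accuracies)
    all_correct = math.prod(layer_accuracies)
    return upper_bound, all_correct


for n_layers in (1, 2, 4, 8):
    upper, estimate = overlay_accuracy_bounds([0.9] * n_layers)
    print(f"{n_layers} layer(s): at most {upper:.2f}, "
          f"about {estimate:.2f} if errors are independent")
```

Even with every layer 90% accurate, the estimate under independence falls steadily as layers are added, while the upper bound never rises above the weakest layer.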

Error and Uncertainty in GIS

Tools to get a handle on uncertainty

• Models of uncertainty: methods for assessing and describing error

• Error propagation (during analysis)

• Fuzzy approaches (membership of classes)

• Sensitivity analysis (effect of errors)
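Error propagation and sensitivity analysis are often explored by simulation. The sketch below is one possible approach, not a method taken from the slides: it perturbs the vertices of a hypothetical 1 km square field with Gaussian positional error and records how the computed area varies. The error model, the sigma values, the run count, and the function names are all assumptions.

```python
import random
import statistics


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulate_area(vertices, sigma_m, runs=2000, seed=42):
    """Monte Carlo estimate of the mean and spread of the polygon's area
    when each vertex coordinate carries independent Gaussian error."""
    rng = random.Random(seed)
    areas = []
    for _ in range(runs):
        perturbed = [(x + rng.gauss(0, sigma_m), y + rng.gauss(0, sigma_m))
                     for x, y in vertices]
        areas.append(polygon_area(perturbed))
    return statistics.mean(areas), statistics.stdev(areas)


# Hypothetical square field, 1000 m on each side.
field = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]

# Sensitivity analysis: how does the area uncertainty respond to sigma?
for sigma in (5.0, 10.0, 20.0):
    mean_area, sd_area = simulate_area(field, sigma)
    print(f"sigma = {sigma:>4.1f} m: mean area {mean_area:,.0f} m^2, "
          f"std dev {sd_area:,.0f} m^2")
```

Repeating the run for several sigma values is a simple form of sensitivity analysis: it shows how strongly the derived quantity responds to the assumed level of positional error.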

Error and Uncertainty in GIS

• Error assessment, reporting, interpretation - more difficult

• Quality of data: standards and metadata

• But: No professional GIS currently in use can present the user with information about the confidence limits that should be associated with the results of an analysis.

Error and Uncertainty in GIS

• Chapter 6 - Longley et al.

• Chapter 15 - Longley et al.

Error and Uncertainty in GIS

Useful Links

http://www.geog.ucsb.edu/~good/176b/m14.html

http://www.colorado.edu/geography/gcraft/notes/error/error.html

http://images.google.com/imgres?imgurl=http://www.geog.ubc.ca/courses/geog470/notes/images/sliver_polygon.gif&imgrefurl=http://www.geog.ubc.ca/courses/geog470/notes/error_accuracy.html&h=283&w=476&sz=4&tbnid=QH_DMozL2M4J:&tbnh=74&tbnw=126&hl=en&start=4&prev=/images%3Fq%3Dslivers%2Bin%2Ba%2BGIS%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2004-07,GGLD:en

www.sfu.ca/gis/geog_x55/web_354_new/icons/lec_11_error.pdf

http://www.yogibob.com/303_403_f_04/303_lecture5.html