
Automated Spatial and Thematic Generalization Using a Context Transformation Model

Integrating Steering Parameters, Classification and Aggregation Hierarchies,

Reduction Factors, and Topological Structures for Multiple Abstractions

R&B Publications, 105 Varley Lane, Kanata, Ontario, K2K 1E6, Canada

© 1993 D.E. Richardson. ISBN 0-9697043-0-5

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the author.

Printed in Canada by Cheriton Graphics Systems Ltd., Ottawa

Promotor: Dr. Ir. M. Molenaar, Professor in the Theory of Geographical Information Systems and Remote Sensing

D. E. RICHARDSON

AUTOMATED SPATIAL AND THEMATIC GENERALIZATION

USING A CONTEXT TRANSFORMATION MODEL

Integrating Steering Parameters, Classification and Aggregation Hierarchies, Reduction Factors, and Topological Structures for Multiple Abstractions

Thesis submitted in fulfilment of the requirements for the degree of doctor in agricultural and environmental sciences, on the authority of the rector magnificus, Dr. H.C. van der Plas, to be defended in public on Tuesday 11 May 1993 at four o'clock in the afternoon in the aula of the Landbouwuniversiteit te Wageningen.

Stellingen (Propositions) accompanying the doctoral thesis of D. E. Richardson

I

Comprehensive rule bases for generalization do not exist because there is a lack of knowledge about how to generalize. If there is a lack of knowledge about how to generalize, then the ability to acquire and formulate the rules is severely impaired.

II

Three fundamental areas of knowledge are important for context transformations. The first area of knowledge is in generalization, the second area of knowledge is the data model environment, and the third area of knowledge is the logical data structures.

This dissertation

III

The use of steering parameters, which can be considered a decision mechanism, combined with classification and aggregation hierarchies, and adjusted, if necessary, with the application of attenuation factors or amplification factors, provides considerable flexibility which can serve a wide variety of users.

This dissertation

IV

In using a dataset for analysis, when representing it at a lower resolution, it is important that the generalization applied to it is done consistently, objectively, and in a way that any user can repeat.

This dissertation

V

The tasks involved in spatial and thematic generalization should no longer be viewed purely as representation issues, but rather as issues of key importance in reducing the need to populate national databases at different resolutions with redundant features.

VI

Making maps simple does not change the world; it only lets us treat it, for certain purposes, as if it were uncomplicated.

Muehrcke, P. 1978. Map Use, Reading, Analysis, and Interpretation. JP Publications, Madison, Wisconsin, U.S.A.

VII

The concept of information is intuitively familiar but nevertheless difficult to define independently of any communication process or application context.

Blais, J. A. R. 1987. Theoretical Considerations for Land Information Systems. The Canadian Surveyor, Vol. 41, No. 1, pp. 51-64.

VIII

A land information system may be regarded as having infological and datalogical constituents, where infological refers to human concepts and real world representations, while datalogical relates to computer concepts and machine representations.

Tsichritzis, D. C., and F. H. Lochovsky. 1981. Data Models. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.

IX

Geographic phenomena can be characterized by three general infological constituents: content, position, and time, and can be manipulated datalogically in the thematic, spatial, and temporal domains, respectively.

Based on a statement made in M. Feuchtwanger. 1992. Geographic Semantic Database Modelling. Doctoral Dissertation.

X

The concept of an object may be clear without being spatially distinct, but it cannot be spatially distinct unless it is conceptually clear.

Based on a statement by René Descartes in his Principle XLVI in the Principles of Human Knowledge.

XI

Theorizing about solutions is easier than solving problems.

From a partial statement in Monmonier, M. 1991. The role of interpolation in feature displacement. In Buttenfield, B.P., and McMaster, R.B. (eds), Map Generalization: Making Rules for Knowledge Representation. Longman, London, pp. 189-204.

XII

If one holds an opinion it is acceptable, but if an opinion holds an individual, it is obstinacy.

XIII

Scientific creativity is the same as artistic creativity. The only limitations are the tools and imagination.

With love and gratitude For her gentle nature, grace and courage

And for all she gave, I dedicate this book to my mother,

Barbara Mary Richardson

Abstract

Automated Spatial and Thematic Generalization Using a Context Transformation Model:

Integrating Steering Parameters, Classification and Aggregation Hierarchies, Reduction Factors, and Topological Structures for Multiple Abstractions

This dissertation presents a model for spatial and thematic digital generalization. To do so, the development of digital generalization over the last thirty years is first reviewed. The first of the three epochs of generalization is discussed with reference to its graphic orientation, while the second epoch in generalization is discussed in the context of algorithmic efficiency. This is followed by a review of contemporary models developed for generalization in which more comprehensive approaches to the subject have been addressed. The approach to generalization taken in this research differs from other existing works as it tackles the task from a database perspective rather than a representation perspective. Accordingly, a context transformation model is presented that can automatically decrease representation density through the application of steering parameters, classification and aggregation hierarchies, and topological data structures. These processes act in different ways on the database environment and provide considerable flexibility within which users can operate. The underlying philosophy in the development has been to provide a user environment with as much flexibility as possible since a diversity in their requirements is the expected norm. In these processes, data structure plays an important role for building objects, subclasses, classes and superclasses which, when manipulated by the user, can be used efficiently for selection, elimination, object reclassifications, replacements and aggregations.

These techniques carry out spatial and thematic generalization and provide a very flexible and powerful environment. By utilizing certain data structures and database operations, the automatic decrease of entity, object, subclass, and class densities can be invoked and controlled through a user interface. As well, the system provides users an opportunity to evaluate the generalization processes according to reduction factors that quantify the amount and type of data that will appear at the derived level of representation. Because of the data structures used, these reduction factors can be calculated on an entity, object, subclass, class, and superclass level in an easily understandable format. The user is also afforded the opportunity of controlling the amount of generalization that takes place. This is achieved by means of attenuation factors combined with the steering parameters. The model for context transformations provides permutations for generalization suitable for a whole spectrum of user requirements that can range from analytical processing needs to map design. The format of the model is such that any smaller scale representation can be derived from the original input data.

PREFACE

This research focuses on the development of techniques to automatically reduce the density of objects and object classes appearing on maps. Many of the developments in digital generalization techniques have addressed coordinate data complexities, but little has been done to resolve some of the 'thematic content' issues. With the increase in large spatial and thematic databases, especially in large mapping and remote sensing organizations, database aspects of spatial and thematic generalization needed to be addressed in an effort to solve some of the complexities involved in map or representation content. Once the content of a representation has been determined, many of the existing algorithms that alter coordinate data can be applied, such as simplification, displacement, and smoothing.

In large mapping organizations, representations from a GIS environment should be easily available to comply with many different contextual needs presented by users. Accordingly, the effort made in this research has been to create a model that provides maximum flexibility within a systems environment so that representations of spatial and thematic data can be available for a wide range of requirements. To achieve this, the user environment needed to be investigated. This was done in a three part survey, as well as from knowledge gained in having worked in a large mapping and remote sensing environment. The findings of the survey were instrumental in building a portion of the model for generalization.

Another area that needed to be examined so that flexibility in representations could be achieved is the data model environment. This is a seldom researched area in the context of spatial and thematic generalization; yet the data model environment can be used in a very powerful way to decrease object and class densities through selections, eliminations, aggregations, and class generalization. The data model environment can also support automatic object replacements, which are sometimes needed following object elimination activities. These operations became realized by translating the guidelines provided by a conceptual data model to the logical data structures.

Many of the ideas that evolved throughout this research came as a result of working with the Canada Centre for Mapping. Within the CCM, the lack of available generalization models meant that data were collected at different resolutions, stored, and so on, with all the resource intensive maintenance requirements. Key issues were apparent, such as uncertainty from one resolution and database to another, differences in coding, temporal differences, object and classification differences, and many other problems that result from the absence of effective and available tools to automatically derive the lower resolution representations. Hence, it appeared that the development of tools that would automatically derive lower levels of representation would be a useful product for the Canada Centre for Mapping and Remote Sensing.

Throughout this project, the Surveys, Mapping and Remote Sensing Sector provided support, without which this research would not have been done. Many people have provided needed assistance during the project and I extend my sincere appreciation to all. A few people, however, should be singled out for their contributions. In particular, I am very grateful to my promotor, Dr. Martien Molenaar, for his enthusiastic support and efforts on my behalf. His critical reviews and encouragement are most appreciated. I would also like to express special thanks to Dr. David Goodenough for his comments and encouragement at various stages of the research. Others provided encouragement and aid at critical times in this study and for this, special thanks go to Ir. John Van Smaalen for programming support as well as moral support during the implementations both in Canada and the Netherlands. I would also like to thank Pat Lloyd and James Lee for systems support at the Canada Centre for Mapping, and, for systems support in the Netherlands at the Landbouwuniversiteit te Wageningen, I would like to express special thanks to Ir. John Stuiver. As well, many thanks are expressed to Andrew Payer for his kind assistance and understanding. And, finally, grateful appreciation is expressed to my father for long distance moral support, communication, and affection.

Table of Contents

ABSTRACT vii

TABLE OF CONTENTS xi

LIST OF TABLES xiv
LIST OF APPENDICES xiv

CHAPTER 1 PURPOSE AND ORGANIZATION 1
1.1 Introduction 1
1.2 Justification and Need for the Study 2
1.3 Statement of the Research Problem 3
1.4 Objectives 4
1.5 Significance of the Research 5
1.6 Limitations of the Study 7
1.7 Methodology 8
1.8 Organization of the Dissertation 9

2 REVIEW OF RELATED RESEARCH 11
2.1 Introduction 11
2.2 Three Epochs of Development in Automated Generalization 12
2.3 Assessment of Contemporary Status in Automated Generalization 17

3 DATA MODELLING AND DATA DEFINITION 19
3.1 Introduction 19
3.2 Determining the Data Model 20
3.2.1 Data Selection and Application 20
3.2.2 Data Abstraction and Spatial Data Modelling 22
3.2.3 Definitions Required for Data Modelling 24
3.3 Selecting the Conceptual Data Model 29
3.3.1 The Formal Data Structure Model 29
3.3.2 Topological Relationships 31
3.4 Summary 34

4 CONCEPTUAL MODEL FOR GENERALIZATION 35
4.1 Introduction 35
4.2 Components of the Conceptual Model for Context Transformations 35
4.2.1 General Contextual Considerations 36
4.3 Formulation of the Necessity Factor and Classification and Aggregation Hierarchies 41
4.3.1 The Necessity Factor 41
4.3.2 Classification and Aggregation Hierarchies and Generalization 46
4.3.3 Examples of Necessity Factor Applications with Classification and Aggregation Hierarchies 50
4.3.4 Three Strategies of Generalization Using the Necessity Factor 55
4.4 Summary 56

5 DATA CERTAINTY AND GENERALIZATION 59
5.1 Introduction 59
5.2 Uncertainty in a GIS Environment 59
5.2.1 Use of Reduction Factors for Generalization 63
5.2.2 Reduction Factor Calculations 63
5.3 Integration of the Necessity Factor, Classification and Aggregation Hierarchies, and Reduction Factors 68
5.4 Summary 70

6 IMPLEMENTATION OF THE CONTEXT TRANSFORMATION MODEL 71
6.1 Introduction 71
6.2 Database Environment 71
6.3 Database Design 72
6.3.1 Geometric and Attribute Database Structures 72
6.3.2 Necessity Factor 81
6.3.3 Organization of Data 81
6.3.4 Reduction Factors 86
6.4 Program Design 88
6.5 Summary 92

7 ASSESSMENT OF RESULTS 93
7.1 Introduction 93
7.2 Formal Data Structure Model Support for the Context Transformations 93
7.2.1 Generalizing Geometric and Thematic Components in an Integrated Way 94
7.2.2 Generalization of Thematic Data From One Level to Another 95
7.2.3 Geometric Changes as a Result of Context Transformations 95
7.3 Results of the Context Transformations 96
7.3.1 Populated Places 96
7.3.2 Hydrography 101
7.3.3 Landcover 105
7.4 Context Transformations in Relation to Data Analysis and Map Design 107
7.5 Summary 117

8 CONCLUSIONS 119
8.1 General Conclusions 119
8.2 Specific Conclusions 123
8.3 Future Research 125

APPENDICES 127
BIBLIOGRAPHY 137
SAMENVATTING 145
CURRICULUM VITAE 149

List of Figures

3.1 General Definitions for Selected Thematic and Geometric Test Data 21
3.2 Geographic, Database, and Geometric Terminologies 25
3.3 Fundamental Structure for the FDS 30
3.4 The FDS for a Single Valued Vector Map for a Two Dimensional Space 32
3.5 Spatial Relationships Among Data Types 33
4.1 Overview of System Design for Context Transformations 40
4.2 Classification Hierarchy and Generalization Strategies for Populated Places 49
4.3 Aggregation Hierarchy for Hydrography 54
4.4 Generalization Strategies Using the Necessity Factor 58
5.1 Examples of Attribute and Location Data Reliability and the Associated Type of Uncertainty 62
5.2 Application of Necessity Factor and the Calculation of Reduction Factors in Relation to the Superclass Data Structure 69
6.1 Arc and Node Reference for the Representation of a Hydrographic Feature 74
6.2 Double Line River and a Pseudo Node Requirement for Stream Ordering 74
6.3 Object Formation with a Logical Data Structure 75
6.4 Spatial and Thematic Data Structure for the Rivers Object Class 76
6.5 Stream Orders and Arc References Showing the Connectivity of Stream Network Topology Through Lakes 78
6.6 Landcover Classification Structure 80
6.7 Data Design for Hydrography Superclass Layer 83
6.8 Thematic Data Design for Landcover Superclass Layer 84
6.9 Thematic Data Design for Populated Places 85
6.10 Selections from the Core Program Module 89
6.11 Overview of Program Flow 91
7.1 Strahler's and Horton's Stream Orderings 104
7.2 Original Landcover Dataset Plotted at 1:30 million 111
7.3 Superclass Generalization on Landcover Data at 1:7.5 million 113
7.4 Superclass Generalization on Landcover Data at 1:30 million 113
7.5 Aggregation on Landcover Data at 1:12.5 million 113
7.6 Original Data Captured at 1 kilometre Resolution and 1:2 million prior to Context Transformations with Rule-based Generalization 115

List of Tables

4.1 Map Object Class Requirement Ratings to Support Landcover Mapping 44
4.2 Map Object Functionalities at 1:2 Million 44
4.3 Map Object Functionalities by Scale 45
4.4 Necessity Factor for Mapping Landcover at Four Scales 45
6.1 Reduction Factors for Mapping Landcover Data at 1:7.5 million with No Attenuation Factors 87
7.1 Map Object Functionality (MOF) Rating for a 1:7.5 million Map Representation 97
7.2 Necessity Factors including MOF Attenuations for Mapping Populated Places on a Landcover Map at 1:7.5 million 98
7.3 Frequencies of Populated Places with Different Attenuation Factors for Mapping Populated Places on a Landcover Map at 1:7.5 million 98
7.4 Frequencies of Populated Place Distribution According to Generalization Strategy Using a .1 Attenuation Factor 99
7.5 Generalization Results From the Three Strategies on Populated Places 101
7.6 Results of Horton and Strahler Class Generalization with .8 Attenuation 102
7.7 Results of Attribute Value within Subclass Generalization on Landcover Data 105
7.8 Results of Superclass Generalization for Landcover Data 106

List of Appendices

3.1 Data Model Definitions 129
4.1 Overview of Survey Undertaken to Develop the Necessity Factor 131
4.2 Definitions of the Five Functionalities of the MOF 133
4.3 Stream Order Topology 135

CHAPTER 1

Purpose and Organization

1.1 Introduction

The activity of map generalization is one of the central concepts involved in map design. This process of transforming the real world into a map is considered the cartographic conceptualization and visualization of reality. It is also known as cartographic abstraction. It is the physical reality of our world compressed in a symbolic way. In an artistic sense, the abstraction of reality is like a caricature in which certain features are emphasized and others are not in an attempt to present some particular aspect of the geographical environment [Muehrcke, 1978].

In traditional map making, the abstraction process involves activities such as selection, classification, simplification, and symbolization. Generally, traditional cartographers consider map purpose and potential use when selecting objects to appear on a map. Simplification involves the determination of important characteristics of the data, such as line character and its possible exaggeration or enhancement. Data are often grouped into categories of information to form a classification scheme, such as soil or vegetation classes, and finally symbolization is applied as a graphic form or sign for expressing information or an abstract idea.

The need for generalization in traditional cartography gave rise to a number of definitions, such as those developed by the Swiss Society of Cartography, the Defense Mapping Agency, and the International Cartographic Association. However, it remained that in cartographic generalization the cartographer graphically translated complex interdependent decisions concerning map content, classification, simplification and symbolization with no clear understanding or unified theory for the actual abstraction process.

With the advent of automated information systems, the introduction of technical developments radically changed the mapping process. The development and proliferation of geographic information systems have enabled the input, storage, manipulation, and display of geographically referenced data from a vast range of sources. Increasing amounts of data coupled with greater sophistication in application software programs can augment the production of maps, graphics, and textual products. Increasingly, large user groups such as planners, environmentalists, cadastral and topographic surveys, and policy makers require immediate access, data manipulations, and combinations of spatially referenced data.

The vast amounts of data now available in digital format can easily be manipulated for the production of multiple maps or representations from the same database. With all the apparent benefits afforded by technology, flexibility and variations in design, limitless combinations and selections of data, scale transformations, and the like can be made. However, by the same token, no facilities yet exist to manipulate or change geometric and thematic data so they can automatically be represented at smaller scales. This becomes a particularly important consideration in large mapping organizations. Because of long standing expectations from users, digital data are frequently collected and stored at resolutions consistent with paper map series. Digital topographic data in Canada, for example, are usually captured at levels consistent with paper map production at 1:50 000 and 1:250 000. Thematic data, and subsets of topographic data, are also digitally collected in Canada at 1:1 million, 1:2 million, 1:7.5 million, and so on, as separate datasets. Considerable portions of these data are redundant, such as roads, hydrography, populated places, etc.

Consequently, many mapping agencies are now facing the digital collection, storage, and maintenance of datasets at a number of different resolutions. These data are collected independently of one another since no systematic technique has yet been developed to efficiently derive smaller resolutions from more detailed datasets.

This profusion of spatial and thematic data maintained at different resolutions results in a number of costly problems. Maintenance becomes a major issue, as does updating. Data certainty is also an issue since data collected at different scales are geometrically mismatched. Data integration becomes problematic, as do data definitions and coding. Many of these issues could be resolved if an efficient generalization module were available in a systems environment to automatically derive smaller scale representations from more detailed datasets and subsequently eliminate the need to digitally capture redundant features at various lower resolutions.

1.2 Justification and Need for the Study

Contemporary automated generalization usually revolves around activities of selection, classification, simplification, displacement, exaggeration, and symbolization. While these activities are applied both to the geometric and thematic elements in a map representation, their involvement is not equally distributed. For example, simplification, displacement, and exaggeration are tools used for the alteration of graphic presentation of spatial objects, and their application is usually, but not exclusively, motivated by legibility requirements. On the other hand, selection and classification affect primarily the thematic content, and their application is prompted by the type of information which must be displayed. Hence, in the field of generalization, the former activities emphasize form, whereas the latter emphasize content.

The majority of previous efforts in automated generalization have been concerned with graphic representation rather than thematic database density. The graphic emphasis has been prompted largely by the need to maintain map legibility throughout the process of scale reduction. However, this paradigm in a GIS environment may not be effective when dealing with very large databases due to the heavy computation requirements. Additionally, generalizations derived from a representation perspective have not thus far been based on a systematic approach that focuses on the context of the derived product or the information in the database. As a consequence, with the enormous amounts of data presently being collected and stored in national mapping agencies, an alternative approach is examined here from a database perspective. By doing so, logical techniques for selection, elimination, replacement, reclassification, and aggregation can be developed as a major step in generalizing geometric and thematic data. Through the database techniques, many options in generalization can be automatically accomplished. A major aspect of these techniques is the derivation of a logical data subset on which other routines can be invoked. The more graphic concerns, such as simplification, displacement and conflict resolution, can be applied once a suitable and logical subset of the database has been derived. A minimal sketch of these database operations follows.
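To make the database perspective concrete, the following minimal sketch (in Python, with invented class names and attribute values; it illustrates the idea, not the system implemented in this research) shows selection and elimination, class generalization through an aggregation hierarchy, and a simple density count of the kind later used for reduction factors:

    # Hypothetical illustration: database-style generalization operations on an
    # attribute table, driven by a small aggregation hierarchy.
    from collections import defaultdict

    # Each record: (object_id, object_class, attributes); values are invented.
    records = [
        (1, "reservoir", {"area_km2": 4.2}),
        (2, "lake",      {"area_km2": 310.0}),
        (3, "stream",    {"order": 1}),
        (4, "river",     {"order": 4}),
    ]

    # Aggregation hierarchy: subclass -> superclass (illustrative only).
    hierarchy = {"reservoir": "waterbody", "lake": "waterbody",
                 "stream": "watercourse", "river": "watercourse"}

    def select(recs, keep):
        # Selection/elimination: retain only records satisfying the predicate.
        return [r for r in recs if keep(r)]

    def reclassify(recs):
        # Class generalization: replace each subclass by its superclass.
        return [(oid, hierarchy.get(cls, cls), attrs) for oid, cls, attrs in recs]

    def densities(recs):
        # Aggregated class densities, the raw material for reduction factors.
        counts = defaultdict(int)
        for _, cls, _ in recs:
            counts[cls] += 1
        return dict(counts)

    kept = select(records, lambda r: r[2].get("area_km2", 0) > 100
                                     or r[2].get("order", 0) >= 3)
    print(densities(reclassify(kept)))   # {'waterbody': 1, 'watercourse': 1}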

1.3 Statement of the Research Problem

Over the last thirty years of development in digital generalization, the field has been dominated by a graphic orientation. The approaches developed have usually been limited to single objects, such as lines, and often taken out of context. Little has been accomplished in developing automated generalization processes that are undertaken in the context of the user's needs and the subject being mapped or represented. Almost no work is available in the literature that addresses these issues in generalization from a database perspective, yet it is hypothesized here that major strides forward can be made by approaching the generalization task through database operations and data structures.

1.4 Objectives

The first objective of this research is to explore and develop a different approach to generalization that exploits the database environment and the underlying data structures, and utilizes standard data classifications. To do so, the research will focus on when generalization should occur, why it should occur, and how objects should be affected according to context. This portion of the research will present a necessity factor that provides steering parameters to determine in a logical way the density of objects that are to appear at a derived scale. Since the context within which a generalized representation is used can change from user to user, the steering parameters must function within an environment that provides flexibility. This flexibility must be available at the object composition level and at the subclass, class and superclass levels of geometric and thematic data. Thus representations of spatial and thematic data can be automatically transformed at the entity level through to the superclass level. Within this spectrum of data classifications, these facilities can render, at the lowest level, object distributions that are transformed at the entity level and suitable for map analysis purposes or, at the highest end of the spectrum, can provide object distributions in aggregated formats suitable for very small scale representations. A sketch of how such steering parameters might gate selection follows.
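As a hedged illustration of this first objective (the necessity factor itself is formulated in Chapter 4; the class names, scale key, and threshold below are invented for the sketch), a steering parameter per object class and target scale might gate which classes survive a context transformation:

    # Hypothetical sketch: a necessity factor acting as a steering parameter.
    # necessity[(object_class, target_scale)] lies in [0, 1]; values invented.
    necessity = {
        ("populated_place", "1:7.5M"): 0.6,
        ("river",           "1:7.5M"): 0.9,
        ("minor_road",      "1:7.5M"): 0.1,
    }

    THRESHOLD = 0.5   # assumed cut-off for the sketch; the model steers
                      # densities rather than hard-coding a single threshold

    def retain(object_class, target_scale):
        # Keep a class at the target scale if its necessity is high enough.
        return necessity.get((object_class, target_scale), 0.0) >= THRESHOLD

    for cls in ("populated_place", "river", "minor_road"):
        print(cls, "->", "retain" if retain(cls, "1:7.5M") else "eliminate")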

The second objective, which must be realized to ensure provision of these capabilities, is to examine the data model environment and determine how data can be structured and manipulated in ways that comply with the requirements of generalization. Thus the conceptual data model must be represented by logical data structures that allow automatic object reclassification according to scale changes, when appropriate. The conceptual data model must also provide topological data structures so that object connectivity is maintained throughout the generalization processes. This is important in generalizing hydrographic data, for example, when lakes are removed from the midst of river networks. The selection of a conceptual data model must be extended to logical data structures that support the needs for generalization.

A third objective is to create a means for controlling the amount of generalization. Here the objective is to move away from interval generalization based on discrete scales and to develop a technique that allows continuous generalization. Thus, using the steering parameters, which are initially established as a guide for developing representations based on scales, a module should be available that provides a technique to progressively decrease entity, object, subclass, class and superclass densities.

The fourth objective is to develop a technique that allows users to assess the results of their selected strategies of generalization. The technique developed should quantify, in a straightforward manner, the amount of reduction to the database at entity, object, subclass, class, and superclass levels. A process that quantifies the amount of change to the data and the database representation allows users to invoke techniques for controlling the generalization process. As such, the control process presented in the third objective should be available for invocation following a quantification of the generalization.

1.5 Significance of the Research

Since the introduction of computer technology to the mapping disciplines, considerable activity has been invested in exploring generalization techniques, yet major issues still remain unresolved. Much of the development work in digital generalization has been an attempt to simulate the work of traditional cartographers, which, as Muller [1990] points out, is one of the most difficult challenges in the cartographic research agenda of the 1990s. The premise that digital generalization should simulate the work of traditional cartographers bears some examination.

In traditional generalization, the cognitive processes involved with activities such as selection, simplification, classification, and symbolization could be executed in a single pen stroke. That is, the graphic execution of generalization in the traditional environment was generally done inclusively. In a computer environment, each activity must be defined and often executed individually. So unlike traditional generalization, contemporary developers of digital generalization felt rules must be developed for how to select, simplify, displace, aggregate and so on. This would be a relatively simple task if every cartographer and every map user had the same view on these activities. Robinson [1960], in his second edition of Elements of Cartography, alluded to this difficulty in his statement: "many cartographers have attempted to analyze the processes of generalization, but so far it has been impossible to set forth a consistent set of rules that will prescribe what should be done in each instance."

Since the requirements for generalizing geometric and thematic data can vary so extensively with the user's perception, map context, scale of representation and so on, basing an automated generalization model on a paradigm that has evolved from the manual environment may be impractical. If rules for generalization were not, and perhaps could not be, developed for all situations in a manual environment, it is unrealistic to believe they can be created to address all situations in a digital environment. Rather than creating exhaustive rule sets, this research presents a generalization model, called context transformations, in an environment that is completely flexible and as such can comply with a wide range of user requirements. It is structured in such a way that the user has control over the generalization processes, both with respect to which processes or strategies are to be invoked and to what extent. This paradigm moves away from the predetermination of a rule set for generalization and instead offers a generalization environment that users can exploit and control. In chapter eight, we will see that through testing and experimentation, a rule set can be derived and correlated with recommended techniques for generalization according to user needs.

In contemporary generalization research, most of the rule oriented developments focus on the representation. The representation of geometric and thematic data is very important; however, with the magnitude of digital data being collected and stored, generalization processes should be examined and solved to a greater extent through database operations. Once the amount and combinations of data from a database have been determined for representation at a lower resolution, the graphic algorithms developed throughout the first ten years of digital generalization can make a major contribution to developing a comprehensive approach to this complex problem.

With extensive geometric and thematic databases being collected from satellite imagery and topographic and thematic maps, the developments presented in this research provide techniques for automatically reducing the density of data for representations at any smaller scale. It will be shown that a profusion of options is available within the context transformation model for density reduction and alterations. Techniques are provided for controlling the amount of generalization, and quantifications reflecting the extent to which generalization is performed are made automatically available to the user. This research examines generalization from an alternative perspective to contemporary research. First, it explores generalization from a database perspective rather than a representation perspective, and second, it is not based on a large rule-base but rather presents a generalization model in a completely flexible environment from which a mapping advisor reflecting mapping application requirements can later be generated through experimentation by a diverse user group.

1.6 Limitations of the Study

The context transformation model incorporates the necessity factor, attenuation factors, and reduction factors. These three components of the generalization prototype are limited in the following ways. The necessity factor has been derived from a survey of data at scales ranging from 1:2 million to 1:30 million. It does not therefore address topographic scales. The methodology and concept, however, are generic. This factor can be applied to each type of data, i.e. points, lines, and areas, and their subsequent entity, object, subclass, class, and superclass definitions. The criteria specified by the necessity factor provide a rationale for why, when, and how much data should be represented at a lower resolution. A subset of rules addresses which objects should appear based on entity, object, subclass, class and superclass level attributes and attribute domains. Limitations of the necessity factor lie in its formulation. As no model previously existed as a guide for thematic mapping, the necessity factor has been formulated to reflect users' needs according to various scales of representation. At each level of representation, the necessity factor provides a guide for the amount of data selection, elimination, and replacement. This is accomplished based on map object class functionalities and object class relationships to thematic subjects.

The attenuation factors are limited in this research to 0 < f < 1. This factor can be applied to the necessity factor to increase the intensity of the generalization and accordingly decrease data densities. In the systems implementation this factor has been limited to single decimal values for testing purposes. Amplification factors can also be applied to the necessity factor to increase data density once the system has derived a selection. The amplification factors can take the form f > 1; however, this has not been implemented in this research. A sketch of this adjustment follows.
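As a minimal sketch of the adjustment (the multiplicative combination and the values below are assumptions for illustration; the actual formulation is given in Chapters 4 and 6), an attenuation factor 0 < f < 1 scales the necessity factor down and thereby intensifies elimination, while an amplification factor f > 1 would scale it up:

    # Hypothetical sketch: attenuating or amplifying a necessity value.
    def adjusted_necessity(necessity, f):
        # 0 < f < 1 attenuates (more generalization); f > 1 amplifies.
        if f <= 0:
            raise ValueError("factor must be positive")
        return necessity * f

    base = 0.6                             # invented necessity value
    print(adjusted_necessity(base, 0.1))   # 0.06 -> strong attenuation
    print(adjusted_necessity(base, 0.8))   # 0.48 -> mild attenuation
    print(adjusted_necessity(base, 1.5))   # 0.90 -> amplification (not implemented here)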

Reduction factors that quantify the amount of data remaining once the generalization processes have been executed are presented at the entity, object, subclass, class and superclass levels. Values for both the geometric changes and thematic changes are presented. These are provided as a means to evaluate how much data has been retained and how much has been eliminated or altered. As this is a measure for users to judge the generalization processes, the factors are presented in a straightforward and simple manner, in the form of actual density numbers of entities, objects, and subclass, class and superclass compositions. Additionally, percentages of change according to the above mentioned levels are provided. Reduction factors could be given in a more sophisticated manner, such as the ratio of change in the numbers of entities, objects, or subclasses and so on, or standard deviations could be used. However, it is the opinion of this author that if reduction factors are to be functional they should be straightforward. More complex measures begin to reflect data reliability and accuracy. A sketch of such a count-and-percentage computation follows.
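The following sketch shows one way such reduction factors could be computed as retained counts plus percentage eliminated per level (the level names and counts are invented; this illustrates the reporting format, not the implemented module):

    # Hypothetical sketch: reduction factors as counts and percentage change.
    def reduction_factors(before, after):
        # before/after map a level name (entity, object, subclass, ...) to the
        # number of items at that level; returns counts and % eliminated.
        report = {}
        for level, n_before in before.items():
            n_after = after.get(level, 0)
            pct = 100.0 * (n_before - n_after) / n_before if n_before else 0.0
            report[level] = {"before": n_before, "after": n_after,
                             "pct_eliminated": round(pct, 1)}
        return report

    before = {"entity": 12000, "object": 3400, "subclass": 41, "class": 12}
    after  = {"entity":  5200, "object": 1500, "subclass": 28, "class": 10}
    for level, row in reduction_factors(before, after).items():
        print(level, row)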

The necessity, attenuation, and reduction factors operate in conjunction with a geometric and thematic database which will be maintained in a geographic information systems environment. The conceptual data model selected for testing is based on two dimensional geometry in which objects are described as line segments, points, and polygons. Thematic aspects of geometric objects are provided as attributes. The dataset selected for testing will be limited to the southern portion of the province of Saskatchewan and will include geometric and thematic data for populated places, hydrography, and landcover. The original scales of data collection are discussed in Chapter 6. The systems used for implementation and testing consist of Arc/Info and Oracle operating in a Unix environment.
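To make the geometric and thematic structure concrete, here is a minimal, hypothetical sketch of the kind of object the conceptual model implies: a two dimensional geometric primitive carrying thematic attributes and class membership (the names and fields are illustrative, not the schema used in the implementation):

    # Hypothetical sketch of an object under the conceptual data model:
    # 2D geometry (point, line segment, or polygon) plus thematic attributes.
    from dataclasses import dataclass, field

    @dataclass
    class GeoObject:
        object_id: int
        geometry_type: str       # "point", "line", or "polygon"
        coordinates: list        # [(x, y), ...] vertex list
        object_class: str        # subclass level, e.g. "river"
        superclass: str          # e.g. "hydrography"
        attributes: dict = field(default_factory=dict)   # thematic data

    example = GeoObject(
        object_id=42, geometry_type="line",
        coordinates=[(0.0, 0.0), (1.5, 2.0), (3.0, 2.5)],
        object_class="river", superclass="hydrography",
        attributes={"stream_order": 2},
    )
    print(example.superclass, example.attributes["stream_order"])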

1.7 Methodology

Three fundamental areas of knowledge are important for the development of a model for context transformations. The first area of knowledge is in generalization, which is the activity used to select, eliminate, replace and rearrange geometric and thematic data in a manner suitable for representation at lower resolutions. The second area is the data model environment, which must present a suitable structure for generalization and data manipulation. The third area to be explored is the logical data structures that are built according to the conceptual data model and the requirements specified by the users for generalization.

In implementing the context transformation model, three levels are addressed throughout the research. The first level is the external level, which reflects the user's needs. In this context, mapping requirements are assessed along with requirements for changes in the context of a map or representation. The next level addressed is the conceptual level. This is the model view, which should reflect the external level user's requirements. At the conceptual level, the context transformation model is developed and designed. Also at the conceptual level, the data model environment is assessed according to the context transformation requirements. An existing data model is extended to meet requirements for generalization. The last aspect addressed is the logical structure level. Here logical data structures are investigated and designed to reflect the conceptual data model and the requirements of the context transformation model.

1.8 Organization of the Dissertation

The dissertation provides a review of the status of automated generalization in Chapter 2. The last thirty years of development are discussed, followed by a section on some current shortcomings in this area of research. Chapter 3 presents the requirements for the conceptual data model. Included are discussions on the data selection, applications and spatial modelling needs, followed by the selection of a conceptual data model. This data model is reviewed according to the data modelling needs. Chapter 4 presents the conceptual model for generalization, and the components of the model are examined for different strategies in generalization using classification and aggregation hierarchies. In Chapter 5, data certainty and its importance for generalization are introduced, followed by the reduction factor calculations, which can be examined at the entity, object, subclass, class, and superclass levels. The physical implementation along with the database design for thematic and geometric data are presented and illustrated in Chapter 6, while Chapter 7 presents the results of the context transformations. These are presented according to the various strategies in generalization that can be used in the context transformation model. Finally, in Chapter 8, general conclusions are given along with more specific observations on the results and contribution of this research.

CHAPTER 2

Review of Related Research

2.1 Introduction

Activities in digital generalization have been ongoing over the last three decades. Solutions to generalization have been actively sought during this period for a number of reasons. Initially, with the development of automated systems, storage restrictions and processing constraints were an incentive for the development of techniques to reduce coordinate pairs in line data. Maintaining graphic legibility under scale reduction was an equal concern. From an initial focus on coordinate pair reduction, digital generalization became involved with other activities like selection, classification, amalgamation, displacement and symbolization. These activities affect both the thematic and geometric aspects of objects. For example, selection, classification, and amalgamation concentrate on thematic composition, whereas simplification and displacement are concerned with graphic representation and largely focus on map legibility. Thus, the former activities address map content while the latter address form.

In the transition from manual techniques in generalization to automated, the application of generalization processes changed radically. In manual generalization, the cognitive processes involved with activities like selection, simplification, classification, and symbolization can be graphically achieved with a single application of the pen. In other words, the graphic execution of these activities is achieved inclusively. In a computer environment, however, each activity is executed individually. Thus, because of the different processes involved, achieving the same results in an automated environment as those achieved in a manual environment has been problematic. What was relatively straightforward in a conventional environment has proven to be very complex in the digital environment. As a result, three epochs of automated generalization research and development have transpired [Kilpeläinen, 1992; Buttenfield et al., 1991].

The first epoch, starting in the early sixties, was concerned with the development of the individual steps required to generalize graphic representations. The second era was involved with testing the results achieved from the first epoch, and finally, the third epoch, which started in the late 1970s, has been involved with the development of more comprehensive models for generalization.

The next sections of this chapter review the developments made in generalization over the last thirty years. As several good reviews are already available [Zycor, 1984; McMaster, 1987; Muller, 1987; Buttenfield and McMaster, 1991], the review provided here is intended as an overview and an assessment of areas in which further research and development are required.

2.2 Three Epochs of Development in Automated Generalization

Throughout the first ten years of development in generalization, a profusion of programs was developed both in Europe and North America. Most research and development during this stage was directed at linear feature generalization, in the form of simplification of digital lines.

Considerable efforts were made to find efficient techniques for eliminating coordinate data in line features. A number of reviews are available, such as Zycor's [1984], in which more than twenty-five techniques for linear feature simplification are reviewed. McMaster [1987] reviewed over twelve, and Muller [1987] reviewed about ten in an analysis to assess the angularity and areal deviation between fractalization and line simplification techniques [Richardson, 1988].

During the development phase in linear feature simplification algorithms, about six categories of processes evolved. These consisted of independent point algorithms, local processing routines, unconstrained extended local processing, constrained extended local processing, fractalization, and global routines. One of the early developers in this area was Tobler [1966], who created simple algorithms that select every nth coordinate pair. According to McMaster [1987], however, this form of independent point algorithm was not a very acceptable approach for high quality results. Tobler also developed techniques in local processing routines using the characteristics of adjacent coordinate pairs to decide whether to retain coordinate pairs or eliminate them. Algorithms such as this function either by rejecting coordinate pairs if they are closer than the pen width used for display or by eliminating coordinate pairs according to pre-defined distance variables. Jenks [1981] also worked in this area and developed two well known algorithms, the 'perpendicular distance algorithm' and Jenks's angular algorithm. The perpendicular distance algorithm works by defining a straight line between two points, P1 and P3; the perpendicular distance from this line to P2 is then measured, and if it is greater than the tolerance, P2 is retained to give the line character. In his algorithm for angular change, an angular tolerance is set and subsequently measured between points P1, P2, P3, and P4. If the angular tolerance between P1 and P3 is greater than the tolerance then P2 is retained. If the tolerance value is less, say between P2 and P4, then P3 is rejected. A sketch of the perpendicular distance test follows.
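A minimal sketch of the perpendicular distance test as paraphrased above (an illustrative reconstruction, not Jenks's original code; the tolerance and coordinates are invented, and the last retained point is used as P1, a common variant):

    # Hypothetical sketch of a Jenks-style perpendicular distance test: the
    # middle point is kept only if it lies farther than `tolerance` from the
    # chord joining its neighbours.
    import math

    def perpendicular_distance(p, a, b):
        # Distance from point p to the infinite line through a and b.
        dx, dy = b[0] - a[0], b[1] - a[1]
        length = math.hypot(dx, dy)
        if length == 0.0:
            return math.hypot(p[0] - a[0], p[1] - a[1])
        return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length

    def simplify_perpendicular(points, tolerance):
        kept = [points[0]]
        for i in range(1, len(points) - 1):
            if perpendicular_distance(points[i], kept[-1], points[i + 1]) > tolerance:
                kept.append(points[i])      # point carries line character: keep
        kept.append(points[-1])
        return kept

    line = [(0, 0), (1, 0.05), (2, 0.8), (3, 0.1), (4, 0)]
    print(simplify_perpendicular(line, 0.5))   # [(0, 0), (2, 0.8), (4, 0)]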

In unconstrained extended local processing a different approach was developed, in which algorithms search line segments rather than being restricted to a few coordinate pairs. The extent of the search depends on line complexity and coordinate density. A number of algorithms of this nature have been developed, most notably the Reumann-Witkam [1974] algorithm. In this algorithm, two parallel lines define a search region according to a predefined slope. The line is processed sequentially until the edge of the search corridor intersects the line. Reumann and Witkam also developed an enhanced strip algorithm which provided rigorous critical line definition, a test for vertical critical lines, a check for inflection points, and a factor enabling construction of extended critical lines.

In constrained extended local processing, algorithms are constrained in that the search region of the algorithm is restricted by a minimum and maximum distance check. A number of developments were made in this category of linear feature simplification; of note were Lang [1969], Opheim [1981 and 1982], and Deveau [1985]. Algorithms of this nature identify sections of lines or linear features and then simplify them. In the Lang algorithm, for example, a straight line is defined between P0 and P5, and if the distances of P2, P3, and P4 exceed a given tolerance, the end point P5 is repositioned to P4. Then the new P0-P4 distances are calculated. When the distances of the intervening points are less than or equal to the tolerance, the straight line is kept. The last end point becomes the new P0 [McMaster, 1987]. A sketch of this look-ahead procedure follows.
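A sketch of the Lang look-ahead procedure as paraphrased above (again an illustrative reconstruction; the look-ahead of five points follows the P0-P5 description, and the distance helper is repeated so the sketch stands alone):

    # Hypothetical sketch of Lang's algorithm: try a chord spanning `lookahead`
    # points, shrinking it until every intervening point fits the tolerance.
    import math

    def perp(p, a, b):
        # Perpendicular distance from p to the line through a and b.
        dx, dy = b[0] - a[0], b[1] - a[1]
        length = math.hypot(dx, dy)
        if length == 0.0:
            return math.hypot(p[0] - a[0], p[1] - a[1])
        return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length

    def lang(points, tolerance, lookahead=5):
        kept = [points[0]]
        anchor = 0
        while anchor < len(points) - 1:
            floater = min(anchor + lookahead, len(points) - 1)
            while floater > anchor + 1:
                if all(perp(points[k], points[anchor], points[floater]) <= tolerance
                       for k in range(anchor + 1, floater)):
                    break                 # every intervening point fits the chord
                floater -= 1              # pull the end point back by one
            kept.append(points[floater])  # the floater becomes the new anchor
            anchor = floater
        return kept

    line = [(0, 0), (1, 0.05), (2, 0.8), (3, 0.1), (4, 0)]
    print(lang(line, 0.3))   # drops (3, 0.1)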

Opheim's approach to constrained extended local processing involved the use of a circle of diameter x with a search corridor. When the line falls outside the search corridor, a new search corridor is established and the last point in the search corridor is saved. Deveau, on the other hand, developed an algorithm in which the first point in a line is used to fix one degree of freedom of a line through the centre of a tolerance band. This leaves one degree of freedom to fit the band to subsequent points. The band's orientation is defined by extreme angles in the original line. The centre line between these two positions is maintained by the algorithm [Deveau, 1985].

Some work has been achieved in using fractalization, which involves the algorithmic replication of the relationship between large scale structure and small scale detail as a method for enhancement. Mandelbrot [1977, 1982] developed the geometric notion of fractal self-similarity, and in 1981 Dutton determined that linear generalization should include simplification and enhancement and tested fractalization using a sinuosity dimension to control line curvilinearity.

During this period, perhaps the best known algorithm for linear feature simplification was the global routine developed by David Douglas [1973]. This is the only global line generalization routine frequently used, and it appears in many cartographic systems and geographic information systems. A sketch of the routine follows.
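For reference, a compact sketch of the global routine (widely known today as the Douglas-Peucker algorithm); this is a standard textbook formulation rather than Douglas's original code, with the distance helper repeated so the sketch stands alone:

    # Sketch of the global routine: keep the point farthest from the chord if
    # it exceeds the tolerance, then recurse on the two halves it defines.
    import math

    def perp(p, a, b):
        # Perpendicular distance from p to the line through a and b.
        dx, dy = b[0] - a[0], b[1] - a[1]
        length = math.hypot(dx, dy)
        if length == 0.0:
            return math.hypot(p[0] - a[0], p[1] - a[1])
        return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length

    def douglas_peucker(points, tolerance):
        if len(points) < 3:
            return list(points)
        dmax, index = 0.0, 0
        for k in range(1, len(points) - 1):
            d = perp(points[k], points[0], points[-1])
            if d > dmax:
                dmax, index = d, k
        if dmax <= tolerance:
            return [points[0], points[-1]]   # whole span collapses to the chord
        left = douglas_peucker(points[:index + 1], tolerance)
        right = douglas_peucker(points[index:], tolerance)
        return left[:-1] + right             # drop the duplicated split point

    line = [(0, 0), (1, 0.05), (2, 0.8), (3, 0.1), (4, 0)]
    print(douglas_peucker(line, 0.35))   # [(0, 0), (2, 0.8), (4, 0)]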

This period in generalization research was followed by the second epoch, in which algorithmic efficiency was assessed. The focus for assessment, not surprisingly, was linear feature simplification. A number of studies are available in the literature in which selected algorithms have been assessed for their performance in eliminating coordinate pairs. McMaster [1986] determined that of thirty different measures to assess algorithmic performance, six statistical categories could be considered. These consist of changes in the numbers of coordinates and in angularity, the ratio of change in the standard deviation of the number of coordinates per inch, changes in vector length, areal differences, and the ratio of change in the number of curvilinear segments. Studies were undertaken by White [1985] in which Tobler's nth point elimination, Jenks's perpendicular calculation, and Douglas's algorithm were evaluated. Varizella [1988] evaluated eight linear simplification algorithms and found the Douglas algorithm to be the most effective for generalizing 1:20 000 topographic features for a representation at 1:250 000. In a later review by Muller [1987], the preservation of fractal dimensions was evaluated for use as a guiding standard for the automated generalization of statistically self similar lines. In his review, Muller points out that simplification algorithms do not preserve the fractal dimension of statistically self similar lines. Eight algorithms were tested for angularity and areal deviation using one complex line and one simple line. Angularity is measured by using a formula for calculating average angularity between successive segments in a line, while areal deviation is a measure of the displacement between the generalized line and the original line. A sketch of the angularity measure follows.
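As a hedged sketch of one plausible reading of that angularity measure (Muller's exact formula is not reproduced here; this version simply averages the absolute turning angle between successive segments):

    # Hypothetical sketch: average angularity as the mean absolute turning
    # angle, in radians, between successive segments of a line.
    import math

    def average_angularity(points):
        turns = []
        for i in range(1, len(points) - 1):
            ax, ay = points[i][0] - points[i-1][0], points[i][1] - points[i-1][1]
            bx, by = points[i+1][0] - points[i][0], points[i+1][1] - points[i][1]
            # Signed angle between consecutive segment vectors.
            turns.append(abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by)))
        return sum(turns) / len(turns) if turns else 0.0

    line = [(0, 0), (1, 0.05), (2, 0.8), (3, 0.1), (4, 0)]
    simplified = [(0, 0), (2, 0.8), (4, 0)]
    print(average_angularity(line), average_angularity(simplified))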

One algorithm, called walking dividers, was proposed as an alternative for line generalization and rendered results commensurate with the Douglas algorithm. For angularity, however, the results for the walking dividers were not as satisfactory. Since the walking dividers algorithm does not select points pertaining to the original curve, and thus stress position as in conventional linear generalization, but rather displaces the line by creating new points which emphasize the aspect of the line [Muller, 1987], it may be a useful tool in generalizing descriptive maps, which are usually published at smaller scales and where a reduced emphasis is placed on the spatial accuracy of the representation.

The third epoch in generalization has focussed on the conceptualization of more comprehensive approaches to generalization, which has resulted in a number of models being proposed. This development has taken place since the late 1970s in both Europe and North America and is still ongoing. A number of reviews on contemporary generalization models are available, such as the conceptual review by McMaster [1991] and the review of systems and prospects for future progress by Herbert and João [1991].

Most models attempt to solve a number of generalization tasks. For example, the Nickerson and Freeman model, proposed in 1986, consists of five tasks. These are feature modifications, which are achieved through deletion, simplification, combination, and type conversion processes; symbol scaling; feature relocation; scale reduction; and finally name placement. In the development of this model, United States Geological Survey maps were used at a scale of 1:24 000 and, employing the above mentioned tasks, 1:24 000 data were generalized to 1:250 000. In determining which objects should appear in a derived representation, the density of the target map was calculated, and this could be achieved in two possible ways. Rules could either be defined to eliminate entire classes of objects according to scale, or areas on the map where feature density was too great could be identified and objects deleted from these areas only. A grid structure overlaid on the map data made this second method possible [Herbert and João, 1991]. The prototype created by Nickerson and Freeman recognized a need to allow easy methods for changing rules in the knowledge base, and to this end proposed that rules be maintained separately from the rest of the generalization process. A sketch of the grid-density method follows.
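As a hedged sketch of the second, grid-based method (an illustration of the idea only; the cell size, density limit, and priority ordering are invented):

    # Hypothetical sketch: overlay a grid, count point features per cell, and
    # delete the lowest-priority features wherever a cell is too dense.
    from collections import defaultdict

    def thin_by_grid(features, cell_size, max_per_cell):
        # features: list of (x, y, priority); higher priority survives longer.
        cells = defaultdict(list)
        for f in features:
            cells[(int(f[0] // cell_size), int(f[1] // cell_size))].append(f)
        kept = []
        for members in cells.values():
            members.sort(key=lambda f: f[2], reverse=True)   # important first
            kept.extend(members[:max_per_cell])
        return kept

    pts = [(0.2, 0.3, 5), (0.4, 0.1, 2), (0.6, 0.9, 9), (3.5, 3.5, 1)]
    print(thin_by_grid(pts, cell_size=1.0, max_per_cell=1))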

Shea and McMaster [1989] proposed a model that addressed three key components, termed: a) intrinsic objectives, or why we generalize; b) situation assessment, or when we generalize; and c) spatial and attribute transformations, or how we generalize. The determination of why we generalize was based on a number of concepts and included objectives of a philosophical, application and computational nature. The situation assessment of this model addressed considerations such as the conditions under which generalization should occur, the measures by which that determination was made, and the controls of generalization techniques used to accomplish change. In exploring the last aspect of this model, spatial and attribute transformations were examined for six conditions. These conditions may occur under scale reduction and can be used to determine the need for generalization. The six conditions consisted of congestion, coalescence, conflict, complication, inconsistency, and imperceptibility, and in this model they are used to counteract or eliminate undesirable consequences of scale change. It is interesting to note that reduction of graphic complexity is cited as the most important principle in this approach.

Another comprehensive conceptual model proposed for digital generalization is the Brassel and Weibel [1988] model. In this model, Brassel and Weibel propose that map generalization be defined as a variant of spatial modelling and argue that it should be based on understanding rather than on a fixed sequence of processing steps. With this view, a conceptual framework for generalization is submitted based on five steps: structure recognition, process recognition, process modelling, process execution, and display. In the structure recognition phase, the objective is to identify objects or aggregates and their spatial relationships, and to establish measures of relative importance. This process is controlled by the objectives of generalization, the quality of the original database, and the scale of the target map. In process recognition, the types of data modifications and the parameters of the target structures must be established. This involves an assessment of what should be done to the original database, including a determination of conflicts and their resolutions, and a decision must be made on the types of objects and structures that should occur on the derived map. Following process recognition is process modelling, in which the generalization processes are modelled as a sequence of operational steps. This stage of the conceptual model accesses a process library that maintains a set of rules and procedures which are invoked for process execution. The process execution converts the original database into the target map through a sequence of operational steps compiled from the process library. These processes involve selection, elimination, simplification, symbolization, feature displacement, and feature combination. Finally, in the data display stage of this conceptual model, the target data are converted to the target map.

Another model addressing more comprehensive approaches to generalization is the GENEX prototype developed at the University of Hanover [Meyer, 1986 and 1987, and Jäger, 1987]. In this development, the German basic 1:5 000 map is generalized using a combination of knowledge-based and conventional programming methods. Using a number of modules, this prototype generalizes buildings from 1:5 000 to 1:50 000 based on distance parameters between buildings. Another module handles the generalization of traffic ways and rivers, in these cases concentrating on the simplification of irregularities. According to the literature, the GENEX system also accomplishes symbolization, elimination, text placement, and enhancement. Although GENEX is an implementation that appears to handle a number of generalization processes, the conceptual design is unclear. As it addresses large scale topographic mapping, this may suggest that it has a greater concentration on geometric aspects rather than on conceptual factors, which become more important as the scale decreases [Herbert and João, 1991].

2.3 Assessment of Contemporary Status in Automated Generalization

Many other conceptual models and prototype developments have been made, such as the Morrison model [1974] and the Astra prototype proposed by Leberl, Olson, and Lichtner [1985], but it is interesting to note that during this third epoch of generalization most models approach generalization from a topographic standpoint, and as such the approaches are primarily rooted in graphical representation issues. These approaches are certainly valid, although in view of the extensive databases being created by large mapping and remote sensing sectors, issues in generalization should now also be looked at in the context of database considerations.

Designing a model for generalization is a complex task. In part, the complexity in developing comprehensive models for generalization arises from the lack of rules available for determining which processes or operators should be applied for various mapping requirements [Nyerges, 1991, Weibel, 1991, and Schylberg, 1992]. Further, the acquisition and formulation of rules is severely impaired since there is a lack of knowledge about how individuals actually generalize. Thus the conceptual development of a model for generalization based on rules that will suit every user's needs is a complex issue, particularly so when many solutions to a generalization problem can be acceptable, yet no single solution may be acceptable to all users. Of course, if a generalization model is developed for a particular application, then it need not be designed to accommodate a diversity of user needs.

Thus far, the major 'representation' focus in generalization models, starting from the first epoch of development, has continued with contemporary approaches. On the other hand, almost no references are provided in the literature that examine generalization from a database standpoint. In view of the lack of available rules, and considering the rapid increase in large, detailed geometric and thematic databases, generalization techniques should be explored using database techniques, which may solve a number of difficulties. Many choices for altering the representation of geometric and thematic data can be made available to the user by developing techniques using the database and certain data structures. Consequently, the lack of available rules can be compensated for by allowing users to choose the processes for generalization most suitable to their particular needs. As such, the data model environment and its expression through logical data structures can be used to facilitate generalization operations.

Another aspect that has not been considered, according to the literature, is the impact of generalization on data certainty. Since generalization processes can alter both a representation and the geometric and thematic data stored in the database, it is interesting to note that how this aspect may be used has as yet remained somewhat obscure. Of interest also is the fact that little has been done with the results obtained during the second epoch of development in generalization. During this epoch, many measures were developed and applied to assess the changes in accuracy which result from linear feature simplification algorithms, displacement algorithms, amalgamations, and the like. The techniques developed and used during this period would in many ways be suited to developing certainty factors for generalization and should be explored in that context. The use of certainty factors can provide a unique way of controlling the generalization processes. Rather than accepting results on the merit of a 'good' representation, which as yet remains to be defined, generalizations of a geometric and thematic database could be accepted or disregarded on the certainty of the derived product. Combinations of representation suitability and data certainty are also an interesting concept for validating choices in generalization.


CHAPTER 3

Data Modelling and Data Definition

3.1 Introduction

In geographic information sciences, or geomatics, a perception of the world can be regarded as either distinct phenomena or phenomena that are in some way related. The way in which data are modelled and structured can provide information, and as a result an increment of knowledge can be inferred from the data. In a GIS environment, data are frequently separated from their interpretation, or meaning, and often the interpretations of data are not explicitly recorded. Originally, this occurred partly as a result of hardware and software restrictions on storage, and partly because computers are inefficient at handling natural language, which is still the main way of encoding the interpretations and meaning of data [Shapiro, 1987; Tsichritzis and Lochovsky, 1981]. At some point, however, data should be linked with an interpretation using structures that in some way allow multiple interpretations or abstractions to be derived from the same data. Since in map generalization data abstraction occurs at different levels of scale, and further, since applications using the data vary considerably and also affect abstraction, it is essential that the underlying data model allow maximum flexibility in using the data, while at the same time providing a stable environment for the data. Accordingly, in considering abstraction as a process of generalization and abstraction as a process of modelling, the data model should be flexible enough to allow data to be viewed in different ways, and to allow different data to be viewed in the same way when required. For example, at certain levels of generalization and according to certain contexts, the interpretation should allow that lakes and rivers are viewed as just rivers. Thus, in the context of generalization, the model should allow the level of abstraction to change depending on the scale and the application.

Ideally, the ultimate data model would allow for a complete interpretation of geographic phenomena according to any application or generalization needs. Since our knowledge of our environment is both incomplete and open ended, this is unlikely, and it is therefore important that the data model capture the appropriate amount of meaning as related to the desired use of the data.


3.2 Determining the Data Model

A database that can successfully respond to or accommodate the abstraction processes required as a result of generalization, as well as the abstractions required by various application contexts, must to a large extent rely on the underlying conceptual data model and the subsequent translation of the data model into the logical data model design and structures. The formulation of the conceptual model should be determined by analyzing the data types according to the context in which they will be applied. Thus, the development of the conceptual model should focus on the structuring of the data and the establishment of the required relationships among data elements to support the application. This process should be independent of the actual physical implementation; in other words, the conceptual model should be mappable to logical data models such as relational, hierarchical, network, or object oriented.

The design of the logical model, which in database design processes follows the conceptual model stage, is developed in relation to a specific database management system. This process involves the mapping of the conceptual model to the database management system.

3.2.1 Data Selection and Application

As a prerequisite to describing the conceptual data model, it is important to review the selection of data with a discussion of its intended or foreseen use. The data selected for testing are outlined below, along with definitions and data manipulation requirements in the context of applications. As well, some of the relationships among the different data types are briefly discussed.

In selecting data for testing context transformations, it was necessary to choose a manageable dataset that is also available from conventionally derived maps at different scales for purposes of comparison. As such, the 'map elements' selected are, in conventional terms, considered base map objects. Base data usually complement or support thematic data and can include elements like administrative boundaries, towns, roads, and hydrography. The geometric and thematic data selected for this research will be generalized with an emphasis on their information content. Thus, each map element will have a complement of attribute data as well as spatial aspects such as topology. While this dataset provides the structural framework and locational context for thematic data, the elements selected also constitute a manageable dataset which lends itself equally well to analysis as thematic data.

In selecting the geometric and thematic data for testing, several other aspects were considered as well. For example, the three standard geometric data types should be tested, i.e. points, lines, and areas. In the case of point-type data, the selection for testing should include a thematic description expressed in a classification hierarchy, such as a superclass like populated places with a class-level and object-level breakdown. Line-type data should be selected for testing classification hierarchies, as discussed in Chapter 4. Area data have been selected for testing polygon aggregations within a classification hierarchy, and also for testing aggregations between different classes of objects, such as lakes and rivers, and between different types of objects, like areas and lines. The data selected, along with the general data definitions and their types, are provided in Figure 3.1.

Figure 3.1: General Definitions for Selected Thematic and Geometric Test Data

Object Superclass: Populated Places

  City (Point): All separate municipalities with a population of more than 5000.

  Town (Point): A separate municipality having a population of 1000-4999.

  Village (Point): A separate municipality having a population of less than 1000.

  Unincorporated Place (Point): A place with no legally defined boundary and no local government. It includes all places which are unincorporated, regardless of size. In practice, very few such places have more than 5000 people and relatively few have 1000-4999.

  Non-Unincorporated Place (Name not approved - NUP) (Point): Refers to a relatively small number of settlements with names not approved by the Canadian Permanent Committee on Geographic Names.

  Indian Reserve (Point): Lands set apart for the use and benefit of an Indian Band by an Order-in-Council, which are subject to the terms of an Indian Act. Population range is 0-2000.

Object Superclass: Hydrography

  River (Single and/or Double Line): Refers to a natural stream of water emptying into an ocean, a lake, or another river.

  Lake (Area): Refers to an inland body of water larger than a pool or pond, generally formed by some obstruction in the course of flowing water.

Object Superclass: Landcover

  Forest Land (Area): Land currently supporting or capable of growing forest, with tree crown cover of 10% or more. Includes land where trees are stunted owing to site limitations, or undetectable owing to disturbance.

  Agricultural Land (Area): Cultivated land currently supporting crops, orchards, vineyards, nurseries, etc. Includes land supporting native vegetation with less than 10% tree cover, and includes improved land used for forage, and meadows.

  Non-Vegetated (Area): Includes land covered with perennial snow fields and glaciers, land without discernible vegetation cover such as barren land and open pit mines, and built up areas such as cities and towns.
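The superclass and class structure of Figure 3.1 maps directly onto a simple nested data structure. The sketch below is a minimal illustration (written in Python purely for exposition; the dictionary layout and function name are assumptions of this example, not structures taken from the implementation described later):

    # A minimal sketch of the Figure 3.1 classification hierarchy.
    # Class names follow the figure; the layout itself is illustrative only.
    HIERARCHY = {
        "Populated Places": {
            "City": "point", "Town": "point", "Village": "point",
            "Unincorporated Place": "point",
            "Non-Unincorporated Place": "point", "Indian Reserve": "point",
        },
        "Hydrography": {"River": "line", "Lake": "area"},
        "Landcover": {
            "Forest Land": "area", "Agricultural Land": "area",
            "Non-Vegetated": "area",
        },
    }

    def superclass_of(object_class):
        """Generalize one level up the hierarchy, e.g. 'Lake' -> 'Hydrography'."""
        for superclass, classes in HIERARCHY.items():
            if object_class in classes:
                return superclass
        raise KeyError(object_class)

Climbing the hierarchy in this way is the thematic counterpart of geometric aggregation: at a coarser level of abstraction, lakes and rivers are no longer distinguished but fall together under the hydrography superclass.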

3.2.2 Data Abstraction and Spatial Data Modelling

No clear theory of the map generalization process exists [Brassel and Weibel, 1988], and as a result the abstraction processes involved in developing a simplified map can take many forms. This is, in part, a result of aspects such as map purpose and the relationship map purpose has with the elements appearing on the map. Some may be considered important and required in detail, while others are less important. In this respect, generalization is user and application dependent. The processes involved in generalization are also affected by perception, cognition, and other complex intellectual functions [Brassel and Weibel, 1988]. Muller [1990] summed up some of the underlying assumptions within the framework of generalization, each of which is affected to some degree by map purpose, application environment, perception, and cognition.

a) There are many possible alternative solutions to generalization.

b) Generalization performed by two different cartographers usually leads to two different solutions.

Thus, for a data model that will be used in conjunction with generalization applications, it is important that the inherent variability that can be anticipated in generalization processes can be met through the underlying data model structure. Since generalization is usually undertaken in the context of an application or some form of spatial data modelling, some of the requirements for data structures that occur in spatial data modelling also hold true for generalization applications and can be examined in that light. For example, spatial data modelling is characterized in part by the manipulation of objects that are highly structured and often composed of other objects. An example of this point is to consider an application in which objects such as municipalities, towns, roads, and buildings are modelled [Kemp, 1990]. The town is a structured object which may consist of other objects such as urban areas, residential areas, industrial areas, etc., which are represented as areas, and roads and railway lines, which are represented as linear features. If a public transit accessibility analysis were to be undertaken, the data processing would require manipulation not only of spatial information such as geometry, topology, and location, but also of thematic information. These aspects are common to many types of spatial modelling and are also relevant to applications of context transformations. Thus, in the context of spatial modelling and subsequent generalization, the model should allow the following aspects:

a) modelling of locational and thematic components in an integrated way,


b) modelling of thematic data from one level of measurement to another, such as transforming interval or ratio data to ordinal or nominal data,

c) the capability to geometrically transform different levels of thematic abstraction to different scales of representation.

Thus, although there is an absence of a generalization theory, a data model should be used that responds to fundamental spatial data modelling requirements. This often requires data decomposition, entity and object identification, and the establishment of spatial and thematic relationships.

3.2.3 Definitions Required for Data Modelling

A reliable description of a data model involves a number of definitions. Three categories of definitions are important, and as yet none of these have standardized terminology. The first category of terminology is geographical. In using geographic terms, we attempt to understand and summarize our perceptions of the earth. This terminology is highly abstract, involves perception, knowledge, inference, and semantics, and far exceeds the technical aspects of GIS or automated cartography.

The second level of terminology is referred to as database terminology. At this level we attempt to translate our perception of the real world into a digital environment. However, the abstract and contextually sensitive nature of geographical terminology is a fundamental impediment to designing a model that is a 'true representation' of the real world. Nevertheless, within these conceptual limitations, the digital environment serves to model our interpretation of the real world.

The third and final level of definitions, referred to as geometric terminology, is taken from graph theory. This can be considered the operational level, which consists of the nuts-and-bolts concepts required to build a digital model.

Figure 3.2 provides a summary of some selected terms important in choosing a data model and translating it to logical structures. In each of the three areas of terminology, examples are provided, in some cases with aspects relating to one of the other categories of terms. (See also Appendix 3.1.)


Figure 3.2: Geographic, Database, and Geometric Terminologies

Geographic Terminology: as it relates to the real world.

Feature: A real world phenomenon which may be natural or man made. A feature is defined according to a context and can be considered a conceptual and semantic interpretation of a natural or man made phenomenon.

Feature Class: A class of features is a conceptual assignment of features which share the same characteristics to a class. The class definition of a set of features is contextually sensitive.

Database Terminology: as it relates to a GIS environment.

Attribute Domain: Domains are sets of values from which certain semantically meaningful objects and their properties can take values over time. For example, three-digit numbers form a domain from which the attribute 'river length' can take values.

Attribute Value: An attribute value is a general term and applies to the actual data or information contained for each object or entity. A complex entity can take an object attribute value such as 'Kootenay River'. Values for entities or complex entities can be quantitative, qualitative, or descriptive.

Class: Objects that belong to one, and only one, category are defined as a class. Classes should be mutually exclusive. Classes in a database environment often have a list of attributes and are identified by a label or class name.

Complex Entity: A set of atomic entities which can form a chain, an area, or part of an area. A complex entity occurs only when an entity or entity set shares more than one entity or object identification. See also multiple inheritance.

Entity: An entity can be an isolated vertex, an edge or an arc, and often refers to sub-parts of an object. A tributary of a river object is an example of an entity.

Entity Attribute: A characteristic or property of an entity. For example, isolated nodes in graph theory may be used to represent populated places. An attribute could be population statistics. Similarly, edges may represent stream segments with an attribute of stream order.

Entity Attribute Value: The value ascribed to the entity, for example, the entity attribute value of stream order may be '2'.

Entity Sets: An entity set is a set of nodes and edges such as a chain which may form part of an area or part of a line. If an entity set acquires more than one definition as a result of spatial co-occurrence, it is thereafter referred to as a complex entity.


Database Terminology: as it relates to a GIS environment, cont'd.

Extension: The extension of the class corresponds to a list of members of the class.

Intension: The intension of a class (or attribute) is a condition that applies to all members of the class set; it must also be the case that the condition does not apply to any object that is not a member of the class.

Multiple Inheritance: Spatial entities may reference entities or objects for more than one class, yet retain only one spatial entity or spatial entity set; in other words, two or more object classes can be referenced by one spatial domain for an entity or object. For example, in the case where a river object shares multiple inheritance with a boundary, the spatial entities, although recorded only once, would be available for the object class of both rivers and boundaries. See complex entity.

Object: An object is the digital interpretation of a feature. Thus, at the application level a feature is defined, often according to a particular context, and digitally represented as an object. The object therefore includes both a thematic and a spatial description, and has meaning.

Object Attribute: A characteristic or property of an object. For example, river length may be an object attribute of the object defined as 'Kootenay River'.

Object Attribute Value: A value ascribed to the property of an object. For example, the Kootenay River may have an object attribute called length. The object attribute value would be '240' (kilometres).

Object Class: A group of features defined in a GIS environment as objects that share a common attribute structure.

Point: A zero degree vertex within a graph (see isolated vertex). A point is wholly contained within a single area of the graph or it is located on an edge. The spatial location attribute of a point is given by a single coordinate pair or triplet. No two points have the same x,y,(z) coordinates. A point may be part of (that is, the location of) any number of objects and represent the location of any number of features. All points will be described by nodes.

Polygon: A two dimensional closed figure of a linear graph. Each polygon can be bounded by any number of edges. No two polygons overlap. Any point on the surface that is not on an edge is in one and only one polygon.

Spatial Attribute: Coordinate data stored for vertices and nodes and hence edges, arcs, chains, etc.

Spatial Attribute Value: Individual x, y, and z coordinates of which an entity and object are comprised.


Database Terminology: as it relates to a GIS environment, cont'd.

Spatial Object: A distinction is necessary between a thematic object and its related spatial description, henceforward referred to as a spatial object. There are two fundamental instances when this distinction is necessary: 1) when, for example, a county road is also a county boundary, and 2) when an object is generalized using simplification or displacement. In the first instance, the spatial object remains the same yet the thematic description changes, and in the second instance the spatial object is altered yet its thematic description remains the same.

Thematic Object: In a GIS environment a distinction is necessary between the spatial components of an object and the thematic components. The thematic components consist of attribute domain information and are referred to as a thematic object. See also spatial object.

Topological Attribute: A topological attribute refers to a spatial relationship ascribed to particular types of spatial data. For example, an arc is topologically defined by the attributes 'from-node' and 'to-node'. An edge occurring in a polygon is defined by the topological attributes 'left polygon' and 'right polygon'.

Topological Attribute Values: Refers to the individual topological values for each atomic entity.

Geometric Terminology: as it relates to graph theory.

Node: In a simple graph there is a set of nodes N = {n1, ..., nm}. One arc consists of two nodes; hence each arc is a subset of two elements of N.

Line: Consists of a finite set of unordered pairs of distinct elements. Sometimes referred to as the node set and the edge set.

Nodes and Edges: The node set is written N(G), where G is the graph; for the example graph, N(G) = {A, B, C, D}. The edge set is E(G), where E(G) = {A,B}, {B,C}, {C,A}, and {A,D}. The edge {B,C} joins the nodes B and C.


Geometric Terminology: as it relates to graph theory, cont'd.

Arc: Consists of a subset of two nodes. An arc whose first element is A and whose second element is B is called an arc from A to B and is written (A,B). The arcs of the example graph are (A,B), (B,C), (C,B), (A,D), (C,A), and (D,D).

Degree of a Node: The degree of a node A is the number of edges incident to A and is written p(A). A loop at a node contributes two to the degree of A. A node of zero degree is called an isolated node. For example, in the example graph above, node A has a degree of 3, i.e. p(A) = 3. A node of degree one is a terminal node (or endpoint).

Adjacency: Two nodes B and C of a graph are adjacent if there is an edge or line adjoining them, i.e. there is an edge of the form {B,C}. The nodes B and C are then said to be incident to the edge.

Edge Sequence: Given any graph, an edge-sequence is a finite sequence of edges of the form {V0,V1}, {V1,V2}, ..., {Vm-1,Vm}. Note: the number of edges in an edge sequence is called its length; e.g. V-W-X-Y-Z-Z-Y is an edge sequence of length 6 from V to Y.

Path: An edge sequence in which all the edges are distinct is called a path. A path is closed if V0 = Vm.

Chain: An edge-sequence in which all the edges are distinct and all the nodes V0, V1, ..., Vm are distinct, except possibly V0 = Vm.

Isolated Node: A node of degree zero is an isolated node.

Terminal Node: A node of degree one is a terminal node or endpoint.
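To make these graph-theoretic terms concrete, the following small sketch (illustrative Python; the thesis itself prescribes no implementation language) builds the example graph from the 'Nodes and Edges' entry above and derives node degrees, isolated nodes, and terminal nodes:

    # The example graph: N(G) = {A,B,C,D}, E(G) = {A,B},{B,C},{C,A},{A,D}.
    from collections import defaultdict

    nodes = {"A", "B", "C", "D"}
    edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]

    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1   # a loop (u == v) thus contributes two to the degree

    isolated = [n for n in nodes if degree[n] == 0]  # degree zero
    terminal = [n for n in nodes if degree[n] == 1]  # degree one (endpoint)

    print(dict(degree))        # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
    print(isolated, terminal)  # [] ['D']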


3.3 Selecting the Conceptual Data Model

The data model, in its conceptual formulation, must support and respond to requirements for generalization. As such, the inherent variability anticipated in generalization should be manageable according to the conceptual model and realizable through its translation into logical data structures.

To ensure that the conceptual data model would comply with requirements for generalization using context transformations, a number of needs were considered. These needs are realistically met by translating the conceptual model to logical data structures and subsequently manipulating these structures using a model for generalization, which is discussed in Chapter 4.

The data model must:

a) be translatable to logical data structures such that multiple interpretations or abstractions can be derived from the data.

b) permit different levels of abstraction according to various generalization requirements, like application and scale needs.

c) allow object generalization and specialization within a classification hierarchy and accommodate aggregations between different classes of objects, like lakes and rivers.

d) allow manipulation of locational and thematic data to be achieved in an independent way or in an integrated manner, depending on the mapping need.

3.3.1 The Formal Data Structure Model

The model used for testing rule-based generalization is based on the formal data structure (FDS) for single valued vector maps developed by Molenaar [1989, 1990, 1991]. For the purpose of this research, the formal data structure discussed here is examined in a two dimensional space.

The FDS is a topological data model that has the potential to meet all the above requirements for context transformations. It handles both the geometric and thematic aspects of geo-information using elementary data types like points, lines, and areas, and sets of geometric links among data types and objects, to provide a 'feature oriented' data structure. This environment facilitates the analysis of topological relationships among geometric elements and objects and the construction of composite objects. In Figure 3.3 we can see, at a very general level, the links between the object identifier, thematic data, and geometric data. The arrows among the ellipses indicate a many-to-one relationship, i.e. many objects belong to one class. Figure 3.4 provides the more detailed conceptual model and gives an indication of the importance of arcs and nodes in the overall structure.

Figure 3.3: Fundamental Structure for the FDS

Several conventions must be observed in using the FDS (the terminology used reflects that provided in section 3.2.3 and may differ from the terminology cited in the literature):

a) The object classes must be mutually exclusive; this means that each object has exactly one class label.

b) An object class contains geometric data of only one type.

c) When a map is analyzed as a graph, all points which are used to describe its geometry will be treated as nodes.

d) The arcs in this graph are geometrically represented by segments of straight lines.


e) For each pair of nodes there is at most one arc connecting them. In addition, the nodes may be connected by one or more chains.

f) For each arc a = {ni, nj}, ni ≠ nj; hence the graph may not contain loops.

g) For each geometric data type there is only one occurrence of each of its links to objects; i.e. an arc can belong to at most one line object and has one area at its left and one area at its right.

3.3.2 Topological Relationships

By using the FDS, a number of topological relationships can be identified which render the model a powerful analytical tool. For example, the topological relationships present in the FDS handle concepts such as neighbour, island, branching, crossing, intersecting, ending, inside, etc. These capabilities can be exploited for several manipulations that are fundamental requirements in generalization. Feature displacement algorithms can be facilitated by relationships such as an area bounded by a line, particularly in detailed topographic mapping of urban areas. Feature aggregation algorithms can easily be applied as well when, for example, one area touches another. These relationships and others are illustrated in Figure 3.5. Finally, like most conceptual models, the FDS can be mapped into a number of logical models such as relational, hierarchical, network, or object oriented.
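These conventions translate naturally into simple record structures. The following sketch is a minimal illustration in Python; the field names and sample identifiers are assumptions of this example, not the thesis's actual logical structures. Each arc record carries its from/to nodes and the area objects at its left and right (convention g above), from which the 'touches' relationship between areas can be recovered:

    # FDS-style arc records: topology is stored on the arcs themselves.
    arcs = {
        1: {"from": "n1", "to": "n2", "left_area": "lake_7", "right_area": "forest_2"},
        2: {"from": "n2", "to": "n3", "left_area": "lake_7", "right_area": "forest_2"},
        3: {"from": "n3", "to": "n4", "left_area": "forest_2", "right_area": "agri_5"},
    }

    def areas_touching(area_id):
        """Areas sharing at least one bounding arc with area_id ('touches')."""
        neighbours = set()
        for arc in arcs.values():
            sides = {arc["left_area"], arc["right_area"]}
            if area_id in sides:
                neighbours |= sides - {area_id}
        return neighbours

    print(areas_touching("forest_2"))  # {'lake_7', 'agri_5'}

An aggregation operation could then merge two adjacent areas simply by deleting their shared arcs; it is the left/right area attributes that make such manipulations direct.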


Figure 3.4: The FDS for a Single Valued Vector Map for a Two Dimensional Space (diagram not reproduced; it depicts the 'part of' and 'represents' links among the point, line, and area classes and objects, and the 'cross or intersect' relationships at the geometric level of arcs and nodes)


Figure 3.5: Spatial Relationships Among Data Types (diagram not reproduced; it illustrates relationships among points, lines, and areas, including bounds/bounded by, crosses/crossed by, intersects/intersected by, contains/within, branches/branched by, and touches/touched by)


3.4 Summary

Data abstraction and object definitions are largely user dependent and contextually sensitive. It is therefore important that a data model be assessed and/or developed according to the context within which it will be applied. The selection of the formal data structure has been made based on its topological strengths as well as the relative ease with which generalization in the form of context transformations can be achieved by using the FDS as a conceptual foundation. The distinction in the model between geometric components and thematic components is a necessary requirement in generalization. The model also incorporates the concept of class hierarchies, thus providing a conceptual design for coding subclass, class, and superclass structures at the logical data structure level. The requirements of context transformations also mean that the data structures formed within the context of the FDS must permit data decomposition as well as data aggregation. Through the conceptual design of the FDS, logical structure translations permit both data decomposition and aggregation, which are essential components of the generalization process.


CHAPTER 4

Conceptual Model for Generalization

4.1 Introduction

This chapter presents a model for generalization that provides flexibility under the various conditions that can be presented by a user. The conceptual model is designed to provide guidelines for what, when, why, and how much generalization should occur in small scale thematic maps. The model incorporates criteria concerning some of the relationships among information content, map purpose, data reliability, and the data density that should be represented at the derived scale of representation. These issues imply that creating an integrated process for automated generalization involves three primary components of a conceptual model. The first component, or sub-model, called the necessity factor, provides steering parameters for generalization and addresses the contextual relationships of the objects being mapped with the map themes portrayed at the derived scales. The second component is the classification and aggregation hierarchies, which can be used with the necessity factor, while the third component, or sub-model, is the reduction factors, which evaluate the composition of the map subsequent to the generalization process. The first two sub-models are discussed in this chapter, while the third aspect, i.e. reduction factors, is discussed in Chapter 5.

4.2 Components of the Conceptual Model for Context Transformations

In digital mapping and in applications in a geographic information system, access to techniques for effective scale changes, or in other words automated generalization, is an important aspect. The characteristics which should be considered in a strategy for automated generalization include a set of rules that allow alterations to spatial and thematic data according to the various contexts which arise as a result of different user requirements. In this sense, a strategy for automated generalization should be contextually sensitive. Accordingly, rules used for generalizing spatial and thematic data should explicitly incorporate facilities for context transformations.


The term context transformation, although uncommonly used in GIS environments, provides a suitable description within which generalization can be examined. For example, the word 'context' is defined as the whole situation, background, or environment relevant to a particular event, personality, creation, etc. [Webster's]. In relation to generalization, difficulties in developing a contextually sensitive system start to arise at the object level and continue throughout the mapping spectrum. For example, the definition of an object class such as stream segments may be viewed as one thing by a fluvial geomorphologist and another by a cartographer [João, Herbert, and Rhind, 1990]. Additionally, there may be cultural as well as professional differences in generalization, from the object level through to the entire map composition.

The concept of 'transformation' also becomes interesting when considering how contextual aspects affecting the map composition can be realized in an automated systems environment. For example, how best can spatial and thematic data be transformed, or rather changed in form or outward appearance, to meet specific contextual considerations, and further, what types of data structures should be in place to allow these transformations or contextual changes to occur?

In essence, to successfully implement a rule-base, or generalization model, capable of context transformations, two fundamental aspects must be examined conceptually and subsequently structured logically. These are the contextual aspects within which a map is to be derived, and the actual transformation processes which operate on the data and data structures. Thus, the term context transformation means a change in context which evolves as a result of the mapping requirement or situation as well as the user's perspective, while the physical transformation of the data implies a change in the data and data structures as a result of the contextual emphasis.

In examining context transformations, a selection of contextual issues will be discussed, followed by a discussion of how data may be structured for transformational purposes using classification and aggregation hierarchies. Details of the logical data structures are provided in Chapter 6.

4.2.1 General Contextual Considerations

The process of generalization is intrinsically related to the concept of abstraction. Abstraction emphasizes the removal of the 'specific', 'random', and 'unimportant' in order to concentrate on the 'important' and on the 'general' aspects of reality [Brassel and Weibel, 1988]. Similarly, the act of generalizing means that important aspects are selected and unimportant ones disregarded. In making generalization contextually sensitive it is, therefore, necessary to establish a means for determining what is important and what is not. This is in no way a small task, as generalization has proven to be a highly complex mental process involving perception, cognition, and other intellectual functions required for the mental processing of information [Brassel and Weibel, 1988]. One of the reasons why experienced professionals who perform map generalization have not (and perhaps cannot) fully describe how generalization is done is that there is no systematic means to document the knowledge used to perform generalization tasks [Nyerges, 1991]. Nevertheless, a number of contextual issues can be identified to determine why and when to generalize. Their subsequent formalization into a generalization model becomes an important step in building a flexible and contextually sensitive system.

Although there are many contexts within which a map can be derived, four fundamental aspects can be considered and assessed for incorporation into a generalization model. The four contexts for examination in this research include the:

a) user requirement,
b) map subject requirement,
c) map object functionality, and
d) intended scale of representation.

a) User Requirement:

The generalization of a map should be considered in the context of the user's requirement. The range of generalization possibilities according to a user requirement context is legion; accordingly, only two aspects are considered here: spatial analysis or modelling, and map design.

Both these aspects can determine a range of alternatives for selecting and representing objects to appear on a map. For example, when map objects are required for thematic mapping in an analytical or modelling context, objects should be selected objectively and in a repeatable manner so that map users, in using the same dataset and analytical procedures, can arrive at the same conclusions. If objects are required more for balance or background, or in other words aesthetics, objectivity can be relaxed in favour of techniques reliant on principles of cartographic design.


b) Map Subject Requirement

The development and/or generalization of a map generally occurs as a result of a particular subject requirement, or in other words the map purpose. This implies that in generalization there is, or should be, a certain subject dependency or context. Thus, when generalization is necessary, the results should support the overall context of the map. Accordingly, some objects would be selected, simplified, and displaced for one context, whereas for another context they may be treated differently. For example, in climate mapping there is a relationship between air temperatures and water bodies and glaciers. As a result, in generalizing the objects for representation at a smaller scale, it is relevant to maintain the larger water bodies and glaciers, as they provide a context or an implicit explanation for the delineation or situation of temperature zones. Many similar examples could be cited; however, it is a reasonable assumption that the generalization processes undertaken should in some way be relative to the thematic context of the derived map.

c) Map Object Functionality

Individual map object functionality is another important context in terms of why certain objects or object classes should appear on a map at a particular scale. A map object functionality context means that certain object classes support and in fact assist map reading and use. There are a number of categories that can be considered map object functions which allow map users to perform various tasks. These categories include aspects such as orientation, location, enumeration, measurement, and description. With changes in scale, functional contexts also change, largely as a result of a change in the user's requirements. For example, map objects such as rivers, roads, populated places, and boundaries assist users in reading a map in that they provide the reader with a sense of orientation, among other things. As scale decreases, the requirement for objects to appear in an orientational context diminishes, as the user's requirement for the map has altered, say, from an analytical and/or orientational perspective to something more descriptive in nature.

d) Intended Scale of Representation

In generalization, the choice of scale is very important since it sets a limit on the information that can be included in the map and on the degree of reality with which it can be delineated [Robinson and Sale, 1969]. In this context, one aspect of importance is that generalizations of data are executed between and within object classes in a logical and compatible manner. Thus, the selection and delineation of rivers and lakes should be compatible with each other, or similarly generalized, for a chosen level of representation. The effect of scale may affect other classes differently. The generalization of landcover data, for example, may have subclasses of data represented at one scale, but at a smaller scale may show only aggregated classes which provide a reduced level of information.

Once the map has been generalized, or the context transformations have been derived, their effect on the spatial and thematic data, and on the map as a whole, should be evaluated. Although surprisingly little attention has been given in commercial GIS packages to the effects of generalization, it is important that they are quantitatively calculated since, in turn, the quantitative calculations of transformation effects can be used as a control factor on the rule-base, governing the extent to which the generalization processes are applied. Figure 4.1 illustrates an overview of some of the aspects involved in developing a rule-base for context transformations.

In the following sections, these contextual aspects are explored in relation to the formulation of the rule-base, and examples are provided of how it may function. Reduction factors are discussed in Chapter 5, along with their use both for evaluating the results from context transformations and for controlling the extent to which the user wishes to generalize a map or representation.


Figure 4.1: Overview of System Design for Context Transformations (diagram not reproduced; it links the map initiator's selection of thematic context and scale, the generalization model, the database management system and GIS database with object identifiers, thematic attributes, and geometric data, the geometric transformations such as simplification, aggregation, and convergence, the reduction factors and results, the interactive manipulation and display options, and the attenuation factors fed back into the process)


4.3 Formulation of the Necessity Factor and Classification and Aggregation Hierarchies

The following sections focus on how the concept of context transformations can be formulated into a model that reflects the contextual issues in the generalization process. As mentioned in section 4.2.1, the context transformations discussed include four major aspects: user requirement, subject requirement, object functionality, and scale. These aspects are addressed in sections 4.3.1 through 4.3.4 and are followed in Chapter 5 by reduction factors, which can be used as a data reduction statement. We will also see in Chapter 5 how reduction factors can be used as a feedback loop to control the generalization processes described in the following sections.

4.3.1 The Necessity Factor

The Necessity Factor [Richardson, 1988, 1989] is a combination of two contextual aspects which are a fundamental requirement for developing a system for context transformations. The first aspect is the determination of how object classes may be selected to appear on a map according to a particular subject, for representation at a user-selected scale; the second aspect is the functionality that individual map object classes allow map readers to perform in reading and using the map. Thus the first component of the Necessity Factor provides a measure of the subject requirement, and the second component provides a measure of map object functionality. Both of these measures are applied according to the requirements of scale.

The formulation of the necessity factor is built on knowledge acquired by surveying 110 maps covering 44 different thematic realms at scales ranging from 1:1 million to 1:30 million, and through a questionnaire addressed to cartographers, geographers, GIS specialists, and senior managers at the Canada Centre for Mapping. Maps from the National Atlas of Canada, the National Atlas of the United States, and the International Map of the World series were used in compiling the data, along with government documents used for cartographic compilation specifications. (Additional information concerning the survey is available in Appendix 4.1.)

To form the Necessity Factor, each selected map object class has been rated according to its requirement for each subject at four different scales. These ratings measure the degree of need for an object class to appear on a map at a particular scale and are referred to as the Map Object Requirement (MOR). In decreasing order of need, the categories are as follows.

a) Essential: The object class is needed to link or support the thematic or subject context. For example, river networks are essential for mapping a thematic context such as drainage basins, as are populated places for mapping ethnography or languages.

b) Desirable: Not essential, but it helps to provide orientation or description to the thematic context.

c) Questionable: It could be used but would not be necessary for contextual support.

d) Unnecessary: It is either unusable (it would clash with the thematic context) or would be readily seen as illogical to use.

The second measure, referred to as the Map Object Functionality (MOF), has been used to determine why and for what purpose an object class appears on a map at a particular scale. This measure has been used independently of a thematic context and measures the functional requirement for object classes to appear on a map in relation to the intended scale of representation.

Each of the object classes has been rated according to the functions or activities they allow a map reader to perform in using the map. In this case, the four ratings used, i.e. essential, desirable, questionable, and unnecessary, imply the degree of necessity for having a particular object class appear on the map to enhance map reading and understanding. Many types of user functions are dependent on or assisted by the appearance of certain objects on a map. However, only five functions frequently undertaken by map users have been selected: orientation, location, enumeration, measurement, and description. These five individual functionalities (fn) are averaged to establish the Map Object Functionality component, i.e. MOFi = (1/5) Σ fn. Definitions of these activities are available in Appendix 4.2.

The MOR and MOF criteria are combined to determine selection rules that address the subject requirement, or context, along with the object functionalities according to scale. The combined MOR and MOF result in a necessity factor (NF) which determines the intensity of object class selection on a contextual basis. High necessity factor values indicate that a high proportion of objects in a class should be selected and that a small proportion should be eliminated. This determination is achieved by converting the qualitative ratings of essential, desirable, questionable, and unnecessary ascribed to the contextual thematic requirements, or MOR, to quantitative values of 100, 75, 25, and 0 percent respectively. Qualitative assessments for each of the five types of functionalities, i.e. orientation, location, enumeration, measurement, and description, are also converted to quantitative ratings, such that an object class rated as essential at a particular scale was given values of 80, 90, 90, 90, and 100 percent corresponding to the five activities respectively.

For each object class, the basic necessity factor is computed as follows:

NFik = (1/2)(MORik + MOFi)

where MORik is the requirement rating for each of the base map object classes (i = 1-14) and for each thematic context (k = 1-44), and where

MOFi = (1/5) Σ fn

is the map object functionality rating for map object class i (i = 1-14), in which fn is an object class functionality and n = 1-5 for the five activities (orientation, location, enumeration, measurement, and description).
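A minimal sketch of this computation follows (Python is used purely for exposition; the rating table reflects the conversion scheme described above, while the function names are this example's own):

    # Necessity factor: NFik = (1/2)(MORik + MOFi).
    # Qualitative MOR ratings convert to 100, 75, 25, and 0 percent.
    MOR_VALUES = {"essential": 100, "desirable": 75,
                  "questionable": 25, "unnecessary": 0}

    def mof(functionality_ratings):
        """MOFi = (1/5) * sum(fn) over the five activities
        (orientation, location, enumeration, measurement, description)."""
        assert len(functionality_ratings) == 5
        return sum(functionality_ratings) / 5.0

    def necessity_factor(mor_rating, functionality_ratings):
        return 0.5 * (MOR_VALUES[mor_rating] + mof(functionality_ratings))

    # The object class City mapped with landcover at 1:2 million:
    # MOR 'essential', functionality ratings from Table 4.2.
    print(round(necessity_factor("essential", [80, 90, 90, 90, 100])))  # 95

The result, 95, matches the City entry in Table 4.4.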

In Table 4.1, the map object class requirements (MOR) are shown for mapping landcover at four different scales. Table 4.2 illustrates the five functions the map objects have in relation to a 1:2 million representation, while Table 4.3 illustrates the functionalities in relation to the four scales. In Table 4.4, the necessity factors for mapping landcover are shown.

As can be expected, the necessity factor decreases with a decrease in scale. As the scale of representation becomes smaller, the number of objects within a class is reduced according to the thematic context and the functionality. Thus, the necessity factor is used as a threshold for the selection of objects within a class according to the thematic context and according to the user's requirements.

The necessity factor provides a general guideline, or steering parameters, for how many objects should appear on a map given a particular context and scale. Determining which objects make up the reduced set involves rules that can be applied to the entity or object content maintained in the GIS database. For example, since the object class of towns has a 75% necessity factor for landcover mapping at 1:7.5 million, 25% can be eliminated from the original 1:2 million input database based on a population statistic or on any other statistic which may have particular relevance to the user.

Table 4.1: Map Object Class Requirement Ratings to Support Landcover Mapping

Map Object Class           1:2 M    1:7.5 M    1:12.5 M    1:30 M

1   City                    100        75          75         0
2   Town                    100        75          75         0
3   Village                  75        75          75         0
4   Unincorporated            25        25          25         0
5   Non-Unincorporated        25         0           0         0
6   Indian Reserve            25         0           0         0
8   Rivers                   100        75          75         0
9   Lakes                    100        75          75         0
10  Islands                   25         0           0         0

Table 4.2: Map Object Functionalities at 1:2 Million

MOC      Or      Lo      En      Me      De

1        80      90      90      90     100
2        80      90      90      90     100
3        65      90      90      90     100
4        65      70      20      90     100
5        15      20       0      20       0
6        65      90      70      90     100
7        65      90      90      90     100
8        15      90      90      90     100
9        65      20      20      20     100

Key: MOC = Map Object Class; Or = Orientation; Lo = Location; En = Enumeration; Me = Measurement; De = Description.


Table 4.3: Map Object Functionalities by Scale

MOC      1:2 M    1:7.5 M    1:12.5 M    1:30 M

1          90        87          87        17
2          90        87          73        17
3          87        54          30         0
4          69        21           7         0
5          11         0           0         0
6          83        18          13         0
7          87        74          74        32
8          77        54          40        12
9          45        22          22         0

Table 4.4: Necessity Factor for Mapping Landcover at Four Scales

NF_ik According to Scale

Map Object Class           1:2 M    1:7.5 M    1:12.5 M    1:30 M

1   City                     95        81          81         8
2   Town                     95        81          74         8
3   Village                  81        64          52         0
4   Unincorporated           47        23          16         0
5   N.U.P.                   23         0           0         0
6   Indian Reserve           54         9           6         0
8   Rivers                   93        74          74        16
9   Lakes                    88        64          57         6
10  Islands                  35        11          11         0
11  Landcover               100        85          70        55

In the case of rivers being mapped contextually with landcover data, the necessity factor reduction in a 1:2 million database for 1:7.5 million representation determines that 74% of the rivers should be shown while 26% should be removed. Rivers are selected or eliminated according to stream order values in conjunction with length statistics. Streams with the lowest stream order are dropped from the representation. When a choice is required to drop one of two streams with the same stream order value, the decision is made based on length, i.e. the shorter is removed.
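This selection rule can be sketched as follows (illustrative Python; the attribute names and sample rivers are invented, not drawn from the test database):

    def select_rivers(rivers, nf_percent):
        # Rank by stream order (higher kept first), breaking ties by length
        # (shorter dropped first); keep the top nf_percent of the ranking.
        ranked = sorted(rivers,
                        key=lambda r: (r["stream_order"], r["length"]),
                        reverse=True)
        keep = round(len(ranked) * nf_percent / 100.0)
        return ranked[:keep]

    rivers = [{"name": "A", "stream_order": 3, "length": 120.0},
              {"name": "B", "stream_order": 1, "length": 40.0},
              {"name": "C", "stream_order": 1, "length": 25.0},
              {"name": "D", "stream_order": 2, "length": 80.0}]
    # With NF = 74%, three of the four rivers are kept; river C, with the
    # lowest stream order and the shorter length, is the one removed.
    print(select_rivers(rivers, 74))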

An attenuation factor can be applied to the weighted ratings on the map object functionality to render greater flexibility to the necessity factor formula. With the application of an attenuation factor, the subsequent reduction in map element density will result in an increased level of generalization. The following formula illustrates the use of an attenuation factor:

NF_ik = (1/2)(MOR_ik + f(MOF_i)), where 0 < f < 1

In Chapters 6 and 7, we will see that the use of an attenuation factor allows continuous generalization. The inclusion of an amplification factor renders the opposite effect in that it allows the user increased specialization. The amplification factor has the form f > 1 and can be applied to the MOF. However, only the attenuation factor has been implemented and tested in this research.
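A minimal sketch of the attenuated formula (illustrative Python; f = 1 reproduces the basic necessity factor):

    def attenuated_nf(mor, mof, f=1.0):
        # NF_ik = (1/2)(MOR_ik + f * MOF_i): 0 < f < 1 attenuates (more
        # generalization); f > 1 amplifies (more specialization).
        return 0.5 * (mor + f * mof)

    print(attenuated_nf(100, 90))       # 95.0, the basic necessity factor
    print(attenuated_nf(100, 90, 0.5))  # 72.5, an attenuated value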

For the selection, elimination, and replacement of objects determined by the necessity factor, the underlying data model maintains certain data structures as outlined in Chapter 3. Accordingly, the data model can provide specific facilities that permit the intensity of the NF_ik results to be altered and manipulated in a number of ways.

4.3.2 Classification and Aggregation Hierarchies and Generalization

The conceptual data model outlined in Chapter 3 can be used to achieve certain results with the rule-base. The format of the data model and its subsequent expression in logical data structures allow the use of classification and aggregation hierarchies, which in a computer environment are referred to as generalization and specialization hierarchies. A hierarchical classification permits objects and object classes to be selected and eliminated in various ways to support the necessity factor model.

In establishing the logical data structures in the database environment, a number of aspects have been considered in the context of requirements for rule-based generalization. An important aspect is to ensure that the data categories and structures permit the appropriate response to different levels of abstraction. Different levels of abstraction can be derived from the proposed database environment when an object class is defined from the entities or objects of a similar class; in other words, abstraction can be used to form a new object class from other existing objects. For example, using processes of abstraction, the entities of connected stream segments, following the conventions outlined above, can be associated to form a new object class of major or minor river networks. These abstraction processes can be realized by using classification hierarchies with subclass and superclass levels. These concepts should be examined in relation to a classification schema.

In a well-designed classification scheme, individual objects should belong to one, and only one, class; classes are therefore mutually exclusive. A good classification scheme deals with all elements or objects of the particular classification, thus eliminating the need for an 'other' category, and, as such, makes the classes exhaustive as well as mutually exclusive [Muehrcke, 1978]. In a classification hierarchy, however, objects of a class can be described in different ways by means of subclasses, in which objects of a subclass also belong to their parent class. In a collection of subclasses and classes, at the top of the classification hierarchy is a single superclass. Each class represents a collection of objects which have some characteristics in common. A subclass occurs when all the members of the class are also members of some other class. Any class or subclass in the hierarchy inherits the properties of the superclass; that is, all properties of the generalized class can be inherited downward to the constituent entities [Molenaar, 1990; Thompson, 1989].

An aggregation hierarchy is distinct from a classification hierarchy in that it refers to an abstraction in which a relationship between objects is regarded as a higher level object [Smith and Smith, 1989]. An aggregation hierarchy allows abstractions of composite objects which are built from lower level elementary objects [Molenaar, 1991].

In classification hierarchies, the inheritance of attribute structures is downward, allowing the thematic description of objects to become more detailed or specialized as one progresses to lower level branches of the hierarchy. The aggregation hierarchy has a bottom-up character in that, starting from the elementary objects, composite objects of increasing complexity are constructed in an upward direction. The composite objects inherit the attribute values from their constituent parts [Molenaar, 1991].

In a computer environment, digitally structuring a hierarchy is related to the concepts of IS-A and PART-OF links in artificial intelligence and semantic networks [Thompson, 1989]. For example, classes are linked by subclass-superclass relationships in that each class has at most one immediate superclass. This concept is true of the simplest taxonomic hierarchies, so that a structure in which each class has at most one immediate superclass is a rooted tree. The links between classes are normally called IS-A links and can express the fact that a particular object type is a generalization of another type, such as Medicine Hat IS-A city, IS-A populated place. The IS-A links are components of classification hierarchies and are shown in Figure 4.2, which illustrates a hierarchy for populated places. PART-OF links are components of aggregation hierarchies. For example, a tributary is PART-OF the Kootenay River, PART-OF the hydrographic network. The PART-OF links relate a particular set of objects to a specific composite object and on to a more complex object and so on [Molenaar, 1991].

There is no standard computer terminology for the components of a classification hierarchy, which in AI is referred to as an inheritance network, nor are there any standard semantics for the IS-A link [Brachman, 1983]. Other names for the IS-A relation are AKO (A Kind Of, used in FRL), SUPERC (Super Concept, used in KL-ONE), and VC (Virtual Copy, used in NETL) [Shapiro, 1987].


[Figure 4.2: Classification Hierarchy and Generalization Strategies for Populated Places — thematic structure with superclass = populated places; classes: city names, town names, village names, unincorporated names, NUP names, and IR names; attribute assignments at the object level; superclass generalization indicated across the class level]


4.3.3 Examples of Necessity Factor Applications with Classification and Aggregation Hierarchies

With respect to rule-based generalization, classification and aggregation hierarchies can be applied in a number of different ways to alter the application, and hence the results, of the necessity factor in an objective fashion. For example, the properties of city, town, or village, etc., are definitional and are referred to as 'intensional' properties. The intension of a class defines the allowable occurrences of the class by specifying the membership condition. Each occurrence of one instance in the class is referred to as an extension. Thus, 'Invermere' as a town, and '1,194' as a population, is factual and forms one instance of the extension. The membership condition of the intension in this example is 'Status' = Town, 'Population Range' = 1000 - 4999; the extension of one instance is Invermere, 1,194. A class always has an intension but not necessarily an extension.

The ways in which hierarchies can be applied using the necessity factor are as follows:

(a) The application of a classification hierarchy to populated places can be used to alter the results of the necessity factor by calculating the mean of the necessity factor for the six classes of the superclass. Only the superclass level needs to be addressed; i.e. a map, say at a 1:7.5 million presentation, requires 65% of the original dataset for populated places. At the execution level, however, the extensions are ordered by specific population statistics and selected according to the mean NF percentage value. The mean of the necessity factor is applied only to the attributes of the ordered extensions within the superclass structure, without regard to the intensional properties of the class level. This process is therefore referred to as superclass generalization, and can be used in strategy 3 of the generalization model as discussed in section 4.3.4.

Mathematically, this process can be expressed as follows:

SC = ( Σ_{i=1}^{oc} NF_ik ) / oc

in which
oc = the total number of object classes in the superclass
NF_ik = the necessity factor for each object class of the superclass
SC = the superclass mean NF_ik derived from the object class level

where, in the database environment,

∀ A ∈ SC and A(e_i) > A(e_i+1)

which reads: each and every attribute (A) is a member of the superclass (SC), and the value of attribute A of object e_i is greater than that of object e_i+1.

Applying the necessity factor to the mean of the superclass has some interesting possibilities. In this case, the superclass is exemplified by a standard populated place hierarchy, i.e. city, town, village, etc. The mean of the superclass is calculated, and subsequently that mean value is met by ordering the extensions of the superclass by attribute value, irrespective of the class intensions. If the selection is made on the basis of population statistics, those selected will represent the larger centres, thus conforming to the class structure of city, town, and village. However, if the map application is based on locating employment centres, the mean of the necessity factor can be met using unemployment statistics, which would not conform at all to the class intensions. The selection in the second case would thus be quite different from that of the first, yet representative of a different thematic context. In this way, the mean NF_ik applied to the superclass opens up numerous possibilities for specifying the application orientation and carrying out the selection in that context.
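Superclass generalization can be sketched as follows (illustrative Python; the sample places and populations are invented, and the six NF values are taken from the 1:2 million column of Table 4.4):

    def superclass_generalize(objects, class_nf_values, attribute):
        # Mean NF for the superclass, applied to the ordered extensions of
        # all member objects, irrespective of class intensions.
        sc = sum(class_nf_values) / len(class_nf_values)
        ranked = sorted(objects, key=lambda o: o[attribute], reverse=True)
        keep = round(len(ranked) * sc / 100.0)
        return ranked[:keep]

    places = [{"name": "P1", "status": "City", "population": 85000},
              {"name": "P2", "status": "Village", "population": 900},
              {"name": "P3", "status": "Town", "population": 3200},
              {"name": "P4", "status": "Village", "population": 750}]
    # The six 1:2 M NF values of Table 4.4 average to about 66%; selecting on
    # 'population' favours the larger centres, while selecting on, say, an
    # unemployment attribute could cut across the class intensions entirely.
    print(superclass_generalize(places, [95, 95, 81, 47, 23, 54], "population"))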

(b) A second way in which these structural concepts can be applied to a populated place classification is by calculating the mean of the necessity factor for the six object classes of the superclass (see Table 4.4) and selecting objects according to the subsequent percentage threshold by (i) ordering the magnitude of the intension of each class, and (ii) subsequently ordering the extension of each class by the attribute values specified. The intension is the population range for each class which, according to (i), is organized in descending order. The extension of each intension, which is the actual population statistic for each object in the class, is subsequently also ordered from largest to smallest. Thus, cities, towns, and villages are now organized from largest to smallest by their intensional properties, and each of their extensions is also organized from largest to smallest. Now the percentage value of the mean NF_ik for the six classes of populated places can be fulfilled by selecting, to the mean value, from the ordered extensions of the ordered intensions. In this case the following formula is used:


SC = ( Σ_{i=1}^{oc} NF_ik ) / oc

where, in the database environment,

∀ C ∈ SC and I(c) > I(c+1), and
∀ A ∈ C and A(e_i) > A(e_i+1)

where I is the intension and c is the class; therefore, each and every class is a member of the superclass, with class 1 > class 2 and so on, and each and every attribute (A) is a member of the class (C), with the value of attribute A of object e_i greater than that of object e_i+1.

In this strategy, the class intensions are observed. As a result, if the selection is made on population statistics, the largest centres will be shown. If the selection is made on unemployment statistics, the largest centres with the highest unemployment statistics will be shown. However, other centres with higher unemployment statistics may not be selected because they do not fall within the selected class categories.

This selection process is referred to as Class Generalization. As it uses a mean value for the six classes of populated places and selects within those class limits, it provides a less precise abstraction of reality than does the attribute value generalization covered in point (c) of this section. This process for selecting the populated place dataset will be applied to strategy two of the generalization model as discussed in section 4.3.4.
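One plausible reading of this strategy can be sketched as follows (illustrative Python; it applies the mean NF percentage within each ordered class, and the sample values are invented):

    def class_generalize(classes, attribute):
        # Classes ordered by intension (descending population range), the
        # extension of each class ordered by attribute value; the mean NF of
        # the classes is then applied within each class's limits.
        mean_nf = sum(c["nf"] for c in classes) / len(classes)
        selected = []
        for c in sorted(classes, key=lambda c: c["range_upper"], reverse=True):
            ranked = sorted(c["objects"], key=lambda o: o[attribute], reverse=True)
            selected.extend(ranked[:round(len(ranked) * mean_nf / 100.0)])
        return selected

    classes = [{"nf": 95, "range_upper": 4999, "objects": [
                    {"name": "T1", "population": 3200},
                    {"name": "T2", "population": 1194}]},
               {"nf": 81, "range_upper": 999, "objects": [
                    {"name": "V1", "population": 900}]}]
    print(class_generalize(classes, "population"))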

(c) The third and final way the classification hierarchy will be used for populated places is by using the necessity factor NF_ik and applying it individually to each class. Accordingly, each class is ordered by its extension according to a particular attribute, and the first instances within the extension are selected according to the percentage value specified by the necessity factor. This application of the necessity factor is applicable to strategy 1 and provides a more objective abstraction of reality based on the inherent classification criteria; as a result, it lends itself to greater precision than the preceding two applications. This use of the classification hierarchy will be referred to as Attribute Value within Class Generalization. By referring again to Figure 4.2, the different generalization processes that can be applied within the context of the classification hierarchy can be observed.

In each of the three scenarios described above, the generalization processes concentrate on selections from the database. To decrease the initial density, attenuation factors can be applied to the MOF in each formula to provide continuous generalization. This process is available to the user following the calculation of the necessity factor and the reduction factors discussed in Chapter 5.

The concept of using a hierarchical structure can also be applied to hydrography. For example, in using stream order classifications, the necessity factor can be applied first to the ordered extension of the Strahler stream ordering; this extension should be ordered from smallest to largest stream order. The second ordered extension should be on stream length, ordering streams from longest to shortest. In this way, within the ordering of the extension for stream order, each stream order value will also be ordered according to length. Hence, when the system is required to select automatically between two streams of the same order, it can select the longer.

In the application of the NF_ik to the Horton stream ordering classification, the same rules apply in ordering the extensions. However, since the Horton classification conforms less to strict bifurcation laws than Strahler's classification, and more towards requirements of preserving distinct river entities which maintain a higher correlation with proper names, the results could be considered less objective. As such, this classification may on the one hand yield a lower data reliability statement, yet on the other may provide a superior cartographic design due to the preservation of the named entities. (See Appendix 4.3 for an explanation of stream ordering.) Figure 4.3 illustrates conceptually the generalization hierarchy for hydrography.


[Figure 4.3: Aggregation Hierarchy for Hydrography — superclass: hydrography; class levels: lakes and rivers; object levels: e.g. Killarney Lake and stream order objects; subclass level: stream segments]

The hydrography dataset provides an example of an aggregation hierarchy that is characterized by PART-OF links as well as IS-A links, as described in section 4.3.2. The aggregations occur here among stream orders in forming the object of a named river. The aggregated object has as part of its structure its constituent objects. The properties ascribed to each constituent object are inherited by each of the aggregated objects. Thus each river object, such as the Kootenay River, would be comprised of a set of stream orders, i.e. its constituent objects. In the application of hierarchical structures to hydrography, the generalization processes involved are selection, elimination, and replacement of objects. Replacements occur when, for example, lakes are eliminated from the midst of a selected river.

The context with which the rule-based generalization will be tested is landcover. In changing the level of abstraction for the landcover classification, the necessity factor will not be applied, as it incorporates in its structure the relationships of objects according to a particular subject matter. Since the landcover classification is itself the subject being mapped, only the Map Object Functionality (MOF) will be used as a technique for decreasing the object density within the classification and aggregation hierarchy. The concept of hierarchies can in this way also be used for landcover, resulting in an objective selection and generalization process that specifies how much data should appear on the map and why it should appear.

4.3.4 Three Strategies of Generalization Using the Necessity Factor

The necessity factor in combination with classification and aggregation hierarchies provides three major strategies with which to generalize a map, these being a) Attribute Value within Class Generalization, b) Class Generalization, and c) Superclass Generalization. When attenuation factors are applied, the three strategies provide continuous generalization, since 0 < f < 1. Chapter 7 provides examples of the strategies described here.

In strategy one, or Attribute Value within Class Generalization, each object class is addressed independently and the necessity factor is applied. Once the necessity factor provides the guideline for how many objects should appear on the map given a particular subject and scale, a set of rules is applied to determine which objects should be selected according to the necessity factor specification. For example, if the necessity factor specifies for the thematic realm of landcover that 74% of the rivers should be selected, the 26% eliminated are removed according to a stream order classification and a length attribute. The application of the attenuation factor will decrease the map object functionality rating. When a map is required as an analytical tool, strategy 1 provides the most objective approach to generalization: each class intension is addressed, followed by a logical selection of the members within each class according to a relevant attribution. If objects are required on a map to support a thematic context, the selection can be done in a consistent and objective way by attribute value, and the density for representation reduced by the application of attenuation factors to the MOF. This approach to reducing the density of objects appearing on a map is consistent with a user's requirement for the analysis of map objects, or for their appearance on a map to support the analysis of a thematic context.

In the second strategy, referred to as Class Generalization, the class intensions are the focus for generalization. In cases where a superclass structure consists of classes and subclasses, the mean value of the necessity factor is derived at the class level rather than the superclass level. This environment allows generalization to take place according to the intensions of a class and the individual membership criteria that reflect that class intension. It also allows generalization to take place according to the class intension yet using a different criterion. This process is best described with an example. In generalizing a landcover map from 1:2 million to 1:12.5 million, for example, major populated places could be represented on the basis of pulp and paper production values. Although this technique would portray major populated places according to production figures, it may not portray the major pulp and paper production centres. This strategy provides a technique for supporting a context rather than a technique for analyzing a context.

In strategy three, referred to as Superclass Generalization, the rule for generalization is invoked at the superclass level. In its application to hydrography, for instance, the necessity factor values for rivers and lakes are averaged and the result applied to determine how many objects should appear on the map. Thus, if the requirement for mapping landcover at 1:7.5 million specifies that 74% of the rivers and 64% of the lakes should be shown, the hydrography superclass rating becomes 69%. The subsequent application of attenuation factors will increasingly reduce the 69% density value. The superclass generalization applied to hydrography involves selection, elimination, and replacement, followed by topological restructuring to provide consistent connectivity and contiguity reference as well as new coordinate data and area, circumference, and length statistics. Since the mean value is applied in this strategy at the superclass level, its use would be directed more towards cartographic design or thematic support for descriptive purposes than towards subsequent analysis. Figure 4.4 illustrates the generalization processes using the necessity factor.

4.4 Summary

This chapter discusses three major components involved in the development and specification of a conceptual framework for generalization. The conceptual model provides guidelines, or steering parameters, for what should be maintained and what should be eliminated from the map or representation, and also specifies when this should occur given particular scales of representation and according to the thematic context. The conceptual model also presents reasons for why object classes, and how much of an object class, should appear at a derived scale of representation. Both these aspects are addressed in the necessity factor, which can be used as a technique for selecting and eliminating data to support a particular thematic context, and to determine how much of the thematic data should appear according to the map object functionality measure.

The relationship of the necessity factor to the underlying data model illustrates that the structure can be useful in manipulating classification and aggregation hierarchies. Since the conceptual data model accommodates inheritance, it provides a way of associating properties with superclasses and classes. These properties, such as those in the populated places example, may be inherited by the class instances. In this way the data model allows alterations to the necessity factor by using the hierarchies. The data model also allows aggregation hierarchies to be used. In this process, elementary objects are aggregated to more complex objects, as in the generalization of landcover data.

With the use of attenuation factors, what originally appeared to be generalization based on discrete levels of scale becomes continuous. The attenuation of the MOF provides ever-increasing generalization to any smaller scale of representation. As such, if users enter the database and request an abstraction suitable for a 1:7.5 million representation, they can choose to attenuate the database subset to any lower level of detail. By the same token, amplification would allow them to increase the level of detail to the extent of the original database.


[Figure 4.4: Generalization Strategies Using the Necessity Factor — interface processes (map initiator; select thematic context and scale; identify need, i.e. analysis or design) lead into three necessity factor processes: Attribute Value within Class Generalization (NF_ik, where i is each of the map object classes and k is a thematic realm such as landcover); Class Generalization (uses the mean of the NF_ik for the classes within a superclass; data extraction: I(c_i) > I(c_i+1)); and Superclass Generalization (uses the mean of the NF_ik for the classes within a superclass; data extraction: A(e_i) > A(e_i+1)); attenuation factors may then be applied]


CHAPTER 5

Data Certainty and Generalization

5.1 Introduction

The generalization of spatial and thematic data can lead to errors and uncertainties as a result of the quality of the original data and/or the processes applied to it to produce a reduced scale of representation. In turn, the results of generalization may present inaccurate products which can affect decisions. In this section, a review is made of the types of uncertainty important in a GIS environment and of how this may be important for generalization. The second section presents calculations for specifying the amount of data reduction that occurs as a result of generalization processes. These reductions in data are presented at the object, subclass, class, and superclass levels and provide a technique for users to evaluate the choices made in the different strategies for generalization.

5.2 Uncertainty in a GIS Environment

The concept of uncertainty in data reliability for cartographic and geographic information has been well expressed in the literature for decades. Curry [1962] succinctly summarized uncertainty in geographic situations in his passage: "It is too often forgotten that geographical studies are not of the real world, but rather perceptions passed through the double filter of the author's mind and his available tools of argument and representation. We cannot know reality, we can only have an abstract picture of it."

Without providing an exhaustive discussion on the degree to which reality can be represented, it is nevertheless worthwhile to briefly mention some of the inherent difficulties in modelling our abstractions in a GIS environment.

Some abstractions are more easily defined than others. Cadastral property boundaries, for example, represent a 'crisp' spatial set, since boundaries belong to property A or property B, but not both. Climatic boundaries or landcover boundaries are, however, vague or transitional and cannot be precisely demarcated. Additionally, our abstractions are determined and interpreted by the user context, and consequently objects represented in a database are a subjective and context-sensitive interpretation. The semantics used to express abstractions are also subjective and contextually sensitive, and play an important role in assigning definitions to objects and building classifications of information. Thus, various types of uncertainty affect data collection, measurement, definition, and classification long before the data are modelled in a systems environment. That being said, once spatial and thematic data are stored in a geographic database, there are three general areas of uncertainty that can be examined.

The first type involves whether or not an object belongs to a set, i.e. is A ∈ S or A ∉ S? These two possibilities deal with the membership of 'A' in a crisp, or well-defined, set; if the classification of the object is unclear, however, we have a fuzzy set that can be expressed as {A | 0 < A < 1}. Here "'A' may belong a little bit to S" [Molenaar and Janssen, 1991]. Thus, this aspect of uncertainty deals with the partial membership of a given element in a set whose boundaries are not 'crisply' defined.

In the first case, 'A' may represent a populated place, in which case the definition of 'A' would be crisp. For example, 'A' may be a town, and the set here would be defined by a population range of less than 1,000. The second instance, however, may involve something like landcover classifications. The definition of mixed forest may not always be clear, as it may also be transitional forest. In this case, the object 'A' belongs partially to both sets, i.e. mixed forest and transitional forest.

The second kind of uncertainty arises when neither the location of an object nor its correct attribute is known. This aspect of uncertainty relies on a number of different measures. With respect to the object attribute, the probability of its class definitions can be mathematically determined, as, for example, P(A or B) = P(A) + P(B), providing A and B are mutually exclusive. If, however, A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) − P(A and B). Defining a certainty factor for the x,y location of an object becomes much more complex, especially considering the different data types involved, such as points, lines, and areas. For linear features, for example, any number of measures can be applied to determine discrepancies between reality and the model, such as mean displacement from recognized points, or measures of variance or standard deviation [Buttenfield, 1991]. Cases where uncertainty is involved both for the classification of an object and for its location concern objects such as the mouth of a river. Here, the conventional geographic approach has been to apply limits based on the difference between fresh water and salt water. This interface, however, is usually represented by a zone of brackish water rather than a sharp demarcation, making both the classification and the location of the zone difficult to define [Maling, 1989]. Additionally, fresh water discharge from some rivers, such as the Amazon, can extend as much as 200 miles into the ocean, thus making both the definition of the mouth and its delineation a situation to which uncertainty can be applied.

Another kind of uncertainty deals with the incompleteness of information. This type of uncertainty arises as a result of a lack of data, and consequently certain assumptions must be made. Generally, uncertainty of this nature has been modelled by non-numerical characterizations such as Doyle's reasoned assumptions [Doyle, 1983]. Figure 5.1 provides examples of spatial and thematic data reliability and the associated type of uncertainty.

The application of certainty factors to geographic data allows users to take account of data quality at the time of analysis and to make decisions accordingly. However, if certainty factors are not maintained in the database, how can generalized data be 'accurately' assessed, and without knowing the quality of the data, how can the best generalization tolerances even be applied? When generalized datasets are used for analysis, how reliable are the resulting products, and the decisions which may be made using them?

A comprehensive approach to GIS offering generalization facilities should include certainty factors as a necessary component of the database. In providing generalization options, a link should then be established between the certainty factors associated with the original dataset and the certainty or reliability of the data subsequent to generalization, particularly since, in generalization, features can be aggregated, reclassified, eliminated, and reshaped. Each of these processes, as well as others involved in generalization, affects the spatial delineation and/or the thematic or attribute description, thus affecting data certainty. Whilst a link between the certainty of the original dataset and the results of generalization effects does not yet exist, users should nevertheless, as a first step, have the option of quantitatively evaluating and controlling the effects of generalization according to a data reduction factor.


[Figure 5.1: Examples of Attribute and Location Data Reliability and the Associated Type of Uncertainty — attribute uncertainty: A ∈ S, A ∉ S, {A | 0 < A < 1}, P(A or B); x,y uncertainty: P(x,y), variance, standard deviation]


5.2.1 Use of Reduction Factors for Generalization

The quantification of reduction factors for generalization activities should be considered from a geometric perspective as well as a thematic perspective. With respect to the spatial aspects, there are three types of processes that can be examined: a) differences in measured lengths, areas, volumes, and densities; b) elimination of objects; and c) shifts of objects [João, Herbert, and Rhind, 1990]. The thematic aspects should be considered in the context of subclass, class, and superclass changes in classification hierarchies. Each of these types of alteration to the data can contribute to reduction factors in a number of different ways.

For example, reduction factors should be available at different levels in a spatial dataset. Relevant levels are the atomic data level, the object level, the object class level, and the superclass level. The map as a whole can also represent a valid level to which a reduction factor can be applied. A type of reliability statement for the generalized product, whether reviewed at the atomic level or the map level, is therefore available to the user for decision making. In subsequent GIS analysis, reduction factor evaluations can thus be used as an a priori statement for validating the geometric and thematic data for ensuing manipulations.

5.2.2 Reduction Factor Calculations

The reduction factors used in this research to record the amount of generalization address point, line, and area data according to entity, object, object class, and superclass levels of representation. The measurements include changes in length and density and, in addition, make use of the necessity factor. More complex measures could be used; however, it is the opinion of this author that a reduction factor should be simple and straightforward. Accordingly, measurements for the three data types are as follows:

Linear Data

Spatial Entity

Arc Length

ΔAL = change in arc length: ΔAL = AL_o − AL_g (1)
%AL = % change in arc length: %AL = (AL_g / AL_o) × 100 (2)

where AL_o is the length of the original arc and AL_g is the length of the generalized arc.


These two measures operate at the atomic data level and account for differences in arc lengths and the percentage of change between the original dataset and the generalized dataset. This can be useful for assessing the effects of linear simplification testing.
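A minimal sketch of equations (1) and (2) (illustrative Python; names are hypothetical):

    def arc_length_reduction(al_o, al_g):
        # Equations (1) and (2): change and percent change in arc length.
        return al_o - al_g, (al_g / al_o) * 100.0

    # An arc simplified from 12.4 km to 11.1 km retains about 89.5% of its length.
    print(arc_length_reduction(12.4, 11.1))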

Line Segment

ΔSLS = change in line segment length for the spatial object

SLS_o = Σ_{i=1}^{na} AL_oB_i (3)

SLS_g = Σ_{i=1}^{na} AL_gC_i (4)

ΔSLS = Σ_{i=1}^{na} AL_oB_i − Σ_{i=1}^{na} AL_gC_i
     = Σ_{i=1}^{na} (AL_oB_i − AL_gC_i)
     = SLS_o − SLS_g (5)

where
na is the number of arcs in the object
SLS_o is the original line segment length of the spatial object
AL_oB_i is the original arc length for each arc B_i, where B = (B_1 ... B_na)
SLS_g is the generalized length of the spatial object
AL_gC_i is the generalized arc length for each arc C_i, where C = (C_1 ... C_na), and in which C_i is derived from B_i using a simplification algorithm
ΔSLS is the difference in spatial object length between SLS_o and SLS_g

%SLS = % change in line segment length for the spatial object
%SLS = (SLS_g / SLS_o) × 100 (6)

These measures calculate the difference in line segment length and the percentage of change of the spatial object according to a specified thematic object.

Object Class

ΔCLL = change in line length for the spatial object class

CLL_o = Σ_{i=1}^{no} SLS_oi (7)

CLL_g = Σ_{i=1}^{no} SLS_gi (8)

ΔCLL = Σ_{i=1}^{no} SLS_oi − Σ_{i=1}^{no} SLS_gi
     = Σ_{i=1}^{no} (SLS_oi − SLS_gi)
     = CLL_o − CLL_g (9)

%CLL = % change in line length for the spatial object class
%CLL = (CLL_g / CLL_o) × 100 (10)

where
CLL_o is the original length of all lines in the object class
CLL_g is the generalized length of all lines in the object class
no is the number of objects per object class


Measurements here are calculated for the object classes to indicate the change in line length for the object class and the percentage of change according to the class breakdown within a superclass.
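The object and class level measures aggregate the entity level lengths upward, as in this sketch of equations (3)-(10) (illustrative Python with invented arc lengths):

    def sls(arc_lengths):
        # Equations (3)-(4): line segment length of a spatial object as the
        # sum of its arc lengths.
        return sum(arc_lengths)

    def cll(object_lengths):
        # Equations (7)-(8): class line length as the sum of the line segment
        # lengths of its objects.
        return sum(object_lengths)

    original = [[5.0, 3.2], [7.1], [2.4, 2.0, 1.1]]      # arcs per object
    generalized = [[4.6, 3.0], [6.8], [2.1, 1.8, 0.9]]   # after simplification

    cll_o = cll(sls(a) for a in original)
    cll_g = cll(sls(a) for a in generalized)
    # Equation (9), the change, and equation (10), the percent retained:
    print(cll_o - cll_g, (cll_g / cll_o) * 100.0)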

Areal Data

Object Class

ΔAC = change in polygon object density within the class
ΔAC = AC_o − AC_g (11)

%AC = % change in polygon object density within the class
%AC = (AC_g / AC_o) × 100 (12)

where
AC is the area class density, AC = Σ_{i=1}^{na} a_i / area
AC_o is the area class density of the original map
AC_g is the area class density of the generalized map
na is the number of polygons in the class
a_i is the area occupied by polygon i
area is the area containing the polygons

For areal data, change in polygon density is calculated according to the class definition.

Object Superclass

SC = mean % elimination of superclass objects
SC = 100 − ( Σ_{i=1}^{oc} NF_ik ) / oc (13)

where
oc is the total number of object classes in the superclass
NF_ik is the necessity factor for each object class of the superclass

The superclass measure indicates the mean percentage of class data eliminated from the superclass representation.

Point Data

Object Class

ΔPC = change in point class density
ΔPC = PC_o − PC_g (14)

%PC = % change in point class density
%PC = (PC_g / PC_o) × 100 (15)

where
PC_o is the point class density of the original map
PC_g is the point class density of the generalized map
PC = np / area
np is the number of points in the class
area is the area containing them

For point data, the change in the point density is calculated according to the class definition and according to the map area.

Object Superclass

SC = mean % elimination of superclass objects
SC = 100 − ( Σ_{i=1}^{oc} NF_ik ) / oc (16)

The superclass measure indicates the mean percentage of point class data eliminated from the superclass representation.
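The density and superclass measures can be sketched together (illustrative Python; the counts and NF values are invented, and for areal data the total polygon area of the class would be passed in place of a point count):

    def density_change(n_o, n_g, area):
        # Equations (14)-(15) for point data: change and percent change in
        # class density over a fixed map area. For areal data (equations
        # (11)-(12)), pass the total polygon area of the class instead.
        dc_o, dc_g = n_o / area, n_g / area
        return dc_o - dc_g, (dc_g / dc_o) * 100.0

    def superclass_elimination(nf_values):
        # Equations (13) and (16): mean percent of superclass objects eliminated.
        return 100.0 - sum(nf_values) / len(nf_values)

    print(density_change(120, 78, 400.0))    # 42 of 120 points removed
    print(superclass_elimination([74, 64]))  # rivers 74%, lakes 64% -> 31.0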


The above quantification for reduction factor evaluations can be derived once the NF_ik has been applied to a dataset. Most of the calculations evaluate a single data type, in other words either points, lines, or areas. The calculation for the superclass evaluation, however, can provide a statement for any combination of data types. Hence, it provides one evaluation for the conjunction of disparate data types within the superclass structure.

The objective in using these kinds of reduction factors has been to provide a simple means with which to quantitatively evaluate the degree to which a product has been changed. Calculations providing greater detail on the effects of scale change could, however, be used, such as nearest neighbour analysis, relative distributions, and compaction ratios. Figure 5.2 illustrates the different levels of the superclass structure and their relationship to generalization activities and reduction factors.

5.3 Integration of the Necessity Factor, Classification and Aggregation Hierarchies, and Reduction Factors

The combination of the necessity factor with the classification and aggregation hierarchies provides a number of strategies with which to choose generalization options according to the user's need. The inclusion of attenuation factors in strategy one supports the objectivity of the necessity factor while significantly decreasing the density of objects appearing in the derived map. Strategies two and three alter the impact of the necessity factor by exploiting the concepts of hierarchies. Again, attenuation factors can be applied to decrease the map object density.

Once the user has determined the approximate level of detail given the context for which the data are required, the generalization model permits automatic selection, elimination, classification generalization, spatial aggregations, and object replacements.

After these generalization processes have occurred, the reduction factors quantify the extent to which the necessity factor and its use with classification and aggregation hierarchies have altered the original dataset. These calculations observe the distinction between a spatial object and a thematic object. In generalization the distinction remains important, since the spatial representation may be changed yet the thematic classification remain the same. The calculation of reduction factors from entity level data through the class and superclass levels allows this distinction to be observed.


[Figure 5.2: Application of the Necessity Factor and the Calculation of Reduction Factors in Relation to the Superclass Data Structure — hydrography superclass; class levels: rivers and lakes; object levels: e.g. Kootenay River and Killarney Lake; spatial entity level: arcs and nodes, where the NF_ik is executed]

Note 1: CLRF = Class Level Reduction Factor; OLRF = Object Level Reduction Factor; ELRF = Entity Level Reduction Factor.

Note 2: The left-hand side of the diagram shows the class level of rivers, the object level, and the entity level. If the NF_ik is invoked at the class level, i.e. class generalization, the actual execution is carried out at the entity level. Once the NF_ik has been executed, the reduction factor is calculated at the entity level, then at the class level.


5.4 Summary

In Chapter 3 the data model was examined, and in Chapter 6, in the examination of logical data structures, a clear distinction is made between a spatial object and a thematic object. In generalizing map data, the structural distinction between a spatial and a thematic object is necessary, since activities can be invoked on the spatial object so that its detail for representation is altered, for example shifted or simplified, yet its object identification, classification, and description remain the same. Linear features, for instance, may be simplified using coordinate point elimination algorithms such as the Douglas algorithm, or they may be spatially displaced for aesthetic purposes or due to congestion, yet their object definition, i.e. thematic description, could, and often does, remain the same.

In other circumstances, when aggregations occur, the classification hierarchy is involved and, depending on the rules applied, some information may be eliminated from appearing on the map while other sets of information may be aggregated, in so doing changing the level of classification appearing on the map.

The reduction factor calculations discussed in this chapter are intended to present the user with a statement about how much data has been selected from the original database and how it may have been changed. Geometric information is provided, as is thematic information relating to the object, subclass, class, and superclass levels of data. In the next two chapters we will see how reduction factors can play a key role in assessing the generalization and allow the user to apply attenuation factors to further increase generalization processes.


CHAPTER 6

Implementation of the Context Transformation Model

6.1 Introduction

A number of programs have been written to implement the models for context transformations. This has been done using an Arc/Info system with an Oracle database, with programming in AML and SQL. The processes for generalizing the data rely on the models for generalization, which are supported by the underlying data structures. Once the data are in a suitable format, they can be processed through the models to automatically provide a generalized product at any smaller scale.

The test area data used for the implementation cover southern Saskatchewan, Canada, located at the following coordinates: 49°45'N, 110°W; 49°45'N, 102°W; 54°N, 110°W; and 54°N, 102°W. Details and reliability of the data are addressed in section 6.3.1.

6.2 Database Environment

An Arc/Info geographic information system, version 5.0.1, was available for implementing the database for testing the context transformation models. Arc/Info provides capabilities for manipulating, analyzing, and displaying geographic data in digital form, and organizes geographic data using a relational and topological model. This allows both locational and thematic data to be handled effectively. Topological structuring is a requirement for context transformations, particularly with objects such as lakes and rivers. The Arc/Info system provides the required topology: it represents map features by sets of arcs and nodes and provides the topological relationships among connected lines and points. In this way, the connectivity and contiguity of map objects is provided. The data structures required for the generalization, and subsequently implemented in an Arc/Info system according to the FDS guidelines, are conceptually generic and could have been mapped to a database environment other than relational, such as hierarchical or network, provided it contained topology.

Arc/Info was used for data capture and representation, topological structuring, and spatial data maintenance. The thematic data, however, were maintained in Oracle, processed for generalization, and re-represented in Arc/Info.

The Oracle database environment is fully relational, supports one-to-many and many-to-many operations, allows easy access to all data, and provides considerable flexibility in data modelling and processing. The Oracle relational database management system operates with SQL, an English-like language that is used for most database operations. Oracle tables and Info data files were used interactively via the relate environment in Arc/Info.

6.3 Database Design

Ensuring that the data used for testing are in a suitable format is an important aspect of developing context transformations. In this process the objective is to deliberately omit some details, replace others, and reclassify still others according to the intended application of the abstractions resulting from the user's requirements. The database implemented in the prototype is built to include classification and aggregation hierarchies, which provide a fundamental technique for database abstractions.

6.3.1 Geometric and Attribute Database Structures

The necessity factor, the classification and aggregation hierarchies, and the reduction factors are achieved by mapping the formal data structure model described in Chapter 3 to logical data structures, in this case in an Arc/Info and Oracle database environment. This research uses three superclasses of data for testing the context transformations: hydrography, populated places, and landcover. In the next sections these data are examined according to their geometric and attribute data requirements.

Hydrography

Hydrography data have been digitally captured from 1:2 million maps, Lambert conformal conic projection with standard parallels at 49°N and 77°N, produced by the Canada Centre for Mapping, Energy, Mines and Resources, Canada, 1990. The data consist of stream segments, rivers, and lakes. Linear hydrography has been digitally captured as arcs ending and restarting at nodes, as shown in Figure 6.1. Each hydrographic tributary has three atomic data items, which consist of the line segment, a from-node, and a to-node, which together form the arc. When the delineation of rivers requires a double line for their representation, each bank of the object has been digitally captured as an arc and uniquely identified to form a relationship with the objects. In both single line and double line rivers, each arc is uniquely identified by system-generated reference numbers, allowing the assignment of attributes to entity level data. The assignment of stream order attributes to double line rivers in some cases requires a line to have more than one stream order, as illustrated in Figure 6.2. In these cases, a pseudo node is defined and the two or more segments that represent the river are assigned the appropriate stream orders. The downstream segment from the pseudo node always gains the higher stream order value. Both Horton stream ordering and Strahler stream ordering have been used for testing.

Stream and river length statistics are generated using existing software programs. The formation of objects, such as river segments of stream order 3, is established by assigning the relevant value to the particular arc or set of arcs, consistent with the guidelines set out in Chapter 3 by the FDS model. Figures 6.3 and 6.4 illustrate the logical data structure for storing spatial and thematic data for hydrography. Figure 6.3 illustrates the vertical relationships of the logical data structure, reflecting the FDS characteristics; also evident is the distinction between the spatial object and the thematic object. Figure 6.4 illustrates the table design typical of a relational database; again, the distinction between the spatial data and the thematic data is clear.

The building up of objects in hydrographic networks from atomic data later supports the generalization of streams and drainage networks based on stream orders and/or lengths, and allows 'extensibility' of the attribute domain. The separation between the spatial objects and the thematic objects facilitates certain operations in generalization, such as linear feature smoothing, simplification, or displacement. In these operations, the graphic representation of geometric data can be altered but the thematic description remains the same. Thus, it is possible to maintain one thematic description with many different geometric representations. In operations like selection, aggregation, reclassification, and replacement, both the geometric and thematic data are addressed. The separation between spatial and thematic data supports context transformations and, additionally, can be used compatibly with the strengths afforded by classification hierarchies.
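The separation can be sketched as a pair of record types (illustrative Python dataclasses; the field names are this editor's, not those of the Info or Oracle tables):

    from dataclasses import dataclass, field

    @dataclass
    class SpatialObject:
        # Geometry and topology only: may be reshaped or displaced by
        # generalization without touching the thematic description.
        arc_ids: list
        coords: dict    # arc id -> list of (x, y) vertices
        from_to: dict   # arc id -> (from_node, to_node)

    @dataclass
    class ThematicObject:
        # Classification and attributes only: one thematic description can
        # refer to many geometric representations, e.g. one per derived scale.
        name: str
        object_class: str
        attributes: dict = field(default_factory=dict)
        representations: list = field(default_factory=list)  # SpatialObject refs

    kootenay = ThematicObject("Kootenay River", "Rivers", {"stream_order": 3})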


[Figure 6.1: Arc and Node References for the Representation of a Hydrographic Feature (White River)]

[Figure 6.2: Double Line River and a Pseudo Node Requirement for Stream Ordering — numbers indicate stream orders according to the Strahler classification]


[Figure 6.3: Object Formation with a Logical Data Structure — real world phenomena (feature class = Rivers; feature = Kootenay River) are interpreted in the database as an object class and object; the object divides into a spatial object, carrying spatial attribute values (coordinates) and topological attribute values over the spatial entities (arcs and nodes), and a thematic object, carrying object level and entity level attribute domains and values]


Figure 6.4: Spatial and Thematic Data Structure for the Rivers Object Class

[Table diagram: the spatial geometric structure lists, for each line (Recno 12-25), its coordinate pairs (x,y) as spatial attribute values and its from/to nodes as topological attribute values; the thematic structure lists, for the same lines, an entity attribute (Strahler stream order) and an object attribute (river name: NN, Cross, Kootenay, Palliser, White). Several records can share one object attribute value, grouping entities into a named object.]

Note 1: NN implies that the river has not been named on the map reference used for graphic representation.

Note 2: An object is represented by rows 17, 23, and 25, shown in both the spatial geometric structure and the thematic structure. A spatial object, however, is represented by the geometric structure only. This is so since a geometric structure, whether in the form of an entity, an entity set, or a spatial object, may carry more than one thematic description. The identification of a spatial object is also important in generalization when the geometry changes but the thematic description remains the same.

Note 3: The determination of an object is contextually sensitive.


For the representation of lakes, geometrically atomic data, like nodes and arcs, are linked to form polygons. Topologically this includes left and right characteristics of arcs and applies also to the delineation of islands. To provide polygon statistics, arcs are processed for closure to form closed polygons. Attribute data for lakes consist of area and circumference; however, numerous other attributes could be used for the generalization of lakes.

Connectivity must be maintained between rivers and lakes so that continuity can exist through the drainage pattern. This can be achieved by using topological linkages in the database between a lake and the river or rivers connected to it, along with a lake-line which connects incoming and outgoing river segments when a lake has been dropped as a result of the generalization process. For example, for all lakes that terminate a river network, the lake-line is continued through the visual centre of the polygon upstream to a point on the shore furthest from the polygon outlet. For lakes that have both an inlet and an outlet, the rivers with the highest order streams using the Strahler classification are linked by lake-lines to maintain the direction of flow. In this way the streams of greatest magnitude are linked, and any additional incoming streams are then joined to the highest order stream. In the instances when streams of the same magnitude flow into a lake, the general configuration of the lake and the direction of run-off are taken into consideration in delineating the lake-lines.
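The replacement behaviour described above can be sketched as a small topological operation (a schematic reconstruction in Python with hypothetical field names; the prototype performs this through Arc/Info topology):

def drop_lake(arcs, lake):
    """Replace an eliminated lake by a lake-line joining its inlet and outlet.

    arcs: list of dicts with 'id', 'from', and 'to' node references
    lake: dict with 'id', 'inlet_node', 'outlet_node', and 'shoreline_arcs'
    """
    shoreline = set(lake['shoreline_arcs'])
    kept = [a for a in arcs if a['id'] not in shoreline]
    # The lake-line inherits the lake's inlet and outlet nodes, so the
    # upstream and downstream river segments stay connected through the
    # former lake at both the topological and representation levels.
    kept.append({'id': 'lakeline-%s' % lake['id'],
                 'from': lake['inlet_node'],
                 'to': lake['outlet_node']})
    return kept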

Topologically, all hydrographic objects are connected by their 'from' and 'to' nodes. Each time a river enters or leaves a lake, a node is identified either as a 'from' or 'to' node in the river network to establish the intersection. The delineation of the polygon also requires an intersection of the incoming or outgoing river to be identified with a node. If a lake-line is required through the polygon to connect incoming or outgoing river segments, nodes at these intersections are also required, thus allowing all three entity sets to be topologically linked. Figure 6.5 illustrates the connection of lake-lines and rivers along with the intersections at nodes.

Landcover

The vector landcover data have been generated from a 1 km x 1 km raster dataset produced by the Manitoba Remote Sensing Centre with the National Atlas Information Service of the Canada Centre for Mapping. The raster product originates from classification of AVHRR (Advanced Very High Resolution Radiometer) imagery from NOAA (National Oceanic and Atmospheric Administration) satellites, produced during the summers of 1988 and 1989.


Figure 6.5: Stream Orders and Arc References Showing the Connectivity of Stream Network Topology Through Lakes

Note: Arcs 13, 17, 5, 4, and 18 maintain connectivity with rivers and acquire the corresponding stream order values, which are 1, 2, 3, 2, and 3 respectively.


The image has been registered using the World Databank rasterized linework containing coast, islands, and major lakes and water bodies. The normalized difference vegetation indices (NDVI) were calculated and the images were then composited and classified.
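The NDVI itself is the standard normalized ratio of the near-infrared and red channels, NDVI = (NIR - Red) / (NIR + Red). A sketch of the calculation with a maximum-value composite follows (maximum-value compositing is a common AVHRR technique; the exact compositing method used for this dataset is not stated, and the arrays are illustrative stand-ins):

import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    with np.errstate(invalid='ignore', divide='ignore'):
        return (nir - red) / (nir + red)

# Maximum-value compositing keeps the greenest (least cloud-affected)
# observation per pixel across the summer scenes before classification.
scenes = [ndvi(np.random.rand(4, 4), np.random.rand(4, 4)) for _ in range(3)]
composite = np.maximum.reduce(scenes)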

The raster was vectorized with some smoothing obtained through a 2-pixel look-ahead algorithm. The accuracy of the coordinates is estimated at about 2-3 km. As a verification of the derived vector data, raster data were re-generated from the vector data and proved to be consistent with the original raster data. No filtering of the original dataset was done to eliminate small polygons.

Digital integration of the landcover dataset (1989) and the hydrography dataset (1990) was necessary. In some instances where data discrepancies existed between the two datasets, the hydrography cells in the landcover data were reclassified to the surrounding landcover class.

The landcover superclass data consist of four classes and eight subclasses, which can be further broken down into sub-subclasses. For the test area, however, only three classes are present: forest, agriculture, and non-vegetated. The forest subclasses consist of coniferous, deciduous, and mixed forests, while the agriculture class consists of cropland and rangeland subclasses. The non-vegetated class breaks down into built-up areas, perennial snow, glaciers, and pit mines; however, the only subclass of this last class present in southern Saskatchewan is built-up areas. The classification hierarchy for classes and subclasses is structured in the relational environment using object codes, and includes area and perimeter statistics. The vector landcover data are stored as polygons, with polygon topology such as from-node, to-node, and left and right polygons. Figure 6.6 illustrates the hierarchical structure for landcover data.
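One way such object codes can embed the hierarchy (the prototype's actual coding scheme is not given, so the digit layout below is an assumption) is to reserve leading digits for the superclass and class, making selection or aggregation at any level a simple prefix operation:

# Hypothetical object codes: first digit = superclass, second = class,
# third = subclass (0 where no finer level exists).
CODES = {
    211: 'coniferous forest', 212: 'deciduous forest', 213: 'mixed forest',
    221: 'cropland',          222: 'rangeland',
    231: 'built-up areas',
}

def at_class_level(code):
    """Aggregate a subclass code up to its class code (drop the last digit)."""
    return code // 10

# Aggregating subclasses to class level is then a relabelling:
assert at_class_level(211) == at_class_level(213)  # both become class 21, 'forest'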

Populated Places

The coordinate data for populated places have been captured from the same source as hydrography. The classifications of city, town, village, unincorporated place, non-unincorporated place, and Indian reserve, however, have been extracted from sources available from the Canadian Permanent Committee on Geographic Names Gazetteer.

The data have been captured in point form and the assignments of object codes are used to define class intensions. In other words, the object code has the implication of the class intension embedded in the code. In this way, the digital specifications for the populated places allow a selection of place types to be made at the superclass level or based on the class level.


Figure 6.6: Landcover Classification Structure

Landcover Superclass
  Class level: Forest, Agriculture, Non-vegetated
  Subclass level:
    Forest: Coniferous, Deciduous, Mixed
    Agriculture: Cropland, Rangeland
    Non-vegetated: Built-up Areas, Perennial Snow, Glaciers, Pit Mines

Note: Subclasses not present in the test area dataset lie outside the implemented portion of the hierarchy.


Since every object belongs to a class, every object is defined according to its class code. The object level can, of course, be accessed as well. This is described in detail in Chapters 4 and 7. This structure also allows the individual symbolization of each class. The attribute data assigned to populated places include population and place name.

6.3.2 Necessity Factor

Once the spatial and thematic data are in a suitable format, the objective is then to reduce the density of representation for smaller scale portrayal. The necessity factor consists of the map object requirements and the map object functionality, as outlined in Chapter 4. The ratings for each map theme are maintained in table format for the individual map object classes, as are the map object functionalities. The combination of the MOR and MOF determines how many objects will appear on the derived map at the selected scale. Once the necessity factor has been calculated for the particular theme of interest at the selected scale, the data for landcover, hydrography, and populated places are selected according to their superclass, class, or subclass criteria; in the case of landcover data, they can also be aggregated from subclass to class levels.

The density of objects appearing on a map as a result of the necessity factor calculations can be controlled by means of attenuation factors. These take the form 0 < f < 1 and are applied directly to the Map Object Functionality ratings. For thematic mapping this provides sufficient sensitivity to reduce map object density and provides a technique for continuous generalization.
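Numerically the step is small. Using the formula given in the note to Table 7.2, NF_ik = (MOR_ik + f·MOF_i) / 2, a sketch follows (the MOF rating of 87 is the City mean from Table 7.1; the MOR of 75 is implied by the unattenuated NF of 81 in Table 7.2):

def necessity_factor(mor, mof, f=1.0):
    """Necessity factor for class i, theme k, with attenuation factor f (0 < f <= 1)."""
    return (mor + f * mof) / 2.0

# City on a 1:7.5 million landcover map:
print(necessity_factor(75, 87))          # 81.0  -> retain 81% of cities
print(necessity_factor(75, 87, f=0.5))   # 59.25 -> matches Table 7.2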

6.3.3 Organization of Data

The organization of spatial and thematic data layers assists in transforming geometric and thematic data in an integrated way. The development of layers was based on the application requirements for generalization, the topological requirements of the data, representation needs, and finally, groupings of thematic components. The layers and their descriptions are illustrated in Figures 6.7, 6.8, and 6.9. In examining the figures, it is apparent that there are three superclasses of data maintained in three layers in an Arc/Info system. Within each superclass, class intensions are coded to assist processing for generalization at superclass, class, or subclass level.


These classification structures can provide a powerful environment for generalization and aggregation and are illustrated in the aforementioned figures.

In the hydrography superclass, for example, one physical layer of data exists consisting of three classes of data. Within this structure, hydrography can be generalized at superclass or class levels according to the stream-order classifications. No aggregation occurs within this data group; however, replacement automatically takes place when a lake-line is required to replace a lake. This is achieved by means of topology.

In the landcover dataset, aggregation and generalization can be employed separately; however, the hierarchical structures also allow aggregation and generalization to be used together. For example, aggregation and generalization operations can be observed independently in Figure 6.8. For very small scale mapping, landcover data can be aggregated from low level subclass data, represented on the right hand side of the diagram, to the higher level class data, represented on the left hand side of the diagram. Thus, the diagram illustrates aggregation occurring from right to left along the horizontal plane. When this process occurs, the subclasses of deciduous, coniferous, and mixed forests are automatically aggregated to form the class 'forest'. The data structures also allow generalization to occur at the superclass and subclass levels. In these cases selection is involved rather than aggregation. For example, at the superclass level, the entire dataset is ordered by area and the smallest polygons are eliminated. However, when subclass selection is chosen as a generalization option, the smallest polygons from each subclass are eliminated. These last two processes, i.e. superclass and subclass selections, are more suitable for detailed mapping. The aggregation processes, however, are better suited to small scale thematic mapping because, in this case, composite objects are used and, as such, provide a less detailed representation of thematic information.
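A schematic version of the aggregation step (a Python sketch of the behaviour described above, not the prototype's topological code): subclass polygons are relabelled to their class, and adjacent polygons that then share a class are merged, with new area statistics derived.

def aggregate(polygons, adjacency, to_class):
    """Merge adjacent polygons whose subclasses map to the same class.

    polygons: {pid: {'subclass': str, 'area': float}}
    adjacency: set of (pid, pid) neighbour pairs
    to_class: subclass -> class mapping, e.g. 'coniferous' -> 'forest'
    Returns merged components with summed areas (union-find over neighbours).
    """
    parent = {p: p for p in polygons}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    for a, b in adjacency:
        if to_class[polygons[a]['subclass']] == to_class[polygons[b]['subclass']]:
            parent[find(a)] = find(b)
    merged = {}
    for p, rec in polygons.items():
        m = merged.setdefault(find(p), {'class': to_class[rec['subclass']], 'area': 0.0})
        m['area'] += rec['area']
    return merged

polys = {1: {'subclass': 'coniferous', 'area': 5.0},
         2: {'subclass': 'deciduous', 'area': 3.0},
         3: {'subclass': 'cropland', 'area': 8.0}}
print(aggregate(polys, {(1, 2)}, {'coniferous': 'forest',
                                  'deciduous': 'forest',
                                  'cropland': 'agriculture'}))
# polygons 1 and 2 merge into one 'forest' object of area 8.0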


[Figures 6.7, 6.8, and 6.9: Layer organization and descriptions for the hydrography, landcover, and populated places superclasses.]

Like the other superclasses, the populated place superclass is also maintained in one physical layer, in this case consisting of six classes. The object codes used to define class intensions are used in the generalization process along with attribute values for superclass, class, and attribute value within class generalization.

6.3.4 Reduction Factors

In a GIS environment that generalizes data, a useful function to create and invoke is one that automatically evaluates the results of the generalizations. For this purpose, reduction factors have been developed which quantify the extent to which the data have been altered by the processes enacted on them. In this particular prototype, reduction factors are available as a means of evaluating the generalization processes at each step. Once the data have been generalized, the reduction factors are immediately available to the user to evaluate the forthcoming representation or product. Reduction factors, according to the calculations outlined in Chapter 5, are available at point, arc, and polygon levels as well as object, class, and superclass levels. The results of the generalization are indicated in table format, illustrating the original densities of data and the level of density resulting from the generalization processes. These values are available as actual numbers of entities in the database and as percentages. Table 6.1 provides an example of the reduction factor tabulations. The first column indicates the percentage of data remaining at the derived scale. For example, 41.93% of streams that are classified as order 1 in the Horton scheme remain at the newly derived representation level. The next two columns show the actual numbers of map features that are attributed with the particular class attribute, both in the original database and at the derived scale of representation. When the reduction factors are provided to the user, an assessment can be made of whether the density reduction is extensive enough for the intended generalization. If the generalization is not extensive enough, an attenuation factor can be entered which will further decrease the density.
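Given the original and derived entity counts, each row of such a tabulation reduces to the following (a sketch; the column names mirror Table 6.1):

def reduction_row(original, derived):
    """Percent of the database retained and the complementary reduction factor."""
    retained = 100.0 * derived / original
    return {'%DB retained': round(retained, 2),
            'original': original,
            'derived': derived,
            'reduction factor': round(100.0 - retained, 2)}

# Horton order 1 arcs from Table 6.1: 93 arcs originally, 39 at the derived scale.
print(reduction_row(93, 39))   # about 41.9% retained, 58.1% reduction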


Table 6.1 Reduction Factors for Mapping Landcover data at 1:7.5 million with No Attenuation Factors

Map Object Class           %DB Retained   Original Data    Derived Data    Reduction Factor

Hydrography
  # of arcs - Horton 1        41.93%            93               39
  # of arcs - Horton 2       100.00             53               53
  # of arcs - Horton 3       100.00             44               44
  # of arcs - Horton 4       100.00             19               19
  Total # of arcs             74.16            209              155              25.84%
  Length - Horton 1           34.51       3,477.79 km      1,200.23 km
  Length - Horton 2          100.00       2,053.48         2,053.48
  Length - Horton 3          100.00       1,620.20         1,620.20
  Length - Horton 4          100.00         705.15           705.15
  Total arc length            71.01       7,856.63         5,579.07
  # of lakes                  64.50         130.00            83.85
  Lake area                   92.86       6,635.55 km²     6,161.99 km²           7.14%
  Total Hydrography           69.50%                                             30.50%

Populated Places
  # of cities                 81.00%            10                8
  # of towns                  81.00             95               76
  # of unincorp. places       22.78            316               72
  # of villages               63.96            247              158
  # non-uninc. places          0                 1                1
  # indian reservations       10.00             59                6
  Total Populated Places      43.25%           728              321              55.91%

Landcover
  # built-up areas           100.00%             3                3
  # rangeland                 93.62            141              132
  # cropland                  80.80            125              101
  # Agriculture Total         87.59            266              233
  # mixed forest              82.89            567              470
  # deciduous forest          80.72            223              180
  # coniferous forest         91.36            243              222
  # Forest Total              84.41           1033              872
  Total Landcover             85.11%          1303             1109              14.89%


6.4 Program Design

The context transformations, using the necessity factors with superclass, class, and subclass classification and aggregation hierarchies, the attenuation factors, and the reduction factors with the database interface, result in some fifty-eight program modules. These include the modules for generalization as well as some of the plotting modules. The core module allows users to first select the theme for mapping; in this case landcover, agriculture, climatology, or phytogeography are offered. As many as 40 different themes are possible in this prototype; however, only a few have been tested. The core module then offers a number of scales that can be chosen for data representation, followed by the subsequent calculation of the necessity factor. At this stage the necessity factor is used to calculate the amount of data that should appear at the derived scale, but not which data should appear. Once the necessity factor has been calculated, options are provided to the user for the types of strategies available for the different categories of map information. For example, for hydrography, options are provided for generalizing rivers on the basis of Horton stream-ordering at class level or superclass level, with the same options also available for Strahler stream-ordering. If generalization is done at the superclass level, a mean necessity factor is taken for rivers and lakes and the density calculated accordingly using either the Horton or Strahler superclass; for lakes the calculation is achieved according to the area statistic. If the user chooses to generalize on the basis of class, then a specific necessity factor value is applied to rivers and another to lakes, thus treating the classes individually.

This portion of the program invokes another program module that sorts the hydrography database according to the parameters specified by the user. Thus, for superclass generalization on Horton stream-ordering, the database is ordered according to Horton stream-orders and length. When the system must select between two streams of the same order, it will choose to represent the longer one. Lakes are subsequently sorted on area, with the largest selected according to the requirements specified by the necessity factor. In this portion of the generalization core program module, another module is invoked for maintaining topological consistency in the drainage network subsequent to the generalization process. This is necessary when a lake is eliminated from the midst of a river. These generalization processes involve selection, elimination, replacement, and topological restructuring.
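The sorting-and-selection step can be sketched as follows (a simplified Python stand-in for the Arc/Info module; the field names are illustrative):

def select_rivers(arcs, nf_percent):
    """Keep the top nf_percent of arcs, ranked by Horton order, then length.

    arcs: list of dicts with 'horton' (stream order) and 'length' keys.
    """
    ranked = sorted(arcs, key=lambda a: (a['horton'], a['length']), reverse=True)
    keep = round(len(ranked) * nf_percent / 100.0)
    return ranked[:keep]

rivers = [{'id': 1, 'horton': 1, 'length': 12.0},
          {'id': 2, 'horton': 1, 'length': 30.5},   # the longer order-1 stream wins
          {'id': 3, 'horton': 3, 'length': 80.0}]
print([a['id'] for a in select_rivers(rivers, 67)])  # [3, 2]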


With populated places, the program offers options for generalizing on the basis of attribute value within class, class, or superclass. Each of these selection options provides a different distribution of point features. For example, with attribute value within class, the necessity factor is applied to each of the populated place categories; depending on the necessity factor values, each of the categories of populated places could be represented. With generalization on the basis of class, the intensions of each of the categories are first ordered and the mean of the classes extracted from the ordered intensions and the subsequently ordered extensions; thus some of the classes may not be represented. If superclass generalization is specified, all the extensions are ordered regardless of the class intensions, thus rendering a different distribution again.

With the landcover data, three types of generalization are afforded. The first is selection by area at subclass level, the second is selection by area at superclass level, and the third is aggregation of subclasses to class level. In the first instance, the subclasses are ordered by area and the smallest polygons eliminated according to the specified necessity factor value. In the second instance, all subclasses and classes are ordered within the superclass according to area with no intension distinction. This means that the smallest polygons are eliminated regardless of class or subclass levels. In the third instance, a different process is applied, that being aggregation. Here, the subclass data are automatically aggregated to class level, rendering three new classes of data. This is achieved automatically, and all new topology and area statistics are derived.

At each stage of the generalization the user is provided with the quantitative results of the processes. If the reduction in density is not suitable, the user can apply the attenuation factors. This process will increase the density reduction, which can again be evaluated according to the reduction factors. Figure 6.10 provides an example of the core program module user interface, while Figure 6.11 illustrates the program flow.

Figure 6.10 Selections from the Core Program Module

&type 1 = landcover
&type 2 = agriculture
&type 3 = climatology
&type 4 = phytogeography
&setvar temp_theme [response 'enter theme of the map (1, 2, 3, or 4)']


&type 2 = 1:2,000,000
&type 7.5 = 1:7,500,000
&type 12.5 = 1:12,500,000
&type 30 = 1:30,000,000
&setvar .scale [response 'enter scale of the map (2, 7.5, 12.5, or 30)']

&type select a strategy to generalize hydrography - based on:
&type 1 = Horton stream-ordering on class level
&type 2 = Strahler stream-ordering on class level
&type 3 = Horton stream-ordering on superclass level
&type 4 = Strahler stream-ordering on superclass level
&type 9 = skip

&type select a strategy to generalize populated places - based on:
&type 1 = attribute value within class
&type 2 = class
&type 3 = superclass
&type 9 = skip

&type select a strategy to generalize landcover - based on:
&type 1 = generalization by area on subclass level
&type 2 = generalization by area on superclass level
&type 3 = aggregation of subclasses to class level
&type 9 = skip


Figure 6.11 Overview of Program Flow

[Flow diagram: the initiator (user) supplies scale and theme; the thematic mapping specifications are identified; NF_ik is retrieved and the density reduction calculation performed; the user selects generalization strategies (superclass, class, subclass, and aggregation strategies); the spatial and thematic database is retrieved and a database subset is written, selected, and computed; reduction factors are computed and the user evaluates the reduction, optionally applying attenuation factors; finally, plotting specifications are retrieved, the map is drawn by the programs for plotting database subsets and aggregations, and the user evaluates the result.]


6.5 Summary

In Chapter 3 it became apparent that data should be structured so that multiple interpretations or abstractions can be derived from the same data. This is a necessary step in the map generalization process, since data abstraction can occur differently at different scales and can even differ within the same scale, given the diversity in map user perceptions and functions. Additionally, different interpretations of the same data are necessary due to differences in applications, which in many ways can affect the type of abstractions required. If multiple abstractions of data are to be achieved, it is essential that the underlying data model provides flexibility in using the data, while at the same time providing a stable environment for the data. The logical data structures built for this prototype reflect the underlying conceptual data model described in Chapter 3 (i.e. the FDS) and support map generalization in a range of ways suitable for a diversity of applications and abstractions.


CHAPTER 7

Assessment of Results

7.1 Introduction

To explore the results of the context transformations, this chapter is divided into three sections. The first section addresses the underlying data model environment which is used to support the context transformation model. The second section of this chapter looks at the different strategies for context transformations using the necessity factor in conjunction with the superclass, class, and subclass structures. In this case, a number of maps have been produced to illustrate the capabilities of the generalization model. Statistical results accrued from the reduction factors are also reviewed. Finally, the third section looks at the context transformations in relation to user requirements in spatial data manipulation.

7.2 Formal Data Structure Model Support for the Context Transformations

The data model selected to provide a geometric and thematic foundation for the context transformations is the Formal Data Structure of M. Molenaar (1989). Certain aspects were considered before selecting the data model to ensure that context transformations would be successfully implemented. The primary objective was for the data model to support requirements in generalization. Since it can be anticipated that the need for generalization will vary according to context, scale, map use, and the user's perspective, the aspects considered in selecting a data model particularly involved the provision of flexibility in achieving the results. In some cases this meant that different types of data should be represented as one type of data under certain conditions; for example, when lakes in the midst of rivers are eliminated, they must be replaced by a river segment to retain river connectivity. It also meant that the data model needed to handle both thematic and geometric data independently, as well as in an integrated way, so that different operations in generalization could be performed. Finally, the data model needed to support topological querying as well as classification and aggregation hierarchies. Each of these aspects is discussed in the following sections.


7.2.1 Generalizing Geometric and Thematic Components in an Integrated Way

Some aspects considered in generalization concern the distinction between the spatial object and the thematic object. This is necessary since the spatial representation of something may change, yet the thematic description will remain the same. In this context the FDS provides a distinction between the thematic and geometric data types. For the particular generalization undertaken here, both the spatial objects and thematic objects were changed. The former case, in which the spatial object changes and the thematic object remains the same, arises more when activities like simplification and displacement are involved. In this research, however, the processes involved (selection, elimination, replacement, and aggregation) would generally occur before simplification and/or displacement. This having been said, the processes applied in the context transformation model manipulate the spatial and thematic objects in an integrated manner. An example of this is the generalization of hydrography. In this case the underlying data model provides the structure for generalization using both the attribute domain and the geometric domain. This results in a process that generalizes hydrography based on the thematic structure, such as the stream order classifications of Horton and Strahler. The geometric domain becomes involved when a lake bisects a river and, in the elimination process, the lake is removed. In this instance the geometric domain must be utilised to maintain connectivity between the selected upstream and downstream rivers. This involves topological processing for contiguity and connectivity.

The manipulation of the hydrography data also involves another important concept in generalization. In changing the representation of a lake to a portion of a river, the database is presenting two different types of data as the same data; i.e. the lakes concerned now appear as a portion of river. When these transformations occur, the new level of abstraction is supported with newly derived topology, area, length, and coordinate data. This can be stored as a new dataset whilst the original data remain unchanged. In querying the database, however, it remains to be seen at which level users prefer access. The new abstraction can be queried, or the user can access the original dataset. With the addition of amplification factors, the user would have specialization access from the abstraction level down to the lowest level of the hierarchical data structures.


7.2.2 Generalization of Thematic Data From One Level to Another

The Formal Data Structure model supports the concept of object hierarchies. Since the model makes it possible to classify and manipulate data in a hierarchical way, generalization and aggregation hierarchies can be used to decrease the density of objects appearing on the derived map. This capability allows different users to access the data at different levels of abstraction.

In the landcover dataset, for example, generalization can be applied which results in the formation of a new thematic object. Spatially this means that subclasses are aggregated to represent a new object class level along with a new topological structure, including new area and circumference statistics. In the landcover classification, thematic classification generalization results in, for example, coniferous, deciduous, and mixed forest being automatically reclassified to forestry, and cultivated and range land being reclassified to agriculture. Spatially, when two polygons are adjacent and belong to different subclasses but are of the same class, they are aggregated. Thus, thematic and spatial aggregation allow users to regard a higher level object, with lower level details being suppressed.

7.2.3 Geometric Changes as a Result of Context Transformations

As discussed in Chapter 4, generalization in a systems environment should be explored as a context transformation. This implies that with scale change and different user requirements the context of a 'map' may change, resulting in geometric transformations. In this regard, the data structures built within the framework of the FDS allow context transformations. This is evidenced by the landcover classification generalization hierarchies and the subsequent spatial transformations in the form of aggregations. Another example of the context transformations lies in the representation of hydrography. With changes in scale and context, the elimination of lakes that bisect two river segments requires the resulting drainage pattern to be presented just as rivers rather than as lakes and rivers. In these instances, the geometric data structures are transformed from a combination of arcs and polygons to arcs, with a new topological dataset as well. Thus the new representation can be queried at the new representation level both thematically and geometrically. Of course, the user always has the option of querying the original dataset as well.


7.3 Results of the Context Transformations

The application of classification hierarchies with generalization, in conjunction with the necessity factor, can be used in a number of ways. Generalization can be achieved by applying the necessity factor to superclass, class, and subclass levels. In superclass generalization, the class intensions are not observed, whereas in class level generalization they are. When attribute value within class generalization is used, the necessity factor for each object class is calculated. The strategies for generalization according to the three superclasses are reviewed in the following part of this chapter.

7.3.1 Populated Places

Superclass Generalization:

In superclass generalization, an entire dataset can be accessed, organized, and generalized for different scale representations according to any number of variables. For example, for populated places at 1:7.5 million, the necessity factor is averaged over the six different classes; in other words, the mean for the superclass is taken. This results in a certain percentage of the complete database being specified for representation at the desired scale. For data extraction purposes, this process can be described as follows:

∀ e_i ∈ SC: A_p(e_i) > A_p(e_{i+1})

where ∀ = for each and every, SC = a superclass, e_i = an object, and A_p = the property p of an attribute A.

Thus, the superclass extension is ordered from largest to smallest attribute value based on, in this case, a population statistic. The class intensions, which are the classification descriptions of each class, are ignored. This process of generalization takes a subset of populated places which represents those places with the highest population; in other words, it is a selection by rank order. If superclass generalization were performed on another variable, such as unemployment statistics, this process would render results that show the places with the highest unemployment. It would not necessarily show the largest cities or towns with the highest unemployment, as in this case the class intension is not observed.


Therefore, the superclass generalization would provide a more realistic representation of reality than the class generalization would. This is because the variable for data extraction is not a class attribute but rather a superclass attribute.
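Expressed as code, superclass generalization is a rank-order selection over the whole extension (a sketch; the place records and populations are illustrative, and the ranking attribute could equally be unemployment or any other superclass-level variable):

def superclass_select(places, attribute, nf_percent):
    """Order the entire superclass extension on one attribute, ignoring
    class intensions, and retain the top nf_percent of objects."""
    ranked = sorted(places, key=lambda p: p[attribute], reverse=True)
    return ranked[:round(len(ranked) * nf_percent / 100.0)]

places = [{'name': 'A (city)', 'population': 175000},
          {'name': 'B (city)', 'population': 33000},
          {'name': 'C (town)', 'population': 1600},
          {'name': 'D (village)', 'population': 270}]
print([p['name'] for p in superclass_select(places, 'population', 50)])
# ['A (city)', 'B (city)'] -- status plays no role in the selection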

In Table 7.1 the map object functionality is provided for each of the populated place classes. The mean for each class is averaged with the map object requirement ratings, which in turn provides the necessity factor for each class, shown in Table 7.2. If required, the MOF portion of the necessity factor can be attenuated to further decrease density. In Table 7.3, attribute value within class generalization with .5, .3, .2, and .1 attenuation factors is shown. This results in five of the six classes of populated places being represented, with only non-unincorporated places excluded from the distribution. By observing the total figures, the increase in generalization is evident. Table 7.4 illustrates all three types of generalization on the populated place dataset.

Table 7.1 Map Object Functionality (MOF) Rating for a 1:7.5 million Map Representation

MOC       OR    LO    EN    ME    DE    MEAN
City      65    90    90    90   100     87
Town      65    90    90    90   100     87
Village   65    70    20    90    25     54
U.P.      15    20     0    70     0     21
N.U.P.     0     0     0     0     0      0
IR        15    20    20    20    25     20
River     65    90    70    70    75     74
Lake      15    90    70    20    75     54

Note: MOC = map object class; OR = orientation; LO = location; EN = enumeration; ME = measurement; DE = description.


Table 7.2 Necessity Factors including MOF Attenuations for Mapping Populated Places on a Landcover map at 1:7.5 million.

MOC          NF*    Attenuation .5   Attenuation .3   Attenuation .2   Attenuation .1
City          81        59.25            50.50            46.20            41.80
Town          81        59.25            50.50            46.20            41.80
U.P.          23        17.75            15.65            14.60            13.50
Village       64        51.00            45.60            42.90            40.20
N.U.P.         0         0                0                0                0
Indian Rs.    10         5.00             3.00             2.00             1.00
Superclass
Mean        43.25%      32.04%           27.54%           25.32%           23.05%

Note: The necessity factor with inclusion of an attenuation factor is calculated as NF_ik = (MOR_ik + f·MOF_i) / 2, where f is the attenuation factor, i is the class, and k is the theme.

Table 7.3: Frequencies of Populated Places with Different Attenuation Factors for Mapping Populated Places on a Landcover map at 1:7.5 million

MOC           No Attenuation   Attenuation .5   Attenuation .3   Attenuation .2   Attenuation .1
City                 8               5.9               5                4                4
Town                76              56                47               43               40
U.P.                72              56                49               46               42
Village            158             125               112              105               99
N.U.P.               1               0                 0                0                0
Indian Res.          6               2                 1                1                0.6
Totals             321             247               214              202              185

Note: Discrepancies in totals are a result of rounding.

Class Generalization:

On class level generalization of populated places, however, another process is applied, rendering different results. In this process, the class intensions are observed and the database extraction process takes the following form:


∀ c_j ∈ SC: I(c_j) > I(c_{j+1}), and

∀ e_i ∈ C: A_p(e_i) > A_p(e_{i+1})

where I is the intension and c is the class. Therefore, each and every class that is a member of the superclass is ordered (class 1 > class 2, and so on), and, within each class C, the property of attribute A of object e_i is greater than that of object e_{i+1}.

Thus, in this case, the class intensions are ordered from largest to smallest and each of the class extensions is also ordered. The subset is, therefore, class dependent.

Table 7.4: Frequencies of Populated Place Distribution According to Generalization Strategy Using a .1 Attenuation Factor

MOC       Original Frequency   Superclass Gen   Class Gen   Attribute Gen
City              10                 10              10            4
Town              95                 88              95           40
U.P.             316                  2              63           42
Village          247                 45               0           99
N.U.P.             1                  0               0            0
I.R.              59                 23               0            0.6
Total            728                168             168          185

Note: In the above table it is clear that superclass, class, and attribute value within class generalization render quite different distributions. Since the classification structure is related to a populated place's legal status rather than population, the three different generalization approaches vary considerably in the resulting distribution.

Note: Discrepancies in the final total populated place figures are a result of rounding.

In the results shown for class generalization, only the first three classes provide populated places for plotting. In this process each of the class intensions is ordered according to its status. Hence, the database is ordered as follows:


MOC              Intension Priority
City                     1
Town                     2
Unincorp                 3
Village                  4
Non-incorp               5
Indian Reserve           6

The populated place status is the first criterion used to determine the priority rating of the class intensions. The second consideration is the population range. These definitions are available in Chapter 3.

Once the class intensions have been ordered along with the extensions, the mean of the necessity factor is calculated, with a .1 attenuation factor assigned to the MOF (in this example), and the resulting percentage of the database is mapped as a subset. With class generalization, the 168 populated places, which represent 23.05% of the database (see Table 7.4), include all of the cities and towns and a small portion of the unincorporated places. Thus, the difference between superclass and class generalization is that in the former case, 23.05% of the database represents the largest places in the whole database regardless of their status. In class generalization, however, the specification of the class intension means that those places with a particular status are selected first and, within their status, are then selected according to their population. Thus in the second case, the distribution may not indicate those places with the highest populations, but rather represents those places with the largest populations according to their political status.
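In code, class generalization first orders the intensions by status priority and then fills the quota from the ordered extensions (a sketch of the procedure described above; the quota handling and the example records are simplified and illustrative):

PRIORITY = {'city': 1, 'town': 2, 'unincorp': 3,
            'village': 4, 'non-incorp': 5, 'indian reserve': 6}

def class_select(places, nf_percent):
    """Order classes by intension priority, then objects within each class
    by descending population, and keep the top NF percentage overall."""
    quota = round(len(places) * nf_percent / 100.0)
    ordered = sorted(places,
                     key=lambda p: (PRIORITY[p['status']], -p['population']))
    return ordered[:quota]

places = [{'status': 'city', 'population': 5000},
          {'status': 'village', 'population': 9000},
          {'status': 'town', 'population': 1200}]
print(class_select(places, 67))
# keeps the city and the town; the larger village is passed over
# because its class intension ranks lower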

Attribute Value within Class Generalization

In the case of attribute value within class generalization, the necessity factor is applied to each class independently. In this case, the necessity factor renders a more even distribution across the six populated place categories. By reviewing Table 7.5, the actual percent values of retained data can be observed. In the 'Mean' columns, the effect of the retained data is apparent. For example, in attribute value within class generalization, the selected set of cities is 4, which would be the 4 with the highest populations, thus providing a higher mean population than class generalization, in which all 10 are selected. The latter lowers the mean population due to the selection of the smaller cities.


Table 7.5: Generalization Results From the Three Strategies on Populated Places

Attribute Value within Class
MOC      %ret      Or     De     Pop-Sum    Mean        Min      Max
City      40.00    10      4     341,539    76,729.84   28,631   149,593
Town      42.11    95     40      79,511     1,998.10      911     5,141
U.P.      13.29   316     42       6,003       149.84       72       796
Vill      40.08   247     99      31,592       363.12      217       734
NUP      100.0      1      1          96        96.00       96        96
IR         1.01    59      0.6        996       996.00      996       997
Total     25.00%  728    185     459,737

Class Generalization
MOC      %ret      Or     De     Pop-Sum    Mean        Min      Max
City     100.00    10     10     396,146    39,614.60      408   149,593
Town     100.00    95     95     111,340     1,172.00       37     5,141
U.P.      19.93   316     63       8,302       104.73       32       796
Total     23.05%  728    168     515,788

Superclass Generalization
MOC      %ret      Or     De     Pop-Sum    Mean        Min      Max
City     100.00    10     10     396,146    39,614.60      408   149,593
Town      93.00    95     88     110,096     1,251.10      400     5,141
U.P.        .6    316      2       1,412       706.00      616       796
Village   18.20   247     45      16,431       401.29      352       734
I.R.      38.98    59     23      12,350       536.95      382       996
Total     23.05%  728    168     536,435

Note: %ret = percent of data retained; Or = original data frequency by class; De = frequency of derived class.

7.3.2 Hydrography

Class Generalization

In the hydrography superclass, the generalization activities rely on the classification structure of 'rivers' and on their topological relationships with lakes. As mentioned in Chapter 6, river data are stored at the logical data structure level in the form of arc-node topology. The arcs form a one-to-many relationship with named rivers, as well as with the Horton and Strahler classifications.

In strategy one for mapping hydrography, rivers can be generalized at the class level using either the Horton or Strahler stream order classification. The Horton class level generalization maintains a drainage network more closely related to a conventionally derived product. This is a result of the classification, in which Horton's scheme gives a higher weighting to what appears to be the major river network.


As discussed in Chapter 3, the stream segment intersecting at a bifurcation at the lowest angle to the main stream is given the higher weighting. When the necessity factor and, if specified, an attenuation factor quantify how much of the drainage network should remain, the shortest streams of the lowest Horton order are eliminated. As a result, the major river bodies are maintained. For thematic mapping this process conforms to conventional cartographic requirements, in which there is a preservation of rivers with proper names. An interesting study by E.R. Mazur (1988) found a strong correlation between the Horton classification and river names.

When hydrography is generalized on the basis of class, one necessity factor is applied to rivers and another to lakes. In both cases, the user can specify the use of an attenuation factor to decrease the drainage density. (In the core module, the option of using an attenuation factor follows the quantification of the reduction resulting from the original necessity factor.)

In generalization at class level, lakes are selected on the basis of their area. The specification of river generalization on the basis of Horton or Strahler does not affect the results for lakes. The reverse, however, is not true; i.e. the selection of lakes does affect the selection of rivers in the case where an upstream/downstream selection is made of two river segments bisected by an eliminated lake. As described earlier, in this case the lake is replaced by a line which allows connectivity to be maintained at the topological level as well as the representation level.

The Strahler classification renders a different distribution pattern in the hydrographic drainage as a result of the difference in classification. For example, the following table provides results of the generalization process using Horton and Strahler and indicates the difference in arc frequencies.

Table 7.6: Results of Horton and Strahler Class Generalization with .8 Attenuation

               Horton                     Strahler
Stream Order   %Ret      Or     De        %Ret      Or     De
1               24.24    93     24         36.11   108     39
2              100.00    53     53        100.00    52     52
3              100.00    44     44        100.00    34     34
4              100.00    19     19        100.00    15     15
Total           67.00   209    140         67.00   209    140

The stream order ones retained using the Horton selection will be quite different from those retained using the Strahler classification.


Table 7.6, for example, shows hydrography generalized on the basis of class using the Strahler and Horton classifications with a .8 attenuation factor for a 1:12.5 million map. In comparing the two maps, stream tributaries in the 'Strahler' map are sometimes shorter than in the 'Horton' version. This is a result of fingertip channels in the Strahler version being identified as stream order one, whereas in Horton's version they may acquire a higher rating due to the nature of the classification.

Superclass Generalization

The superclass generalization in hydrography selects and eliminates rivers according to either Strahler or Horton, but takes a mean of the necessity factor together with lakes. This allows users to generalize at the superclass level rather than the class level. Although in this database there are only two class levels for hydrography, this concept becomes more meaningful with a more detailed dataset that may contain class sets such as bogs, fens, marshes, and swamps. Here the use of superclass generalization reduces the number of iterations required of the user, but also provides the 'superclass' generalization as determined by a necessity factor.

With both the Horton and Strahler classifications, regardless of superclass or class generalization, the selection and elimination process using the necessity factors, attenuation factors, and the classifications allows batch processing that can greatly enhance scale reduction from large to small scale representations. The Horton classification, although more subjective in its stream order assignments, renders results which can easily be used for thematic mapping, whereas the Strahler classification results in a representation which is more rigorous and consistent in its delineation of hydrographic networks. In both classifications, however, difficulties arise in the ordering of braided streams. As they receive stream order values of one less than the main channel, they are retained throughout the generalization process up until that stream order number is reached in the elimination process. Since the main channel has a stream order of 4 in both Horton and Strahler, and the braided streams consequently have orders of three, they are not eliminated and as a result give a representation unlike conventionally derived products. In these areas, the core program module could offer an option to the user for treating braided stream sections either according to the classification used or to generalize them on another variable. Figure 7.1 provides an illustration of braided stream ordering.


Figure 7.1: Strahler's and Horton's Stream Orderings


7.3.3 Landcover

Superclass Generalization

Landcover generalization is undertaken on the basis of selection and aggregation. The superclass generalization process functions on the entire landcover database as a whole and can operate on the basis of any attribute common to all classes and subclasses. For example, in the superclass generalization applied here, the database is first sorted according to polygon area, and the necessity factor is then applied to select, in the case of a 1:7.5 million map, 85% of the database. This eliminates the smallest polygons in the database regardless of the class or subclass intension. An example of the results is provided in Table 7.8.

Subclass Generalization

The subclasses for landcover data represent a specialization of the superclass and maintain distinct subclass intensions. Generalization at subclass level addresses individual subclass intensions and applies a necessity factor to each. Once applied, each subclass is sorted by area and the percentage of the database required for retention can be represented. This generalization process eliminates the smallest polygons from each subclass, which is not the same as eliminating the same percentage of polygons from the superclass. If a particular subclass comprises only large polygons, none of them would be eliminated in the superclass generalization, as they would all be above the elimination threshold, whereas the smallest of them would be eliminated in the subclass strategy.

Table 7.7 Results of Attribute Value within Subclass Generalization on Landcover Data:

Numbers of Retained Polygons by Scale

Scale            1:2m          1:7.5m        1:12.5m       1:30m
Attenuation    1.0    0.5    1.0    0.5    1.0    0.5    1.0    0.5
built-up         3      2      3      2      3      2      2      1
rangeland      141     71    120     60     99     50     78     39
crop           125     63    107     54     88     44     69     35
mixed forest   567    284    482    241    397    199    312    156
coniferous     223    112    190     95    157     79    123     62
deciduous      243    122    207    104    171     86    134     67
Totals:
built-up         3      2      3      2      3      2      2      1
agriculture    266    134    227    114    187     94    147     74
forest        1033    518    879    440    725    364    569    285
Landcover     1303    654   1109    556    915    460    718    360


Aggregation

In the aggregation process, aggregated objects are composed of elementary objects. A forestry object may consist of as many as three different classes of elementary objects, such as deciduous, coniferous, and mixed. Topologically, the elementary objects are stored as polygons with polygon topology and, on invoking an aggregation process, are topologically restructured to develop a new object with new topology and area values.

Table 7.8 Results of Superclass Generalization for Landcover Data:

Numbers of Retained Polygons by Scale

Scale            1:2.0m        1:7.5m        1:12.5m       1:30m
Attenuation    1.0    0.5    1.0    0.5    1.0    0.5    1.0    0.5
built-up         3      3      3      3      3      3      3      3
rangeland      141    111    132    108    120    103    113     90
crop           125     59    101     54     77     46     64     37
mixed          567    240    470    200    372    154    276    116
coniferous     223     96    180     75    142     63    108     50
deciduous      243    143    222    115    199     88    154     64
Totals:
built-up         3      3      3      3      3      3      3      3
agriculture    266    170    233    162    197    149    177    127
forest        1033    479    872    390    713    305    538    230
landcover     1302    652   1108    555    913    457    718    360

In the aggregation process the numbers of polygons are greatly reduced. Here, the subclasses are generalized to class levels along with spatial aggregations, new topology, and area and perimeter statistics.

Numbers of polygons following aggregation:

built-up        3
rangeland       0
cropland        0
mixed forest    0
coniferous      0
deciduous       0

Totals: built-up 3; agriculture 24; forest 35; landcover 62


7.4 Context Transformations in Relation to Data Analysis and Map Design

The combinations of the necessity factor, the attenuation factors, and the classification and aggregation hierarchies provide a considerable number of options for decreasing the density of a map or representation derived from the database. The necessity factor provides a guide for representation densities at particular scales; however, this can be manipulated with the attenuation factors to render continuous generalization. These options are addressed here according to user requirements of spatial and thematic data.

As requirements for generalization vary from application to application and from user to user, the philosophy applied in developing the context transformations was to provide as much flexibility to the user as possible. Thus, to make the approach attractive to users, the context transformation model affords flexibility within a controllable environment. With the addition of reduction factors, the results of the generalization are quantified.

The necessity factor is an important component of the context transformation strategies. In its basic formula it provides guidelines to reduce the density of objects appearing on a map according to the thematic requirement, the map object functionality, and the scale of representation. Once the necessity factor is calculated, the percentage of data that should remain on the map is selected according to attribute values.

Since flexibility was considered important as a means of accommodating a diversity of user requirements, this basic formula of the necessity factor needed to be extended to comply with the underlying philosophy of the context transformations. If we examine, then, how the flexibility has been achieved and in what ways it may be applied, the overall strengths of the approach become more meaningful.

At the outset, and as outlined in Chapter 4, it was stated that the first step in generalization is to determine what is important and what is not. This step conceptually relies on the perception of the user and the application for mapping. However, as user requirements could likely never be entirely quantified, two approaches are addressed in this research: user requirements for analysis and for map design. These two aspects can of course be broken down and examined in many ways. Nevertheless, the flexibility built into the context transformation model conforms in a number of ways to the needs of these divergent user requirements.

Users often require data from a database for analytical purposes to be represented at different scales.


In using a dataset for analysis, when representing it at a lower resolution, it is important that the generalization application is consistent and objective. Accordingly, when the necessity factor (NF) is applied as 'attribute value within class', for example, each class of data is reduced by the percentage specified by the necessity factor. Unlike superclass generalization and class generalization, attribute value within class generalization addresses each and every category of information that will appear at the derived scale. Each class is treated consistently according to the NF specification and the original survey results.

For the data extraction process based on attribute value within class, the superclass is ignored, as are the other classes in the superclass hierarchy. As the NF_ik is applied independently to each class, the overall classification hierarchy does not actually influence the generalization results. Thus, the objective in this strategy is to provide a user with an object class and subsequent attribute value manipulation to extract data. Neither the superclass nor the other object classes within it affect the results of the object class undergoing generalization. As such, this kind of approach is more applicable for generalization in an analytical context, as it provides a lower level of database access and manipulation.

Other aspects can be considered for user requirements in the context of map design or analytical processing. For example, in map design, using the populated places dataset, class generalization is a powerful tool. Generally, in portraying populated places, they are plotted in thematic mapping by their status as a city or town. In class generalization, because the class intensions are first ordered, a selection is made of the most frequently portrayed categories of populated places. This is followed by an extraction process that can be invoked on any attribute value. The results show the major categories of places, and the subsequent selection is derived by some attribute, in this case population. The strength of the superclass generalization strategy, on the other hand, lies in its position between the more analytical approach used in attribute value within class and the more cartographic approach used in class generalization. This is so because the superclass generalization ignores the class intensions, takes the mean NF for the superclass, and then extracts the relevant percentage according to an attribute value specified by the user. In this case the application would be well suited to deriving thematic maps, since a representative sample of a superclass attribute can be obtained. The attribute value domain must, however, be at the superclass level, whereas the preceding two strategies (i.e. class and attribute value within class) can operate at either superclass or class level.


The hydrography data also provide examples of mapping within different contexts. In the case of the Horton classification, the main river channel is defined and is preserved throughout the generalization process while the secondary tributaries drop off. The delineation of the hydrography, given the nature of the classification, thus correlates closely with named rivers [see Mazur, 1988]. As such, the Horton classification is more applicable to map design than to analysis: since determining the main river channel can be considered subjective, consistency in the database may be compromised, and automating the classification coding may be problematic.

The Strahler classification provides an example of using the context transformations for analytical processing. This classification is very systematic and is not subjective. As discussed earlier, the Strahler classification presents a natural topology and, as it is rigorous in stream order assignments, conforms to the requirements of an objective and consistent approach to data analysis.

In assessing the landcover data according to user requirements, three strategies have likewise been applied: generalization at the superclass level, generalization at the subclass level, and aggregation. Superclass generalization is an appropriate solution for removing small polygons for a smaller scale representation. The original landcover dataset for Canada is shown in Figure 7.2, and the superclass generalization for southern Saskatchewan is shown in Figures 7.3 and 7.4. In both of the latter two figures the reduction in small polygons is apparent. If the superclass generalization were applied to the map shown in Figure 7.2, the small pixel polygons apparent in the coverage would be eliminated, leaving a representation suitable for thematic mapping purposes. If the aggregation strategy were applied, the map shown in Figure 7.2 would be reduced from 11 landcover zones to 5, yielding a more highly generalized map of landcover data. This is illustrated in Figure 7.5, which shows the aggregation results. Finally, Figure 7.6 shows the original digital data for the test area.


Figure 7.2 Original Landcover Dataset for Canada


Figure 7.3 Superclass Generalization on Landcover Data at 1:7.5 million

Reduced 80%; effective scale: 1:9.4 million

Figure 7.4 Superclass Generalization on Landcover Data at 1:30 million

Reduced 80%; effective scale: 1:37.5 million


Figure 7.5 Aggregation on Landcover Data at 1:12.5 million

Reduced 80%; effective scale: 1:15.6 million


Figure 7.6 Original Digital Data for the Test Area


7.5 Summary

The assessment of the context transformations has been examined according to the support provided through the data model, the classification and aggregation hierarchies, and the contexts within which the context transformation models can be applied. The diversity expected in user requirements of spatial data in a GIS environment, and in the field of generalization as a whole, was instrumental in designing the context transformation model to provide users with a flexible environment within which to work. Accordingly, the model can be manipulated to provide results oriented towards thematic mapping and design, or towards using data for analytical purposes. Thus the generalization can be responsive to the diversity anticipated in a user environment.

The attenuation factors allow users greater flexibility in generalizing the data. These factors have been implemented in a systems environment as a tool for decreasing density; however, amplification factors could equally be applied to increase density should a user require this. Finally, the reduction factors provide users with a quantification of the results and thus allow users to determine whether they should apply the attenuation factors or reselect their generalization options.


CHAPTER 8

Conclusions

8.1 General Conclusions

In Chapter 2, we saw that there have been three epochs of development in digital generalization. The first epoch concentrated on graphic representation, largely in the form of linear feature simplification, while the second epoch addressed the quantification of changes resulting from the application of algorithms developed in the first epoch. The third epoch is now in the process of addressing a more comprehensive approach to generalization.

Throughout the first epoch of digital generalization, many solutions for linear feature simplification and smoothing were developed and implemented in a systems environment, followed by the second epoch, in which extensive testing of the results of the first epoch was undertaken. The third epoch in generalization, however, has proven to be much more complex to implement, and little has as yet been achieved [Joäo and Rhind, 1990]. In general, the comprehensive approaches now being investigated work on the premise that a knowledge base is required, often based on rules derived from the conventional environment. Rules in the Brassel and Weibel model, for example, address structure and process recognition. Such rules would frequently be difficult to define, and in turn their implementation may not be realized [Brassel and Weibel, 1988]. Generalization models that are primarily dependent on an existing rule set present limitations in that they are generally oriented to a specific user group. Even while concentrating on a specific group, many of the rules may be too static to meet a range of user needs.

The acquisition of rules for systems implementation has been slow to evolve. Many 'rules' are not documented, and according to a survey undertaken at the Birkbeck College Department of Geography, a review of eleven generalization prototypes indicated that only one was built using combinations of rules acquired from literature, experts, and surveys [Richardson, 1988], whilst the majority of the other prototypes were developed based on rules from textbooks. This may imply that few new rules are actually being sought and/or developed, which may impede a comprehensive generalization model from being developed.

One cannot help but wonder whether, given the dearth of rules for generalization, and the tremendous range of variations in generalization needs that accrue as a result of differences in user requirements, applications, and perceptions, the notion of a prototype for generalization being initially exclusively rule-based is perhaps unrealistic.

Consequently, rather than adopting a purely rule-based approach, the development in this research concentrates on using steering parameters, acquired through literature, experts, and surveys, to determine the amount of data that should appear at reduced scales of representation, applied in such a way that as much flexibility as possible remains available. The use of steering parameters, which can be considered as a decision matrix, when used with classification and aggregation hierarchies, and adjusted, if necessary, with the application of attenuation factors (and amplification factors), provides a great deal of flexibility which can serve a wide variety of users. As such, the user group does not need to be pre-classified, nor do the generalization processes need to be pre-linked to a particular user group.

Thus, the philosophy behind the design of the steering parameters is to provide a reliable environment within which users can decrease map density. With the anticipation that users' requirements will vary with scale, application, perception, and the like, however, the steering parameters are provided within a very flexible environment to accommodate the anticipated diversity in user needs. This is achieved in a variety of ways. First of all, once the steering parameters provided by the necessity factor have determined the amount of data that is to appear at the derived scale, they can be satisfied using a wide range of attribute domain information. For example, if a hydrography dataset is selected for a lower resolution representation, the necessity factor criteria can be met using different stream order classifications and length statistics, or criteria such as river discharge, or any other variable maintained in a hydrography superclass. Thus the attribute domain environment can produce considerable differences in object distribution patterns.

Flexibility is further enhanced through the provision of the classification and aggregation hierarchies. When the necessity factor is combined with the classification and aggregation hierarchies and the attenuation factors, a map consisting of landcover, hydrography, and populated places can be represented in a minimum of 90 different ways for one scale of representation, since (f) is a single-decimal value used for experimentation. This extensive range in representation possibilities results from combining the attenuation factors, which can be assigned to the necessity factor and in this case range over 0 < f < 1, with the strategies for generalization. Hence, the three strategies for landcover, namely superclass, subclass, and aggregation generalization, result in 3 * 9 possibilities for representation. The same holds true for the three strategies for populated places, i.e. the subclass, class, and superclass processes result in 3 * 9, while hydrography, which has four strategies, provides 4 * 9 representations. If other attributes within the attribute domain are available, the representation possibilities increase even further.
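The arithmetic behind the figure of 90 can be made explicit in a small sketch; the nine attenuation values follow from f being a single-decimal value in 0 < f < 1:

    attenuation_values = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 .. 0.9

    strategies = {
        "landcover": 3,         # superclass, subclass, aggregation
        "populated places": 3,  # subclass, class, superclass
        "hydrography": 4,
    }

    per_theme = {t: n * len(attenuation_values) for t, n in strategies.items()}
    print(per_theme)                # {'landcover': 27, 'populated places': 27, 'hydrography': 36}
    print(sum(per_theme.values()))  # 90 options in total across the three themes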

The architecture of the context transformations allows objects to be selected and represented at any smaller scale, with the required object reclassifications and aggregations, such as in the landcover data, and object replacements, such as in the hydrography data. The combination of the necessity factor with attenuation factors, and their application within classification and aggregation hierarchies, provides considerable flexibility in the types of object distributions appearing on the resulting maps. The processes allow object densities to be reduced to levels that remain constant at a derived scale, yet within the object population the distribution can be altered according to a user's specifications. As well as altering the distribution patterns, the user can automatically decrease density by requesting an attenuation factor, and if an amplification factor were included, densities could automatically be increased according to a user's needs.

The inclusion of reduction factors allows users to assess the generalization results. Reduction figures, which quantify in a straightforward manner the amount of reduction to the database, are provided at the entity, object, subclass, class, and superclass levels. For instance, in the case of hydrography, the reductions in the numbers of streams of particular stream order values are given, the reductions in the numbers of rivers are given, and finally the total reductions in the hydrography database are provided for the derived scale of representation. With the combination of the data structures and the generalization processes and reduction factor calculations that are invoked on them, graphic reductions, such as coordinate reductions resulting from, say, simplification algorithms, would be simple to derive at the entity, object, class, and superclass levels. Nor would it be problematic to extend the reduction factors to include displacement values for the same levels.
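A sketch of such reduction-factor reporting, assuming a simple list-of-dicts representation rather than the prototype's Oracle/Arc/Info structures; the entity and object levels would be computed analogously:

    def reduction_factors(original, retained, level_key):
        # Percent reduction per value of `level_key` (e.g. 'class'), plus a total.
        def counts(objs):
            out = {}
            for o in objs:
                out[o[level_key]] = out.get(o[level_key], 0) + 1
            return out
        before, after = counts(original), counts(retained)
        report = {k: 100.0 * (1 - after.get(k, 0) / n) for k, n in before.items()}
        report["TOTAL"] = 100.0 * (1 - len(retained) / len(original))
        return report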

The reduction factors, in their present format, allow a user to assess the results of the necessity factor calculations. The core program module for the necessity factor offers users options for the type of generalization they wish to undertake, such as subclass, class, superclass, or aggregations. Once an approach has been selected, the necessity factor automatically reduces the density according to the criteria specified by the user. Once this has been invoked, the system calculates the amount of reduction and provides it to the user in the reduction factor format, which, as mentioned above, is calculated at the entity, object, class, and superclass levels. If the reductions are insufficient, the user is offered the use of an attenuation factor to further increase the reductions. With testing of the prototype environment by a diverse user group, sets of reduction factors would conceivably evolve for particular types of user needs. This process could be modified to incorporate a learning loop that would correlate user needs with the type of generalization specified and the desired level of reduction. Thus, fine tuning of the necessity factors could be achieved. Additionally, in this way the scarcity of rules for generalization could be somewhat compensated for.
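The interactive sequence just described can be summarized in a short sketch; treating the attenuation factor as a multiplier on the necessity factor is an assumed formulation, and strategy stands for any of the extraction functions sketched earlier:

    def generalize(dataset, strategy, nf, attenuation=1.0):
        # strategy: a function (dataset, effective_nf) -> retained objects.
        # Scaling NF by the attenuation factor is an assumed formulation.
        retained = strategy(dataset, nf * attenuation)
        reduction = 100.0 * (1 - len(retained) / len(dataset))
        return retained, reduction

    # A user inspects the reported reduction and, if it is insufficient,
    # re-runs the call with an attenuation factor, e.g. attenuation=0.8.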

Discussions in the literature on generalization in the context of data structures are remarkably absent. This may be a result of the amount of time involved in classifying, structuring, and coding information for input into a systems environment. Special care needs to be taken in creating logical and pragmatic elementary, object, subclass, class, and superclass levels of spatial and thematic data definitions, along with the required topological structures. The generalization processes developed in the context transformations are dependent on sound data definitions and structures, and certainly require an initial investment in designing and building the database. Although initially time consuming, however, this yields an enormous gain in long term benefits. In large mapping organizations and remote sensing agencies, for instance, database collection and maintenance is very resource intensive, particularly in cases where essentially redundant databases are being collected and stored at different scales. This frequently occurs due to the absence of effective generalization processes. Although the context transformations in no way solve all problems in generalization, with sound data definitions and database structuring, rapid successes can be achieved in generating lower resolution representations. Ultimately, the processes explored in this research should be linked with the algorithms which were developed during the first epoch of generalization research and which concentrate on graphical representation, to provide a reasonably comprehensive approach to generalization. The context transformations, linked with existing algorithms for simplification, displacement, name placement, and symbolization, would then, in addition to addressing map content, address representation issues as well.


8.2 Specific Conclusions

The processes used for decreasing the density of populated places provide an effective tool that not only allows density to be decreased according to the superclass, class, or attribute domain, but also provides a means of controlling the density reduction through the attenuation factors. These options for generalization provide different distributions of data according to a user's needs. When, for example, selection is specified using the attribute domain, the technique may be more applicable in an analytical context than class selection, as it provides a lower level of database access. At the class level, however, this approach becomes quite useful for thematic mapping purposes, as it specifies a class set as the priority in representation and then determines which objects in those particular classes will be shown. This process is particularly useful when the class selection performs a support role for another dataset. The superclass process renders yet another distribution, which can be particularly relevant in thematic mapping when the superclass data provides more contextual focus for the user than in the class process.

The landcover data provides the same types of facilities. Hence, the reduction in the density of polygons is achieved through superclass, class, or subclass processing, as well as through aggregation. In using the classification hierarchy, different processes are invoked on the landcover superclass, class, or subclass data, which render different distributions. When a superclass process is applied, the necessity factor % of the smallest polygons within the entire dataset are eliminated and the remaining data are topologically restructured, whereas if a class process is invoked, the necessity factor % of the smallest polygons from each class are removed. If a class only has relatively large polygons within it, the necessity factor % specified will still result in some of them being eliminated. The aggregations involve a different process that results in a higher level of classification being represented. Thus coniferous, deciduous, and mixed forest come to be shown as a new class described as forest. Each process therefore renders different distributions and, as discussed in Chapter 7, can be used for any number of applications in thematic mapping and/or analytical manipulation. Again, the attenuation factors provide a means of controlling the extent of the generalization.
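The two elimination variants and the aggregation remapping can be sketched as follows; the structures are illustrative only, and the topological restructuring that follows elimination is not shown:

    def eliminate_smallest(polygons, nf, per_class=False):
        # Remove the NF fraction of smallest polygons, either over the whole
        # superclass (per_class=False) or within each class (per_class=True).
        if not per_class:
            ranked = sorted(polygons, key=lambda p: p["area"])
            return ranked[round(nf * len(ranked)):]
        retained = []
        for c in {p["class"] for p in polygons}:
            members = sorted((p for p in polygons if p["class"] == c),
                             key=lambda p: p["area"])
            retained.extend(members[round(nf * len(members)):])
        return retained

    AGGREGATE = {"coniferous": "forest", "deciduous": "forest", "mixed": "forest"}

    def aggregate(polygons):
        # Remap subclasses to a higher-level class; neighbouring polygons of
        # the same new class would then be merged when topology is rebuilt.
        return [{**p, "class": AGGREGATE.get(p["class"], p["class"])}
                for p in polygons]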

Finally, in the hydrography dataset, generalization has been implemented and achieved with analytical orientations in mind as well as map design. Generalization on the basis of the Strahler classification offers an environment that allows generalization in a consistent manner and has useful applications for analysis. The Horton classification results in generalization that is more applicable for thematic mapping purposes. Although the original data classifications are contextually dependent, the process for context transformations, using the classification and aggregation hierarchies along with the data structures, allows automatic generalization and maintains connectivity and contiguity, no matter how simplified the representation becomes. The context transformations developed and implemented in this research provide a powerful tool for decreasing data density for any smaller scale of representation. They provide a great deal of flexibility in how a user may wish to achieve this, and allow users to assess their results both quantitatively and, of course, visually. The various approaches used in the context transformation model are pragmatic and consistent and can achieve results suitable for analysis and/or map design, given the original scale used for data input. These processes immensely reduce the need to store databases at different scales of representation.

Context transformations of this nature are based on classification and aggregation hierarchies that include entity, object, subclass, class, and superclass levels of data. An important part of the process is the topological relationships maintained for spatial data, particularly for hydrographic data. Implementing these processes in many systems environments is possible, provided those environments can maintain hierarchical thematic structures and topology. For the thematic data, relational databases provide a powerful tool, and thus environments such as Oracle and Ingres are likely candidates. Although this prototype has been implemented in a relational environment using Oracle and, for graphics, Arc/Info, it should be possible to implement it in Intergraph and Geovision, for example, along with a relational database environment for the attribute domains.

The development of a prototype implies that experimentation is necessary before a final product is available. For the processes described in this research, experimentation by a user group may indicate that the flexibility provided in the models is more than immediately necessary. Since three superclasses of data allow a minimum of 90 different generalization solutions, there would undoubtedly be some solutions that users would require less than others. Experimentation may also resolve some outstanding questions. For example, the availability of context transformation processes in a systems environment leaves open questions such as: at what level does a user query the database? Is it at the derived level of representation, or does a user wish to move up in detail and query a more detailed level? These aspects, and others, can however only be addressed through experimentation.

8.3 Future Research

If a comprehensive approach to generalization is to be realized, the development of a context transformation model provides a significant contribution to resolving representation density issues. As pointed out by Brassel and Weibel, one must first determine what is to be mapped. With context transformations, users can determine, given a particular application, what should be mapped and in what kinds of distributions. As the model provides considerable flexibility, fine tuning of the options could be developed using a feedback or learning loop. Once the user has specified the type of generalization required and the selection, if any, of an attenuation factor, this information could be processed into a set of meta rules that would provide future users with recommendations on which parameters to use for particular applications. Meta rules of this nature would amount to a mapping advisor, recommending which generalization processes are better suited to different types of applications.

Much of the work presently being undertaken in generalization attempts to solve problems by addressing the representation level; object conflict resolution is a good example. More emphasis, however, should be placed on manipulating and decreasing database content in logical ways, thereby solving some of the object conflict problems at the source. To this end, issues such as the data model environment and topological data structures should be exploited. Additionally, nearly all existing models in generalization examine processes undertaken in a vector based environment. Raster processing, however, can be extremely useful in identifying object proximities, which in turn is useful for resolving object conflicts, say for displacement purposes, or for object amalgamations, such as the amalgamation of three small, closely located lakes into one lake. With the availability of easy raster/vector conversions and vice versa, generalization developers should explore and develop methodologies suitable to the two environments so that both GIS models can be utilized in a fully integrated and complementary format.

Finally, the generalization of information implies that only a subset of the original data available from a systems environment may be represented. This subset may take the form of a geometric subset, which would result from the application of simplification algorithms, or a thematic and geometric subset, which would result from selections and eliminations, reclassifications, and simplifications. Other effects on the database include spatial alterations such as those from displacement. Each of the generalization manipulations should have an associated certainty factor which could serve as a measure of the amount of generalization that is to be applied. This is the concept used for the reduction factors which, although not certainty factors, nevertheless provide a functional statement about the amount of data remaining at the representation level. The use of certainty factors with generalization would be a very useful approach not only in this context but in other types of manipulation as well. Many of the algorithms developed during the second epoch of generalization, which were used for testing the results of the first epoch, should now be examined in this context.


APPENDICES


Appendix 3.1

Data Model Definitions

A number of the terms used in Section 3.4.1 - Data Model Definitions, are based in part on works by S. Atre; R.A. Edmunds; S.C. Guptill; H. Moellering; M. Molenaar; S.C. Shapiro; D.F. Stubbs and N.W. Webre; D. Tsichritzis; and P.J. Thompson.

There does not appear to be a standard set of terminology in use across academia, government, and industry. For example, IBM's Information Management System (IMS) describes data as 'fields', which are combined into 'segments', which are combined into 'data bases'. The Data Base Task Group (DBTG) of the Conference on Data Systems Languages (CODASYL) describes data as 'data items', which are combined into 'data aggregates', which are combined into 'records', the relationships between which are expressed as 'sets'.

In addition to inconsistencies in terminology directly related to the computer environment, in GIS the problem is compounded by the absence of any distinction in the literature between a 'real world' phenomenon and its digital translation. For example, the word feature is used both for a real world phenomenon and for its digital expression, as well as for any number of parts that make up its digital expression. The same applies to terms such as object and entity.

In using terminology in the GIS field, a clear distinction should be made between the real phenomenon and its digital counterpart. In this context, the computer terminology relevant to GIS should be clarified and standardized in a compendium work such as an Encyclopedia of Geographic Information Systems or Geomatics. Excellent examples of similar work in other fields are the Prentice-Hall Encyclopedia of Information Technology edited by R.A. Edmunds, and the Encyclopedia of Artificial Intelligence edited by S.C. Shapiro.


Appendix 4.1

Overview of Survey Undertaken to Develop the Necessity Factor

To determine the requirements of map objects at various scales for some 44 different subject realms, a survey was undertaken that consisted of three components: (1) a series of interviews with staff from the Canada Centre for Mapping was conducted to assess needs for the generalization of maps and processing requirements for production and external user needs; (2) a review of 110 maps was done to determine the requirements of map objects for each of the 44 subject realms at four different scales; and (3) using the same 110 maps, an assessment was done of the functionality of map objects at different scales.

The interviews supported the need for scale flexible mapping, and it was recommended that data capture be at 1:2 million and that smaller scale representations of thematic data be derived automatically from that scale. The interviews also supported the fact that certain map objects provide a geographic context for thematic map objects. For example, map objects such as rivers and lakes support and give more meaning to thematic objects like climatic zones or drainage basins.

The map survey undertaken to determine map object requirements and functionality was achieved by a review of thematic maps ranging in scale from 1:1 million to 1:30 million. Matrices for each scale were used to assess the requirements of objects appearing on the map in relation to 44 different subject realms, and another four matrices were used to assess the functionality of objects to appear on maps at the four different scales. These matrices were used by staff at the Canada Centre for Mapping and, for information, an example is provided below.

Subject Realm Requirement - 1:2 million

Map Object Class          1   2   3   4   5   6
City                      •   *   •   •   *
Town                      +   +   •   •   *   •
Village                   -   *   *   -   *
Unincorporated Place      -   -   +   +   -   +
Non-Unincorp.             -   +
Indian Reserve            -   +   *   *   +
Rivers                    •   •   •   •   •   •
Lakes                     •   •   •   •   •   •

Key: 1 = Geophysics, 2 = Geology, 3 = Geomorphology, 4 = Climatology, 5 = Hydrology, 6 = Landcover
• = Essential, * = Desirable, + = Questionable, - = Unnecessary


Map Object Functionality - 1:2 million

Map Object Class          1   2
City                      •   •
Town                      •   •
Village                   *   •
Unincorporated Place      *   *
Non-Unincorp.             +   +
Indian Reserve            *   •
Rivers                    *   •
Lakes                     +   •

Key: 1 = Orientation, 2 = Location, 3 = Enumeration, 4 = Measurement, 5 = Description

Each of the map objects was rated according to its importance for each of the subject realms, at four different scales. The ratings, defined as essential, desirable, questionable, and unnecessary, were assigned values of 100%, 75%, 25%, and 0% respectively. In the case of Map Object Functionality, the ratings were assigned as follows:

Rating            1     2     3     4     5
Essential         80    90    90    90    100
Desirable         65    70    70    70    75
Questionable      15    20    20    20    25
Unnecessary       0     0     0     0     0

The percentage values shown for the 'requirements' can be used to calculate the degree of necessity for the map objects to appear on the map. In the case of the percentage values shown for the 'functionality', a mean is taken of the five categories of MOF per object for each of the four scales, i.e. 1:2 million, 1:7.5 million, 1:12.5 million, and 1:30 million. Subsequently, the two values are averaged to arrive at a 'necessity factor'. The mathematical expression for the necessity factor is provided in Chapter 4. Further details are available in the literature; see, for example, Richardson (1988, 1989) and Richardson and Muller (1990).
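A minimal sketch of this calculation, with illustrative rating values (the exact expression is the one given in Chapter 4):

    REQUIREMENT = {"essential": 100, "desirable": 75, "questionable": 25,
                   "unnecessary": 0}

    def necessity_factor(requirement, mof_ratings):
        # Average the requirement rating with the mean of the five MOF ratings.
        mof_mean = sum(mof_ratings) / len(mof_ratings)
        return (REQUIREMENT[requirement] + mof_mean) / 2.0

    # Illustrative values for an object rated essential on all five functions:
    print(necessity_factor("essential", [80, 90, 90, 90, 100]))  # 95.0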


Appendix 4.2

Definitions of the Five Functionalities of the MOF

The Map Object Functionality is used to determine why and for what purpose an element appears on a map at a given scale. It provides a functional reason for an object to appear or not and in so doing distinguishes the process from what could otherwise be considered more subjective.

Each of the map objects has been rated according to the function it allows a map reader to perform in using the map. In this case, the four ratings used, essential, desirable, questionable, and unnecessary, imply the degree of necessity for having the particular object on the map so that users may carry out different types of activities. Five types of activity or function have been used and are defined below. It should be noted that these definitions pertain to the four scales used in this study, which are considered small scales. The definitions would change with larger scales such as 1:25,000 or 1:5,000.

Orientation: This means telling where one is in an overall sense. Certain map objects such as populated places and lakes are especially useful as an aid to orientation, particularly when the map is not portrayed in the conventional way with north at the top of the sheet or screen.

Location: This allows the user to indicate more precisely where one is and to give location relative to other features and places. For example, if a road transportation network were plotted and populated places such as cities and towns were not, it would be difficult to determine one's location relative to a city or town.

Enumeration: Users often require information on the total number of lakes within certain drainage basin areas, or the number of towns or Indian reserves within a particular region. This is an established requirement from the general public, as reported by members of the Canada Centre for Mapping.

Measurement: Ratings here are based on whichever type of measure is appropriate to the particular map object class (e.g., lengths for linear features, and areas, circumference, or volume for polygons).

Description: This function provides the user with the possibility of understanding and describing phenomena appearing on the map by means of reference to map objects. For example, the delineation of isolines showing temperature can be understood and described by their relationship to glaciers or large bodies of water. Generally speaking, glaciers, especially during warmer months, cause the temperature to drop and therefore the delineation of a temperature isoline to shift to the north, whereas with a large body of water, particularly during the winter months, the isoline shifts to the south. (Glaciers, although evaluated in the survey, were not used in this research as they were not present in the study test area of southern Saskatchewan.)

The ratings provided in Appendix 4.1 for the 1:2 million map, and for the extensive list (see Richardson, 1988), were established based on a best case (i.e. a situation where the particular base map object is needed on a map). Some objects occur in only a few areas on the maps reviewed, such as Indian Reserves. Consequently, rating scores are based on areas where these objects do, in fact, occur.


Appendix 4.3

Stream Order Topology

Horton's Stream Order Classification

1. Rules of Ordering: In the system devised by Horton ...unbranched fingertip tributaries are always designated as of order 1, tributaries or streams of the 2nd order receive branches or tributaries of the 1st order, but these only; a 3rd order stream must receive one or more tributaries of the 2nd order but may also receive 1st order tributaries. A 4th order stream receives branches of the 3rd and usually also of lower orders, and so on. Using this system the order of the main channel is the highest.

To determine which is the parent and which the tributary stream upstream from the last bifurcation, the following rules may be used: (1) starting below the junction, extend the parent stream upstream from the bifurcation in the same direction. The stream joining the parent stream at greatest angle is of the lower order. Exceptions may occur where geologic controls have affected the stream courses; (2) If both streams are at about the same angle to the parent stream at the junction, the shorter is usually taken as of the lower order. (Horton 1945, p. 281).

Strahler's Stream Order Classification

1. Rules of Ordering: The Strahler method of ordering is defined as follows: ...each fingertip channel is designated as a segment of the first order. At the junction of any two first-order segments, a channel of the second order is produced and extends down to the point where it joins another second-order channel, whereupon a segment of the third order results, and so forth. However, should a segment of the first order join a second or third-order segment, no increase in order occurs at that point in junction. The trunk stream of any watershed bears the highest order number of the entire system. (Strahler, 1964, p. 546).

Evaluation of Methods

If the only consideration in choosing the ordering method were the technical criteria, the more suitable of these two classifications would be Strahler's. Its rules for determining hierarchy are straightforward, and hence the classification can be accomplished easily by a series of simple steps once the river network is digitized.

The only element adding to its complexity is the implication of two rules: 1) when two streams of the same order join, the resultant link downstream becomes one order higher; and 2) when two streams of different order join, the order of the stream downstream is equal to that of the tributary of the higher order. Nevertheless, Strahler's classification can be easily automated. Once the coordinate data input has been completed, the subsequent stream order coding can be resolved algorithmically.
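A minimal sketch of such an algorithmic resolution, assuming the network is available as a tree of upstream tributaries; the prototype itself works on the topological structures of the database rather than this toy structure:

    def strahler(node):
        # node: {'children': [subtrees]}; stores and returns the stream order.
        if not node["children"]:            # fingertip channel: order 1
            node["order"] = 1
            return 1
        orders = [strahler(child) for child in node["children"]]
        top = max(orders)
        # Two tributaries of the same highest order raise the order by one;
        # otherwise the downstream segment keeps the highest tributary order.
        node["order"] = top + 1 if orders.count(top) >= 2 else top
        return node["order"]

    # Two first-order channels meeting produce a second-order trunk stream.
    net = {"children": [{"children": []}, {"children": []}]}
    print(strahler(net))  # 2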

On the other hand, cartographic requirements are to an extent more strongly supported by Horton's classification, in that it conforms to the cartographic requirements of preserving proper names and of aggregating streams into distinct river entities, each with one proper name.

The technical considerations in which Horton's method falls short of Strahler's do not pose significant challenges in creating or manipulating the digital database. Strahler's is a simpler method than Horton's, but the level of complexity of the two methods is comparable. In the case of the Strahler method, the database can be processed using algorithms to generate the stream orders. In Horton's approach, the stream ordering must be defined a priori of input and entered from the keyboard.


BIBLIOGRAPHY

Atre, S. 1980. Data Base: Structured Techniques for Design, Performance, and Management. John Wiley & Sons, Inc., New York.

Brachman, R.J. 1983. What IS-A is and isn't: An Analysis of Taxonomic Links in Semantic Networks. IEEE Computer, 16(10), pp 30-36, October 1983.

Brassel, K.E. and R. Weibel. 1988. A Review and Conceptual Framework of Automated Map Generalization. International Journal of Geographical Information Systems, 2(3), pp 229-244.

Busacker, R.G. and T.L. Saaty. 1965. Finite Graphs and Networks. McGraw-Hill, New York.

Buttenfield, B.P. 1991. A Rule for Describing Line Feature Geometry, in Map Generalization: Making Rules for Knowledge Representation. B. Buttenfield and R. McMaster, editors, Longman House Essex, United Kingdom.

Canada, Energy, Mines and Resources Canada. 1991. Internal Document. NAIS Digital Requirements, Ottawa, Canada.

Curry, L. 1962. Climatic Change as a Random Series. Annals, Association of American Geographers, 52(1), pp 21-31.

Deveau, T.J. 1985. Reducing the Number of Points in a Plane Curve Representation. Proceedings, Auto Carto 7, Washington, D.C. March 11-14, pp 153-160.

Douglas, D., and T. Peucker. 1973. Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Canadian Cartographer, 10(2) pp 112-122.

Doyle, J. 1983. Methodological Simplicity in Expert System Construction: The Case of Judgements and Reasoned Assumptions. AI Magazine, 4(2), pp 39-43 (Summer, 1983).


Dutton, G. 1981. Fractal Enhancement of Cartographic Line Detail. The American Cartographer, 8(1), pp 23-40.

Edmunds, R. A. 1987. The Prentice-Hall Encyclopedia of Information Technology. Prentice-Hall, Inc, Englewood Cliffs, N.J.

Environmental Systems Research Institute, Inc. 1990. Users Guides. Redlands, CA, U.S.A.

Gane, C. and T. Sarson. 1982. Structured Systems Analysis: Tools & Techniques. McDonnel Douglas Corporation, Woking, Surrey, England.

Greenberg, M. J. 1973. Euclidean and Non-Euclidean Geometries. W.H. Freeman and Company, San Francisco.

Guptill, S.C., R.G. Fegeas, and M.A. Domaratz. 1988. Designing an Enhanced Digital Line Graph. Technical Papers, ACSM-ASPRS Annual Convention, St. Louis, Missouri, pp 252-261.

Guptill, S.C. 1990. Multiple Representations of Geographic Entities Through Space and Time. Proceedings of the 4th International Symposium on Spatial Data Handling, Vol. 2, Zurich, Switzerland, pp 859-868.

Guralnik, D.B. (ed). 1970. Webster's New World Dictionary. Nelson, Foster & Scott Ltd., Toronto, Canada.

Herbert, G., and E.M. Joäo. 1991. Automating Map Design and Generalisation: A Review of Systems and Prospects for Future Progress in the 1990's. SERRL Working Report Number 17, South East Regional Research Laboratory, Birkbeck College, London, United Kingdom.

Horton, R.E. 1945. Erosional Development of Streams and their Drainage Basins. Bulletin of Geological Society of America, Vol. 56, pp 275-370.

Jacqz, C. 1991. Developing a Cost Effective Hydrological Network Coverage. Proceedings of the Eleventh Annual ESRI User Conference, Palm Springs, California, pp 211-220.


Jenks, G.F. 1981. Lines, Computers and Human Frailties. Annals of the Association of American Geographers, 71(1), pp 1-10.

Joäo, E., and G. Herbert and D. Rhind. 1990. The Measurement and Control of Generalisation Effects. SERRL Working Report Number 19. South East Regional Research Laboratory, Birkbeck College, London, United Kingdom.

Kemp, Z. 1990. An Object-Oriented Model for Spatial Data. Proceedings of the 4th International Symposium on Spatial Data Handling, Vol. 2, Zurich, Switzerland, pp 659-668.

Kilpeläinen, T. 1992. Multiple Representations and Knowledge-Based Generalization of Topographical Data. Proceedings, ISPRS Conference XVII, August 2-14, 1992, Washington, D.C.

Leberl, F.L., Olson, D., and Lichtner, W. 1985. ASTRA - A System for Automated Scale Transition. Technical Papers, 51st Annual Meeting ASP, Washington, D.C., Vol. 1, pp 1-13.

Maling, D.H. 1989. Measurements from Maps, Principles and Methods of Cartometry. Pergamon Press, Oxford, England.

Mandelbrot, B.B. 1977. Fractals: Form, Chance, and Dimension. W.H. Freeman and Co. San Francisco.

Mandelbrot, B.B. 1982. The Fractal Geometry of Nature. W.H. Freeman and Co., San Francisco.

Mazur, E.R. 1988. Objective Generalization of River Networks in Small Scale Reference Maps: Using Horton's Ordering Scheme and Töpfer's Radical Law. M.A. Thesis, Department of Geography, Queen's University, Kingston, Canada.

McMaster, R.B. 1986. A Statistical Analysis of Mathematical Measures for Linear Simplification. The American Cartographer 13(2) pp 103-16.

McMaster, R.B. 1987. Automated Line Generalization. Cartographica 24(2) pp 74-111.


McMaster, R.B. 1991. Conceptual Frameworks for Geographical Knowledge, in Map Generalization: Making Rules for Knowledge Representation. B. Buttenfield and R. McMaster, editors, Longman House Essex, United Kingdom.

Meyer, U. 1986. Software Developments for Computer Assisted Generalisation. Proceedings, Auto Carto London, Vol. 2, pp 247-256.

Meyer, U. 1987. Computer Assisted Generalization of Buildings for Digital Landscape Models by Classification Methods. Proceedings ICA Conference, Morelia, Mexico, pp 223-232.

Molenaar, M. 1989. Single Valued Vector Maps: A Concept in Geo-Information Systems. Geo-Informationssysteme, 2(1), pp 18-26.

Molenaar, M. 1990. A Formal Data Structure for Three Dimensional Vector Maps. Proceedings of the 4th International Symposium on Spatial Data Handling, Vol. 2, Zurich, Switzerland, pp 830-843.

Molenaar, M. 1991. Object Hierarchies and Uncertainty in GIS or Why is Standardisation so Difficult, in Kadaster in Perspectief. J.H.M. Kockelkoren et al, editors. Kadaster, Apeldoorn, The Netherlands.

Molenaar, M., and L.L.F. Janssen. 1992. Integrated Processing of Remotely Sensed and Geographic Data for Land Inventory Purposes. Proceedings, ISPRS Conference XVII, August 2-14, 1992, Washington, D.C.

Morrison, J.L. 1974. A Theoretical Framework for Cartographic Generalization with Emphasis on the Process of Symbolization. International Yearbook of Cartography. Vol. 14 pp 115-27.

Muehrcke, P. 1978. Map Use, Reading Analysis, and Interpretation. JP Publications, Madison, Wisconsin, U.S.A.

Muller, J.-C. 1987. Fractal and Automated Line Generalization. The Cartographic Journal. Vol. 24. pp 27-34.

Muller, J.-C. 1990. Rule Based Generalization: Potentials and Impediments. Draft Paper, International Institute for Aerospace Survey and Earth Sciences, Enschede, The Netherlands.


Nickerson, B.G., and H. Freeman. 1986. Development of a Rule-Based System for Automatic Map Generalization. Proceedings, Second International Symposium on Spatial Data Handling. Seattle, Washington, U.S.A.

Nyerges, T.L. 1991. Representing Geographical Meaning; in Map Generalization: Making Rules for Knowledge Representation. B. Buttenfield and R. McMaster, editors, Longman House Essex, United Kingdom.

Opheim, H. 1981. Smoothing a Digitized Curve by Data Reduction Methods, in J.L. Encarnacao ed. Eurographics '81, North-Holland Publishing Company, Amsterdam, pp 127-135.

Opheim, H. 1982. Fast Data Reduction of a Digitized Curve. Geo-Processing, Vol. 2, pp 33-40.

Reumann, K., and A.K.P. Witkam. 1974. Optimizing Curve Segmentation in Computer Graphics. Proceedings, International Computing Symposium, North-Holland Publishing Company, Amsterdam.

Richardson, D. 1988. Rule Based Generalization for Base Map Production. MSc Thesis, International Institute for Aerospace Survey and Earth Sciences, Enschede, The Netherlands.

Richardson, D. 1989. Rule Based Generalization for Base Map Production. Proceedings, National Conference on GIS: Challenge for the 1990's, Ottawa, Canada, pp 718-739.

Richardson, D. 1991. Conceptual Model for Generalization. 15th Conference of the International Cartographic Association. Bournemouth, England, October 1991.

Richardson, D. and J.-C. Muller. 1991. Towards Rule-Based Selection of Spatial Objects for Small Scale Map Generalization; in Map Generalization: Making Rules for Knowledge Representation. B. Buttenfield and R. McMaster, editors, Longman House Essex, United Kingdom.

Robinson, A.H. and R.D. Sale. 1969. Elements of Cartography. 3rd ed., John Wiley, New York.


Robinson, V. and A. Frank. 1985. About Different Kinds of Uncertainty in Collections of Spatial Data. Proceedings, Auto Carto 7, Washington, D.C.

Robinson, V. 1987. Issues in the Use of Expert Systems to Manage Uncertainty in Geographic Information Systems. Proceedings, First International Geographic Information Systems Symposium: The Research Agenda, November 15-18, Crystal City, Virginia.

Schylberg, L. 1992. Rule Based Area Generalization of Digital Topographic Map Data. Tekniska Skrifter, Professional Papers, LMV-Rapport 1992:5, Gävle, Sweden.

Shapiro, S. C. 1987. Editor. Encyclopedia of Artificial Intelligence. John Wiley & Sons, Inc., New York.

Shea, K.S., and McMaster, R.B. 1989. Cartographic Generalization in a Digital Environment: When and How to Generalize. Proceedings AUTO-CARTO 9. Ninth International Symposium on Computer-Assisted Cartography, Baltimore, Maryland.

Smith, J.M. and D.C.P. Smith. 1989. Database Abstractions: Aggregation and Generalization; in Readings in Artificial Intelligence and Databases. J. Mylopoulos and M. Brodie, editors, Morgan Kaufmann Publishers, Inc., San Mateo, California.

Strahler, A.N. 1964. Quantitative Geomorphology of Drainage Basins and Channel Networks. Handbook of Applied Hydrology, Ven Te Chow, editor.

Stubbs, D. F., and N. W. Webre. 1985. Data Structures with Abstract Data Types and Pascal. Brooks/Cole Publishing Company, Monterey California.

Thompson, P. J. 1989. Data With Semantics: Data Models and Data Management. Van Nostrand Reinhold, New York.

Tobler, W.R. 1966. Numerical Map Generalization. Michigan Inter-University Community of Mathematical Geographers Discussion Paper No. 8, Ann Arbor, Michigan.


Tsichritzis, D.C., and F.H. Lochovsky. 1982. Data Models. Prentice Hall, Inc., Englewood Cliffs, New Jersey.

Unwin, D. 1981. Introductory Spatial Analysis. Methuen & Co., New York.

Weibel, R. 1991. Specifications for a Platform to Support Research in Map Generalization. 15th Conference of the International Cartographic Association, Bournemouth, England, October 1991.

White, E.R. 1985. Assessment of line generalization algorithms using characteristic points. The American Cartographer, 12(1) pp 17-28.

Wilson, R.J. 1972. Introduction to Graph Theory. Oliver and Boyd, Edinburgh, Great Britain.

van Oosterom, P. 1990. Reactive Data Structures for Geographic Information Systems. Doctoral Dissertation, Department of Computer Science, Leiden University, Leiden, The Netherlands.

Varizella, L. 1988. Computer Assisted Map Generalization in Alberta. Eurocarto 7, September 1988, Enschede, The Netherlands.

Zycor, Inc. 1984. Manual and Automated Feature Displacement. Report for the U.S. Army Engineer Topographic Laboratories, Fort Belvoir, Virginia, unclassified material, two volumes, 204 p.


SUMMARY

Research in digital map generalization has passed through three distinct periods over the last thirty years. In the first period, algorithms were developed for the graphic representation of data, mostly concerning the simplification of linear objects; to a lesser extent, algorithms for the displacement, combination, and symbolization of objects were also investigated. In the second period, attention was devoted above all to the efficiency and effectiveness of algorithms: the algorithms of the first period were tested on the coordinate reductions, the changes in object angularity, the length reductions, and so on, that they brought about. In the second half of the 1980s the third period began, in which more comprehensive approaches to map generalization came under consideration. In this period a number of models were developed, such as those of Nickerson and Freeman, of McMaster and Shea, and of Brassel and Weibel. To date, these models have hardly been realized in information systems. Perhaps this is because no good rules have yet been defined for map generalization, but it may also stem from the fact that generalization is a very complex process. The literature describes the nature of this complexity well, yet most researchers develop models for solving these complex problems only at the level of graphic representation. This dissertation departs from that line: here the generalization problem is treated as a database problem, and techniques are developed that address map content rather than form.

The generalization model developed here provides a logical method for deciding which objects must be represented on a map, and when. The procedures for determining map content must be as flexible as possible, in order to serve the greatest possible variety of users. At the same time, the method must offer the possibility of controlling and evaluating the generalization process. To that end, the process must be able to produce any desired degree of data reduction, while this reduction can be measured at any desired data level.

The methodology for achieving these objectives is threefold. First, in the terminology used for data modelling, the external level must be examined in order to understand user requirements. Next, at the conceptual level, a generalization model is developed that satisfies the requirements of the external level; here attention must be paid to GIS modelling. Finally, at the internal level, the conceptual model must be realized in a logical model.

The research at the external level comprises an investigation of user requirements, which yielded a number of considerations and findings. An important finding is that user groups vary widely and that data are therefore interpreted in many different ways. Another finding is that users demand a high degree of flexibility in their interaction with the system, in order to satisfy their diverse requirements.

The generalization model and the GIS model at the conceptual level must satisfy the requirements of the external level; they must therefore offer flexibility with respect to the use of spatial and thematic data for many different purposes.

The conceptual GIS model is treated first. This model must be translatable into logical data structures that permit multiple uses and interpretations of the data. It must permit different levels of data abstraction in accordance with different generalization requirements, such as those relating to application field and scale; moreover, the model must support the generalization and specialization of objects in a classification hierarchy, as well as the construction of aggregated objects. Finally, it must be possible to generalize locational and thematic data both independently of each other and in mutual coherence. In view of all these requirements, a topological data model was chosen.

Second, the conceptual generalization model is addressed. This model was developed, implemented, and tested on the basis of the requirements at the external level, and it links up with the logical data structures that follow from the conceptual GIS model. The generalization model must be able to satisfy many different user requirements. Data must, for example, be adaptable to the context in which they are used; that is, they must be able to undergo context transformations. Such a context is determined by, among other things, the user's needs, the subject being mapped, and the representation scale. Although countless contexts are conceivable, two aspects play an important role in map generalization: spatial analysis and map design.

For determining map content during generalization, a decision function is defined in this dissertation. This function is based on the use of a 'necessity factor', and it is coupled directly to the underlying data model. In combination with the data model and its classification and aggregation hierarchies, the necessity factor provides great flexibility with respect to the composition of the sets of objects to be represented. The different methods for calculating the necessity factor, in combination with these hierarchies, make it possible to generalize at different levels of the classification hierarchy. The generalization processes can be based on object selections using attribute values at these different levels, as well as on class generalization and/or spatial aggregation steps.

The external requirements and the conceptual models for generalization and GIS are realized by means of logical data structures. The results of this research show that the data structures used can serve as the basis for a wide variety of generalization possibilities across the different forms of data abstraction. The results at the conceptual and logical levels have been examined in the light of the diversity that can be expected from the different usage environments of spatial and thematic data. The results show that the model permits operations that deliver output for map design as well as for spatial analysis. A number of examples are shown in which different distributions occur, examples that are the result of different context transformations.

Attenuation factors provide the possibility of adjusting the representations generated by context transformations, so that the strength of a generalization can be controlled. This possibility is available to the user once a choice has been made concerning the nature of the generalization to be applied to the database. Thereafter, the reduction factors are calculated, which quantify the degree of generalization. These reduction factors are calculated at all data levels, that is, at the level of the geometric elements, at the object level, and at the different classification levels. With these reduction factors one can judge whether the generalization is sufficient; if not, the attenuation factor can be adjusted. In this way the generalization process can be tuned precisely to the desired representation level.


Curriculum Vitae

The author of this doctoral dissertation was born in Toronto, Canada. She received her Bachelor of Arts degree with a specialization in Geography from the University of Georgia in Athens, Georgia, U.S.A. Following completion of this degree she worked with the Canada Centre for Mapping for five years, and in 1986 went to the International Institute for Aerospace Survey and Earth Sciences in the Netherlands for the Post Graduate Degree Program in Land Information Systems, in Urban Planning. The degree was awarded in 1987 with distinction, and in 1988 she completed her Master of Science degree in Geoinformation Systems, which was also awarded with distinction. She was the first candidate to receive an MSc degree in Geoinformation Systems at the International Institute for Aerospace Survey and Earth Sciences. In October 1990, she started her doctoral work in Canada and travelled frequently to the Netherlands for consultations and the final writing of the dissertation. She has lectured at a number of universities, and is the author of a number of professional papers, articles, and book chapters. In January 1993, she began work with the Canada Centre for Remote Sensing, where she will be further developing automatic techniques for generalization involving remotely sensed data and topographic data. These techniques will be investigated in both raster and vector domains.
