

Page 1: 1 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 9/8/2015 Database Design and Decisions GISC 6383 Management and Implementation of GIS


Database Design and Decisions

GISC 6383 Management and Implementation of GIS

Note: some figures in this document are adapted from:

State of New York GIS Development Guides

ftp://ftp.sara.nysed.gov/pub/gis/sara.zip

chapter 3: conceptual design

chapter 6: database planning and design

chapter 7: database construction

Page 2:

Data Base Design Process: steps

1. Conceptually Model USERS view (Tomlinson Chapter 6, 7)
2. Logically Model the Data
– Define data entities and their relationships
– Identify representation of entities and match to spatial data model
(Tomlinson Chapter 8, & 9 thru p. 136)
3. Physically Model the Data
– Create physical database design for selected software (Oracle, ArcGIS, etc)
4. Design Process for obtaining and converting Data from source

ESRI Steps for Building Geodatabase (Zeiler, p. 18)
1. Model the user’s view of the data
2. Define objects and relationships
3. Select geographic representation
4. Match to geodatabase elements
5. Organize geodatabase structure

[Figure labels: Reality; Conceptual; Physical; Object Oriented Analysis and Design (OOAD); Database Schema in UML Diagram (using Visio); Geodatabase Generation and Population]

(Review from earlier lecture.)

Page 3:

• We covered Conceptual Modeling/Needs Assessment in the lecture on Implementation Steps

• Tonight we focus on Logical Design and Physical Design

• These are the guts of database design/data modeling

Loosely corresponds to Tomlinson’s Chapter 8: Create a data design and Chapter 9 thru p. 136

Page 4:

Objectives of Database Design:
– Satisfies and supports organization’s objectives
– Contains all data but no redundant data
  • Minimizes redundancy across the organization
– Allows for different users to access same data
  • Consistent and flexible data retrieval and analysis
– Accommodates different views of the same data
  • Based on user and application needs
  • Increases likelihood of users developing applications
– Appropriately represents & organizes geographic features
– Maintains the data so its currency is assured
– Secures data by distinguishing applications (& users) which
  • create data (add records for new entities)
  • update data (maintain & modify existing data records)
  • read data (use but can’t modify in any way)
  • delete data (remove records from database)
  (CRUD)
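The CRUD distinction in the last objective can be sketched as a small permission table. This is only an illustration: the role names and the `is_allowed` helper are hypothetical and are not part of the lecture or of any particular RDBMS.

```python
from enum import Flag, auto


class Permission(Flag):
    """The four CRUD operations named on the slide."""
    CREATE = auto()
    READ = auto()
    UPDATE = auto()
    DELETE = auto()


# Hypothetical roles; a real organization would define its own.
ROLE_PERMISSIONS = {
    "public_viewer": Permission.READ,
    "field_editor": Permission.CREATE | Permission.READ | Permission.UPDATE,
    "data_steward": (Permission.CREATE | Permission.READ
                     | Permission.UPDATE | Permission.DELETE),
}


def is_allowed(role: str, operation: Permission) -> bool:
    """Return True when the role has been granted the operation."""
    granted = ROLE_PERMISSIONS.get(role, Permission(0))  # unknown role: nothing granted
    return operation in granted
```

For example, `is_allowed("public_viewer", Permission.DELETE)` is false, while the same call for `"data_steward"` is true.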

Page 5:

Logical Data Modeling

Page 6:

Why build a model?

• We build models of systems in order to break them up into well-defined subsystems, because it helps us to overcome the difficulties in comprehending such systems in their entirety.

• As the complexity of systems increase, so does the difficulty of comprehending them and the importance of good modeling techniques to assist us in managing them.

• Good models with well-defined semantics are essential for communication among project teams and to assure architectural soundness.

In other words, developing a model for an industrial-strength software system prior to its construction or renovation is as essential as having a blueprint for a large building.

Luis X. B. Mourão http://cplus.about.com

Page 7:

Logical Data Modeling
– It is an entirely data-driven process
– It encourages comprehensive understanding of business information requirements
– It enables effective communication among designers, developers, and users throughout the design process
– It forms the basis for designing correct, consistent, sharable, and flexible databases using any database technology

• Correct
– an accurate and faithful representation of the way information is used in the business
• Consistent
– no contradictions in the way information objects are named, defined, related, & documented
• Sharable
– accessible by multiple applications and users to meet varying access requirements
• Flexible
– easily updateable when implementation or business changes

Page 8:

Logical Data Modeling in practice
• purpose of data model is to ensure that data has been identified in a completely rigorous and unambiguous fashion on which both user and GIS analyst agree
• logical data models define entities (the unit about which we collect information, e.g. people, companies), the attributes of entities (the information collected, e.g. age, salary), and the relationships between entities (companies pay salaries to people)
• developed through use of
– entity-relationship diagrams which show relationships among all data throughout the organization
– data dictionaries (structured lists) which document each entity, its attributes, and its relationships

Page 9:

Entity Relationship (E-R) Diagrams and Logical Data Modeling
• Entity (noun)
– objects or things to be included in the database
– person, place, thing or concept about which you wish to record info.
– for example: employee, company, citizen, street
• Attribute (adjective) (but they are often nouns, e.g. a name!!)
– characteristics or measurements to be recorded for the entities
– fact or nondecomposable piece of information describing an entity
– for example: age, dob, owner, street type
• Relationships (verb): association between entities
– employee --- works for --- company; company --- has --- employees
– transformers --- are mounted on --- poles; land parcel --- has --- owner
• Cardinality of Relationships (“adverb”)
– one to one: Country --- has --- Capital city (one capital city per country)
– one to many: Company (one) --- has ---> (many) Employees; Employees --- work for ---> Company
– many to many: Parcels --- have --- owners (parcels >1 owner; owners >1 parcel)
• Business Rules (attribute domains and validation rules)
– requirements that attributes or relationships must meet
– specifications which preserve the integrity of the logical data model by governing the values attributes may assume or the cardinality relationships may take on
– for example: # of kids is an integer between 0 and 24; poles have 0 to 3 transformers
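The two business rules quoted on this slide (# of kids between 0 and 24; 0 to 3 transformers per pole) can be expressed as range checks. The sketch below is illustrative only; the field names and the `violations` helper are assumptions, not something the lecture defines.

```python
# Business rules from the slide, keyed by hypothetical field names.
BUSINESS_RULES = {
    "number_of_kids": (0, 24),
    "transformers_per_pole": (0, 3),
}


def check_range(value, low, high):
    """Return True when value is an integer within [low, high]."""
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    return is_integer and low <= value <= high


def violations(record):
    """List every business rule that the record breaks."""
    problems = []
    for field, (low, high) in BUSINESS_RULES.items():
        if field in record and not check_range(record[field], low, high):
            problems.append(f"{field}={record[field]!r} is outside {low}..{high}")
    return problems
```

A pole record with four transformers, `violations({"transformers_per_pole": 4})`, reports one problem; a record with three reports none.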

Page 10:

[Figure: two E-R diagrams; only their labels survive in this transcript.

Conventional E-R Diagram: Employee --- works in --- Department; Employee --- has --- Dependents. Attribute labels: Name, Age, Sex, Job Title, ID #; Name, Function, Size; Name, Age, Relationship to employer.

GIS E-R Diagram: Building --- located on --- Parcel; Building --- has --- Occupant. Attribute labels: Building_Name, Height, Floor_area; Owner_Name, Owner_Address, Situs_Address; Occupant_Name, Unit_Number.]

Page 11:

Entity or Attribute? owner as example
Perennial problem in modeling
• Do you think of it as a number or text >>> attribute
• Does it have attributes of its own >>> entity
– Owner has name, billing address,
• Does it have a relationship with other things >>> entity
– “Owner receives tax bills”
• Does it repeat elsewhere >>> entity
– Owner could own many parcels
• Is it important by itself >>> entity
– An owner IS important
If in doubt, make it an entity. Can change later in physical model.

[Figure: two alternatives. This? Parcel with attributes PIN and Owner. Or this? Parcel (PIN) related to a separate Owner entity (Name).]

Some rules for attributes
• Primitives: no meaning in itself
– e.g. 50
• All values of same kind
– e.g. all integers: 4, 6 not 4, SIX
• Describe entity not another attribute
– e.g. owner name describes owner of parcel, not the parcel
• Never a list: e.g. owner: John Smith, Bill Jones
• Never repeated: e.g. owner1, owner2, owner3
• Undecomposable or composite? e.g. 117 West Plano Road
Exceptions are always made!!! e.g. in 911 when speed matters
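The "never repeated" rule (owner1, owner2, owner3) is easiest to see by converting such a table. The sketch below assumes a made-up flat parcel list and splits it into an owner entity plus a parcel-to-owner link, which is also how the many-to-many parcel/owner relationship would be carried. All names and sample values are illustrative.

```python
# Hypothetical flat table that breaks the "never repeated" rule.
flat_parcels = [
    {"pin": "001", "owner1": "John Smith", "owner2": "Bill Jones", "owner3": None},
    {"pin": "002", "owner1": "John Smith", "owner2": None, "owner3": None},
]


def normalize(parcels):
    """Split repeated owner columns into an owner entity and a link list."""
    owners = {}      # owner name -> owner_id (the Owner entity)
    ownership = []   # (pin, owner_id) pairs (the many-to-many relationship)
    for parcel in parcels:
        for column in ("owner1", "owner2", "owner3"):
            name = parcel[column]
            if name is None:
                continue
            # Reuse the id if this owner was already seen on another parcel.
            owner_id = owners.setdefault(name, len(owners) + 1)
            ownership.append((parcel["pin"], owner_id))
    return owners, ownership


owners, ownership = normalize(flat_parcels)
```

After this step John Smith is stored once, however many parcels he owns, and a parcel can have any number of owners without adding columns.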

Page 12:

…data dictionaries (structured lists) document each entity, its attributes, and its relationships

Entity Definition
Entity Name: Building
Definition: A unique structure with roof
Unique identifier: Bldg_Id
Attributes: Bldg_Id, Bldg_Name, Floor_Area, Height
Relationships: located on parcel; has occupant

Entities which have spatial expression need additional conceptualization… see following slides

When using UML techniques to create Entity-Relationship diagrams, entities, attributes and relationships can all be documented within the diagram.

Data dictionaries become part of the metadata.

Recording measurement units (meters, feet) for attributes is critical and problematic. Can include in:
-- name: length_in_feet (clumsy)
-- metadata (metadata gets lost, forgotten, or separated from data)
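One way to keep units from getting separated from the data is to make them a field of the data dictionary entry itself. The sketch below records the Building entry from this slide; the class names, the `data_type` labels, and the square-feet unit for Floor_Area are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeDef:
    name: str
    data_type: str                 # "text" or "number" in this sketch
    units: Optional[str] = None    # e.g. "feet"; None means not recorded


@dataclass
class EntityDef:
    name: str
    definition: str
    unique_identifier: str
    attributes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def missing_units(self):
        """Names of numeric attributes whose units were never recorded."""
        return [a.name for a in self.attributes
                if a.data_type == "number" and a.units is None]


building = EntityDef(
    name="Building",
    definition="A unique structure with roof",
    unique_identifier="Bldg_Id",
    attributes=[
        AttributeDef("Bldg_Id", "text"),
        AttributeDef("Bldg_Name", "text"),
        AttributeDef("Floor_Area", "number", units="square feet"),  # assumed unit
        AttributeDef("Height", "number"),  # unit deliberately left out
    ],
    relationships=["located on parcel", "has occupant"],
)
```

Calling `building.missing_units()` flags `Height`, the kind of gap the slide warns about.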

Page 13:

Spatial Data Modeling

Spatial data differs in two key ways, and these must be incorporated:
– entities have a corresponding spatial expression. In ESRI terms,
  • Objects: entities without spatial expression (e.g. owner)
  • Features: entities with spatial expression (e.g. parcel)
  • An entity, when given spatial properties, becomes a GIS spatial data layer
– relationships may have a spatial expression also

Page 14:

…in essence, you need to decide how to represent each entity spatially

• Point or a point symbol

• Line or line types

• Areas or polygons

• Surfaces or surface drapes

• Raster format such as scanned paper documents

• Images such as photos, satellite or clip art

• or plain old Alpha or numeric

Note that:

• one entity could be visualized through another

– footprint on a lot map

• might be different at different scales
– airport a point at one scale and polygon at another

• might be different for two applications
– street as line for routing
– street as polygon for pavement management

• attributes might be displayed graphically as annotation

Page 15:

…identifying entities’ spatial representation (examples)

• Property associated
– Legal parcel: polygon
– Assessor parcel: polygon
– Parcel boundary: line string
– Plat map: raster (pixels)
– Parcel photograph: image
– Owner: alphanumeric (e.g. abcdef123)
– Address: alphanumeric
– Land value: numeric (e.g. 110210.67)

• Street associated
– Street: line string
– Street segment: line segment
– Intersection: node
– Traffic light: point
– Traffic analysis zone: polygon
– Bus route: route
– Bus stop: point

(And these will need to be supported by the GIS software you select)

In essence, you need to decide how to represent each entity spatially
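The representation decision can be recorded as a lookup that also allows for the scale and application differences noted earlier (airport as point or polygon, street as line or polygon). This is a rough sketch: the `representation` function, the override table, and the 1:100,000 scale break are hypothetical choices, not values from the lecture.

```python
# Default representation per entity, taken from the examples in the slides.
DEFAULT_GEOMETRY = {
    "legal parcel": "polygon",
    "street": "line string",
    "traffic analysis zone": "polygon",
    "bus stop": "point",
    "owner": "alphanumeric",
}

# Application-specific overrides (street as line vs. polygon).
APPLICATION_OVERRIDES = {
    ("street", "routing"): "line",
    ("street", "pavement management"): "polygon",
}

AIRPORT_SCALE_BREAK = 100_000  # assumed scale denominator, for illustration only


def representation(entity, application=None, scale_denominator=None):
    """Pick a spatial representation for an entity."""
    if entity == "airport" and scale_denominator is not None:
        # Large scale (small denominator) shows the footprint; small scale shows a point.
        return "polygon" if scale_denominator <= AIRPORT_SCALE_BREAK else "point"
    if (entity, application) in APPLICATION_OVERRIDES:
        return APPLICATION_OVERRIDES[(entity, application)]
    return DEFAULT_GEOMETRY[entity]
```

Writing the decisions down in one table like this makes it easier to check them against what the selected GIS software supports.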

Page 16:

Additionally, you will need to identify the classic spatial data properties for each layer
• Scale ranges at which data is required to be displayed
• minimum resolution required to support intended applications
• minimum accuracy required to support intended applications
• Projection(s) in which data will be
– stored and
– used (may not be the same)

Page 17:

Spatial Relationships

…identifying spatial relationships, and how they will be implemented

Spatial Relationship | Descriptive Verbs | Common GIS Implementation | E-R Model Symbol (example)
Connectivity | connect, link | Topology | street segments link to street network
Contiguity | adjacent, abut | Topology | cities common border
Containment | contained, containing, within | Spatial join operation | lot within floodplain
Proximity | closest, nearest | Spatial join operation | house nearest fire_hydrant
Coincidence | coincident, coterminous | Spatial join operation | valve and gauge same manhole

Common GIS Implementation: how GIS system might implement the relationship
E-R Model Symbol: possible ER symbol (not standardized)
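To make two of these relationships concrete, the sketch below tests containment and proximity with plain coordinate geometry. It is a simplification and not how any particular GIS implements a spatial join: the lot is reduced to a single point, the ray-casting test and Euclidean distance are techniques chosen here, and all coordinates are made up.

```python
import math


def point_in_polygon(point, polygon):
    """Containment by ray casting: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nearest(origin, candidates):
    """Proximity: name of the candidate location closest to origin."""
    return min(candidates, key=lambda name: math.dist(origin, candidates[name]))


floodplain = [(0, 0), (10, 0), (10, 10), (0, 10)]          # made-up square
hydrants = {"hydrant_a": (3, 4), "hydrant_b": (1, 1)}      # made-up locations

lot_in_floodplain = point_in_polygon((5, 5), floodplain)
closest_hydrant = nearest((0, 0), hydrants)
```

Connectivity and contiguity, by contrast, are usually answered from stored topology rather than computed from coordinates on each query, which is the distinction the table draws.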

Page 18:

Example E-R Diagram with Spatial Concepts for Urban Application

[Figure: E-R diagram; only its labels survive in this transcript.
Box labels: Street Segment, Intersection, Street System, Parcel, Building, Occupant, Address, Zoning, Census Tract, Census Block, Traffic Zone, Floodplain, Wetlands, Soils, Water Main, Water Service Connection, Valve, Hydrant.
Spatial type labels on the boxes: POLYGON G T, POLYGON G, LINE G T, NODE G T.
Relationship labels: CONTAINS, WITHIN, HAS, ADJACENT, ABUTTING, LINK, COINCIDENT LINE, INTERSECT, NEAREST.]

Contains:
-- 15 entities
-- 16 relations
Attributes not included for simplicity

Corrected from original source

Page 19:

E-R Diagram
Final E-R diagram should be verified with users for:
• required entities
• appropriate spatial representation of entities
• required attributes
• appropriate spatial relations/operations

Once verified, the E-R diagram becomes the basis for the physical data base design.

Page 20:

Creating E-R diagrams: Unified Modeling Language
• E-R diagrams can be drawn by hand or with any drawing package
• CASE (computer aided software engineering) tools emerged in the 1980s and early 90s to aid in the process (e.g. TI’s Composer)
• Each used its own proprietary language and symbolization
• UML (Unified Modeling Language) developed in mid/late 1990s to provide standardized modeling language based on object-oriented concepts.
– Initiated at Rational Software Corporation in 1994/95 by merging of Grady Booch’s (Booch Model), Jim Rumbaugh’s (OMT--Object Modeling Technique) and Ivar Jacobson’s (OOSE--Object-Oriented Software Engineering) methods
• Existence of UML standard allows data base vendors to support automated conversion of conceptual data models to physical data base designs
• ArcGIS supports use of MS Visio 2000/02/03 (enterprise edition [2000], professional [02/03] for full support) [see Zeiler, p. 19-20]
– Sample geodatabase schemas (templates) available for different industries
• Other UML based products: Rational Rose, Paradigm Plus, Oracle2000, ERwin (from Computer Associates)

Page 21:

Physical Data Modeling

• Creating physical database design for the selected database software – Oracle, SQL Server, ArcGIS, Intergraph, etc

• We will focus on physical database design for an ESRI Geodatabase

Loosely corresponds to Tomlinson’s Chapter 8: Create a Data Design

Page 22:

…physical data base design involves

For entities and attributes
• Representing (“mapping”)* all entities and their attributes into one or more relational tables in a selected RDBMS and determining keys for forming relationships
• For each spatial entity, selecting an appropriate
– GIS data type e.g. polygon, line, point, surface
– GIS data set format for storage: e.g. geodatabase, coverage, grid, tin, image, etc
– Spatial Reference System
  • coordinate system (geographic or projected, etc.)
  • Spatial domain
  • Precision (or resolution)
• Selecting an appropriate Data Type for each field (attribute)
– e.g. for ESRI: string, short integer, long integer, float, double, blob.

For relationships (associations)
• for regular (non-spatial) relationships, identifying which of the RDBMS’s normal query structures or relational operators will handle the relationship
– if won’t do it, develop specs for a custom application
– See Appendix for discussion of relational operators
• for spatial relations, identify which capabilities of software will handle the desired operation (e.g. nearest neighbor identification)
– if won’t do it, develop specs for a custom application

*Note: Computer people talk about “mapping” entities to tables rather than “representing”. Mapping in this sense does not mean producing cartographic output!

Page 23:

Example of “mapping” a relationship for RDBMS attribute tables
-- the relation “contains” is mapped to a table join
-- the building table has a foreign or secondary key (parcel ID#) to permit a join with the primary key (also parcel ID#) in the parcel table
The spatial data structure is shown as two generic layers

[Figure: Conceptual E/R diagram: PARCEL (polygon) (one) --- CONTAINS --- (many) BUILDING (polygon).
Physical database design: PARCEL layer with PARCEL TABLE (PARCEL ID #, primary key); BUILDING layer with BUILDING TABLE (BUILDING ID #; PARCEL ID #, foreign key); table join where primary key = foreign key.]
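The parcel/building mapping can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch of the attribute-table side only; the column names, sample rows, and the choice of SQLite (rather than the Oracle or SQL Server systems the lecture mentions) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel (
    parcel_id  INTEGER PRIMARY KEY,          -- primary key
    owner_name TEXT
);
CREATE TABLE building (
    building_id   INTEGER PRIMARY KEY,
    parcel_id     INTEGER NOT NULL REFERENCES parcel(parcel_id),  -- foreign key
    building_name TEXT
);
INSERT INTO parcel VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO building VALUES (10, 1, 'Garage'), (11, 1, 'House'), (12, 2, 'Barn');
""")

# "Parcel contains building" becomes a join on primary key = foreign key.
rows = conn.execute("""
    SELECT p.parcel_id, b.building_id, b.building_name
    FROM parcel AS p
    JOIN building AS b ON b.parcel_id = p.parcel_id
    ORDER BY p.parcel_id, b.building_id
""").fetchall()
```

Parcel 1 comes back twice, once per building, which is the one-to-many cardinality shown in the diagram.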

Page 24:

Example of Spatial Database “Mapping” for ArcInfo coverage

Attribute List of Entity "Parcel"
Parcel [subdivision_block_lot#, owner_name, owner_address, situs_address, area, depth, front_footage, assessed_value, last_sale_date, last_sale_price (owner_name, owner_address, assessed_value as of Jan. 1 for last two years)]

A key field for parcels is formed by concatenating subdivision, block & lot #s

ARC/INFO Spatial Database Structure (coverage)
Parcel: POLYGON G T; spatial expression represented as an ArcInfo polygon coverage
Coverage files: ARC, AAT, TIC, BND, ETC., PAT (PAT = Polygon Attribute Table)
PAT fields: Area, Perimeter, Poly ID #, Sub_bl_lot#

Attributes represented in Oracle Tables
Parcel INFO: Parcel ID#, Owner_name, Owner_add, Situs_add, Depth, Front_footage, Assessed_value, Last_sale_date, Last_sale_price
Previous Values: Parcel ID#, Year, Owner_name, Owner_address, Assessed_value

What is wrong with this design?

Page 25:

Example of Database Mapping: for geodatabase

[Figure: E-R fragment WATER MAIN --- LINK --- VALVE, HYDRANT, mapped to:
Attribute: SQL-Server TABLES: WATER MAIN (WATER MAIN ID #), VALVE (VALVE ID #), HYDRANT (HYDRANT ID #)
Spatial: geodatabase: a feature data set containing feature classes]

Design decisions:
• why distribmains and transmains as separate feature classes?
• why not valve with gate and hydrant subclasses? (--different attributes)
• why prodwell1 and prodwell2?

Page 26:

ESRI-related DB Design Decisions

Page 27:

ESRI DB Design Decision Overview
• Type for spatial data layer
– Vector (point, line or polygon), raster, tin?
• Format for spatial data
– Coverage, shapefile or geodatabase?

Geodatabase Design Decisions
• Feature Datasets
– Stand-alone feature classes or feature data sets (fds)?
– Spatial Reference system for each feature data set
  • Coordinate system: lat/long or projected? which projection? what parameters?
  • Spatial Domain (extent) and Precision?
• Feature Classes (tables)
– Subtypes or separate feature classes?
  • roads feature class with road_type subtype, or separate freeway, arterial, street feature classes?
• Attributes Types and Validation
– Type (string, long integer, short integer, etc.)?
– Validation Rule through application of Attribute Domain?
  • Domains and Defaults
• Relationships and Associations between feature classes
– Implement Geometric networks and/or topology rules?
– Implement relates or joins in the database or in ArcMap documents?
• Type of Geodatabase
– Personal geodatabase or SDE-based geodatabase?

Page 28:

Spatial Data Types
Overall, ArcGIS 9 supports at least four representations of geographic data.

• Vector data for representing features
– CAD, Coverages, shapefiles, geodatabases
• Raster data for images and surfaces
– Image data in .bmp, .tiff, .jpeg, .sid, ERDAS formats
– Raster data in discrete or continuous GRIDS (ESRI’s native file format for raster)
  • Discrete grids can have an attribute data table, continuous do not
– Raster data in a Geodatabase (as of ArcGIS 9)
• Triangulated Irregular Networks (TINS) for surfaces
– Although TINS are a vector format, as of ArcGIS 9.1, they are not yet supported by the Personal Geodatabase and must be stored in coverage workspaces or SDE geodatabase
• Tabular data (sometimes called Event tables)
– List of X,Y coordinates for points (such as may be output from GPS)
– “Locators” for finding a geographic position from an address

A decision must be made as to the spatial data type for each layer.

Page 29:

Spatial File Formats--example

[Figure: catalog listing with callouts; only the labels survive in this transcript.
Personal Geodatabase: Feature data set; Feature class (feature type = polygon); Feature class (feature type = arc)
Coverage (= feature class): Feature type (arc); Feature type (point); Feature type (polygon)
Other items: Locator (table); Raster; Shapefile; Shapefile
Tracts feature class table callouts: Features (rows); Feature type; Feature ID (key field); Secondary or Foreign key]

In a gdb, feature class can have only one feature type.

A coverage can have multiple feature types; this is now viewed as a shortcoming.

Page 30:

Geodatabase Design Decisions: example

[Figure: catalog listing with callouts]
Texas geodatabase
– Dallas County feature data set: feature classes
– Plano feature data set: feature class, feature class
– Stand-alone feature classes
US Geodatabase
– Stand-alone feature class

All feature classes within a feature data set must be in the same spatial reference system (extent, datum, projection).

Each stand-alone feature class may be in a different spatial reference system.

Rasters and TINs can be stored within a SDE geodatabase but not a personal geodatabase

Page 31:

Anatomy of a Geodatabase

[Figure: geodatabase outline]
Geodatabase
– Feature datasets: Spatial Reference; Object classes and subtypes; Feature Classes and subtypes; Relationship classes; Geometric Network; Topology
– Domains
– Validation Rules
– Raster Datasets (SDE only): rasters
– TIN datasets (SDE only): nodes, edges, faces
– Locators: addresses; x,y locations; Zip codes; place names; route locations

Geodatabases contain: feature datasets, raster datasets, TIN datasets (planned 9.2), locators

Feature datasets contain various objects which all share a common spatial reference

Objects (e.g. Jane Blow, land owner) are instances of object classes (e.g. land owners) and have no spatial form.

Features, stored in feature classes, are spatial objects (e.g. land parcels) which are similar and have same spatial form (e.g. polygon)

Object (or feature) classes are tables, with objects (or features) in the rows of the table

Attributes are in the columns of the table

Subtypes are an alternative to multiple object (or feature) classes (e.g. ‘concrete’, ‘asphalt’, ‘gravel’ road subtypes): think of subtype as the most significant classification variable (attribute) in the class table

Page 32:

Anatomy of a Geodatabase contd

[Figure: geodatabase outline, repeated from the previous slide]

Relationship classes are tables containing general relationships between objects and/or features (e.g. between work order object class and roads feature class)

Geometric networks model flows thru linear systems such as streams, sewers, roads

Topology models relationships among lines and areas (e.g. common state/county boundary)

Domains are sets of valid and/or default attribute values (e.g. road lane count default is 2; valid values are integers 1-12)

Validation rules control feature and attribute integrity by applying domains. 3 types:
– attribute rules (applied to attributes or subtypes, e.g. gravel road 1 or 2 lanes only)
– connectivity rules (e.g. gravel road cannot connect to freeway)
– relationship rules (constrain cardinality of a relationship, e.g. gravel road can have no more than 4 segments at an intersection)

Simple behaviors are realized thru domains and validation rules

Complex behaviors and custom objects are realized by extending rules with custom programming

Page 33:

Feature classes (FC), feature datasets (fds) and subtypes
• Spatial features (e.g. a land parcel) are grouped into feature classes: a table with spatial data
– Data in FC must have same topology type (all points, all lines, all polygons)
  • Water feature class with lakes (polygon) and streams (line) not permitted
– Minimizing the number of feature classes improves performance
  • Use different feature classes only when attributes are significantly different
– Use roads feature class rather than freeway, arterial, streets feature classes
– Use subtype to differentiate freeway, arterials, streets (all have similar attributes)
• Subtypes are “subclasses” within a feature class that allow you to further distinguish objects without creating new feature classes
– based on a single column’s values (must be integer or long integer)
– Same subtype has similar attribute values and behaviors
– Use where attributes are the same across all subtypes
• Feature classes can be grouped into feature datasets (fds) or “spatial folders”
– All feature classes in a fds must have the same spatial reference system, but may have different topology (can have points and lines and polygons in same fds)
– Organize by thematic similarity e.g. transportation
– If you wish to create a geometric network, must be in same fds
– If you wish to create topology, must be in same fds
– If they share geometry (street forms political boundary), should be in same fds
– Security (read/write permissions, etc.) applied at the fds not the fc level!!!!

Page 34:

Data Types for Attributes
• For every attribute field, must select a data type
• Each RDBMS stores data slightly differently
• ESRI generic data types will translate into the closest RDBMS equivalent
• Values given below may differ with the RDBMS used

ESRI Generic Data Types
String: text field. Be sure its length (number of characters), whether the default or what you specify, is sufficient to record the longest data value.
Short Integer: (or integer) whole numbers (no decimal point), generally +/- 32,767 (2 bytes). OK for size of family, not OK for city size.
Long Integer: (or long) only supports integers to +/- 2,147,483,647 (4 bytes).
Float: (or single) single precision floating point; again, be careful: supports a decimal point but perhaps only 6 digits, with the decimal moveable 38 places (E38) (4 bytes).
Double: double precision floating point; the safest: supports 12-15 digits with the decimal moveable up to 308 places (E308) (8 bytes).
Blob: binary large object, for special programming applications.

Note terminology:
• Precision: the total number of digits (before plus after the decimal)
• Scale: the number of digits after the decimal
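
The choice between these types can be illustrated with a minimal Python sketch. The limits below are the generic values quoted on this slide (they may differ with the RDBMS used), and the helper names `smallest_integer_type` and `as_float` are hypothetical, not part of any ESRI API.

```python
import struct

# Generic limits quoted on the slide; actual limits may differ by RDBMS.
SHORT_MIN, SHORT_MAX = -32_767, 32_767
LONG_MIN, LONG_MAX = -2_147_483_647, 2_147_483_647


def smallest_integer_type(value: int) -> str:
    """Return the smallest generic type able to hold a whole number."""
    if SHORT_MIN <= value <= SHORT_MAX:
        return "Short Integer"
    if LONG_MIN <= value <= LONG_MAX:
        return "Long Integer"
    return "Double"  # too large for a 4-byte integer


def as_float(value: float) -> float:
    """Round-trip a value through a 4-byte single-precision float."""
    return struct.unpack("f", struct.pack("f", value))[0]


print(smallest_integer_type(4))          # size of family
print(smallest_integer_type(1_200_000))  # size of a city
print(as_float(1_123_456.1236))          # single precision drops trailing digits
print(1_123_456.1236)                    # double keeps them
```

The last two lines show why Double is described as the safest choice: the same value loses its trailing decimal digits once it is squeezed into 4 bytes.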


Domains and Defaults
Why Use Them?
• Data Integrity: prevents entry of invalid (“obviously wrong”) data values
• Data Efficiency: choose from a set of valid values rather than typing them in each time

Domains define a set of legal values for a field’s attributes
• Range domain: specifies a valid range of values for numerical attributes
  – A water pipe must be between 1 and 100 inches wide
• Coded value domain: specifies a valid set of values for an attribute. Can apply to any type of attribute
  – Parcels can only have RES or VAC land use values
• Domains are defined as a geodatabase property and then applied as appropriate
  – Multiple objects in the same database may use the same domain
  – May be applied to an entire field (attribute), or separately by subtype

Defaults are values automatically assigned when a feature is created
  – Of course, may be changed during the data entry/edit process
  – Again, may be applied to an entire field (attribute), or separately by subtype

Again, the physical design process requires decisions about domains and defaults, and to what they should be applied.
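
The behavior of range domains, coded value domains, and defaults can be sketched in plain Python. This is only an illustration of the idea, not how a geodatabase implements it; the class names, the `new_parcel` helper, and the default value `RES` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RangeDomain:
    """Valid range of values for a numeric attribute."""
    minimum: float
    maximum: float

    def is_valid(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum


@dataclass(frozen=True)
class CodedValueDomain:
    """Valid set of codes for an attribute."""
    codes: FrozenSet[str]

    def is_valid(self, value: str) -> bool:
        return value in self.codes


# Domains are defined once and can be shared by several fields.
PIPE_WIDTH_INCHES = RangeDomain(1, 100)
LAND_USE = CodedValueDomain(frozenset({"RES", "VAC"}))
DEFAULT_LAND_USE = "RES"  # assumed default, not stated on the slide


def new_parcel(land_use: Optional[str] = None) -> dict:
    """Create a parcel record, applying the default and checking the domain."""
    code = DEFAULT_LAND_USE if land_use is None else land_use
    if not LAND_USE.is_valid(code):
        raise ValueError(f"{code!r} is not a legal land use value")
    return {"land_use": code}


print(PIPE_WIDTH_INCHES.is_valid(36))   # inside the range domain
print(PIPE_WIDTH_INCHES.is_valid(150))  # outside the range domain
print(new_parcel())                     # default applied
```

Calling `new_parcel("COM")` raises a `ValueError`, which is the data-integrity role a domain plays during editing.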


Implementing Associations and Relationships
Associations (general term) may be implemented as:
• A relationship class within the GDB
  – Both classes (tables) in the relationship (i.e. the related tables) must be within the GDB
  – Permanent part of the data model, and relationship rules are enforced
  – Is itself a class, stored in tables, with properties and behaviors
  – Supports 1:1 and 1:M cardinality, and many-to-many using a key class table
  – Strictly speaking, everything else is an “association”
• An ArcMap relate
  – Functionally similar to a relationship class
  – Supports both 1:1 and 1:M cardinality
  – Can form a relate to tables outside the GDB
    • INFO tables, Access tables, dbf tables
    • Tables in other databases via ODBC (Open Database Connectivity)
  – Local to the ArcMap document, therefore essentially temporary
• An ArcMap join
  – Supports 1:1 cardinality only
  – Links matching objects and visualizes them as rows in a single table
  – Can match objects either
    • Non-spatially, using key attribute fields in each table: exists only temporarily
    • Spatially, using containment, nearness, etc. criteria: saved as a new feature class or new shapefile
• An ArcMap hyperlink
  – Attribute in a table stores a hyperlink to a document outside the GDB
    • Path name to file-based documents (spreadsheets, text, photos, video or sound clips, etc.)
    • URLs for Internet documents
  – Not part of the GDB in any way; implemented via layer properties in ArcMap

-- Use a relationship class when the referential integrity of the data is important
-- Joins are simpler and require less overhead


Some Special Case Relationships
• Many to Many relationships
  – Many class A objects match many class B objects (parcels & owners)
  – Implemented via an “attributed relationship”
  – Intermediate table is created to store the relationship
• Aggregation v. Composition and Simple v. Complex Relationships
  – Aggregation: e.g. dog has bowl, collar; valve has valve box
    • Aggregation implemented through simple relationships
    • Peer to peer: delete one, the other remains: dog dies, but bowl and collar remain
  – Composition: e.g. dog has feet, tail; valve has maintenance records
    • Composition implemented through complex relationships
    • Enforced dependency: delete one, and the other goes also: dog dies, feet and tail gone

Parcel (PIN is the origin PK)
LegalArea | Zoning | PIN
1.5  | RES | 1
0.75 | COM | 2
0.75 | RES | 3
0.18 | RES | 4
0.18 | AGR | 5

Ownership (PIN is the origin FK; OWN is the destination FK)
PIN | Percent | OWN
1 | 100 | 1
2 | 50  | 1
3 | 50  | 2
3 | 25  | 3
3 | 75  | 4
4 | 75  | 1
4 | 25  | 3
5 | 100 | 3

Owner (Own is the destination PK)
Own | Name
1 | Mary
2 | Joe
3 | Raj
4 | Pedro
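
A many-to-many relationship through an intermediate table can be sketched in Python as follows. The rows are the Parcel, Ownership, and Owner values as recovered from this slide; the function names `owners_of` and `parcels_of` are hypothetical.

```python
# PIN -> (LegalArea, Zoning); PIN is the origin primary key.
parcels = {1: (1.5, "RES"), 2: (0.75, "COM"), 3: (0.75, "RES"),
           4: (0.18, "RES"), 5: (0.18, "AGR")}

# Own -> Name; Own is the destination primary key.
owners = {1: "Mary", 2: "Joe", 3: "Raj", 4: "Pedro"}

# Intermediate table: (PIN, Percent, OWN). It holds both foreign keys
# plus an attribute of the relationship itself (percent owned).
ownership = [(1, 100, 1), (2, 50, 1), (3, 50, 2), (3, 25, 3),
             (3, 75, 4), (4, 75, 1), (4, 25, 3), (5, 100, 3)]


def owners_of(pin: int) -> list:
    """Names of all owners of one parcel."""
    return sorted(owners[own] for p, _, own in ownership if p == pin)


def parcels_of(own: int) -> list:
    """PINs of all parcels held by one owner."""
    return sorted(p for p, _, o in ownership if o == own)


print(owners_of(4))   # one parcel, several owners
print(parcels_of(3))  # one owner, several parcels
```

Neither the Parcel nor the Owner table stores anything about the other; the relationship lives entirely in the intermediate table, which is why it can also carry its own attribute (Percent).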


Spatial Reference
All feature classes within a feature dataset must have the same spatial reference.
• Coordinate System
  – Datum
  – Geographic (lat/long) or projected?
  – Projection parameters: central meridian, standard parallels, coordinate system origin (false easting and northing)
  – Measurement (map) units: dd (for lat/long); feet, meters, etc. (for projected)
• Spatial domain
  – The allowable coordinate range for the geographic coordinates
    • X/Y Domain: MinX, MaxX, MinY, MaxY (horizontal extent)
      – Domain defaults to 3 times the actual data extent (100% on either side)
    • Z Domain: Min, Max (vertical extent)
    • M Domain: Min, Max (other parameter, e.g. distance from river mouth) (can differ within a feature dataset)
  – Once created, the spatial domain for a feature dataset/class cannot be changed.
  – Data outside the domain will require a new feature dataset or standalone feature class.
• Precision
  – Number of system storage units (SU) per one map measurement unit (MU)
    • If precision is 1 and MU = 1 meter (1 SU per MU), cannot record values less than 1 meter
    • If precision is 100 and MU = 1 meter (100 SUs per MU), can record values to 1/100 = .01 m = 1 cm


Precision and Spatial Domain
Wilson NC data (WilsonCity.shp)

[Figure: the shapefile extent (range of the actual data when the fds was created) drawn inside the larger geodatabase spatial domain]
         | Domain min | Extent min | Extent max | Domain max | Extent range | Margin each side | Domain range
X values | 2249613    | 2302168    | 2334496    | 2387052    | 32,328       | 52,555           | 137,438
Y values | 656268     | 704002     | 745972     | 793707     | 41,970       | 47,734           | 137,438

Geodatabase Spatial Domain
• 100%-200% wider than the extent on all sides, therefore the domain range is at least 3 times the extent range (exact amount depends on precision)

Precision is: 15,624
• Since map units are feet, this supports accuracy to 1/1,302 of an inch (15,624 / 12 = 1,302)!!!

-- Geodatabase coordinates are stored as 4-byte long integers. This provides 10 significant digits with a max value of 2,147,483,647.
-- The map value is multiplied by the precision when stored (and converted back when displayed), so a raw Y value of 793,707, for example, would become 793,707 x 15,624 = 12,400,878,168, which is too large to store.
-- The values are therefore also shifted when stored so that the data is centered in the storage space, so you only have to ensure that the max. range times the precision is less than 2,147,483,648: 137,438 x 15,624 = 2,147,331,312 < 2,147,483,648 OK; otherwise reduce the domain or the precision.

You have 10 significant digits to work with. Precision in essence controls where you put the decimal. If the map unit is meters and the precision is 1000, you record down to the nearest millimeter. E.g. the map value 1,123,456.1236 is stored as 1,123,456,124.

With GRS80, the world circumference is 40,075,016 m, therefore you can map the world at approx. 1.9 cm accuracy: (40,075,016 x 100) / 2,147,483,648
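
The arithmetic on this slide can be checked with a short Python sketch. The function names are hypothetical, and shifting by the domain minimum is a simplifying assumption: the slide only says that values are shifted so the data is centered in the storage space.

```python
MAX_STORAGE = 2**31 - 1  # largest value of a signed 4-byte integer


def fits(domain_range: float, precision: int) -> bool:
    """True if the whole domain can be stored at this precision."""
    return domain_range * precision <= MAX_STORAGE


def to_storage(value: float, domain_min: float, precision: int) -> int:
    """Shift a map coordinate by the domain minimum and scale to storage units."""
    return round((value - domain_min) * precision)


def from_storage(units: int, domain_min: float, precision: int) -> float:
    """Convert storage units back to a map coordinate."""
    return domain_min + units / precision


PRECISION = 15_624     # storage units per foot (Wilson NC example)
Y_DOMAIN_MIN = 656_268

print(137_438 * PRECISION)         # domain range in storage units
print(fits(137_438, PRECISION))    # whole domain fits in 4 bytes
print(793_707 * PRECISION > MAX_STORAGE)  # unshifted value would not fit
print(PRECISION / 12)              # storage units per inch

stored = to_storage(704_002.5, Y_DOMAIN_MIN, PRECISION)
print(stored, from_storage(stored, Y_DOMAIN_MIN, PRECISION))

# Metric example from the slide: precision 1000 gives millimeters.
print(round(1_123_456.1236 * 1000))
```

If `fits` returns False for a planned dataset, the choices are the ones the slide names: reduce the domain or reduce the precision.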


Personal versus MultiUser (ArcSDE) Geodatabase
• Personal Geodatabase
  – Implemented as a Microsoft Access database (*.mdb file) using the MS Jet engine, which is installed along with ArcGIS 8.
  – A Microsoft Access license is not needed, but it’s handy to have for attribute data development
  – Can be placed on local or network drives
  – If on a network drive and a user has edit access, other users can’t access it (single user editing).
  – Intended for personal or small work-group use
  – Can handle small to moderately sized datasets
    • Max of 250,000 features per feature class (table)
    • Maximum size is 2.0 GB
  – In general, has the full functionality of an ArcSDE geodatabase except
    • Versioning
    • Multi-user editing
  – If a personal gdb is deleted in ArcCatalog (or by Windows Explorer), it’s gone.
    • One .mdb file can contain a lot of data. Be careful!


Personal versus MultiUser (ArcSDE) Geodatabase
• Enterprise Multiuser ArcSDE Geodatabase
  – ArcSDE is a data access extension to ArcGIS 8 & 9 that serves geodatabases to ArcGIS applications running on PCs connected via a TCP/IP network.
  – Supports concurrent editing by multiple users.
    • Supports versioning, where multiple users can concurrently edit different versions of a layer and any conflicts are resolved when versions are saved back to the original layer
  – Significantly higher speeds for data access than shapefiles or personal geodatabases
  – Supports very large databases without the need to tile or otherwise ‘subdivide’ the data
  – ArcCatalog only creates and deletes connections to ArcSDE geodatabases; it can’t delete the database
  – Can be deployed on UNIX or Windows NT.
    • Many use a UNIX platform for ArcSDE and the DBMS, and XP for GIS applications
  – ArcSDE is centrally tuned and managed by a DBA.
  – Back-up and security procedures implemented in the DBMS apply to the GIS data.
  – Can build SQL applications to access tables in a remote geodatabase.
  – Requires a server with a DBMS (e.g. Oracle, SQL Server) and ArcSDE.

[Figure: GIS user connected to a server hosting SDE and the database]


Generating Geodatabase Schema
• “Schema” is the definition of the objects contained within a database
  – For a geodatabase, objects may include Domains, Tables, FeatureClasses, Relationships or GeometricNetworks.
• Four solutions for schema generation/management in ArcGIS 9
  1. Manual creation using ArcCatalog, or
  2. Design from scratch in Microsoft Visio, output to an XMI* Repository, and use ArcCatalog's Case Tool to import from XMI
  3. As above, but begin with one of ESRI’s existing Sample Data Models**, and edit
  4. GeodatabaseDesigner extension (free ArcScript downloadable from www.esri.com), which can be used in conjunction with the first 2 or, in some cases, as an alternative.

*XMI is a new standard for storing object models. Supported by Visio 2002 and later.

**ESRI has sample data models for a variety of areas:
-- hydro, parcel, transportation, utilities, etc.
-- download from Web site


Pros and Cons
• ArcCatalog Menu and Wizards
  – Simple and easy to use; documented within ArcCatalog
  – Easy to introduce gross errors in the schema through either omission or addition
• Visio (UML) and the ArcCatalog Case Tool
  – UML (Unified Modeling Language) is a standard for modeling objects and their properties, thus can be applied to other databases
  – Object inheritance significantly reduces duplication
  – Uses Visio's strong graphical functionality for an easy way to visualize the design
  – However,
    • Need to buy it
    • More complex, with a steep initial learning curve
    • Only supports a subset of geodatabase properties
• Geodatabase Designer (GD)
  – Fast and free to use and distribute (although not officially supported)
  – Supports all geodatabase properties and all ArcSDE RDBMSs
  – The only bi-directional solution: schema can be EXPORTED and IMPORTED
  – However,
    • A proprietary solution which only works with the ESRI geodatabase
    • Only displays in HTML text, but can use Geodatabase Diagrammer (another free extension) to display in Visio


Database Normalization
• An important step in the physical design is database normalization
• Developing a table structure which:
  – Reduces or eliminates redundancy
  – Makes tables easy to manage
  – Simplifies changes in the future
• There is an entire theory of database normalization
  – We don’t have time to go into it
• Just present an example
  – The usual goal is to create a table structure which is in 3rd normal form (3NF)


Unnormalized (flat file)
Parcel_ID | Parcel_ad | Block | Precinct | Councillor | City | Mayor | Own1_name | Own1_ad | Own2_name | Own2_ad | Value
8  | 501 Sadowski | 1  | 1001 | Smith  | Big    | Green | Sadowski, M | 501 Sadowski |             |               | 105,450
9  | 590 Sadowski | 2  | 1002 | Jones  | Big    | Green | Adams, K    | 590 Sadowski | Adams, M    | 590 Sadowski  | 89,780
36 | 1001 Adnan   | 4  | 1002 | Jones  | Big    | Green | Sadowski, M | 501 Sadowski |             |               | 101,500
75 | 1175 Dadley  | 12 | 1004 | Hassan | Little | White | Kroeger     | 592 Tierney  | Bertrand, K | 1097 Bertrand | 98,000

3rd Normal form:
-- all fields are determined by the primary key field
(the first column of each table below is its key field)

OWNER TABLE
Owner_ID | Owner_name | Owner_ad
001 | Sadowski, M | 501 Sadowski
002 | Adams, K    | 590 Sadowski
004 | Kroeger     | 592 Tierney
003 | Adams, M    | 590 Sadowski
005 | Bertrand, K | 1097 Bertrand

PARCEL TABLE
Parcel_ID | street_no | street_name | Block | Owner_ID | Value
8  | 501  | Sadowski | 1  | 001 | 105,450
9  | 590  | Sadowski | 2  | 002 | 89,780
36 | 1001 | Adnan    | 4  | 001 | 101,500
75 | 1175 | Dadley   | 12 | 004 | 98,000
9  | 590  | Sadowski | 2  | 003 | 89,780
75 | 1175 | Dadley   | 12 | 005 | 98,000

COUNCILOR TABLE
Precinct | Councillor | City
1001 | Smith  | Big
1002 | Jones  | Big
1004 | Hassan | Little

MAYOR TABLE
City | Mayor
Big    | Green
Little | White

See: Appendix II for more detail


Data Importing Vs. Data Loading
• Importing
  – Creates new features within a new feature class or geodatabase table
    • The feature class or table cannot exist before importing
  – Database schema is imported at the same time
  – Often involves conversion from other formats, e.g. coverages
• Loading
  – Appends features into an existing feature class
  – Existing feature class must have the same schema as the data sources
  – Can be accomplished with:
    • Simple Data Loader (ArcCatalog)
    • Object Loader Wizard (ArcMap)


Conclusion
• The outcome of these steps is:
  – A rigorous design for our database: a “database schema”
  – The design of a process for obtaining the data elements that will populate our database schema
    • Identifying a data source and the necessary processing sequence for each layer
    • Covered in the Implementation Steps lecture

Next time, we will go into the lab and look at some of this in practice. This will involve many ESRI-specific design decisions, as outlined in: dbdecisions.ppt


Appendix I

DBMS Relational Operators

“for regular (non-spatial) relationships (in the ER Diagram or UML model), physical database design involves identifying which of the RDBMS’s normal query structures or relational operators will handle the relationship”


RDBMS: Relational Operators
• Select (or Restrict)
  – Retrieves a subset of rows from a table based on value(s) in a column or columns
• Project
  – Retrieves a subset of columns from a table, removing duplicates from the result
• Product
  – Produces the set of all rows that are the concatenation of a row from one relational table with a row from another relational table
  – (Usually an intermediate step; not useful otherwise)
• Join
  – Horizontally combines (concatenates) rows in one table with rows in another (or the same) table, including only rows which meet some selection criteria relating columns of the two tables
  – Combines product and select
• Union
  – Vertically combines (stacks) rows of one table with rows in the same or a different table
• Intersection
  – Results in rows common to two (or more) relational tables
• Difference
  – Results in rows that appear in one table but not another
• Division
  – Results in common values in one table for which there are other matching column values corresponding to every row in another table

Examples follow in the next three slides.


PRODUCT of STORE and SUPPLIER Tables
Supplier_Name | Flavor | Store_Name | Location
Mr Chip    | vanilla        | Mr Chip      | New York
Mr Chip    | vanilla        | Bean, Inc.   | New York
Mr Chip    | vanilla        | Mrs Mousse's | New Jersey
Mr Chip    | vanilla        | Mr Mousse's  | New Jersey
Mr Chip    | vanilla        | Sancho's     | California
Mr Chip    | vanilla        | Diet-Crème   | Colorado
Mr Chip    | vanilla        | Freeze-it    | Alaska
Mr Chip    | Chocolate      | Mr Chip      | New York
Mr Chip    | Chocolate      | Bean, Inc.   | New York
Mr Chip    | Chocolate      | Mrs Mousse's | New Jersey
Mr Chip    | Chocolate      | Mr Mousse's  | New Jersey
Mr Chip    | Chocolate      | Sancho's     | California
Mr Chip    | Chocolate      | Diet-Crème   | Colorado
Mr Chip    | Chocolate      | Freeze-it    | Alaska
Mrs Chip   | Avocado        | Mr Chip      | New York
Mrs Chip   | Avocado        | Bean, Inc.   | New York
Mrs Chip   | Avocado        | Mrs Mousse's | New Jersey
Mrs Chip   | Avocado        | Mr Mousse's  | New Jersey
Mrs Chip   | Avocado        | Sancho's     | California
Mrs Chip   | Avocado        | Diet-Crème   | Colorado
Mrs Chip   | Avocado        | Freeze-it    | Alaska
Mrs Chip   | date-nut       | Mr Chip      | New York
Mrs Chip   | date-nut       | Bean, Inc.   | New York
Mrs Chip   | date-nut       | Mrs Mousse's | New Jersey
Mrs Chip   | date-nut       | Mr Mousse's  | New Jersey
Mrs Chip   | date-nut       | Sancho's     | California
Mrs Chip   | date-nut       | Diet-Crème   | Colorado
Mrs Chip   | date-nut       | Freeze-it    | Alaska
Diet-Crème | cottage cheese | Mr Chip      | New York
Diet-Crème | cottage cheese | Bean, Inc.   | New York
Diet-Crème | cottage cheese | Mrs Mousse's | New Jersey
Diet-Crème | cottage cheese | Mr Mousse's  | New Jersey
Diet-Crème | cottage cheese | Sancho's     | California
Diet-Crème | cottage cheese | Diet-Crème   | Colorado
Diet-Crème | cottage cheese | Freeze-it    | Alaska
Diet-Crème | skim milk      | Mr Chip      | New York
Diet-Crème | skim milk      | Bean, Inc.   | New York
Diet-Crème | skim milk      | Mrs Mousse's | New Jersey
Diet-Crème | skim milk      | Mr Mousse's  | New Jersey
Diet-Crème | skim milk      | Sancho's     | California
Diet-Crème | skim milk      | Diet-Crème   | Colorado
Diet-Crème | skim milk      | Freeze-it    | Alaska

Base Tables

STORE TABLE
Store_Name | Location
Mr Chip      | New York
Bean, Inc.   | New York
Mrs Mousse's | New Jersey
Mr Mousse's  | New Jersey
Sancho's     | California
Diet-Crème   | Colorado
Freeze-it    | Alaska

SUPPLIER TABLE
Supplier_Name | Flavor
Mr Chip    | vanilla
Mr Chip    | Chocolate
Mrs Chip   | avocado
Mrs Chip   | date-nut
Diet-Crème | cottage cheese
Diet-Crème | skim milk

SELECT (RESTRICT) of Store Table on Location = New Jersey
Store_Name | Location
Mrs Mousse's | New Jersey
Mr Mousse's  | New Jersey

PROJECT of Store table on Location column
Location
New York
New Jersey
California
Colorado
Alaska

JOIN Store Table and Supplier Table by vendor name (same as Select from Product when supplier_name = store_name)
Supplier_Name | Flavor | Store_Name | Location
Mr Chip    | vanilla        | Mr Chip    | New York
Mr Chip    | Chocolate      | Mr Chip    | New York
Diet-Crème | cottage cheese | Diet-Crème | Colorado
Diet-Crème | skim milk      | Diet-Crème | Colorado
(note: this is an equi-join; there are other types)
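
The select, project, product, and join operators can be imitated on the STORE and SUPPLIER base tables with a few lines of Python. This is a sketch using lists of tuples, not how an RDBMS executes these operators; the variable names are assumptions, and the obvious spelling slips in the slide data are corrected.

```python
from itertools import product

store = [("Mr Chip", "New York"), ("Bean, Inc.", "New York"),
         ("Mrs Mousse's", "New Jersey"), ("Mr Mousse's", "New Jersey"),
         ("Sancho's", "California"), ("Diet-Crème", "Colorado"),
         ("Freeze-it", "Alaska")]

supplier = [("Mr Chip", "vanilla"), ("Mr Chip", "chocolate"),
            ("Mrs Chip", "avocado"), ("Mrs Chip", "date-nut"),
            ("Diet-Crème", "cottage cheese"), ("Diet-Crème", "skim milk")]

# SELECT (restrict): keep the rows that satisfy a condition.
select_nj = [row for row in store if row[1] == "New Jersey"]

# PROJECT: keep one column and remove duplicates (order preserved).
project_location = list(dict.fromkeys(location for _, location in store))

# PRODUCT: every supplier row concatenated with every store row.
product_rows = [s + t for s, t in product(supplier, store)]

# JOIN (equi-join): product followed by select on matching names.
join_rows = [row for row in product_rows if row[0] == row[2]]

print(len(product_rows))
print(select_nj)
print(project_location)
for row in join_rows:
    print(row)
```

With the base tables as listed, both Diet-Crème flavors match the Diet-Crème store, so the equi-join returns one row per matching supplier row.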



Base Tables

EUROPE_STORE Table
Store_Name | Location | Status
Sir Chip      | London | open
Monsieur Chip | Paris  | closed
Senor Chip    | Madrid | open

STORE TABLE
Store_Name | Location
Mr Chip      | New York
Bean, Inc.   | New York
Mrs Mousse's | New Jersey
Mr Mousse's  | New Jersey
Sancho's     | California
Diet-Crème   | Colorado
Freeze-it    | Alaska

MY_FAVORITES Table
Store_Name | Location
Comrade Chip | Moscow
Sir Chip     | London
Herr Chip    | Berlin

UNION of STORE and EUROPE_STORE Tables (duplicates removed, if any)
Store_Name | Location | Status
Mr Chip       | New York   | null
Bean, Inc.    | New York   | null
Mrs Mousse's  | New Jersey | null
Mr Mousse's   | New Jersey | null
Sancho's      | California | null
Diet-Crème    | Colorado   | null
Freeze-it     | Alaska     | null
Sir Chip      | London     | open
Monsieur Chip | Paris      | closed
Senor Chip    | Madrid     | open

INTERSECTION of MY_FAVORITES and EUROPE_STORE Tables
Store_Name | Location | Status
Sir Chip | London | open

DIFFERENCE of MY_FAVORITES and EUROPE_STORE
Store_Name | Location
Comrade Chip | Moscow
Herr Chip    | Berlin

DIFFERENCE of EUROPE_STORE and MY_FAVORITES
Store_Name | Location | Status
Monsieur Chip | Paris  | closed
Senor Chip    | Madrid | open
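
Union, intersection, and difference map directly onto Python set operations. In this sketch each table is a set of (Store_Name, Location) tuples; dropping the Status column of EUROPE_STORE so that the tables share the same columns is a simplification made for the example (the slide instead pads the missing Status with null).

```python
store = {("Mr Chip", "New York"), ("Bean, Inc.", "New York"),
         ("Mrs Mousse's", "New Jersey"), ("Mr Mousse's", "New Jersey"),
         ("Sancho's", "California"), ("Diet-Crème", "Colorado"),
         ("Freeze-it", "Alaska")}

# EUROPE_STORE projected onto Store_Name and Location (Status dropped).
europe_store = {("Sir Chip", "London"), ("Monsieur Chip", "Paris"),
                ("Senor Chip", "Madrid")}

my_favorites = {("Comrade Chip", "Moscow"), ("Sir Chip", "London"),
                ("Herr Chip", "Berlin")}

union_rows = store | europe_store            # stack rows, duplicates removed
common_rows = my_favorites & europe_store    # rows present in both tables
only_favorites = my_favorites - europe_store
only_europe = europe_store - my_favorites

print(len(union_rows))
print(common_rows)
print(sorted(only_favorites))
print(sorted(only_europe))
```

Difference is not symmetric: the two calls at the end return different rows, matching the two DIFFERENCE results shown on the slide.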



Terri’s job function is to check whether employees of Big City X have taken the required courses in the city’s employee training program. The process is to compare courses taken with required courses. Terri can be replaced by an RDBMS Division relational operator!

Base Tables

Completed Course Table
Student ID# | S_Name | Course# | C_Name
10 | Fred  | mis101 | data
10 | Fred  | gis101 | gis
10 | Fred  | mis201 | program
10 | Fred  | mis301 | networks
30 | Karen | mis101 | data
20 | John  | gis201 | gps
30 | Karen | gis101 | gis
30 | Karen | mis301 | networks
30 | Karen | mis201 | program

Required Course Table
Course# | C_Name
mis101 | data
mis201 | program
mis301 | networks

DIVISION of Completed_Course Table by Required_Course Table
Student ID# | S_Name
10 | Fred
30 | Karen

The division operator identifies Fred and Karen
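
Relational division can be sketched in Python by collecting each student's completed courses and keeping the students whose set contains every required course. The function name `divide` is hypothetical; the rows are those of the two base tables.

```python
completed = [(10, "Fred", "mis101"), (10, "Fred", "gis101"),
             (10, "Fred", "mis201"), (10, "Fred", "mis301"),
             (30, "Karen", "mis101"), (20, "John", "gis201"),
             (30, "Karen", "gis101"), (30, "Karen", "mis301"),
             (30, "Karen", "mis201")]

required = {"mis101", "mis201", "mis301"}


def divide(completed_rows, required_courses):
    """Return the students who completed every required course."""
    taken = {}
    for student_id, name, course in completed_rows:
        taken.setdefault((student_id, name), set()).add(course)
    # A student qualifies when the required set is a subset of courses taken.
    return sorted(student for student, courses in taken.items()
                  if required_courses <= courses)


print(divide(completed, required))
```

John is excluded because he has none of the required courses; extra courses such as gis101 do not affect the result.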


Appendix II
Database Normalization Detail

• Developing a table structure which:
  – Reduces or eliminates redundancy
  – Makes tables easy to manage
  – Simplifies changes in the future
• Our goal is normally to have all tables in third normal form (3NF)


Unnormalized Data (Flat File)

Unnormalized form (flat file)
Parcel_ID | Parcel_ad | Block | Precinct | Councillor | City | Mayor | Own1_name | Own1_ad | Own2_name | Own2_ad | Value
8  | 501 Sadowski | 1  | 1001 | Smith  | Big    | Green | Sadowski, M | 501 Sadowski |             |               | 105,450
9  | 590 Sadowski | 2  | 1002 | Jones  | Big    | Green | Adams, K    | 590 Sadowski | Adams, M    | 590 Sadowski  | 89,780
36 | 1001 Adnan   | 4  | 1002 | Jones  | Big    | Green | Sadowski, M | 501 Sadowski |             |               | 101,500
75 | 1175 Dadley  | 12 | 1004 | Hassan | Little | White | Kroeger     | 592 Tierney  | Bertrand, K | 1097 Bertrand | 98,000

Problems pointed to in the table:
-- Own1_name through Own2_ad: repeating groups of fields. What if there are 3 (or 25) owners?
-- Parcel_ad: not the smallest meaningful value. How can you sort by street, then number?

• You work for the county. In this particular state, the county records land ownership, values property, and manages all elections held in the county.
• Some of the information you need is shown in the flat file above
• This format has many problems, some of which are pointed to above


First Normal Form (1NF)
• Each field contains the smallest meaningful value
  – Parcel_ad is split into two variables (street_no & street_name), thus can sort on street, then number
  – Owner_ad is left as a complex attribute because it is only used for mailing
• No repeating fields (owner1, owner2, etc.)
  – There is now no limit on the number of owners per parcel

Parcel_ID | street_no | street_name | Block | Precinct | Councillor | City | Mayor | Owner_ID | Owner_name | Owner_ad | Value
8  | 501  | Sadowski | 1  | 1001 | Smith  | Big    | Green | 001 | Sadowski, M | 501 Sadowski  | 105,450
9  | 590  | Sadowski | 2  | 1002 | Jones  | Big    | Green | 002 | Adams, K    | 590 Sadowski  | 89,780
36 | 1001 | Adnan    | 4  | 1002 | Jones  | Big    | Green | 001 | Sadowski, M | 501 Sadowski  | 101,500
75 | 1175 | Dadley   | 12 | 1004 | Hassan | Little | White | 004 | Kroeger     | 592 Tierney   | 98,000
9  | 590  | Sadowski | 2  | 1002 | Jones  | Big    | Green | 003 | Adams, M    | 590 Sadowski  | 89,780
75 | 1175 | Dadley   | 12 | 1004 | Hassan | Little | White | 005 | Bertrand, K | 1097 Bertrand | 98,000

However, there are problems in that:
• Must use multiple primary key fields (parcel_id and owner_id) to uniquely identify a record
• Multiple repeating values when there are two (or more) owners: street_no, street_name, block, precinct, councillor, mayor, city, Owner_name, Owner_ad all have repeats. This wastes space, and:
  – If an owner’s address changes, multiple records (rows) must be updated
  – If a parcel is sold, and the owner does not own any other property (e.g. Kroeger, Adams, M or Bertrand), information about that owner is lost
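
The move from the flat file to first normal form can be sketched in Python. Only the parcel address and owner columns are carried here to keep the example short; the function name `to_1nf` and the rule of numbering owners in the order first seen are assumptions made for the illustration.

```python
# (Parcel_ID, Parcel_ad, Own1_name, Own1_ad, Own2_name, Own2_ad)
flat_rows = [
    (8, "501 Sadowski", "Sadowski, M", "501 Sadowski", None, None),
    (9, "590 Sadowski", "Adams, K", "590 Sadowski", "Adams, M", "590 Sadowski"),
    (36, "1001 Adnan", "Sadowski, M", "501 Sadowski", None, None),
    (75, "1175 Dadley", "Kroeger", "592 Tierney", "Bertrand, K", "1097 Bertrand"),
]


def to_1nf(rows):
    """Split the address and emit one row per (parcel, owner) pair."""
    owner_ids = {}  # (name, address) -> Owner_ID, numbered as first seen
    result = []
    for parcel_id, parcel_ad, own1, ad1, own2, ad2 in rows:
        # Smallest meaningful values: street number and street name.
        street_no, street_name = parcel_ad.split(" ", 1)
        # Repeating owner fields become separate rows.
        for name, address in ((own1, ad1), (own2, ad2)):
            if name is None:
                continue
            owner_id = owner_ids.setdefault(
                (name, address), f"{len(owner_ids) + 1:03d}")
            result.append((parcel_id, street_no, street_name,
                           owner_id, name, address))
    return result


for row in to_1nf(flat_rows):
    print(row)
```

The output has six rows for four parcels, and the parcel values repeat for parcels 9 and 75, which is the redundancy that second normal form then removes.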


Second Normal Form (2NF): concept of 2NF and problem with 1NF table

• Second Normal Form (2NF) requires that every non-key field (attribute) be “functionally dependent” on the primary key
  – Functional dependency is a relationship between attributes such that knowing one attribute automatically determines the other
• Tables with multiple fields making up the primary key are often not 2NF, because a non-key field may depend on only part of the key
  – This usually shows up as repeating values in attribute fields
  – For example, owner_ID repeats, and knowing the owner does not determine the councilor
    • Knowing that owner_ID (part of the primary key) is 001 does not determine Councilor, which could be Jones or Smith.

(1NF table: the same table shown under First Normal Form above, with one row per parcel and owner pair)


Second Normal Form (2NF): example of 2NF form
(2NF tables; the first column of each table is its key field)

PRECINCT TABLE
Precinct | Councilor | City | Mayor
1001 | Smith  | Big    | Green
1002 | Jones  | Big    | Green
1004 | Hassan | Little | White

OWNER TABLE
Owner_ID | Owner_name | Owner_ad
001 | Sadowski, M | 501 Sadowski
002 | Adams, K    | 590 Sadowski
004 | Kroeger     | 592 Tierney
003 | Adams, M    | 590 Sadowski
005 | Bertrand, K | 1097 Bertrand

PARCEL TABLE
Parcel_ID | street_no | street_name | Block | Precinct | Owner_ID | Value
8  | 501  | Sadowski | 1  | 1001 | 001 | 105,450
9  | 590  | Sadowski | 2  | 1002 | 002 | 89,780
36 | 1001 | Adnan    | 4  | 1002 | 001 | 101,500
75 | 1175 | Dadley   | 12 | 1004 | 004 | 98,000
9  | 590  | Sadowski | 2  | 1002 | 003 | 89,780

• In each table, there is only one key field, and knowing its value determines the value of all other attributes
  – Satisfies criteria for 2NF
  – Far fewer repeats and duplicate editing problems
• Note that there are still shortcomings, for example
  – If the mayor of city “Big” changes, we must update two records


Third Normal Form (3NF)
• 3NF requires that no non-key field be a fact about another non-key field
  – This will be violated if there is a transitive dependency in a table
    • Transitive functional dependency occurs when the value in a non-key field is determined by a value in another non-key field
    • The value for city determines mayor (and neither of these is the key field)
• In 3NF, fields can only be attributes of the primary key, and not of some other field
  – Tables not in 3NF usually have repeating values in a non-key field (e.g. the mayor field in the PRECINCT table)
  – Mayor ‘Green’ is a fact about city (a non-key field), not about precinct (the key field)

PRECINCT TABLE (2NF)
Precinct | Councilor | City | Mayor
1001 | Smith  | Big    | Green
1002 | Jones  | Big    | Green
1004 | Hassan | Little | White

COUNCILOR TABLE (3NF)
Precinct | Councillor | City
1001 | Smith  | Big
1002 | Jones  | Big
1004 | Hassan | Little

MAYOR TABLE (3NF)
City | Mayor
Big    | Green
Little | White

The Precinct table (in 2NF) is split into Councilor and Mayor tables in 3NF
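
The split of the 2NF Precinct table into 3NF tables can be sketched in Python. The variable and function names are hypothetical; the check inside the loop confirms that city really does determine mayor before the Mayor table is built.

```python
# PRECINCT TABLE (2NF): (Precinct, Councilor, City, Mayor)
precinct_2nf = [
    (1001, "Smith", "Big", "Green"),
    (1002, "Jones", "Big", "Green"),
    (1004, "Hassan", "Little", "White"),
]

# COUNCILOR TABLE (3NF): drop the field that depends on City, not on Precinct.
councilor_3nf = [(precinct, councilor, city)
                 for precinct, councilor, city, _ in precinct_2nf]

# MAYOR TABLE (3NF): one row per city.
mayor_3nf = {}
for _, _, city, mayor in precinct_2nf:
    # setdefault returns the mayor already recorded for this city, if any.
    if mayor_3nf.setdefault(city, mayor) != mayor:
        raise ValueError(f"city {city!r} does not determine a single mayor")


def rejoin(councilors, mayors):
    """Join the two 3NF tables on City to rebuild the 2NF rows."""
    return [(precinct, councilor, city, mayors[city])
            for precinct, councilor, city in councilors]


print(mayor_3nf)
print(rejoin(councilor_3nf, mayor_3nf) == precinct_2nf)

# A change of mayor for city "Big" now touches one record, not two.
mayor_3nf["Big"] = "Brown"  # hypothetical new mayor
print(rejoin(councilor_3nf, mayor_3nf))
```

No information is lost by the split: joining the Councilor and Mayor tables on City reproduces the original Precinct table.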


4th and 5th Normal Form
• Our goal is usually to have all tables in at least 3rd normal form
• 4th and 5th normal forms also exist, but these can add disadvantages (for example, processing inefficiency) as well as advantages
• For example, 5th normal form has no duplicated data, but requires junction tables to link data and form relationships

PRECINCT TABLE (5NF)
Precinct | Voter_count
1001 | 3245
1002 | 5600
1004 | 2001
1005 | 750

COUNCILOR TABLE (5NF)
Councilor | Age
Smith  | 27
Jones  | 85
Hassan | 43

CITY TABLE (5NF)
City | Mayor | Population
Big    | Green | 500,000
Little | White | 5,000

Junction Tables

Precinct | Councilor
1001 | Smith
1002 | Jones
1004 | Hassan
1005 | Hassan

Councilor | City
Smith  | Big
Jones  | Big
Hassan | Little