unit4 database

59
Wang,X., ENGO351 Unit 5: Data Management Unit 4: Data Management http://www.datagovernance.com/

Upload: kim-kishi-sena

Post on 18-Dec-2015

312 views

Category:

Documents


3 download

DESCRIPTION

gis

TRANSCRIPT

  • Wang,X., ENGO351

    Unit 5: Data Management

    Unit 4: Data Management

    htt

    p:/

    /ww

    w.d

    ata

    go

    ve

    rna

    nce

    .co

    m/

  • Wang,X., ENGO351

    Unit 5: Data Management

    Data management: Outline

    Definition of a database

    Database management systems

    Conceptual data modelling

    The entity-relationship model

    Logical data modelling

    The relational database model

    The standard query language (SQL)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Database

    A database is a logical, structured collection of data about

    things, their attributes and their relationships to each other

    A database is a collection of related data.

    Within a GIS, these things and relationships have a

    spatial component

  • Wang,X., ENGO351

    Unit 5: Data Management

    Spatial Data Examples

    Examples of non-spatial data

    Names, UCID, grades of a student

    Examples of Spatial data

    Census Data

    NASA satellites imagery - terabytes of data per day

    Weather and Climate Data

    Rivers, Farms

    Medical Imaging

    Exercise: Identify spatial and non-spatial data items in

    A phone book

    A cookbook with recipes

  • Wang,X., ENGO351

    Unit 5: Data Management

    Why use database?

    Data Redundancy and Inconsistency - The same piece of information may be duplicated in several different files.

    Difficulty in Accessing Data: Conventional file processing environments do not allow data to be retrieved in a convenient and efficient manner.

    Data Isolation: Data is scattered in a number of different files. Linking these data to provide new information may require a substantial amount of programming to answer each specific query.

    Concurrent Access Problems: Multiple users attempting to access the same file may lead to problems.

    Security Problems: Certain users should not be allowed to access certain data-sets and not be permitted to edit certain data that they do not own.

    Integrity Problems: Some data values should satisfy certain consistency constraints.

  • Wang,X., ENGO351

    Unit 5: Data Management

    Database management system

    A database management system (DBMS) is a collection

    of software programs that facilitates the processes of

    defining, construction, and manipulating the database for

    various applications

    A DBMS provides tools for data input, search, retrieval,

    manipulation, and output

    Most commercial GIS include database management

    tools for local databases

    ArcGIS uses Microsoft Access

  • Wang,X., ENGO351

    Unit 5: Data Management

    Data model

    A data model is a description or view of the real world and

    data modeling is the process that formalizes the description

    or view at different levels of data abstraction

    The real-world is made up of complex spatial objects and

    phenomena

    A data model tends to be tailored to a specific application or

    problem context

    Different users may have different data models

  • Wang,X., ENGO351

    Unit 5: Data Management

    Data models and levels of abstraction

    Data modeling involves three steps corresponding to

    increasing levels of abstraction:

    conceptual modeling

    logical modeling

    physical modeling

  • Wang,X., ENGO351

    Unit 5: Data Management

    A conceptual model represents the users perception of the real world Data abstraction

    is strictly limited to the description of the information content of the users view of the real world, without any concern for computer implementation

    (Brown, 1997)

    Conceptual model (1)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Conceptual model (2)

    The function of a conceptual data model is to provide the necessary language to describe how we naturally conceptualize data organization Although conceptual data modeling is part of the database

    design process, conceptual models are database-independent

    A conceptual model provides a basic reference for users who need to understand the structure of the data in the system

  • Wang,X., ENGO351

    Unit 5: Data Management

    Conceptual model (3)

    A conceptual data model provides a way to communicate between users, designers and computers

    (Worboys, 2004)

  • Wang,X., ENGO351

    Unit 5: Data Management

    The entity-relationship model

    The entity-relationship (E-R) data model is one of the most commonly-used conceptual data models for GIS data modelling

    It is based on the concepts of:

    entities

    attributes

    relationships

    to represent real-world features, their properties and their relationships

    KimHighlight

  • Wang,X., ENGO351

    Unit 5: Data Management

    The entity-relationship diagram

    An entity-relationship diagram (E-R diagram) can be used to express the features and properties of an E-R model

    (Lo and Yeung, 2007)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Entity

    An entity is a thing or object in the real world that is distinguishable from all other objects

    each person in a classroom is an entity

    An entity has an independent existence

    physical existence: a person, a car, a forest

    conceptual existence: a company, a job, an agenda

    Entities can be thought of as nouns.

    Examples: a computer, an employee, a song, a mathematical theorem

    Entities are represented as rectangles.

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relationships between entities

    A relationship captures how two or more entities are related to one another. i.e. the relationship performs between the entity artist and song

    Relationships can be thought of as verbs, linking two or more nouns

    Relationships are represented as diamonds, connected by lines to each of the entities in the relationship.

  • Wang,X., ENGO351

    Unit 5: Data Management

    Attributes

    Entity and relations can have attributes. i.e. an employee entity might have an SIN attribute

    i.e. entity City has attribute types name, population, density

    The attributes are associated to each occurrence of the entity type

    Attributes are represented as ellipses connected to their

    owning entity sets by a line.

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relationship types (1)

    Relationship types are subdivided into three categories: one-to-one (1:1)

    many-to-one (N:1)

    many-to-many (M:N)

    Participation constraint: entity occurrence may only exist if it participates in a relationship. One can have total and partial participation constraints

    i.e. one director is the manager of at most one cinema (double lines means mandatory)

    (Worboys, 2004)

    A BDouble line m e ans B ca nnot exist

    without A. A c an exist without B.

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relationship types (2)

    In a many-to-one relationship, many occurrences of one entity type may have a relationship with at most one occurrence of another entity type i.e. many cinemas are located in a town

    (Worboys, 2004)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relationship types (3)

    In a many-to-many relationship, many occurrences of one entity type may have a relationship with many occurrences of another entity type i.e. many roads connect

    many cities

    (Worboys, 2004, p. 58)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Process of creating E-R model

    1. Start with textual description

    2. Identify entities (nouns)

    3. Tabulate entities

    4. Determine relationships

    5. Determine cardinality ratio/participation constraints

    6. Determine attributes & key attributes

  • Wang,X., ENGO351

    Unit 5: Data Management

    An Example: World Database

    Conceptual Model 3 Entities: Country, City, River

    2 Relationships: capital-of, originates-in

    Attributes listed in the figure

  • Wang,X., ENGO351

    Unit 5: Data Management

    Logical model (1)

    A logical data model represents the real world by means of diagrams, lists, and tables designed to reflect the recording of the data in terms of some formal language

    Logical models are software dependent; they must be expressed in terms of the language of a specific database management system (i.e. relational model)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Logical model (2)

    This figure illustrates the logical model that was created from the conceptual model previously shown

    (Brown, 1997)

  • Wang,X., ENGO351

    Unit 5: Data Management

    The relational database model (1)

    The relational database model,

    developed by Codd in the early 1970s, is

    the most widely used logical database

    model in the computer industry

    It represents the database as a collection

    of tables (simple files) called relations.

  • Wang,X., ENGO351

    Unit 5: Data Management

    The relational database model (2)

    (Aronoff, 1993)

  • Wang,X., ENGO351

    Unit 5: Data Management

    The relational database model (3)

    Each table in the

    database

    represents an entity

    type identified in

    the data modeling

    process

    (Brown, 1997)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Column and tuple

    In a particular table,

    each column represents an attribute

    each row, called a tuple (or record), represents a collection of

    data values associated to the occurrence of an entity

    (Lo a

    nd Y

    eung,

    2007)

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relation

    A relation can be simply thought of as a table of data

    A relation is made up of a set of tuples

    There can be any number of tuples in a relation

    Within a relation, the logical order of the tuples is not

    important

  • Wang,X., ENGO351

    Unit 5: Data Management

    Domain

    A domain is a set of attribute values associated to each

    column of a relation

    A domain is identified by an attribute name

    There are occasions when the values of some attributes

    within a particular tuple are unknown or missing

    a special value, called null (which is not zero), is assigned

  • Wang,X., ENGO351

    Unit 5: Data Management

    Primary and foreign keys (1)

    Tables, in a relational database, are connected to each

    other using keys

    A primary key represents one or more attributes (columns

    in the table) whose values can uniquely identify a record in

    a table

    Its counterpart in another table for the purpose of linkage

    is called a foreign key

    http://www.datagovernance.com/

  • Wang,X., ENGO351

    Unit 5: Data Management

    Primary and foreign keys (2)

    Thus, a key

    common to two

    tables are used to

    establish

    connections

    between

    corresponding

    records in the

    tables

  • Wang,X., ENGO351

    Unit 5: Data Management

    Primary and foreign keys (3)

    Keys are a simple

    way to connect

    tables within a

    relational database

    Collector

  • Wang,X., ENGO351

    Unit 5: Data Management

    Primary key

    Attribute values of the primary key allow users to identify

    individual tuples uniquely

    e.g. SIN, an identification number

    The primary key must not contain null values (Why?)

    What if cannot use of a certain attribute alone in a relation

    to identify the unique identification of a tuple?

    use one or more additional attributes to form a compound key

    create an additional attribute to hold a unique identifier for each

    occurrence of an entity

  • Wang,X., ENGO351

    Unit 5: Data Management

    Foreign key

    A foreign key is an attribute in a relation that is a primary

    key in another relation

    The identical values of the primary and the foreign keys

    make it possible to logically link different tuples in different

    relations

  • 35

    Join and Relate

    Two common operations for linking tables in a relational

    database are join and relate.

    A join operation brings together two tables by using a

    key that is common to both tables.

    A relate operation temporarily connects two tables but

    keeps the tables physically separate.

  • 36

    Figure 8.10 (Chang, 2012)

    Primary key and foreign key provide the linkage to join the table on the

    right to the feature attribute table on the left.

  • 37

    Figure 8.11 (Chang, 2012)

    This example of a many-to-one relationship in the SSURGO database

    relates three tree species in cotreestomng to the same soil component

    in component.

  • 38

    Figure 8.12 (Chang, 2012)

    This example of a one-to-many relationship in the SSURGO database

    relates one soil map unit in mapunit to two soil components in

    component.

  • Wang,X., ENGO351

    Unit 5: Data Management

    Normalisation

    Normalising is the splitting of the database into multiple tables.

    Two main reasons for normalising a database: prevents unnecessary duplication of data, thus conserving time

    and disk space, and in some cases, preventing errors.

    makes it easier to extract exactly the information from the database.

    Each normal form is built on the normal form before it. e.g. if a table is in 3-NF it is in 2_NF

    KimHighlight

    KimHighlight

  • Wang,X., ENGO351

    Unit 5: Data Management

    Normal Forms

    1-NF: attributes are atomic or single valued.

    2-NF: All non-primary attributes are fully dependent on the primary key; no non-primary attributes are

    functionally determined by a subset of the key

    3-NF: No non-prime attributes are functionally determined by another non-prime attribute; all non-

    primary attributes are directly dependent on the primary

    key.

    Example: A company runs many projects. Employees work on particular tasks on different projects. How do we invoice the time

    each employee allocates to their different tasks on each project?

  • Wang,X., ENGO351

    Unit 5: Data Management

    Guidelines for relational database design

    1. Create E-R model

    2. Produce table for each entity

    3. For 1:1 relationships,

    add foreign key to either participating entities

    For 1:M relationships,

    add foreign key to entity on "Many" side of relationship

    4. For N:M relationships, create a new table

    5. Check design is "normalised"

  • Wang,X., ENGO351

    Unit 5: Data Management

    Advantages of the relational model

    A relational database is simple and flexible

    Each table in the database can be prepared, maintained,

    and edited separately from other tables

    The tables can remain separate until a query or an

    analysis requires that attribute data from different tables be

    linked together

  • Wang,X., ENGO351

    Unit 5: Data Management

    Physical model

    A physical data model describes the physical storage of the data in the computer by record formats and access paths

    It is hardware dependent and is concerned primarily with the implementation details of a database

    It is intended for the system programmer and database manager, not the general user

    (Lo and Yeung, 2007)

    KimHighlight

  • Wang,X., ENGO351

    Unit 5: Data Management

    (Lo and Yeung, 2007)

  • Wang,X., ENGO351

    Unit 5: Data Management

    SQL

    SQL (Structured Query Language) is a data query

    language designed for relational databases

    SQL has been developed by IBM in the 1970s and many

    commercial database management systems such as

    Oracle, DB2, Access, and Microsoft SQL Server have since

    adopted the query language

    This is an English-like language that consists of a set of

    powerful and flexible commands for the manipulation of the

    data in the relational tables

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relational algebra

    The SQL language is made with a set of relational

    operators and commands that can be combined to query a

    database. These operators and commands form what is

    called relational algebra

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relational operators (1)

    With SQL, six logical operators can be used:

    equal to

    not equal to

    smaller than

    greater than

    smaller or equal to

    greater or equal to

  • Wang,X., ENGO351

    Unit 5: Data Management

    Relational operators (2)

    With SQL, three boolean

    operators can be used:

    NOT

    AND

    OR

    (Chang,

    2006)

    The shaded area represents:

    the complement of A (NOT)

    the union of A and B (OR)

    the intersection of A and B (AND)

  • Wang,X., ENGO351

    Unit 5: Data Management

    SQL commands (1)

    SQL is used to create and query the database using a set

    of SQL commands. e.g.

    the create command creates databases and tables

    the select command selects rows that have been inserted into the

    tables

    By using SQL commands, the user needs only to specify

    the tables, columns, and row qualifiers to retrieve any data

    item in the entire database

    the users do not need to know the technical details of how the data are stored

  • Wang,X., ENGO351

    Unit 5: Data Management

    SQL commands (2)

    SQL commands are used to perform two main functions:

    to define the database structure (data definition)

    e.g., CREATE

    to insert, modify, and retrieve data from the database (data

    manipulation)

    e.g. SELECT

  • Wang,X., ENGO351

    Unit 5: Data Management

    CREATE command

    To create a database:

    CREATE DATABASE database_name

    To create a table in a database:

    CREATE TABLE table_name (column_name1 data_type,

    column_name2 data_type, ....... )

    CREATE TABLE ADDRESS_BOOK (

    NAME CHAR (30)

    COMPANY CHAR (20)

    E-MAIL CHAR (25)

    )

    NAME COMPANY E-MAIL

    ADDRESS_BOOK

    KimSticky Note;

    KimSticky Note;

    KimSticky Note;

  • Wang,X., ENGO351

    Unit 5: Data Management

    INSERT and DELETE

    To insert new rows into a table:

    INSERT INTO table_name VALUES (value1, value2,....)

    INSERT INTO River (Name, Origin, Length) VALUES (Mississippi, USA, 6000)

    To delete rows in a table:

    DELETE FROM table_name WHERE column_name =

    some_value

    DELETE FROM City

    WHERE Country = Canada;

    NAME COMPANY E-MAIL

    John Smith Travelcity [email protected]

    INSERT INTO ADDRESS_BOOK (NAME, COMPANY, E-MAIL)

    VALUES (John Smith, Travelcity, [email protected]) DELETE FROM ADDRESS_BOOK WHERE COMPANY =

    Travelcity;

  • Wang,X., ENGO351

    Unit 5: Data Management

    SELECT command

    The select command extracts data items in specified

    rows of a table. It returns a new table that has a subset of

    tuples of the original.

    SELECT * FROM table_name

    SELECT column_name(s) FROM table_name

    SELECT column_name(s) FROM table_name(s) WHERE conditions

    GROUP BY column_name

  • Wang,X., ENGO351

    Unit 5: Data Management

    World database data tables

  • Wang,X., ENGO351

    Unit 5: Data Management

    SELECT Example 1.

    Simplest Query has SELECT and FROM clauses

    Query: List all the cities and the country they belong to.

    SELECT Name, Country

    FROM CITY

    Result

  • Wang,X., ENGO351

    Unit 5: Data Management

    SELECT Example 2.

    Commonly 3 clauses (SELECT, FROM, WHERE) are used

    Query: List the names of the capital cities in the CITY table.

    SELECT Name

    FROM CITY

    WHERE CAPITAL=Y

    SELECT *

    FROM CITY

    WHERE CAPITAL=Y

    Result

  • Wang,X., ENGO351

    Unit 5: Data Management

    SELECT Example 3

    Query: List the attributes of countries in the Country

    relation where the life-expectancy is less than seventy

    years.

    SELECT Co.Name,Co.Life-Exp

    FROM Country Co

    WHERE Co.Life-Exp

  • Wang,X., ENGO351

    Unit 5: Data Management

    Multi-table Query Examples

    Query: List the capital cities and populations of

    countries whose GDP exceeds one trillion dollars.

    Note:Tables City and Country are joined by matching City.Country = Country.Name.

    SELECT Ci.Name,Co.Pop

    FROM City Ci,Country Co

    WHERE Ci.Country =Co.Name

    AND Co.GDP >1000.0

    AND Ci.Capital=Y

  • Wang,X., ENGO351

    Unit 5: Data Management

    Multi-table Query Example

    Query: What is the name and population of the capital city

    in the country where the St. Lawrence River originates?

    SELECT Ci.Name, Ci.Pop

    FROM City Ci, Country Co, River R

    WHERE R.Origin =Co.Name

    AND Co.Name =Ci.Country

    AND R.Name =St.Lawrence AND Ci.Capital=Y