database management system (dbms) by: dr. mohamed yagoub mohamed e-mail: [email protected] url:
Post on 21-Dec-2015
222 views
TRANSCRIPT
Database Management System (DBMS)
By: Dr. Mohamed Yagoub Mohamed
E-mail: [email protected]
URL: http://www.angelfire.com/mo/yagoub
OverviewOverview
Why we need database
Database Concept
Types of database models
Relational database
DBMS
• DBMS is a collection of data (database) and programs to access that data. The goal of DBMS is to store, retrieve, and display information
• Key characteristics of DBMS are: performance, store large volume of database, share data (access), provide security (authorization), remove redundancy (normalization) and provide concurrent access (different users at the same time).
Why we need database?
Without database GIS is cartography
(electronic map)
No database No spatial analysis
Why we need database?
Principles of GIS
GIS
Hardware
Software
Information
People
Database and GIS
Spatial datalocation
AttributeDBMS
GIS database
Text, ImagesSound, movieMultimedia
Data abstraction
• Physical level: Describe how the data are actually store (word or bytes)
• Conceptual level: Describe what data are actually stored in the database (Structure). it gives Schematic representation of phenomena
• View level: Describe only a part of the entire database. Many users of the database may be concerned with a subset of information. The system may provide many views for the same database
Data abstraction-Continue
Conceptual level
Logical Level
Different users for the same database
Data abstraction
View Level
Instance of schemes
• The collection of database at a particular moment is called the instance of the database
• The overall design of the database is called the database scheme
Types of database models
Data model is a collection of conceptual tools for describing data, data relationship, data semantics, and consistency constraints. There are mainly three types of models
• Object-based logical models Are used to describe data at the
conceptual and view level. Example of these the Entity-Relationship model and object-oriented model
Data models- Continue
• Record-based logical models Are used to describe data at the conceptual and
view level. Example of these are: Network model, Hierarchical model, and relational model.
• Physical data models Are used to describe data at the physical level
(bytes and words). It is mainly deal with hardware.
Entity-Relationship (E-R) model
• It is based on simulation of the real world which consists of basic objects called entities and relationship among these objects
• The overall logical data structure of a database can be expressed graphically by an E-R diagram. Which consists of rectangle (entity), ellipse (attribute), diamond(relationship), and lines.
E-R Diagram for customer CustomAcct relationship associate a customer with each account he has
Customer
Street
NameSSN
Account
NumberBalance
CustomAcct
Date
3 Tables can represent the above relation
12
3
Aggregation
Aggregation is an abstraction through which relationships are treated as higher-level entities i.e. express relationships among relationships
Users
ProjectEmployee Work
Name SSN Number
Hours
Type
Machinery
E-R diagram with aggregation
5 tables canrepresent thediagram
Generalization
In E-R diagram generalization is depicted through a triangle labeled ISA (is a). The attribute of higher level entity are said to be inherited by lower level entity. e.g. both saving and checking account inherit the attributes of account
Balance
Account
Interest rateOver draft
Checking accountSaving account
ISA
Account number
E-R diagram with generalization
3 tables canrepresent thediagram
E-R diagram Table
• An Entity (E) with attributes a1..an can be represented by a table called E with n columns for each attribute.
• Each row in this table corresponds to one entity of the entity set E
Let D1 ==> set of all account number
D2===> set of all balance
Any row consists of 2 tuples (v1,v2) e.g. (255,3000)
The set of all possible rows is the Cartesian product of D1 and D2 i.e. = D1 X D2
For a table with n columns the
total number of rows = D1 X D2 X ..X Dn-1 X Dn
255
560
215
3000Account_no
452
323
balance
32223455521000
456780
Mapping constraints
• Mapping cardinalities express the number of entities to which another entity can be associated via a relationship
• For a binary relationship set R between entity set A and B the mapping can be one-one (1-1), one-many(1-M), many-one (M-1), and many-many(M-M)
a1a2
b1b2b3
1-M relation
Object oriented Model
The basic unit that an object-oriented (OO-DBMS)
manages is the object. It is based on four basic
concepts of abstraction:
• Classification: Mapping of several objects
(instances) to common class
• Generalization: Group several classes which
have the same properties in common (roads,
railway)-transportation network
Object oriented Model-Continue
• Association: Relation between similar
objects is considered a higher level set
object
• Aggregation: Objects which consist of
several other objects (Composed objects)
Object oriented Model-Continue• OO model uses objects rather than records to manage data
• An object is a collection of data elements and operations
that together are considered a single entity
• An object has associated with it a set of variables that
contain the data for the object, a set of messages to which
the object respond, and a method which response to the
message
Object Oriented Model-Continue
• Once the structure is setup, the details of it
need not be user visible
• This approach has the attraction that query is
very natural
• A geographic data handling systems employ
this model are:
TIGRIS, DAPLEX, and PROBE
• It is application in GIS is recommended
Object oriented Model-Continue
• Objects are typed and the format and
operations of an object instance are the same
as some object prototype
• Example of an object might be a lake:• List of border chain: C1, C2, C3, Cn
• List of nodes: N1, N2, N3, Nn
• Attribute: Depth, soil type
Object oriented Model-Continue
For example student can be a superclass. First and secondyear student may represented by a classes that are specialization of a student class variables and methods specific to first year students are associated with fist year student class.Variables and methods that apply both to first and second year students are associated with student class.The variables associated with each class may be:Student: Name, ID, addressFirst year student: Subject Second year student:Practical course
Student
First year Second year
Hierarchical Model
• Based on Tree structure (child-parent)
• No element can have more than one parent
• Requires knowledge by the user of the
actual storage scheme used by the DBMS
• Examples of database are:
System2000 and IMS
• Not commonly applicable in GIS
Network Model
• Organized data in a network or plex
structure (child-parent)
• Children may have more than one parent
• The query language is procedural
• Examples of database are:
DBMS-10, DMS1100, IDMS
• Not commonly applicable in GIS
A1
B1 B2
C1 C2 C3C4 C5 C6
Root (Parent)
ChildParent
ChildChild
Database tree structure (Hierarchical, Network)
Relational model• A relational database consists of a collection of
tables, each of which is assigned a unique name• The relational models differs from network and
hierarchical models in that it does not use pointers or links. Instead , the relational model relate records by the value they contain.This freedom from the use of pointers allows formal mathematical foundation to be defined
• Examples of RDBMS are Oracle, Informix, and Sybase
Reasons to use Relational Model• Independence of the physical data storage and logical
database structure. Results in users do not need to understand the underlying physical layout of the data to access data from a logical structure, such as a table
• Variable and easy access to all data. Results in access to data is not predefined as in hierarchical databases in which users must understand and navigate through the hierarchy to retrieve data
• Flexible in database design. i.e complex objects are expressed as simple tables and relationships
• Applying relational design methods reduces data redundancy (Normalization) and storage requirements
Relational DBMS
• Aspects of an RDBMS
– Structures: Well defined objects– Operations: Clearly defined actions– Integrity Rules: Rules that control which
operations are allowed on the data and structures of the database
Relational DBMS
• Components of a Relational Database– Table: collection of rows all containing the
same columns– Row: Horizontal components of a table.
Consists of values for each column. Each row is equivalent to a record
– Column: Vertical component of a table. Each column in the record is often referred to as a field
Relational DBMS
• Relational Database Rules– Each column in a table must be unique– The order of the rows in a table is not meaningful– The order of the columns in a table is not
meaningful– All data in a column must be the same type– Every table has a primary key, each column in the
primary key must have a value
Relational DBMS
• Primary Key and Foreign Key– Relational database use primary keys and
foreign keys to allow mapping of information from one table to another
– A foreign key is column or group of columns in a table whose value matches those of the primary key of another table
– Values in primary key column must be unique e.g. social security number (SSN)
Relational DBMS
• Primary Key and Foreign Key– Referential Integrity refers to the integrity of
the reference from the primary key in one table to a foreign key in another table.
Relationships between Tables
• One-to-One
• One-to-Many
• Many-to-One
• Many-to-Many
Relational DBMS
• Relational Database Example (1-1)
05-01-94 70
Weather table
Location table
Washington
Seattlecity_name
AmsterdamWarsawTokyo
Washington
country_nameUnited StatesNeatherlands
PolandJapan
05-01-94Primary KeyForeign Key
Primary Key
city_name
AmsterdamWarsawTokyo
Washington
measurement_dt
05-01-9405-01-9405-01-9405-01-94
avg_temp
47436055
One-to-One
NAME AREA POP1990
Caroline 331.8066 27035Talbot 238.2847 30549Dorchester 534.1747 30236Wicomico 383.3481 74339Worcester 457.6503 35028Somerset 269.267 23440
NAME POP1997
Caroline 29424Talbot 32565Dorchester 29953Wicomico 79716Worcester 41885Somerset 24251
NAME AREA POP1990 POP1997
Caroline 331.8066 27035 29424Talbot 238.2847 30549 32565Dorchester 534.1747 30236 29953Wicomico 383.3481 74339 79716Worcester 457.6503 35028 41885Somerset 269.267 23440 24251
Relational DBMS
• Relational Database Example (1-M)
comp_nameKotrayaH.PlazaKomtar
N.Shops444555622
N.Banks704743
Complexes table
comp_nameKotraya
Kotraya Kotraya Kotraya
shop_ownerALiTanLee
RamanNora
Owner table
Foreign Key
Primary Key
Kotraya
One-to-Many
Dept. Advisor
Agri ShorterAgri MollettNat Sci Ruby
Advisor Name
Shorter AaronRuby BrianMollett DevinShorter EricShorter JulieRuby Leigh
Mollett RonnieMollett Ryan
Dept. Advisor Name
Agri Shorter AaronNat Sci Ruby BrianAgri Mollett DevinAgri Shorter EricAgri Shorter JulieNat Sci Ruby Leigh
Agri Mollett RonnieAgri Mollett Ryan
Many-to-One
SITE AREA Veg Type
WV1 331.8066 D1WV2 238.2847 C2MD1 534.1747 D1MD2 383.3481 S3
Veg Type Name Dominant pH of Soil
C2 Forest Coniferous Scrub Pine 6.5D1 Forest Deciduous Maple 8S3 Wetland Swamp Skunk Cabbage 5
SITE AREA Veg Type Name Dominant pH of Soil
WV1 331.8066 D1 Forest Deciduous Maple 8WV2 238.2847 C2 Forest Coniferous Scrub Pine 6.5MD1 534.1747 D1 Forest Deciduous Maple 8MD2 383.3481 S3 Wetland Swamp Skunk Cabbage 5
Data Definition Language (DDL)and Data Manipulation Language (DML)
• DDL store files that contains data about data (metadata). For example storage of structure in data dictionary
• DML enable users to access or manipulate data (retrieval, insertion, deletion). The part of DML that involves information retrieval is called a query language (QL)
Types of DML
• There are two types of DML, procedural and non procedural
• Procedural DML: Require the user to specify what data is needed and how to get it
• Non-procedural DML: Require the user to specify what data is needed without specify how to get it
Query language (QL)
• QL is the language in which a user requests information from the database. Example of QL are relational algebra (procedural) and tuple relational calculus (nonprocedural)
• The most common query languages are Structured Query Language (SQL),Query By Example (QBE), and Quel
• SQL has gain wide acceptance in commercial products
Relational algebra (RA)
• RA is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as a result
• The fundamental operations in RA are select(6), project(II), Cartesian product(X) , rename, union(U), and set difference(-)
• Other operations include intersection, natural join, division, and assignment
• To select those tuples (rows) of the AccCust relation where customer name is “John” it could be written as:
6name = “John” (AccCust)
The results may be one or more records
or street = “ spring”
6street = “Spring” (AccCust)
• Comparisons can done using =, >, <, >=,etc.
• Let E1 and E2 be relational algebra expressions. Then the following are all relational algebra expressions:
• E1 U E2 union
• E1 - E2 set difference(-)
• E1 X E2 Cartesian product
Structured Query Language (SQL)
• SQL is the standard relational database language
• SQL include commands not only restricted to query but to other functions such as defining relation, deleting relations, creating indices, and modifying relation scheme, access right, integrity, and transaction control
• Basic structure of an SQL expression consists of three clauses: Select, from, and where
• A typical SQL query has the formselect A1, A2, ..., An
from r1, r2, .., rm where P
Each Ai represents an attribute and each ri a relation. P is a predicate (selection). The list of attributes A1..An can be replaced by (*) to select all attributes
• The result of an SQL query is a relation (table)
• The keyword “distinct” is used after select to force the elimination of duplicates. For example, find all customer names having a balance equal $ 6000
select distinct customer-name from AccCust where balance = 6000
• SQL supports union, intersect, and minus
• SQL uses logical connectives and, or, and not and operators (+, -, *, /) on values from tuples
e.g.
select distinct customer-name from AccCust
where balance between 600 and 700
• Typical Roles and Privileges
Setting up User Accounts and Privileges
User 1 User 2
create table create view alter table select update
User 3 User 4
Role 1
– In a relational database, data is stored in tables– Users with connect privileges can create tables– The creator of the table becomes the owner– Users cannot access the tables unless
permission is explicitly given– Users can modify the database by deletion i.e.
remove selected tuples, insertion i.e. insert tuples, or update i.e change a value in a tuple without changing all values in the tuple
Creating Tables
Field types
• INTEGERS e.g. number of population, age• REAL (DECIMAL) e.g. income, salary• CHARACTER e.g. names, description • DATES e.g. date of flood• IMAGES (Multimedia) e.g. image of flood
(*.tif) • SOUND e.g. Sound of thunder (*.wav)• MOVIE e.g. recording film (*.avi, *.mov)
• Granting Object Privileges– Only owners of a database object can grant
object privileges to other users– Only users with DBA privileges can grant
object privileges to roles
• Indexes are optional structures that can be created on any column or set of columns in a table to speed up viewing and retrieval of data rows.
• An index is a list of keywords with the location of the keyword information
• Users can create indexes
Creating and Displaying Indexes
– Index is conceptually similar to an index at the end of a book or cataloging books in a library
– Database index is not a physical structure but a logical structure. The RDBMS, not the user uses the database index. The DBMS chooses the best index to use in the table.
– Index can dramatically speed up data access but will require additional space and may slow down certain SQL operations
Index Concept
• Export and Import utilities allows data to be moved in and out of the native database. For Example, converting database tables to other format such Comma Delimited Format (CDF), Space Delimited Format (SDF), *.dbf, and ASCII flat files (Text)
Exporting and Importing Data
Relational Database design
In the database design considerations must be paid to:
• Repetition of information (Normalization)
• Loss of information
• Number of fields and records required, some RDBMS supports limited number of fields and records (e.g. 255 fields)
• Nature of database, some RDMBS supports only textual data, some supports multimedia (sound, image, etc.)
Database quality
Check accuracy of attribute database by Performing quality assurance / quality control (QA/QC) on the database by:
• Check correctness of data
• Check mis-typing errors
Storage Capacity Terminology
NCGIA
Symbol Term Bytes Actual Bytes
K Kilobyte 1000 1024
M Megabyte 1,000,000 1,048,576
G Gigabyte 1,000,000,000 1,073,741,824
Boolean expressions
• AND
• OR
• NOT
AND
A AND B = ResultT T TT F FF T FF F F
Boolean Operatorsboth expressions are true
Elevation >= 100 AND Veg Type = conif
Elevation Veg Type Rainfall pH
100 Mixed 1 7.0250 conif 3 6.550 conif 2 4.0225 decid 1 8.0
OR
A OR B = ResultT T TT F TF T TF F F
Boolean Operatorsat least one expression is true
Rainfall > 2 OR Veg type = decid
Elevation Veg Type Rainfall pH
100 Mixed 1 7.0250 conif 3 6.550 conif 2 4.0225 decid 1 8.0
NOT
Negates the Boolean value.
Returns TRUE if the Boolean is FALSE
Returns FALSE if the Boolean is TRUE
Boolean OperatorsExcludes informationNOT pH <= 6.5
Elevation Veg Type Rainfall pH
100 Mixed 1 7.0250 conif 3 6.550 conif 2 4.0225 decid 1 8.0
Operators
- equals
- greater than
- less than
- not equal to
Operators
- greater than or equal to
- less than or equal to
- expressions enclosed in
parentheses are evaluated first
Query Syntax
[ ] around field names
_ in place of spaces in field names
“ ” around string values
( ) grouping and evaluation order
* multiple character wildcard
? Single character wildcard
Date format yyyymmdd
Query Syntax Examples
([Roof_type] = "tile”)([Owner_age] > 65 and [Income] < 20000 )([Type_use] = "res*" and [Const_date] < 1955)([State_name] = “A*”)([Name] = “?athy”)([Date] = 19991027)
Common Query Syntax Errors
[Roof_type] = "tile”)
([Owner_age] > 65) and [Income] < 20000 )
([Type_use] = res*)
([Const_date] < “1955”)
Query SyntaxSalisbury Princess AnneHebron Willards
Correct Query Syntax
([City] = “Salisbury” OR [City] = “Princess Anne” OR [City] = “Hebron” OR [City] = “Willards”)
Incorrect Query Syntax
([City] = “Salisbury” OR “Princess Anne” OR “Hebron” OR “Willards”)
References
• Bailey, T. C. (1994). “A review of statistical spatial analysis in geographical information systems.” In: Fotheringham, A. S. and P. A. Rogerson (eds.). Spatial analysis and GIS. Taylor & Francis Ltd., London, UK.
• Batini, C. S. and S. B. Navathe (1992). Conceptual database design. The Benjamin/ Cummings Publishing, California, USA.
• Hoffer, J. A., J. S. Valacich and J. F. George (1996). Modern systems analysis and design. The Benjamin/Cummings Publishing, California, USA.
• Michael, A. and C. Smith (1996). Teach yourself database programming with Visual Basic 4 in 21 days. Sams Publishing, Indiana, USA