database 2nd semester

7/28/2019 Database 2nd Semester

1/18

Data:Data consists of a series of facts or statements that may have beencollected, stored, processed and/or manipulated but have not beenorganized or placed into context. When data is organized, it becomesinformation. Information can be processed and used to draw generalizedconclusions or knowledge.

Examples:

A file listing all of the orders placed through an online service is anexample of data. If we sort the data by ZIP code and summarize thenumber of orders that come from each city, we have created information.We can create knowledge by taking this information and makingstatements such as "Most orders for Widget X come from thenortheastern United States."

META DATA: Metadata is literally "data about data." This term refers toinformation about data itself -- perhaps the origin, size, formatting orother characteristics of a data item. In the database field, metadata isessential to understanding and interpreting the contents of a datawarehouse.

Data Base:A database is a collection of information organized intointerrelated tables of data and specifications of data objects.

Database - Advantages & Disadvantages

Advantages

Reduced data redundancy Reduced updating errors and increased consistency Greater data integrity and independence from applications programs Improved data access to users through use of host and query

languages Improved data security Reduced data entry, storage, and retrieval costs Facilitated development of new applications program

Disadvantages

Database systems are complex, difficult, and time-consuming todesign

Substantial hardware and software start-up costs Damage to database affects virtually all applications programs Extensive conversion costs in moving form a file-based system to a

database system Initial training required for all programmers and users


2/18

Hierarchical Model: The hierarchical data model organizes data in a tree structure. There is

a hierarchy of parent and child data segments. This structure implies that a record can have

repeating information, generally in the child data segments. Data in a series of records, which

have a set of field values attached to it. It collects all the instances of a specific record

together as a record type. These record types are the equivalent of tables in the relational

model, and with the individual records being the equivalent of rows. To create links betweenthese record types, the hierarchical model uses Parent Child Relationships. These are a 1:N

mapping between record types. This is done by using trees, like set theory used in the

relational model, "borrowed" from maths. For example, an organization might store

information about an employee, such as name, employee number, department, salary. The

organization might also store information about an employee's children, such as name and

date of birth. The employee and children data forms a hierarchy, where the employee data

represents the parent segment and the children data represents the child segment. If an

employee has three children, then there would be three child segments associated with one

employee segment. In a hierarchical database the parent-child relationship is one to many.

This restricts a child segment to having only one parent segment. Hierarchical DBMSs were

popular from the late 1960s, with the introduction of IBM's Information Management System(IMS) DBMS, through the 1970s.

Network Model: The popularity of the network data model coincided with the popularity of

the hierarchical data model. Some data were more naturally modeled with more than one

parent per child. So, the network model permitted the modeling of many-to-many

relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL)

formally defined the network model. The basic data modeling construct in the network model

is the set construct. A set consists of an owner record type, a set name, and a member record

type. A member record type can have that role in more than one set, hence the multiparent

concept is supported. An owner record type can also be a member or owner in another set.

The data model is a simple network, and link and intersection record types (called junction

records by IDMS) may exist, as well as sets between them . Thus, the complete network of

relationships is represented by several pairwise sets; in each set some (one) record type is

owner (at the tail of the network arrow) and one or more record types are members (at the

head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is

permitted. The CODASYL network model is based on mathematical set theory.

Relational Model:(RDBMS - relational database management system) A database based on

the relational model developed by E.F. Codd. A relational database allows the definition ofdata structures, storage and retrieval operations and integrity constraints. In such a database

the data and relations between them are organised in tables. A table is a collection of records

and each record in a table contains the same fields.

Properties of Relational Tables:

Values Are Atomic

Each Row is Unique

Column Values Are of the Same Kind

The Sequence of Columns is Insignificant

The Sequence of Rows is Insignificant

Each Column Has a Unique Name


3/18

Certain fields may be designated as keys, which means that searches for specific values of

that field will use indexing to speed them up. Where fields in two different tables take values

from the same set, a join operation can be performed to select related records in the two

tables by matching values in those fields. Often, but not always, the fields will have the same

name in both tables. For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs so to calculate a

given customer's bill you would sum the prices of all products ordered by that customer by

joining on the product-code fields of the two tables. This can be extended to joining multiple

tables on multiple fields. Because these relationships are only specified at retreival time,

relational databases are classed as dynamic database management system. The

RELATIONAL database model is based on the Relational Algebra.

DBMS:Stands for "Database Management System." In short, a DBMS is a database program.Technically speaking, it is a software system that uses a standard method of cataloging,

retrieving, and running queries on data. The DBMS manages incoming data, organizes it, and

provides ways for the data to be modified or extracted by users or other programs.

Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server,

FileMaker, Oracle, RDBMS, dBASE, Clipper, and FoxPro. Since there are so many database

management systems available, it is important for there to be a way for them to communicate

with each other. For this reason, most database software comes with an Open Database

Connectivity (ODBC) driver that allows the database to integrate with other databases. For

example, common SQL statements such as SELECT and INSERT are translated from a

program's proprietary syntax into a syntax other databases can understand.

DBMS Functions: There are several functions that a DBMS performs to ensure data

integrity and consistency of data in the database. The ten functions in the DBMS are: data

dictionary management, data storage management, data transformation and presentation,

security management, multiuser access control, backup and recovery management, data

integrity management, database access languages and application programming interfaces,

database communication interfaces, and transaction management.

1. Data Dictionary ManagementData Dictionary is where the DBMS stores definitions of the data elements and their

relationships (metadata). The DBMS uses this function to look up the required data

component structures and relationships. When programs access data in a database they are

basically going through the DBMS. This function removes structural and data dependency

and provides the user with data abstraction. In turn, this makes things a lot easier on the end

user. The Data Dictionary is often hidden from the user and is used by DatabaseAdministrators and Programmers.

2. Data Storage Management: This particular function is used for the storage of data and

any related data entry forms or screen definitions, report definitions, data validation rules,

procedural code, and structures that can handle video and picture formats. Users do not need

to know how data is stored or manipulated. Also involved with this structure is a term called

performance tuning that relates to a databases efficiency in relation to storage and access

speed.

3. Data Transformation and Presentation: This function exists to transform any data

entered into required data structures. By using the data transformation and presentation

function the DBMS can determine the difference between logical and physical data formats.

4. Security Management: This is one of the most important functions in the DBMS.Security management sets rules that determine specific users that are allowed to access the
http://www.techterms.com/definition/odbchttp://www.techterms.com/definition/odbchttp://www.techterms.com/definition/odbchttp://www.techterms.com/definition/odbc


4/18

database. Users are given a username and password or sometimes through biometric

authentication (such as a fingerprint or retina scan) but these types of authentication tend to

be more costly. This function also sets restraints on what specific data any user can see or

manage.

5. Multiuser Access Control

Data integrity and data consistency are the basis of this function. Multiuser accesscontrol is a very useful tool in a DBMS, it enables multiple users to access the database

simultaneously without affecting the integrity of the database.

6. Backup and Recovery ManagementBackup and recovery is brought to mind whenever there is potential outside threats

to a database. For example if there is a power outage, recovery management is how long it

takes to recover the database after the outage. Backup management refers to the data safety

and integrity; for example backing up all your mp3 files on a disk.

7. Data Integrity ManagementThe DBMS enforces these rules to reduce things such as data redundancy, which is

when data is stored in more than one place unnecessarily, and maximizing data consistency,making sure database is returning correct/same answer each time for same question asked.

ERD: Entity-Relationship Diagrams (ERD)Data models are tools used in analysis to describe

the data requirements and assumptions in the system from a top-down perspective. They also

set the stage for the design of databases later on in the SDLC.There are three basic elements

in ER models:Entities are the "things" about which we seek information.Attributes are the

data we collect about the entities.Relationships provide the structure

Elements of ER Model:

ENTITIES

According to the English Dictionary [19], an entity is "Something that exists as a particular

and discrete unit ", and adapted from [20] , a definition that can be the starting point in the

discussion is that an entity is something that has a distinct, separate existence, though it need

not be a material existence. In the context of databases, entities became the main discrete data

objects that make the subject of collecting and keeping data. There have been developed

techniques and methodologies of identifying entities for a certain problem or world which we

do not cover here, we mention just that in the general case entities are usually recognizableconcrete or abstract concepts.


5/18

Examples of entities are: person, places, things, or events which have relevance to the

database.

RELATIONSHIPS

A relationship represents an association between two or more entities. An example of arelationship in the medical world would be:

any drug is produced by one manufacturer.

a disease presents zero, one or more symptoms.

a drug causes more reactions, and a reaction can be caused by one or more drugs.

ATTRIBUTES

An attribute is the abstraction used to describe one property of the entity set ( the totality of

the one entity instances makes up the entity set ). A value is an attribute's particular instance.

The entire collection of possible values an attribute can have is called the domain of an

attribute.

The classification of attributes is done according to their role : whether they identify an

instance of an entity or not. If they do, they are called identifiers, and if describe a non-unique

characteristic they are called descriptors. Identifiers are generally named keys.

Having introduced all the key elements of the ER model ( entities, attributes and relationships

), the introduction on special entity types and the discussion about relationships classification

( which have been intentionally omitted ) is required.

Entity Relationship Diagrams:Entity Relationship Diagrams (ERDs) illustrate the logical

structure of databases.

Entity Relationship Diagram Notations

EntityAn entity is an object or concept about which you want to store information.

Weak Entity

A weak entity is an entity that must defined by a foreign key relationship with another entity

as it cannot be uniquely identified by its own attributes alone.


6/18

Attribute:

A key attribute is the unique, distinguishing characteristic of the entity. For example, an

employee's social security number might be the employee's key attribute.

Multivalued attribute

A multivalued attribute can have more than one value. For example, an employee entity can

have multiple skill values.

Derived attribute

A derived attribute is based on another attribute. For example, an employee's monthly salary

is based on the employee's annual salary.

Relationships

Relationships illustrate how two entities share information in the database structure.

how to draw relationships:

First,connect the two entities, then drop the relationship notation on the line.

Cardinality

Cardinality specifies how many instances of an entity relate to one instance of another

entity.Ordinality is also closely linked to cardinality. While cardinality specifies the

occurences of a relationship, ordinality describes the relationship as either mandatory or

optional. In other words, cardinality specifies the maximum number of relationships and

ordinality specifies the absolute minimum number of relationships.
http://www.smartdraw.com/resources/tutorials/Drawing-ER-Diagrams-1http://www.smartdraw.com/resources/tutorials/Drawing-ER-Diagrams-1http://www.smartdraw.com/resources/tutorials/Drawing-ER-Diagrams-2http://www.smartdraw.com/resources/tutorials/Drawing-ER-Diagrams-2http://www.smartdraw.com/resources/tutorials/Drawing-ER-Diagrams-1


7/18

Recursive relationship

In some cases, entities can be self-linked. For example, employees can supervise otheremployees.

Cardinality Notations

Cardinality specifies how many instances of an entity relate to one instance of another entity.

Ordinality is also closely linked to cardinality. While cardinality specifies the occurances of a

relationship, ordinality describes the relationship as either mandatory or optional. In other

words, cardinality specifies the maximum number of relationships and ordinality specifies the

absolute minimum number of relationships. When the minimum number is zero, the

relationship is usually called optional and when the minimum number is one or more, the

relationship is usually called mandatory.

There are many notation styles that express cardinality and they are all supported by

SmartDraw.

Degrees of Relationship (Cardinality)

The degree of relationship (also known as cardinality) is the number ofoccurrences in one entity which are associated (or linked) to the numberof occurrences in another.

There are three degrees of relationship, known as:

1. one-to-one (1:1)


8/18

2. one-to-many (1:M)3. many-to-many (M:N)

One-to-one (1:1)

This is where one occurrence of an entity relates to only one occurrence inanother entity.

A one-to-one relationship rarely exists in practice, but it can. However,you may consider combining them into one entity.

For example, an employee is allocated a company car, which can only bedriven by that employee.

Therefore, there is a one-to-one relationship between employee andcompany car.

One-to-Many Relationships

One-to-Many (1:M)

Is where one occurrence in an entity relates to many occurrences inanother entity.

For example, taking the employee and department entities shown on theprevious page, an employee works in one department but a departmenthas many employees.

Therefore, there is a one-to-many relationship between department andemployee.

Many-to-Many (M:N)

This is where many occurrences in an entity relate to many occurrences inanother entity.

The normalisation process discussed earlier would prevent any suchrelationships but the definition is included here for completeness.


9/18

As with one-to-one relationships, many-to-many relationships rarelyexist. Normally they occur because an entity has been missed.

For example, an employee may work on several projects at the same timeand a project has a team of many employees.

Therefore, there is a many-to-many relationship between employee andproject.

Normalization

Normalization is the process of eliminating redundant data from database tables. There are 5

levels of normalization - also termed as the 5 normal forms. Most database designers stop at

either levels 2 or 3. This is because although normalization reduces data redundancy, it also

results in increased complexity which will cause a decrease in performance. This decrease in

performance is due to the requirement to join the normalized tables in queries. Levels 4 and 5

of normalization remains largely an academic field of study and is not applied in industry.

Anomaly in database:Data anomaly means same type of data present in database as a duplication.So while updating

or modifying the information in the database we gets the problem of data inconsistency to

solve this problem we need to remove the duplicated data

Functional Dependency:A functional dependency occurs when one attribute in a relation uniquely determines another

attribute. This can be written A -> B which would be the same as stating "B is functionally

dependent upon A."

Examples:In a table listing employee characteristics including Social Security Number

(SSN) and name, it can be said that name is functionally dependent upon SSN (or SSN ->

name) because an employee's name can be uniquely determined from their SSN. However,

the reverse statement (name -> SSN) is not true because more than one employee can have

the same name but different SSNs.

First Normal Form (1NF)

The next step is to transform the table of unnormalized data into firstnormal form (1NF). The rule is:remove any repeating attributes to anew table. The process is as follows:

Identify repeating attributes.

Remove these repeating attributes to a new table together witha copy of the key from the UNF table.


10/18

Assign a key to the new table (and underline it). The key from theoriginal unnormalised tablealways becomes part of the key of thenew table.

A compound key is created. The value for this key must be uniquefor each entity occurrence.

Second normal form (2NF). At this level of normalization, each column in a

table that is not a determiner of the contents of another column must itself be a

function of the other columns in the table. For example, in a table with three columns

containing customer ID, product sold, and price of the product when sold, the price

would be a function of the customer ID (entitled to a discount) and the specific

product.

Third normal form (3NF). At the second normal form, modifications are stillpossible because a change to one row in a table may affect data that refers to this

information from another table. For example, using the customer table just cited,

removing a row describing a customer purchase (because of a return perhaps) will

also remove the fact that the product has a certain price. In the third normal form,

these tables would be divided into two tables so that product pricing would be

tracked separately.

Normalization in Detail

What is Normalization ? Why should we use it?

Normalization is a database design technique which organizes tables in a manner that

reduces redundancy and

dependency of data.

It divides larger tables to smallertables and link them using relationships.

The inventor of the relational model Edgar Codd proposed the theory of normalization with

the introduction of

FirstNormal Form and he continued to extend theory with Second andThird Normal

Form. Later he joined with

Raymond F. Boyce to develop the theory ofBoyce-Codd Normal Form.

Theory of Normalization is still being developed further. For example there are discussions even on 6th Normal Form.


11/18

But in most practical applications normalization achieves its best in 3rd Normal Form. The evolution of

Normalization

theories is illustrated below-

Lets learn Normalization with practical example -

Assume a video library maintains a database of movies rented out. Without any normalization all information is stored in

one table as shown below.

Table 1

Here you see Movies Rented column has multiple values.

Now lets move in to 1st Normal Form

1NF Rules

Each table cell should contain single value.

Each record needs to be unique.

The above table in 1NF-


12/18

Table 1 : In 1NF Form

Before we proceed lets understand a few things --

What is a KEY ?

A KEY is a value used to uniquely identify a record in a table. A KEY could be a single column or combination of

multiple columns

Note: Columns in a table that are NOT used to uniquely identify a record are called non-key columns.

What is a primary Key?

A primary is a single column values used to uniquely identify a database record.

It has following attributes

A primary key cannot be NULL

A primary key value must be unique

The primary key values can not be changed

The primary key must be given a value when a new record is inserted.

What is a composite Key?

A composite key is a primary key composed of multiple columns used to identify a record uniquely

In our database , we have two people with the same name Robert Phil but they live at different places.


13/18

Hence we require both Full Name and Address to uniquely identify a record. This is a composite key.

Lets move into 2NF

2NF Rules

Rule 1- Be in 1NF

Rule 2- Single Column Primary Key

It is clear that we cant move forward to make our simple database in 2nd Normalization form unless we partition the

table above.

Table 1

Table 2

We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1 contains member information.

Table 2 contains information on movies rented.

We have introduced a new column called Membership_id which is the primary key for table 1. Records can be

uniquely identified in Table 1 using membership id

Introducing Foreign Key!

In Table 2, Membership_ID is the foreign Key


14/18

Foreign Key references primary key of another Table!It helps connect your Tables

A foreign key can have a different name from its primary key

It ensures rows in one table have corresponding rows in another

Unlike Primary key they do not have to be unique. Most often they arentForeign keys can be null even though primary keys can not


15/18

Why do you need a foreign key ?

Suppose an idiot inserts a record in Table B such as

You will only be able to insert values into your foreign key that exist in the unique key in the parent table.

This helps in referential integrity.


16/18

The above problem can be overcome by declaring membership id from Table2 as foreign key of membership id

from Table1

Now , if somebody tries to insert a value in the membership id field that does not exist in the parent table ,

an error will be shown!

What is a transitive functional dependencies?

A transitive functional dependency is when changing a non-key column , might cause any of the other non-key

columns to change

Consider the table 1. Changing the non-key column Full Name , may change Salutation.

Lets move ito 3NF

3NF Rules

Rule 1- Be in 2NF

Rule 2- Has no transitive functional dependencies

To move our 2NF table into 3NF we again need to need divide our table.


17/18

TABLE 1

Table 2

Table 3

We have again divided our tables and created a new table which stores Salutations.

There are no transitive functional dependencies and hence our table is in 3NF

In Table 3 Salutation ID is primary key and in Table 1 Salutation ID is foreign to primary key in Table 3

Now our little example is in a level that cannot further be decomposed to attain higher forms of normalization.

In fact it is already in higher normalization forms. Separate efforts for moving in to next levels of normalization

are normally needed in complex databases. However we will be discussing about next levels of normalizations

in brief in the following.

Boyce-Codd Normal Form (BCNF)


18/18

Even when a database is in 3rd Normal Form, still there would be anomalies resulted if it has more than one Candidate Key.

Sometimes is BCNF is also referred as 3.5 Normal Form.

4th Normal Form

If no database table instance contains two or more, independent and multivalued data describing the relevant entity ,

then it is in 4th Normal Form.

5th Normal Form

A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed in to any number of smaller tables

without loss of data.

6th Normal Form

6th Normal Form is not standardized yet however it is being discussed by database experts for some time. Hopefully

we would have clear standardized definition for 6th Normal Form in near future.

database 2nd semester

Documents