database 2nd semester
7/28/2019 Database 2nd Semester
Data: Data consists of a series of facts or statements that may have been collected, stored, processed and/or manipulated but have not been organized or placed into context. When data is organized, it becomes information. Information can be processed and used to draw generalized conclusions or knowledge.
Examples:
A file listing all of the orders placed through an online service is an example of data. If we sort the data by ZIP code and summarize the number of orders that come from each city, we have created information. We can create knowledge by taking this information and making statements such as "Most orders for Widget X come from the northeastern United States."
META DATA: Metadata is literally "data about data." This term refers to information about the data itself, perhaps the origin, size, formatting or other characteristics of a data item. In the database field, metadata is essential to understanding and interpreting the contents of a data warehouse.
Database: A database is a collection of information organized into interrelated tables of data and specifications of data objects.
Database - Advantages & Disadvantages
Advantages
Reduced data redundancy
Reduced updating errors and increased consistency
Greater data integrity and independence from applications programs
Improved data access to users through use of host and query languages
Improved data security
Reduced data entry, storage, and retrieval costs
Facilitated development of new applications programs
Disadvantages
Database systems are complex, difficult, and time-consuming to design
Substantial hardware and software start-up costs
Damage to database affects virtually all applications programs
Extensive conversion costs in moving from a file-based system to a database system
Initial training required for all programmers and users
Hierarchical Model: The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is stored as a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, and the individual records are the equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child Relationships. These are a 1:N mapping between record types. This is done by using trees, which, like the set theory used in the relational model, are "borrowed" from maths. For example, an organization might store information about an employee, such as name, employee number, department and salary. The organization might also store information about an employee's children, such as name and date of birth. The employee and children data forms a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.
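The parent-child structure described above can be sketched in Python. This is a minimal in-memory model, not how IMS actually stores segments; the class name Segment, its fields and the sample employee data are hypothetical. Each child segment is created through its parent and keeps a single parent reference, which is exactly the 1:N restriction.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """One record (segment) in a hierarchical database; hypothetical model."""
    record_type: str
    fields: dict
    # At most ONE parent; excluded from repr/eq so the tree is never walked in a cycle.
    parent: Optional[Segment] = field(default=None, repr=False, compare=False)
    children: list[Segment] = field(default_factory=list, repr=False)

    def add_child(self, record_type: str, **fields) -> Segment:
        # Children are only created through their parent, so every child
        # segment starts life with exactly one parent: a 1:N parent-child link.
        child = Segment(record_type, fields, parent=self)
        self.children.append(child)
        return child


# Parent segment: the employee.
employee = Segment("Employee", {"name": "Pat Lee", "emp_no": 101,
                                "department": "Sales", "salary": 50000})
# Three child segments, one per child.
for name, dob in [("Ann", "2010-03-01"), ("Ben", "2012-07-15"), ("Cal", "2015-11-30")]:
    employee.add_child("Child", name=name, date_of_birth=dob)

print(len(employee.children))                                # 3
print(all(c.parent is employee for c in employee.children))  # True
```

Because a Segment stores a single parent attribute rather than a list, a child simply has no way to hang under a second parent. That limitation is what the network model below relaxes.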
Network Model: The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child, so the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set one record type is the owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.
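A minimal sketch of the set construct in Python, assuming hypothetical Student, Course and Enrollment record types. Each SetType object records one owner per member, so every set is 1:M. The Enrollment junction record belongs to two sets and therefore has two owners, which is how the network model expresses a many-to-many relationship. This illustrates the idea only; it is not the CODASYL data definition language.

```python
from collections import defaultdict


class SetType:
    """A CODASYL-style set: one owner record type and one member record type (sketch)."""

    def __init__(self, name: str, owner_type: str, member_type: str):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self.members_of = defaultdict(list)  # owner record -> its member records
        self.owner_of = {}                   # member record -> its single owner in this set

    def connect(self, owner: str, member: str) -> None:
        # Within one set a member has only one owner, so each set is 1:M.
        if member in self.owner_of:
            raise ValueError(f"{member} already has an owner in set {self.name}")
        self.owner_of[member] = owner
        self.members_of[owner].append(member)


# The Enrollment junction record is a member of two sets, so it has two
# parents (one Student, one Course): the multiparent concept.
takes = SetType("TAKES", owner_type="Student", member_type="Enrollment")
offers = SetType("OFFERS", owner_type="Course", member_type="Enrollment")

for student, course in [("Ann", "DB101"), ("Ann", "OS201"), ("Ben", "DB101")]:
    enrollment = f"{student}/{course}"
    takes.connect(student, enrollment)
    offers.connect(course, enrollment)

# The many-to-many Student/Course relationship is navigated through the two 1:M sets.
courses_of_ann = [offers.owner_of[e] for e in takes.members_of["Ann"]]
students_in_db101 = [takes.owner_of[e] for e in offers.members_of["DB101"]]
print(courses_of_ann)       # ['DB101', 'OS201']
print(students_in_db101)    # ['Ann', 'Ben']
```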
Relational Model (RDBMS, relational database management system): A database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and the relations between them are organised in tables. A table is a collection of records, and each record in a table contains the same fields.
Properties of Relational Tables:
Values Are Atomic
Each Row is Unique
Column Values Are of the Same Kind
The Sequence of Columns is Insignificant
The Sequence of Rows is Insignificant
Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables. For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs, so to calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
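The orders/products example can be run as a sketch using SQLite through Python's built-in sqlite3 module (an assumption; any relational DBMS would do). The hyphenated field names become customer_id and product_code, the prices are made up, and customer_bill is a hypothetical helper. The tables declare no link between them, so the relationship exists only in the JOIN at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL NOT NULL);
    CREATE TABLE orders   (customer_id INTEGER, product_code TEXT);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("P1", 2.50), ("P2", 10.00), ("P3", 4.25)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "P1"), (1, "P2"), (1, "P1"), (2, "P3")])


def customer_bill(conn, customer_id):
    """Sum the prices of everything one customer ordered (hypothetical helper)."""
    # Related records are matched on product_code only when the query runs.
    row = conn.execute("""
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON o.product_code = p.product_code
        WHERE o.customer_id = ?
    """, (customer_id,)).fetchone()
    return row[0]


print(customer_bill(conn, 1))   # 15.0  (2.50 + 10.00 + 2.50)
print(customer_bill(conn, 2))   # 4.25
```

COALESCE turns the NULL that SUM returns for a customer with no orders into 0.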
DBMS: Stands for "Database Management System." In short, a DBMS is a database program. Technically speaking, it is a software system that uses a standard method of cataloging, retrieving, and running queries on data. The DBMS manages incoming data, organizes it, and provides ways for the data to be modified or extracted by users or other programs.
Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle, dBASE, Clipper, and FoxPro. Since there are so many database management systems available, it is important for there to be a way for them to communicate with each other. For this reason, most database software comes with an Open Database Connectivity (ODBC) driver that allows the database to integrate with other databases. For example, common SQL statements such as SELECT and INSERT are translated from a program's proprietary syntax into a syntax other databases can understand.
DBMS Functions: A DBMS performs several functions to ensure the integrity and consistency of the data in the database. The ten functions of a DBMS are: data dictionary management, data storage management, data transformation and presentation, security management, multiuser access control, backup and recovery management, data integrity management, database access languages and application programming interfaces, database communication interfaces, and transaction management.
1. Data Dictionary Management: The data dictionary is where the DBMS stores definitions of the data elements and their relationships (metadata). The DBMS uses this function to look up the required data component structures and relationships. When programs access data in a database, they are basically going through the DBMS. This function removes structural and data dependency and provides the user with data abstraction. In turn, this makes things a lot easier on the end user. The data dictionary is often hidden from the user and is used by database administrators and programmers.
2. Data Storage Management: This function is used for the storage of data and any related data entry forms or screen definitions, report definitions, data validation rules, procedural code, and structures that can handle video and picture formats. Users do not need to know how data is stored or manipulated. Also involved with this function is performance tuning, which relates to a database's efficiency in terms of storage and access speed.
3. Data Transformation and Presentation: This function exists to transform any data entered into the required data structures. By using the data transformation and presentation function, the DBMS can distinguish between logical and physical data formats.
4. Security Management: This is one of the most important functions of the DBMS. Security management sets rules that determine which users are allowed to access the
database. Users are authenticated with a username and password or sometimes through biometric authentication (such as a fingerprint or retina scan), although biometric methods tend to be more costly. This function also sets restrictions on what specific data each user can see or manage.
5. Multiuser Access Control: Data integrity and data consistency are the basis of this function. Multiuser access control is a very useful tool in a DBMS: it enables multiple users to access the database simultaneously without affecting the integrity of the database.
6. Backup and Recovery Management: Backup and recovery come to mind whenever there are potential outside threats to a database. For example, if there is a power outage, recovery management determines how the database is restored after the outage and how long that takes. Backup management refers to keeping copies of the data for safety and integrity, for example backing up all your mp3 files on a disk.
7. Data Integrity Management: The DBMS enforces data integrity rules to minimize data redundancy, which is when data is stored in more than one place unnecessarily, and to maximize data consistency, making sure the database returns the same correct answer each time the same question is asked.
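Two of the functions above, the data dictionary (1) and backup and recovery (6), can be observed in a real DBMS. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in, which is an assumption; other DBMSs expose these functions differently. SQLite's data dictionary is its sqlite_master catalogue plus PRAGMA table_info, and backup uses Connection.backup. The table name and sample rows are hypothetical.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("""CREATE TABLE tracks (
                    id    INTEGER PRIMARY KEY,
                    title TEXT NOT NULL)""")
live.executemany("INSERT INTO tracks (title) VALUES (?)", [("Song A",), ("Song B",)])
live.commit()

# 1. Data dictionary: SQLite stores metadata about its own tables.
tables = live.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)     # [('tracks',)]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(name, col_type, bool(notnull))
           for _cid, name, col_type, notnull, _default, _pk
           in live.execute("PRAGMA table_info(tracks)")]
print(columns)

# 6. Backup: copy the whole database (in practice, to a file on another disk).
backup = sqlite3.connect(":memory:")
live.backup(backup)

# A simulated failure wipes the live data ...
live.execute("DELETE FROM tracks")
live.commit()

# ... and recovery restores it from the backup copy.
backup.backup(live)
restored_rows = live.execute("SELECT COUNT(*) FROM tracks").fetchone()[0]
print(restored_rows)    # 2
```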
ERD: Entity-Relationship Diagrams (ERD)
Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC. There are three basic elements in ER models:
Entities are the "things" about which we seek information.
Attributes are the data we collect about the entities.
Relationships provide the structure that links the entities together.
Elements of ER Model:
ENTITIES
According to the English Dictionary [19], an entity is "Something that exists as a particular and discrete unit", and, adapted from [20], a definition that can be the starting point in the discussion is that an entity is something that has a distinct, separate existence, though it need not be a material existence. In the context of databases, entities became the main discrete data objects about which data is collected and kept. Techniques and methodologies have been developed for identifying the entities of a given problem or world. We do not cover them here; we mention only that, in the general case, entities are usually recognizable concrete or abstract concepts.
Examples of entities are persons, places, things, or events that have relevance to the database.
RELATIONSHIPS
A relationship represents an association between two or more entities. Examples of relationships in the medical world are:
any drug is produced by one manufacturer.
a disease presents zero, one or more symptoms.
a drug can cause several reactions, and a reaction can be caused by one or more drugs.
ATTRIBUTES
An attribute is the abstraction used to describe one property of the entity set (the totality of the instances of one entity makes up the entity set). A value is a particular instance of an attribute. The entire collection of possible values an attribute can have is called the domain of the attribute.
Attributes are classified according to their role: whether or not they identify an instance of an entity. If they do, they are called identifiers; if they describe a non-unique characteristic, they are called descriptors. Identifiers are generally called keys.
Having introduced all the key elements of the ER model (entities, attributes and relationships), an introduction to special entity types and a discussion of the classification of relationships (both intentionally omitted so far) are required.
Entity Relationship Diagrams: Entity Relationship Diagrams (ERDs) illustrate the logical structure of databases.
Entity Relationship Diagram Notations
Entity
An entity is an object or concept about which you want to store information.
Weak Entity
A weak entity is an entity that must be defined by a foreign key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.
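In a relational schema, a weak entity is commonly implemented with a composite primary key that includes the owner's key as a foreign key. The sketch below assumes SQLite via Python's sqlite3 module and hypothetical employee and dependent tables. ON DELETE CASCADE is one common design choice, not a requirement: it removes dependents together with their owner. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Weak entity: a dependent is identified only together with its owner.
    CREATE TABLE dependent (
        emp_no         INTEGER NOT NULL REFERENCES employee(emp_no) ON DELETE CASCADE,
        dependent_name TEXT NOT NULL,
        date_of_birth  TEXT,
        PRIMARY KEY (emp_no, dependent_name)
    );
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Kim"), (2, "Lee")])
# Two dependents may share a name as long as their owners differ.
conn.executemany("INSERT INTO dependent VALUES (?, ?, ?)",
                 [(1, "Sam", "2012-01-01"), (2, "Sam", "2014-05-09")])

# Deleting the owner removes its dependents: the weak entity cannot exist alone.
conn.execute("DELETE FROM employee WHERE emp_no = 1")
remaining = conn.execute("SELECT emp_no, dependent_name FROM dependent").fetchall()
print(remaining)   # [(2, 'Sam')]
```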
Attribute:
A key attribute is the unique, distinguishing characteristic of the entity. For example, an
employee's social security number might be the employee's key attribute.
Multivalued attribute
A multivalued attribute can have more than one value. For example, an employee entity can
have multiple skill values.
Derived attribute
A derived attribute is based on another attribute. For example, an employee's monthly salary
is based on the employee's annual salary.
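One common way to map these attribute types onto tables is sketched below with SQLite through Python's sqlite3 module. Table and column names and the sample employee are hypothetical. The key attribute becomes the primary key. The multivalued skill attribute moves to its own table with one row per skill. The derived monthly salary is computed in the query rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key attribute: emp_no uniquely identifies an employee.
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, annual_salary REAL);
    -- Multivalued attribute: one row per (employee, skill) pair.
    CREATE TABLE employee_skill (
        emp_no INTEGER REFERENCES employee(emp_no),
        skill  TEXT,
        PRIMARY KEY (emp_no, skill)
    );
""")
conn.execute("INSERT INTO employee VALUES (7, 'Ana', 60000)")
conn.executemany("INSERT INTO employee_skill VALUES (7, ?)", [("SQL",), ("Python",)])

# Derived attribute: monthly salary is calculated from annual salary, never stored.
name, monthly = conn.execute(
    "SELECT name, annual_salary / 12 FROM employee WHERE emp_no = 7").fetchone()
skills = [s for (s,) in conn.execute(
    "SELECT skill FROM employee_skill WHERE emp_no = 7 ORDER BY skill")]
print(name, monthly, skills)   # Ana 5000.0 ['Python', 'SQL']
```

Computing the derived value on demand means it can never disagree with the annual salary it is based on.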
Relationships
Relationships illustrate how two entities share information in the database structure.
How to draw relationships:
First, connect the two entities, then drop the relationship notation on the line.
Cardinality
Cardinality specifies how many instances of an entity relate to one instance of another entity. Ordinality is also closely linked to cardinality. While cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships.
Recursive relationship
In some cases, entities can be self-linked. For example, employees can supervise other employees.
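A recursive relationship is typically stored as a foreign key that points back to the same table and is queried with a self-join. A minimal SQLite sketch via Python's sqlite3 module follows; the supervisor_id column and the sample staff are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# supervisor_id refers back to the same table: a recursive (self-linked) relationship.
conn.execute("""CREATE TABLE employee (
                    emp_no        INTEGER PRIMARY KEY,
                    name          TEXT NOT NULL,
                    supervisor_id INTEGER REFERENCES employee(emp_no))""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Dana", None), (2, "Eli", 1), (3, "Fay", 1), (4, "Gus", 2)])

# Self-join: the same table appears twice under different aliases.
pairs = conn.execute("""
    SELECT s.name, e.name
    FROM employee AS e
    JOIN employee AS s ON e.supervisor_id = s.emp_no
    ORDER BY e.emp_no
""").fetchall()
print(pairs)   # [('Dana', 'Eli'), ('Dana', 'Fay'), ('Eli', 'Gus')]
```

Dana has no supervisor (NULL), so the inner join leaves her out of the supervisor/employee pairs.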
Cardinality Notations
When the minimum number is zero, the relationship is usually called optional; when the minimum number is one or more, it is usually called mandatory.
There are many notation styles that express cardinality, and they are all supported by SmartDraw.
Degrees of Relationship (Cardinality)
The degree of relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity.
A one-to-one relationship rarely exists in practice, but it can. However, you may consider combining the two entities into one.
For example, an employee is allocated a company car, which can only be driven by that employee.
Therefore, there is a one-to-one relationship between employee and company car.
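One way to enforce a one-to-one relationship in tables is a UNIQUE foreign key. This SQLite sketch, via Python's sqlite3 module, uses hypothetical employee and company_car tables and made-up registration numbers. A second car for the same employee is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT);
    -- UNIQUE on the foreign key: each employee can be linked to at most one car.
    CREATE TABLE company_car (
        reg_no TEXT PRIMARY KEY,
        emp_no INTEGER UNIQUE REFERENCES employee(emp_no)
    );
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ivy')")
conn.execute("INSERT INTO company_car VALUES ('AB12 CDE', 1)")

try:
    conn.execute("INSERT INTO company_car VALUES ('XY34 ZZZ', 1)")  # a second car for Ivy
    second_car_allowed = True
except sqlite3.IntegrityError:
    second_car_allowed = False
print(second_car_allowed)   # False
```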
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity.
For example, an employee works in one department, but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
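In tables, a one-to-many relationship is usually implemented by placing the foreign key on the "many" side. The sketch below uses SQLite through Python's sqlite3 module with hypothetical table names and sample data. It also shows ordinality: declaring dept_no NOT NULL makes the relationship mandatory, so every employee needs at least one department. Dropping NOT NULL would make it optional.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    -- Foreign key on the "many" side; NOT NULL makes the link mandatory.
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_no INTEGER NOT NULL REFERENCES department(dept_no)
    );
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.executemany("INSERT INTO employee VALUES (?, ?, 10)",
                 [(1, "Jo"), (2, "Max"), (3, "Ola")])

# One department, many employees.
count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_no = 10").fetchone()[0]
print(count)   # 3

try:
    conn.execute("INSERT INTO employee VALUES (4, 'Pat', NULL)")  # no department
    rejected = False
except sqlite3.IntegrityError:
    rejected = True   # the NOT NULL constraint enforces the mandatory side
print(rejected)   # True
```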
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process would resolve any such relationship into two one-to-many relationships through a linking entity, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time and a project has a team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
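Relational tables cannot hold a many-to-many relationship directly, so it is usually resolved with a linking (junction) table. That linking table is often the "missed" entity mentioned above. The following is a minimal SQLite sketch via Python's sqlite3 module; the assignment table name and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, title TEXT);
    -- Linking table: the M:N relationship becomes two 1:M relationships.
    CREATE TABLE assignment (
        emp_no  INTEGER REFERENCES employee(emp_no),
        proj_no INTEGER REFERENCES project(proj_no),
        PRIMARY KEY (emp_no, proj_no)
    );
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Rae"), (2, "Sol")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(100, "Payroll"), (200, "Website")])
conn.executemany("INSERT INTO assignment VALUES (?, ?)", [(1, 100), (1, 200), (2, 100)])

# One employee, several projects ...
projects_of_rae = [t for (t,) in conn.execute("""
    SELECT p.title FROM assignment AS a JOIN project AS p ON a.proj_no = p.proj_no
    WHERE a.emp_no = 1 ORDER BY p.title""")]
# ... and one project, a team of several employees.
team_of_payroll = [n for (n,) in conn.execute("""
    SELECT e.name FROM assignment AS a JOIN employee AS e ON a.emp_no = e.emp_no
    WHERE a.proj_no = 100 ORDER BY e.name""")]
print(projects_of_rae, team_of_payroll)   # ['Payroll', 'Website'] ['Rae', 'Sol']
```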
Normalization
Normalization is the process of eliminating redundant data from database tables. There are 5 levels of normalization, also termed the 5 normal forms. Most database designers stop at level 2 or 3. This is because, although normalization reduces data redundancy, it also increases complexity, which can reduce performance. This decrease in performance is due to the need to join the normalized tables in queries. Levels 4 and 5 of normalization remain largely an academic field of study and are rarely applied in industry.
Anomaly in database: A data anomaly arises when the same data is stored in the database more than once. When the information is later updated or modified, the copies can get out of step, producing inconsistent data. To solve this problem, the duplicated data must be removed.
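The following is a minimal illustration of an update anomaly. It assumes SQLite through Python's sqlite3 module and a hypothetical orders table that repeats the customer's city on every row. Updating only one copy leaves the database giving two different answers to the same question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalised design: the customer's city is repeated on every order row.
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer TEXT, city TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Zoe", "Leeds"), (2, "Zoe", "Leeds"), (3, "Zoe", "Leeds")])

# The customer moves, but only one copy of the city gets updated ...
conn.execute("UPDATE orders SET city = 'York' WHERE order_no = 1")

# ... so asking "where does Zoe live?" now has two answers.
cities = sorted(c for (c,) in conn.execute(
    "SELECT DISTINCT city FROM orders WHERE customer = 'Zoe'"))
print(cities)   # ['Leeds', 'York']
```

Storing the city once, in a customers table referenced by the orders, removes the duplicate copies and with them the possibility of this inconsistency.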
Functional Dependency: A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which would be the same as stating "B is functionally dependent upon A."
Examples: In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that name is functionally dependent upon SSN (or SSN -> name) because an employee's name can be uniquely determined from their SSN. However, the reverse statement (name -> SSN) is not true because more than one employee can have the same name but different SSNs.
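The definition of A -> B can be checked against sample rows with a short Python function. The holds helper is hypothetical and the SSNs are fake. A sample can only disprove a functional dependency: a dependency is a rule about every possible state of the table, so passing the check on one data set does not prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts, one per tuple; lhs, rhs: tuples of column names.
    A -> B holds when any two rows that agree on A also agree on B.
    """
    seen = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        # setdefault stores the first B seen for this A and returns it;
        # a different B for the same A breaks the dependency.
        if seen.setdefault(a, b) != b:
            return False
    return True


employees = [
    {"ssn": "111-11-1111", "name": "John Smith"},
    {"ssn": "222-22-2222", "name": "John Smith"},
    {"ssn": "333-33-3333", "name": "Mary Jones"},
]
print(holds(employees, ("ssn",), ("name",)))   # True:  SSN -> name
print(holds(employees, ("name",), ("ssn",)))   # False: name does not determine SSN
```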
First Normal Form (1NF)
The next step is to transform the table of unnormalized data into first normal form (1NF). The rule is: remove any repeating attributes to a new table. The process is as follows:
Identify repeating attributes.
Remove these repeating attributes to a new table together with a copy of the key from the UNF table.
Assign a key to the new table (and underline it). The key from the original unnormalised table always becomes part of the key of the new table.
A compound key is created. The value for this key must be unique for each entity occurrence.
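The steps above can be sketched in Python for a single repeating attribute. The helper to_1nf, the student/course data and the column names are hypothetical. The function copies the original key into the new table and checks that the resulting compound key is unique.

```python
def to_1nf(rows, key, repeating, new_column):
    """Split one repeating attribute out of unnormalised rows (hypothetical helper).

    Returns (main_table, new_table). The new table carries a copy of the
    original key, so its key is the compound (key, new_column).
    """
    main_table = [{k: v for k, v in row.items() if k != repeating} for row in rows]
    new_table = [{key: row[key], new_column: value}
                 for row in rows for value in row[repeating]]
    compound_keys = [(r[key], r[new_column]) for r in new_table]
    if len(set(compound_keys)) != len(compound_keys):
        raise ValueError("compound key is not unique")
    return main_table, new_table


# Unnormalised rows: "courses" holds a repeating group of values.
unf = [
    {"student_id": 1, "name": "Ava", "courses": ["DB", "OS"]},
    {"student_id": 2, "name": "Bo", "courses": ["DB"]},
]
students, enrolments = to_1nf(unf, "student_id", "courses", "course")
print(students)
# [{'student_id': 1, 'name': 'Ava'}, {'student_id': 2, 'name': 'Bo'}]
print(enrolments)
# [{'student_id': 1, 'course': 'DB'}, {'student_id': 1, 'course': 'OS'}, {'student_id': 2, 'course': 'DB'}]
```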
Second normal form (2NF). At this level of normalization, each column that is not part of the key must depend on the whole of the table's primary key, not just on part of it. For example, in a table with three columns containing customer ID, product sold, and price of the product when sold, the price would be a function of both the customer ID (the customer may be entitled to a discount) and the specific product.
Third normal form (3NF). In second normal form, modification anomalies are still possible, because a change to one row in a table may affect data that refers to this information from another table. For example, using the customer table just cited, removing a row describing a customer purchase (because of a return, perhaps) will also remove the fact that the product has a certain price. In third normal form, these tables would be divided into two tables so that product pricing would be tracked separately.
Normalization in Detail
What is Normalization? Why should we use it?
Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.
It divides larger tables into smaller tables and links them using relationships.
The inventor of the relational model, Edgar Codd, proposed the theory of normalization with the introduction of First Normal Form, and he continued to extend the theory with Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
The theory of normalization is still being developed further. For example, there are discussions even on 6th Normal Form.
In most practical applications, however, normalization achieves its best in 3rd Normal Form. The evolution of normalization theories is illustrated below.
Let's learn Normalization with a practical example.
Assume a video library maintains a database of movies rented out. Without any normalization, all information is stored in one table as shown below.
Table 1
Here you can see that the Movies Rented column has multiple values.
Now let's move into 1st Normal Form.
1NF Rules
Each table cell should contain a single value.
Each record needs to be unique.
The above table in 1NF:
Table 1 : In 1NF Form
Before we proceed, let's understand a few things:
What is a KEY?
A KEY is a value used to uniquely identify a record in a table. A KEY could be a single column or a combination of multiple columns.
Note: Columns in a table that are NOT used to uniquely identify a record are called non-key columns.
What is a primary Key?
A primary key is a single column value used to uniquely identify a database record.
It has the following attributes:
A primary key cannot be NULL
A primary key value must be unique
The primary key values should rarely be changed
The primary key must be given a value when a new record is inserted.
What is a composite Key?
A composite key is a primary key composed of multiple columns used to identify a record uniquely.
In our database, we have two people with the same name, Robert Phil, but they live at different places.
Hence we require both Full Name and Address to uniquely identify a record. This is a composite key.
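A composite primary key can be declared directly in SQL. This sketch uses SQLite through Python's sqlite3 module; the member table and the addresses are hypothetical. The columns are declared NOT NULL explicitly because SQLite, unlike most DBMSs, otherwise allows NULL in non-integer primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: neither column is unique on its own, but the pair is.
conn.execute("""CREATE TABLE member (
                    full_name TEXT NOT NULL,
                    address   TEXT NOT NULL,
                    PRIMARY KEY (full_name, address))""")
conn.executemany("INSERT INTO member VALUES (?, ?)", [
    ("Robert Phil", "3rd Street 34"),
    ("Robert Phil", "5th Avenue"),     # same name, different address: allowed
])

try:
    conn.execute("INSERT INTO member VALUES ('Robert Phil', '5th Avenue')")  # exact duplicate
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
print(duplicate_accepted)   # False
```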
Let's move into 2NF
2NF Rules
Rule 1- Be in 1NF
Rule 2- Single Column Primary Key (more precisely: no non-key column may depend on only part of a composite key)
It is clear that we can't move forward to make our simple database into 2nd Normal Form unless we partition the table above.
Table 1
Table 2
We have divided our 1NF table into two tables, viz. Table 1 and Table 2. Table 1 contains member information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id, which is the primary key for Table 1. Records can be uniquely identified in Table 1 using the membership ID.
Introducing Foreign Key!
In Table 2, Membership_ID is the foreign key.
A foreign key references the primary key of another table. It helps connect your tables.
A foreign key can have a different name from its primary key.
It ensures rows in one table have corresponding rows in another.
Unlike a primary key, a foreign key does not have to be unique. Most often it isn't.
Foreign keys can be null even though primary keys cannot.
Why do you need a foreign key?
Suppose someone inserts a record into Table 2 with a membership ID that does not exist in Table 1.
You will only be able to insert values into your foreign key that exist in the unique key in the parent table.
This helps maintain referential integrity.
The above problem can be overcome by declaring the membership ID in Table 2 as a foreign key referencing the membership ID in Table 1.
Now, if somebody tries to insert a value in the membership ID field that does not exist in the parent table, an error will be shown!
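The same protection can be reproduced with SQLite through Python's sqlite3 module, which is an assumption. The table names members and rentals stand in for Table 1 and Table 2, and the rows are made up. SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
conn.executescript("""
    CREATE TABLE members (membership_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE rentals (
        membership_id INTEGER REFERENCES members(membership_id),
        movie         TEXT
    );
""")
conn.execute("INSERT INTO members VALUES (1, 'Janet Jones')")
conn.execute("INSERT INTO rentals VALUES (1, 'Movie A')")   # parent row exists: accepted

try:
    conn.execute("INSERT INTO rentals VALUES (101, 'Movie B')")  # member 101 does not exist
    orphan_inserted = True
except sqlite3.IntegrityError as exc:
    orphan_inserted = False
    print("rejected:", exc)
```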
What is a transitive functional dependency?
A transitive functional dependency exists when a non-key column depends on another non-key column rather than directly on the primary key (key -> A -> B). As a result, changing one non-key column might cause another non-key column to change.
Consider Table 1. Changing the non-key column Full Name may change Salutation.
Let's move into 3NF
3NF Rules
Rule 1- Be in 2NF
Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to divide our table.
TABLE 1
Table 2
Table 3
We have again divided our tables and created a new table which stores Salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key of Table 3.
Now our little example is at a level that cannot be decomposed further to attain higher forms of normalization. In fact, it is already in the higher normal forms. Separate efforts to move into the next levels of normalization are normally needed in complex databases. However, we will briefly discuss the next levels of normalization below.
Boyce-Codd Normal Form (BCNF)
Even when a database is in 3rd Normal Form, anomalies can still result if it has more than one candidate key.
BCNF is sometimes also referred to as 3.5 Normal Form.
4th Normal Form
If no database table instance contains two or more independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.
5th Normal Form
A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.
6th Normal Form
6th Normal Form is not yet standardized; however, database experts have been discussing it for some time. Hopefully we will have a clear standardized definition of 6th Normal Form in the near future.