SIKKIM MANIPAL UNIVERSITY
DATABASE MANAGEMENT SYSTEM – 4 CREDITS
SUBJECT CODE – MI0034
ASSIGNMENT SET - 1
Q1. Differentiate between Traditional File System & Modern Database System?
Describe the properties of Database & the Advantage of Database?
Traditional File Systems Vs Modern Database Management Systems
1. Traditional File System: The traditional file system is the approach that was followed
   before the advent of the DBMS, i.e., it is the older way.
   Modern Database Management System: This is the modern approach, which has replaced the
   older concept of the file system.
2. Traditional File System: In traditional file processing, the data definition is part of
   the application program and works with only that specific application.
   Modern Database Management System: The data definition is part of the DBMS. It is
   independent of the application and can be used with any application.
3. Traditional File System: File systems are design driven; they require a design/coding
   change when a new kind of data occurs. E.g.: A traditional employee master file has
   Emp_name, Emp_id, Emp_addr, Emp_desig, Emp_dept, Emp_sal. If we want to insert one more
   column, Emp_Mob (mobile number), it requires a complete restructuring of the file or a
   redesign of the application code, even though all the data except that in one column is
   the same.
   Modern Database Management System: One extra column (attribute) can be added without
   any difficulty. Minor coding changes in the application program may be required.
4. Traditional File System: The traditional file system keeps redundant [duplicate]
   information in many locations. This might result in the loss of data consistency.
   E.g.: Employee names might exist in separate files such as the Payroll Master File and
   the Employee Benefit Master File. If an employee changes his or her last name, the name
   might be changed in the Payroll Master File but not in the Employee Benefit Master File.
   Modern Database Management System: Redundancy is eliminated to the maximum extent in a
   DBMS if the database is properly defined.
5. Traditional File System: In a file system, data is scattered in various files, and each
   of these files may be in a different format, making it difficult to write new
   application programs to retrieve the appropriate data.
   Modern Database Management System: This problem is solved in a DBMS.
6. Traditional File System: Security features have to be coded in the application program
   itself.
   Modern Database Management System: Coding for security requirements is not required, as
   most of them are taken care of by the DBMS.
SANTOSH GOWDA.H Reg No.: 521075728 3rd semester, Disha institute of management and technology Mobile No.: 9986840143
Hence, a database management system is the software that manages a database, and is
responsible for its storage, security, integrity, concurrency, recovery and access.
The DBMS has a data dictionary, also referred to as the system catalog, which stores data
about everything it holds, such as names, structures, locations and types. This data is
also referred to as metadata.
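The two points above (an extra column can be added easily, and the DBMS keeps metadata in a catalog) can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table and column names are illustrative only, and other DBMSs expose their catalogs differently.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_sal REAL)"
)
conn.execute("INSERT INTO employee VALUES (100, 'Prasad', 40000)")

# Adding one more attribute does not require restructuring the stored rows.
conn.execute("ALTER TABLE employee ADD COLUMN emp_mob TEXT")

# The catalog (metadata) is itself queryable. SQLite exposes it through the
# sqlite_master table and the table_info pragma.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# The existing row is still readable; the new column is simply NULL for it.
existing = conn.execute("SELECT emp_name, emp_mob FROM employee").fetchone()
```

The old row survives the schema change untouched, which is the contrast with the file-system case where the whole file would have to be rewritten.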
Properties of Database
The following are the important properties of Database:
1. A database is a logical collection of data having some implicit meaning. If the data are
not related, the collection is not a proper database.
E.g. A student studying in Class II got the 5th rank.
Stud_name   Class      Rank obtained
Vijetha     Class II   5th
2. A database consists of both the data and the description of the database structure and
constraints.
E.g.
Field Name   Type           Description
Stud_name    Character      It is the student's name
Class        Alphanumeric   It is the class of the student
3. A database can be of any size and of varying complexity. If we consider the above
example of an employee database, the names and addresses of the employees may consist of
very few records, each with a simple structure. E.g.
Emp_name  Emp_id  Emp_addr                                                          Emp_desig          Emp_Sal
Prasad    100     Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore  Project Leader     40000
Usha      101     #165, 4th main Chamrajpet, Bangalore                              Software engineer  10000
Nupur     102     #12, Manipal Towers, Bangalore                                    Lecturer           30000
Peter     103     Syndicate house, Manipal                                          IT executive       15000
Like this, there may be n records.
4. The DBMS is considered a general-purpose software system that facilitates the
process of defining, constructing and manipulating databases for various applications.
5. A database provides insulation between programs and data, and supports data
abstraction. Data abstraction is a feature that presents the data of interest in a uniform
way, so that it can be used regardless of how it is physically structured.
6. The data in the database is used by a variety of users for a variety of purposes. E.g.,
in a hospital database management system, the view of the database used by a patient is
different from the one used by a doctor. It may appear that the data are stored separately
for the different users, but in fact they are stored in a single database. This property is
known as multiple views of the database.
7. A multiuser DBMS must allow the data to be shared by multiple users
simultaneously. For this purpose the DBMS includes concurrency control software to
ensure that updates made to the database by several users at the same time are applied
correctly. This property is known as multiuser transaction processing.
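Property 6 (multiple views over a single stored database) can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the patient table, its columns, the sample row and the view names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One stored table holds all the data (hypothetical hospital schema).
conn.execute(
    "CREATE TABLE patient (pid INTEGER PRIMARY KEY, name TEXT, diagnosis TEXT, bill REAL)"
)
conn.execute("INSERT INTO patient VALUES (1, 'Asha', 'Fracture', 2500.0)")

# Each class of user sees only the portion of the database it needs.
conn.execute("CREATE VIEW doctor_view AS SELECT pid, name, diagnosis FROM patient")
conn.execute("CREATE VIEW billing_view AS SELECT pid, name, bill FROM patient")

doctor_row = conn.execute("SELECT * FROM doctor_view").fetchone()
billing_row = conn.execute("SELECT * FROM billing_view").fetchone()
```

Both views read the same underlying row, so a change to the patient table is visible through each of them without any copying.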
Advantages of using DBMS
1. Redundancy is reduced
2. Data located on a server can be shared by clients
3. Integrity (accuracy) can be maintained
4. Security features protect the Data from unauthorized access
5. Modern DBMSs support Internet-based applications.
6. In a DBMS, the application program and the structure of the data are independent.
7. Consistency of data is maintained.
8. A DBMS supports multiple views. A DBMS has many users, each of whom might use it for a
different purpose and may need to view and manipulate only a portion of the database,
depending on requirements.
Q2. What is the disadvantage of sequential file organization? How do you overcome it?
What are the advantages & disadvantages of Dynamic Hashing?
In this file organization, the records of the file are stored one after another both
physically and logically. That is, record with sequence number 16 is located just after the
15th record.
A record of a sequential file can only be accessed by reading all the previous
records.
The records are distinguished from one another using the record length declared
in the associated FD entry of the FILE SECTION. For example, if the record
structure that the programmer has declared is 52 bytes, blocks of 52 bytes of data (records)
are assumed to be placed one after another in the file. If the programmer is reading the data
in a sequential file, every READ statement brings 52 bytes into memory.
If the file contains, say, 52-byte records, but the programmer tries to read this file
with a program which has declared 40-byte records (i.e., the total length of the FD
structure is 40 bytes), the program will certainly read some pieces of information into
memory, but after the first READ statement, some meaningless pieces of records will
be brought into memory and the program will start processing physical records
which contain logically meaningless data.
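The effect of a wrong record length can be seen in a small sketch. It is written in Python rather than COBOL, and it assumes a hypothetical 24-byte record (a 16-byte name followed by an 8-digit date); the sample names are made up for illustration.

```python
import struct

# Hypothetical 24-byte record: 16-byte name followed by an 8-digit birth date.
RECORD = struct.Struct("16s8s")

def write_records(rows):
    """Append fixed-length records back to back, with no separators."""
    return b"".join(
        RECORD.pack(name.ljust(16).encode("ascii"), date.encode("ascii"))
        for name, date in rows
    )

def read_records(data, record_size):
    """Cut the byte stream into consecutive blocks of record_size bytes."""
    return [data[i:i + record_size] for i in range(0, len(data), record_size)]

data = write_records([("ALI", "19800115"), ("VELI", "19791231")])

good = read_records(data, RECORD.size)  # declared length matches the file
bad = read_records(data, 20)            # declared length is too short
```

With the correct length each block is one logical record. With the wrong length, the second block starts in the middle of the first record's date, so everything after the first read is misaligned.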
It is the programmer's responsibility to take care of the record sizes in files. You
must be careful when declaring record structures for files. Any mistake you make in
record sizes will cause your program to read/write erroneous information. This is
especially dangerous if the file contents are being altered (changed, updated).
Since the records are simply appended to each other when building SEQUENTIAL files,
you simply end up with a STREAM of bytes. If this stream does not contain any "Carriage
Return/Line Feed" control characters, the whole file will appear as a single LINE of
characters and would be impossible to process with regular text editors. Text editors are
good at reading, writing and modifying text files. These programs assume that the file
consists of LINES and expect the lines to be separated from each other by a pair of
control characters called "Carriage Return/Line Feed" (or CR/LF).
COBOL has a special type of sequential file organization, called
LINE SEQUENTIAL ORGANIZATION, which places a CR/LF pair at the end of each
record while adding records to a file and expects such a pair while reading. LINE
SEQUENTIAL files are much easier to use while developing programs because you can
always use a simple text editor to see the contents of your sequential file and trace/debug
your program.
Please note that LINE SEQUENTIAL files have two extra characters for each
record. For files, which have millions of records, this might use up a significant amount
of disk space.
SEQUENTIAL files have only one ACCESS MODE and that is "sequential
access". Therefore you need not specify an ACCESS MODE in the SELECT statement.
Typical SELECT statements for SEQUENTIAL files are :
    SELECT MYFILE ASSIGN TO DISK "MYFILE.DAT"
        ORGANIZATION IS SEQUENTIAL.
    SELECT MYFILE-2 ASSIGN TO DISK "C:\DATADIR\MYFILE2.TXT"
        ORGANIZATION IS LINE SEQUENTIAL.
In the FILE SECTION, you must provide an FD entry for each file; hence for a sequential
file you could have something like:
    FD  MYFILE.
    01  MYFILE-REC.
        02  M-NAME            PIC X(16).
        02  M-SURNAME         PIC X(16).
        02  M-BIRTHDATE.
            03  M-BD-YEAR     PIC 9999.
            03  M-BD-MONTH    PIC 99.
            03  M-BD-DAY      PIC 99.
Note: You must NOT provide record fields for the two extra CR/LF bytes in the record
descriptions of LINE SEQUENTIAL files. Once you declare the file to be a LINE SEQUENTIAL
file, these two extra bytes are automatically taken into consideration and added to all new
records that are added to the file.
It is NOT possible to delete records of a sequential file. If you do not want a specific
record to be kept in a sequential file any more, all you can do is modify the contents of
the record so that it contains some special values that your program will recognize as
deleted (remember to open the file in I-O mode and REWRITE the record).
A sequential file can only be processed sequentially. If you need to read record number N,
you must first read the previous N-1 records. This makes it especially unsuitable for
programs that make frequent searches in the file.
One disadvantage of sequential file organization is that we must use linear search or
binary search to locate the desired record, which results in more I/O operations and a
number of unnecessary comparisons. To overcome this disadvantage, hashing techniques are
used. In the hashing technique, or direct file organization, the key value is converted
into an address by performing some arithmetic manipulation on the key value, which
provides very fast access to records.
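The difference between scanning and hashing can be sketched as follows. This is a minimal illustration, assuming a division-method hash function and a table of 11 slots; both choices, and the function names, are assumptions made for the example.

```python
def linear_search(records, key):
    """Scan records in order; return (record, number of comparisons made)."""
    for count, record in enumerate(records, start=1):
        if record[0] == key:
            return record, count
    return None, len(records)

TABLE_SIZE = 11  # assumed number of slots in the hash table

def h(key):
    """A simple division-method hash function: key value -> address."""
    return key % TABLE_SIZE

records = [(100, "Prasad"), (101, "Usha"), (102, "Nupur"), (103, "Peter")]

# Hash table: an array of slots, each holding the records that hash there.
table = [[] for _ in range(TABLE_SIZE)]
for rec in records:
    table[h(rec[0])].append(rec)

def hash_lookup(key):
    """Go straight to slot h(key) and look only at the records stored there."""
    for record in table[h(key)]:
        if record[0] == key:
            return record
    return None
```

The linear search needs four comparisons to reach the last record, while the hash lookup computes the slot directly and inspects only the records in that slot.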
Let us consider a hash function h that maps the key value k to the value h(k). The
value h(k) is used as an address.
The basic terms associated with the hashing techniques are:
1) Hash table: It is simply an array that holds the addresses of records.
2) Hash function: It is the transformation of a key into the corresponding location or
address in the hash table (it can be defined as a function that takes a key as input and
transforms it into a hash table index).
3) Hash key: Let R be a record; the value that its key hashes into is called the hash key.
The different hashing techniques are:
Internal Hashing
Dynamic hashing
Extendable hashing
Dynamic Hashing Technique
A major drawback of static hashing is that the address space is fixed. Hence it is difficult
to expand or shrink the file dynamically.
In dynamic hashing, the access structure is built on the binary representation of the hash
value. The number of buckets is not fixed [as in regular hashing] but grows or
diminishes as needed. The file can start with a single bucket. Once that bucket is full and
a new record is inserted, the bucket overflows and is split into two buckets. The records
are distributed between the two buckets based on the value of the first [leftmost] bit of
their hash values. Records whose hash values start with a 0 bit are stored in one bucket, and
those whose hash values start with a 1 bit are stored in the other bucket. At this point, a
binary tree structure called a directory is built. The directory has two types of nodes.
1. Internal nodes: These guide the search; each has a left pointer corresponding to a 0 bit
and a right pointer corresponding to a 1 bit.
2. Leaf nodes: Each leaf node holds a pointer to a bucket, i.e., a bucket address.
If a bucket overflows, it is split on the next bit. For example, if a new record
is inserted into the bucket for records whose hash values start with 10 and causes
overflow, then all records whose hash values start with 100 are placed in the first split
bucket, and the second bucket contains those whose hash values start with 101. The
levels of the binary tree can thus be expanded dynamically.
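The directory described above can be sketched as a small binary tree. This is only an illustrative sketch: the bucket capacity of 2, the 8-bit hash width, the use of the key itself as the hash value, and all class and function names are assumptions, and shrinking (merging buckets) is not shown.

```python
BUCKET_CAPACITY = 2  # assumed number of records per bucket
HASH_BITS = 8        # assumed width of the hash value in bits

def hash_bits(key):
    """Binary representation of the hash value, leftmost bit first."""
    return format(key % (1 << HASH_BITS), f"0{HASH_BITS}b")

class Leaf:
    def __init__(self):
        self.bucket = []      # stands in for a bucket address

class Internal:
    def __init__(self, left, right):
        self.left = left      # followed on a 0 bit
        self.right = right    # followed on a 1 bit

def insert(node, key, depth=0):
    """Insert key below node; return the (possibly replaced) node."""
    if isinstance(node, Internal):
        if hash_bits(key)[depth] == "0":
            node.left = insert(node.left, key, depth + 1)
        else:
            node.right = insert(node.right, key, depth + 1)
        return node
    if len(node.bucket) < BUCKET_CAPACITY or depth == HASH_BITS:
        node.bucket.append(key)
        return node
    # Overflow: split this one bucket on the next bit and redistribute.
    split = Internal(Leaf(), Leaf())
    for k in node.bucket + [key]:
        insert(split, k, depth)
    return split

def find_bucket(node, key, depth=0):
    """Follow the bits of the hash value down to the leaf's bucket."""
    while isinstance(node, Internal):
        node = node.left if hash_bits(key)[depth] == "0" else node.right
        depth += 1
    return node.bucket

root = Leaf()
for key in (0b00000001, 0b10000000, 0b10100000, 0b11000000):
    root = insert(root, key)
```

After the four insertions the bucket for hash values starting with 1 has itself been split on the second bit, while the bucket for values starting with 0 was never touched. Only the overflowing bucket is reorganized.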
Advantages of dynamic hashing:
1. The main advantage is that splitting causes minor reorganization, since only the
records in one bucket are redistributed to the two new buckets.
2. The space overhead of the directory table is negligible.
3. The main advantage of extendable hashing is that performance does not degrade as the
file grows. The main space saving of hashing is that no buckets need to be reserved for
future growth; rather buckets can be allocated dynamically.
Disadvantages:
1. The index tables grow rapidly and may become too large to fit in main memory. When part
of the index table is stored on secondary storage, it requires extra accesses.
2. The directory must be searched before accessing the bucket, resulting in two-block
access instead of one in static hashing.
3. A disadvantage of extendable hashing is that it involves an additional level of
indirection.
Q3. What is relationship type? Explain the difference among a relationship instance,
relationship type & a relation set?
Relationships: In the real world, items have relationships to one another. E.g.: A book is
published by a particular publisher. The association or relationship that exists between
the entities relates data items to each other in a meaningful way. A relationship is an
association between entities. A single such association between specific entities is a
relationship instance.
A collection of relationships of the same type is called a relationship set.
A relationship type R among n entity types E1, E2, ..., En defines a set of associations
among entities of these types. Mathematically, R is a set of relationship instances ri.
E.g.: Consider a relationship type WORKS_FOR between two entity types, employee
and department, which associates each employee with the department the employee
works for. Each relationship instance ri in WORKS_FOR associates one employee entity
and one department entity, namely the entities that participate in ri.
Employees e1, e3 and e6 work for department d1; e2 and e4 work for d2; and e5 and e7
work for d3. The relationship type R is the set of all such relationship instances.
[Figure: Some instances of the WORKS_FOR relationship]
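The WORKS_FOR example can be written out as plain sets, which makes the three terms concrete: each pair is a relationship instance, the set of pairs is the relationship set, and the rule that every pair joins one employee with one department reflects the relationship type. This is a sketch only; the helper names are hypothetical.

```python
# Entity sets for the entity types employee and department.
employees = {"e1", "e2", "e3", "e4", "e5", "e6", "e7"}
departments = {"d1", "d2", "d3"}

# Relationship set: the current collection of relationship instances.
works_for = {
    ("e1", "d1"), ("e3", "d1"), ("e6", "d1"),
    ("e2", "d2"), ("e4", "d2"),
    ("e5", "d3"), ("e7", "d3"),
}

def is_valid_instance(instance):
    """The relationship type requires one employee and one department."""
    e, d = instance
    return e in employees and d in departments

def staff_of(dept):
    """Employees that participate in an instance with the given department."""
    return sorted(e for e, d in works_for if d == dept)
```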
Degree of relationship type: The number of entity sets that participate in a relationship
set. A unary relationship exists when an association is maintained within a single entity
set.
A binary relationship exists when two entities are associated.
A ternary relationship exists when there are three entities associated.
[Figure: Degree of relationship type]
Constraints on Relationship Types
Relationship types usually have certain constraints that limit the possible combination of
entities that may participate in the relationship instance.
E.g.: A company may have a rule that each employee must work for exactly one department.
The two main types of constraints are cardinality ratio and participation constraints.
The cardinality ratio specifies the number of entities to which another entity can be
associated through a relationship set.
Mapping cardinalities should be one of the following.
One-to-One: An entity in A is associated with at most one entity in B and vice versa.
E.g.: An employee can manage only one department, and a department has only one manager.
One-to-Many: An entity in A is associated with any number of entities in B. An entity in B,
however, can be associated with at most one entity in A.
E.g.: Each department can be related to numerous employees, but an employee can be related
to only one department.
Many-to-One: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number of entities in A. E.g.: Many depositors deposit
into a single account.
Many-to-Many: An entity in A is associated with any number of entities in B, and an
entity in B is associated with any number of entities in A.
E.g.: An employee can work on several projects, and several employees can work on a project.
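The four mapping cardinalities can be told apart by counting how often each entity appears on each side of a relationship set. The sketch below only reports what a given set of pairs currently exhibits; a cardinality ratio is really a rule on the schema, so this check can show that data violates a declared ratio but cannot prove the rule. The function name and sample data are hypothetical.

```python
from collections import Counter

def cardinality_ratio(pairs):
    """Classify a binary relationship set of (a, b) pairs by what it exhibits."""
    a_counts = Counter(a for a, _ in pairs)  # number of B entities per A entity
    b_counts = Counter(b for _, b in pairs)  # number of A entities per B entity
    a_many = any(n > 1 for n in a_counts.values())
    b_many = any(n > 1 for n in b_counts.values())
    if a_many and b_many:
        return "many-to-many"
    if a_many:
        return "one-to-many"
    if b_many:
        return "many-to-one"
    return "one-to-one"

manages = {("e1", "d1"), ("e2", "d2")}                  # employee, department
has_staff = {("d1", "e1"), ("d1", "e3"), ("d2", "e2")}  # department, employee
deposits = {("c1", "acc1"), ("c2", "acc1")}             # depositor, account
works_on = {("e1", "p1"), ("e1", "p2"), ("e2", "p1")}   # employee, project
```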
Participation Constraints: There are two ways in which an entity set can participate in a
relationship: total and partial.
1. Total: The participation of an entity set E in a relationship set R is said to be total if
every entity in E participates in at least one relationship in R. E.g.: If every employee
must work for a department, the participation of employee in WORKS_FOR is total.
Total participation is sometimes called existence dependency.
2. Partial: If only some entities in E participate in relationships in R, the participation
of entity set E in relationship R is said to be partial. E.g.: We do not expect every
employee to manage a department, so the participation of employee in the MANAGES
relationship type is partial.
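Total and partial participation can be checked by comparing an entity set with the entities that actually appear in a relationship set. This is a small sketch with hypothetical data; the MANAGES instances in particular are invented for illustration.

```python
employees = {"e1", "e2", "e3", "e4", "e5", "e6", "e7"}

works_for = {
    ("e1", "d1"), ("e3", "d1"), ("e6", "d1"),
    ("e2", "d2"), ("e4", "d2"),
    ("e5", "d3"), ("e7", "d3"),
}
# Hypothetical MANAGES instances: only three employees manage a department.
manages = {("e1", "d1"), ("e2", "d2"), ("e5", "d3")}

def participation(entity_set, relationship_set, position=0):
    """Return 'total' if every entity appears in some instance, else 'partial'."""
    participants = {instance[position] for instance in relationship_set}
    return "total" if entity_set <= participants else "partial"
```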
Q4. What is SQL? Discuss.
SQL stands for Structured Query Language.
The Structured Query Language is used for programming the database. The history of
SQL began in an IBM laboratory in San Jose, California, where SQL was developed in
the late 1970s. It is a non-procedural language, meaning that SQL describes what data to
retrieve, delete or insert, rather than how to perform the operation. It is the standard
command set used to communicate with the RDBMS.
A SQL query is not necessarily a question to the database. It can be a command to do
one of the following:
Create or delete a table.
Insert, modify or delete rows.
Search several rows for specified information and return the results in order.
Modify security information.
SQL statements can be grouped into the following categories:
1. DDL(Data Definition Language)
2. DML(Data Manipulation Language)
3. DCL(Data Control Language)
4. TCL(Transaction Control Language)
DDL: (Data Definition Language)
The DDL statements provide commands for defining relation schemas, i.e., for creating
tables, indexes, sequences etc., and commands for dropping, altering and renaming objects.
DML: (Data Manipulation Language)
The DML statements are used to alter the contents of database tables in some way. The
UPDATE, INSERT and DELETE statements respectively alter existing rows in a database table,
insert new records into a database table, and remove one or more records from a database
table.
DCL: (Data Control Language)
The Data Control Language statements are used to grant permissions to a user, revoke
permissions from a user, and lock certain permissions for a user.
SQLDBA> GRANT ALL ON emp TO PUBLIC;
SQLDBA> GRANT SELECT, UPDATE ON emp TO L.Suresh;
SQLDBA> GRANT ALL ON emp TO Akash WITH GRANT OPTION;
Revoke: Revoke takes away privileges on one or more tables or views.
SQLDBA> REVOKE IMPORT FROM Akash;
SQLDBA> REVOKE UPDATE, DELETE ON emp FROM L.Suresh;
SQLDBA> REVOKE ALL ON emp FROM Akash;
TCL: (Transaction Control Language)
It is used to control transactions.
E.g.: COMMIT
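The DDL, DML and TCL categories can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module rather than Oracle, so the syntax is SQLite's; SQLite has no user accounts, so DCL statements such as GRANT and REVOKE are not shown. The table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_sal REAL)"
)

# DML: insert and update rows.
conn.execute("INSERT INTO emp VALUES (100, 'Prasad', 40000)")
conn.execute("INSERT INTO emp VALUES (101, 'Usha', 10000)")
conn.execute("UPDATE emp SET emp_sal = 12000 WHERE emp_id = 101")
conn.commit()      # TCL: make the changes permanent

# DML again, this time undone before it is committed.
conn.execute("DELETE FROM emp WHERE emp_id = 100")
conn.rollback()    # TCL: undo the uncommitted DELETE

rows = conn.execute("SELECT emp_id, emp_sal FROM emp ORDER BY emp_id").fetchall()
```

Both rows are still present after the rollback, and the committed salary update is kept.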
SQL*Plus COMMANDS:
This subsection discusses commands often used in the SQL environment. For example, if
your SQL commands are saved in a file (typically created in Notepad), you can execute this
file using the "at" (@) command. Similarly, there are a number of such commands:
@<filename>   Runs the command file stored in <filename>
DATA TYPES IN ORACLE 8i SQL:
The table below lists the data types allowed in Oracle.
DATA TYPE         DESCRIPTION
CHAR(size)        Fixed-length character data. Max = 2000
VARCHAR2(size)    Variable-length character data. Max = 4000
DATE              Date; valid range is from Jan 1, 4712 B.C. to Dec 31, 4712 A.D.
BLOB              Binary large object. Max = 4 GB
CLOB              Character large object. Max = 4 GB
BFILE             Pointer to a binary OS file
LONG              Character data of variable size. Max = 2 GB
LONG RAW          Raw binary data. Rest is the same as LONG
NUMBER(size)      Numbers. Max. size = 38 digits
NUMBER(size,d)    Numbers; range = 1.0E-130 to 9.9E125
DECIMAL           Same as NUMBER. Size/d can't be specified
FLOAT             Same as NUMBER
INTEGER           Same as NUMBER. Size/d can't be specified
SMALLINT          Same as NUMBER
Q5. What is Normalization? Discuss various types of Normal Forms?
Introduction to Normalization
In Unit 8 you learnt how to create a database using SQL. In this unit we will
study how to normalize the data in the database. Normalization is the process of building
database structures to store data, because any application ultimately depends on its data
structures. If the data structures are poorly designed, the application will start from a poor
foundation. This will require a lot more work to create a useful and efficient application.
Normalization is the formal process for deciding which attributes should be grouped
together in a relation. Normalization serves as a tool for validating and improving the
logical design, so that the logical design avoids unnecessary duplication of data, i.e. it
eliminates redundancy and promotes integrity. In the normalization process we analyze
and decompose the complex relations into smaller, simpler and well-structured relations.
Normal Forms Based on Primary Keys
First Normal Form (1NF)
A relation schema R is in first normal form if every attribute of R takes only
single atomic values. We can also define it in terms of the intersection of each row and
column containing one and only one value. To transform an un-normalized table (a table that
contains one or more repeating groups) to first normal form, we identify and remove the
repeating groups within the table.
E.g.
Dept.
D.Name   D.No   D.location
R&D      5      {England, London, Delhi}
HRD      4      Bangalore
Figure A
Consider Figure A, in which each department can have a number of locations. This relation
is not in first normal form because D.location is not an atomic attribute: the domain of
D.location contains multiple values.
There is a technique to achieve first normal form: remove the attribute
D.location that violates first normal form and place it in a separate relation,
Dept_location.
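The 1NF technique just described can be sketched on the Figure A data. The sketch assumes that D.No is the key carried into the new Dept_location relation, which the source does not state explicitly; the function name is hypothetical.

```python
# The un-normalized Dept relation from Figure A: D.location is multivalued.
dept = [
    {"D.Name": "R&D", "D.No": 5, "D.location": ["England", "London", "Delhi"]},
    {"D.Name": "HRD", "D.No": 4, "D.location": ["Bangalore"]},
]

def to_first_normal_form(rows):
    """Move the multivalued attribute into a separate relation keyed by D.No."""
    dept_rows = [{"D.Name": r["D.Name"], "D.No": r["D.No"]} for r in rows]
    dept_location = [
        {"D.No": r["D.No"], "D.location": loc}
        for r in rows
        for loc in r["D.location"]
    ]
    return dept_rows, dept_location

dept_1nf, dept_location = to_first_normal_form(dept)
```

Every value in both resulting relations is atomic: Dept keeps one row per department, and Dept_location holds one row per (department, location) pair.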
Functional dependency: The concept of functional dependency was introduced by
Prof. Codd in 1970 during the emergence of definitions for the three normal forms. A
functional dependency is the constraint between the two sets of attributes in a relation
from a database.
Given a relation R, a set of attributes X in R is said to functionally determine
another attribute Y in R (X -> Y) if and only if each value of X is associated with exactly
one value of Y. X is called the determinant set and Y is the dependent attribute.
E.g.: Consider the STUDENT_COURSE database.
[Table: STUDENT_COURSE]
In the STUDENT_COURSE database, the student id (Sid) does not uniquely identify
a tuple and therefore it cannot be a primary key. Similarly, the course id (Cid) cannot be
a primary key. But the combination (Sid, Cid) uniquely identifies a row in
STUDENT_COURSE. Therefore (Sid, Cid) is the primary key, which uniquely retrieves
Sname, address, course and marks, which are dependent on the primary key.
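Whether a functional dependency X -> Y is contradicted by a given set of rows can be tested mechanically. The sketch below uses made-up STUDENT_COURSE rows (the original table is not reproduced here), and it can only refute a dependency on sample data; a dependency is a rule about all valid states of the relation, which no single sample can prove.

```python
def holds(rows, X, Y):
    """Return True if no two rows agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical rows for illustration only.
student_course = [
    {"Sid": 1, "Cid": "C1", "Sname": "Asha", "Marks": 70},
    {"Sid": 1, "Cid": "C2", "Sname": "Asha", "Marks": 82},
    {"Sid": 2, "Cid": "C1", "Sname": "Ravi", "Marks": 70},
]
```

In this sample (Sid, Cid) determines Marks while Sid alone does not. Sid alone does determine Sname, which is an example of an attribute that depends on only part of the key.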
Second Normal Form (2 NF)
Second normal form is based on the concept of full functional dependency. A relation
schema R is in second normal form if every non-prime attribute A in R is fully functionally
dependent on the primary key of R.
[Figure 9.2: 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. Normalizing EMP_DEPT
into 3NF relations.]
A Partial functional dependency is a functional dependency in which one or more non-
key attributes are functionally dependent on part of the primary key. It creates a
redundancy in that relation, which results in anomalies when the table is updated.
Third Normal Form (3NF)
This is based on the concept of transitive dependency. We should design relational
schemas in such a way that there are no transitive dependencies, because they
lead to update anomalies. A functional dependency [FD] X -> Y in a relation schema R is a
transitive dependency if there is a set of attributes Z, neither a candidate key nor a
subset [part] of any key of R, such that both X -> Z and Z -> Y hold. The
dependency SSN -> Dmgr is transitive through Dnum in the Emp_dept relation because
SSN -> Dnum and Dnum -> Dmgr hold, and Dnum is neither a key nor a subset [part] of the key.
According to Codd's definition, a relation schema R is in 3NF if it satisfies 2NF
and no non-prime attribute is transitively dependent on the primary key. The Emp_dept
relation is not in 3NF; we can normalize it by decomposing it into E1 and E2.
Note: Transitivity is a mathematical property which states that if a relation holds between
the first value and the second value, and between the second value and the third value,
then it holds between the first and the third value.
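The decomposition of Emp_dept into E1 and E2 can be sketched as two projections. The attribute lists of E1 and E2 are not given in the text, so the split below (E1 keeps the employee attributes plus Dnum, E2 maps Dnum to Dmgr) is an assumption, as are the Ename attribute and the sample rows.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

# Hypothetical Emp_dept rows: SSN -> Dnum and Dnum -> Dmgr both hold.
emp_dept = [
    {"SSN": "111", "Ename": "A", "Dnum": 5, "Dmgr": "900"},
    {"SSN": "222", "Ename": "B", "Dnum": 5, "Dmgr": "900"},
    {"SSN": "333", "Ename": "C", "Dnum": 4, "Dmgr": "800"},
]

e1 = project(emp_dept, ["SSN", "Ename", "Dnum"])  # assumed attributes of E1
e2 = project(emp_dept, ["Dnum", "Dmgr"])          # assumed attributes of E2
```

In Emp_dept the manager of department 5 is stored once per employee. After the decomposition it is stored once in E2, so changing a department's manager touches a single tuple.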
Example 2:
Consider a relation schema LOTS which describes parcels of land for sale in
various counties of a state. Suppose there are two candidate keys: property_ID and
{County_name, lot#}; that is, lot numbers are unique only within each county, but
property_ID numbers are unique across counties for the entire state.
Based on the two candidate keys property_ID and {County_name, lot#}, we know
that functional dependencies FD1 and FD2 hold. Suppose the following two additional
functional dependencies hold in LOTS.
FD3: County_name -> tax_rate
FD4: Area -> price
Here, FD3 says that the tax rate is fixed for a given county, and FD4
says that the price of a lot is determined by its area. The LOTS relation schema
violates 2NF, because tax_rate is partially dependent upon the candidate key
{County_name, lot#}. To remove this violation, we decompose LOTS into two relations,
LOTS1 and LOTS2.
LOTS1 violates 3NF, because price is transitively dependent on the candidate key of LOTS1
via the attribute Area. Hence we could decompose LOTS1 into LOTS1A and LOTS1B.
A relation schema R is in 3NF when every non-prime attribute of R satisfies both
conditions below.
1. It is fully functionally dependent on every key of R.
2. It is non-transitively dependent on every key of R.
Fourth Normal Form (4NF)
Multivalued dependencies are a consequence of first normal form, which
prohibits attributes from having a set of values. If we have two or more multivalued
independent attributes in the same relation, we get into a situation where we have to
repeat every value of one of the attributes with every value of the other attributes to keep
the relation state consistent and to maintain independence among the attributes involved.
This constraint is specified by a multivalued dependency.
Consider a table EMPLOYEE that has the attributes name, project and hobby.
An employee can work on more than one project and can have more than one
hobby.
The employee's projects and hobbies are independent of one another.
A given project or hobby is associated with any number of employees.
To keep the relation state consistent, we must have separate tuples to represent every
combination of an employee's projects and the employee's hobbies.
The drawback of the EMPLOYEE relation is redundant data. This redundant data leads to
update anomalies. For example, if employee B takes on one more project, say Sybase, then we
must add two more tuples, one for each hobby. The values Reading and Movies of hobby are
repeated with each value of project. This redundancy is undesirable. One way to remove
the redundancy is to decompose the EMPLOYEE relation into two relations, PROJECT and HOBBY.
Now, if we wish to insert Sybase in the PROJECT relation, only one entry is required.
Definition (MVD): A relation R(X, Y, Z) is said to have a multivalued dependency
if the set of Y values for a given [X, Z] pair does not depend on Z, but depends only on X.
We then say "X multi-determines Y" or "Y is multi-dependent on X". Such a dependency
is called a Multivalued Dependency (MVD) and is represented by a double arrow: X ->> Y.
We can also define an MVD as follows: for each value of X there is a set of values for Y
and a set of values for Z. However, the sets of values for Y and Z are independent of each
other.
So wherever two independent one-to-many relationships (A:B and A:C) are mixed in
the same relation, a multivalued dependency arises. Multivalued dependencies can be
avoided using the fourth normal form.
EMPLOYEE
NAME   PROJECT     HOBBY
A      Microsoft   Cricket
A      Oracle      Music
A      Microsoft   Music
A      Oracle      Cricket
B      Intel       Movies
B      Sybase      Reading
B      Intel       Reading
B      Sybase      Movies
Decomposed relations to reduce redundancy
PROJECT
NAME   PROJECT
A      Microsoft
A      Oracle
B      Intel
B      Sybase
HOBBY
NAME   HOBBY
A      Cricket
A      Music
B      Movies
B      Reading
Fourth Normal Form (4NF): The definition of 4NF is violated when a relation has
undesirable multivalued dependencies; it can hence be used to identify such relations and
decompose them into 4NF relations.
Alternate definition: A relation R is said to be in 4NF if, for every MVD A →→ B that
holds over R, one of the following is true:
B ⊆ A (trivial), or
A ∪ B = R, or
A is a super key
The EMPLOYEE relation is not in 4NF because it has the non-trivial MVDs NAME →→ PROJECT
and NAME →→ HOBBY (the project and hobby attributes of the relation are independent of
each other) and NAME is not a super key of EMPLOYEE. To bring this relation into 4NF,
decompose EMPLOYEE into PROJECT and HOBBY.
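The decomposition can be checked with a short Python sketch. It relies on the standard result that NAME →→ PROJECT holds exactly when EMPLOYEE equals the natural join of its two projections; the function names are illustrative assumptions, and the rows are the ones from the EMPLOYEE table.

```python
from itertools import product

# EMPLOYEE(NAME, PROJECT, HOBBY) rows taken from the table in the text.
EMPLOYEE = {
    ("A", "Microsoft", "Cricket"),
    ("A", "Oracle", "Music"),
    ("A", "Microsoft", "Music"),
    ("A", "Oracle", "Cricket"),
    ("B", "Intel", "Movies"),
    ("B", "Sybase", "Reading"),
    ("B", "Intel", "Reading"),
    ("B", "Sybase", "Movies"),
}


def decompose(rows):
    """Split EMPLOYEE into PROJECT(NAME, PROJECT) and HOBBY(NAME, HOBBY)."""
    project = {(name, proj) for name, proj, _ in rows}
    hobby = {(name, hob) for name, _, hob in rows}
    return project, hobby


def natural_join(project, hobby):
    """Rejoin the two projections on the shared NAME attribute."""
    return {
        (n1, proj, hob)
        for (n1, proj), (n2, hob) in product(project, hobby)
        if n1 == n2
    }


def mvd_holds(rows):
    """NAME ->> PROJECT holds when the rejoined projections equal the original."""
    return natural_join(*decompose(rows)) == set(rows)


PROJECT, HOBBY = decompose(EMPLOYEE)
print(len(EMPLOYEE), len(PROJECT), len(HOBBY))  # 8 4 4
print(mvd_holds(EMPLOYEE))                      # True
```

The eight EMPLOYEE tuples shrink to four PROJECT tuples and four HOBBY tuples, and joining them back loses nothing.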
Q6. What do you mean by Shared Lock & Exclusive lock? Describe briefly two phase
locking protocol?
Shared Locks: A shared lock is used for read-only operations, i.e., operations that do not
change or update the data.
E.g., a SELECT statement.
Shared locks allow concurrent transactions to read (SELECT) the same data. No other
transaction can modify the data while shared locks exist on it. Shared locks are released as
soon as the data has been read.
Exclusive Locks: Exclusive locks are used for data modification operations, such as
UPDATE, DELETE and INSERT. It ensures that multiple updates cannot be made to the
same resource simultaneously. No other transaction can read or modify data when locked
by an exclusive lock.
Exclusive locks are held until transaction commits or rolls back since those are used for
write operations.
There are three locking operations: read_lock(X), write_lock(X), and unlock(X). A lock
associated with an item X, LOCK(X), has three possible states: "read-locked",
"write-locked", or "unlocked". A read-locked item is also called share-locked, because
other transactions are allowed to read the item, whereas a write-locked item is called
exclusive-locked, because a single transaction exclusively holds the lock on the item.
Each record in the lock table has four fields: <data item name, LOCK,
no_of_reads, locking_transaction(s)>. The value (state) of LOCK is either read-locked or
write-locked; an item with no record in the lock table is considered unlocked.
read_lock(X):
B: if LOCK(X) = "unlocked"
      then begin LOCK(X) ← "read-locked";
            no_of_reads(X) ← 1
      end
   else if LOCK(X) = "read-locked"
      then no_of_reads(X) ← no_of_reads(X) + 1
   else begin
      wait (until LOCK(X) = "unlocked" and
            the lock manager wakes up the transaction);
      goto B
   end;
write_lock(X):
B: if LOCK(X) = "unlocked"
      then LOCK(X) ← "write-locked"
   else begin
      wait (until LOCK(X) = "unlocked" and
            the lock manager wakes up the transaction);
      goto B
   end;
unlock(X):
   if LOCK(X) = "write-locked"
      then begin LOCK(X) ← "unlocked";
            wake up one of the waiting transactions, if any
      end
   else if LOCK(X) = "read-locked"
      then begin
            no_of_reads(X) ← no_of_reads(X) - 1;
            if no_of_reads(X) = 0
               then begin LOCK(X) ← "unlocked";
                     wake up one of the waiting transactions, if any
               end
      end;
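The three locking operations can be sketched in Python as below. This is a simplified, single-threaded model and not a real lock manager: instead of putting a transaction to sleep, each request returns False when the transaction would have to wait. The class name LockTable and its fields are assumptions made for illustration.

```python
class LockTable:
    """Non-blocking model of read_lock / write_lock / unlock.

    A request returns True when it is granted and False when the caller
    would have to wait (the "wait ... goto B" branch of the pseudocode).
    """

    def __init__(self):
        # item -> {"state": "read-locked" or "write-locked", "holders": set}
        # An item with no entry is unlocked.
        self.locks = {}

    def read_lock(self, txn, item):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"state": "read-locked", "holders": {txn}}
            return True
        if entry["state"] == "read-locked":
            entry["holders"].add(txn)  # no_of_reads is len(holders)
            return True
        return False  # write-locked by another transaction

    def write_lock(self, txn, item):
        if item not in self.locks:
            self.locks[item] = {"state": "write-locked", "holders": {txn}}
            return True
        return False  # read-locked or write-locked: must wait

    def unlock(self, txn, item):
        entry = self.locks.get(item)
        if entry is None or txn not in entry["holders"]:
            return
        entry["holders"].discard(txn)
        if not entry["holders"]:
            del self.locks[item]  # last holder gone: item is unlocked again


table = LockTable()
print(table.read_lock("T1", "X"))   # True: X was unlocked
print(table.read_lock("T2", "X"))   # True: shared with T1
print(table.write_lock("T3", "X"))  # False: T3 would have to wait
table.unlock("T1", "X")
table.unlock("T2", "X")
print(table.write_lock("T3", "X"))  # True: no readers remain
```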
The Two Phase Locking Protocol
The two phase locking protocol governs how transactions acquire and release locks on shared
data items. When every transaction follows it, the resulting schedule is guaranteed to be
serializable; the basic protocol does not, however, prevent deadlocks. It consists of two phases.
1. Growing Phase: In this phase the transaction may acquire locks, but may not release any
locks. This phase is therefore also called the resource acquisition phase.
2. Shrinking Phase: In this phase the transaction may release locks, but may not acquire
any new locks. The modification of data and the release of locks are grouped together to
form this second phase.
In the beginning, a transaction is in the growing phase. Whenever a lock is needed the
transaction acquires it. Once the transaction releases its first lock, it enters the shrinking
phase and can issue no further lock requests.
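The two-phase rule can be checked mechanically. The sketch below is an illustration only; the function name and the (action, item) encoding of a transaction's operations are assumptions, not part of the protocol's definition.

```python
def follows_two_phase_locking(operations):
    """Return True if no lock is acquired after the first unlock.

    `operations` lists one transaction's (action, item) pairs in order.
    Actions other than read_lock, write_lock and unlock are ignored.
    """
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True  # the first release starts the shrinking phase
        elif action in ("read_lock", "write_lock") and shrinking:
            return False  # acquiring a lock while shrinking breaks the rule
    return True


two_phase = [
    ("read_lock", "Y"), ("write_lock", "X"),
    ("unlock", "Y"), ("unlock", "X"),
]
not_two_phase = [
    ("read_lock", "Y"), ("unlock", "Y"),
    ("write_lock", "X"), ("unlock", "X"),
]
print(follows_two_phase_locking(two_phase))      # True
print(follows_two_phase_locking(not_two_phase))  # False
```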
DATABASE MANAGEMENT SYSTEM – 4 CREDITS
SUBJECT CODE – MI0034
ASSIGNMENT SET - 2
Q1.Define Data Model & discuss the categories of Data Models? What is the difference
between logical data Independence & Physical Data Independence?
A database model is a theory or specification describing how a database is structured and
used. Several such models like Hierarchical model, Network model, Relational model
etc., have been suggested.
Data Model, Schemas and Instances:
Data Model: A data model is a set of concepts for viewing a set of data in a structured way.
It can be easily understood by professionals and non-technical users.
It can explain the way in which the organization uses and manages its information.
Concepts used in a Data Model
Entity
An entity is something that has a distinct, separate existence, though it need not be a material existence.
E.g.: Employee.
Attribute
An attribute is a property that describes an entity.
It is a characteristic or property of an object, such as weight, size, or color.
Relationship
A relationship describes an association between two or more entities.
Schemas: Describing a database means defining the names, data types and sizes of the columns in each table; the database itself is the actual data in the tables.
The description of a database is called the database schema [or the metadata].
The description of a database is specified during database design and is not frequently changed.
Roll No.  Name  Semester  Branch
Instances: The collection of data stored in the database at a particular moment is a database instance, database state or snapshot.
It changes very frequently due to additions, deletions and modifications.
Roll No.  Name           Semester  Branch
1         Rajesh Prabhu  II        E & C
Data independence is defined as the ability to modify a schema definition in one level
without affecting a schema definition in a higher level.
Physical data independence vs. logical data independence
Physical data independence: This is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at this level are usually made to improve performance.
Logical data independence: This is the ability to modify the conceptual schema without causing application programs to be rewritten. This is usually needed when the logical structure of the database is altered. Logical
data independence is harder to achieve, as the application programs are usually heavily dependent on the logical structure of the data. An analogy can be made to abstract data types in programming languages.
Q2. What is a B+Trees? Describe the structure of both internal and leaf nodes of a
B+Tree?
Indexes are used to speed up the retrieval of records.
Indexes can be created using one or more columns, providing the basis for both rapid
random lookups and efficient ordering of access to records.
The disk space required to store the index is typically less than the storage of the table
(since indexes usually contain only the key-fields according to which the table is to be
arranged, and exclude all the other details in the table).
Index file consists of two fields, the first field contains the value and second field
contains the list of pointers to address values in the disk block
Searching an index is much faster than searching the table because the index is sorted and
its rows are very small.
Index access structure is usually defined on a single field of a file, called an indexing
field.
B + Tree Index Files
The main disadvantage of the index-sequential file organization is that performance
degrades as the file grows. A B+-tree index takes the form of a balanced tree in which
every path from the root of the tree to a leaf of the tree is of the same length.
In a B-tree every value of the search field appears once at some level in the tree, along
with a data pointer [which may be in an internal node]. In a B+-tree, data pointers [the
addresses of the records with a particular search value] are stored only at the leaf nodes of the tree; hence, the structure
of leaf nodes differs from the structure of internal nodes. The leaf nodes have an entry for
every value of the search field, along with a data pointer to the record.
A B+ tree is a multilevel index, but it has a different structure. A typical node of the
B+ tree contains up to n-1 search key values K1, K2, ..., Kn-1 and n pointers P1, P2, ..., Pn.
The search key values within a node are kept in sorted order: Ki < Kj whenever i < j.
The number of pointers in a node is called the fan-out of the node.
The structure of a non-leaf node is the same as that of a leaf node, except that all pointers are
pointers to tree nodes.
Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
The root node has at least 2 tree pointers.
Each leaf node is of the form
<<K1, Pr1>, <K2, Pr2>, ..., <Kn-1, Prn-1>, Pnext>
where each Pri is a data pointer and Pnext points to the next leaf node of the B+ tree.
All leaf nodes are at the same level.
Consider an example: assume that we wish to insert records into a B+ tree of order p=3
and pleaf=2. At first the root is the only node in the tree, so it is also a leaf node.
As soon as more than one level is created, the tree is divided into internal nodes and leaf
nodes. Notice that every value must exist at the leaf level, because all the data pointers
are at the leaf level. However, only some values exist in internal nodes to guide the
search. Notice also that every value appearing in an internal node also appears in the
subtree to its left as the rightmost value at the leaf level.
Say, for example, a leaf node already holds 7 and 8 and is full; to insert 12, the node is split into two nodes.
The figure shows the two leaf nodes that result from inserting 12. The existing node
contains 7 and 8, and the remaining value 12 goes into a new node. The first j = ⌈(pleaf + 1)/2⌉ =
⌈3/2⌉ = 2 entries in the original node are kept there and the remaining entries are moved to
a new leaf node. The jth search value is replicated in the parent internal node, and an extra
pointer to the new node is created in the parent. If the parent internal node is full, it must
be split. This splitting can propagate all the way up to create a new root node.
Figure 4.5: An example of insertion in a B+ tree with p=3 and pleaf=2
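The leaf split described above can be sketched in a few lines of Python. This covers only the leaf-level step (which keys stay, which move, and which value is copied up); the function name and return shape are assumptions for illustration, and updating the parent node is left out.

```python
import bisect
import math


def insert_into_leaf(keys, new_key, p_leaf):
    """Insert new_key into a sorted leaf and split it if it overflows.

    Returns (left_keys, right_keys, separator). right_keys and separator
    are None when the leaf still fits within p_leaf entries.
    """
    keys = list(keys)
    bisect.insort(keys, new_key)
    if len(keys) <= p_leaf:
        return keys, None, None
    j = math.ceil((p_leaf + 1) / 2)  # entries kept in the original leaf
    left, right = keys[:j], keys[j:]
    return left, right, left[-1]     # the j-th value is copied to the parent


print(insert_into_leaf([7, 8], 12, p_leaf=2))  # ([7, 8], [12], 8)
```

With pleaf=2 the original leaf keeps 7 and 8, the new leaf receives 12, and 8 is the value replicated in the parent.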
Q3. Describe Projection operation, Set theoretic operation & join operation?
Project operation:
The projection operation is used to select only a few columns from a table. It is written with the
symbol π (pi): π<attribute list>(<relation>)
Here, <attribute list> is a list of attributes from the relation; hence the degree (number of
columns) of the result is equal to the number of attributes specified in the attribute list.
E.g. 1: Select the name and salary of all the employees.
πNAME, SALARY (EMPLOYEE)
This query selects only the name and salary attributes from the relation EMPLOYEE.
E.g. 2: Select the names and addresses of all employees working for department 10.
πNAME, ADDRESS (σDNO=10 (EMPLOYEE))
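The two example queries can be mimicked in Python on a list of dictionaries. This is only a sketch: the helper names select and project, the addresses, the department numbers and Deepak's salary are made-up sample values for illustration.

```python
# Sample EMPLOYEE rows; ADDRESS, DNO and Deepak's salary are hypothetical.
EMPLOYEE = [
    {"NAME": "Prasad", "ADDRESS": "Bangalore", "SALARY": 32000, "DNO": 10},
    {"NAME": "Reena", "ADDRESS": "Mysore", "SALARY": 8000, "DNO": 10},
    {"NAME": "Deepak", "ADDRESS": "Hubli", "SALARY": 22000, "DNO": 20},
]


def select(relation, predicate):
    """Selection: keep the rows for which the predicate is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the listed columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# E.g. 1: name and salary of all employees.
print(project(EMPLOYEE, ["NAME", "SALARY"]))

# E.g. 2: names and addresses of employees working for department 10.
in_dept_10 = select(EMPLOYEE, lambda row: row["DNO"] == 10)
print(project(in_dept_10, ["NAME", "ADDRESS"]))
```

The degree of each result equals the number of attributes passed to project, as stated above.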
Set theoretic operations:
These are used to merge the elements of two sets in various ways, including union,
intersection and difference. These three operations require the tables to be union
compatible. Two relations are said to be union compatible if the following conditions are satisfied.
1. The two relations/tables (say R & S) have the same number of columns (the
same degree).
2. Each column of the first relation/table has the same data type as the
corresponding column of the second relation/table.
Relations R & S
Intersection (∩):
The intersection operation selects the common tuples from the two relations.
The result of the operation is written R ∩ S.
Union (∪):
The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are
either in R or in S or in both. Duplicate tuples will not appear in the output.
Difference (−):
The result of the difference R − S consists of all tuples that are in R but not in S.
Cartesian products (X):
The Cartesian product or cross-product is a binary operation that is used to combine two
relations. Assuming R & S as relations with n and m attributes respectively, the Cartesian
products R x S can be written as,
R (A1, A2, ..., An) x S (B1, B2, ..., Bm)
The result of the above set operation is
Q (A1, A2, ..., An, B1, B2, ..., Bm)
Total number of columns in Q: degree (Q) = n + m
Total number of tuples in Q: count (Q) = Number of tuples in R * Number of tuples in S
For example, suppose the relation R has 2 columns and 3 tuples and the relation S has 2 columns and 2 tuples. Then
the Cartesian product R x S has 4 columns (2 + 2) and 6 tuples (3 x 2).
The Cartesian product operation applied by itself is generally meaningless. It is useful
only when followed by selection and projection operations.
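The set operations and the Cartesian product map directly onto Python sets of tuples. The relations R and S below are made-up sample data (R with 3 tuples, S with 2 tuples, both of degree 2) used only to illustrate the counting rules.

```python
from itertools import product

# Hypothetical union-compatible relations: same degree, same column types.
R = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
S = {("a1", "b1"), ("a4", "b4")}

union = R | S          # tuples in R or in S or in both, without duplicates
intersection = R & S   # tuples common to R and S
difference = R - S     # tuples in R but not in S

# Cartesian product: every tuple of R concatenated with every tuple of S.
cartesian = {r + s for r, s in product(R, S)}

print(len(union), len(intersection), len(difference))  # 4 1 2
print(len(cartesian), len(next(iter(cartesian))))      # 6 4
```

The product has 3 x 2 = 6 tuples and 2 + 2 = 4 columns, matching the degree and count formulas above.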
Renaming ρ (rho):
This operation is used to rename relations or attributes. The symbol ρ (rho) is used to
denote the rename operator. In some situations, it is better to break down a complex
query into two or more simple queries. We must then name the relations that hold the
intermediate results. This improves readability and facilitates better
understanding.
The syntax is as follows:
Rename <OLD TABLE> to <NEW TABLE>
In the notation ρS(R), S is the new relation name and R is the original relation.
The Join Operation
Join (⋈): The capability of retrieving data from multiple tables using a single SQL
statement is one of the most powerful and useful features of an RDBMS. It is made possible by the
join operation. We know that one table may not give all the information about a
particular entity.
The join operation, denoted by ⋈, is used to combine two relations to retrieve useful
information. A join operation matches data from two or more tables based on the values
of one or more columns in each table; it allows us to process more than one table at a
time.
For e.g.: The employee table gives only the department ids; if we want to know the
department name, then we have to get the information by joining the employee table and
the dept table.
In a join, only combinations of tuples satisfying the join condition appear in the result.
The general form of a join operation is
R ⋈<join condition> S
For example, by joining the employee and department relations, we can get the name of the
department in which an employee is working (the department name exists in the department
table).
Select emp_no, ename, dept.dname from emp, dept
Where emp.deptno = dept.dept_no and
emp_no = &emp_no;
EMP_DEPT ← EMPLOYEE ⋈ e.deptno = d.deptno DEPT
RESULT ← π eno, ename, dname (EMP_DEPT)
The first operation will combine the tuples of the employee and
department relations on the basis of the department number to form a relation called EMP_DEPT. Then
the PROJECT operation will create a relation RESULT with the attributes eno, ename
and dname. To perform a join between two relations, there should be a common field
between them.
Theta Join: A join condition is of the form
<condition> and <condition> and ... and <condition>
where each condition is of the form Ai θ Bj (e.g. dept.deptno = emp.dept_no). Ai is an
attribute of R and Bj is an attribute of S, Ai and Bj have the same domain (the same set of possible values),
and θ is one of the comparison operators (=, <, <=, >, >=, !=).
A join operation with such a general join condition is called a "theta join".
Equi Join: If the only comparison operator used in the join condition is =, the join is called an equijoin.
E.g.: Select emp_no, ename, dept.dname from emp, dept
Where emp.deptno = dept.dept_no;
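A theta join can be sketched in Python as a nested loop that keeps each pair of tuples satisfying the comparison. The function theta_join and the sample rows are assumptions for illustration; passing operator.eq as the comparison gives the equijoin used in the query above.

```python
import operator


def theta_join(r, s, left_attr, op, right_attr):
    """Combine every pair of rows for which `left_attr op right_attr` holds."""
    return [
        {**a, **b}
        for a in r
        for b in s
        if op(a[left_attr], b[right_attr])
    ]


# Hypothetical sample rows for the emp and dept tables.
emp = [
    {"emp_no": 1, "ename": "Prasad", "deptno": 10},
    {"emp_no": 2, "ename": "Deepak", "deptno": 20},
]
dept = [
    {"dept_no": 10, "dname": "Accounts"},
    {"dept_no": 20, "dname": "Admin"},
]

# Equijoin: the comparison operator is "=".
equi = theta_join(emp, dept, "deptno", operator.eq, "dept_no")
print([(row["ename"], row["dname"]) for row in equi])
# [('Prasad', 'Accounts'), ('Deepak', 'Admin')]

# A theta join with "<" pairs each employee with higher-numbered departments.
less = theta_join(emp, dept, "deptno", operator.lt, "dept_no")
print([(row["ename"], row["dname"]) for row in less])
# [('Prasad', 'Admin')]
```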
Natural Join: It is denoted by the symbol *. The standard definition of natural join
requires that the join attributes have the same name in both relations. In general, natural
join is performed by equating all attribute pairs that have the same name in the two
relations. The general format is:
R *(<list1>),(<list2>) S
Here list1 specifies a list of attributes from R and list2 specifies a list of attributes from S.
Here, the joining is done over the attribute DNumber of the Department relation and DNum
of the Project relation. In fact, DNum of Project is a foreign key which references DNumber
of Department. Generally, in a natural join, the joining attribute is implicitly considered.
If the two relations have no attribute(s) in common, R * S is simply the cross
product of these two relations. Joining can be done between any set of attributes and need
not always be with respect to primary key and foreign key combinations.
The expected size of the join result divided by the maximum size (the number of tuples in R
times the number of tuples in S) gives a ratio called the join selectivity.
Outer join:
It returns both matching and non-matching rows. It differs from the inner join in that the
rows in one table having no matching rows in the other table also appear in the
result table, with nulls in the other table's attribute positions, instead of being ignored as is the
case with the inner join. It outputs rows even if they do not satisfy the join condition; the
outer join operator is applied to the table that has no matching rows.
In the above example, even though there is no matching row for the name B, all workers are
listed along with age and skill. If there is no match, the skill column is simply empty.
The outer join can be used when we want to keep all the tuples in R, or all those in S, or all those in both
relations, whether or not they have matching tuples in the other relation.
Left outer join: It is denoted by ⟕. The left outer join operation keeps every tuple in
the first or left relation R in the result of R ⟕ S. If no matching tuple is found in S, the
attributes of S in the join result are filled with null values.
Right outer join: It is denoted by ⟖, and keeps every tuple in the second or right
relation S in the result of R ⟖ S.
Full outer join: It is denoted by ⟗ and keeps all tuples in both the left and right
relations; when no matching tuples are found, the missing attributes are filled with null values as needed.
Division
A division operation (denoted by ÷) is useful for a special kind of query; occasionally it
may be used to solve certain kinds of problems.
Consider the relations P and Q as shown in the figure. The result of dividing P by
Q is the relation R, and it has two tuples. For each tuple in R, its product with the tuples of
Q must be in P. In our example (a1, b1) and (a1, b2) must both be tuples in P; the same is true for (a5,
b1) and (a5, b2).
Examples of the division operation R = P ÷ Q:
For e.g.: To retrieve the names of employees who work on all the projects that 'John
Smith' works on.
1. Retrieve the list of project numbers that John Smith works on into the intermediate relation
SMITH_PNOS.
2. Create a relation that includes a tuple <ESSN, PNO> whenever the employee whose
social security number is ESSN works on the project whose number is PNO, in the
intermediate relation SSN_PNOS.
SSN_PNOS ← π ESSN, PNO (WORKS_ON)
3. Apply the DIVISION operation to the two relations, which gives the desired employees'
social security numbers in the relation SSNS.
SSNS ← SSN_PNOS ÷ SMITH_PNOS
Notice here that 123, 453 appear in SSN_PNOS in combination with both tuples in
SMITH_PNOS; that is why they appear in the resulting relation SSNS.
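Division can be sketched in Python for the simple case P(A, B) ÷ Q(B). The function divide and the sample tuples are assumptions chosen so that, as in the description of P and Q above, the quotient contains a1 and a5.

```python
def divide(p, q):
    """P(A, B) ÷ Q(B): the A values that P pairs with every B value in Q."""
    candidates = {a for a, _ in p}
    return {a for a in candidates if all((a, b) in p for b in q)}


# Hypothetical sample data: a1 and a5 are paired with both b1 and b2.
P = {
    ("a1", "b1"), ("a1", "b2"),
    ("a2", "b1"),
    ("a5", "b1"), ("a5", "b2"),
}
Q = {"b1", "b2"}

print(sorted(divide(P, Q)))  # ['a1', 'a5']
```

The same function answers the John Smith query when P is the set of (ESSN, PNO) pairs in SSN_PNOS and Q is the set of project numbers in SMITH_PNOS.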
Q.4. Discuss Multi Table Queries?
Multi Table Queries
So far the queries that we have discussed contained only one table in the FROM clause.
There are many occasions in the database applications where we need to retrieve data
from more than one table. This section addresses these kinds of queries.
SIMPLE EQUI-JOINS:
When two tables are joined together we must follow these guidelines:
Table names in the FROM clause are separated by commas.
Use appropriate joining condition. This means that the foreign key of table 1 will be
made equal to the primary key of table 2. This column acts as the joining attribute. For
example, dno of employee table and dno of department will be involved in the joining
condition of WHERE clause.
EXAMPLE 1: This example demonstrates the equijoin; its purpose is to display the
employee names and the names of the departments for which they work.
SELECT Name, Dname
FROM Employee, Department
WHERE Employee.Dno = Department.Dno;
OUTPUT:
NAME DNAME
Prasad Accounts
Reena Accounts
Deepak Admin
Venkat Accounts
Pooja Research
EXAMPLE 2:
Let us now try to display only employees working for Accounts department.
SELECT Name, Salary, Dname
FROM Employee, Department
WHERE (Employee.Dno = Department.Dno)
AND (Dname = 'Accounts');
OUTPUT:
NAME    SALARY  DNAME
Prasad  32000   Accounts
Reena   8000    Accounts
Venkat  30000   Accounts
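The query of Example 2 can be tried out with Python's built-in sqlite3 module and an in-memory database. This is a sketch under assumptions: the table layouts, the Dno values and the salaries of Deepak and Pooja are made up, while the other names, salaries and departments come from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Dname TEXT);
    CREATE TABLE Employee (
        Name TEXT, Salary INTEGER, Dno INTEGER REFERENCES Department(Dno)
    );
    INSERT INTO Department VALUES (1, 'Accounts'), (2, 'Admin'), (3, 'Research');
    INSERT INTO Employee VALUES
        ('Prasad', 32000, 1), ('Reena', 8000, 1), ('Deepak', 22000, 2),
        ('Venkat', 30000, 1), ('Pooja', 19000, 3);
""")

# Equijoin on Dno, restricted to the Accounts department.
accounts = conn.execute("""
    SELECT Name, Salary, Dname
    FROM Employee, Department
    WHERE Employee.Dno = Department.Dno AND Dname = 'Accounts'
    ORDER BY Name
""").fetchall()
print(accounts)
# [('Prasad', 32000, 'Accounts'), ('Reena', 8000, 'Accounts'), ('Venkat', 30000, 'Accounts')]
```

Without the join condition in the WHERE clause the query would return the full Cartesian product of the two tables.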
SELF JOIN and TABLE ALIASES:
A self-join is one where the same table appears more than once in the join. This is illustrated in the
following example. This technique is useful for solving many queries.
To find the employees who earn more than Venkat:
SELECT e1.name, e1.salary
FROM Employee e1, Employee e2
WHERE (e1.salary > e2.salary) AND (e2.name = 'Venkat');
OUT PUT:
NAME SALARY
Prasad 32000
OUTER JOINS:
Outer joins are used to display rows that do not meet the join condition. In Oracle's
legacy syntax, a plus sign (+) is placed after the column of the table that may lack matching
rows; that table's columns are filled with nulls, and every row of the other table is kept.
The syntax for an outer join that keeps every row of table2 (a right outer join) is given below:
SELECT t1.col, t2.col
FROM table1 t1, table2 t2
WHERE t1.col (+) = t2.col;
For a left outer join, which keeps every row of table1, the plus sign goes on the other side:
WHERE t1.col = t2.col (+);
Notice that the plus sign cannot be placed on both sides of the condition.
EXAMPLE 1: This example demonstrates the right outer join by retaining the right side
table (department) tuples and giving null values for the tuples that do not match the left
side table (employee).
SELECT Name, Dname
FROM Employee E, Department D
WHERE E.Name (+) = D.Dname;
OUTPUT:
NAME DNAME
Accounts
Admin
EXAMPLE 2: This is same as ex.1, but the only difference is that it is a left outer join.
So all the left table (employee) rows are kept, and if no match occurs with the right side
table (department) a null is shown.
SELECT Name, Dname
FROM Employee E, Department D
WHERE E.Name = D.Dname(+);
OUT PUT:
NAME DNAME
Deepak
Venkat
Pooja
Prasad
Reena
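The (+) notation is specific to Oracle. As a comparison, the sketch below uses the standard LEFT OUTER JOIN syntax through Python's sqlite3 module. Unlike the two examples above, it joins on a department number so that some rows do match; the tables and rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Dno INTEGER, Dname TEXT);
    CREATE TABLE Employee (Name TEXT, Dno INTEGER);
    INSERT INTO Department VALUES (1, 'Accounts'), (2, 'Admin');
    INSERT INTO Employee VALUES ('Prasad', 1), ('Pooja', NULL);
""")

# Keep every employee; Dname is NULL (None) when no department matches.
left = conn.execute("""
    SELECT E.Name, D.Dname
    FROM Employee E LEFT OUTER JOIN Department D ON E.Dno = D.Dno
    ORDER BY E.Name
""").fetchall()
print(left)   # [('Pooja', None), ('Prasad', 'Accounts')]

# Keep every department by making Department the left-hand table.
right = conn.execute("""
    SELECT E.Name, D.Dname
    FROM Department D LEFT OUTER JOIN Employee E ON E.Dno = D.Dno
    ORDER BY D.Dname
""").fetchall()
print(right)  # [('Prasad', 'Accounts'), (None, 'Admin')]
```

Swapping the table order is used here instead of RIGHT OUTER JOIN because older SQLite versions support only the left form.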
Q.5. Discuss Transaction Processing Concept? Describe properties of
Transactions?
Transaction management is the ability of a database management system to manage the
various transactions that occur within the system. A transaction is a set of program
statements or a collection of operations that forms a single logical unit of work. A database
management system should ensure that transactions are executed properly: either the
entire transaction executes or none of its operations are executed.
This is also called an atomic operation. The DBMS should execute the transaction
in total to avoid inconsistency.
Transaction Processing Concepts
Definition: A transaction is an atomic unit comprised of one or more SQL statements. A
transaction begins with the first executable statement and ends when it is committed or
rolled back.
Single User V/S Multi User systems: A DBMS is single-user if at most one user at a time can
use the system. It is multi-user if many users can use the system and have access to the
DB concurrently. For e.g.: An airline reservation system is used by hundreds of travel
agents and clerks concurrently.
Multiple users can access databases and use computer systems simultaneously because
of the concept of multiprogramming: the system executes some commands from one
process, then suspends that process and executes some commands from the next process.
The execution is therefore interleaved.
In a single user system one can execute at most one process at a time.
Figure 10.1: Interleaved concurrency versus parallel execution
The Read and Write operations and DBMS Buffers:
A transaction is a logical unit of database processing that includes one or more database
access operations (insertion, deletion etc.). A transaction that only retrieves data is called a read-only
transaction.
The basic database access operations are
1) read_item(X): reads a database item named X into a program variable X.
2) write_item(X): writes the value of the program variable X into the database item named X.
Executing read_item(X) includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory.
3. Copy item X from the buffer to the program variable X.
Executing write_item(X) includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory.
3. Copy item X from the program variable into its correct location in the buffer.
4. Store the updated block from the buffer back to disk.
(a) T1              (b) T2
read_item(X);       read_item(X);
X := X - N;         X := X + M;
write_item(X);      write_item(X);
read_item(Y);
Y := Y + N;
write_item(Y);
Concurrency control: Transactions on the database must be able to execute
concurrently without violating the ACID (Atomicity, Consistency, Isolation and
Durability) properties. Concurrency control takes place while transactions are in progress: it
regulates their ongoing operations to
ensure that the combined result remains correct. Concurrency control solves the
major issues involved with allowing multiple people simultaneous access to shared
entities and their object representations.
Why concurrency control is needed: In a multiuser database, transactions submitted by
the various users may execute concurrently and may update the same data. Concurrently
executing transactions must be guaranteed to produce the same effect as serial execution
of transactions [one by one]. Several problems can occur when concurrent transactions
execute in an uncontrolled manner, therefore the primary concern of a multiuser database
includes how to control data concurrency and consistency.
Data concurrency: Access to data used concurrently (simultaneously) by many users
must be coordinated.
Data consistency: A user always sees a consistent (accurate) view of all data committed
by other transactions as of that time and all changes made by the user up to that time.
Several problems can occur when concurrent transactions execute in an uncontrolled
manner.
For e.g.: Consider an airline reservation database in which a record is stored for each flight. Each
record includes the number of reserved seats on that flight. Fig. a shows a transaction
T1 that transfers N reservations from one flight, whose number of reserved seats is X,
to another flight, whose number of reserved seats is Y. Fig. b shows a transaction T2 that
reserves M seats on the first flight. We now discuss the types of problems we may
encounter when these two transactions run concurrently.
1. The lost update problem: Suppose transactions T1 and T2 are submitted at the same
time. If their operations are interleaved, the
final value of X may be incorrect: T2 reads the value of X before T1 changes it in the
database, and hence the updated value resulting from T1 is lost. For e.g.: if X=80 at the start
(80 reservations at the beginning), N=5 (T1 transfers 5 seat reservations from the flight X to
Y), and M=4 (T2 reserves 4 seats on X), the final result should be X=79, but due to
interleaving of operations X=84, because the update by T1 that removed the 5 seats from X
was lost.
2. Dirty read problem: This problem occurs when one transaction updates a database
item and then the transaction fails for some reason. The updated item is accessed by
another transaction before it is changed back to its original value.
For e.g.: T1 updates item x and then fails before completion, so the system must change x
back to original value. Before it can do so, however, transaction T2 reads the temporary
value of x, which will not be recorded permanently in the database, because of the failure
of T1. The value of item x that is read by T2 is called Dirty Data, because it has been
created by a transaction that has not been completed and committed yet. Hence this
problem is also known as the temporary update problem.
3. Incorrect Summary Problem: If one transaction is calculating an aggregate summary
function on a number of records, while other transactions are updating some of these
records, the aggregate function may calculate some values before they are updated and
others after they are updated.
For ex: Transaction T3 is calculating the total number of reservations on all the flights
while transaction T1 is executing. T3 reads the value of X after N seats have
been subtracted from it, but reads the value of Y before those N seats have been added to
it, so its total is off by N.
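The lost update problem (problem 1 above) can be reproduced with a small Python simulation using the figures from the text: X=80, N=5 and M=4. The function run and the step names are illustrative assumptions; each transaction reads X into a private copy, changes the copy, and writes it back.

```python
def run(schedule, x=80, n=5, m=4):
    """Execute the steps of T1 (X := X - n) and T2 (X := X + m) in order."""
    db = {"X": x}                 # value stored in the database
    local = {}                    # each transaction's private copy of X
    delta = {"T1": -n, "T2": +m}
    for txn, step in schedule:
        if step == "read":
            local[txn] = db["X"]
        elif step == "update":
            local[txn] += delta[txn]
        elif step == "write":
            db["X"] = local[txn]
    return db["X"]


serial = [
    ("T1", "read"), ("T1", "update"), ("T1", "write"),
    ("T2", "read"), ("T2", "update"), ("T2", "write"),
]
# T2 reads X before T1 has written its change back.
interleaved = [
    ("T1", "read"), ("T1", "update"), ("T2", "read"),
    ("T1", "write"), ("T2", "update"), ("T2", "write"),
]

print(run(serial))       # 79
print(run(interleaved))  # 84
```

In the interleaved schedule T2 overwrites the value written by T1, so the removal of 5 seats is lost.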
Why is recovery needed?
A major responsibility of the data base administrator is to prepare for the possibility of
hardware, software, network and system failure. It is usually desirable to recover the
databases and return to normal operation as quickly as possible. Recovery should proceed
in such a manner to protect the database and users from unnecessary problems.
Whenever a transaction is submitted to a DBMS for execution, the system is
responsible for making sure that either:
1. All the operations in the transaction are completed successfully and their effects are
recorded permanently in the DB, or
2. The transaction has no effect on the DB; this is required when a transaction fails after
executing some of its operations, but before executing all of them.
Types of failures:
1. A computer failure (System Crash):
A hardware, software or network error occurs in the computer system during transaction execution.
2. Transaction or system error:
Some operation in the transaction may cause it to fail, such as integer overflow or
division by zero.
3. Local errors or exception conditions detected by the transaction:
During transaction execution, certain conditions may occur that necessitate cancellation of
the transaction. For ex.: data for the transaction may not be found.
4. Concurrency control enforcement:
The concurrency control method may decide to abort the transactions, to be restarted
later, because several transactions are in a state of deadlock.
5. Disk failure:
Some disk blocks may lose their data because of read or write malfunctions
6. Physical problems and catastrophes:
This covers problems such as power or air-conditioning failure, fire, theft and
accidental overwriting of disks.
Transaction states and additional operations: A transaction is an atomic unit of work
that is either entirely completed or not done at all. For recovery purposes the system needs to
keep track of when the transaction starts, terminates, commits or aborts. Hence the
recovery manager keeps track of the following operations.
1. Begin transaction: This marks the beginning of transaction execution.
2. Read/Write: These specify read/write operation execution.
3. End transaction: This specifies that the read and write transaction operations have
ended, and marks the end of the transaction execution. At this point it may be necessary to
check whether the changes can be permanently applied to the DB or must be aborted.
4. Commit transaction: This signals a successful end of the transaction, so that any
changes executed by the transaction can be committed to the DB.
5. Roll Back: This signals that the transaction has ended unsuccessfully, so that any
changes that the transaction may have applied to the database must be undone.
Fig. 10.2: State transition diagram illustrating the states for transaction execution
Figure 10.2 shows a state transition diagram that describes how a transaction moves
through its execution states. A transaction goes into an active state immediately after it
starts execution, where it can issue Read and Write operations. When the transaction
ends, it moves to the partially committed state. At this point some recovery protocols
need to ensure that a system failure will not prevent the changes from being recorded
permanently. Once this check is successful, the
transaction is said to have reached its commit point and enters the committed state.
However, a transaction can go to the failed state if one of the checks fails or if the
transaction is aborted during its active state. The transaction may then have to be rolled
back to undo the effect of its Write operations on the database. The terminated state
corresponds to the transaction leaving the system or end of the transaction.
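The state transitions described above can be written down as a small lookup table. This is a minimal sketch, not part of any DBMS: the event names (read_write, end_transaction, commit, abort, terminate) and the helper functions step and run are labels assumed here for illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.ACTIVE, "read_write"): State.ACTIVE,
    (State.ACTIVE, "end_transaction"): State.PARTIALLY_COMMITTED,
    (State.ACTIVE, "abort"): State.FAILED,
    (State.PARTIALLY_COMMITTED, "commit"): State.COMMITTED,
    (State.PARTIALLY_COMMITTED, "abort"): State.FAILED,
    (State.COMMITTED, "terminate"): State.TERMINATED,
    (State.FAILED, "terminate"): State.TERMINATED,
}


def step(state: State, event: str) -> State:
    """Apply one event; reject anything the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {event!r} in state {state.value!r}"
        ) from None


def run(events: list[str]) -> State:
    """Begin a transaction (active state) and apply the events in order."""
    state = State.ACTIVE
    for event in events:
        state = step(state, event)
    return state


print(run(["read_write", "end_transaction", "commit", "terminate"]).value)
print(run(["read_write", "abort"]).value)
```

Keeping the transitions in a table makes illegal moves, such as aborting a transaction that has already committed, fail loudly instead of being silently accepted.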
Desirable Properties of Transactions
To ensure data integrity, the database management system should maintain the following
transaction properties. These are often called the ACID properties.
1. Atomicity: A transaction is an atomic unit of processing. It is either performed in its
entirety (completely) or not performed at all. The basic idea behind ensuring atomicity is
as follows. The database system keeps track of the old values of any data on which a
transaction performs a write, and if the transaction does not complete its execution, the
old values are restored to make it appear as though the transaction was never executed.
2. Consistency: A correct execution of a transaction must take the database from one
consistent state to another.
For example, let Ti be a transaction that transfers Rs.50 from account A to account B. This
transaction can be defined as
Ti: read(A);
A := A - 50;
write(A);
read(B);
B := B + 50;
write(B).
Suppose that before execution of transactions Ti the values of accounts A and B are
Rs.1000 and Rs.2000 respectively. Now suppose that, during the execution of transaction
Ti, a failure has occurred after the write(A) operation that prevents Ti from completing its
execution successfully. Because the write(B) operation was never executed, the values of A and
B in the database are Rs.950 and Rs.2000. Rs.50 has been lost, and the database is left in an
inconsistent state unless the old value of A is restored.
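3. Isolation: A transaction should appear to execute in isolation from other transactions,
even when many transactions run concurrently; its intermediate updates should not be
visible to other transactions until it commits.
4. Durability: Once a transaction changes the database and the changes are committed,
these changes must never be lost because of subsequent failures. Users need not
worry about incomplete transactions, because partially executed transactions can be rolled
back to the original state. Ensuring durability is the responsibility of the recovery
management component of the DBMS.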
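The transfer example and the idea of restoring old values can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not how a real DBMS implements recovery: the names transfer, SimulatedCrash and old_values are hypothetical, and the failure is raised deliberately after write(A).

```python
class SimulatedCrash(Exception):
    """Stands in for a failure that interrupts the transaction."""


def transfer(db, src, dst, amount, crash_after_first_write=False):
    """Move amount from src to dst; undo every write if the transaction fails."""
    old_values = {}   # item -> value before this transaction's first write

    def write(item, value):
        old_values.setdefault(item, db[item])   # remember the old value once
        db[item] = value

    try:
        write(src, db[src] - amount)            # A := A - 50; write(A)
        if crash_after_first_write:
            raise SimulatedCrash("failure after write(A)")
        write(dst, db[dst] + amount)            # B := B + 50; write(B)
        return True                             # commit
    except SimulatedCrash:
        for item, value in old_values.items():  # roll back: restore old values
            db[item] = value
        return False


accounts = {"A": 1000, "B": 2000}

ok = transfer(accounts, "A", "B", 50, crash_after_first_write=True)
after_failure = dict(accounts)

ok2 = transfer(accounts, "A", "B", 50)
after_success = dict(accounts)

print(ok, after_failure)
print(ok2, after_success)
```

After the simulated failure the balances are back at Rs.1000 and Rs.2000, so the sum A + B is preserved; only the successful run leaves Rs.950 and Rs.2050.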
Q.6. Describe the advantage of Distributed database? What is Client/server Model?
Discuss briefly the security and Internet violation?
In a centralized database system, all system components such as data, DBMS software and
storage devices reside at a single computer or site, whereas in a distributed database
system the data is spread over several computers connected by a network.
A distributed database is thus a set of databases stored on multiple computers that appears
to a user as a single database. The data on several computers can be simultaneously
accessed and modified (data from local and remote databases) using a network. Each
database server in the DDB is controlled by its local DBMS, and each cooperates to
maintain the consistency of the global database.
As a general goal, distributed computing systems divide a big, unmanageable problem
into smaller pieces and solve them efficiently in a coordinated manner.
Advantages of Distributed Databases
1. Increased reliability and availability: Reliability is broadly defined as the probability
that a system is running at a certain point in time, whereas availability is the probability
that the system is continuously available during a time interval. When the data and DBMS
software are distributed over several sites, one site may fail while other sites continue to
operate. Only the data and software that exist at the failed site cannot be accessed. In a
centralized system, failure at a single site makes the whole system unavailable to all
users.
2. Improved performance: A large database is divided into smaller databases, keeping
the data closest to the site where it is needed most. This data localization reduces
contention for CPU and I/O services and also reduces the access delays involved in wide area
networks. When a large database is distributed over multiple sites, a smaller database exists
at each site. As a result, local queries and transactions accessing data at a single site
perform better because the local database is smaller. To exploit parallelism, a single
large transaction or query can also be divided into a number of smaller subtransactions
that execute at different sites.
3. Data sharing: Data can be accessed by users at other remote sites through the
distributed database management system (DDBMS) Software.
Client-Server Model
The Client-Server model is basic to distributed systems; it allows clients to make requests
that are routed to the appropriate server in the form of transactions. The client-server
model consists of three parts.
1. Client: The client is the machine (workstation or PC) running the front-end applications.
It interacts with a user through the keyboard, display and mouse. The client has no direct
data access responsibilities; it provides front-end application software
for accessing the data on the server. The client initiates transactions and the server
processes them.
Interaction between client and server might proceed as follows during the
processing of an SQL query.
1. The client parses a user query and decomposes it into a number of independent site
queries. Each site query is sent to the appropriate server site.
2. Each server processes the local query and sends the resulting relation to the client site.
3. The client site combines the results of the queries to produce the result of the originally
submitted query.
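The three steps above can be illustrated with a toy example. This is only a sketch under stated assumptions: the two sites, the employee rows and the function names server_execute and client_query are invented for illustration, and the "query" is a Python predicate rather than real SQL sent over a network.

```python
# Hypothetical employee data, horizontally partitioned across two server sites.
SITES = {
    "site_1": [
        {"emp_id": 1, "dept": "HR", "sal": 30000},
        {"emp_id": 2, "dept": "IT", "sal": 50000},
    ],
    "site_2": [
        {"emp_id": 3, "dept": "IT", "sal": 45000},
    ],
}


def server_execute(site, predicate):
    """Server side: process the local query and return the resulting relation."""
    return [row for row in SITES[site] if predicate(row)]


def client_query(predicate):
    """Client side: decompose, send to each site, then combine the results."""
    # Step 1: one independent site query per server site.
    site_queries = {site: predicate for site in SITES}
    # Step 2: each server processes its local query.
    partial_results = [server_execute(site, q) for site, q in site_queries.items()]
    # Step 3: the client combines the partial relations into the final answer.
    combined = [row for relation in partial_results for row in relation]
    return sorted(combined, key=lambda row: row["emp_id"])


it_staff = client_query(lambda row: row["dept"] == "IT")
print([row["emp_id"] for row in it_staff])
```

The caller never names a site, which is the distribution transparency the client is expected to provide.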
So the server is called the database processor or back-end machine, whereas the client is
called the application processor or front-end machine.
Another function controlled by the client is that of ensuring consistency of replicated
copies of a data item by using distributed concurrency control techniques. The client must
also ensure the atomicity of global transactions by performing global recovery when
certain sites fail. It provides distribution transparency; that is, the client hides the details
of data distribution from the user.
2. Server: The server is the machine that runs the DBMS software. It is referred to as the
back end. The server processes SQL and other query statements received from client
applications. It typically has large disk capacity and fast processors.
3. Network: The network enables remote data access through client-server and
server-to-server communication.
Each computer in a network is a node and acts as a client, a server, or both, depending on
the situation.
Advantages:
Client applications are not dependent on physical location of the data. If the data is
moved or distributed to other database servers, the application continues to function with
little or no modification.
The server provides multi-tasking and shared memory facilities; as a result it can deliver
a high degree of concurrency and data integrity.
In a networked environment, shared data is stored on the servers rather than on all
computers in the system. This makes it easier and more efficient to manage concurrent
access.
Inexpensive, low-end client workstations can access the remote data of the server
effectively.
Security and Integrity Violations
Misuse of a database can be categorized as being either intentional or accidental.
Accidental loss of data consistency may result from:
1. System crashes during transaction processing.
2. Anomalies caused by multiple users accessing the database concurrently.
3. Distribution of data over several computers.
Intentional loss of data may be due to reading, writing or destruction of data by
unauthorized users.
Database security protects data by means of several techniques.
Certain portions [selected columns] of a database are available only to those persons who
are authorized to access them. This ensures that the confidentiality of data is maintained.
For example, in large organizations where different users may use the same database,
sensitive information such as employees' salaries should be kept confidential from most of
the other users.
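Restricting selected columns to authorized users can be sketched as follows. This is a simplified illustration, not a DBMS feature: the roles, the table ALLOWED_COLUMNS and the function read_row are assumptions made here, and a real system would enforce such rules inside the DBMS (for example through views or granted privileges) rather than in application code.

```python
# Hypothetical roles and the columns each role is authorized to read.
ALLOWED_COLUMNS = {
    "hr_manager": {"emp_id", "emp_name", "emp_dept", "emp_sal"},
    "clerk": {"emp_id", "emp_name", "emp_dept"},
}


def read_row(role, row):
    """Return only the columns of row that the given role may see."""
    allowed = ALLOWED_COLUMNS.get(role, set())   # unknown roles see nothing
    return {col: val for col, val in row.items() if col in allowed}


employee = {"emp_id": 7, "emp_name": "Asha", "emp_dept": "Sales", "emp_sal": 42000}

clerk_view = read_row("clerk", employee)
manager_view = read_row("hr_manager", employee)
guest_view = read_row("guest", employee)

print(clerk_view)
print(manager_view)
print(guest_view)
```

The clerk's view omits the salary column, the HR manager sees the full row, and a role with no entry sees nothing at all.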
To protect the database, security measures must be taken at several levels. Network security
is as important as database security.
Security within the operating system is implemented by providing a password for user
accounts. It protects data in primary memory by preventing direct access to the data.