Data Redundancy: rollno name branch hod office_tel … · 2NF (2nd Normal Form) A database is in...


Post on 18-Oct-2020


CORE COURSE-IX DATABASE SYSTEMS BCA 4th SEMESTER

Data Redundancy:

Consider the database table of student.

rollno   name   branch   hod     office_tel
401      Akon   CSE      Mr. X   53337
402      Bkon   CSE      Mr. X   53337
403      Ckon   CSE      Mr. X   53337
404      Dkon   CSE      Mr. X   53337

In the table above, we have the data of 4 Computer Science students. As we can see, the data in the fields branch, hod (Head of Department) and office_tel is repeated for every student who is in the same branch of the college. This is Data Redundancy.

Simply put, data redundancy means that the same data is stored multiple times in a single table, which takes extra memory.

We should avoid Data Redundancy in our tables.

Anomaly

What is an anomaly?

An anomaly is an issue or problem in the data that is not meant to be there.

Insertion Anomaly

Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or else we will have to set the branch information as NULL.

SYLLABUS UNIT-4 Functional Dependencies and Normalization for Relational Databases, Relational Database Algorithms and Further Dependencies, Practical Database Design Methodology and use of UML Diagrams.


Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all those 100 students.

These scenarios are nothing but Insertion anomalies.

Updation Anomaly

What if Mr. X leaves the college, or is no longer the HOD of the Computer Science department? In that case all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is the Updation anomaly.

Deletion Anomaly

In our Student table, two different kinds of information are kept together: student information and branch information. Hence, at the end of the academic year, if the student records are deleted, we will also lose the branch information. This is the Deletion anomaly.
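The updation and deletion anomalies can be reproduced on a small in-memory copy of the student table. This is only a sketch: the list-of-dictionaries layout and the replacement name "Mr. Y" are assumptions made for illustration and are not part of the notes.

```python
# In-memory copy of the student table from the text (layout is an assumption).
students = [
    {"rollno": 401, "name": "Akon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
    {"rollno": 402, "name": "Bkon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
    {"rollno": 403, "name": "Ckon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
    {"rollno": 404, "name": "Dkon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
]

# Updation anomaly: the HOD changes, but the last record is missed by mistake.
for row in students[:-1]:
    row["hod"] = "Mr. Y"  # "Mr. Y" is a hypothetical new HOD

# The same branch now has two different HODs recorded: inconsistent data.
hods_for_cse = {row["hod"] for row in students if row["branch"] == "CSE"}

# Deletion anomaly: once every student record is deleted,
# nothing about the branch (hod, office_tel) is left either.
remaining = []
branch_info_left = {(r["branch"], r["hod"], r["office_tel"]) for r in remaining}
```

After the partial update, hods_for_cse holds both names, and after the deletion branch_info_left is empty, which is exactly the loss the two anomalies describe.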

Normalization

o Normalization is the process of organizing the data in the database.

o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update and Deletion Anomalies.

o Normalization divides a larger table into smaller tables and links them using relationships.

o The normal form is used to reduce redundancy from the database table.

DIFFERENT NORMAL FORMS:

(1) 1NF (1st Normal Form)

A database is in first normal form if it satisfies the following rules:

1. It contains only atomic values.

2. There are no repeating groups or duplicate tuples.

An atomic value is a value that cannot be divided. For example, in the table shown below, the values in the [Color] column in the first row can be divided into "red" and "green", hence [TABLE_PRODUCT] is not in 1NF.

Example: How do we bring an unnormalized table into first normal form? Consider the following example:


To bring this table to first normal form, we split the table into two tables and now we have the resulting tables:

Now first normal form is satisfied, as the columns on each table all hold just one value.
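The split into two tables can be sketched in a few lines. The original example table is not reproduced in this transcript, so the rows, the column names (product_id, color, price) and the prices below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical unnormalized rows: the "color" field holds several values at once.
table_product = [
    {"product_id": 1, "color": "red, green", "price": 15.99},
    {"product_id": 2, "color": "yellow", "price": 23.99},
]

# Table 1: one row per product, without the multi-valued column.
product_price = [
    {"product_id": r["product_id"], "price": r["price"]} for r in table_product
]

# Table 2: one row per (product, color) pair, so every value is atomic.
product_color = [
    {"product_id": r["product_id"], "color": c.strip()}
    for r in table_product
    for c in r["color"].split(",")
]
```

Each column in product_price and product_color now holds a single value per row, which is what first normal form asks for.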

2NF (2nd Normal Form)

A database is in second normal form if it satisfies the following rules:

1. It is in first normal form.

2. No partial dependency exists in the relation for non-prime attributes, i.e. no non-prime attribute depends on only a part of a candidate key.

Example

Consider the following example:

Prime attribute − An attribute which is a part of a candidate key is known as a prime attribute.

Non-prime attribute − An attribute which is not a part of any candidate key is said to be a non-prime attribute.

This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key. Therefore, this table does not satisfy second normal form.

To bring this table to second normal form, we break the table into two tables, and now we have the following:
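A minimal sketch of this decomposition follows. The example table itself is missing from the transcript, so the sample rows and the snake_case column names are assumptions; only the key [Customer ID, Store ID] and the dependency of [Purchase Location] on [Store ID] come from the text.

```python
# Hypothetical rows: (customer_id, store_id, purchase_location).
purchases = [
    (1, 1, "Los Angeles"),
    (1, 3, "San Francisco"),
    (2, 1, "Los Angeles"),
    (3, 2, "New York"),
    (4, 3, "San Francisco"),
]

# Table 1: keeps only the composite key (customer_id, store_id).
customer_store = sorted({(c, s) for c, s, _ in purchases})

# Table 2: store_id -> purchase_location, stored once per store.
store_location = {}
for _, store_id, location in purchases:
    # setdefault returns the value already stored for this store, if any;
    # a mismatch would mean location does not depend on store_id alone.
    assert store_location.setdefault(store_id, location) == location
```

Because the location is now stored once per store, changing a store's location touches a single entry instead of every purchase row.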

3NF (Third Normal Form)

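A given relation is said to be in Third Normal Form (3NF) if and only if:

1. The relation is already in 2NF.

2. No transitive dependency exists for non-prime attributes. [A transitive dependency arises when a -> b and b -> c hold, so that a -> c holds only through b.]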

Consider one example

Empid   Empname   DeptId   Deptname
E1      A         D1       cse
E2      B         D2       ME
E3      C         D1       cse
E4      D         D2       ME

Empname can be determined by Empid: Empid -> Empname

DeptId can be determined by Empid: Empid -> DeptId

Deptname can be determined by DeptId: DeptId -> Deptname

So Empid -> Deptname holds only by transitive dependency (through DeptId), not directly, which violates 3NF.

We will divide the table into 2 tables.

Emp_details

Empid   Empname   DeptId
E1      A         D1
E2      B         D2
E3      C         D1
E4      D         D2

Department_details

DeptId   Deptname
D1       cse
D2       ME
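The same split can be sketched in code, using the employee rows from the example. The tuple layout and variable names are assumptions made for illustration.

```python
# Rows of the original table: (Empid, Empname, DeptId, Deptname).
employees = [
    ("E1", "A", "D1", "cse"),
    ("E2", "B", "D2", "ME"),
    ("E3", "C", "D1", "cse"),
    ("E4", "D", "D2", "ME"),
]

# Emp_details keeps the attributes that depend directly on Empid.
emp_details = [(e, n, d) for e, n, d, _ in employees]

# Department_details stores each DeptId -> Deptname pair only once.
department_details = sorted({(d, dn) for _, _, d, dn in employees})

# Joining the two tables on DeptId gives back the original rows.
dept_name = dict(department_details)
rejoined = [(e, n, d, dept_name[d]) for e, n, d in emp_details]
```

Since rejoined equals the original rows, the decomposition loses no information while the department name is no longer repeated per employee.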

BCNF (3.5 Normal Form)

BCNF (Boyce-Codd Normal Form) is an extension of Third Normal Form (3NF) and is slightly stronger than 3NF. A relation R is in BCNF if, for every functional dependency P -> Q that holds in R, either P -> Q is a trivial functional dependency or P is a super key for R. If a relation is in BCNF, redundancy based on functional dependencies has been removed, but some other redundancies may still be there. Example:

IpAdd         PortNum   ProcessReq
10.4.9.34     80        Register Application form
10.11.4.99    110       Gmail message request
10.1.11.111   25        Remote User request

The functional dependencies that exist on this table are:

1. IpAdd, PortNum -> ProcessReq

2. ProcessReq -> PortNum

Applying normalization here means converting the table into BCNF. For that we first check 1NF, 2NF and 3NF.

By default every relational schema is in 1NF.

Before proceeding to the next normal forms, we should find the candidate keys. Here { IpAdd, PortNum } and { IpAdd, ProcessReq } are the candidate keys, so the prime attributes (those that are part of a candidate key) are IpAdd, PortNum and ProcessReq. As per the formal definition of 3NF, a functional dependency does not violate 3NF if its right hand side is a prime attribute. Since all attributes are prime attributes, the table is in 3NF. A table in 3NF is also in 2NF, so there is no need to check 2NF separately. So 1NF, 2NF and 3NF are all fine.

Now check for BCNF. According to the definition of BCNF, the left hand side of every non-trivial FD should be a key. For the FD IpAdd, PortNum -> ProcessReq, { IpAdd, PortNum } is a key, so there is no problem.

For the other FD, ProcessReq -> PortNum, the left hand side does not derive all attributes (it does not derive IpAdd), so ProcessReq is not a key. Hence the table is not in BCNF. To bring it into BCNF, we decompose it:

ProcessReq+ = { ProcessReq, PortNum } is a separate table.

PortNum   ProcessReq
80        Register Application form
110       Gmail message request
25        Remote User request

And { IpAdd, ProcessReq } is the other table.

IpAdd         ProcessReq
10.4.9.34     Register Application form
10.11.4.99    Gmail message request
10.1.11.111   Remote User request
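The key and BCNF checks used in this example rest on the attribute closure (written X+ above). The sketch below computes closures for the two functional dependencies of the example; the function name closure and the representation of an FD as a pair of frozensets are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute on the left, we learn the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = {"IpAdd", "PortNum", "ProcessReq"}
fds = [
    (frozenset({"IpAdd", "PortNum"}), frozenset({"ProcessReq"})),
    (frozenset({"ProcessReq"}), frozenset({"PortNum"})),
]

# An FD violates BCNF when it is non-trivial and its left side is not a super key.
violations = [
    (lhs, rhs) for lhs, rhs in fds
    if not rhs <= lhs and closure(lhs, fds) != R
]
```

Here closure({"ProcessReq"}, fds) gives { ProcessReq, PortNum }, which is not the whole relation, so ProcessReq -> PortNum is the only FD reported in violations.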

4NF (4th Normal Form)

Rules for 4th Normal Form

For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:

1. It should be in the Boyce-Codd Normal Form.

2. The table should not have any Multi-valued Dependency.

What is Multi-valued Dependency?

A table is said to have a multi-valued dependency if the following conditions are true:

1. For a dependency A → B, if for a single value of A multiple values of B exist, then the table may have a multi-valued dependency.

2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.

3. And, for a relation R (A, B, C), if there is a multi-valued dependency between A and B, then B and C should be independent of each other.

Example

Below we have a college enrolment table with columns s_id, course and hobby.

s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
2      C#        Cricket
2      Php       Hockey

As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.

You must be thinking what problem this can lead to, right?

Well, the two records for the student with s_id 1 will give rise to two more records, as shown below, because one student has two hobbies, and hence each hobby should be specified along with both the courses.

s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
1      Science   Hockey
1      Maths     Cricket

And, in the table above, there is no relationship between the columns course and hobby. They are independent of each other.

So there is multi-value dependency, which leads to un-necessary repetition of data and other anomalies as well.

How to satisfy 4th Normal Form?

To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.

CourseOpted Table

s_id   course
1      Science
1      Maths
2      C#
2      Php

And, Hobbies Table,

s_id   hobby
1      Cricket
1      Hockey
2      Cricket
2      Hockey
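That this decomposition loses nothing can be checked on the four expanded rows for the student with s_id 1. The sketch below projects those rows onto the two smaller tables and joins them back on s_id; the use of Python sets of tuples is an assumption made for illustration.

```python
# The four expanded rows for s_id 1: (s_id, course, hobby).
enrolment = {
    (1, "Science", "Cricket"),
    (1, "Maths", "Hockey"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
}

# Project onto the two independent facts about a student.
course_opted = {(s, c) for s, c, _ in enrolment}
hobbies = {(s, h) for s, _, h in enrolment}

# Natural join on s_id: pair every course of a student with every hobby.
rejoined = {
    (s, c, h)
    for s, c in course_opted
    for s2, h in hobbies
    if s == s2
}
```

The two small tables hold two rows each for this student, and rejoined is identical to the four-row table, so the course/hobby combinations never needed to be stored.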

Fifth normal form (5NF)

o A relation is in 5NF if it is in 4NF and does not contain any non-trivial join dependency (one that is not implied by its candidate keys).

o Example

SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2. In this case, the combination of all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking that subject, so we would leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER     SUBJECT
Semester 1   Computer
Semester 1   Math
Semester 1   Chemistry
Semester 2   Math

P2

SUBJECT     LECTURER
Computer    Anshika
Computer    John
Math        John
Math        Akash
Chemistry   Praveen

P3

SEMESTER     LECTURER
Semester 1   Anshika
Semester 1   John
Semester 2   Akash
Semester 1   Praveen
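Whether P1, P2 and P3 together reproduce the original table can be checked by joining them. The sketch below uses the rows of the example; representing relations as Python sets of tuples is an assumption made for illustration.

```python
# Original relation: (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections (sets drop duplicate rows automatically).
p1 = {(sem, sub) for sub, _, sem in original}   # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, _ in original}   # SUBJECT, LECTURER
p3 = {(sem, lec) for _, lec, sem in original}   # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces extra (spurious) rows.
p1_p2 = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub == sub2
}

# Filtering through P3 removes the rows with no matching (SEMESTER, LECTURER).
rejoined = {(sub, lec, sem) for sub, lec, sem in p1_p2 if (sem, lec) in p3}
```

Joining just two of the projections yields 7 rows, including ("Math", "John", "Semester 2"), which is not in the original; only the join of all three gives back exactly the 5 original rows.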


Relational database design process

Two approaches are used in relational database design:

Bottom-up approach
Top-down approach

Bottom-up approach: This approach builds relations on the basis of the relationships existing among individual attributes/entities. The lower level relations and attributes/entities are combined into higher level relations. The build process starts from the bottom and moves upwards, hence it is called the bottom-up approach. This approach is suitable when a small database with few tables is designed. Example: a personal application with only a login table with 2 columns (username, password).

Top-down approach: This process is just the opposite of the bottom-up approach. The top level relation is divided into a number of lower level relations and attributes/entities. The divide process starts from the top and moves down, hence it is called the top-down approach. This approach is suitable when a large database with many tables is required. Example: a university having many departments, with students and courses under each department.

Database Design process:

(1) Purpose of Database Design:

During this step, the database designers have to interview the customers (database users) to understand the proposed system and obtain and document the data and functional requirements. The result of this step is a document that includes the detailed requirements provided by the users.

(2) Identification of entities, attributes and relationships:

Database designers identify the entities and attributes along with their relationships.

(3) E-R diagram:

This can be drawn with the respective notations. The E-R diagram is then transformed into a logical schema.

(4) Physical database implementation.


DATABASE ALGORITHM AND FURTHER DEPENDENCY

Step-1: Accept the database table.
Step-2: Check each and every value and functional dependency of the table.
Step-3: Apply the normalization rules.
Step-4: Decompose the table and create new tables.
Step-5: Accept the new tables.
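One possible concrete reading of these steps is the textbook BCNF decomposition loop, sketched below. The notes do not say which algorithm is meant, so this choice, the function names (closure, find_violation, decompose) and the set-based representation are all assumptions; the IpAdd/PortNum/ProcessReq relation is reused as input.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Step-2/3: look for a subset X of rel that breaks the BCNF rule."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # X determines something new, yet not the whole relation.
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def decompose(rel, fds):
    """Step-4/5: split on a violation and repeat until every table is accepted."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    return decompose(x_plus, fds) + decompose(rel - (x_plus - x), fds)

# Step-1: the table and its functional dependencies.
R = {"IpAdd", "PortNum", "ProcessReq"}
fds = [
    (frozenset({"IpAdd", "PortNum"}), frozenset({"ProcessReq"})),
    (frozenset({"ProcessReq"}), frozenset({"PortNum"})),
]
tables = decompose(R, fds)
```

For this input the loop returns the tables { PortNum, ProcessReq } and { IpAdd, ProcessReq }. The subset search is exponential in the number of attributes, so the sketch is only practical for small tables.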

Unified Modeling Language (UML)

What is UML?

The Unified Modeling Language (UML) is a general-purpose, developmental modeling language in the field of software engineering that is intended to provide a standard way to visualize the design of a system.

The use of UML diagrams

It condenses thousands of words of explanation into a few graphical diagrams, which may reduce the time needed to understand a system.

It makes communication more clear and real.

It becomes very easy for the software programmer to write the actual implementation once they have a clear picture of the problem.

Class diagrams:

UML class diagrams: Class diagrams are the main building blocks of every object-oriented method. The class diagram can be used to show the classes, relationships, interfaces, associations, and collaborations.

There are three types of modifiers which are used to decide the visibility of attributes and operations:

+ is used for public visibility (for everyone)
# is used for protected visibility (for friend and derived)
– is used for private visibility (for only me)

Example: a class Dog which has two fields, name and colour, and two functions, run() and bark(). The UML diagram is given below.
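In code, the class described by such a diagram might look like the sketch below. The notes give only the field and function names, so the public visibility of the fields, the return values and the sample dog "Tommy" are assumptions made for illustration.

```python
class Dog:
    """Class with two fields (name, colour) and two operations (run, bark)."""

    def __init__(self, name, colour):
        # Both fields are assumed public (+ in UML notation).
        self.name = name
        self.colour = colour

    def run(self):
        return f"{self.name} is running"

    def bark(self):
        return f"{self.name} says woof"


dog = Dog("Tommy", "brown")  # hypothetical sample object
message = dog.bark()
```

Python has no enforced private or protected members; by convention a leading underscore (for example _name) would mark a field as non-public.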


Sequence diagram:

A sequence diagram simply depicts the interaction between objects in a sequential order, i.e. the order in which these interactions take place. We can also use the terms event diagrams or event scenarios to refer to a sequence diagram. Sequence diagrams describe how and in what order the objects in a system function.

Usecase diagrams

A use case diagram at its simplest is a representation of a user's interaction with the system that shows the relationship between the user and the different use cases in which the user is involved. A use case diagram can identify the different types of users of a system and the different use cases and will often be accompanied by other types of diagrams as well. The use cases are represented by either circles or ellipses.


Activity diagrams:

An activity diagram visually presents a series of actions or the flow of control in a system, similar to a flowchart or a data flow diagram. Activity diagrams are often used to understand the flow of an activity.

Example: On a login screen the user has to enter the username and password. If both the username and password are correct, the user is allowed to proceed to the next screen; if the username and/or password is wrong, an error message asks the user to enter the correct username and password.
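The decision in this activity can be written as a small function. This is only a sketch of the control flow: the USERS dictionary, the sample account and the returned messages are hypothetical, and a real system would store hashed passwords rather than plain text.

```python
# Hypothetical stored credentials (plain text only to keep the sketch short).
USERS = {"alice": "secret123"}

def login(username, password):
    """Follow the activity: check the credentials, then branch."""
    if username in USERS and USERS[username] == password:
        return "next screen"
    return "error: enter correct username and password"


ok = login("alice", "secret123")
wrong_password = login("alice", "guess")
unknown_user = login("bob", "secret123")
```

The two return statements correspond to the two outgoing branches of the decision node in the activity diagram.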