DBMS Unit II


Upload: kprasanthmca

Post on 27-Apr-2015


IIMC Prasanth Kumar K

B.Com (Computers) II Year

DATABASE MANAGEMENT SYSTEM

Unit- II

1. What is a model?
A. A model is a representation of reality that retains only selected details.

2. What is an object set?
A. Objects represent things that are important to users in the portion of reality we want to model. A set of things of the same kind is called an object set. A particular instance of an object set is called an object instance.

3. What is an Entity and Attribute?
A. An entity is a thing or object in the real world that is distinguishable from other objects. Each entity in an entity set is described by a set of properties called attributes.

4. Explain the different Mapping cardinalities.
A. Mapping cardinalities express the number of entities to which another entity can be associated via a relationship set. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following.

(i) One to One: An entity in A is associated with at most one entity in B and an entity in B is associated with at most one entity in A.

(ii) One to Many: An entity in A is associated with any number of entities in B. An entity in B, however, can be associated with at most one entity in A.

(iii) Many to One: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number of entities in A.

(iv) Many to Many: An entity in A is associated with any number of entities in B and an entity in B is associated with any number of entities in A.
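
The four cases above can be told apart mechanically. The following is a minimal Python sketch, assuming a relationship set is given as a list of (a, b) pairs; the function name and the sample entity names are hypothetical and only illustrate the definitions. It reports the tightest cardinality that fits the pairs seen, whereas the real cardinality is a design decision about all possible data.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a binary relationship given as (a, b) pairs."""
    b_per_a = defaultdict(set)  # entities of B linked to each entity of A
    a_per_b = defaultdict(set)  # entities of A linked to each entity of B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"   # some A has several B, every B has one A
    if b_has_many:
        return "many-to-one"   # some B has several A, every A has one B
    return "one-to-one"

# Hypothetical sample relationships
owns = [("cust1", "acct1"), ("cust1", "acct2"), ("cust2", "acct3")]
works_in = [("emp1", "dept1"), ("emp2", "dept1")]
enrols = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]
```

With these samples, owns is classified as one-to-many, works_in as many-to-one and enrols as many-to-many.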

5. What are Generalization and Specialization?
A. Generalization: An object set that is a superset of another object set.

Specialization: An object set that is a subset of another object set.

6. What is Aggregation?
A. Aggregation: It is a relationship set viewed as an object set.

7. Explain Entity-Relationship diagram.
A. E-R Diagram: An entity-relationship diagram (ERD) is a data modeling technique that creates a graphical representation of the entities, and the relationships between entities, within an information system. Any E-R diagram can be mapped to an equivalent set of relational tables, and any set of relational tables has an equivalent E-R diagram.

Entity: The entity is a person, object, place or event for which data is collected. It is equivalent to a database table. An entity can be defined by means of its properties, called attributes. For example, the CUSTOMER entity may have attributes for such things as name, address and telephone number.

Relationship: The relationship is the interaction between the entities.

E-R diagram components are:

Rectangles representing entity sets.

Ellipses representing attributes.


Diamonds representing relationship sets.

Lines linking attributes to entity sets and entity sets to relationship sets.

8. Explain the different Implementation models.

A. There are three types of implementation models:

1. The Hierarchical Data Model:

The Hierarchical Data Model can be represented as follows:

A hierarchical database consists of the following:

It contains nodes connected by branches.

The top node is called the root.

If multiple nodes appear at the top level, the nodes are called root segments.

Each node (with the exception of the root) has exactly one parent.

One parent may have many children.

2. The Network Data Model:

The Network Data Model can be represented as follows:

Like the Hierarchical Data Model, the Network Data Model also consists of nodes and branches, but a child may have multiple parents within the network structure.

3. The Relational Data Model:


The Relational Data Model has the relation at its heart, together with a whole series of rules governing keys, relationships, joins, functional dependencies, transitive dependencies, multi-valued dependencies, and modification anomalies.

The Relation

The Relation is the basic element in a relational data model.

A relation is subject to the following rules:

Relation (file, table) is a two-dimensional table.

Attribute (i.e. field or data item) is a column in the table.

Each column in the table has a unique name within that table.

Each column is homogeneous. Thus the entries in any column are all of the same type (e.g. age, name, employee-number, etc).

Each column has a domain, the set of possible values that can appear in that column.

A Tuple (i.e. record) is a row in the table.

The order of the rows and columns is not important.

Values of a row all relate to some thing or portion of a thing.

Repeating groups (collections of logically related attributes that occur multiple times within one record occurrence) are not allowed.

Duplicate rows are not allowed (candidate keys are designed to prevent this).

Cells must be single-valued (but can be variable length). Single valued means the following:

Cannot contain multiple values such as 'A1, B2, and C3'.

Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and 'XYZ' another.

A relation may be expressed using the notation R (A, B, C...) where:

R = the name of the relation.

(A,B,C, ...) = the attributes within the relation.

A = the attribute(s) which form the primary key.

9. Explain the different types of keys in Relational Data Model.

A. Keys

1. A simple key contains a single attribute.

2. A composite key is a key that contains more than one attribute.


3. A candidate key is an attribute (or set of attributes) that uniquely identifies a row. A candidate key must possess the following properties:

Unique identification - For every row the value of the key must uniquely identify that row.

Non redundancy - No attribute in the key can be discarded without destroying the property of unique identification.

4. A primary key is the candidate key which is selected as the principal unique identifier. Every relation must contain a primary key. The primary key is usually the key selected to identify a row when the database is physically implemented. For example, empno is selected in the EMP relation.

5. A superkey is any set of attributes that uniquely identifies a row. A superkey differs from a candidate key in that it does not require the non redundancy property.

6. A foreign key is an attribute (or set of attributes) that appears (usually) as a non key attribute in one relation and as a primary key attribute in another relation. I say usually because it is possible for a foreign key to also be the whole or part of a primary key:

A many-to-many relationship can only be implemented by introducing an intersection or link table which then becomes the child in two one-to-many relationships. The intersection table therefore has a foreign key for each of its parents, and its primary key is a composite of both foreign keys.

A one-to-one relationship requires that the child table has no more than one occurrence for each parent, which can be enforced by letting the foreign key also serve as the primary key (or by otherwise declaring the foreign key unique).
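
The difference between a superkey and a candidate key can be sketched in Python as follows. This is only an illustration: the EMP rows, the column names other than empno, and the function names are hypothetical. Note also that a sample of rows can only show that a set of attributes is not a key; whether it is a key is a rule about all rows the relation may ever hold.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False  # two rows share this value: not unique
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """A superkey from which no attribute can be discarded."""
    if not is_superkey(rows, attrs):
        return False
    # Non redundancy: no proper, non-empty subset may still be unique.
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical EMP sample
emp = [
    {"empno": 1, "ename": "Asha", "dept": "Sales"},
    {"empno": 2, "ename": "Ravi", "dept": "Sales"},
    {"empno": 3, "ename": "Asha", "dept": "Accounts"},
]
```

Here (empno) is both a superkey and a candidate key, (empno, ename) is a superkey but not a candidate key because ename can be discarded, and (ename) alone is not a key at all.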

8. What is Determinant and Dependent?

A. The terms determinant and dependent can be described as follows:

The expression X → Y means 'if I know the value of X, then I can obtain the value of Y' (in a table or somewhere).

In the expression X → Y, X is the determinant and Y is the dependent attribute.

The value X determines the value of Y.

The value Y depends on the value of X.

9. What is Functional Dependency?

A. A functional dependency can be described as follows:

1. An attribute is functionally dependent if its value is determined by another attribute.

2. That is, if we know the value of one (or several) data items, then we can find the value of another (or several).

3. Functional dependencies are expressed as X → Y, where X is the determinant and Y is the functionally dependent attribute.

4. If A → (B,C) then A → B and A → C.

5. If (A,B) → C, then it is not necessarily true that A → C and B → C.

6. If A → B and B → A, then A and B are in a 1-1 relationship.

7. If A → B then for A there can only ever be one value for B.
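
As a rough illustration of rules 5 and 7, the Python sketch below tests whether a dependency X → Y is violated by a set of rows. The function name and the sample rows are hypothetical; a dependency is really a statement about all valid data, so sample rows can disprove it but never prove it.

```python
def holds(rows, determinant, dependent):
    """Check whether determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # Rule 7: for one value of X there can only ever be one value of Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical order rows
rows = [
    {"order_id": 1, "cust": "C1", "cust_address": "Pune", "order_total": 500},
    {"order_id": 2, "cust": "C1", "cust_address": "Pune", "order_total": 900},
    {"order_id": 3, "cust": "C2", "cust_address": "Delhi", "order_total": 500},
]

cust_determines_address = holds(rows, ["cust"], ["cust_address"])
pair_determines_order = holds(rows, ["cust", "order_total"], ["order_id"])
cust_determines_order = holds(rows, ["cust"], ["order_id"])
total_determines_order = holds(rows, ["order_total"], ["order_id"])
```

In this sample (cust, order_total) → order_id holds while neither cust → order_id nor order_total → order_id does, which is the situation described in rule 5.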


10. What is Transitive Dependency?

A. A transitive dependency can be described as follows:

1. An attribute is transitively dependent if its value is determined by another attribute which is not a key.

2. If X → Y and X is not a key then this is a transitive dependency.

3. A transitive dependency exists when A → B and B → C, where B is not a key, so that A determines C only indirectly through B.

12. What is Multi-Valued Dependency?

A. A multi-valued dependency can be described as follows:

1. A table involves a multi-valued dependency if it may contain multiple values for an entity.

2. A multi-valued dependency may arise as a result of enforcing 1st normal form.

3. X →→ Y, i.e. X multi-determines Y, when for each value of X we can have more than one value of Y.

4. If A →→ B and A →→ C then we have a single attribute A which multi-determines two other independent attributes, B and C.

5. If A →→ (B,C) then we have an attribute A which multi-determines a set of associated attributes, B and C.

13. What is Join Dependency?

A. A join dependency can be described as follows:

If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.

14. Explain different forms of Normalization.

A. Normalization is the process of converting a table into a standard (normal) form.

The normal forms are as follows:

1st Normal Form

A table is in first normal form if all the key attributes have been defined and it contains no repeating groups.

Taking the ORDER entity as an example we could end up with a set of attributes like this:

ORDER

order_id  customer_id  product1  product2  product3
123       456          abc1      def1      ghi1
456       789          abc2

This structure creates the following problems:

Order 123 has no room for more than 3 products.

Order 456 has wasted space for product2 and product3.


In order to create a table that is in first normal form we must extract the repeating groups and place them in a separate table, which I shall call ORDER_LINE.

ORDER

order_id  customer_id
123       456
456       789

I have removed 'product1', 'product2' and 'product3', so there are no repeating groups.

ORDER_LINE

order_id  product
123       abc1
123       def1
123       ghi1
456       abc2

Each row contains one product for one order, so this allows an order to contain any number of products.

The new relationships can be expressed as follows:

1 instance of an ORDER has 1 to many ORDER LINES

1 instance of a PRODUCT has 0 to many ORDER LINES
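
The extraction of the repeating group can be sketched in Python as below. It is a minimal illustration, assuming the wide ORDER rows are held as dictionaries and that order 456 carries only the product abc2 (as the ORDER_LINE table shows); the function name is hypothetical.

```python
wide_orders = [
    {"order_id": 123, "customer_id": 456,
     "product1": "abc1", "product2": "def1", "product3": "ghi1"},
    {"order_id": 456, "customer_id": 789,
     "product1": "abc2", "product2": None, "product3": None},
]

def to_first_normal_form(rows, group_columns):
    """Split wide rows into ORDER rows and one ORDER_LINE row per product."""
    orders, order_lines = [], []
    for row in rows:
        orders.append({"order_id": row["order_id"],
                       "customer_id": row["customer_id"]})
        for column in group_columns:
            if row[column] is not None:  # unused product slots are dropped
                order_lines.append({"order_id": row["order_id"],
                                    "product": row[column]})
    return orders, order_lines

orders, order_lines = to_first_normal_form(
    wide_orders, ["product1", "product2", "product3"])
```

After the split, an order can hold any number of products simply by adding rows to order_lines, and no space is reserved for products that do not exist.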

2nd Normal Form

A table is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the whole of the primary key (i.e. there are no partial dependencies).

1. Anomalies can occur when attributes are dependent on only part of a multi-attribute (composite) key.

2. A relation is in second normal form when all non-key attributes are dependent on the whole key. That is, no attribute is dependent on only a part of the key.

3. Any relation having a key with a single attribute is in second normal form.

Take the following table structure as an example:

order(order_id, cust, cust_address, cust_contact, order_date, order_total)

Here, taking the key to be the combination of cust and order_date, we should realize that cust_address and cust_contact are functionally dependent on cust but not on order_date, therefore they are not dependent on the whole key. To make this table 2NF these attributes must be removed and placed somewhere else.


3rd Normal Form

A table is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key (i.e. there are no transitive dependencies).

1. Anomalies can occur when a relation contains one or more transitive dependencies.

2. A relation is in 3NF when it is in 2NF and has no transitive dependencies.

3. A relation is in 3NF when 'All non-key attributes are dependent on the key, the whole key and nothing but the key'.

Take the following table structure as an example:

order(order_id, cust, cust_address, cust_contact, order_date, order_total)

Here we should realise that cust_address and cust_contact are functionally dependent on cust which is not a key. To make this table 3NF these attributes must be removed and placed somewhere else.

You must also note the use of calculated or derived fields. Take the example where a table contains PRICE, QUANTITY and EXTENDED_PRICE where EXTENDED_PRICE is calculated as QUANTITY multiplied by PRICE. As one of these values can be calculated from the other two then it need not be held in the database table. Do not assume that it is safe to drop any one of the three fields as a difference in the number of decimal places between the various fields could lead to different results due to rounding errors. For example, take the following fields:

AMOUNT - a monetary value in home currency, to 2 decimal places.

EXCH_RATE - exchange rate, to 9 decimal places.

CURRENCY_AMOUNT - amount expressed in foreign currency, calculated as AMOUNT multiplied by EXCH_RATE.

If you were to drop EXCH_RATE could it be calculated back to its original 9 decimal places?
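
A small worked example suggests the answer is no. The Python sketch below uses the standard decimal module; the two input values are hypothetical, and it assumes CURRENCY_AMOUNT is stored to 2 decimal places with half-up rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

amount = Decimal("100.00")          # home currency, 2 decimal places
exch_rate = Decimal("1.234567891")  # exchange rate, 9 decimal places

# CURRENCY_AMOUNT as it would be stored (assumed: 2 decimal places).
currency_amount = (amount * exch_rate).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)

# Attempt to rebuild EXCH_RATE from the two stored amounts.
recovered_rate = (currency_amount / amount).quantize(
    Decimal("0.000000001"), rounding=ROUND_HALF_UP)

rate_survives = recovered_rate == exch_rate
```

Here currency_amount is stored as 123.46, so the rebuilt rate is 1.234600000 rather than 1.234567891: the digits lost when the amount was rounded cannot be recovered.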

Reaching 3NF is adequate for most practical needs, but there may be circumstances which would benefit from further normalization.

Boyce-Codd Normal Form

A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a candidate key.

1. Anomalies can occur in relations in 3NF if there is a composite key in which part of that key has a determinant which is not a candidate key.

2. This can be expressed as R(A,B,C), C → A where:

The relation contains attributes A, B and C.

A and B form a candidate key.

C is the determinant for A (A is functionally dependent on C).

C is not part of any key.

3. Anomalies can also occur where a relation contains several candidate keys where:


The keys contain more than one attribute (they are composite keys).

An attribute is common to more than one key.

Take the following table structure as an example:

schedule(campus, course, class, time, room/bldg)

Take the following sample data:

campus  course       class  time         room/bldg
East    English 101  1      8:00-9:00    212 AYE
East    English 101  2      10:00-11:00  305 RFK
West    English 101  3      8:00-9:00    102 PPR

Note that no two buildings on any of the university campuses have the same name, thus ROOM/BLDG → CAMPUS. As the determinant is not a candidate key this table is NOT in Boyce-Codd normal form.

This table should be decomposed into the following relations:

R1(course, class, room/bldg, time)

R2(room/bldg, campus)
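
The decomposition into R1 and R2 can be checked with a short Python sketch. It assumes the schedule rows are held as tuples in the column order (campus, course, class, time, room/bldg); the function names are hypothetical. Rejoining through a dictionary relies on ROOM/BLDG → CAMPUS, i.e. each room/bldg value mapping to exactly one campus.

```python
schedule = [
    ("East", "English 101", 1, "8:00-9:00", "212 AYE"),
    ("East", "English 101", 2, "10:00-11:00", "305 RFK"),
    ("West", "English 101", 3, "8:00-9:00", "102 PPR"),
]

def decompose(rows):
    """Project the schedule onto R1 and R2."""
    r1 = {(course, cls, room, time) for campus, course, cls, time, room in rows}
    r2 = {(room, campus) for campus, course, cls, time, room in rows}
    return r1, r2

def rejoin(r1, r2):
    """Join R1 and R2 on room/bldg to rebuild the original rows."""
    campus_of = dict(r2)  # valid only because room/bldg determines campus
    return {(campus_of[room], course, cls, time, room)
            for course, cls, room, time in r1}

r1, r2 = decompose(schedule)
lossless = rejoin(r1, r2) == set(schedule)
```

The campus of each building is now recorded once in R2 instead of once per class, and joining R1 with R2 gives back exactly the original schedule.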

As another example take the following structure:

enrol(student#, s_name, course#, c_name, date_enrolled)

This table has the following candidate keys:

(student#, course#)

(student#, c_name)

(s_name, course#) - this assumes that s_name is a unique identifier

(s_name, c_name) - this assumes that c_name is a unique identifier

The relation is in 3NF but not in BCNF because of the following dependencies:

student# → s_name

course# → c_name

4th Normal Form

A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than one multi-valued dependency.

1. Anomalies can occur in relations in BCNF if there is more than one multi-valued dependency.

2. If A →→ B and A →→ C but B and C are unrelated, i.e. A →→ (B,C) is false, then we have more than one multi-valued dependency.

3. A relation is in 4NF when it is in BCNF and has no more than one multi-valued dependency.


Take the following table structure as an example:

info(employee#, skills, hobbies)

Take the following sample data:

employee#  skills       hobbies
1          Programming  Golf
1          Programming  Bowling
1          Analysis     Golf
1          Analysis     Bowling
2          Analysis     Golf
2          Analysis     Gardening
2          Management   Golf
2          Management   Gardening

This table is difficult to maintain since adding a new hobby requires multiple new rows corresponding to each skill. This problem is created by the pair of multi-valued dependencies EMPLOYEE# →→ SKILLS and EMPLOYEE# →→ HOBBIES. A much better alternative would be to decompose INFO into two relations:

skills(employee#, skill)

hobbies(employee#, hobby)
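
The effect of this decomposition can be sketched in Python. The rows are those of the INFO sample; the function names and the extra hobby 'Chess' are hypothetical and serve only to show the maintenance saving.

```python
info = {
    (1, "Programming", "Golf"), (1, "Programming", "Bowling"),
    (1, "Analysis", "Golf"), (1, "Analysis", "Bowling"),
    (2, "Analysis", "Golf"), (2, "Analysis", "Gardening"),
    (2, "Management", "Golf"), (2, "Management", "Gardening"),
}

def split(rows):
    """Project INFO onto skills(employee#, skill) and hobbies(employee#, hobby)."""
    skills = {(emp, skill) for emp, skill, hobby in rows}
    hobbies = {(emp, hobby) for emp, skill, hobby in rows}
    return skills, hobbies

def join(skills, hobbies):
    """Rebuild INFO by pairing every skill with every hobby of an employee."""
    return {(emp, skill, hobby)
            for emp, skill in skills
            for emp2, hobby in hobbies
            if emp == emp2}

skills, hobbies = split(info)
lossless = join(skills, hobbies) == info

# Adding one hobby is now a single new row ...
hobbies_after = hobbies | {(1, "Chess")}
# ... whereas the single INFO table would grow by one row per skill.
rows_after = len(join(skills, hobbies_after))
```

The two smaller relations hold 4 rows each and rejoin to the original 8 rows. One new hobby row for employee 1 corresponds to 2 extra rows in the combined table, one for each of that employee's skills.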

5th (Projection-Join) Normal Form

A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.

Another way of expressing this is:

... and each join dependency is a consequence of the candidate keys.

Yet another way of expressing this is:

... and there are no pairwise cyclical dependencies in the primary key comprised of three or more attributes.

Anomalies can occur in relations in 4NF if the primary key has three or more fields.

5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF.

Pairwise cyclical dependency means that:

You always need to know two values (pairwise).

For any one you must know the other two (cyclical).

Take the following table structure as an example:

buying(buyer, vendor, item)


This is used to track buyers, what they buy, and from whom they buy.

Take the following sample data:

buyer  vendor         item
Sally  Liz Claiborne  Blouses
Mary   Liz Claiborne  Blouses
Sally  Jordach        Jeans
Mary   Jordach        Jeans
Sally  Jordach        Sneakers

The question is, what do you do if Claiborne starts to sell Jeans? How many records must you create to record this fact?

The problem is there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor, and to determine the vendor you must know the buyer and the item, and finally to know the buyer you must know the vendor and the item.

The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
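
The three-way split can be tried out in Python. This is a sketch under the assumption that the join dependency really holds as a business rule (a buyer who deals with a vendor and buys an item gets that item from that vendor whenever the vendor sells it); the function names are hypothetical.

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary", "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary", "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

def project(rows):
    """Split BUYING into Buyer-Vendor, Buyer-Item and Vendor-Item."""
    buyer_vendor = {(b, v) for b, v, i in rows}
    buyer_item = {(b, i) for b, v, i in rows}
    vendor_item = {(v, i) for b, v, i in rows}
    return buyer_vendor, buyer_item, vendor_item

def join(buyer_vendor, buyer_item, vendor_item):
    """Three-way join of the projections."""
    return {(b, v, i)
            for b, v in buyer_vendor
            for v2, i in vendor_item
            if v == v2 and (b, i) in buyer_item}

buyer_vendor, buyer_item, vendor_item = project(buying)
lossless = join(buyer_vendor, buyer_item, vendor_item) == buying

# Claiborne starts to sell Jeans: one new Vendor-Item row.
vendor_item_after = vendor_item | {("Liz Claiborne", "Jeans")}
buying_after = join(buyer_vendor, buyer_item, vendor_item_after)
```

With the sample data the three projections rejoin to the original 5 rows. Recording that Claiborne sells Jeans takes one row in Vendor-Item, while the single-table view gains two rows (one each for Sally and Mary).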

6th (Domain-Key) Normal Form

A table is in Domain-Key normal form (DKNF), listed here as the sixth normal form, if it is in 5NF and if all constraints and dependencies that should hold on the relation can be enforced simply by enforcing the domain constraints and the key constraints specified on the relation.

Another way of expressing this is:

... if every constraint on the table is a logical consequence of the definition of keys and domains.

1. A domain constraint (better called an attribute constraint) is simply a constraint to the effect that a given attribute A of R takes its values from some given domain D.

2. A key constraint is simply a constraint to the effect that a given set A, B, ..., C of R constitutes a key for R.

This standard was proposed by Ron Fagin in 1981, but interestingly enough he made no note of multi-valued dependencies, join dependencies, or functional dependencies in his paper and did not demonstrate how to achieve DKNF. However, he did manage to demonstrate that DKNF is often impossible to achieve.

15. Explain the different operators in Relational Algebra.

A. The eight relational algebra operators are:

1. SELECT: To retrieve specific tuples/rows from a relation.


Ord#  OrdDate   Cust#
101   02-08-94  002
104   18-09-94  002

2. PROJECT: To retrieve specific attributes/columns from a relation.

Descr         Price
Power Supply  4000
101-Keyboard  2000
Mouse         800
MS-DOS 6.0    5000
MS-Word 6.0   8000

3. PRODUCT: To obtain all possible combinations of tuples from two relations.


Ord#  OrdDate   O.Cust#  C.Cust#  CustName    City
101   02-08-94  002      001      Shah        Bombay
101   02-08-94  002      002      Srinivasan  Madras
101   02-08-94  002      003      Gupta       Delhi
101   02-08-94  002      004      Banerjee    Calcutta
101   02-08-94  002      005      Apte        Bombay
102   11-08-94  003      001      Shah        Bombay
102   11-08-94  003      002      Srinivasan  Madras

4. UNION: To retrieve tuples appearing in either or both the relations participating in the UNION.

Ord#  OrdDate   Cust#
101   03-07-94  001
102   27-07-94  003
101   02-08-94  002
102   11-08-94  003
103   21-08-94  003
104   28-08-94  002
105   30-08-94  005

5. INTERSECT: To retrieve tuples appearing in both the relations participating in the INTERSECT.

Eg: To retrieve Cust# of Customers who've placed orders in July and in August

Cust#
003

6. DIFFERENCE: To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not the second.


Eg: To retrieve Cust# of Customers who've placed orders in July but not in August

Cust#
001
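
The operators described so far map naturally onto Python sets of tuples. The sketch below is only an illustration: the July and August order relations are reconstructed from the rows of the UNION result, the name ORD_JUL is assumed by analogy with ORD_AUG, and select and project are hypothetical helper functions.

```python
# Each tuple is (Ord#, OrdDate, Cust#).
ORD_JUL = {(101, "03-07-94", "001"), (102, "27-07-94", "003")}
ORD_AUG = {(101, "02-08-94", "002"), (102, "11-08-94", "003"),
           (103, "21-08-94", "003"), (104, "28-08-94", "002"),
           (105, "30-08-94", "005")}

def select(relation, predicate):
    """SELECT: keep the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}

def project(relation, *positions):
    """PROJECT: keep only the columns at the given positions."""
    return {tuple(row[p] for p in positions) for row in relation}

cust_002_orders = select(ORD_AUG, lambda row: row[2] == "002")
all_orders = ORD_JUL | ORD_AUG                            # UNION
both_months = project(ORD_JUL, 2) & project(ORD_AUG, 2)   # INTERSECT
july_only = project(ORD_JUL, 2) - project(ORD_AUG, 2)     # DIFFERENCE
```

Because a relation is a set, duplicate rows disappear automatically after a projection, which is why project returns a set rather than a list.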

7. JOIN: To retrieve combinations of tuples in two relations based on a common field in both the relations.

Eg: ORD_AUG join CUSTOMERS (here, the common column is Cust#)

Ord#  OrdDate   Cust#  CustName    City
101   02-08-94  002    Srinivasan  Madras
102   11-08-94  003    Gupta       Delhi
103   21-08-94  003    Gupta       Delhi
104   28-08-94  002    Srinivasan  Madras
105   30-08-94  005    Apte        Bombay

Note: The above join operation logically implies retrieval of details of all orders and the details of the corresponding customers who placed the orders.

Such a join operation, where only those rows having corresponding rows in both the relations are retrieved, is called the natural join or inner join. This is the most common join operation.

Consider the example of EMPLOYEE and ACCOUNT relations.

EMPLOYEE

Emp#  EmpName  EmpCity  Acc#
X101  Shekhar  Bombay   120001
X102  Raj      Pune     120002
X103  Sharma   Nagpur   NULL
X104  Vani     Bhopal   120003

ACCOUNT

Acc#    OpenDate       BalAmt
120001  30. Aug. 1998  5000
120002  29. Oct. 1998  1200
120003  1. Jan. 1999   3000
120004  4. Mar. 1999   500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is:

Emp#  EmpName  EmpCity  Acc#    OpenDate       BalAmt
X101  Shekhar  Bombay   120001  30. Aug. 1998  5000
X102  Raj      Pune     120002  29. Oct. 1998  1200
X104  Vani     Bhopal   120003  1. Jan. 1999   3000

Note that, from each table, only those records which have corresponding records in the other table appear in the result set. This means that the result of the inner join shows the details of those employees who hold an account along with the account details.

The other type of join is the outer join, which has three variations: the left outer join, the right outer join and the full outer join. These three joins are explained as follows:

The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.

EMPLOYEE left outer join ACCOUNT gives:

Emp#  EmpName  EmpCity  Acc#    OpenDate       BalAmt
X101  Shekhar  Bombay   120001  30. Aug. 1998  5000
X102  Raj      Pune     120002  29. Oct. 1998  1200
X103  Sharma   Nagpur   NULL    NULL           NULL
X104  Vani     Bhopal   120003  1. Jan. 1999   3000

The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.

EMPLOYEE right outer join ACCOUNT gives:

Emp#  EmpName  EmpCity  Acc#    OpenDate       BalAmt
X101  Shekhar  Bombay   120001  30. Aug. 1998  5000
X102  Raj      Pune     120002  29. Oct. 1998  1200
X104  Vani     Bhopal   120003  1. Jan. 1999   3000
NULL  NULL     NULL     120004  4. Mar. 1999   500

(Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here.)

The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, related columns will take null values.

EMPLOYEE full outer join ACCOUNT gives:

Emp#  EmpName  EmpCity  Acc#    OpenDate       BalAmt
X101  Shekhar  Bombay   120001  30. Aug. 1998  5000
X102  Raj      Pune     120002  29. Oct. 1998  1200
X103  Sharma   Nagpur   NULL    NULL           NULL
X104  Vani     Bhopal   120003  1. Jan. 1999   3000
NULL  NULL     NULL     120004  4. Mar. 1999   500
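
The four join results can be reproduced with a short Python sketch over the EMPLOYEE and ACCOUNT rows. It assumes Acc# is unique in ACCOUNT, represents NULL as None, and uses a hypothetical function name and flags; it is an illustration, not how a DBMS implements joins.

```python
EMPLOYEE = [
    ("X101", "Shekhar", "Bombay", "120001"),
    ("X102", "Raj", "Pune", "120002"),
    ("X103", "Sharma", "Nagpur", None),
    ("X104", "Vani", "Bhopal", "120003"),
]
ACCOUNT = [
    ("120001", "30. Aug. 1998", 5000),
    ("120002", "29. Oct. 1998", 1200),
    ("120003", "1. Jan. 1999", 3000),
    ("120004", "4. Mar. 1999", 500),
]

def join(employees, accounts, keep_left=False, keep_right=False):
    """Join on Acc#; the flags keep unmatched rows from either side."""
    by_acc = {acc[0]: acc for acc in accounts}
    matched = set()
    result = []
    for emp in employees:
        acc = by_acc.get(emp[3])
        if acc is not None:
            matched.add(acc[0])
            result.append(emp + acc[1:])          # matching pair
        elif keep_left:
            result.append(emp + (None, None))     # employee without account
    if keep_right:
        for acc in accounts:
            if acc[0] not in matched:
                result.append((None, None, None) + acc)  # account without employee
    return result

inner = join(EMPLOYEE, ACCOUNT)
left_outer = join(EMPLOYEE, ACCOUNT, keep_left=True)
right_outer = join(EMPLOYEE, ACCOUNT, keep_right=True)
full_outer = join(EMPLOYEE, ACCOUNT, keep_left=True, keep_right=True)
```

The inner join returns 3 rows, the left and right outer joins 4 rows each, and the full outer join 5 rows, in line with the result tables for these operations.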

The result of a natural join operation between R1 and R2:

a1  b1  c1
a2  b2  c2
a3  b3  c3

8. DIVIDE: Consider the following three relations:

R1 divide by R2 per R3 gives:

a

Thus the result contains those values from R1 whose corresponding R2 values in R3 include all R2 values.
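
Division can be sketched in Python as below. The contents of R1, R2 and R3 are hypothetical, chosen only so that the result is the single value a; R3 is assumed to hold (R1 value, R2 value) pairs, and the function name is an assumption.

```python
def divide(r1, r2, r3):
    """Values of r1 that are paired in r3 with every value of r2."""
    return {x for x in r1 if all((x, y) in r3 for y in r2)}

# Hypothetical relations
R1 = {"a", "b", "c"}
R2 = {"x", "y"}
R3 = {("a", "x"), ("a", "y"), ("b", "x"), ("c", "y")}

result = divide(R1, R2, R3)
```

Only a is paired with both x and y in R3, so it is the only value in the result; b and c each miss one of the R2 values.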