
Upload: v-srinivasa-rao

Post on 08-Apr-2018


Page 1: Master of Computer Application-mc0077

8/7/2019 Master of Computer Application-mc0077

http://slidepdf.com/reader/full/master-of-computer-application-mc0077 1/10

Master of Computer Application (MCA) MC0077 – Advanced Database Systems

Answer all Questions Each Question carries TEN Marks

1. Explain the following normal forms with a suitable example demonstrating the reduction of a

sample table into the said normal forms:

A) First Normal Form B) Second Normal Form C) Third Normal Form

Ans: 1NF: A relation R is in first normal form (1NF) if and only if all underlying domains contain only atomic values.

Example: 1NF but not 2NF

FIRST (supplier_no, status, city, part_no, quantity)

Functional Dependencies:

(supplier_no, part_no) → quantity

(supplier_no) → status

(supplier_no) → city

city → status (Supplier's status is determined by location)

Comments:

Non-key attributes are not mutually independent (city → status).

Non-key attributes are not fully functionally dependent on the primary key (i.e., status and city are dependent on just part of the key, namely

supplier_no).

Anomalies:

INSERT: We cannot enter the fact that a given supplier is located in a given city until that supplier supplies at least one part (otherwise, we would have to enter a null value for a column participating in the primary key, a violation of the definition of a relation).

DELETE: If we delete the last (only) row for a given supplier, we lose the information that the supplier is located in a particular city.

UPDATE: The city value appears many times for the same supplier. This can lead to inconsistency or the need to change many values of city if a

supplier moves.

Decomposition (into 2NF):

SECOND (supplier_no, status, city)

SUPPLIER_PART (supplier_no, part_no, quantity)
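The decomposition above can be sketched with ordinary Python sets. The supplier rows are invented for illustration; the point is that the natural join of SECOND and SUPPLIER_PART reconstructs FIRST (a lossless decomposition), while each supplier's city is now stored only once.

```python
# Hypothetical sample rows for FIRST (1NF but not 2NF); data is illustrative.
first = [
    # (supplier_no, status, city, part_no, quantity)
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S2", 10, "Paris",  "P1", 100),
]

# Decompose into SECOND and SUPPLIER_PART as described above.
second = {(s, st, c) for (s, st, c, _, _) in first}
supplier_part = {(s, p, q) for (s, _, _, p, q) in first}

# The natural join on supplier_no reconstructs FIRST (lossless decomposition).
rejoined = {
    (s, st, c, p, q)
    for (s, st, c) in second
    for (s2, p, q) in supplier_part
    if s == s2
}
assert rejoined == set(first)

# The city is now stored once per supplier, so a supplier moving to a new
# city is a single-row update instead of one update per supplied part.
print(len([t for t in first if t[0] == "S1"]),    # 2 copies of S1's city in FIRST
      len([t for t in second if t[0] == "S1"]))   # 1 copy in SECOND
```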

2NF: A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.

Example (2NF but not 3NF):

SECOND (supplier_no, status, city)

Functional Dependencies:

supplier_no → status

supplier_no → city

city → status

Comments:

Lacks mutual independence among non-key attributes.


Mutual dependence is reflected in the transitive dependencies: supplier_no → city, city → status.

Anomalies:

INSERT: We cannot record that a particular city has a particular status until we have a supplier in that city.

DELETE: If we delete a supplier which happens to be the last row for a given city value, we lose the fact that the city has the given status.

UPDATE: The status for a given city occurs many times, therefore leading to multiple updates and possible loss of consistency.

Decomposition (into 3NF):

SUPPLIER_CITY (supplier_no, city)

CITY_STATUS (city, status)
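This 3NF decomposition can also be checked with a short sketch (rows invented): joining SUPPLIER_CITY and CITY_STATUS back on city loses nothing, and CITY_STATUS can now record a city's status even with no supplier there, removing the INSERT anomaly described above.

```python
# Hypothetical rows for SECOND (2NF but not 3NF); values are illustrative.
second = [
    # (supplier_no, status, city)
    ("S1", 20, "London"),
    ("S2", 10, "Paris"),
    ("S3", 20, "London"),
]

# Decompose along the transitive dependency supplier_no -> city -> status.
supplier_city = {(s, c) for (s, _, c) in second}
city_status = {(c, st) for (_, st, c) in second}

# Joining back on city recovers SECOND, so no information is lost.
rejoined = {(s, st, c)
            for (s, c) in supplier_city
            for (c2, st) in city_status if c == c2}
assert rejoined == set(second)

# CITY_STATUS can hold a city with no supplier in it: no INSERT anomaly.
city_status.add(("Rome", 50))
```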

3NF: A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key. An attribute C is transitively dependent on attribute A if there exists an attribute B such that: A → B and B → C. Note that 3NF is concerned with transitive dependencies which do not involve candidate keys. A 3NF relation with more than one candidate key will clearly have transitive dependencies of the form: primary_key → other_candidate_key → any_non-key_column

An alternative (and equivalent) definition for relations with just one candidate key is:

A relation R having just one candidate key is in third normal form (3NF) if and only if the non-key attributes of R (if any) are: 1) mutually independent, and 2) fully dependent on the primary key of R. A non-key attribute is any column which is not part of the primary key. Two or

more attributes are mutually independent if none of the attributes is functionally dependent on any of the others. Attribute Y is fully

functionally dependent on attribute X if X → Y, but Y is not functionally dependent on any proper subset of the (possibly composite) attribute X.

For relations with just one candidate key, this is equivalent to the simpler:

A relation R having just one candidate key is in third normal form (3NF) if and only if no non-key column (or group of columns) determines

another non-key column (or group of columns)

Example (3NF but not BCNF):

SUPPLIER_PART (supplier_no, supplier_name, part_no, quantity)

Functional Dependencies:

We assume that supplier_names are always unique to each supplier. Thus we have two candidate keys:

(supplier_no, part_no) and (supplier_name, part_no)

Thus we have the following dependencies:

(supplier_no, part_no) → quantity

(supplier_no, part_no) → supplier_name

(supplier_name, part_no) → quantity

(supplier_name, part_no) → supplier_no

supplier_name → supplier_no

supplier_no → supplier_name

Comments:

Although supplier_name → supplier_no (and vice versa), supplier_no is not a non-key column; it is part of the primary key! Hence this relation technically satisfies the definition(s) of 3NF (and likewise 2NF, again because supplier_no is not a non-key column).

Anomalies:

INSERT: We cannot record the name of a supplier until that supplier supplies at least one part.

DELETE: If a supplier temporarily stops supplying and we delete the last row for that supplier, we lose the supplier's name.

UPDATE: If a supplier changes name, that change will have to be made to multiple rows (wasting resources and risking loss of consistency).


2. Explain the concept of a Query. How does a Query Optimizer work?

Ans: Queries are essentially powerful filters. Queries allow you to decide what fields or expressions are to be shown and what information is to be sought. Queries are usually based on Tables but can also be based on an existing Query. Queries allow you to seek anything from very basic information through to much more complicated specifications. They also allow you to list information in a particular order, such as listing all the resulting records in Surname order, for example.

Queries can select records that fit certain criteria. If you had a list of people and had a gender field, you could use a query to select just the males or females in the database. The gender field would have a criterion set as "male", which means that when the query is run only records with "male" in the Gender field would be listed. For each record that meets the criteria, you could choose to list other fields that may be in the table, like first name, surname, phone number, date of birth, or whatever you may have in the database.

Queries can do much more than just listing out records. It is also possible to list out totals, averages, etc. from the data and do various other calculations. Queries can also be used to do other tasks, such as deleting records, updating records, adding new records, creating new tables, and creating tabulated reports.

The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query and attempts to determine which of those plans will be the most efficient. Cost-based query optimizers assign an estimated "cost" to each possible query plan and choose the plan with the smallest cost. Costs are used to estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required, the CPU requirements, and other factors determined from the data dictionary. The set of query plans examined is formed by examining the possible access paths (e.g. index scan, sequential scan) and join algorithms (e.g. sort-merge join, hash join, nested loop join). The search space can become quite large depending on the complexity of the SQL query.

Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server and parsed by the parser, they are then passed to the query optimizer, where optimization occurs. However, some database engines allow guiding the query optimizer with hints.

Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes; those are nodes whose output is fed as input to the parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an index scan or a sequential scan.
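The plan-node tree described above can be sketched in a few lines of Python. The node types and cost figures are invented for illustration, not taken from any real optimizer; the point is only that a plan's total cost is accumulated bottom-up over the tree.

```python
# A toy plan-node tree; operator names and cost numbers are illustrative only.
class PlanNode:
    def __init__(self, op, cost, children=()):
        self.op, self.cost, self.children = op, cost, list(children)

    def total_cost(self):
        # A plan's cost is its own cost plus that of every child subtree.
        return self.cost + sum(c.total_cost() for c in self.children)

# Leaves scan the disk; intermediate nodes consume their children's output.
seq_scan_a = PlanNode("SeqScan(A)", cost=100)
idx_scan_b = PlanNode("IndexScan(B)", cost=30)
join = PlanNode("HashJoin", cost=50, children=[seq_scan_a, idx_scan_b])
sort = PlanNode("Sort", cost=20, children=[join])

# A cost-based optimizer would compare this total against rival plans
# for the same query and keep the cheapest.
print(sort.total_cost())  # 200
```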

Join ordering 

The performance of a query plan is determined largely by the order in which the tables are joined. For example, when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that joins B and C first can take several orders of magnitude more time to execute than one that joins A and C first. Most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System R database project. This algorithm works in two stages:

1. First, all ways to access each relation in the query are computed. Every relation in the query can be accessed via a sequential scan. If there is an index on a relation that can be used to answer a predicate in the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as the cheapest way to scan the relation that produces records in a particular sorted order.

2. The optimizer then considers combining each pair of relations for which a join condition exists. For each pair, the optimizer will consider the available join algorithms implemented by the DBMS. It will preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that produces its output according to a particular sort order.


3. Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous phase with the remaining relations in the query.

In this manner, a query plan is eventually produced that joins all the relations in the query. Note that the algorithm keeps track of the sort order of the result set produced by a query plan, also called an interesting order. During dynamic programming, one query plan is considered to beat another query plan that produces the same result only if they produce the same sort order. This is done for two reasons. First, a particular sort order can avoid a redundant sort operation later on in processing the query. Second, a particular sort order can speed up a subsequent join because it clusters the data in a particular way.
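The staged algorithm above can be sketched as a toy dynamic program over subsets of relations. The relation sizes, the join selectivities, and the cost model (cost = cumulative size of intermediate results) are all invented for illustration; real optimizers use far richer statistics and also track interesting orders, which this sketch omits.

```python
from itertools import combinations

# Invented relation sizes matching the A/B/C example in the text.
sizes = {"A": 10, "B": 10_000, "C": 1_000_000}
# Assumed selectivity of the join predicate linking each pair of relations.
selectivity = {frozenset("AB"): 0.001,
               frozenset("BC"): 0.0001,
               frozenset("AC"): 0.01}

def join_size(left_rels, right_rels, left_size, right_size):
    # Apply the selectivity of every predicate linking the two sides.
    out = left_size * right_size
    for l in left_rels:
        for r in right_rels:
            out *= selectivity.get(frozenset((l, r)), 1.0)
    return out

# best[S] = (cumulative cost, result size) of the cheapest plan joining set S.
best = {frozenset([r]): (0.0, s) for r, s in sizes.items()}
rels = list(sizes)
for k in range(2, len(rels) + 1):
    for subset in map(frozenset, combinations(rels, k)):
        # Left-deep style: extend a smaller plan with one base relation.
        for base in map(frozenset, combinations(subset, 1)):
            rest = subset - base
            bc, bs = best[base]
            rc, rs = best[rest]
            size = join_size(base, rest, bs, rs)
            cost = bc + rc + size
            if subset not in best or cost < best[subset][0]:
                best[subset] = (cost, size)

# The cheapest order joins the two small relations (A and B) first.
print(best[frozenset("ABC")])
```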

Historically, System R-derived query optimizers would often only consider left-deep query plans, which first join two base tables together, then join the intermediate result with another base table, and so on. This heuristic reduces the number of plans that need to be considered (n! instead of 4^n), but may result in not considering the optimal query plan. This heuristic is drawn from the observation that join algorithms such as nested loops only require a single tuple (aka row) of the outer relation at a time. Therefore, a left-deep query plan means that fewer tuples need to be held in memory at any time: the outer relation's join plan need only be executed until a single tuple is produced, and then the inner base relation can be scanned (this technique is called "pipelining").

Subsequent query optimizers have expanded this plan space to consider "bushy" query plans, where both operands to a join operator could be intermediate results from other joins. Such bushy plans are especially important in parallel computers because they allow different portions of the plan to be evaluated independently.

Q3. Explain the following with respect to Heuristics of Query Optimization:

A) Equivalence of Expressions B) Selection Operation

C) Projection Operation D) Natural Join Operation

  Equivalent expressions

We often want to replace a complicated expression with a simpler one that means the same thing. For example, the expression x + 4 + 2

obviously means the same thing as x + 6, since 4 + 2 = 6. More interestingly, the expression x + x + 4 means the same thing as 2x + 4, because 2x is x + x when you think of multiplication as repeated addition. (Which of these is simpler depends on your point of view, but usually 2x + 4 is more convenient in Algebra.)

Two algebraic expressions are equivalent if they always lead to the same result when you evaluate them, no matter what values you

substitute for the variables. For example, if you substitute x := 3 in x + x + 4, then you get 3 + 3 + 4, which works out to 10; and if you substitute

it in 2x + 4, then you get 2(3) + 4, which also works out to 10. There's nothing special about 3 here; the same thing would happen no matter

what value we used, so x + x + 4 is equivalent to 2x + 4. (That's really what I meant when I said that they mean the same thing.)

When I say that you get the same result, this includes the possibility that the result is undefined. For example, 1/x + 1/x is equivalent to 2/x;

even when you substitute x := 0, they both come out the same (in this case, undefined). In contrast, x²/x is not equivalent to x; they usually come out the same, but they are different when x := 0. (Then x²/x is undefined,

but x is 0.) To deal with this situation, there is a sort of trick you can play, forcing the second expression to be undefined in certain cases. Just add the words 'for x ≠ 0' at the end of the expression to make a new expression; then the new expression is undefined unless x ≠ 0. (You can put any other condition you like in place of x ≠ 0, whatever is appropriate in a given situation.) So x²/x is equivalent to x for x ≠ 0.

To symbolise equivalent expressions, people often simply use an equals sign. For example, they might say 'x + x + 4 = 2x + 4'. The idea is that this is a statement that is always true, no matter what x is. However, it isn't really correct to write '1/x + 1/x = 2/x' to indicate an equivalence


of expressions, because this statement is not correct when x := 0. So instead, I will use the symbol '≡', which you can read 'is equivalent to' (instead of 'is equal to' for '='). So I'll say, for example,

- x + x + 4 ≡ 2x + 4,
- 1/x + 1/x ≡ 2/x, and
- x²/x ≡ x for x ≠ 0.

The textbook, however, just uses '=' for everything, so you can too.
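These equivalences can be checked numerically with a short sketch. The sample substitution values are arbitrary; the interesting case is x := 0, where "same result" must include both expressions being undefined.

```python
# Two expressions are equivalent if they agree for every substitution.
def expr1(x): return x + x + 4
def expr2(x): return 2 * x + 4

for x in (-5, 0, 3, 7.5):
    assert expr1(x) == expr2(x)   # same result for every sample value

def try_eval(f, x):
    # Treat a division by zero as the value "undefined".
    try:
        return f(x)
    except ZeroDivisionError:
        return "undefined"

# 1/x + 1/x and 2/x agree at x = 0 too: both are undefined.
assert try_eval(lambda x: 1/x + 1/x, 0) == try_eval(lambda x: 2/x, 0) == "undefined"

# x**2 / x is NOT equivalent to x: they differ at x = 0.
assert try_eval(lambda x: x**2 / x, 0) == "undefined"
assert try_eval(lambda x: x, 0) == 0
```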

  Selection Operation

1. Consider the query to find the assets and branch-names of all banks who have depositors living in Port Chester. In relational algebra this is

   Π BNAME, ASSETS (σ CCITY = "Port Chester" (CUSTOMER ⋈ DEPOSIT ⋈ BRANCH))

   - This expression constructs a huge relation, CUSTOMER ⋈ DEPOSIT ⋈ BRANCH, of which we are only interested in a few tuples.
   - We also are only interested in two attributes of this relation.
   - We can see that we only want tuples for which CCITY = "Port Chester".
   - Thus we can rewrite our query as:

   Π BNAME, ASSETS ((σ CCITY = "Port Chester" (CUSTOMER)) ⋈ DEPOSIT ⋈ BRANCH)

   - This should considerably reduce the size of the intermediate relation.
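The effect of pushing the selection inside the join can be sketched with toy relations. All rows and the two-column schemas are invented; the point is that both orderings give the same answer while the pushed-down version builds a smaller intermediate relation.

```python
# Toy relations; schemas follow the text, rows are invented.
customer = [  # (cname, ccity)
    ("Ann", "PORT CHESTER"), ("Bob", "YONKERS"), ("Carla", "PORT CHESTER"),
]
deposit = [  # (cname, bname)
    ("Ann", "Downtown"), ("Bob", "Uptown"), ("Carla", "Uptown"),
]

def join(left, right):
    # Natural join on the shared cname attribute.
    return [(lc, cc, bn) for (lc, cc) in left for (rc, bn) in right if lc == rc]

# Unoptimised: join everything first, then select.
big = join(customer, deposit)
naive = [t for t in big if t[1] == "PORT CHESTER"]

# Optimised: push the selection below the join.
small = join([t for t in customer if t[1] == "PORT CHESTER"], deposit)

assert sorted(naive) == sorted(small)   # same answer either way
assert len(small) < len(big)            # smaller intermediate relation
```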

  Projection Operation

1. Like selection, projection reduces the size of relations. It is advantageous to apply projections early. Consider this form of our example query:

   Π BNAME, ASSETS ((σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT) ⋈ BRANCH)

2. When we compute the subexpression

   σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT

   we obtain a relation whose scheme is (CNAME, CCITY, BNAME, ACCOUNT#, BALANCE).

3. We can eliminate several attributes from this scheme. The only ones we need to retain are those that
   - appear in the result of the query, or
   - are needed to process subsequent operations.

4. By eliminating unneeded attributes, we reduce the number of columns of the intermediate result, and thus its size.

5. In our example, the only attribute we need is BNAME (to join with BRANCH). So we can rewrite our expression as:

   Π BNAME, ASSETS ((Π BNAME (σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT)) ⋈ BRANCH)

6. Note that there is no advantage in doing an early projection on a relation before it is needed for some other operation:
   - We would access every block of the relation to remove attributes.
   - Then we access every block of the reduced-size relation when it is actually needed.
   - We do more work in total, rather than less!
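Early projection can be sketched the same way. The intermediate rows below are invented; projecting down to BNAME shrinks each five-column tuple to a single column before the final join with BRANCH.

```python
# Illustrative intermediate relation with the scheme from the text.
rows = [  # (cname, ccity, bname, account_no, balance)
    ("Ann", "PORT CHESTER", "Downtown", 101, 500),
    ("Carla", "PORT CHESTER", "Uptown", 102, 900),
]

# Only BNAME is needed for the later join with BRANCH, so project it early;
# each intermediate tuple shrinks from five columns to one.
bnames = {r[2] for r in rows}

branch = [("Downtown", 9_000_000), ("Uptown", 1_700_000)]  # (bname, assets)
result = [(b, a) for (b, a) in branch if b in bnames]
print(result)
```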

  Natural Join Operation

1. Another way to reduce the size of temporary results is to choose an optimal ordering of the join operations.

2. Natural join is associative:

   (r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)

3. Although these expressions are equivalent, the costs of computing them may differ. Look again at our expression

   Π BNAME, ASSETS ((σ CCITY = "Port Chester" (CUSTOMER)) ⋈ DEPOSIT ⋈ BRANCH)

   - We see that we can compute DEPOSIT ⋈ BRANCH first and then join with the first part.
   - However, DEPOSIT ⋈ BRANCH is likely to be a large relation as it contains one tuple for every account.
   - The other part, σ CCITY = "Port Chester" (CUSTOMER), is probably a small relation (comparatively).
   - So, if we compute σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT first, we get a reasonably small relation. It has one tuple for each account held by a resident of Port Chester.
   - This temporary relation is much smaller than DEPOSIT ⋈ BRANCH.

4. Natural join is commutative:

   r1 ⋈ r2 = r2 ⋈ r1

   - Thus we could rewrite our relational algebra expression as:

   Π BNAME, ASSETS (((σ CCITY = "Port Chester" (CUSTOMER)) ⋈ BRANCH) ⋈ DEPOSIT)

   - But there are no common attributes between CUSTOMER and BRANCH, so this is a Cartesian product. Lots of tuples!
   - If a user entered this expression, we would want to use the associativity and commutativity of natural join to transform this into the more efficient expression we have derived earlier (join with DEPOSIT first, then with BRANCH).
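The cost difference between equivalent join orders can be illustrated with toy data (all rows invented): both orders give the same final answer, but joining the filtered customers with DEPOSIT first produces a much smaller temporary relation than DEPOSIT ⋈ BRANCH.

```python
# Invented rows: deposit has one tuple per account; the customer relation
# has already been filtered down to Port Chester residents.
port_chester_customers = [("Ann",), ("Carla",)]           # small after selection
deposit = [("Ann", "Downtown"), ("Bob", "Uptown"),
           ("Carla", "Uptown"), ("Dave", "Downtown")]     # one tuple per account
branch = [("Downtown", 9_000_000), ("Uptown", 1_700_000)]

# Order 1: DEPOSIT join BRANCH first -- one tuple per account (large).
dep_branch = [(c, b, a) for (c, b) in deposit for (b2, a) in branch if b == b2]

# Order 2: (selected CUSTOMER) join DEPOSIT first -- one tuple per
# Port Chester account (small).
cust_dep = [(c, b) for (c,) in port_chester_customers
            for (c2, b) in deposit if c == c2]

assert len(cust_dep) < len(dep_branch)  # much smaller temporary relation

# Associativity: both orders yield the same final result.
final1 = [(c, b, a) for (c,) in port_chester_customers
          for (c2, b, a) in dep_branch if c == c2]
final2 = [(c, b, a) for (c, b) in cust_dep
          for (b2, a) in branch if b == b2]
assert sorted(final1) == sorted(final2)
```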


4. There are a number of historical, organizational, and technological reasons that explain the lack of an all-encompassing data management system. Discuss a few of them with appropriate examples.

Ans: Most current data management systems (DMS) have been built on the assumption that the data collection, or database, to be administered consists of a single media type: structured tables of "fact" data, or unstructured strings of bits representing such media objects as text documents, images, or video. The result is that most DMSs store and index a specific type of media data and provide a query (data access) language that is specialized for efficient access to and retrieval of this data type.

A further assumption that has frequently been made is that the information requirements of the system users are known and can be used for structuring the data collection and tuning the data management system. It has also been assumed that the users would only infrequently require information/data from some other type of data management system.

These assumptions have been criticized since the early 1980s by researchers who have pointed out that, almost from the point of creation, a database would not (nor could) contain all of the data required by the user community (Gligor & Luckenbaugh, 1984; Landers & Rosenberg, 1982; Litwin et al., 1982; among many others). A number of historical, organizational, and technological reasons explain the lack of an all-encompassing data management system. Among these are:

- The sensible advice to build small systems with the plan to extend their scope in later implementation phases allows a core system to be implemented relatively quickly, but has led to a proliferation of relatively small systems.

- Department autonomy has led to construction of department-specific rather than organization-wide systems, again leading to many small, overlapping, and often incompatible systems within an organization.

- The continual evolution of the organization and its interactions, both within and with its external environment, prohibits complete understanding of future information requirements.

- Parallel development of data management systems for particular applications has led to different and incompatible systems for management of tabular/administrative data, text/document data, historical/statistical data, spatial/geographic data, and streamed/audio and visual data.

The result is that only a portion of an organization's data is administered by any one data management system and most organizations have a

multitude of special purpose databases, managed by different, and often incompatible, data management system types. The growing need to

retrieve data from multiple databases within an organization, as well as the rapid dissemination of data through the Internet, has given rise to

the requirement of providing integrated access to both internal and external data of multiple types.

A major challenge and critical practical and research problem for the information, computer, and communication technology communities is to develop data management systems that can provide efficient access to the data stored in multiple private and public databases (Brodie, 1993; Hurson & Bright, 1996; Nordbotten, 1988a, 1988b and Nordbotten, 1994a).

Problems to be resolved include:

1. Interoperability among systems (Fox & Sornil, 1999; Litwin & Abdellatif, 1986),
2. Incorporation of legacy systems (Brodie, 1993), and
3. Integration of management techniques for structured and unstructured data (Stonebraker & Brown, 1999).

Each of the above problems entails an integration of concepts, methods, techniques and tools from separate research and development communities that have existed in parallel but independently and have had rather minimal interaction. One consequence of this is that there exists an overlapping and conflicting terminology between these communities.


In the previous chapter, a database was defined as a COLLECTION OF RELATED DATA REPRESENTING SOME LOGICALLY COHERENT ASPECT OF THE REAL WORLD. With this definition, NO limitations are given as to the type of:

- Data in the collection,
- Model used to structure the collection, or
- Architecture and geographic location of the database.

The focus of this text is on on-line (electronic and web accessible) databases containing multiple media data, thus restricting our interest/focus to multimedia databases stored on one or more computers (DB servers) and accessible from the Internet. Examples of such databases include the image collections of the Hermitage Museum, the catalog and full text materials of the ACM digital library, and the customer records for the 7 sites of Amazon.com.

Electronic databases are important since they contain data recording the products and services, as well as the economic history and current status, of the owner organization. They are also a source of information for the organization's employees and customers/users. However, databases cannot be used effectively unless there exist efficient and secure data management systems (DMS) for the data in the databases.

Q5. Describe the Structural Semantic Data Model (SSM) with relevant examples.

Ans: Modelling Complex and Multimedia Data

Data modelling addresses a need in information system analysis and design to develop a model of the information requirements as well as a set of viable database structure proposals. The data modelling process consists of:

1. Identifying and describing the information requirements for an information system,
2. Specifying the data to be maintained by the data management system, and
3. Specifying the data structures to be used for data storage that best support the information requirements.

A fundamental tool used in this process is the data model, which is used both for specification of the information requirements at the user level and for specification of the data structure for the database. During implementation of a database, the data model guides construction of the schema or data catalog, which contains the metadata that describe the DB structure and the data semantics that are used to support database implementation and data retrieval.

Data modelling, using a specific data model type, and as a unique activity during information system design, is commonly attributed to Charles Bachman (1969), who presented the Data Structure Diagram as one of the first widely used data models for network database design. Several alternative data model types were proposed shortly thereafter, the best known of which are the:

- Relational model (Codd, 1970) and the
- Entity-relationship (ER) model (Chen, 1976).

The relational model was quickly criticized for being 'flat' in the sense that all information is represented as a set of tables with atomic cell values. The definition of well-formed relational models requires that complex attribute types (hierarchic, composite, multi-valued, and derived) be converted to atomic attributes and that relations be normalized. Inter-entity (inter-relation) relationships are difficult to visualize in the resulting set of relations, making control of the completeness and correctness of the model difficult. The relational model maps easily to the physical characteristics of electronic storage media, and as such, is a good tool for design of the physical database.

The entity-relationship approach to modelling, proposed by Chen (1976), had two primary objectives: first, to visualize inter-entity relationships, and second, to separate the DB design process into two phases:


1. Record, in an ER model, the entities and inter-entity relationships required "by the enterprise", i.e. by the owner/user of the information system or application. This phase and its resulting model should be independent of the DBMS tool that is to be used for realizing the DB.

2. Translate the ER model to the data model supported by the DBMS to be used for implementation.

This two-phase design supports modification at the physical level without requiring changes to the enterprise or user view of the DB content.

Chen's ER model also quickly came under criticism, particularly for its lack of ability to model classification structures. In 1977, Smith & Smith presented a method for modelling generalization and aggregation hierarchies that underlie the many extended/enhanced entity-relationship (EER) model types proposed and in use today.

6. What are the differences between Global and Local Transactions in a distributed database system? What are the roles of the Transaction Manager and Transaction Coordinator in managing transactions in a distributed database?

Ans: A distributed database system consists of a collection of sites, each of which maintains a local database system. Each site is able to process local transactions, those transactions that access data only in that single site. In addition, a site may participate in the execution of global transactions, those transactions that access data in several sites. The execution of global transactions requires communication among the sites.

The sites in the system can be connected physically in a variety of ways. The various topologies are represented as graphs whose nodes correspond to sites. An edge from node A to node B corresponds to a direct connection between the two sites. Some of the most common configurations are depicted in Figure 1. The major differences among these configurations involve:

- Installation cost: the cost of physically linking the sites in the system.
- Communication cost: the cost in time and money to send a message from site A to site B.
- Reliability: the frequency with which a link or site fails.
- Availability: the degree to which data can be accessed despite the failure of some links or sites.

As we shall see, these differences play an important role in choosing the appropriate mechanism for handling the distribution of data. The sites of a distributed database system may be distributed physically either over a large geographical area (such as all the Indian states), or over a small geographical area (such as a single building or a number of adjacent buildings). The former type of network is referred to as a long-haul network, while the latter is referred to as a local-area network. Since the sites in long-haul networks are distributed physically over a large geographical area, the communication links are likely to be relatively slow and less reliable as compared with local-area networks. Typical long-haul links are telephone lines, microwave links, and satellite channels. In contrast, since all the sites in local-area networks are close to each other, communication links are of higher speed and lower error rate than their counterparts in long-haul networks. The most common links are twisted pair, baseband coaxial, broadband

coaxial, and fiber optics.Let us illustrate these concepts by considering a banking system consisting of four branches located in four different cities. Each branch hasits own computer with a database consisting of all the accounts maintained at that branch. Each such installation is thus a site. There alsoexists one single site which maintains information about all the branches of the bank. Suppose that the database systems at the various sitesare based on the relational model. Thus, each branch maintains (among others) the relation deposite (Dejmsit-scheme) where Deposite-scheme = (branch-name, account-number, customer-name, balance) site containing information about the four branches maintains therelation branch

(Branch-scheme), whereBranch-scheme = (branch-name, assets, branch-city)

There are other relations maintained at the various sites which are ignored for the prrrpose of our example.
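The two relation schemes above can be sketched as tables in an in-memory SQLite database. This is only an illustration of the schemas described in the text; the column types, the sample customer, and the balance values are assumptions, not part of the original example.

```python
import sqlite3

# In-memory database standing in for one site of the bank (illustrative only).
conn = sqlite3.connect(":memory:")

# Deposit-scheme = (branch-name, account-number, customer-name, balance)
conn.execute("""
    CREATE TABLE deposit (
        branch_name    TEXT,
        account_number INTEGER,
        customer_name  TEXT,
        balance        REAL
    )
""")

# Branch-scheme = (branch-name, assets, branch-city)
conn.execute("""
    CREATE TABLE branch (
        branch_name TEXT,
        assets      REAL,
        branch_city TEXT
    )
""")

# Hypothetical sample data: account 177 at the Delhi branch (as in the text).
conn.execute("INSERT INTO deposit VALUES ('Delhi', 177, 'A. Kumar', 500.0)")
conn.execute("INSERT INTO branch  VALUES ('Delhi', 9000000.0, 'Delhi')")

row = conn.execute(
    "SELECT balance FROM deposit WHERE account_number = 177"
).fetchone()
print(row[0])  # 500.0
```

In a real distributed deployment each branch site would hold its own copy of the `deposit` table, while the central site would hold `branch`.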


A local transaction is a transaction that accesses accounts at the single site at which the transaction was initiated. A global transaction, on the other hand, is one which either accesses accounts at a site different from the one at which the transaction was initiated, or accesses accounts at several different sites. To illustrate the difference between these two types of transactions, consider the transaction to add $50 to account number 177 located at the Delhi branch. If the transaction was initiated at the Delhi branch, then it is considered local; otherwise, it is considered global. A transaction to transfer $50 from account 177 to account 305, which is located at the Bombay branch, is a global transaction, since accounts at two different sites are accessed as a result of its execution. What makes the above configuration a distributed database system are the facts that

The various sites are aware of each other.

Each site provides an environment for executing both local and global transactions.
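The local/global distinction can be sketched as a small classification function. The account-to-site mapping and the function name below are illustrative assumptions, not part of any real system described in the text.

```python
# Hypothetical mapping of account numbers to the site (branch) that holds them,
# matching the banking example: account 177 at Delhi, account 305 at Bombay.
ACCOUNT_SITE = {177: "Delhi", 305: "Bombay"}

def classify(initiating_site, accounts):
    """Return 'local' or 'global' for a transaction started at initiating_site
    that accesses the given account numbers. A transaction is local only if
    every account it touches lives at the initiating site."""
    sites = {ACCOUNT_SITE[a] for a in accounts}
    return "local" if sites == {initiating_site} else "global"

# Add $50 to account 177, initiated at Delhi: touches only the Delhi site.
print(classify("Delhi", [177]))        # local
# The same update initiated at Bombay accesses a remote site.
print(classify("Bombay", [177]))       # global
# Transfer from 177 (Delhi) to 305 (Bombay): two sites are involved.
print(classify("Delhi", [177, 305]))   # global
```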

There are several reasons for building distributed database systems, including sharing of data, reliability and availability, and speedup of query processing. However, along with these advantages come several disadvantages, including software development cost, greater potential for bugs, and increased processing overhead.

A distributed database system consists of a collection of sites, each of which maintains a local database system. Each site is able to process local transactions, those transactions that access data only at that single site. In addition, a site may participate in the execution of global transactions, those transactions that access data at several sites. The execution of global transactions requires communication among the sites.

The primary disadvantage of distributed database systems is the added complexity required to ensure proper coordination among the sites. There are several issues involved in storing a relation in the distributed database, including replication and fragmentation. It is essential that the system minimise the degree to which a user needs to be aware of how a relation is stored.
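Fragmentation, mentioned above, can be sketched with a horizontal fragmentation of the deposit relation: each site stores only the tuples belonging to its own branch, and the union of all fragments reconstructs the original relation. The row data and function name below are illustrative assumptions.

```python
# Hypothetical deposit tuples spread over two branch sites.
deposit = [
    {"branch_name": "Delhi",  "account_number": 177, "balance": 500.0},
    {"branch_name": "Bombay", "account_number": 305, "balance": 750.0},
    {"branch_name": "Delhi",  "account_number": 201, "balance": 120.0},
]

def fragment_by_branch(rows):
    """Horizontally fragment rows into one fragment per branch, i.e. per the
    site that owns those tuples."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row["branch_name"], []).append(row)
    return fragments

fragments = fragment_by_branch(deposit)
print(sorted(fragments))        # ['Bombay', 'Delhi']
print(len(fragments["Delhi"]))  # 2

# Reconstruction: the union of all fragments yields the original relation.
reconstructed = [r for frag in fragments.values() for r in frag]
assert sorted(r["account_number"] for r in reconstructed) == [177, 201, 305]
```

Replication would instead keep full or partial copies of the relation at several sites, trading storage and update cost for availability.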

A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the interaction with the file manager. The raw data are stored on the disk using the file system, which is usually provided by a conventional operating system. The storage manager translates the various DML statements into low-level file-system commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.

The storage manager components include:

- Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.

- Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.

- File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.

- Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.

The storage manager implements several data structures as part of the physical system implementation:

1. Data files, which store the database itself.
2. Data dictionary, which stores metadata about the structure of the database, in particular the schema of the database.
3. Indices, which provide fast access to data items that hold particular values.
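The buffer manager's caching role can be sketched with a minimal pool using least-recently-used (LRU) replacement, one common policy (the text does not name a specific policy, so LRU here is an assumption). The in-memory `DISK` dict stands in for the file manager's pages.

```python
from collections import OrderedDict

# Fake "disk": ten pages that the file manager would normally supply.
DISK = {page_no: f"data-{page_no}" for page_no in range(10)}

class BufferManager:
    """Minimal sketch of a buffer manager with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = OrderedDict()  # page_no -> contents, in LRU order
        self.disk_reads = 0

    def fetch(self, page_no):
        if page_no in self.pool:             # hit: refresh recency
            self.pool.move_to_end(page_no)
            return self.pool[page_no]
        self.disk_reads += 1                 # miss: read from "disk"
        if len(self.pool) >= self.capacity:  # evict least recently used page
            self.pool.popitem(last=False)
        self.pool[page_no] = DISK[page_no]
        return self.pool[page_no]

buf = BufferManager(capacity=2)
buf.fetch(1); buf.fetch(2)
buf.fetch(1)           # hit: no disk read, page 1 becomes most recent
buf.fetch(3)           # miss: evicts page 2, the least recently used
print(buf.disk_reads)  # 3
print(2 in buf.pool)   # False
```

This is exactly the behaviour the bullet above describes: the database can touch far more pages than fit in memory, at the price of re-reading evicted pages from disk.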

-------------------------- xxxxxxxxxxxxxxxxx -------------------------