Chapter 13: Client/Server Database and Distributed Database

Fundamentals of Database Management Systems
by Mark L. Gillenson, Ph.D., University of Memphis
Presentation by: Amita Goyal Chin, Ph.D., Virginia Commonwealth University
John Wiley & Sons, Inc.



13-2

Chapter Objectives

- Describe the concepts and advantages of the client/server database approach.
- Describe the concepts and advantages of the distributed database approach.
- Explain how data can be distributed and replicated in a distributed database.

13-3

Chapter Objectives

- Describe the problem of concurrency control in a distributed database.
- Describe the distributed join process.
- Describe data partitioning in a distributed database.
- Describe distributed directory management.

13-4

Local Area Network (LAN)

- An arrangement of personal computers connected together by communications lines.
- The PCs must be located fairly close to each other.
- Allows sharing of resources such as servers, printers, etc.

13-5

Local Area Network (LAN)

- The PCs on a LAN can certainly operate independently, but they can also communicate with each other.
- A gateway computer on the LAN can link the LAN and its PCs to other LANs, to one or more mainframe computers, or to the Internet.

13-6

Two-Tiered Client/Server Arrangement

- Clients = PCs on the LAN
- Server = powerful computer on the LAN
- A shared database can be stored on a LAN server so that all of the PCs on the LAN can access it.

13-7

File Server Approach

- When a LAN client needs to query, update, or otherwise use a file on the server, the entire file must be sent from the server to that client.
- All of the querying, updating, or other processing is then performed at the client.
- If changed, the entire file is then shipped back to the server.

13-8

DBMS Server Approach

- The database is located at the server.
- The processing is split between the client and the server.
- Result: there is much less data traffic on the network.
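To make the traffic difference concrete, here is a minimal sketch that counts how many records cross the network under each approach. The table contents, the helper names `file_server_query` and `dbms_server_query`, and the use of record counts as a stand-in for traffic are all assumptions made for illustration; none of them come from the slides.

```python
# Hypothetical 1,000-row table held on the LAN server (illustrative data).
SERVER_TABLE = [{"id": i, "city": "Memphis" if i % 100 == 0 else "Other"}
                for i in range(1000)]


def file_server_query(predicate):
    """File server approach: ship the whole file, then filter at the client."""
    shipped = list(SERVER_TABLE)             # the entire file crosses the network
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)


def dbms_server_query(predicate):
    """DBMS server approach: filter at the server, ship only the matches."""
    result = [row for row in SERVER_TABLE if predicate(row)]
    return result, len(result)               # only the result crosses the network


def wanted(row):
    return row["city"] == "Memphis"


fs_rows, fs_traffic = file_server_query(wanted)
db_rows, db_traffic = dbms_server_query(wanted)
print(fs_traffic, db_traffic)
```

Both calls return the same rows; only the number of records shipped differs.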

13-9

Two-Tier Approach

- Some databases can be stored on a client PC’s own hard drive while other databases that the client might access are stored on the LAN’s server.
- Software has been developed that makes the location of the data transparent to the user at the client.

13-10

Two-Tier Approach

- The user issues a query at the client.
- The software first checks to see if the required data is on the PC’s own hard drive. If yes, the data is retrieved from it.
- If it is not on the local drive, the software looks for it on the server.

13-11

Three-Tier Approach

If the software doesn’t find the data on the client PC’s hard drive or on the LAN server, it can leave the LAN through a gateway computer and look for the data on, for example, a large mainframe computer that may be reachable from many LANs.
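The lookup order described for the two-tier and three-tier arrangements can be sketched as a search through an ordered list of tiers. The function `find_data`, the tier names, and the stored items are hypothetical and chosen only for illustration.

```python
def find_data(key, tiers):
    """Search each tier in order; return (tier_name, value) for the first hit."""
    for name, store in tiers:
        if key in store:
            return name, store[key]
    raise KeyError(key)


# Ordered from nearest to farthest, as in the slides.
tiers = [
    ("client hard drive", {"local_notes": "draft"}),
    ("LAN server", {"customers": "customer table"}),
    ("mainframe via gateway", {"ledger": "corporate ledger"}),
]

print(find_data("local_notes", tiers))
print(find_data("ledger", tiers))
```

The user's call looks the same whichever tier holds the data, which is the transparency the slides describe.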

13-12

Three-Tier Approach

- The term “Three-Tier Approach” is also used for an arrangement with the following three tiers:
  - The client PCs
  - Servers known as application servers
  - Other servers known as database servers

13-13

The Distributed Database Concept

- Instead of having one, centralized database, we are going to spread the data out among various cities on the distributed network, each of which has its own computer and data storage facilities.
- All of this distributed data is still considered to be a single logical database.

13-14

The Distributed Database Concept

- Location transparency - The user just issues the query, and the result is returned.
  - A person or process anywhere on the distributed network queries the database.
  - It is not necessary to know where on the network the data being sought is located.

13-15

Distributed DBMS

- Distributed database management system
- Sophisticated software
- Manages location transparency

13-16

Distributing the Data

- Headquartered in NY, a company’s database consists of 6 large tables: A, B, C, D, E, F.
- With a centralized database, all 6 tables would be located in NY.

13-17

Distributing the Data

- The company has major sites in Los Angeles, Memphis, New York, Paris, and Tokyo.
- The first and simplest idea in distributing the data would be to disperse the six tables among the five sites, perhaps based on frequency of use of each table.

13-18

Distributing the Data

- Tables A and B are kept at New York.
- Table C is moved to Memphis.
- Tables D and E are moved to Tokyo.
- Table F is moved to Paris.

13-19

Distributing the Data

- Paris employees can now access Table F without incurring telecommunications costs associated with accessing Table F in NY.
- Local autonomy - Paris employees, e.g., can take responsibility for Table F: its security, backup and recovery, and concurrency control.

13-20

Distributing the Data: Problems

- When the database was centralized at New York, a query issued at any of the sites that required a join of two or more of the tables could be handled in the standard way by the computer at New York.
- The result would then be sent to the site that issued the query.
- In the dispersed approach, a join might require tables located at different sites!

13-21

Replicated Tables

- Duplicated tables at two or more sites on the network.
- Advantages
  - Availability - during a site failure, data can still be accessed at a replicated location.
  - Local access - Replicate table at a site requiring frequent access.

13-22

Replicated Tables

- Disadvantages
  - Security risk
  - Concurrency control - How do you keep data consistent when it is replicated in tables on three continents?

13-23

Full Data Replication

- The maximum approach of replicating every table at every site.
- Great for availability
- Great for joins

13-24

Full Data Replication

- Worst for concurrency control - every change to every table has to be reflected at every site.
- Worst for security
- Takes up a lot of disk space

13-25

Partial Replication

- Have a copy of the entire database at headquarters in New York and have each table replicated exactly once at one of the other sites.

13-26

Partial Replication

- Improves availability - each table is now at two sites.
- Security and concurrency exposures are limited.
- Joins occur at NY.

13-27

Partial Replication

- New York could tend to become a bottleneck.
- If a table is heavily used in both Tokyo and Los Angeles, it can only be placed at one of the two sites (plus the copy of the entire database in New York), leaving the other with speed and telecom cost problems.
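As a rough comparison of the three layouts discussed so far, the sketch below counts how many table copies each one keeps for the six-table, five-site example. The per-table copy count is also the number of sites every update to that table has to reach. The dictionary names and scheme labels are assumptions for illustration only.

```python
tables = ["A", "B", "C", "D", "E", "F"]
sites = ["Los Angeles", "Memphis", "New York", "Paris", "Tokyo"]

# Copies kept of each table under each scheme.
copies_per_table = {
    "dispersed (no replication)": 1,
    "partial replication": 2,           # New York plus one other site
    "full replication": len(sites),     # every table at every site
}

# Total copies stored across the whole network.
total_copies = {name: n * len(tables) for name, n in copies_per_table.items()}

for name, total in total_copies.items():
    print(f"{name}: {copies_per_table[name]} per table, {total} in total")
```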

13-28

Replication Principles

- Place copies of tables at the sites that use them most heavily in order to minimize telecommunications costs.
- Ensure that there are at least two copies of important or frequently used tables to realize the gains in availability.

13-29

Replication Principles

- Limit the number of copies of any one table to control the security and concurrency issues.
- Avoid any one site becoming a bottleneck.

13-30

Replication Principles

13-31

Concurrency Control in Distributed Database

- The “lost update” problem.
- The protections that we discussed earlier that can be put into place to handle the problem of concurrent update in a single table are not adequate to handle the new, expanded problem in distributed database systems.
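A minimal sketch of how an update can be lost across replicas, assuming two sites that each hold a copy of the same value and propagate their new value by overwriting the other copy. The account balance, the amounts, and the propagation order are invented for illustration.

```python
# Two replicas of the same value, initially consistent.
replicas = {"New York": 100, "Tokyo": 100}

# Each site reads its local copy and computes a new value independently.
ny_new = replicas["New York"] + 50      # deposit made at New York
tokyo_new = replicas["Tokyo"] - 30      # withdrawal made at Tokyo

# Each site writes locally, then its value overwrites the other replica.
replicas["New York"] = ny_new
replicas["Tokyo"] = tokyo_new
replicas["Tokyo"] = ny_new              # New York's value arrives at Tokyo
replicas["New York"] = tokyo_new        # Tokyo's value arrives at New York

correct = 100 + 50 - 30                 # what a serial execution would give
print(replicas, "expected", correct)
```

Neither replica ends up holding the serially correct value, and the two copies disagree with each other. A lock on one local table would not prevent this, since each site only protects its own copy.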

13-32

Asynchronous Approach

- If retrieved data does not necessarily have to be up-to-the-minute accurate, we can use “asynchronous” approaches to updating replicated data.

13-33

Asynchronous Schemes

- The site where the data was updated can send an update message to the other sites that contain a copy of the same table.
- One of the sites can be chosen to accumulate all of the updates to all of the tables, and transmit changes regularly.
- Each table can have one of the sites be declared the “dominant” site for that table, which periodically transmits updates to the other sites.
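The “dominant site” scheme above can be sketched as a site that applies updates locally, queues them, and pushes the queue to the other copies when a periodic transmission runs. The class `DominantSite`, its method names, and the sample data are hypothetical; scheduling of the periodic call is left out.

```python
class DominantSite:
    """Holds the authoritative copy of one table and batches its updates."""

    def __init__(self, table, followers):
        self.table = dict(table)
        self.followers = followers       # replica dicts held at other sites
        self.pending = {}                # updates not yet transmitted

    def update(self, key, value):
        """Apply an update locally and queue it for later transmission."""
        self.table[key] = value
        self.pending[key] = value

    def transmit(self):
        """Run periodically: push accumulated updates to every follower."""
        for replica in self.followers:
            replica.update(self.pending)
        sent = len(self.pending)
        self.pending = {}
        return sent


paris_copy = {"part_1": 10}
site = DominantSite({"part_1": 10}, [paris_copy])

site.update("part_1", 12)
stale_value = paris_copy["part_1"]       # still the old value before transmission
sent = site.transmit()
fresh_value = paris_copy["part_1"]
print(stale_value, sent, fresh_value)
```

Between `update` and `transmit` the follower serves stale data, which is the trade-off the asynchronous approach accepts.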

13-34

Synchronous Approach

- Used when retrieved data does have to be up-to-the-minute accurate.
- All data in replicated tables worldwide must always be consistent, accurate, and up-to-date.
- Use two-phase commit.

13-35

Two-Phase Commit: Prepare Phase

- Each computer on the network has a special log file in addition to its database tables.
- The computer at the initiating site sends the updated data to the other sites that have copies of the table to be updated.
- The computers at the other sites record the changes in their logs (but not in the actual database tables).
- These computers attempt to lock the database tables involved in the update.
- If they are successful (the tables are not busy and can be locked) they inform the initiating site.

13-36

Two-Phase Commit: Commit Phase

- If all of the other sites reported they were successful in logging the update and locking the tables, the initiating site issues instructions to transfer the update from the logs to the actual database tables.

13-37

Two-Phase Commit

- Either all of the replicated files have to be updated or none of them may be updated.
- A complex, costly, and time-consuming process.
- The more volatile the data in the database, the less attractive is this procedure for updating replicated tables in the distributed database.
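The prepare and commit phases can be sketched in a few lines. This is a simplified, single-process illustration: the `Site` class, the `busy` flag used to simulate a table that cannot be locked, and the `abort` step are assumptions, and network messages, timeouts, and crash recovery are left out entirely.

```python
class Site:
    def __init__(self, name, table, busy=False):
        self.name = name
        self.table = dict(table)    # the actual database table
        self.log = None             # special log file holding a pending update
        self.locked = busy          # busy=True simulates an already locked table

    def prepare(self, update):
        """Prepare phase: log the change and try to lock the table."""
        if self.locked:
            return False            # table is busy and cannot be locked
        self.log = dict(update)
        self.locked = True
        return True

    def commit(self):
        """Commit phase: transfer the update from the log to the table."""
        self.table.update(self.log)
        self.log, self.locked = None, False

    def abort(self):
        """Discard a logged update; release the lock only if this update took it."""
        if self.log is not None:
            self.log, self.locked = None, False


def two_phase_commit(update, sites):
    """Update every site, or none of them."""
    votes = [site.prepare(update) for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.abort()
    return False


row = {"part_42": "qty=7"}
ok_sites = [Site("Paris", {}), Site("Tokyo", {})]
committed = two_phase_commit(row, ok_sites)

mixed_sites = [Site("Paris", {}), Site("Tokyo", {}, busy=True)]
aborted = two_phase_commit(row, mixed_sites)
print(committed, aborted)
```

In the second run Tokyo cannot lock its table, so Paris discards its logged change and no copy is updated.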

13-38

Distributed Joins

- A query that is run from one of the computers in a distributed database system that requires a join of two or more tables that are not all at the same computer.
- The distributed DBMS must have its own built-in expert system that is capable of figuring out an efficient way to handle a request for a distributed join.

13-39

Distributed Joins

- The DBMS evaluates various options for performing a join by considering:
  - The number and size of the records from each table involved in the join.
  - The distances and costs of transmitting the records from one site to another to execute the join.
  - The distance and cost of shipping the result of the join back to the site that issued the query in the first place.
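The three factors listed above can be combined into a simple cost comparison. The sketch below tries each candidate join site and totals the cost of moving the input tables there plus the cost of shipping the result back. The function `plan_join`, the flat per-byte rates, and the table sizes are assumptions for illustration; a real distributed DBMS optimizer weighs far more than this.

```python
def plan_join(tables, result_bytes, query_site, cost_per_byte):
    """Pick the join site with the lowest total transmission cost.

    tables: {table_name: (site, number_of_records, bytes_per_record)}
    cost_per_byte: {(from_site, to_site): cost}; same-site moves are free.
    """
    def ship(src, dst, nbytes):
        return 0 if src == dst else nbytes * cost_per_byte[(src, dst)]

    candidates = {site for site, _, _ in tables.values()} | {query_site}
    costs = {}
    for join_site in candidates:
        move_inputs = sum(ship(site, join_site, n * size)
                          for site, n, size in tables.values())
        move_result = ship(join_site, query_site, result_bytes)
        costs[join_site] = move_inputs + move_result
    best = min(costs, key=costs.get)
    return best, costs


sites = ["Memphis", "Paris", "New York"]
rates = {(a, b): 1 for a in sites for b in sites if a != b}   # hypothetical flat rate
tables = {"C": ("Memphis", 10_000, 100), "F": ("Paris", 500, 100)}

best, costs = plan_join(tables, result_bytes=10_000,
                        query_site="New York", cost_per_byte=rates)
print(best, costs[best])
```

With these numbers it is cheapest to send the small table F to Memphis, join there, and ship only the result to New York.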

13-40

Partitioning

- The purpose is to have records or columns of a table resident at the sites that use them the most frequently.
- Horizontal Partitioning
- Vertical Partitioning

13-41

Horizontal Partitioning

- A relational table can be split up so that some records are located at one site, other records are located at another site, and so on.
- e.g., partitioning of Table G.
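A minimal sketch of horizontal partitioning: whole records are assigned to sites by some rule. The columns of Table G and the rule used here (each record goes to the site named in its `region` column) are hypothetical, since the slide's figure is not part of the transcript.

```python
from collections import defaultdict


def horizontal_partition(rows, site_of):
    """Group whole records by the site that should hold them."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[site_of(row)].append(row)
    return dict(partitions)


# Hypothetical contents for Table G.
table_g = [
    {"id": 1, "region": "Tokyo", "amount": 10},
    {"id": 2, "region": "Paris", "amount": 25},
    {"id": 3, "region": "Tokyo", "amount": 40},
]

parts = horizontal_partition(table_g, lambda row: row["region"])
print({site: [r["id"] for r in rows] for site, rows in parts.items()})
```

Every partition keeps all columns; only the set of rows differs from site to site.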

13-42

Vertical Partitioning

- The columns of a table are divided up among several cities on the network.
- Each such partition must include the primary key attribute(s) of the table.
- Makes sense when different sites are responsible for processing different functions involving an entity.
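A minimal sketch of vertical partitioning, showing why each partition must carry the primary key: it is the only way to match the column groups back into full records. The employee table, its columns, and the assignment of column groups to Memphis and New York are invented for illustration.

```python
def vertical_partition(rows, key, column_groups):
    """Split columns across sites; every partition keeps the primary key."""
    return {
        site: [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for site, cols in column_groups.items()
    }


def rejoin(partitions, key):
    """Reassemble full records by matching on the primary key."""
    merged = {}
    for rows in partitions.values():
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


employees = [
    {"emp_id": 1, "name": "Ada", "salary": 900},
    {"emp_id": 2, "name": "Lin", "salary": 750},
]

parts = vertical_partition(
    employees, "emp_id",
    {"Memphis": ["name"], "New York": ["salary"]},
)
print(parts["Memphis"])
print(rejoin(parts, "emp_id") == employees)
```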

13-43

Distributed Directory Management

- A distributed DBMS must include a directory that keeps track of where the database tables, the replicated copies of database tables (if any), and the table partitions (if any) are located.
- When a query is presented at any site on the network, the distributed DBMS can automatically use the directory to find out where the required data is located and maintain location transparency.
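A directory of this kind can be sketched as a mapping from table name to the sites that hold a copy. The entries below follow the dispersed layout from the earlier slides (A and B at New York, C at Memphis, D and E at Tokyo, F at Paris); the `locate` function and its rule of preferring a local copy are assumptions for illustration.

```python
# Table name -> sites holding a copy (dispersed layout from the earlier slides).
DIRECTORY = {
    "A": ["New York"], "B": ["New York"], "C": ["Memphis"],
    "D": ["Tokyo"], "E": ["Tokyo"], "F": ["Paris"],
}


def locate(table, requesting_site, directory=DIRECTORY):
    """Return the site to read `table` from, preferring a local copy."""
    sites = directory[table]
    return requesting_site if requesting_site in sites else sites[0]


print(locate("F", "Paris"))    # served locally
print(locate("F", "Tokyo"))    # fetched from the site that holds it
```

The caller names only the table; the directory decides where the read goes.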

13-44

Directory Location

- The entire directory could be stored at only one site.
- Copies of the directory could be stored at several of the sites.
- A copy of the directory could be stored at every site. (This is generally the best solution.)

13-45

Distributed DBMS Summary

Centralized Database Advantages

- Single site provides high degree of security, concurrency, and backup and recovery control.
- No need for a distributed directory since all of the data is in one place.
- No need for distributed joins since all of the data is in one place.

13-46

Distributed DBMS Summary

Centralized Database Disadvantages

- All data accesses from other than the site with the database incur communications costs.
- The site with the database can become a bottleneck.
- Possible availability problem: if the site with the database goes down, there can be no data access.

13-47

Distributed DBMS Summary

Dispersing Tables on the Network Advantages

- Local autonomy.
- Reduced communications costs because each table can be located at the site that most heavily uses it.
- Improved availability because portions of the database are available even if one or some of the sites are down.

13-48

Distributed DBMS Summary

Dispersing Tables on the Network Disadvantages

- Several sites have to be concerned with security, concurrency, and backup and recovery.
- Requires a distributed directory and the software to support location transparency.
- Requires distributed joins.

13-49

Distributed DBMS Summary

Targeted Data Replication Advantages

- Greatly reduced communications costs for read-only data access because copies of tables can be located at multiple sites that most heavily use them.
- Greatly improved availability because if a site with a database table goes down, there may be another site with a copy of that table.

13-50

Distributed DBMS Summary

Targeted Data Replication Disadvantages

- Multi-site concurrency control when data in replicated tables is updated.

13-51

Distributed DBMS Summary

Partitioned Tables Advantages

- Greatest local autonomy because data at the record or column level can be stored at the site(s) that most heavily use it.
- Greatly reduced communications costs because data at the record or column level can be stored at the site(s) that most heavily use it.

13-52

Distributed DBMS Summary

Partitioned Tables Disadvantages

- Retrieving all or a large portion of a table may require multi-site accesses.

13-53

“Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.”