Distributed Data Management
Dr.-Ing. Eike Schallehn
OvG Universität Magdeburg
Fakultät für Informatik
Institut für Technische und Betriebliche Informationssysteme
2007/2008
Organization
Lecture: Friday, 13:00 - 14:30, Room G10-110
Teacher: Eike Schallehn
Consultation hours: by appointment via email
Room: G29-125
Telephone: (0391) 67-12845
Email: [email protected]
Exercise: Thursday, 11:15 - 13:45, Room G22A-218
Starting October 24th
Teacher: Eike Schallehn
Prerequisites
Required: knowledge of database basics from the database introduction course
  Basic principles, Relational Model, SQL, database design, ER Model
Helpful: advanced knowledge of database internals
  Query processing, storage structures
Helpful hands-on experience:
  SQL queries, DDL and DML
Overview
1 Foundations
2 Distributed DBMS: architectures, distribution, query processing, transaction management, replication
3 Parallel DBMS: architectures, query processing
4 Federated DBS: architectures, conflicts, integration, query processing
5 Peer-to-peer Data Management
English Literature /1
M. Tamer Özsu, P. Valduriez: Principles of Distributed Database Systems. Second Edition, Prentice Hall, Upper Saddle River, NJ, 1999.
S. Ceri and G. Pelagatti: Distributed Databases Principles and Systems, McGraw Hill Book Company, 1984.
C. T. Yu, W. Meng: Principles of Database Query Processing for Advanced Applications. Morgan Kaufmann Publishers, San Francisco, CA, 1998.
English Literature /2
Elmasri, R.; Navathe, S.: Fundamentals of Database Systems, Addison Wesley, 2003.
C. Dye: Oracle Distributed Systems, O’Reilly, Sebastopol, CA, 1999.
D. Kossmann: The State of the Art in Distributed Query Processing, ACM Computing Surveys, Vol. 32, No. 4, 2000, pp. 422-469.
German Literature
P. Dadam: Verteilte Datenbanken und Client/Server-Systeme, Springer-Verlag, Berlin, Heidelberg, 1996.
E. Rahm: Mehrrechner-Datenbanksysteme. Addison-Wesley, Bonn, 1994. http://dbs.uni-leipzig.de/buecher/mrdbs/
S. Conrad: Föderierte Datenbanksysteme: Konzepte der Datenintegration. Springer-Verlag, Berlin/Heidelberg, 1997.
Part I
Introduction
Overview
Motivation
Recapitulation
Classification of Multi-Processor DBMS
Motivation
Centralized Data Management
[Figure: several applications access one centralized DBMS, which provides data definition, data manipulation, and TXN management]
New requirements
  Support for de-centralized organization structures
  High availability
  High performance
  Scalability
Data Management in a Network
[Figure: four nodes connected by a network]
Distributed Data Management
[Figure: four nodes connected by a network]
Distributed Data Management: Example
[Figure: four nodes (Sydney, Magdeburg, München, San Francisco) connected by a network. Data labels shown in the figure: Products San Francisco, Sydney; Customers San Francisco, Sydney; Customers Sydney; Products München, Magdeburg, Sydney; Customers München; Customers Magdeburg; Products Magdeburg, San Francisco; Products Sydney]
Advantages of Distributed DBMS
Transparent management of distributed/replicated data
Availability and fault tolerance
Performance
Scalability
Transparent Data Management
Transparency: "hide" implementation details
For (distributed) database systems
  Data independence (physical, logical)
  Network transparency
    "hide" existence of the network
    "hide" physical location of data
To applications a distributed DBS looks just like a centralized DBS
Transparent Data Management/2
continued:
  Replication transparency
    Replication: managing copies of remote data (performance, availability, fault-tolerance)
    Hiding the existence of copies (e.g. during updates)
  Fragmentation transparency
    Fragmentation: decomposition of relations and distribution of resulting fragments
    Hiding decomposition of global relation
Who provides Transparency?
Application
  Different parts/modules of distributed application
  Communication / data exchange using standard protocols (RPC, CORBA, HTTP, SOAP, ...)
DBMS
  Transparent SQL-access to data on remote DB-instances
  Requires query decomposition, transaction coordination, replication
Operating system
  The operating system provides network transparency, e.g. on file system level (NFS) or through standard protocols (TCP/IP)
Fault-Tolerance
Failure of one single node can be compensated
Requires
  Replicated copies on different nodes
  Distributed transactions
Performance
Data can be stored where they are most likely used → reduction of transfer costs
Parallel processing in distributed systems
  Inter-transaction-parallelism: parallel processing of different transactions
  Inter-query-parallelism: parallel processing of different queries
  Intra-query-parallelism: parallel processing of one or several operations within one query
Scalability
Requirements raised by growing databases or necessary performance improvement
Addition of new nodes/processors often cheaper than design of new system or complex tuning measures
Differentiation
Distributed Information System
  Application components communicate for purpose of data exchange (distribution on application level)
Distributed DBS
  Distribution solely realized on the DBS-level
Differentiation /2
DBMS on distributed file system
  All data must be read from blocks stored on different disks
  Processing is performed only within DBMS node (not distributed)
  Distribution handled by operating system
Special Case: Parallel DBS
Data management on a parallel computer (multiprocessor, special hardware)
Processing capacities are used for performance improvement
Example
  100 GB relation, sequential read with 10 MB/s: about 167 minutes (10,000 s)
  Parallel read on 10 nodes (without considering coordination overhead): about 17 minutes (1,000 s)
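The arithmetic behind such a scan-time estimate can be checked with a short sketch. It assumes decimal units (1 GB = 1000 MB), an even split of the relation across nodes, and no coordination overhead; the function name `scan_seconds` is hypothetical and chosen only for illustration.

```python
def scan_seconds(size_gb: float, rate_mb_per_s: float, nodes: int = 1) -> float:
    """Idealized time in seconds to scan a relation split evenly across `nodes`."""
    size_mb = size_gb * 1000  # assumption: decimal units, 1 GB = 1000 MB
    return size_mb / (rate_mb_per_s * nodes)


sequential = scan_seconds(100, 10)           # one node reads everything
parallel = scan_seconds(100, 10, nodes=10)   # ten nodes, no coordination overhead

print(f"sequential: {sequential:.0f} s = {sequential / 60:.0f} min")
print(f"parallel:   {parallel:.0f} s = {parallel / 60:.0f} min")
```

For a 100 GB relation at 10 MB/s this gives 10,000 s (about 167 minutes) on one node and 1,000 s (about 17 minutes) on ten nodes. Real systems stay below this ideal speed-up because of coordination and data skew.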
Special Case: Heterogeneous DBS
Motivation: integration of previously existing DBS (legacy systems)
  Integrated access: global queries, relationships between data objects in different databases, global integrity
Problems
  Heterogeneity on different levels: system, data model, schema, data
Special importance on the WWW: integration of Web sources → Mediator concept
Special Case: Peer-to-Peer-Systems
Peer-to-Peer (P2P): networks without centralized servers
  All / many nodes (peers) store data
  Each node knows only some "close" neighbors
    No global view
    No centralized coordination
Examples: Napster, Gnutella, Freenet, BitTorrent, ...
  Distributed management of data (e.g. MP3-Files)
  Lookup using centralized servers (Napster) or distributed (Gnutella)
Recapitulation
Database Management Systems (DBMS)
Nowadays commonly used
  to store huge amounts of data persistently,
  in collaborative scenarios,
  to fulfill high performance requirements,
  to fulfill high consistency requirements,
  as a basic component of information systems,
  to serve as a common IT infrastructure for departments of an organization or company.
Database Management Systems
A database management system (DBMS) is a suite of computer programs designed to manage a database and run operations on the data requested by numerous clients.
A database (DB) is an organized collection of data.
A database system (DBS) is the concrete instance of a database managed by a database management system.
Codd’s 9 Rules for DBMS /1
Differentiate DBMS from other systems managing data persistently, e.g. file systems
1 Integration: homogeneous, non-redundant management of data
2 Operations: means for accessing, creating, modifying, and deleting data
3 Catalog: the data description must be accessible as part of the database itself
4 User views: different users/applications must be able to have a different perception of the data
Codd’s 9 Rules for DBMS /2
5 Integrity control: the system must provide means to guarantee the consistency of data
6 Data security: the system must permit only authorized accesses
7 Transactions: multiple operations on data can be grouped into a logical unit
8 Synchronization: parallel accesses to the database are managed by the system
9 Data backups: the system provides functionality to guarantee long-term accessibility even in case of failures
3 Level Schema Architecture
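[Figure: three-level schema architecture with External Schema 1 ... External Schema N at the top, the Conceptual Schema in the middle, and the Internal Schema at the bottom; the arrows between the levels are labeled "Query Processing" and "Data Representation"]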
3 Level Schema Architecture /2
Important concept of DBMS
Provides
  transparency, i.e. non-visibility, of storage implementation details
  ease of use
  decreased application maintenance efforts
  conceptual foundation for standards
  portability
Describes abstraction steps:
  Logical data independence
  Physical data independence
Data Independence
Logical data independence: Changes to the logical schema level must not require a change to an application (external schema) based on the structure.
Physical data independence: Changes to the physical schema level (how data is stored) must not require a change to the logical schema.
Architecture of a DBS
The schema architecture roughly conforms to the general architecture of a database system
  Applications access the database using specific views (external schema)
  The DBMS provides access for all applications using the logical schema
  The database is stored on secondary storage according to an internal schema
[Figure: applications 1 ... n access the DBMS, which manages the database]
Client Server Architecture
The schema architecture does not directly relate to the client server architecture (communication/network architecture)
  Clients may run several applications
  Applications may run on several clients
  DB servers may be distributed
[Figure: clients 1 ... n connect to a DB server, which manages the database]
The Relational Model
Developed by Edgar F. Codd (1923-2003) in 1970
Derived from mathematical model of n-ary relations
Colloquial: data is organized as tables (relations) of records (n-tuples) with columns (attributes)
Currently most commonly used database model
Relational Database Management Systems (RDBMS)
First prototype: IBM System R in 1974
Implemented as core of all major DBMS since late ’70s: IBM DB2, Oracle, MS SQL Server, Informix, Sybase, MySQL, PostgreSQL, etc.
Database model of the DBMS language standard SQL
Basic Constructs
A relational database is a database that is structured according to the relational database model. It consists of a set of relations.
[Figure: a relation named R with attributes A1 ... An; the relation name and the attributes form the relation schema, each line of values is a tuple, and the set of tuples is the relation]
Integrity Constraints
Static integrity constraints describe valid tuples of a relation
  Primary key constraint
  Foreign key constraint (referential integrity)
  Value range constraints
  ...
  In SQL additionally: uniqueness and not-NULL
Transitional integrity constraints describe valid changes to a database
The Relational Algebra
A relational algebra is a set of operations that are closed over relations.
  Each operation has one or more relations as input
  The output of each operation is a relation
Relational Operations
Primitive operations:
  Selection σ
  Projection π
  Cartesian product (cross product) ×
  Set union ∪
  Set difference −
  Rename β
Non-primitive operations:
  Natural Join ⋈
  θ-Join and Equi-Join ⋈_ϕ
  Semi-Join ⋉
  Outer-Joins ⟕, ⟖
  Set intersection ∩
  ...
Notation for Relations and Tuples
If R denotes a relation schema (set of attributes), then the function r(R) denotes a relation conforming to that schema (set of tuples)
R and r(R) are often erroneously used synonymously to denote a relation, assuming that for a given relation schema only one relation exists
t(R) denotes a tuple conforming to a relation schema
t(R.a) denotes an attribute value of a tuple for an attribute a ∈ R
The Selection Operation σ
Select tuples based on predicate or complex condition
PROJECT
PNAME            PNUMBER  PLOCATION  DNUM
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

σ_{PLOCATION='Stafford'}(r(PROJECT))

PNAME            PNUMBER  PLOCATION  DNUM
Computerization  10       Stafford   4
Newbenefits      30       Stafford   4
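The selection above can be reproduced with a minimal sketch. Modeling a relation as a list of dictionaries and the helper name `select` are assumptions made for illustration, not part of the lecture material.

```python
ATTRS = ("PNAME", "PNUMBER", "PLOCATION", "DNUM")
ROWS = [
    ("ProductX", 1, "Bellaire", 5),
    ("ProductY", 2, "Sugarland", 5),
    ("ProductZ", 3, "Houston", 5),
    ("Computerization", 10, "Stafford", 4),
    ("Reorganization", 20, "Houston", 1),
    ("Newbenefits", 30, "Stafford", 4),
]
# Each tuple becomes a dict mapping attribute name to value.
PROJECT = [dict(zip(ATTRS, row)) for row in ROWS]


def select(relation, predicate):
    """Selection: keep exactly the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]


stafford = select(PROJECT, lambda t: t["PLOCATION"] == "Stafford")
for t in stafford:
    print(t["PNAME"], t["PNUMBER"], t["PLOCATION"], t["DNUM"])
```

The result keeps the full schema of the input; only the set of tuples shrinks.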
The Projection Operation π
Project to set of attributes - remove duplicates if necessary

PROJECT
PNAME            PNUMBER  PLOCATION  DNUM
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

π_{PLOCATION,DNUM}(r(PROJECT))

PLOCATION  DNUM
Bellaire   5
Sugarland  5
Houston    5
Stafford   4
Houston    1
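A small sketch of projection with duplicate elimination, under the assumption that relations are modeled as lists of dictionaries; the helper name `project` is hypothetical.

```python
ATTRS = ("PNAME", "PNUMBER", "PLOCATION", "DNUM")
ROWS = [
    ("ProductX", 1, "Bellaire", 5),
    ("ProductY", 2, "Sugarland", 5),
    ("ProductZ", 3, "Houston", 5),
    ("Computerization", 10, "Stafford", 4),
    ("Reorganization", 20, "Houston", 1),
    ("Newbenefits", 30, "Stafford", 4),
]
PROJECT = [dict(zip(ATTRS, row)) for row in ROWS]


def project(relation, attributes):
    """Projection: keep only `attributes` and drop duplicate result tuples."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:  # duplicate elimination, first occurrence wins
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


locations = project(PROJECT, ("PLOCATION", "DNUM"))
for t in locations:
    print(t["PLOCATION"], t["DNUM"])
```

Six input tuples yield five result tuples, because (Stafford, 4) occurs twice in the input and is kept only once.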
Cartesian or cross product ×
Create all possible combinations of tuples from the two input relations

R
A  B
1  2
3  4

S
C   D   E
5   6   7
8   9   10
11  12  13

r(R) × r(S)

A  B  C   D   E
1  2  5   6   7
1  2  8   9   10
1  2  11  12  13
3  4  5   6   7
3  4  8   9   10
3  4  11  12  13
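The cross product can be sketched as follows, assuming relations are lists of dictionaries with disjoint attribute names; the helper name `cross` is hypothetical.

```python
from itertools import product

R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [
    {"C": 5, "D": 6, "E": 7},
    {"C": 8, "D": 9, "E": 10},
    {"C": 11, "D": 12, "E": 13},
]


def cross(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s.

    Assumes the two schemas share no attribute names.
    """
    return [{**t1, **t2} for t1, t2 in product(r, s)]


rs = cross(R, S)
for t in rs:
    print(t["A"], t["B"], t["C"], t["D"], t["E"])
```

The result has |r(R)| · |r(S)| = 2 · 3 = 6 tuples, which is why unrestricted cross products become expensive quickly for large relations.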
Set: Union, Intersection, Difference
All require compatible schemas: attribute names and domains
Union: duplicate entries are removed
Intersection and Difference: ∅ as possible result
The Natural Join Operation ⋈
Combine tuples from two relations r(R) and r(S) where for all attributes a ∈ R ∩ S (defined in both relations) t(R.a) = t(S.a) holds.
  Basic operation for following key relationships
  If there are no common attributes, the result is the Cartesian product:
    R ∩ S = ∅ ⟹ r(R) ⋈ r(S) = r(R) × r(S)
  Can be expressed as combination of π, σ and ×:
    r(R) ⋈ r(S) = π_{R∪S}(σ_{⋀_{a∈R∩S} t(R.a)=t(S.a)}(r(R) × r(S)))
The Natural Join Operation ⋈ /2

R
A  B
1  2
3  4
5  6

S
B  C  D
4  5  6
6  7  8
8  9  10

r(R) ⋈ r(S)

A  B  C  D
3  4  5  6
5  6  7  8
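A nested-loop sketch of the natural join, assuming relations are lists of dictionaries; the helper name `natural_join` is hypothetical, and the nested loop is chosen for clarity rather than efficiency.

```python
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}, {"A": 5, "B": 6}]
S = [
    {"B": 4, "C": 5, "D": 6},
    {"B": 6, "C": 7, "D": 8},
    {"B": 8, "C": 9, "D": 10},
]


def natural_join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    result = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            # With no common attributes, all() over an empty set is True,
            # so the join degenerates to the Cartesian product.
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result


joined = natural_join(R, S)
for t in joined:
    print(t["A"], t["B"], t["C"], t["D"])
```

The tuple (1, 2) of R and the tuple (8, 9, 10) of S have no join partner and therefore do not appear in the result.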
The Semi-Join Operation ⋉
Returns all tuples from one relation having a (natural) join partner in the other relation
r(R) ⋉ r(S) = π_R(r(R) ⋈ r(S))

PERSON
PID   NAME
1273  Dylan
2244  Cohen
3456  Reed

CAR
PID   BRAND
1273  Cadillac
1273  VW Beetle
3456  Stutz Bearcat

r(PERSON) ⋉ r(CAR)

PID   NAME
1273  Dylan
3456  Reed
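A minimal sketch of the semi-join on the PERSON and CAR example, assuming relations are lists of dictionaries; the helper name `semi_join` is hypothetical.

```python
PERSON = [
    {"PID": 1273, "NAME": "Dylan"},
    {"PID": 2244, "NAME": "Cohen"},
    {"PID": 3456, "NAME": "Reed"},
]
CAR = [
    {"PID": 1273, "BRAND": "Cadillac"},
    {"PID": 1273, "BRAND": "VW Beetle"},
    {"PID": 3456, "BRAND": "Stutz Bearcat"},
]


def semi_join(r, s):
    """Semi-join: tuples of r that have at least one natural-join partner in s."""
    result = []
    for t1 in r:
        has_partner = any(
            all(t1[a] == t2[a] for a in t1.keys() & t2.keys()) for t2 in s
        )
        if has_partner:
            result.append(t1)  # only attributes of r are kept
    return result


owners = semi_join(PERSON, CAR)
for t in owners:
    print(t["PID"], t["NAME"])
```

Dylan appears only once although two cars match, and no attribute of CAR is part of the result. Because the result carries only the tuples and attributes of one relation, semi-joins are commonly used in distributed query processing to reduce the data shipped between nodes.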
Other Join Operations
Conditional join: join condition ϕ is explicitly specified
  r(R) ⋈_ϕ r(S) = σ_ϕ(r(R) × r(S))
θ-Join: special conditional join, where ϕ is a single predicate of the form a θ b with a ∈ R, b ∈ S, and θ ∈ {=, ≠, >, <, ≤, ≥, ...}
Equi-Join: special θ-Join where θ is =.
(Left or Right) Outer Join: union of natural join result and tuples from the left or right input relation which could not be joined (requires NULL-values to ensure compatible schemas).
Relational Database Management Systems
A Relational Database Management System (RDBMS) is a database management system implementing the relational database model.
  Today, most relational DBMS implement the SQL database model
  There are some significant differences between the relational model and SQL (duplicate rows, tuple order significant, anonymous column names, etc.)
  Most distributed and parallel DBMS have a relational (SQL) data model
SQL Data Model
Said to implement relational database model
Defines own terms
[Figure: a table named R with columns A1 ... An; the table name and the columns form the table head, each line of values is a row, and the set of rows is the table]
Some significant differences exist
Structured Query Language
Structured Query Language (SQL): declarative language to describe requested query results
Realizes relational operations (with the mentioned discrepancies)
Basic form: SELECT-FROM-WHERE-block (SFW)

SELECT FNAME, LNAME, MGRSTARTDATE
FROM EMPLOYEE, DEPARTMENT
WHERE SSN=MGRSSN
SQL: Selection σ
σ_{DNO=5 ∧ SALARY>30000}(r(EMPLOYEE))

SELECT *
FROM EMPLOYEE
WHERE DNO=5 AND SALARY>30000
SQL: Projection π
π_{LNAME,FNAME}(r(EMPLOYEE))

SELECT LNAME,FNAME
FROM EMPLOYEE

Difference to RM: does not remove duplicates
Requires additional DISTINCT

SELECT DISTINCT LNAME,FNAME
FROM EMPLOYEE
SQL: Cartesian Product ×
r(EMPLOYEE) × r(PROJECT)

SELECT *
FROM EMPLOYEE, PROJECT
SQL: Natural Join ⋈
r(DEPARTMENT) ⋈ r(DEPARTMENT_LOCATIONS)

SELECT *
FROM DEPARTMENT
  NATURAL JOIN DEPARTMENT_LOCATIONS
SQL: Equi-Join
r(EMPLOYEE) ⋈_{SSN=MGRSSN} r(DEPARTMENT)

SELECT *
FROM EMPLOYEE, DEPARTMENT
WHERE SSN=MGRSSN
SQL: Union
r(R) ∪ r(S)

SELECT * FROM R
UNION
SELECT * FROM S

Other set operations: INTERSECT, MINUS (Oracle; EXCEPT in standard SQL)
Does remove duplicates (in compliance with RM)
If duplicates required:

SELECT * FROM R
UNION ALL
SELECT * FROM S
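The difference between UNION and UNION ALL can be tried out with Python's built-in `sqlite3` module. The single-column tables `R` and `S` and their contents are made up for this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE R (a INTEGER);
    CREATE TABLE S (a INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (2), (3);
    """
)

# UNION removes duplicates, as the relational model requires.
union = con.execute("SELECT a FROM R UNION SELECT a FROM S ORDER BY a").fetchall()

# UNION ALL keeps the value 2 twice, once from each input table.
union_all = con.execute(
    "SELECT a FROM R UNION ALL SELECT a FROM S ORDER BY a"
).fetchall()

print(union)
print(union_all)
```

The value 2 occurs in both tables: UNION returns it once, UNION ALL returns it twice.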
SQL: Other Features
SQL provides several features not in the relational algebra
  Grouping and aggregation functions, e.g. SUM, AVG, COUNT, ...
  Sorting

SELECT PLOCATION, AVG(HOURS)
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE SSN=ESSN AND PNO=PNUMBER
GROUP BY PLOCATION
HAVING COUNT(*) > 1
ORDER BY PLOCATION
SQL DDL
Data Definition Language to create, modify, and delete schema objects

CREATE TABLE mytable ( id INT, ...)
DROP TABLE ...
ALTER TABLE ...
CREATE VIEW myview AS SELECT ...
DROP VIEW ...
CREATE INDEX ...
DROP INDEX ...
...
Simple Integrity Constraints
CREATE TABLE employee(
  ssn INTEGER,
  lname VARCHAR2(20) NOT NULL,
  dno INTEGER,
  ...
  FOREIGN KEY (dno)
    REFERENCES department(dnumber),
  PRIMARY KEY (ssn)
)
Additionally: triggers, explicit value domains, ...
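How a DBMS enforces such constraints can be observed with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `department` table is reduced to its key, `VARCHAR2` is replaced by `VARCHAR` because it is Oracle-specific, and the helper `try_insert` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
con.execute("PRAGMA foreign_keys = ON")
con.executescript(
    """
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        ssn   INTEGER,
        lname VARCHAR(20) NOT NULL,
        dno   INTEGER,
        FOREIGN KEY (dno) REFERENCES department(dnumber),
        PRIMARY KEY (ssn)
    );
    INSERT INTO department VALUES (5);
    """
)


def try_insert(row):
    """Insert one employee row and report whether the DBMS accepted it."""
    try:
        con.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((1, "Smith", 5)),  # valid row
    try_insert((2, "Wong", 7)),   # violates the foreign key: no department 7
    try_insert((3, None, 5)),     # violates NOT NULL on lname
    try_insert((1, "Zelaya", 5)), # violates the primary key: ssn 1 exists
]
print(results)
```

Only the first row is accepted; each of the other three violates exactly one of the declared constraints.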
SQL DML
Data Manipulation Language to create, modify, and delete tuples

INSERT INTO mytable (<COLUMNS>) VALUES (...)

INSERT INTO mytable (<COLUMNS>) SELECT ...

UPDATE mytable
SET ...
WHERE ...

DELETE FROM mytable
WHERE ...
Other Parts of SQL
Data Control Language (DCL): GRANT, REVOKE
Transaction management: START TRANSACTION, COMMIT, ROLLBACK
Stored procedures and imperative programming concepts
Cursor definition and management
Transactions
Sequence of database operations
  Read and write operations
  In SQL sequence of INSERT, UPDATE, DELETE, SELECT statements
Build a semantic unit, e.g. transfer of an amount from one bank account to another
Has to conform to ACID properties
Transactions: ACID Properties
Atomicity means that a transaction can not be interrupted or performed only partially
  TXN is performed in its entirety or not at all
Consistency to preserve data integrity
  A TXN starts from a consistent database state and ends with a consistent database state
Isolation
  Result of a TXN must be independent of other possibly running parallel TXNs
Durability or persistence
  After a TXN finished successfully (from the user’s view) its results must be in the database and the effect can not be reversed
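Atomicity and consistency can be illustrated with the bank transfer example, using Python's built-in `sqlite3` module. The `account` table, its CHECK constraint, and the function `transfer` are assumptions made for this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdraft
)
con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
con.commit()


def transfer(con, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on commit."""
    try:
        with con:  # commits on success, rolls back if an exception is raised
            con.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            con.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(con, 1, 2, 30)    # succeeds
second = transfer(con, 1, 2, 500)  # debit violates the CHECK, TXN is rolled back
balances = dict(con.execute("SELECT id, balance FROM account"))
print(first, second, balances)
```

In the failing transfer the credit to account 2 has already been executed when the debit is rejected; the rollback undoes it, so the transaction leaves no partial effect.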
Functional Dependencies
A functional dependency (FD) X→Y within a relation r(R) between sets of attributes X ⊆ R and Y ⊆ R exists, if for each tuple the values of X determine the values of Y, i.e.

∀ t1, t2 ∈ r(R): t1(X) = t2(X) ⇒ t1(Y) = t2(Y)
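The definition translates directly into a check on one relation instance. The sketch below assumes relations are lists of dictionaries; the function name `fd_holds` and the sample tuples are chosen for illustration. Note that such a check can only refute an FD on given data: an FD is a constraint on all valid instances of a schema, so a passing check does not prove it.

```python
PROJECT = [
    {"PNAME": "ProductZ", "PNUMBER": 3, "PLOCATION": "Houston", "DNUM": 5},
    {"PNAME": "Computerization", "PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNAME": "Reorganization", "PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNAME": "Newbenefits", "PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]


def fd_holds(relation, X, Y):
    """Check t1(X) = t2(X) => t1(Y) = t2(Y) for all tuple pairs of this instance."""
    seen = {}  # maps each X-value to the Y-value first observed with it
    for t in relation:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same X-value, different Y-value
    return True


print(fd_holds(PROJECT, ["PNUMBER"], ["PNAME"]))    # key determines name
print(fd_holds(PROJECT, ["PLOCATION"], ["DNUM"]))   # Houston maps to 5 and 1
```

Remembering the first Y-value per X-value avoids comparing all tuple pairs explicitly.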
Derivation Rules for FDs
R1 Reflexivity: X ⊇ Y ⟹ X→Y
R2 Accumulation (augmentation): {X→Y} ⟹ XZ→YZ
R3 Transitivity: {X→Y, Y→Z} ⟹ X→Z
R4 Decomposition: {X→YZ} ⟹ X→Y
R5 Unification (union): {X→Y, X→Z} ⟹ X→YZ
R6 Pseudotransitivity: {X→Y, WY→Z} ⟹ WX→Z

R1-R3 known as Armstrong-Axioms (sound, complete)
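In practice, whether an FD X→Y follows from a given set of FDs is usually decided with the attribute closure X+ rather than by applying the rules one by one: X→Y is derivable exactly when Y ⊆ X+. The sketch below shows this standard closure algorithm, which is not taken from the slides; the function name `closure` and the example FDs are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


FDS = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "D"}, {"E"})]

print(sorted(closure({"A"}, FDS)))
print(sorted(closure({"A", "D"}, FDS)))
```

Here A+ = {A, B, C}, so A→C follows (transitivity), while A→E does not; AD+ contains all five attributes, so AD determines every attribute of this example schema.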
Normal Forms
Formal criteria to force schemas to be free of redundancy
First Normal Form (1NF) allows only atomic attribute values
  i.e. all attribute values are of basic data types like integer or string, but not further structured like e.g. an array or a set of values
Second Normal Form (2NF) avoids partial dependencies
  A partial dependency exists if a non-key attribute is functionally dependent on a proper subset of the primary key of the relation
Normal Forms /2
Third Normal Form (3NF) avoids transitive dependencies
  Disallows functional dependencies between non-key attributes
Boyce-Codd-Normal Form (BCNF) disallows transitive dependencies also for primary key attributes
Classification of Multi-Processor DBMS
Multi-Processor DBMS
In general: DBMS which are able to use multiple processors or DBMS-instances to process database operations [Rahm 94]
Can be classified according to different criteria
  Processors with same or different functionality
  Access to external storage
  Spatial distribution
  Processor connection
  Integrated vs. federated architecture
  Homogeneous vs. heterogeneous architecture
Criterion: Processor Functionality
Assumption: each processor provides the same functionality
Classification [Rahm 94]
[Figure: classification tree of multi-processor DBMS by external storage (shared, partitioned), spatial distribution (local, distributed), and processor connection (tight, close, loose), leading to the classes Shared-Everything, Shared-Disk, and Shared-Nothing]
Criterion: Access to External Storage
Partitioned access
  External storage is divided among processors/nodes
    Each processor has only access to local storage
    Accessing different partitions requires communication
Shared access
  Each processor has access to full database
  Requires synchronisation
Criterion: Spatial Distribution
Locally distributed: DB-Cluster
  Fast inter-processor communication
  Fault-tolerance
  Dynamic load balancing possible
  Little administration efforts
  Application: parallel DBMS, solutions for high availability
Remotely distributed: distributed DBS in WAN scenarios
  Support for distributed organization structures
  Fault-tolerant (even to major catastrophes)
  Application: distributed DBS
Criterion: Processor Connection
Tight connection
  Processors share main memory
  Efficient co-operation
  Load-balancing by means of operating system
  Problems: Fault-tolerance, cache coherence, limited number of processors (≤ 20)
  Parallel multi-processor DBMS
Criterion: Processor Connection /2
Loose connection:
  Independent nodes with own main memory and DBMS instances
  Advantages: failure isolation, scalability
  Problems: expensive network communication, costly DB operations, load balancing
Close connection:
  Mix of the above
  In addition to own main memory there is connection via shared memory
  Managed by operating system
Class: Shared-Everything
[Figure: Shared-Everything architecture with several CPUs, each with its own cache, sharing main memory (containing the DBMS buffer) and shared hard disks]
Class: Shared-Everything /2
Simple realization of DBMS
Distribution transparency provided by operating system
Expensive synchronization
Extended implementation of query processing
Class: Shared-Nothing
[Figure: Shared-Nothing architecture with three nodes, each with its own CPU, cache, and main memory containing a DBMS buffer]
Class: Shared-Nothing /2
Distribution of DB across various nodes
Distributed/parallel execution plans
TXN management across participating nodes
Management of catalog and replicas
Class: Shared-Disk
[Figure: Shared-Disk architecture with three nodes, each with its own CPU, cache, and main memory containing a DBMS buffer, connected by high-speed communication and accessing shared hard disks]
Class: Shared-Disk /2
Avoids physical data distribution
No distributed TXNs and query processing
Requires buffer invalidation
Criterion: Integrated vs. Federated DBS
Integrated:
  Shared database for all nodes → one conceptual schema
  High distribution transparency: access to distributed DB via local DBMS
  Requires co-operation of DBMS nodes → restricted autonomy
Federated:
  Nodes with own DB and own conceptual schema
  Requires schema integration → global conceptual schema
  High degree of autonomy of nodes
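Criterion: Integrated vs. Federated DBS /2
[Figure: classification tree: Multi-Processor-DBS divide into Shared-Disk and Shared-Nothing; Shared-Disk systems are integrated and homogeneous; Shared-Nothing systems are either integrated (homogeneous) or federated (homogeneous or heterogeneous)]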
Criterion: Centralized vs. De-centralized Coordination
Centralized:
  Each node has global view on database (directly or via master)
  Central coordinator: initiator of query/transaction → knows all participating nodes
  Provides typical DBS properties (ACID, result completeness, etc.)
  Applications: distributed and parallel DBS
Limited availability, fault-tolerance, scalability
Criterion: Centralized vs. De-centralized Coordination /2
De-centralized:
  No global view on schema → peer knows only neighbors
  Autonomous peers; global behavior depends on local interaction
  Can not provide typical DBMS properties
  Application: P2P systems
Advantages: availability, fault-tolerance, scalability
Comparison
                        Parallel DBS   Distributed DBS   Federated DBS
High TXN rates          ↑              →↗                →
Intra-TXN-Parallelism   ↑              →↗                ↘→
Scalability             ↗              →↗                →
Availability            ↗              ↗                 ↘
Geogr. Distribution     ↘              ↗                 ↗
Node Autonomy           ↘              →                 ↗
DBS-Heterogeneity       ↘              ↘                 ↗
Administration          →              ↘                 ↘↓