cse441w database systems · database systems 45 peer-to-peer topology this is the direct evolution...
Post on 19-Nov-2020
5 Views
Preview:
TRANSCRIPT
A.R. Hurson323 CS Buildinghurson@mst.edu
CS5300Database Systems
Database System Architecture
Database Systems
Note, this unit will be covered in threelectures. In case you finish it earlier, thenyou have the following options:
1) Take the early test and start CS5300.module102) Study the supplement module
(supplement CS5300.module9)3) Act as a helper to help other students in
studying CS5300.module9Note, options 2 and 3 have extra credits as noted in courseoutline.
2
Database Systems
Database SystemsGlossary of prerequisite topics
Familiar with the topics?No Review
CS5300.module9background
Yes
Remedial actionYes
NoTake the Module
Take Test
Pass?
Glossary of topics
Familiar with the topics?Yes
Pass?
Take Test
Yes
Options
Lead a group of students in this module (extra credits)?
Study more advanced related topics (extra credits)?
Study next module?
No
{
Extra Curricular activities
Enforcement of background
{Current Module
At the end: take exam, record the
score, impose remedial action if not
successful
No
3
Database Systems
You are expected to be familiar with:Centralized database configuration
If not, you need to studyCS5300.module9.background
4
Database Systems
Database Systems
In this module, we will:Define a simple database space,Base on the parameters of this space, define:Centralized data bases,Client-server environment,Peer-to-peer configuration,Distributed databases, andParallel databases
5
Database Systems
Database Systems
6
Different parameters can be used to classifythe architecture of data base systems.
We classify data base systems along thefollowing three parameters: Physical infrastructure Services Distribution
Database Systems
Database Systems
7
Physical infrastructureThis dimension refers to the underlying platform
composed of access devices(homogeneous/heterogeneous) interconnectedthrough different communication medium:Processing devices
– Powerful Machines– Portable Devices
Network Architecture– Land-based Connection– Wireless Connection
Database Systems
8
ServicesAlong this dimension we can distinguish
two approaches:There is no distinction between services,Distinction between User processes and Data
Processes.For example, in a client-server model, some
tasks are executed on the server system andsome tasks are executed on client systems.
Database Systems
9
DistributionAlong this dimension we can distinguish :Distribution of dataDistribution of control, andDistribution of processes
Database Systems
Physical infrastructureAs noted earlier, two elements constitute this
dimension;Underlying computing platform, andThe communication medium that allows
computers to communicate with each other.
10
Database Systems
11
Physical Infrastructure: Networking Networking represents the
interconnectivity among the elements ofthe system. It also shows the division ofwork.
Database Systems
12
Physical Infrastructure: Parallelism Parallelism within a system allows activities to be
sped up ─ faster response time, higher throughput. Note, here we used term “parallelism” as ageneric term that refers to the ability of executingmore than one tasks at a time, also note that, wedid not define the granularity of the task at thispoint.
Requests can be processed in a way that exploitsthe parallelism offered by the underlying system.
Database Systems
13
Increased use of parallelism anddata/processing distribution are the importanttrends in database design and implementation.There are several motivations for this:Performance,Distributed access to data,Increased availability,Increased reliability.
Database Systems
14
Performance metricsResponse Time (Execution time, Latency) ─
The time elapse between the start and thecompletion of an event.Throughput ─ The number of tasks that can
be completed during a given time interval.
Database Systems
15
Performance metricsScaleup ─ Handling large task by increasing the
degree of parallelism. It is the ability to processlarger tasks in the same amount of time byproviding more resources.Speedup ─ Running a task in less time by
increasing the degree of parallelism.
S = Execution time on Original MachineExecution time on Larger Machine
Database Systems
16
Performance metricsPower Consumption ─ Becomes an important
performance metric when we use mobile wirelessaccess devices.Network Connectivity ─ Becomes of interest when
connectivity is through wireless medium.Data reliability and integrity ─ Becomes even more
of concern at the presence of mobility and wirelesscommunication.
Database Systems
17
DistributionAlong this dimension we can distinguish distribution of:
Data, Processing, and Controlspanning over multiple geographically separated sources.
Distribution of control is also referred to as the autonomy.Data resides where it is generated or needed most:
Distributed data should be accessible by other sites. Data distribution also implies data duplication/data replication.
Database Systems
18
Centralized database systemsCentralized database systems are those that
run on a single computer platform and do notinteract with other computer systems.The underlying platform could range from a
single-user database system running onpersonal computer to high-performancedatabase system running on high-end serversystems.
Database Systems
19
Centralized database systemsWithin the scope of a centralized database systems
we distinguish two configurations:Single-user configurationMulti-user configuration
Database systems designed for single-userconfiguration do not provide many facilities neededfor a multi-user system ─ e.g., concurrency control,security, privacy.
Database Systems
20
Client-Server topologyClient-Server topology is the direct result of the
advances in technology and a step towardsdistributed topology.It is a two-level topology based on a simple general
idea: distinguish the needed functionalities anddivide them into two coarse groups:Server functions (The back end functions) ─ Query
processing, Query optimization, concurrency control,recovery.Client functions (The front end functions) ─ Report
writer, Graphical User Interface facilities.
Database Systems
Client-Server topologyIn comparison to the centralized configuration:The personal computer (Client) assumes the user-
interface functionality.Centralized system (Sever) satisfies requests generated
by the clients.
21
Database Systems
22
Client-Server topologyClient-Server topology has functionality
split between a server and multiple clients.Client
Server(s)
Client Client• • •
Network
Database Systems
The client parses a query and decomposes itinto a number of independent site queries.Each site query is sent to the appropriate serversite.
Each server processes the local query and sendsthe result to the client site.
The client site combines the results of the sub-queries to generate the final result.
23
Database Systems
24
Client-Server topologyPopularity of this topology is due to many
factors, including:Simplicity of implementation ─ distinct
separation of functionality,Higher degree of hardware utilization at the
server side, andOffering a user friendly environment.
Database Systems
Summary (last lecture)Database spacePerformance metricsCentralized DatabasesClient-server platform
Read Database System Architecture — A Walk throughTime: From Centralized Platform to Mobile Computing(Keynote Address -ISSADS05)
25
Database Systems
26
Client-Server topologyIt can be further grouped into:Multiple client-single server, andMultiple client-multiple server
– A client is communicating with a unique“home server”, or
– A client manages its communication to theappropriate server(s).
Database Systems
27
Client-Server topologyServer system can be categorized as:Transaction Server ─ thin client, andData Server ─ fat client
Database Systems
Choice between thick client and thin clientThe very same application may run at many client
sites,Large amount of trust is required between the
server and the client,Scalability:
number of clients,
number of databases.Trend of technology (discussion and justification).
28
Database Systems
29
Client-server topology ─ Transaction serversA typical transaction server consists of multiple
processes accessing data in shared memory:Server processes ─ these are processes that receive user
queries, execute them, and send the result back.Lock manager process ─ this process implements lock
manager functionality, lock grant, lock release, anddeadlock detection.Database writer process ─ These are processes that
output modified buffer blocks back to disk.
Database Systems
30
Client-server topology ─ Transaction serversLog writer process ─ this process outputs log
records from the log record buffer to a stablestorage.Checkpoint process ─ this process performs
periodic checkpoints.Process monitor process ─ this process monitors
other processes and if any of them fails, takesrecovery actions for the process.
Database Systems
31
Client-server topology ─ Transaction serversThe shared memory contains all shared data:Buffer poolLock TableLog bufferCached query plans
Since multiple server processes may access sharedmemory, mutual exclusion must be ensured on thelock table.If a lock cannot be obtained, the lock request code
keeps monitoring the lock table to check when thelock has been granted.
Database Systems
32
Client-server topology ─ Data Servers
This configuration is used, where:There is a high-speed connection between
clients and server (why?) ─ Local area network,The client machines are powerful, andTask being executed are computation intensive.
Note, this configuration requires full backend functionality at the client side.
Database Systems
33
Client-server topology ─ Data Servers
In this configuration communicationcost between clients and server is notmuch higher than memory referencesin transaction server configuration.This brings out several interestingissues:
Database Systems
34
Client-server topology ─ Data Servers
Data granularity ─ coarse vs. fine granularity– For communication– For locking
Data Caching ─ Cache coherency
Lock Caching
Database Systems
35
Client-server topology ─ Data Servers
Data Granularity ─ Size of datacommunicated between clients andserverCommunication overhead of message passing
justifies coarse data granularity. Specially, inapplications with high data locality.Coarse data granularity may have adverse effect
on throughput ─ Lock on a coarse data blockmay block other clients, unnecessarily.
Database Systems
36
Client-server topology ─ Data ServersData Caching ─ data that are transferred to a
client can be cached for future use. As aresult, successive transactions at the sameclient may be able to make use of the cacheddata.Data Caching brings out the issue of cache
coherency. As a result, it should beguaranteed that all copies of the same dataitems are synchronized.
Database Systems
37
SummaryParameters influencing architecture of a
database.Performance metricsCentralized databases.Client/server modelsThin clientFat clientComparative analysis
Database Systems
Single-tierTwo-tier client-server architectureThree-tier client-server architecture
38
Database Systems
Single-tierApplication typically runs on a main frame
and users access it through “dumbterminals”.It is easy to maintain by a central
administrator, however, it is not scalable.
39
Database Systems
Two-tier client-server architectureAt the client site then there is no “dumb
terminal”.
40
Client
Server
Client Client• • •
Network
Database Systems
Three-tier client-server architecture
41
Gui, Web interface Client
Application programs,Web pages
Database Management
System
Application serverOr
Web server
Database server
Database Systems
Three-tier client-server architectureClient tier (presentation tier) – natural
interface with user (thin client),Middle tier – application logic executes
here,Database server tier (Data management tier)
– data base management system resides.
42
Database Systems
In an internet shopping scenario:The customer should be able to browse the catalog and
make a purchase,Before the sale, the customer has to go through several
steps: Add an item into shopping cart, Provide shipping address and credit card number, Confirm the sale. Controlling the flow among these steps and remembering already
executed steps is done at the middle tier level.
43
Database Systems
Advantages of three-tier architecture:Accommodates heterogeneous systems,Supports thin clients (technology),Allows integrated data access,It is scalable with the number of clients.
44
Database Systems
45
Peer-to-Peer topologyThis is the direct evolution of the client-server
topology. Note that in a Client-server topologyfunctionality is split into user processes and dataprocesses.User processes handle interaction with the user and
data processes handle interaction with data.In a Peer-to-Peer topology, one should expect to
find both class of processes placed on everymachines.
Database Systems
46
Peer-to-Peer topologyFrom a data logical perspective, Client-
server topology and Peer-to-Peer topologyprovide the same view of data ─ datadistribution transparency. The distinctionlies in the architectural paradigm that is usedto realize this level of transparency.
Database Systems
47
Parallel SystemsParallel configurations are aimed at improving the
processing and I/O speeds by using multiple processingunits and I/O devices in parallel.
Distribute task among several processors at a finergranularity, say relative to client-server paradigm.
Within the scope of parallelism, we can talk about:Coarse grain parallelismFine grain parallelism
Database Systems
48
Parallel SystemsIncrease throughput by processing many
small tasks in parallel,Decrease response time by breaking out a
task into subtasks and parallel execution ofsubtasks.
Database Systems
49
QuestionsDefine linear speedup.Define sublinear speedup.Define linear scaleup.Define sublinear scaleup.Is there any relationship between speedup and
scaleup?For a database application, which one is of more
important speedup or scaleup.
Database Systems
50
Parallel Systems
Resources
Spee
dup
Linear speedup
Sublinear speedup
Database Systems
51
Parallel Systems
Problem size
Linear scaleup
Sub-linear scaleup
Database Systems
52
Parallel SystemsThere are several architectural models
for parallel systems:Shared Memory (tightly coupled)Shared Disk (loosely coupled)Shared nothingHierarchical
Database Systems
53
Parallel Systems ─ Shared MemoryAll processors share a common global memory
P
P
P
P
P
M
Database Systems
54
Parallel Systems ─ Shared MemoryProcessors and disks have access to a common
memory, via a bus or an interconnection network.In general, each processor has a private cache.Efficient communication between processors, via
common memory address space.Not scalable, communication network becoming the
bottleneck.
Database Systems
55
Parallel Systems ─ Shared DiskAll processors share a common set of disks
P
P
P
P
P
M
M
M
M
M
Database Systems
56
Parallel Systems ─ Shared DiskProcessors have direct access to all disks via an
interconnection network.Each processor has its own private memory.Relative to shared memory organization, this
configuration offers:More fault tolerance andHigher memory bandwidth
Disk subsystem can become more fault tolerance andfaster by application of RAID architecture.
Database Systems
57
Parallel Systems ─ Shared DiskSystem is more scalable than the shared
memory configuration.Degree of scalability is limited due to the
bottleneck at the interconnection networkbetween disk subsystem and processor.Communication among processors is slow ─
message passing.
Database Systems
Summary (last lecture)Different classes of client-server topologyPeer-to-Peer topologyParallel systemsShared MemoryShared Disk
Final exam
58
Database Systems
59
Parallel Systems ─ Shared Nothing The processors share neither a common memory nor common
disks
PP
P
MM
M
P M
Database Systems
60
Parallel Systems ─ Shared NothingThis configuration is scalable.Communication among processors and non-
local disk accesses are expensive.
Database Systems
61
Parallel Systems ─ HierarchicalA hybrid of the other models
PPPPP
MPPPPP
MPPPPP
M
Database Systems
62
Parallel Systems ─ HierarchicalAt higher level system acts as a shared nothing
organization.Each node of the system at the lower level can
be a shared memory and/or shared disk system.
Database Systems
63
Parallel Database Management SystemsDatabase management systems developed on
parallel systems are called parallel databasemanagement system.
Database Systems
Parallel database system seeks to improveperformance through parallelization ofoperations. In another words, paralleldatabase system is motivated byperformance improvement.
64
Database Systems
65
Distributed SystemsDistributed databases bring the advantages of
distributed computing to the databasemanagement domain.A distributed system is a collection of
processors, not necessarily homogeneous,interconnected by a computer network.
Database Systems
66
Distributed Database SystemsDatabase is stored on several computers and
computers communicate with each otherthrough various communication media.Computers do not share resources ─ disks,
memory, processor, …
Database Systems
67
Distributed Systems
PP M
Site A
PM
Site C
PPPPP
M
Site B
network
Database Systems
68
Distributed Database Systems A distributed database is a collection of multiple logically
interrelated databases distributed over a computer network ─related data sources: Are closer to the application domain (s) that uses it.Might be replicated ─ to improve performance. Are split (Fragmented), horizontally and/or vertically, and distributed─ to balance the load and improve performance.
A distributed database management system is a software systemthat manages a distributed database while making the distributiontransparent to the user.
Database Systems
69
QuestionsWhat is the RAID?What are the major differences between shared nothing
and distributed configurations.Distributed systems are typically geographically separated, are
separately administrated, have slower interconnection, andthere is a distinction between local processes and globalprocesses.
With respect to the databases, in distributed databases, data isdistributed while in shared nothing configuration, data is notdistributed.
Database Systems
70
Distributed Database SystemsIn comparison to parallel systems in which
processors are tightly coupled and constitute asingle data base system, a distributed data basesystem is a collection of loosely coupledsystems that share no physical components.In general distributed databases can be
classified as:Homogeneous databasesHeterogeneous databases
Database Systems
71
Distributed Database SystemsThere are several reasons for building
distributed database systems:Sharing dataAutonomyIncreased reliability and availabilityImproved performanceEase of expansion
Database Systems
72
Distributed Database SystemsData distribution is an effort to improve
performance:To reduce communication costs and hence to reduce
response time,To maintain more control and enforce better
security,To improve quality of service in case of network
failure.
Database Systems
In a distributed database system, data is physicallystored across several sites and each site is typicallymanaged by a database management systemcapable of running independent of the other sites.
Data distribution is motivated by:Increased availability, andDistributed access to data – locality in access patterns
73
Database Systems
Distributed Database System :Increased reliability and availability –
reliability means probability that a system isrunning at a certain time point, availabilitymeans probability that a system is continuouslyavailable during a time interval.Improved performance, andEase of expansion.
74
top related