
Databasesystemer E2002

Lene Pries-Heje

Data Structure, Storage and Processing Architectures


Learning objectives

• Be able to explain what a database architecture is and what goals the design strives to achieve.

• Know different data storage structures and storage devices, and when to use them.

• Know the 4 basic architectures, and the differences between them.


Database Architectures and Implementations

We shape our buildings: thereafter they shape us

Winston Churchill


Database Architectures

• Database architecture is a design for the storage and processing of data.


Goals

• An architecture should
  – Respond to queries in a timely manner
  – Minimize the cost of processing data
  – Minimize the cost of storing data
  – Minimize the cost of data delivery

• These objectives can conflict


ANSI/SPARC

Figure: the ANSI/SPARC three-level architecture
• External level: multiple user views
• Conceptual level: the database designer's view
• Internal level: the storage view


Data Structures

• The goal is to minimize disk accesses
• Disks are relatively slow compared to main memory
  – Writing a letter compared to a telephone call
• Disks are a bottleneck
• Appropriate data structures can reduce disk accesses


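Database access

Figure: how a record is retrieved
• The DBMS sends a record request to the file manager
• The file manager sends a page request to the disk manager
• The disk manager sends a read page command to the disk
• The page is read and returned to the file manager, which returns the record to the DBMS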
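The request flow above can be sketched as three layers, each of which talks only to the layer below it. This is a minimal sketch, assuming a toy disk held in a dict and two records per page; the class and method names (`Disk`, `DiskManager`, `FileManager`, `get_record`) are hypothetical and do not describe any particular DBMS.

```python
# Toy model of the DBMS -> file manager -> disk manager -> disk flow.
# All class and method names are hypothetical, chosen for illustration.

RECORDS_PER_PAGE = 2  # assumption: two records fit on one page


class Disk:
    def __init__(self, pages):
        self.pages = pages          # page number -> list of records
        self.reads = 0              # count of physical page reads

    def read_page(self, page_no):   # "read page command"
        self.reads += 1
        return self.pages[page_no]  # "page read"


class DiskManager:
    def __init__(self, disk):
        self.disk = disk

    def get_page(self, page_no):    # "page request" -> "page returned"
        return self.disk.read_page(page_no)


class FileManager:
    def __init__(self, disk_manager):
        self.disk_manager = disk_manager

    def get_record(self, record_no):  # "record request" -> "record returned"
        page_no, slot = divmod(record_no, RECORDS_PER_PAGE)
        page = self.disk_manager.get_page(page_no)
        return page[slot]


disk = Disk({0: ["r0", "r1"], 1: ["r2", "r3"]})
fm = FileManager(DiskManager(disk))
print(fm.get_record(3))  # the DBMS asks for record 3; one page is read
print(disk.reads)
```

Only the file manager knows how records map to pages, and only the disk manager touches the disk, which mirrors the division of labour in the figure.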


Disks

• Data stored on tracks on a surface
• A disk drive can have multiple surfaces
• Rotational delay
  – Waiting for the physical storage location of the data to appear under the read/write head
  – Around 5 msec for a magnetic disk
  – Set by the manufacturer
• Access arm delay
  – Moving the read/write head to the track on which the storage location can be found
  – Around 10 msec for a magnetic disk


How can you minimize data access times?

• Rotational delay is fixed by the manufacturer
• Access arm delay can be reduced by storing files on
  – The same track
  – The same track on each surface
    • A cylinder
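A rough calculation with the approximate delays from the disk slide (about 10 msec of access arm delay and about 5 msec of rotational delay) shows why storing a file on one cylinder helps. This is a simplified model, assuming that every page of a scattered file needs its own arm movement, that pages on one cylinder need a single arm movement followed by rotational delay only, and that transfer time can be ignored.

```python
ARM_DELAY_MS = 10.0         # access arm delay (approximate, from the slide)
ROTATIONAL_DELAY_MS = 5.0   # rotational delay (approximate, from the slide)


def scattered_read_ms(pages: int) -> float:
    """Every page is on a different track: move the arm and wait each time."""
    return pages * (ARM_DELAY_MS + ROTATIONAL_DELAY_MS)


def cylinder_read_ms(pages: int) -> float:
    """All pages are on one cylinder: one arm movement, then rotation only."""
    if pages == 0:
        return 0.0
    return ARM_DELAY_MS + pages * ROTATIONAL_DELAY_MS


for n in (1, 10, 100):
    print(n, scattered_read_ms(n), cylinder_read_ms(n))
```

Under these assumptions, reading 10 pages drops from 150 msec to 60 msec; rotational delay remains, since the manufacturer fixes it.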


Clustering

• Records that are often retrieved together should be stored together
• Intra-file clustering
  – Records within one file
    • A sequential file
• Inter-file clustering
  – Records in different files
    • A nation and its stocks

Data model (figure), where * marks the identifier and a nation has many stocks:
NATION (*nation code, nation name, exchange rate)
STOCK (*stock code, firm name, stock price, stock quantity, stock dividend, stock PE)
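Inter-file clustering can be sketched by placing each NATION record on the same page as its STOCK records, so that retrieving a nation and all its stocks costs one page read. This is a minimal sketch, assuming that one nation and its stocks fit on a single page; the nation codes, stock codes and the function `cluster_by_nation` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; the codes are made up for illustration.
nations = [{"nation_code": "AU", "nation_name": "Australia"},
           {"nation_code": "US", "nation_name": "USA"}]
stocks = [
    {"stock_code": "IR", "firm_name": "Indooroopilly Ruby", "nation_code": "AU"},
    {"stock_code": "MG", "firm_name": "Minnesota Gold", "nation_code": "US"},
    {"stock_code": "NP", "firm_name": "Narembeen Plum", "nation_code": "AU"},
    {"stock_code": "GP", "firm_name": "Georgia Peach", "nation_code": "US"},
]


def cluster_by_nation(nations, stocks):
    """Return pages in which each nation record is followed by its stocks."""
    children = defaultdict(list)
    for stock in stocks:
        children[stock["nation_code"]].append(stock)
    pages = []
    for nation in nations:
        # One page per nation: the parent record, then its child records.
        pages.append([nation] + children[nation["nation_code"]])
    return pages


pages = cluster_by_nation(nations, stocks)
# Retrieving Australia and all its stocks now needs only pages[0].
print([r.get("firm_name", r.get("nation_name")) for r in pages[0]])
```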


A disk

Figure: a disk, showing the disk arm, disk head, arm movement, rotation, a cylinder, and tracks, blocks and sectors


Disk manager

• Manages physical I/O

• Sees the disk as a collection of pages

• Has a directory of each page on a disk

• Retrieves, replaces, and manages free pages

Data model (figure): DISK (*diskid) and PAGE (*pageid); a disk holds many pages
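The disk manager's duties (a directory of the pages on a disk, retrieving and replacing pages, and managing free pages) can be sketched as a small class. This is a minimal in-memory sketch, assuming a fixed number of pages per disk; `DiskManager`, `allocate`, `retrieve`, `replace` and `release` are hypothetical names.

```python
class DiskManager:
    """Toy disk manager: sees a disk only as a collection of numbered pages."""

    def __init__(self, disk_id: str, page_count: int):
        self.disk_id = disk_id
        self.directory = {}                        # page id -> page contents
        self.free_pages = list(range(page_count))  # page ids not yet in use

    def allocate(self, contents: bytes) -> int:
        """Take a free page, store contents on it and return its page id."""
        if not self.free_pages:
            raise RuntimeError("disk full")
        page_id = self.free_pages.pop(0)
        self.directory[page_id] = contents
        return page_id

    def retrieve(self, page_id: int) -> bytes:
        return self.directory[page_id]

    def replace(self, page_id: int, contents: bytes) -> None:
        if page_id not in self.directory:
            raise KeyError(page_id)
        self.directory[page_id] = contents

    def release(self, page_id: int) -> None:
        """Return a page to the free list."""
        del self.directory[page_id]
        self.free_pages.append(page_id)


dm = DiskManager("d1", page_count=3)
p = dm.allocate(b"first page")
dm.replace(p, b"updated page")
print(p, dm.retrieve(p), len(dm.free_pages))
```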


File manager

• Manages the storage of files
• Sees the disk as a collection of stored files
• Each file has a unique identifier
• Each record within a file has a unique record identifier

Data model (figure): DISK (*diskid), FILE (*fileid) and RECORD (*recordid); a disk stores many files, and a file contains many records


File manager's tasks

• Create a file

• Delete a file

• Retrieve a record from a file

• Update a record in a file

• Add a new record to a file

• Delete a record from a file
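The six file manager tasks map naturally onto six operations. This is a minimal sketch that keeps files in memory and gives each record a unique record identifier; the class and method names are hypothetical, and a real file manager would issue page requests to the disk manager rather than use a dict.

```python
import itertools


class FileManager:
    """Toy file manager: files identified by fileid, records by recordid."""

    def __init__(self):
        self.files = {}                      # fileid -> {recordid: record}
        self._next_id = itertools.count(1)   # source of unique record ids

    def create_file(self, fileid):
        if fileid in self.files:
            raise ValueError(f"file {fileid!r} already exists")
        self.files[fileid] = {}

    def delete_file(self, fileid):
        del self.files[fileid]

    def add_record(self, fileid, record):
        recordid = next(self._next_id)
        self.files[fileid][recordid] = record
        return recordid

    def retrieve_record(self, fileid, recordid):
        return self.files[fileid][recordid]

    def update_record(self, fileid, recordid, record):
        if recordid not in self.files[fileid]:
            raise KeyError(recordid)
        self.files[fileid][recordid] = record

    def delete_record(self, fileid, recordid):
        del self.files[fileid][recordid]


fm = FileManager()
fm.create_file("ITEM")
rid = fm.add_record("ITEM", {"itemname": "Compass", "itemtype": "N"})
fm.update_record("ITEM", rid, {"itemname": "Compass", "itemtype": "N", "itemcolor": None})
print(fm.retrieve_record("ITEM", rid))
```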


Sequential retrieval

• Consider a file of 10,000 records each occupying 1 page

• Queries that require processing all records will require 10,000 accesses
  – e.g., Find all items of type 'E'

• Many disk accesses are wasted if few records meet the condition


Indexing

• An index is a small file that has data for one field of a file

• Indexes reduce disk accesses

Figure: the ITEMTYPE index points to the records of the ITEM file

ITEMTYPE INDEX
ITEMTYPE   Points to ITEMNO
C          6, 7, 8, 9
E          1, 2
F          10
N          3, 4, 5

ITEM
ITEMNO  ITEMNAME                 ITEMTYPE  ITEMCOLOR
1       Pocket knife–Nile        E         Brown
2       Pocket knife–Thames      E         Brown
3       Compass                  N         –
4       Geo positioning system   N         –
5       Map measure              N         –
6       Hat–polar explorer       C         Red
7       Hat–polar explorer       C         White
8       Boots–snakeproof         C         Green
9       Boots–snakeproof         C         Black
10      Safari chair             F         Khaki
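The ITEMTYPE index can be built by scanning the ITEM file once and recording, for each type code, which ITEMNOs have it. This is a minimal sketch using the ten ITEM records from the figure; the in-memory dict stands in for the index file, and `build_index` is a hypothetical helper name.

```python
from collections import defaultdict

# The ITEM file from the figure: ITEMNO -> (ITEMNAME, ITEMTYPE, ITEMCOLOR).
ITEM = {
    1: ("Pocket knife–Nile", "E", "Brown"),
    2: ("Pocket knife–Thames", "E", "Brown"),
    3: ("Compass", "N", None),
    4: ("Geo positioning system", "N", None),
    5: ("Map measure", "N", None),
    6: ("Hat–polar explorer", "C", "Red"),
    7: ("Hat–polar explorer", "C", "White"),
    8: ("Boots–snakeproof", "C", "Green"),
    9: ("Boots–snakeproof", "C", "Black"),
    10: ("Safari chair", "F", "Khaki"),
}


def build_index(items, field_pos):
    """Map each value of one field to the sorted ITEMNOs having that value."""
    index = defaultdict(list)
    for itemno in sorted(items):
        index[items[itemno][field_pos]].append(itemno)
    return dict(sorted(index.items()))


ITEMTYPE_INDEX = build_index(ITEM, 1)
print(ITEMTYPE_INDEX)
# {'C': [6, 7, 8, 9], 'E': [1, 2], 'F': [10], 'N': [3, 4, 5]}
```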


Querying with an index

• Read the index into memory

• Search the index to find records meeting the condition

• Access only those records containing required data

• Disk accesses are substantially reduced when the query involves few records
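The saving can be estimated for the 10,000-record file from the sequential retrieval example, where each record occupies one page. This is a rough sketch, assuming the index occupies a hypothetical 20 pages (`INDEX_PAGES`) and that each matching record costs one further disk access.

```python
TOTAL_RECORDS = 10_000  # one record per page, as in the sequential example
INDEX_PAGES = 20        # assumption: size of the index on disk, in pages


def sequential_accesses(total_records: int) -> int:
    """A full scan reads every page, whatever the query."""
    return total_records


def indexed_accesses(matching_records: int, index_pages: int = INDEX_PAGES) -> int:
    """Read the index into memory, then fetch only the matching records."""
    return index_pages + matching_records


for matches in (5, 500, 9_000):
    print(matches, sequential_accesses(TOTAL_RECORDS), indexed_accesses(matches))
```

With 5 matches the index needs 25 accesses instead of 10,000; with 9,000 matches it saves almost nothing, which is why the gain is largest when the query involves few records.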


Maintaining an index

• Adding a record requires at least two disk accesses
  – Update the file
  – Update the index

• Trade-off
  – Faster queries
  – Slower maintenance


Using indexes

• Sequential processing of a portion of a file
  – Find all items with a type code in the range 'E' to 'K'

• Direct processing
  – Find all items with a type code of 'E' or 'N'

• Existence testing
  – Determining whether a record meeting the criteria exists without having to retrieve it
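The three uses of an index can be sketched against a sorted list of (ITEMTYPE, ITEMNO) entries taken from the ITEMTYPE index figure. This is a minimal sketch using Python's standard `bisect` module; the helper names `range_lookup`, `direct_lookup` and `exists` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Sorted (ITEMTYPE, ITEMNO) entries of the ITEMTYPE index.
INDEX = [("C", 6), ("C", 7), ("C", 8), ("C", 9), ("E", 1), ("E", 2),
         ("F", 10), ("N", 3), ("N", 4), ("N", 5)]
KEYS = [key for key, _ in INDEX]


def range_lookup(low, high):
    """Sequential processing of part of the file: low <= type <= high."""
    start, end = bisect_left(KEYS, low), bisect_right(KEYS, high)
    return [itemno for _, itemno in INDEX[start:end]]


def direct_lookup(*types):
    """Direct processing: records whose type is one of the given codes."""
    result = []
    for t in types:
        result += range_lookup(t, t)
    return result


def exists(item_type):
    """Existence test: answered from the index alone, no record is read."""
    i = bisect_left(KEYS, item_type)
    return i < len(KEYS) and KEYS[i] == item_type


print(range_lookup("E", "K"))   # types E through K
print(direct_lookup("E", "N"))
print(exists("F"), exists("K"))
```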


Multiple indexes

• Find red items of type 'C'
  – Both indexes can be searched to identify records to retrieve

ITEMCOLOR INDEX
ITEMCOLOR  Disk address
Black      d9
Brown      d1
Brown      d2
Green      d8
Khaki      d10
Red        d6
White      d7
–          d3
–          d4
–          d5

ITEMTYPE INDEX
ITEMTYPE   Disk address
C          d6
C          d7
C          d8
C          d9
E          d1
E          d2
F          d10
N          d3
N          d4
N          d5
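Answering "find red items of type 'C'" with both indexes amounts to intersecting two sets of disk addresses, so only records that satisfy both conditions are read from disk. This is a minimal sketch using the entries of the ITEMCOLOR and ITEMTYPE indexes; `addresses_matching` is a hypothetical helper, and `None` stands for the items with no color.

```python
# ITEMCOLOR and ITEMTYPE indexes from the slide: value -> disk addresses.
ITEMCOLOR_INDEX = {
    "Black": {"d9"}, "Brown": {"d1", "d2"}, "Green": {"d8"},
    "Khaki": {"d10"}, "Red": {"d6"}, "White": {"d7"},
    None: {"d3", "d4", "d5"},  # items with no color
}
ITEMTYPE_INDEX = {
    "C": {"d6", "d7", "d8", "d9"}, "E": {"d1", "d2"},
    "F": {"d10"}, "N": {"d3", "d4", "d5"},
}


def addresses_matching(color, item_type):
    """Search both indexes in memory and keep addresses found in both."""
    return ITEMCOLOR_INDEX.get(color, set()) & ITEMTYPE_INDEX.get(item_type, set())


print(addresses_matching("Red", "C"))  # only this record is read from disk
```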


Multiple indexes

• Indexes are also called inverted lists
  – A file of record locations rather than data

• Trade-off
  – Faster retrieval
  – Slower maintenance


Sparse indexes

• Taking advantage of the physical sequence of a file
• Assume 2 records per page

• Tradeoffs
  – Fewer disk accesses required to read the index
  – Existence tests not possible

ITEMNO INDEX (one entry per page: the highest ITEMNO on that page)
ITEMNO  Page
2       p
4       p + 1
6       p + 2
8       p + 3
10      p + 4

ITEM (stored in ITEMNO sequence, two records per page)
Page    ITEMNO  ITEMNAME                 ITEMTYPE  ITEMCOLOR
p       1       Pocket knife–Nile        E         Brown
p       2       Pocket knife–Thames      E         Brown
p + 1   3       Compass                  N         –
p + 1   4       Geo positioning system   N         –
p + 2   5       Map measure              N         –
p + 2   6       Hat–polar explorer       C         Red
p + 3   7       Hat–polar explorer       C         White
p + 3   8       Boots–snakeproof         C         Green
p + 4   9       Boots–snakeproof         C         Black
p + 4   10      Safari chair             F         Khaki
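Because the file is stored in ITEMNO order, a sparse index needs one entry per page rather than one per record. This is a minimal sketch, assuming each entry holds the highest ITEMNO on its page (as in the figure) and using a hypothetical page number for page p. A lookup only tells which page to read; whether the record is really there must still be checked on that page, which is why existence tests cannot be answered from the index alone.

```python
from bisect import bisect_left

P = 100  # hypothetical page number of page p
# Sparse ITEMNO index: highest ITEMNO on each page, and that page's number.
SPARSE_KEYS = [2, 4, 6, 8, 10]
SPARSE_PAGES = [P, P + 1, P + 2, P + 3, P + 4]


def page_for(itemno):
    """Return the only page that could hold itemno, or None if beyond the file."""
    i = bisect_left(SPARSE_KEYS, itemno)
    if i == len(SPARSE_KEYS):
        return None
    return SPARSE_PAGES[i]


print(page_for(1), page_for(5), page_for(10), page_for(11))
```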


B-tree

• A form of inverted list
• Frequently used for relational systems
• Basis of IBM’s VSAM underlying DB2
• Supports sequential and direct accessing
• Has two parts
  – Sequence set
  – Index set


B-tree (B+ tree)

• Sequence set is a single level index with pointers to records

• Index set is a tree-structured index to the sequence set

Figure: a B+ tree

Index set
                      [29 | 57]
   [5 | 20]           [34 | 46]           [82 | 94]

Sequence set
[1 4 5] [6 19 20] [26 28 29] [32 33 34] [40 42 46] [50 54 57] [63 67 82] [86 93 94] [95 96 98]
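A direct search starts at the root of the index set and, at each node, follows the pointer of the first key that is greater than or equal to the search value, until it reaches a sequence-set node; sequential access then walks the sequence set in order. This is a minimal sketch using the keys in the figure, assuming each index key is the highest value in the subtree to its left; plain lists and dicts stand in for disk pages, and the function names are hypothetical.

```python
from bisect import bisect_left

# Sequence set from the figure: nine nodes in key order.
LEAVES = [[1, 4, 5], [6, 19, 20], [26, 28, 29], [32, 33, 34], [40, 42, 46],
          [50, 54, 57], [63, 67, 82], [86, 93, 94], [95, 96, 98]]

# Index set: each key is the highest value reachable through the pointer
# to its left; the last pointer covers everything larger.
LEVEL2 = [
    {"keys": [5, 20], "children": LEAVES[0:3]},
    {"keys": [34, 46], "children": LEAVES[3:6]},
    {"keys": [82, 94], "children": LEAVES[6:9]},
]
ROOT = {"keys": [29, 57], "children": LEVEL2}


def find_leaf(value):
    """Descend the index set to the sequence-set node that could hold value."""
    node = ROOT
    while isinstance(node, dict):
        node = node["children"][bisect_left(node["keys"], value)]
    return node


def search(value):
    """Direct access: one path from the root to a leaf."""
    return value in find_leaf(value)


def scan(low, high):
    """Sequential access: start at the leaf for low, then follow the sequence set."""
    result = []
    start = LEAVES.index(find_leaf(low))
    for leaf in LEAVES[start:]:
        for v in leaf:
            if v > high:
                return result
            if v >= low:
                result.append(v)
    return result


print(search(42), search(43), scan(26, 34))
```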


B+ tree

• The combination of the index set (the B-tree) and the sequence set is called a B+ tree
• The number of data values and pointers for any given node is not restricted
• Free space is set aside to permit rapid expansion of a file

• Tradeoffs
  – Fast retrieval when pages are packed with data values and pointers
  – Slow updates when pages are packed with data values and pointers


Hashing

• A technique for reducing disk accesses for direct access

• Avoids an index

• Number of accesses per record can be close to one

• The hash field is converted to a hash address by a hash function


Hashing

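hash address = remainder after dividing SSN by 10000

SSN            Hash address
417-03-4356    4356
532-67-4356    4356
891-55-4356    4356
043-15-1893    1893
281-27-1502    1502

Figure: the file space holds 417-03-4356 at disk address 4356, 043-15-1893 at 1893 and 281-27-1502 at 1502; the synonyms 532-67-4356 and 891-55-4356 are stored in the overflow area and linked to address 4356 by a synonym chain.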
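The SSN example can be sketched directly: the hash function keeps the remainder after dividing the nine-digit SSN by 10,000, the first record for an address goes into the file space, and later synonyms are appended to that address's synonym chain in an overflow area. This is a minimal sketch; the dicts stand in for disk space, and `insert` and `lookup` are hypothetical names.

```python
def hash_address(ssn: str) -> int:
    """Remainder after dividing the SSN (digits only) by 10000."""
    return int(ssn.replace("-", "")) % 10_000


file_space = {}   # hash address -> first record stored there
overflow = {}     # hash address -> synonym chain (later records)


def insert(ssn: str) -> None:
    address = hash_address(ssn)
    if address not in file_space:
        file_space[address] = ssn
    else:
        overflow.setdefault(address, []).append(ssn)  # a synonym


def lookup(ssn: str) -> int:
    """Return how many records were examined to find ssn (0 if absent)."""
    address = hash_address(ssn)
    if address not in file_space:
        return 0
    chain = [file_space[address]] + overflow.get(address, [])
    for examined, stored in enumerate(chain, start=1):
        if stored == ssn:
            return examined
    return 0


for ssn in ["417-03-4356", "532-67-4356", "891-55-4356", "043-15-1893", "281-27-1502"]:
    insert(ssn)

print(hash_address("043-15-1893"), lookup("043-15-1893"), lookup("891-55-4356"))
```

A record without synonyms is found with one access, while 891-55-4356 requires examining three records, which illustrates how long synonym chains degrade performance.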


Shortcomings of hashing

• Different hash fields convert to the same hash address
  – Synonyms
  – Store the colliding record in an overflow area

• Long synonym chains degrade performance
• There can be only one hash field
• The file can no longer be processed sequentially


Linked list

• A structure for inter-file clustering

• An example of a parent/child structure

Figure: each nation points to a chain of its stocks
• Australia: Indooroopilly Ruby, Narembeen Plum, Queensland Diamond
• USA: Minnesota Gold, Georgia Peach


Linked lists

• There can be two-way pointers, forward and backward, to speed up deletion

• Each child can have a pointer to its parent
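A parent/child linked list with forward, backward and parent pointers can be sketched with small node objects. This is a minimal sketch using two stocks of the USA from the linked list figure; the class names are hypothetical, object references stand in for disk addresses, and new children are inserted at the head of the chain for simplicity.

```python
class Parent:
    def __init__(self, name):
        self.name = name
        self.first = None            # pointer to the first child in the chain

    def add_child(self, name):
        child = Child(name, self)
        child.next = self.first      # insert at the head of the chain
        if self.first is not None:
            self.first.prev = child
        self.first = child
        return child

    def children(self):
        node, names = self.first, []
        while node is not None:      # follow forward pointers
            names.append(node.name)
            node = node.next
        return names


class Child:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent         # pointer back to the parent
        self.next = None             # forward pointer
        self.prev = None             # backward pointer

    def delete(self):
        """Unlink without searching the chain, thanks to the backward pointer."""
        if self.prev is not None:
            self.prev.next = self.next
        else:
            self.parent.first = self.next
        if self.next is not None:
            self.next.prev = self.prev


usa = Parent("USA")
peach = usa.add_child("Georgia Peach")
usa.add_child("Minnesota Gold")
print(usa.children(), peach.parent.name)
peach.delete()
print(usa.children())
```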


Bit map indexes

• Uses a single bit, rather than multiple bytes, to indicate the specific value of a field
  – Color can have only three values, so use three bits

Itemcode  Color             Code      Disk address
          Red Green Blue    A  N
1001      0   0     1       0  1      d1
1002      1   0     0       1  0      d2
1003      1   0     0       1  0      d3
1004      0   1     0       1  0      d4


Bit map indexes

• A bit map index saves space and time compared to a standard index

Standard index:
Itemcode  Color (Char(8))  Code (Char(1))  Disk address
1001      Blue             N               d1
1002      Red              A               d2
1003      Red              A               d3
1004      Green            A               d4
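Because each value is a single bit, a query such as "red items with code A" reduces to a bitwise AND of two bit vectors with one bit per row. This is a minimal sketch using Python integers as bit vectors over the four rows of the tables; bit i stands for the i-th row (item 1001 is bit 0), and `bitmap` is a hypothetical helper.

```python
ROWS = [("1001", "Blue", "N", "d1"), ("1002", "Red", "A", "d2"),
        ("1003", "Red", "A", "d3"), ("1004", "Green", "A", "d4")]


def bitmap(column, value):
    """One bit per row: bit i is 1 when row i has the given value."""
    bits = 0
    for i, row in enumerate(ROWS):
        if row[column] == value:
            bits |= 1 << i
    return bits


RED, CODE_A = bitmap(1, "Red"), bitmap(2, "A")
hits = RED & CODE_A          # a bitwise AND answers both conditions at once
print(bin(hits), [ROWS[i][3] for i in range(len(ROWS)) if hits >> i & 1])
```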


Join indexes

• Speed up joins by creating an index for the primary key and foreign key pair

NATION INDEX
NATCODE  Disk address
UK       d1
USA      d2

STOCK INDEX
NATCODE  Disk address
UK       d101
UK       d102
UK       d103
USA      d104
USA      d105

JOIN INDEX
NATION disk address  STOCK disk address
d1                   d101
d1                   d102
d1                   d103
d2                   d104
d2                   d105
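The join index can be built once from the NATION and STOCK indexes by pairing the disk addresses that share a NATCODE, so that a later join only has to read the pairs. This is a minimal sketch using the index entries on this slide; `build_join_index` is a hypothetical helper.

```python
NATION_INDEX = [("UK", "d1"), ("USA", "d2")]
STOCK_INDEX = [("UK", "d101"), ("UK", "d102"), ("UK", "d103"),
               ("USA", "d104"), ("USA", "d105")]


def build_join_index(primary, foreign):
    """Pair each primary-key address with every matching foreign-key address."""
    by_key = {}
    for key, address in primary:
        by_key[key] = address            # NATCODE is the primary key, so unique
    return [(by_key[key], address) for key, address in foreign if key in by_key]


JOIN_INDEX = build_join_index(NATION_INDEX, STOCK_INDEX)
print(JOIN_INDEX)
```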


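R-trees

• Used to store n-dimensional data (n >= 2)
  – Minimum bounding rectangle concept

Figure: data objects A, B and C are enclosed by the minimum bounding rectangle X, and D and E by Y; X and Y form the index set, and A to E the sequence set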


R-tree searching

• Search for the object covered by the shaded region

Figure: the R-tree with index set X, Y and sequence set A, B, C, D, E, with a shaded search region
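An R-tree search compares the search region with the minimum bounding rectangles in the index set and descends only into those that overlap it. This is a minimal sketch; all coordinates are hypothetical because the figure's geometry is not recoverable, and it assumes X bounds A, B and C while Y bounds D and E. The names `Rect`, `mbr` and `search` are illustrative.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


def mbr(rects):
    """Minimum bounding rectangle of a group of rectangles."""
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


# Hypothetical object rectangles (sequence set).
OBJECTS = {
    "A": Rect(0, 6, 2, 8), "B": Rect(1, 3, 3, 5), "C": Rect(4, 4, 5, 7),
    "D": Rect(7, 0, 9, 2), "E": Rect(8, 3, 10, 5),
}
# Index set: assumed grouping, X bounds A, B, C and Y bounds D, E.
INDEX_SET = {"X": ["A", "B", "C"], "Y": ["D", "E"]}
INDEX_MBR = {name: mbr([OBJECTS[m] for m in members])
             for name, members in INDEX_SET.items()}


def search(region: Rect):
    """Return objects overlapping region, visiting only overlapping MBRs."""
    found = []
    for name, members in INDEX_SET.items():
        if INDEX_MBR[name].overlaps(region):
            found += [m for m in members if OBJECTS[m].overlaps(region)]
    return found


print(INDEX_MBR["X"], search(Rect(8.5, 3.5, 9.5, 4.5)))
```

Here the search region lies outside X, so A, B and C are never examined; only D and E are compared, and E is returned.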


Data storage devices

• What will the data storage device be used for?
  – On-line data
    • Access speed
    • Capacity
  – Back-up files
    • Security against data loss
  – Archival data
    • Long-term storage


Key variables

• Data volume

• Data volatility

• Access speed

• Storage cost

• Medium reliability

• Legal standing of stored data


Magnetic technology

• Up to 50% of IS hardware budgets are spent on magnetic storage

• A $50 billion market

• The major form of data storage

• A mature and widely used technology

• Strong magnetic fields can erase data

• Magnetization decays with time


Fixed disks

• Sealed, permanently mounted

• Highly reliable

• Access times of 4-10 msec

• Transfer rates as high as 160 Mbytes per second

• Capacities of Gbytes to Tbytes


RAID

• Redundant arrays of inexpensive or independent drives

• Exploits economies of scale of disk manufacturing for the personal computer market

• Can also give greater security
• Increases a system's fault tolerance
• Not a replacement for regular backup


Mirroring

Figure (legend: data, parity)


Mirroring

• Write
  – Identical copies of a file are written to each drive in an array

• Read
  – Alternate pages are read simultaneously from each drive
  – Pages are put together in memory
  – Access time is reduced by a factor of approximately the number of disks in the array

• Read error
  – Read the required page from another drive

• Tradeoffs
  – Reduced access time
  – Greater security
  – More disk space
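The mirroring behaviour (identical writes, alternating reads, and recovery from another drive on a read error) can be sketched with in-memory drives. This is a minimal sketch; a failed drive is simulated by a hypothetical `failed` flag, the class names are illustrative, and the alternating read only shows which drive serves each page, not real parallel I/O.

```python
class Drive:
    def __init__(self):
        self.pages = {}
        self.failed = False   # hypothetical switch to simulate a read error

    def read(self, page_no):
        if self.failed:
            raise OSError("read error")
        return self.pages[page_no]


class MirroredArray:
    def __init__(self, drive_count=2):
        self.drives = [Drive() for _ in range(drive_count)]

    def write(self, page_no, data):
        for drive in self.drives:          # identical copy on every drive
            drive.pages[page_no] = data

    def read(self, page_no):
        # Alternate pages across drives so that several drives can work at once.
        first = page_no % len(self.drives)
        order = self.drives[first:] + self.drives[:first]
        for drive in order:                # on a read error, try another drive
            try:
                return drive.read(page_no)
            except OSError:
                continue
        raise OSError("page lost on every drive")


array = MirroredArray()
for n in range(4):
    array.write(n, f"page {n}")
array.drives[0].failed = True
print([array.read(n) for n in range(4)])  # still readable from the mirror
```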


Striping

Figure (legend: data, parity)


Striping: three-drive model

• Write
  – Half of the file to the first drive
  – Half of the file to the second drive
  – Parity bits to the third drive

• Read
  – Portions from each drive are put together in memory

• Read error
  – Lost bits are reconstructed from the third drive's parity data

• Tradeoffs
  – Increased data security
  – Requires less storage capacity than mirroring
  – Not as fast as mirroring
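The parity idea behind the three-drive model can be shown with XOR: the third drive stores the XOR of the two halves, and either half can be rebuilt by XORing the surviving half with the parity. This is a minimal sketch on byte strings, assuming the file is split into two equal halves (an odd length is padded with a zero byte); `stripe` and `rebuild` are hypothetical names.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes):
    """Split data over two drives and compute parity for a third."""
    if len(data) % 2:
        data += b"\x00"                    # pad to an even length
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return first, second, xor_bytes(first, second)


def rebuild(surviving: bytes, parity: bytes) -> bytes:
    """Reconstruct the lost half from the other half and the parity."""
    return xor_bytes(surviving, parity)


drive1, drive2, drive3 = stripe(b"DATABASE")
print(drive1, drive2, rebuild(drive1, drive3))  # drive 2's half is recovered
```

Parity costs one extra drive for two data drives, whereas mirroring doubles the storage, which matches the capacity tradeoff listed above.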


RAID levels

• All levels, except 0, have common features

• The operating system sees a set of physical drives as one logical drive

• Data are distributed across physical drives

• Redundant data (a mirrored copy in level 1, parity in the other levels) are used for data recovery


RAID levels

• Level 0
  – Data spread across multiple drives
  – No data recovery when a drive fails

• Level 1
  – Mirroring
  – Critical non-stop applications

• Level 3
  – Striping

• Level 5
  – A variation of striping
  – Parity data is spread across drives
  – Requires less capacity than level 1
  – Higher I/O rates than level 3
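In level 5 the parity block moves from drive to drive from one stripe to the next, so that no single drive has to handle every parity write. This is a minimal sketch of one possible rotation (parity on the last drive for stripe 0, then one drive to the left for each following stripe); real controllers use several different layouts, so the placement rule and the function name `raid5_layout` are assumptions.

```python
def raid5_layout(stripes: int, drives: int):
    """Return, per stripe, what each drive holds ('P' = parity, 'Dn' = data)."""
    layout = []
    block = 0
    for s in range(stripes):
        parity_drive = (drives - 1 - s) % drives   # rotate parity leftwards
        row = []
        for d in range(drives):
            if d == parity_drive:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


for row in raid5_layout(stripes=4, drives=4):
    print(row)
```

With four drives, three quarters of the raw capacity holds data, compared with half under level 1 mirroring.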


RAID 5

Figure (legend: data, parity)


Magnetic technology

• Removable magnetic disk

• Floppy disk

• Magnetic tape

• Magnetic tape cartridge

• Mass storage


Solid State

• Arrays of memory chips

• 10 times faster than magnetic storage

• $3 per Mbyte
  – Magnetic disk is about 1 cent per Mbyte

• Used for stock trading and video-streaming applications


Optical technology

• A more recent development

• Uses a laser for reading and writing data

• High storage densities

• Low cost

• Direct access

• Long storage life

• Not susceptible to head crashes


Optical technology

Optical storage
• WORM: write once, read many
• CD-ROM: write once, read many
• Magneto-optical: write many, read many
• DVD: multiple formats


Magneto-optical disk

• High capacity read-write medium
• 3.5" disk can store up to 256 Mbytes
• Not as fast as fixed disk
  – 10 msec access time
• Compact
• Reliable
• Suitable for data transfer, backup, and archival purposes


Digital Versatile Disc (DVD)

• The same physical size as a CD-ROM but up to 28 times the capacity (i.e., 17 Gbytes)

• DVD drives are likely to have transfer rates of around 2.76 Mbytes/sec and access times of around 150 msec.

• DVD-ROM drives will play both audio CDs and CD-ROMs
• Read-only versions
  – DVD-Video (movies)
  – DVD-ROM (software)
  – DVD-Audio (songs)

• DVD-R
  – Recordable (write once, read many)

• DVD-RAM
  – Erasable (write many, read many)


SAN

• Storage area network
• Supports dynamic sharing of large amounts of data, regardless of operating system or application
• Communicates via pipelines that consist of an interface called Fibre Channel
  – A high-speed data connection between computer devices

• Prices vary from $20,000-30,000 to $5 million


Storage life

Figure: storage life in years of high quality brands (scale from 1 to 500 years) for
• Paper: permanent, high quality, newspaper
• Microfilm: archival quality (silver), medium-term film
• Optical disk: CD-R (recordable), CD-ROM (read only)
• Magnetic tape: quarter-inch tape, VHS tape, half-inch tape cartridge, half-inch reel-to-reel


Data Processing Architectures

The difficulty is in the choice

George Moore, 1900


Architecture

• The ANSI/SPARC architecture predates personal computers; now there are options for where data are stored and processed


The 4 basic architectures

                           Data storage: local    Data storage: remote
Data processing: remote    Remote job entry       Host/terminal
Data processing: local     Personal database      Client/server


Remote job entry

• Local storage
  – Often cheaper
  – Possibly more secure

• Remote processing
• Useful when a personal computer
  – is too slow
  – has insufficient memory
  – lacks the required software

• Some local processing
  – Data preparation


Personal database

• Local storage and processing
• Advantages
  – Personal computers are cheap
  – Greater control
  – Friendlier interface

• Disadvantages
  – Replication of applications and data
  – Difficult to share data
  – Security and integrity are lower
  – Disposable systems
  – Misdirection of attention and resources


Host/terminal

• Remote storage and processing

• Associated with mainframe computers

• All shared resources are managed by the host

• Upgrades are in large chunks


Host/terminal

Figure: a single host runs the operating system, the DBMS, the DC (data communications) manager, Application #1 and Application #2, and serves Terminal #1, Terminal #2 and Terminal #3


LAN architectures

• A LAN connects computers within a limited geographic area

• Transfer speeds of up to 1,000 Mbits/sec

• Permits sharing of devices

• A server is a computer that provides and controls access to a shareable resource


File/server

• A central data store for users attached to a LAN
• Files are stored on a file/server
• Data are processed on users' personal computers
• Entire files are transmitted on the LAN
• Can result in heavy LAN traffic
• A file is locked when retrieved for update
• Limited to small files and low demand


File/server

[Figure: File/server architecture. Client PCs, each running an operating system, a DC manager, a DBMS and an application (#1, #2, #3), share a LAN with a file server running an operating system, a DC manager and a file manager.]

Page 67

DBMS/server

• A server runs a DBMS

• Only necessary records are transmitted on the LAN

• Less LAN traffic than file/server

• Back-end program on the server handles retrieval

• Front-end program on the client handles processing and presentation

• More sharing of processing than file/server
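The traffic difference between file/server and DBMS/server can be sketched by counting how many records cross the LAN for the same query. This is a toy model under stated assumptions: the customer table, the `region` column and the function names are hypothetical, and a record count stands in for real network bytes.

```python
from typing import Callable

Record = dict[str, object]

# Hypothetical customer table held on the server: 10,000 records, 4 regions.
TABLE: list[Record] = [
    {"id": i, "region": ["north", "south", "east", "west"][i % 4], "balance": i * 10}
    for i in range(10_000)
]


def file_server_query(pred: Callable[[Record], bool]) -> tuple[list[Record], int]:
    """File/server: ship the whole file; the client PC does the filtering."""
    shipped = list(TABLE)                      # every record crosses the LAN
    result = [r for r in shipped if pred(r)]   # processing on the client
    return result, len(shipped)


def dbms_server_query(pred: Callable[[Record], bool]) -> tuple[list[Record], int]:
    """DBMS/server: the back end filters; only matching records are shipped."""
    result = [r for r in TABLE if pred(r)]     # retrieval on the server
    return result, len(result)


def in_north(r: Record) -> bool:
    return r["region"] == "north"


rows_a, sent_a = file_server_query(in_north)
rows_b, sent_b = dbms_server_query(in_north)
print(sent_a, sent_b)   # 10000 2500
```

Both approaches return the same answer; only the amount of data moved over the LAN, and the place where the filtering runs, differs.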

Page 68

DBMS/server

[Figure: DBMS/server architecture. Client PCs, each running an operating system, a DC manager and an application (#1, #2, #3), share a LAN with a database server running an operating system, a DC manager and the DBMS.]

Page 69

Client/server

• File/server and DBMS/server are examples of client/server

• Objective is to reduce processing costs by splitting processing between clients and the server

• Client is typically a GUI microcomputer

• Savings

– Ease of use / fewer errors

– Less training

Page 70

Client/server

• Costs are lowered if

– Some processing can be shifted from the server to clients

– The GUI gives productivity gains

• Costs increase because of

– The shift from terminals to personal computers

– Rewriting software

• Client/server may not be viable for some large scale transaction processing systems

Page 71

Client/Server - 2nd Generation

[Figure: Client/server, 2nd generation. Thin clients, each running a browser, a DC manager and an operating system, connect over a LAN to an application server (application, DC manager, operating system), which works with a separate data server (DBMS, DC manager, operating system).]

Page 72

Two-tier versus three-tier

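| Characteristic      | Two-tier             | Three-tier           |
|---------------------|----------------------|----------------------|
| Type of client      | Fat                  | Thin                 |
| Technology          | LAN                  | Web                  |
| Application logic   | Mostly on the client | Mostly on the server |
| Network load        | Medium               | Low                  |
| Data storage        | Server               | Server               |
| Server intelligence | Medium               | High                 |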

Page 73

Advantages of the three-tier model

• Security

• Performance

• Access to systems
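One way to see the security and access advantages is to sketch the tiers as separate components: the thin client only ever talks to the application server, and only the application server holds a reference to the data server. A minimal Python sketch under that assumption; the class names, the `withdraw` operation and the account data are all hypothetical.

```python
class DataServer:
    """Tier 3: stores the data; in a real system this would be the DBMS."""

    def __init__(self) -> None:
        self._accounts = {"A-1": 500, "A-2": 80}

    def read(self, account: str) -> int:
        return self._accounts[account]

    def write(self, account: str, amount: int) -> None:
        self._accounts[account] = amount


class ApplicationServer:
    """Tier 2: holds the business rules and the only reference to tier 3."""

    def __init__(self, data: DataServer) -> None:
        self._data = data

    def withdraw(self, account: str, amount: int) -> str:
        # Validation happens here, so a client cannot bypass it by
        # writing to the database directly.
        if amount <= 0:
            return "rejected: amount must be positive"
        balance = self._data.read(account)
        if amount > balance:
            return "rejected: insufficient funds"
        self._data.write(account, balance - amount)
        return f"ok: new balance {balance - amount}"


class ThinClient:
    """Tier 1: presentation only; it knows the application server, not the DBMS."""

    def __init__(self, app: ApplicationServer) -> None:
        self._app = app

    def click_withdraw(self, account: str, amount: int) -> None:
        print(self._app.withdraw(account, amount))


client = ThinClient(ApplicationServer(DataServer()))
client.click_withdraw("A-1", 200)   # ok: new balance 300
client.click_withdraw("A-2", 100)   # rejected: insufficient funds
```

Because the rules live in one place on the server, changing them does not require redeploying any client software.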

Page 74

Evolution of client/server computing

| Architecture | Description |
|--------------|-------------|
| Two-tier     | Processing is split between the client PC and the server, which also runs the DBMS. |
| Three-tier   | The client PC does presentation, processing is done by the server, and the DBMS is on a separate server. |
| N-tier       | The client PC does presentation. Processing and the DBMS can be spread across multiple servers: a distributed resources environment. |

Page 75

Distributed database

• Communication charges are a key factor in total processing cost

• Transmission costs increase with distance

– Local processing saves money

• A database can be distributed to reduce communication costs
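The cost argument can be made concrete with a toy cost model. The tariff, the site names, the distances and the read volumes below are invented for illustration only; the point is the shape of the calculation, not the numbers.

```python
# Hypothetical tariff: cost of shipping one record over one kilometre.
COST_PER_RECORD_KM = 0.0001

# Hypothetical branch offices and their distance (km) from headquarters.
DISTANCE_KM = {"Copenhagen": 0, "Aarhus": 300, "Aalborg": 400}

# Hypothetical daily number of records each branch reads about its own customers.
DAILY_READS = {"Copenhagen": 50_000, "Aarhus": 30_000, "Aalborg": 20_000}


def centralized_cost() -> float:
    """All data lives at headquarters, so every read crosses the network."""
    return sum(DAILY_READS[s] * DISTANCE_KM[s] * COST_PER_RECORD_KM for s in DAILY_READS)


def distributed_cost(local_fraction: float) -> float:
    """Each branch stores its own customers; only the non-local share still
    travels to headquarters (a deliberate simplification)."""
    return centralized_cost() * (1 - local_fraction)


print(round(centralized_cost(), 2))      # 1700.0
print(round(distributed_cost(0.9), 2))   # 170.0
```

Under these made-up figures, keeping 90% of the reads local cuts the communication bill by 90%; the further the branches are from the central site, the larger the absolute saving.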

Page 76

Distributed database

• Database is physically distributed as semi-independent databases

• There are communication links between each of the databases

• Appears as one database
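The "appears as one database" property (location transparency) can be sketched as a global catalog that maps each table to the site holding it, so the caller never names a site. A minimal Python sketch; the sites, tables and the `Site` and `GlobalCatalog` classes are hypothetical.

```python
class Site:
    """One semi-independent local database."""

    def __init__(self, name: str, tables: dict[str, list[dict]]) -> None:
        self.name = name
        self.tables = tables

    def scan(self, table: str) -> list[dict]:
        return list(self.tables[table])


class GlobalCatalog:
    """Hides where each table lives: callers ask for a table, not a site."""

    def __init__(self, sites: list[Site]) -> None:
        self._where = {t: s for s in sites for t in s.tables}

    def select(self, table: str, **conditions) -> list[dict]:
        site = self._where[table]  # location is resolved here, not by the caller
        return [r for r in site.scan(table)
                if all(r.get(k) == v for k, v in conditions.items())]

    def site_of(self, table: str) -> str:
        # For inspection only; application code never needs this.
        return self._where[table].name


copenhagen = Site("copenhagen", {"customer": [{"id": 1, "name": "Jensen"}]})
aarhus = Site("aarhus", {"orders": [{"id": 10, "customer": 1}, {"id": 11, "customer": 2}]})
db = GlobalCatalog([copenhagen, aarhus])
print(db.select("orders", customer=1))   # [{'id': 10, 'customer': 1}]
```

If the `orders` table were later moved to another site, only the catalog would change; code calling `db.select` would not.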

Page 77

A hybrid

• Architecture evolves

– Old structures cannot be abandoned

– New technologies offer new opportunities

• Ideally, the many structures are patched together to provide a seamless view of organizational databases

• Distributed database principles apply to this hybrid architecture

Page 78

Fundamental principles

• Transparency

• No reliance on a central site

• Local autonomy

• Continuous operation

• Distributed query processing

• Distributed transaction processing
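Distributed transaction processing is commonly implemented with the two-phase commit protocol. The slide does not name a protocol, so treat this as one standard technique rather than the course's prescribed method. The Python sketch below simulates the participants in memory; the class and function names are hypothetical.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: the site either promises it can commit (a real DBMS would
        # force its log to disk first) or votes no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    """Coordinator: commit everywhere only if every site voted yes."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    if all(votes):
        for p in participants:                     # phase 2: commit
            p.commit()
        return "committed"
    for p in participants:                         # phase 2: abort
        p.abort()
    return "aborted"


sites = [Participant("copenhagen"), Participant("aarhus", can_commit=False)]
print(two_phase_commit(sites))     # aborted
print([p.state for p in sites])    # ['aborted', 'aborted']
```

The all-or-nothing outcome is what lets a transaction touching several sites behave like a transaction on one database.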

Page 79

Fundamental principles

• Replication independence

• Fragmentation independence

• Hardware independence

• Operating system independence

• Network independence

• DBMS independence

Page 80

Distributed database access

• Remote Request

• Remote Transaction

• Distributed Transaction

• Distributed Request
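The four access levels differ in how many SQL statements a unit of work contains and how many sites each statement may touch. Using the usual textbook definitions (a remote request is one statement to one site; a remote transaction is several statements to one site; a distributed transaction lets each statement go to a different site; a distributed request lets a single statement span several sites), a small Python classifier makes the distinctions explicit. Representing each statement by the set of sites it references is an assumption of this sketch, and `classify` is a hypothetical helper.

```python
def classify(transaction: list[set[str]]) -> str:
    """Classify a unit of work given, for each SQL statement, the sites it touches."""
    if not transaction or any(not sites for sites in transaction):
        raise ValueError("every statement must reference at least one site")
    if any(len(sites) > 1 for sites in transaction):
        return "distributed request"      # a single statement spans several sites
    all_sites = set().union(*transaction)
    if len(all_sites) > 1:
        return "distributed transaction"  # each statement hits one site, but sites differ
    if len(transaction) > 1:
        return "remote transaction"       # several statements, all to the same site
    return "remote request"               # one statement to one site


print(classify([{"aarhus"}]))                # remote request
print(classify([{"aarhus"}, {"aarhus"}]))    # remote transaction
print(classify([{"aarhus"}, {"aalborg"}]))   # distributed transaction
print(classify([{"aarhus", "aalborg"}]))     # distributed request
```

Each level includes the capabilities of the ones before it, so a DBMS supporting distributed requests can also handle the three simpler forms.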

Page 81

Distributed database design

• Horizontal Fragmentation

• Vertical Fragmentation

• Hybrid Fragmentation

• Replication
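Of the four design options, replication is the one that does not split the data: every site holds a full copy. A minimal sketch of the usual trade-off, assuming a simple read-one/write-all policy (a common textbook scheme, not necessarily the one the course uses): reads are served locally, while every update must reach all copies. The site names and the `ReplicatedTable` class are hypothetical.

```python
from typing import Optional


class ReplicatedTable:
    """A table copied in full to every site (read-one / write-all)."""

    def __init__(self, sites: list[str]) -> None:
        self.copies: dict[str, dict[int, str]] = {s: {} for s in sites}
        self.messages = 0  # network messages sent between sites

    def read(self, site: str, key: int) -> Optional[str]:
        # Any site answers from its own copy: no communication needed.
        return self.copies[site].get(key)

    def write(self, origin: str, key: int, value: str) -> None:
        # An update is applied to every copy to keep them consistent;
        # each copy other than the origin costs one message.
        for site, copy in self.copies.items():
            copy[key] = value
            if site != origin:
                self.messages += 1


t = ReplicatedTable(["copenhagen", "aarhus", "aalborg"])
t.write("copenhagen", 1, "Jensen")
print(t.read("aalborg", 1), t.messages)   # Jensen 2
```

Replication therefore pays off for data that is read often and updated rarely; the more sites hold a copy, the more each update costs.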

Page 82

Horizontal fragmentation

[Figure: Horizontal fragmentation. A table with columns C1 to C6 is split by rows across Server 1, Server 2 and Server 3; every fragment keeps all six columns.]
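In horizontal fragmentation each fragment keeps all columns but only a subset of the rows, chosen by a predicate, and the original table is recovered as the union of the fragments. A minimal Python sketch, assuming rows are assigned to servers by a hypothetical `region` column; the rows and the server mapping are invented.

```python
Row = dict[str, object]

# Hypothetical fragmentation rule: which server stores which region's rows.
PLACEMENT = {"north": "server1", "south": "server2", "west": "server3"}


def fragment(rows: list[Row]) -> dict[str, list[Row]]:
    """Split a table by rows; every fragment keeps every column."""
    fragments: dict[str, list[Row]] = {server: [] for server in PLACEMENT.values()}
    for row in rows:
        fragments[PLACEMENT[row["region"]]].append(row)
    return fragments


def reconstruct(fragments: dict[str, list[Row]]) -> list[Row]:
    """The original table is the union of the horizontal fragments."""
    return [row for part in fragments.values() for row in part]


customers = [
    {"id": 1, "name": "Jensen", "region": "north"},
    {"id": 2, "name": "Hansen", "region": "south"},
    {"id": 3, "name": "Nielsen", "region": "west"},
    {"id": 4, "name": "Larsen", "region": "north"},
]
parts = fragment(customers)
print({server: [r["id"] for r in part] for server, part in parts.items()})
# {'server1': [1, 4], 'server2': [2], 'server3': [3]}
```

Choosing the predicate so that each site's rows are the ones it uses most keeps most queries local, which is the communication-cost argument made earlier for distributing a database.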

Page 83

Vertical fragmentation