week08 - physical design

Upload: muhammad-asghar-khan

Post on 09-Apr-2018

  • 8/8/2019 Week08 - Physical Design

    Database I

    Methodology

    Physical Design

    Physical Database Design

    Throughout conceptual design, logical design, and normalization, the primary objectives have been storage efficiency and consistency of the database

    In physical database design, however, the focus shifts from storage efficiency to efficiency of execution

    Physical Database Design (Cont.)

    Physical DB design:

    Transforms the logical DB design into technical specifications for storing and retrieving data

    Does not include actually implementing the design; however, tool-specific decisions are involved

    Physical design requires the following inputs:

    Normalized relations

    Definitions of each attribute (the purpose or objective of the attribute)

    Physical Database Design (Cont.)

    Descriptions of data usage (how and by whom the data will be used)

    Requirements for response time, data security, backup, etc.

    Tool to be used

    Decisions made during this process are:

    Choosing data types

    Deciding file organizations

    Selecting structures

    Preparing strategies for efficient access

    De-normalization

    De-normalization is a technique of moving from higher to lower normal forms of database modeling in order to speed up database access

    The de-normalization process is applied when deriving a physical data model from a logical design

    In logical design, attributes are grouped when they are logically related through the same primary key

    In physical database design, fields are grouped according to how they are physically stored and accessed by the DBMS

    De-normalization (Cont.)

    Each new RDBMS release usually brings enhanced performance and improved access options that may reduce the need for de-normalization

    A fully normalized database schema can fail to provide adequate system response time because of excessive table join operations

    De-normalization (Cont.)

    De-normalization Situation 1:

    Merge two entity types that have a one-to-one relationship into one

    If one of the entity types is optional, merging can waste storage; however, if the two are accessed together very frequently, merging them may still be a wise decision

    So two relations that have a one-to-one relationship can be merged for better performance

    De-normalization (Cont.)

    De-normalization Situation 2:

    A many-to-many binary relationship is mapped to three relations

    Queries needing data from the two participating relations must join all three relations

    A join is an expensive operation from the execution point of view

    Consider the many-to-many relationship between EMP and PROJ, implemented through WORK

    EMP (empId, eName, pjId, Sal)

    PROJ (pjId, pjName)

    WORK (empId, pjId, dtHired, Sal)

    De-normalization (Cont.)

    Now suppose we de-normalize these relations by merging the WORK relation into the PROJ relation

    The merged relation violates 2NF, so the anomalies associated with 2NF violations will be present

    But only one join operation, between two tables, is needed, which increases efficiency

    EMP (empId, eName, pjId, Sal)

    PROJ (pjId, pjName, empId, dtHired, Sal)
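The trade-off in Situation 2 can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: EMP is reduced to the two columns the query needs, the sample rows are invented, and the table name PROJ_DN for the merged relation is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Normalized form: the M:N relationship needs the bridge table WORK.
CREATE TABLE EMP  (empId INTEGER PRIMARY KEY, eName TEXT);
CREATE TABLE PROJ (pjId INTEGER PRIMARY KEY, pjName TEXT);
CREATE TABLE WORK (empId INTEGER, pjId INTEGER, dtHired TEXT, sal REAL,
                   PRIMARY KEY (empId, pjId));
-- De-normalized form (hypothetical name): WORK merged into PROJ,
-- so pjName is repeated for every employee on the project.
CREATE TABLE PROJ_DN (pjId INTEGER, pjName TEXT, empId INTEGER,
                      dtHired TEXT, sal REAL,
                      PRIMARY KEY (pjId, empId));
""")

# Invented sample data.
cur.executemany("INSERT INTO EMP VALUES (?, ?)", [(1, "Ali"), (2, "Sara")])
cur.executemany("INSERT INTO PROJ VALUES (?, ?)", [(10, "Payroll"), (20, "Portal")])
cur.executemany(
    "INSERT INTO WORK VALUES (?, ?, ?, ?)",
    [(1, 10, "2018-01-05", 500.0),
     (2, 10, "2018-02-01", 650.0),
     (2, 20, "2018-03-10", 700.0)],
)
# Fill the de-normalized table from the normalized ones.
cur.execute("""
    INSERT INTO PROJ_DN
    SELECT p.pjId, p.pjName, w.empId, w.dtHired, w.sal
    FROM PROJ p JOIN WORK w ON w.pjId = p.pjId
""")

# Normalized: two join operations across three tables.
three_way = cur.execute("""
    SELECT e.eName, p.pjName FROM EMP e
    JOIN WORK w ON w.empId = e.empId
    JOIN PROJ p ON p.pjId = w.pjId
    ORDER BY e.eName, p.pjName
""").fetchall()

# De-normalized: one join operation across two tables.
two_way = cur.execute("""
    SELECT e.eName, d.pjName FROM EMP e
    JOIN PROJ_DN d ON d.empId = e.empId
    ORDER BY e.eName, d.pjName
""").fetchall()

# The price of the cheaper query: 'Payroll' is now stored once per employee.
payroll_copies = cur.execute(
    "SELECT COUNT(*) FROM PROJ_DN WHERE pjName = 'Payroll'"
).fetchone()[0]
```

Both queries return the same employee and project pairs; the de-normalized one saves a join at the cost of storing the project name redundantly, which is the 2NF violation noted above.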

    De-normalization (Cont.)

    De-normalization Situation 3:

    In a 1:M situation where the entity type on the one side does not participate in any other relationship, the many-side entity type is appended with the reference data rather than the foreign key

    In this case the reference table should be merged with the main table

    Consider the STUDENT and HOBBY relations

    One student can have one hobby, and one hobby can be adopted by many students

    Here HOBBY can be merged into the STUDENT relation

    The data will then be redundant, but no join of the two relations is needed, which gives better performance
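A minimal sketch of the STUDENT and HOBBY case in plain Python. The attribute names (hobbyId, hobbyName) and all sample rows are assumptions made for illustration; the slides do not give the actual schemas.

```python
# HOBBY(hobbyId -> hobbyName), the reference table on the "one" side.
hobby = {1: "Chess", 2: "Cricket"}

# STUDENT(stId, sName, hobbyId): normalized, holds only the foreign key.
student = [
    (101, "Ali", 2),
    (102, "Sara", 1),
    (103, "Omar", 2),
]

# Normalized access: every read needs a lookup (a join) into HOBBY.
joined = [(st_id, name, hobby[h_id]) for st_id, name, h_id in student]

# De-normalized STUDENT(stId, sName, hobbyName): the reference data
# replaces the foreign key, so reads need no lookup at all.
student_dn = [
    (101, "Ali", "Cricket"),
    (102, "Sara", "Chess"),
    (103, "Omar", "Cricket"),
]

# Redundancy introduced by the merge: "Cricket" is stored once per student.
cricket_copies = sum(1 for row in student_dn if row[2] == "Cricket")
```

The merged rows carry the same information as the joined result, but a change to a hobby name would now have to be applied to every student row that repeats it.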

    Partitioning

    Partitioning splits a single relation into two or more parts

    The aims of data partitioning in a database are to:

    Reduce workload (e.g. data access, communication costs, search space)

    Balance workload

    Speed up the rate of useful work (e.g. by keeping frequently accessed objects in main memory)

    There are two types of partitioning:

    Horizontal Partitioning

    Vertical Partitioning

    Partitioning (Cont.)

    Horizontal Partitioning

    The table is split on the basis of rows, which means a larger table is split into smaller tables

    The advantage is that accessing records in a smaller table takes much less time than in a larger table

    Range Partitioning

    In this type of partitioning, a range is imposed on a particular attribute

    For example, students whose IDs are from 1 to 1000 are in partition 1, and so on
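The student-ID example can be sketched as a small routing function. The fixed range width of 1000 follows the example above; the function name and the assumption that IDs start at 1 are illustrative only.

```python
def range_partition(student_id: int, width: int = 1000) -> int:
    """Return the 1-based partition number for a student ID.

    IDs 1..width go to partition 1, width+1..2*width to partition 2, and so on.
    """
    if student_id < 1:
        raise ValueError("student IDs are assumed to start at 1")
    return (student_id - 1) // width + 1


# Route a few hypothetical IDs to their partitions.
routed = {sid: range_partition(sid) for sid in (1, 1000, 1001, 2500)}
```

A query for one student then only has to search the partition the function names, rather than the whole table.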

    Partitioning (Cont.)

    Hash Partitioning

    A particular hashing algorithm, known to the DBMS, is applied to decide the partition of each row

    So hash partitioning reduces the chance of unbalanced partitions to a large extent

    List Partitioning

    In this type of partitioning, the values are specified for every partition

    So there is a specified list of values for each partition
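Hash and list partitioning can be sketched side by side. CRC32 stands in here for whatever hashing algorithm a real DBMS uses internally, and the partition names and city lists are invented for the example.

```python
import zlib


def hash_partition(key: str, n_partitions: int) -> int:
    """Pick a 0-based partition from a deterministic hash of the key.

    CRC32 is only a stand-in; a DBMS applies its own hashing algorithm.
    """
    return zlib.crc32(key.encode("utf-8")) % n_partitions


# List partitioning: each partition is defined by an explicit list of values.
# These partition names and cities are hypothetical.
CITY_LISTS = {
    "north": {"Peshawar", "Islamabad"},
    "south": {"Karachi", "Hyderabad"},
}


def list_partition(city: str) -> str:
    """Return the name of the partition whose value list contains the city."""
    for name, values in CITY_LISTS.items():
        if city in values:
            return name
    raise KeyError(f"no partition lists the value {city!r}")
```

With hashing, the same key always lands in the same partition and keys tend to spread across partitions regardless of their natural ordering; with lists, placement is fully under the designer's control.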

    Partitioning (Cont.)

    Vertical Partitioning

    Vertical partitioning is done on the basis of attributes

    The same table is split into different physical records depending on the nature of accesses

    The primary key is repeated in all vertical partitions of a table so that the original table can be reconstructed

    Consider the Student relation

    STD (stId, sName, sAdr, sPhone, cgpa, prName, school, mtMrks, mtSubs, clgName, intMarks, intSubs, dClg, bMarks, bSubs)

    Partitioning (Cont.)

    We can partition this relation vertically as follows

    STD (stId, sName, sAdr, sPhone, cgpa, prName)

    STDACD (stId, school, mtMrks, mtSubs, clgName, intMarks, intSubs, dClg, bMarks, bSubs)
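A sketch of the vertical split and its reversal, assuming the key column is named stId in both partitions. The sample row values are invented.

```python
# Column lists for the two vertical partitions; stId appears in both.
STD_COLS = ("stId", "sName", "sAdr", "sPhone", "cgpa", "prName")
STDACD_COLS = ("stId", "school", "mtMrks", "mtSubs", "clgName",
               "intMarks", "intSubs", "dClg", "bMarks", "bSubs")


def split_vertically(row):
    """Split one full student row into its STD and STDACD parts."""
    std = {col: row[col] for col in STD_COLS}
    stdacd = {col: row[col] for col in STDACD_COLS}
    return std, stdacd


def rejoin(std, stdacd):
    """Rebuild the original row by matching the repeated primary key."""
    if std["stId"] != stdacd["stId"]:
        raise ValueError("partitions belong to different students")
    return {**std, **stdacd}


# Invented sample row covering every attribute of the original relation.
full_row = {
    "stId": 7, "sName": "Ali", "sAdr": "Lahore", "sPhone": "555-0100",
    "cgpa": 3.4, "prName": "BSCS", "school": "City School",
    "mtMrks": 780, "mtSubs": "Science", "clgName": "Govt College",
    "intMarks": 850, "intSubs": "Pre-Engineering", "dClg": "Degree College",
    "bMarks": 600, "bSubs": "Maths",
}
std_part, stdacd_part = split_vertically(full_row)
```

Queries that touch only personal details read the smaller STD records; the repeated key is what makes `rejoin` possible when the full row is needed.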

    Data Storage Concepts

    Physical Storage Media

    Storage media are classified according to the following characteristics:

    Speed of access

    Cost per unit of data

    Reliability

    RAID (Redundant Array of Inexpensive Disks)

    Many disks that appear as a single disk to the OS but offer better performance and better reliability

    RAID disk drives are used frequently on servers

    In RAID, the data are distributed over the drives to allow parallel operations

    Data Storage Concepts (Cont.)

    Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit

    Striping involves partitioning each drive's storage space into stripes, which may be as small as one sector (512 bytes) or as large as several megabytes

    The type of application environment, I/O intensive or data intensive, determines whether large or small stripes should be used

    Data Storage Concepts (Cont.)

    RAID-0

    Simple striping

    The virtual single disk is divided into strips of k sectors each

    Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss

    Data Storage Concepts (Cont.)

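    [Figure: RAID-0 striping across four disks. Disk 1 holds strips 1, 5, 9; disk 2 holds strips 2, 6, 10; disk 3 holds strips 3, 7, 11; disk 4 holds strips 4, 8, 12]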

    Note: This example is a basic virtual drive where

    each element depicted as a disk is a physical disk
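The round-robin placement used by simple striping can be sketched as a mapping from a logical strip number to a physical disk. The function names and 1-based numbering are assumptions chosen to match a four-disk array holding strips 1 to 12.

```python
def locate_strip(strip_no: int, n_disks: int) -> tuple:
    """Map a 1-based logical strip to (1-based disk, 0-based row on that disk)."""
    index = strip_no - 1
    return index % n_disks + 1, index // n_disks


def layout(n_strips: int, n_disks: int) -> list:
    """List, disk by disk, the strips each physical disk ends up holding."""
    disks = [[] for _ in range(n_disks)]
    for strip_no in range(1, n_strips + 1):
        disk, _row = locate_strip(strip_no, n_disks)
        disks[disk - 1].append(strip_no)
    return disks


four_disk_layout = layout(12, 4)
```

Consecutive strips land on different disks, so a large sequential read can keep all four drives busy at once; equally, losing any one disk removes a share of every large file.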

    Data Storage Concepts (Cont.)

    RAID-1

    RAID Level 1 provides redundancy by writing all data to two or more drives

    A level 1 array tends to be faster on reads and slower on writes than a single drive, but if a drive fails, no data is lost

    This level is commonly referred to as mirroring

    Data Storage Concepts (Cont.)

    [Figure: RAID-1 mirroring. Each of three drives holds an identical copy of blocks 1, 2, 3]

    Data Storage Concepts (Cont.)

    RAID-2, 3

    For reliability, a simple parity check code is used

    The parity bit is stored on a separate disk

    RAID-4

    RAID Level 4 stripes data at the block level across several drives, with parity stored on one drive

    The performance of a level 4 array is very good for reads (the same as level 0)

    Writes, however, require that the parity data be updated each time
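Parity as used in these levels can be sketched with byte-wise XOR. The strip contents below are invented, and real arrays work on whole disk blocks rather than four-byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data strips of one stripe (hypothetical contents), one per data drive.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)  # stored on the dedicated parity drive

# If the second data drive fails, its strip is the XOR of everything left.
rebuilt = xor_blocks([data[0], data[2], parity])

# A small write to the first strip must also update the parity:
# new parity = old parity XOR old data XOR new data.
new_strip = b"WXYZ"
new_parity = xor_blocks([parity, data[0], new_strip])
```

The last step shows why writes are the weak point: every write to any data drive also needs a read and a write on the single parity drive.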

    Data Storage Concepts (Cont.)

    RAID-5

    RAID Level 5 is similar to level 4, but distributes parity among the drives

    This can speed up small writes in multiprocessing systems, since the parity disk does not become a bottleneck
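The difference between a fixed and a distributed parity location can be sketched as follows. The rotation rule used here is one simple scheme chosen for illustration; actual RAID-5 implementations use several different layouts.

```python
def raid4_parity_disk(stripe_no: int, n_disks: int) -> int:
    """RAID-4: parity always lives on the last disk (0-based index)."""
    return n_disks - 1


def raid5_parity_disk(stripe_no: int, n_disks: int) -> int:
    """RAID-5, one simple rotation: parity moves back one disk per stripe."""
    return (n_disks - 1 - stripe_no) % n_disks


# Parity placement for the first eight stripes of a four-disk array.
raid4_parity = [raid4_parity_disk(s, 4) for s in range(8)]
raid5_parity = [raid5_parity_disk(s, 4) for s in range(8)]
```

With rotation, writes to different stripes update parity on different drives, so concurrent small writes are no longer serialized behind one disk.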

    RAID-0 is the fastest and most efficient array type but offers no fault tolerance

    RAID-1 is the array of choice for performance-critical, fault-tolerant environments

    RAID-2 is seldom used today since ECC is embedded in almost all modern disk drives

    Data Storage Concepts (Cont.)

    RAID-3 can be used in data-intensive or single-user environments that access long sequential records, to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped

    RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations

    RAID-5 is the best choice in multi-user environments that are not write-performance sensitive. However, at least three, and more typically five, drives are required for RAID-5 arrays