

    Hitachi Universal Storage Platform V

    Architecture and Concepts

    A White Paper

    By Alan Benway (Performance Measurement Group, Technical Operations)

Confidential - Hitachi Data Systems Internal Use Only

    June 2009


Notices and Disclaimer

Copyright 2009 Hitachi Data Systems Corporation. All rights reserved.

The performance data contained herein was obtained in a controlled, isolated environment. Results obtained in other operating environments may vary significantly. While Hitachi Data Systems Corporation has reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results can be obtained elsewhere.

All designs, specifications, statements, information and recommendations (collectively, "designs") in this manual are presented "AS IS," with all faults. Hitachi Data Systems Corporation and its suppliers disclaim all warranties, including without limitation, the warranty of merchantability, fitness for a particular purpose and non-infringement or arising from a course of dealing, usage or trade practice. In no event shall Hitachi Data Systems Corporation or its suppliers be liable for any indirect, special, consequential or incidental damages, including without limitation, lost profit or loss of or damage to data arising out of the use or inability to use the designs, even if Hitachi Data Systems Corporation or its suppliers have been advised of the possibility of such damages.

Universal Storage Platform is a registered trademark of Hitachi Data Systems, Inc. in the United States, other countries, or both.

    Other company, product or service names may be trademarks or service marks of others.

This document has been reviewed for accuracy as of the date of initial publication. Hitachi Data Systems Corporation may make improvements and/or changes in product and/or programs at any time without notice.

No part of this document may be reproduced or transmitted without written approval from Hitachi Data Systems Corporation.

WARNING: This document is HDS internal documentation for informational purposes only. It is not to be disclosed to customers or discussed without a proper non-disclosure agreement (NDA).


    Document Revision Level

Revision   Date             Description
1.0        December 2007    Initial Release
1.1        April 2008       Updates
2.0        February 2009    Major revision - concepts additions, updates
2.1        June 2009        Changes to MP Workload Sharing, eLUNs, ePorts, Cache Mode, and Concatenated Array Groups discussions; additional tables

    Reference

    Hitachi Universal Storage Platform V Performance Summary 08212008

Contributors

The information included in this document represents the expertise, feedback, and suggestions of a number of skilled practitioners. The author would like to recognize and thank the following reviewers of this document:

    Gil Rangel, Director of Performance Measurement Group - Technical Operations

    Dan Hood, Director, Product Management, Enterprise Arrays

    Larry Korbus, Director, Product Management, HDP, UVM, VPM features

    Ian Vogelesang, Performance Measurement Group - Technical Operations


    Table of Contents

Introduction
    Glossary
Overview of Changes
    Software Overview
    Hardware Overview
        Processor Upgrade
        Cache and Shared Memory
        Features and PCBs
Architecture Details
    Summary of Installable Hardware Features
    Logic Box Details
    Memory Systems Details
        Shared Memory (SMA)
        Cache Switches (CSW) and Data Cache (CMA)
        Data Cache Operations Overview
        Random I/O Cache Operations
        Sequential I/O Cache Operations and Sequential Detect
        BED and FED Local RAM
    Front-End Director Concepts
        FED FC-8 port, ESCON, and FICON: Summary of Features
        FED FC-16 port Feature
        I/O Request Limits and Queue Depths (Open Fibre)
        MP Distributed I/O (Open Fibre)
        External Storage Mode I/O (Open Fibre)
    Front-End Director Board Details
        Open Fibre 8-Port Feature
        Open Fibre 16-Port Feature
        ESCON 8-port Feature
        FICON 8-port Feature
    Back-end Director Concepts
        BED Feature Summary
    Back-End Director Board Details
        BED Details


        Back End RAID Level Organization
    Universal Storage Platform V: HDU and BED Associations by Frame
    Disk Details
        SATA Disks
    HDU Switched Loop Details
Universal Storage Platform V Configuration Overviews
    Small Configuration (2 FEDs, 2 BEDs)
    Midsize Configuration (4 FEDs, 4 BEDs)
    Large Configuration (8 FEDs, 8 BEDs)
Provisioning - Managing Storage Volumes
    Traditional Host-based Volume Management
    Traditional Universal Storage Platform Storage-based Volume Management
    Dynamic Provisioning Volume Management
        Usage Overview
    Hitachi Dynamic Provisioning Pools
    V-VOL Groups and DP Volumes
        Usage Mechanisms
        DPVOL Features and Restrictions
        Pool Page Details
        Pool Expansion
        Miscellaneous Hitachi Dynamic Provisioning Details
        Hitachi Dynamic Provisioning and Program Products Compatibility
        Universal Storage Platform V Volume Flexibility Example
Storage Concepts
    Understand Your Customer's Environment
    Disk Types
    RAID Levels
    Parity Groups and Array Groups
    RAID Chunks and Stripes
    LUNs (host volumes)
    Number of LUNs per Parity Group
    Port I/O Request Limits, LUN Queue Depths, and Transfer Sizes
        Port I/O Request Limits


        Port I/O Request Maximum Transfer Size
        LUN Queue Depth
    Mixing Data on the Physical Disks
    Workload Characteristics
    Selecting the Proper Disk Drive Form Factor
    Mixing I/O Profiles on the Physical Disks
    Front-end Port Performance and Usage Considerations
        Host Fan-in and Fan-out
        Mixing I/O Profiles on a Port
Summary
Appendix 1. Universal Storage Platform V (Frames, HDUs, and Array Groups)
Appendix 2. Open Systems RAID Mechanisms
Appendix 3. Mainframe 3390-x and Open-x RAID Mechanisms
Appendix 4. BED: Loop Maps with Array Group Names
Appendix 5. FED: 8-Port Fibre Channel Maps with Processors
Appendix 6. FED: 16-Port Fibre Channel Maps with Processors
Appendix 7. FED: 8-Port FICON Maps with Processors
Appendix 8. FED: 8-Port ESCON Maps with Processors
Appendix 9. Veritas Volume Manager Example
Appendix 10. Pool Details
Appendix 11. Pool Example
Appendix 12. Disks - Physical IOPS Details
Appendix 13. File Systems and Hitachi Dynamic Provisioning Thin Provisioning
Appendix 14. Putting SATA Drive Performance into Perspective
Appendix 15. LDEVs, LUNs, VDEVs and More
    Internal VDEV
    External VDEV
    CoW VDEV
    Dynamic Provisioning VDEV
    Layout of internal VDEVs on the parity group
Appendix 16. Concatenated Parity Groups
    Advantages of concatenated Parity Groups
    Disadvantages of concatenated Parity Groups
    Using LDEVs on concatenated Parity Groups as Dynamic Provisioning pool volumes
    Summary of the characteristics of concatenated Parity Groups (VDEV interleave)


Introduction

This document covers the architecture and concepts of the Hitachi Universal Storage Platform V. The concepts and physical details regarding each hardware feature are covered in the following sections. However, this document is not intended to cover any aspects of program products, databases, customer-specific environments, or new features available by the second general release. Areas not covered by this document include:

TrueCopy/ShadowImage/Universal Replicator Disaster Recovery Solutions
Host Logical Volume Management general guidelines
Hitachi Dynamic Link Manager (HDLM) general guidelines
Universal Volume Management (UVM) virtualization of external storage for Data Lifecycle Management (DLM)
Virtual Partition Management (VPM) general guidelines for workload management
Oracle general guidelines for storage configuration
Microsoft Exchange general guidelines for storage configuration

This document is intended to familiarize Hitachi Data Systems sales personnel, technical support staff, customers, and value-added resellers with the features and concepts of the Universal Storage Platform V. The users that will benefit from this document are those who already possess an in-depth knowledge of the Hitachi TagmaStore Universal Storage Platform architecture.

Glossary

Throughout this paper the terminology used by Hitachi Data Systems (not Hitachi) will normally be used. As some storage terminology is used differently in Hitachi documentation or by users in the field, here are some definitions as used in this paper:

Array Group - The term used to describe a set of four physical disk drives installed as a group in the subsystem. When a set of one or two Array Groups (four or eight disk drives) is formatted using a RAID level, the resulting formatted entity is called a Parity Group. Although technically the term Array Group refers to a group of bare physical drives, and the term Parity Group refers to something that has been formatted as a RAID level and therefore actually has parity data (here we consider a RAID 10 mirror copy as parity data), be aware that this technical distinction is often lost and you will see the terms Parity Group and Array Group used interchangeably.


BED - Back-end Director feature; the pair of Disk Adapter (DKA) PCBs providing 4 pairs of back-end Fibre Channel loops. Pairs of features (4 PCBs, or 8 pairs of loops) are always installed.

CHA - Hitachi's name for the FED.

CMA - Cache Memory Adapter board (8 x 1064MB/sec ports, 4 banks of RAM using 16 DDR2 DIMM slots).

Concatenated Parity Group - A configuration where the VDEVs corresponding to a pair of RAID 10 (2D+2D) or RAID 5 (7D+1P) Parity Groups, or a quad of RAID 5 (7D+1P) Parity Groups, are interleaved on a RAID stripe by RAID stripe, round-robin basis on their underlying disk drives. This has the effect of dispersing I/O activity over twice or four times the number of drives, but it does not change the number, names, or size of VDEVs, and hence it doesn't make it possible to assign larger LDEVs. Note that we often refer to RAID 10 (4D+4D), but this is actually two RAID 10 (2D+2D) Parity Groups interleaved together. For a more comprehensive explanation see the Concatenated Parity Group section in the appendix.

CSW - Cache Switch board (16 x 1064MB/sec ports), a 2D version of a Hitachi Supercomputer 3D crossbar switch with nanosecond latencies (port to port).

DKC - Hitachi's name for the base control frame (or rack) where the Logic Box and up to 128 disks are located.

DKA - Hitachi's name for the BED.

DPVOL - a Dynamic Provisioning (DP) Volume; the Virtual Volume from an HDP Pool. Some documents refer to this as a V-VOL. It is a member of a V-VOL Group. Each DPVOL has a user-specified size between 8GB and 4TB.

Feature - an installable hardware option (such as FED, BED, CSW, CMA, SMA) that includes two PCBs (one per power domain). Note that two BED features must be installed at a time.

FED - Front-end Director feature; the pair of Channel Adapter (CHA) PCBs used to attach hosts to the storage system, providing Open FC, FICON, or ESCON attachment.

HDU (Hard Disk Unit) - the 64-disk container in a frame that has 32 disk slots on the front side and 32 more on the back. The HDU is further split into left and right halves, each on separate power domains. There are up to four HDUs per expansion frame, and two in the control frame.

LDEV (Logical Device) - A logical volume internal to the subsystem that can be used to contain customer data. LDEVs are uniquely identified within the subsystem using a six-hex-digit identifier in the form LDKC:CU:LDEV. LDEVs are carved from a VDEV (see VDEV), and thus there are four types of LDEVs: internal LDEVs, external LDEVs, CoW V-VOLs, and DPVOLs. LDEVs may be mapped to a host as a LUN, either as a single LDEV, or as a set of up to 36 LDEVs combined in the form of a LUSE. Note: what is called an LDEV in HDS enterprise subsystems like the Universal Storage Platform V is called an LU or LUN in HDS modular subsystems like the AMS family, and what is called a LUN in HDS enterprise subsystems is called an H-LUN in HDS modular subsystems.

LUN (Logical Unit Number) - the host-visible identifier assigned by the user to an LDEV to make it visible on a host port. An internal LUN has a nominal Queue Depth limit of 32, and an external (virtualized) LUN has a Queue Depth limit of 2-128 (adjustable).


eLUN (external LUN) - an external LUN is one which is located in another storage array that is attached via two or more ePorts on a Universal Storage Platform V and accessed by the host through the Universal Storage Platform V.

ePort (external Port) - An external array connects to the Universal Storage Platform V via two or more of its Fibre Channel FED ports instead of to a host. The FED ports used in this manner are changed from a host target into an initiator port (or external Port) by use of the Universal Volume Manager software product. The Universal Storage Platform V will discover any exported LUNs on each ePort, and these will be configured as eLUNs used by hosts attached to the Universal Storage Platform V.

LUSE (Logical Unit Size Expansion) - A concatenation (spill and fill) of 2 to 36 LDEVs (up to a 60TB limit) that is then presented to a host as a single LUN. A LUSE will normally perform at the level of just one of these LDEVs.

MP (microprocessor) - the CPU used on the FEDs and BEDs. Also called CHP (FED) or DKP (BED).

Parity Group - a set of one or two Array Groups (a set of 4 or 8 disk drives) formatted as a RAID level, either as RAID 10 (often referred to as RAID 1 in HDS documentation), RAID 5, or RAID 6. The Universal Storage Platform V's Parity Group types are RAID 10 (2D+2D), RAID 5 (3D+1P), RAID 5 (7D+1P), and RAID 6 (6D+2P). Internal LDEVs are carved from the VDEV(s) corresponding to the formatted space in a Parity Group, and thus the maximum size of an internal LDEV is determined by the size of the VDEV it is carved from. The maximum size of an internal VDEV is approximately 2.99TB. If the formatted space in a Parity Group is bigger than 2.99TB, then as many maximum-size VDEVs as possible are assigned, and then the remaining space is assigned as the last, smaller VDEV. Note that there actually is no 4+4 Parity Group type - see Concatenated Parity Group.

PCB - printed circuit board; an installable board (adapter). There are two PCBs per Feature.

PDEV (Physical DEVice) - a physical internal disk drive.

SMA - Shared Memory Adapter board (64 x 150MB/sec ports, 2 banks of RAM using 8 DDR2 DIMM slots, up to 8GB).

SVP - Service Processor (a PC running Windows XP) installed in the control rack.

VDEV - The logical container from which LDEVs are carved. There are four types of VDEVs:

o Internal VDEV (2.99TB max): maps to the formatted space within a parity group that is available to store user data. LDEVs carved from a parity group VDEV are called internal LDEVs.
o External storage VDEV (2.99TB max): maps to a LUN on an external (virtualized) subsystem. LDEVs carved from external VDEVs are called external LDEVs.
o Copy-on-Write (CoW) VDEV (2.99TB max): called a "V-VOL group", and LDEVs carved from a CoW V-VOL group are called CoW V-VOLs.
o Dynamic Provisioning (DP) VDEV (4TB max): called a V-VOL group, and LDEVs carved from a DP V-VOL group are called DPVOLs (Dynamic Provisioning Volumes).

V-VOL Group - the organizational container of either a Dynamic Provisioning VDEV or a Copy-on-Write VDEV. With Dynamic Provisioning, it is used to hold one or more DPVols.


Overview of Changes

Expanding on the proven and superior Hitachi TagmaStore Universal Storage Platform technology, the Universal Storage Platform V offers a new level of Enterprise Storage, capable of meeting the most demanding of workloads while maintaining great flexibility. The Universal Storage Platform V offers much higher performance, higher reliability, and greater flexibility than any competitive offering today.

These are the new features that distinguish the completely revamped Universal Storage Platform V from the previous Universal Storage Platform models:

Software
o Hitachi Dynamic Provisioning (HDP) volume management feature.

Hardware
o Enhanced Shared Memory system (up to 24GB and 256 paths @ 150MB/s).
o Faster, newer generation 800MHz NEC RISC processors on FEDs (Channel Adapters) and BEDs (Disk Adapters).
o FED MPs have an MP Workload Sharing function (per PCB).
o BEDs have 4Gbit/s back-end disk loops.
o Switched FC-AL loop interface between BEDs and disks.
o Half-sized PCBs are now used (except for Shared Memory PCBs), allowing for more flexible FED configuration combinations.
o Support for internal 1TB 7200 RPM SATA II disks.
o Support for Flash Drives.

Software Overview

The Universal Storage Platform V software includes Hitachi Dynamic Provisioning, a major new Open Systems volume management feature that will allow storage managers and system administrators to more efficiently plan and allocate storage to users or applications. This new feature provides two new capabilities: thin provisioning and enhanced volume performance. Hitachi Dynamic Provisioning provides for the creation of one or more Hitachi Dynamic Provisioning Pools of physical space (each Pool assigned multiple LDEVs from multiple Parity Groups of the same disk types and RAID level), and for the establishment of DP volumes (DPVOLs, or virtual volumes) that are connected to a single Hitachi Dynamic Provisioning Pool.

Thin provisioning comes from the creation of DPVOLs of a user-specified logical size without any corresponding allocation of physical space. Actual physical space (allocated as 42MB Pool pages) is only assigned to a DPVOL from the connected Hitachi Dynamic Provisioning Pool as that DPVOL's logical space is written to by the host over time. A DPVOL does not have any Pool pages assigned to it when it is first created. Technically, it never does - the pages are loaned out from its connected Pool to that DPVOL until the volume is deleted from the Pool. At that point, all of that DPVOL's assigned pages are returned to the Pool's Free Page List. Certain individual pages can be freed or reclaimed from a DPVOL using facilities of the Universal Storage Platform V.

The volume performance feature is an automatic result of the manner in which the individual Hitachi Dynamic Provisioning Pools are created. A Pool is created using 2-1024 LDEVs (Pool Volumes) that provide the physical space, and the Pool allocates 42MB Pool pages on demand to any of the DPVOLs connected to that Pool. Each individual 42MB Pool page is consecutively laid down on a whole number of RAID stripes from one Pool Volume. Other pages assigned over time to that DPVOL will randomly originate from the next free 42MB page from one of the other Pool Volumes in that Pool.
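To make the page mechanics above concrete, the following minimal Python sketch models how 42MB pages might be loaned from a Pool's free list to a DPVOL on first write. It is an illustration only; the class and method names (HdpPool, Dpvol, allocate_page) are invented for this sketch and do not correspond to any actual microcode, API, or CLI.

    PAGE_MB = 42  # HDP allocation unit described above

    class HdpPool:
        """Toy model of an HDP Pool: a free list of 42MB pages spread over Pool Volumes."""
        def __init__(self, pool_volumes, pages_per_volume):
            # Interleave the free list across Pool Volumes so successive
            # allocations tend to land on different Pool Volumes (and disks).
            self.free_pages = [(vol, page)
                               for page in range(pages_per_volume)
                               for vol in pool_volumes]

        def allocate_page(self):
            if not self.free_pages:
                raise RuntimeError("Pool full")
            return self.free_pages.pop(0)

    class Dpvol:
        """Toy DPVOL: the logical size is promised up front, pages arrive only on write."""
        def __init__(self, name, logical_gb, pool):
            self.name, self.logical_gb, self.pool = name, logical_gb, pool
            self.page_map = {}      # logical page index -> (pool volume, page)

        def write(self, offset_mb):
            index = offset_mb // PAGE_MB
            if index not in self.page_map:       # first write into this 42MB region
                self.page_map[index] = self.pool.allocate_page()
            return self.page_map[index]

    pool = HdpPool(pool_volumes=["LDEV-00", "LDEV-01", "LDEV-02"], pages_per_volume=4)
    vol = Dpvol("DPVOL-1", logical_gb=100, pool=pool)
    vol.write(0); vol.write(10); vol.write(500)  # only 2 distinct 42MB pages consumed
    print(len(vol.page_map), "pages allocated of", vol.logical_gb, "GB promised")

The point of the sketch is simply that capacity is consumed in whole 42MB pages, and only for regions a host has actually written.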


As an example, assume that there are 12 LDEVs from twelve RAID 10 (2D+2D) Parity Groups assigned to a Hitachi Dynamic Provisioning Pool. All 48 disks in that Pool will contribute their IOPS and throughput power to all of the DPVOLs connected to that Pool. If more random read IOPS horsepower was desired for that Pool, then it could have been, for example, created with 16 LDEVs from 16 RAID 5 (7D+1P) Parity Groups, thus providing 128 disks of IOPS power to that Pool.

As up to 1024 LDEVs may be assigned as Pool Volumes to a single Pool, this would provide a considerable amount of I/O power to that Pool's DPVOLs. This type of aggregation of disks was only possible previously by the use of often expensive and somewhat complex host-based volume managers (such as VERITAS VxVM) on each of the attached servers. On the Universal Storage Platform (and the Universal Storage Platform V), the only alternative would be to build Striped Parity Groups (Concatenated Parity Groups - see Appendix 16) using two or four RAID 5 (7D+1P) Parity Groups. This would provide either 16 or 32 disks under the volumes (VDEVs) created there. There is also the LUSE option, but that is merely a simple concatenation of 2-36 LDEVs, a spill-and-fill capacity (not performance) configuration.

A common question is: "How does the performance of DPVOLs differ from the use of Standard Volumes when using the same number of disks?" Consider this example. Say that you are using 32 disks in 8 Parity Groups as RAID 10 (2D+2D) with 8 LDEVs, all used as Standard Volumes over 4 host paths. Compare this to a Hitachi Dynamic Provisioning Pool with those same 8 LDEVs as Pool Volumes, with 8 DPVOLs configured against that Pool and used over 4 host paths. If the hosts are applying a heavy, uniform, concurrent workload to all 8 Standard Volumes, then the server will see about the same aggregate IOPS capacity as would be available from the 8 DPVOLs with the same workload. However, if the workloads per volume (Standard or DPVOL) are not uniform, or see intermittent workloads over time, the DPVOLs will always deliver a constant and much higher IOPS capacity than will the individual Standard Volumes. If only four volumes (Standard or DPVOL) were simultaneously active, then the four Standard Volumes would only have a 16-disk IOPS capacity while the four DPVOLs would always have the full 32-disk IOPS capacity. Another way to see this is that the use of Standard Volumes can easily lead to hot spots, whereas the use of DPVOLs will mostly avoid them.
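The arithmetic behind this comparison can be sketched as follows. This is a deliberately simplified model that ignores cache effects and RAID write penalties, and the 200 IOPS-per-disk figure is an assumed round number rather than a measured value:

    # 8 x RAID 10 (2D+2D) Parity Groups = 32 disks; assume ~200 random IOPS per disk.
    DISKS_TOTAL, IOPS_PER_DISK = 32, 200

    def standard_volume_capacity(active_volumes):
        # Each Standard Volume sits on one 4-disk Parity Group, so only the
        # disks behind the currently active volumes contribute.
        return active_volumes * 4 * IOPS_PER_DISK

    def dpvol_capacity(active_volumes):
        # Every DPVOL draws its pages from all Pool Volumes, so all 32 disks
        # contribute as long as at least one DPVOL is busy.
        return DISKS_TOTAL * IOPS_PER_DISK if active_volumes else 0

    for active in (8, 4):
        print(active, "active volumes:", standard_volume_capacity(active),
              "IOPS (Standard) vs", dpvol_capacity(active), "IOPS (DPVOL pool)")

With all 8 volumes busy the two configurations are equivalent; with only 4 busy, the Standard Volumes fall back to a 16-disk capacity while the DPVOLs keep the full 32-disk capacity, which is the hot-spot effect described above.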

    Hardware Overview

    Figure 1. Frame and HDU Map

(Figure 1 shows the frame layout, left to right: L2 Frame (DKU-L2), L1 Frame (DKU-L1), the Control Frame (DKC) containing the Logic Box, R1 Frame (DKU-R1), and R2 Frame (DKU-R2). Expansion frames hold HDU-1 through HDU-4 and the control frame holds HDU-1 and HDU-2; each HDU is split into left and right halves, for example HDU-1L and HDU-1R, with 16 HDDs in front and 16 in the rear of each half.)


While there are many physically apparent changes to the Universal Storage Platform V chassis and PCB cards from the previous Universal Storage Platform model, there are also a number of not-so-evident internal configuration changes that an SE must be aware of when laying out a system.

The Universal Storage Platform V is a collection of frames, HDUs, PCB cards in a Logic Box, FC-AL switches and disks (see Figure 1). The frames include the control frame (DKC) and the disk expansion frames (DKUs). Disks are added to the 64-disk HDU containers (up to 18 such HDUs) in sets of four (the Array Group). Array Groups are installed (following a certain upgrade order) into specific HDU disk slots on either the left or right half (such as HDU-1R or HDU-1L) and front and rear of an HDU box. Sets of HDUs are controlled by one of the BEDs.
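As a quick cross-check of these container sizes (a back-of-the-envelope calculation, not a configuration rule):

    HDUS_MAX = 18              # up to 18 HDUs across the control and expansion frames
    DISKS_PER_HDU = 64         # 32 slots on the front plus 32 on the rear
    DISKS_PER_ARRAY_GROUP = 4

    max_disks = HDUS_MAX * DISKS_PER_HDU
    print(max_disks, "disk slots =", max_disks // DISKS_PER_ARRAY_GROUP, "Array Groups")
    # 1152 disk slots = 288 Array Groups, matching the 1152-disk limit shown in Table 2.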

    Processor Upgrade

The processor (MP) used on the FEDs (the MPs are also called CHPs) and BEDs (the MPs are also called DKPs) has been improved and its clock speed has been doubled. The quantities of processors represented in Table 1 are values per PCB by feature. A feature is defined as a pair of PCBs where each board is located on a separate power boundary. As there are twice as many FED/BED features in the Universal Storage Platform V as compared to the Universal Storage Platform, the overall processor count for FEDs and BEDs remains the same but the available processing power has been doubled.

    Table 1. Universal Storage Platform V Processor Enhancements

MPs per PCB

Feature               USP           USP V
FED ESCON 8-port      2 x 400MHz    -
FED ESCON 16-port     4 x 400MHz    -
FED FICON 8-port      -             4 x 800MHz
FED FICON 16-port     8 x 400MHz    4 x 800MHz
FED FC 8-port         -             4 x 800MHz
FED FC 16-port        4 x 400MHz    4 x 800MHz
FED FC 32-port        8 x 400MHz    -
BED FC-AL 8-port      -             4 x 800MHz
BED FC-AL 16-port     8 x 400MHz    -

    Cache and Shared Memory

The Data Cache system carries over the same path speeds (1064MB/s) and counts (up to 64 paths) from the Universal Storage Platform, with a peak wire-speed bandwidth of 68GB/s. The Shared Memory (or Control Memory) system has been significantly upgraded over the Universal Storage Platform, with 256 paths (up from 192) operating at 150MB/s (up from 83MB/s), with a peak bandwidth of 38.4GB/s (up from 15.9GB/s). On the Universal Storage Platform, the FED PCBs had 8 Shared Memory paths, and the BEDs had 16. Now all of the PCBs have 16 Shared Memory paths.

Note that the Data Cache system contains only the actual user data blocks. The Shared Memory system holds all of the metadata about the internal Parity Groups, LDEVs, external LDEVs, and runtime tables for various software products. There can be up to 512GB of Data Cache and 24GB of Shared Memory.
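The quoted bandwidth figures follow directly from the path counts and per-path speeds. A quick wire-speed check (protocol overhead is not included):

    # Data Cache: 64 paths at 1064MB/s; Shared Memory: 256 paths at 150MB/s
    data_bw = 64 * 1064 / 1000      # 68.1 GB/s (the paper rounds this to 68GB/s)
    ctrl_bw = 256 * 150 / 1000      # 38.4 GB/s
    print(f"Data {data_bw:.1f} GB/s + Control {ctrl_bw:.1f} GB/s "
          f"= {data_bw + ctrl_bw:.1f} GB/s combined")
    # Rounded as in the paper, 68 + 38.4 gives the 106.4GB/s aggregate quoted later.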


    Features and PCBs

Logic boards (PCBs) are installed in the front and rear slots in the Logic Box in the Control Frame. The logic board types for the Universal Storage Platform V include the following (as features, each a pair of PCBs):

CSW - Cache Switch: 1 to 4 features (2 to 8 PCBs)
CMA - Cache Memory: 1 to 4 features (2 to 8 PCBs)
SMA - Shared Memory: 1 or 2 features (2 to 4 PCBs)
FED - Front-end Director (or Channel Adapter): 1 to 14 features (2 to 28 PCBs)
BED - Back-end Director (or Disk Adapter): 2, 4, 6, or 8 features (4, 8, 12, or 16 PCBs) of 4Gb/sec Fibre Channel loops

The Universal Storage Platform V's new half-sized PCBs (now installed in upper and lower, front and rear slots in the Logic Box, described later) allow for a less costly, more incremental expansion of a system. Also, the BED slots will support the use of additional FED PCBs (but 2 BED features must be installed at a minimum). For example, there could be 14 FED features (28 PCBs) installed in a Universal Storage Platform, and they could be any mixture of Open Fibre, ESCON, FICON, and iSCSI. However, this gave you a large number of ports of a single type that you may not need, with a substantial reduction of other port types that you may need to maximize. With the new half-sized cards, you can have 1-8 FED features (or up to 14 FED features if the number of BED features is reduced to just 2), using any mixture (per feature) of the interface types as before. Now, as there are half as many ports per board, smaller numbers of lesser-used port types may be installed. Features are still installed as pairs of PCB cards just as with the Universal Storage Platform.

    Table 2. Summary of Limits, Universal Storage Platform to Universal Storage Platform V

Limits                            USP          USP V
Data Cache (GB)                   128          512
Raw Cache Bandwidth               68GB/sec     68GB/sec
Shared Memory (GB)                12           24
Shared Memory Paths (max)         192          256
Raw Shared Memory Bandwidth       15.9GB/sec   38.4GB/sec
Fibre Channel Disks               1152         1152
SATA Disks                        -            1152
Logical Volumes                   16k          64k
Max Internal Volume Size          2TB          2.99TB
Max CoW Volume Size               2TB          4TB
Max External Volume Size          2TB          4TB
IO Request Limit per FC FED MP    2048         4096
Nominal Queue Depth per LUN       16           32
HDP Pools                         -            128
Max Pool Capacity                 -            1.1PB
Max capacity of all Pools         -            1.1PB
LDEVs per Pool                    -            1024
DP Volumes per Pool               -            8192
DP Volume Size Range              -            46MB - 4TB


For quick reference, Figure 2 above depicts the fully optioned Universal Storage Platform V architecture. Other Universal Storage Platform V configurations are detailed further down. This architecture includes 64 x 1064MB/sec Data paths representing 68GB/sec of data bandwidth and 256 x 150MB/sec Control paths representing 38.4GB/sec of metadata and control bandwidth. When comparing the Universal Storage Platform V to other vendors' monolithic cache systems (such as EMC DMX-4), the aggregate of the Universal Storage Platform V's Data + Control bandwidths must be used for an apples-to-apples comparison. The throughput aggregate for the fully optioned system is a fully usable 106.4GB/s. This includes any overhead for block mirroring operations that occur in both the Data Cache and Shared Memory systems (discussed further down).

Summary of Installable Hardware Features

The tables below show overviews of the available features for both the Universal Storage Platform and the Universal Storage Platform V. Except for the Shared Memory PCBs, all of the other PCBs are now half-sized. The lower two tables show the trade-off between installed Universal Storage Platform V BED features and available Universal Storage Platform V FED features. If someone needs a large number of ports (and reduced amounts of internal disks and performance) for virtualization, additional FED features may be installed in the place of up to 6 BED features.

    Table 3. Comparison: Universal Storage Platform and Universal Storage Platform V Features

Features         USP    USP V
FEDs             1-7    1-14
BEDs             1-4    2, 4, 6, 8
Cache            2      4
Shared Memory    2      2
Cache Switches   2      4

FED 16-Port Fibre Channel Kits Installed (total ports by number of FED features)

BED pairs installed  FED#1 FED#2 FED#3 FED#4 FED#5 FED#6 FED#7 FED#8 FED#9 FED#10 FED#11 FED#12 FED#13 FED#14
1 & 2                16    32    48    64    80    96    112   128   144   160    176    192    208    224
3 & 4                16    32    48    64    80    96    112   128   144   160    176    192
5 & 6                16    32    48    64    80    96    112   128   144   160
7 & 8                16    32    48    64    80    96    112   128

FED 8-Port FICON or ESCON Channel Kits Installed (total ports by number of FED features)

BED pairs installed  FED#1 FED#2 FED#3 FED#4 FED#5 FED#6 FED#7 FED#8 FED#9 FED#10 FED#11 FED#12 FED#13 FED#14
1 & 2                8     16    24    32    40    48    56    64    72    80     88     96     104    112
3 & 4                8     16    24    32    40    48    56    64    72    80     88     96
5 & 6                8     16    24    32    40    48    56    64    72    80
7 & 8                8     16    24    32    40    48    56    64

Logic Box Details

The Logic Box chassis (main DKC frame) is where all of the different types of PCBs for the features are installed in specific slots (see Figure 3). This layout is very different from the previous Universal Storage Platform model. All of the PCBs are now half-sized, and there are upper and lower slots in addition to


the front and back layout. The associations among BEDs, FEDs and the CSWs (cache switches) are also different. The factory names of the PCB types and their option numbers are shown as well. Note that FED upgrade features may consume up to six pairs of unused BED slots.

    Figure 3. Logic Box Slots (BED slots 3-8 can be used for FEDs if desired)

Memory Systems Details

The Universal Storage Platform V memory subsystems are unique among all enterprise arrays on the market. All other designs use a single global cache space that controls the operation of the array. All accesses for data, control tables, metadata, replication software tables and such go against the same common cache system. All of these activities compete with one another over the same internal paths for cache bandwidth. As such, cache access is a primary choke point on competing designs.

The Universal Storage Platform V has four parallel high-performance memory systems that isolate access to data, metadata, control tables and array software. These include:

Shared Memory (Control Memory) system for metadata and control tables
Data cache (data blocks only)
Local RAM pool on each FED and BED PCB (up to 32 such PCBs in an array) for use by the four MP processors on each board (128 processors in a full array). This RAM is used for the workspace for each MP, local data block caching and local LUN management tables.
NVRAM region for each MP that holds the microcode and optional software packages.

(Figure 3, referenced above, maps the Logic Box slots: the front side is Cluster-1 and the rear side is Cluster-2, each with upper and lower half-sized slots holding FED-1 through FED-8, BED-1 through BED-8, CSW-0 through CSW-3, CMA-1 through CMA-4, and SMA-1/SMA-2, labeled with their factory slot codes and option numbers.)


    Shared Memory (SMA)

The SMA subsystem is critical to achieving the Universal Storage Platform V's very high array performance, as all control and metadata information is contained within this system. Access to and management of all data blocks in the Data Cache is managed by the SMA system. The SMA subsystem is unique among all enterprise arrays on the market. All other designs use a single global cache for all data, metadata, control tables, and software. These other designs create a serialization of access into cache for all of these competing operations. The SMA design removes all of the non-data access to Data Cache, allowing data blocks to be moved unimpeded at very high rates.

The SMA subsystem is sized according to the array configuration (hardware and software). A Universal Storage Platform V can have up to 24GB of SMA installed across two features (4 PCBs). A section of each board is mirrored onto the other board of the feature for redundancy.

The Logic Box chassis has slots for four SMA PCBs. One SMA feature is installed in the base system and one more feature (2 PCBs) is an upgrade option. Due to the way the Shared Memory subsystem functions, the optional SMA feature should always be installed in a system. As described below, each FED and BED PCB has 8 paths into the Shared Memory subsystem. It is important to know that these paths are hardwired to specific ports on each SMA PCB. Each FED or BED PCB directs 2 SMA paths to each installed SMA PCB. With all 4 SMA PCBs, this means that all 8 SMA paths per FED/BED PCB are connected.

    Figure 4. Shared Memory PCB

Each SMA PCB has 8 DDR2-333 DIMM slots, organized as two separate banks of RAM (4 slots each). A single board can support up to 8GB of RAM when using either 1GB or 4GB DIMMs. The pair of boards for each SMA feature is installed in different power domains in the Logic Box (Cluster-1 and Cluster-2). Each PCB has 64 150MB/s SMA (control) paths. This provides for 9.6GB/s of bandwidth (wire speed) per board. Each bank of RAM has a peak bandwidth of 5.19GB/s, or a concurrent total of 10.38GB/s per board. Due to the very high speed of the RAM compared to the individual SMA ports, with port buffering and port interleaving, the RAM allows the 64 SMA paths to operate at full speed.
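The claim that all 64 ports can run at full speed follows from the RAM banks having more aggregate bandwidth than the ports they feed. A minimal check using the numbers above:

    port_bw = 64 * 150 / 1000     # 9.6 GB/s of SMA port wire speed per board
    ram_bw = 2 * 5.19             # 10.38 GB/s across the two RAM banks per board
    print(f"ports {port_bw:.2f} GB/s vs RAM {ram_bw:.2f} GB/s "
          f"(headroom {ram_bw - port_bw:.2f} GB/s)")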


For each SMA feature, there is a small region that is mirrored onto the other board in that pair for redundancy. (Note: The region used for Hitachi Dynamic Provisioning control is backed up onto a private disk region on some Pool Volumes.) The Cluster-1 SMA boards are associated with the Cache-A region of the Data Cache (explained below), while the Cluster-2 boards are associated with the Cache-B region.

The basic configuration information for the array is located in the lower memory address range of Shared Memory. This is also backed up on the NVRAM located on each FED and BED PCB in the system. The metadata and tables for optional software products (such as Hitachi Dynamic Provisioning) in the other (higher) memory address regions in Shared Memory are backed up to the internal disk in the Service Processor (SVP PC) at array power-down.

    Cache Switches (CSW) and Data Cache (CMA)

Each Universal Storage Platform V has up to 8 Data Cache boards (CMA) and 8 Cache Switch boards (CSW). The cache switches connect the individual cache boards to the cache ports on each FED and BED board. The Logic Box chassis has slots for 8 Cache Memory (CMA) and 8 Cache Switch (CSW) PCBs. One CMA feature and one CSW feature are installed in the base system, and three more features (6 PCBs) of each type may be added as upgrades. The eight 1064MB/s cache ports on each CMA board are attached to half of the ports on the "cache side" of the central Cache Switches. The "processor side" ports on the Cache Switches are connected to FED and BED cache ports. Due to the way the Data Cache subsystem is interconnected through the Cache Switch system, any CMA port on any FED or BED board can address any cache board. As described below, each FED and BED PCB has 2 paths into the Cache Switch subsystem.

    CSW - Cache Switch

Each Cache Switch board (CSW) has 16 bidirectional ports (8 to cache ports, 8 to FED/BED ports), each port operating at 1064MB/s. These cache switches are very high performance crossbar switches as used in high performance supercomputers and servers. In fact, these CSW boards are a 2D scaled-down version of a 3D switch used in Hitachi supercomputers. The CSW was designed by the Hitachi Supercomputer and Hitachi Networks divisions. The internal port-to-port latency of this switch is in the nanosecond range.

The number of CSW features needed depends purely on the number of CHA/DKA features installed.

Adding more CSW features to an existing configuration only permits more FED/BED features to be installed.
Adding more CSW features does not affect the performance of existing FED/BED features.

Each CSW board can sustain 8 1064MB/s transfers between the 8 FED/BED paths and the 8 cache ports for an aggregate of 8.5GB/s per board. As there may be up to four CSW features installed per system (8 CSW PCBs), there can be 64 cache-side ports switching among 64 FED/BED processor ports. This provides for a total of 68GB/s per array of non-blocking, data-only bandwidth to cache from the FED/BED boards at extremely small (nanosecond) latency rates. No other vendor has a non-blocking design (despite some unsupportable claims to the contrary), nor are they able to provide sustained data-only bandwidth at anywhere near this rate. This bandwidth is not constrained by the type of I/O (read versus write) since all cache paths are full speed and bidirectional.

Figure 5 illustrates a Universal Storage Platform V array with only two CSW and two CMA features installed. If all four CSW and CMA features (8 boards each) were installed, then each cache-side path


on a CSW PCB would go to one of the eight CMA PCBs. Every CSW has at least one path to every CMA. Farther down in this document are maps of the associations of FEDs and BEDs to the CSWs. There is a certain relationship from the CSW processor side to the FED and BED CMA ports.

    Figure 5. Universal Storage Platform V with Two Cache Features and Two Cache Switch Features Installed.

    CMA - Data Cache

Each CMA cache board has 16 DDR2-400 DIMM slots, organized as four banks of RAM (thus 32 independent banks in all four features). Each CMA board can support 8-64GB of RAM using a single size of DIMM across all installed CMA boards. The same amount of RAM must be installed on each CMA board. The pair of boards for a CMA feature is installed in different power domains in the Logic Box (Cluster-1 and Cluster-2).

Each CMA has 8 1064MB/s bidirectional data paths. This provides for 8.5GB/s of read-write bandwidth (wire speed) per board. Each bank of RAM has a peak bandwidth of 6.25GB/s, or 25GB/s per board. Due to the very high speed of the RAM compared to the CMA ports, with port buffering and port interleaving, the RAM allows all 8 CMA cache paths per board to operate at full speed.

    Figure 6. Data Cache Board (one of a feature pair)



    Data Cache Operations Overview

The Data Cache is organized into two contiguous address ranges named Cache-A and Cache-B. The RAM located across the Cluster-1 CMA boards is a single address range called Cache-A, while that of the Cluster-2 boards is called Cache-B. The address range of Cache-B immediately follows the end of Cache-A (illustrated below). This entire space is addressable as units of 2KB cache blocks. A data block for an I/O read request can be placed in either Cache range by the BED MP (or FED MP for external storage I/O operations). Write operations are simultaneously duplexed into both Cache-A and Cache-B by the FED MP processing the request. Note that only the individual write blocks are mirrored, not the entire cache space.

Figure 7. Cache Address Space - Cache-A and Cache-B

The data cache is logically divided up into logical 256KB cache slots across the entire linear cache space (the concatenation of Cache-A and Cache-B). It is a single uniform address space rather than a set of isolated partitioned spaces where only certain devices are mapped to individual regions on independent cache boards.

When created, each Universal Storage Platform V LDEV (a partition from a Parity Group) is allocated an initial set of 16 cache slots for Random I/O operations. A separate set of 24 cache slots is allocated when a FED MP detects sequential I/O operations occurring on that LDEV. The initial set of 16 cache slots (per LDEV) for use in processing Random I/O requests can be temporarily expanded by one or more additional sets of 16 random I/O 256KB slots as needed (if available from the associated cache partition's free space).

Since individual LDEVs can dynamically grow to very large sizes in terms of their cache footprint based on the workloads, Cache Partitioning (using the Virtual Partition Management package) can be used to fence off the cache space usable by selected Parity Groups (with all of their LDEVs) to manage their maximum allocated random slot sets.
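The per-LDEV slot accounting described above can be sketched as a small Python model. The names and the expansion policy (add another set of 16 random slots only if the associated cache partition still has free slots) are illustrative simplifications, not the actual allocation algorithm:

    SLOT_KB = 256
    RANDOM_SET = 16          # initial random-I/O slot set per LDEV
    SEQUENTIAL_SET = 24      # extra slots granted when sequential access is detected

    class LdevCacheAccount:
        def __init__(self, partition_free_slots):
            self.partition_free = partition_free_slots
            self.random_slots = RANDOM_SET
            self.sequential_slots = 0

        def on_sequential_detect(self):
            self.sequential_slots = SEQUENTIAL_SET

        def expand_random(self):
            # Temporarily grow the random working set by another 16 slots
            # if the associated cache partition still has free space.
            if self.partition_free >= RANDOM_SET:
                self.partition_free -= RANDOM_SET
                self.random_slots += RANDOM_SET
                return True
            return False

    ldev = LdevCacheAccount(partition_free_slots=40)
    ldev.on_sequential_detect()
    ldev.expand_random()
    print(ldev.random_slots, "random +", ldev.sequential_slots, "sequential slots =",
          (ldev.random_slots + ldev.sequential_slots) * SLOT_KB, "KB of cache")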

    Random I/O Cache Operations

Each logical 256KB cache slot contains four logical 64KB cache segments. For each cache slot, two of these 64KB segments are only used for read operations, and two are only used for write operations (more about this distinction below). Furthermore, each of the 64KB segments is further subdivided into four logical 16KB sub-segments. Each of these is then divided into eight physical 2KB Cache Blocks. Therefore, each 256KB cache slot contains 64 2KB cache blocks for random reads and another 64 2KB blocks for random writes. Any cache block in Cache-A and Cache-B can be used to build the logical 16KB sub-segments (managed by tables in Shared Memory).


Initial Random Cache Allocation Per LDEV

256KB Cache Slots          16
64KB Read Segments         32
64KB Write Segments        32
16KB Read Subsegments      128
16KB Write Subsegments     128
2KB Read Cache Blocks      1024
2KB Write Cache Blocks     1024

The table above shows the overall breakdown of elements for a set of 16 random I/O cache slots for one LDEV. Figures 8-10 below show the relationship among these elements.
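
For readers who prefer to see the breakdown derived rather than tabulated, the Python sketch below reproduces the per-LDEV counts from the slot, segment, subsegment, and block sizes given in this section. It is an illustration only; the real structures are managed in Shared Memory.

```python
# Derives the per-LDEV counts in the table above from the slot, segment,
# subsegment and block sizes given in this section. Illustrative only.

SLOT_KB, SEGMENT_KB, SUBSEG_KB, BLOCK_KB = 256, 64, 16, 2
INITIAL_SLOTS = 16                              # initial random allocation per LDEV

segments_per_slot = SLOT_KB // SEGMENT_KB       # 4 (2 for reads, 2 for writes)
subsegs_per_segment = SEGMENT_KB // SUBSEG_KB   # 4
blocks_per_subseg = SUBSEG_KB // BLOCK_KB       # 8

read_segments = INITIAL_SLOTS * segments_per_slot // 2   # 32
read_subsegments = read_segments * subsegs_per_segment   # 128
read_blocks = read_subsegments * blocks_per_subseg       # 1024

print(read_segments, read_subsegments, read_blocks)      # 32 128 1024
print("write-side counts are identical")
```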

    Figure 8. Random I/O: Logical Cache Slot and Segments Layout

    Figure 9. Random I/O: Logical Segment and Sub-segment Layout

    Figure 10. Random I/O: Logical Sub-segments and Physical Cache Blocks Layout

It is because of these small 2KB cache block allocation units that the Universal Storage Platform V does so well with very small block sizes and random workloads. In general, random workloads are usually associated with application block sizes of 2KB to 32KB. Other vendors use a single fixed cache allocation size (such as 32KB and 64KB for EMC DMX) for individual I/O operations. Hence, a 2KB host I/O will waste 30KB or 62KB of each EMC DMX cache slot. On the IBM DS8000 series, the cache allocation unit is 4KB, thereby avoiding wasting cache space. But the DS8000 arrays are actually general purpose servers running the AIX operating system, using the local shared RAM on individual processor books. On both EMC and IBM, the global cache contains the operating system space, all storage control and metadata, all software, and all data blocks. On the Universal Storage Platform V, the CMA cache system is only used for user data blocks.

    Write Operations

In the case of host writes, the individual data blocks are mirrored (duplexed) to another CMA board on a different power boundary (i.e. Cache-A and Cache-B). The mirroring is only for the actual user data blocks, unlike the static mirroring of 50% of cache to the other 50% of cache as is used by EMC for DMX-3 and DMX-4.

For example, if a host application writes an 8KB block of data, that 8KB block will be written to two CMA boards, using four 2KB cache blocks from a 64KB Write segment in Cache-A and another set of four cache blocks from Cache-B. The FED MP that owns the port on which the host request arrived is the processor that performs this duplexing task. After the write has been processed and destaged to disk, the 8KB of mirrored blocks will be deleted, and the other 8KB will remain in cache (after being remapped to a Read subsegment - see below) for a possible future read hit on that data.
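
A tiny sketch of the duplexing arithmetic for that 8KB example follows; the helper name and return structure are invented for illustration.

```python
# Tiny illustration of the duplexing arithmetic for the 8KB example above.
# The function name and return shape are invented for this sketch.

BLOCK_KB = 2

def duplexed_blocks(write_kb):
    """How many 2KB cache blocks a host write consumes in Cache-A and Cache-B."""
    per_side = -(-write_kb // BLOCK_KB)        # ceiling division to whole blocks
    return {"cache_a_blocks": per_side, "cache_b_blocks": per_side,
            "total_blocks": 2 * per_side}

print(duplexed_blocks(8))   # {'cache_a_blocks': 4, 'cache_b_blocks': 4, 'total_blocks': 8}
```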



Note that a read cache hit on data not yet destaged to disk from a Write Segment is not possible. In fact, no host reads are directed against Write Segments. A read request for a block which still has a write pending condition will force a disk destage first, and all pending writes in that same 64KB segment will be included in this forced destage. Similarly, a write cache hit in the sense of overwriting a pending write block is never possible. On the Universal Storage Platform V, all blocks written to cache must be destaged to disk in the order they arrived. There is no overwriting of "dirty" blocks not yet written to disks as is the case with a midrange array. On the Universal Storage Platform V, a "write hit" actually means that there is empty space in one of that LDEV's existing write segments already available for a new I/O request. A write miss would then mean that no such existing space is available, and either another set of 16 random I/O cache slots will be allocated for use by that LDEV, or (if cache slots are tight) all of the pending writes in that LDEV's allocated 64KB Write Segments will be destaged, freeing up all of its random write space.

    Write Pending Limit

There is a write pending limit trigger of 30% and 70% per cache partition (the base partition if others have not been created using the Virtual Partition Manager product). When the 30% limit is hit, the array begins adjusting the internal priority of write operations higher than that of reads. When the 70% limit is hit, the array goes into an urgent level of data destage (writes take precedence over reads) to disk from those 64KB random segments used for writes. If using VPM and cache partitions (and a global mode setting of 454=ON for the array), in most cases only the 64KB Write Segments within a partition see this urgent destage (the mode switch creates an averaging mechanism to prevent all partitions from being similarly affected). Other partitions are generally isolated with their own write pending triggers. In most cases a 70% write pending limit is an indication of too few disks absorbing the write operations.
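
The two thresholds can be pictured as a simple policy function. The sketch below is only an illustration of the behavior described above, assuming a per-partition write pending percentage as input; it is not the actual destage scheduler.

```python
# Illustration of the 30%/70% write pending triggers described above,
# expressed as a policy function per cache partition. Not the real scheduler.

def destage_policy(write_pending_pct):
    if write_pending_pct >= 70:
        return "urgent destage: writes take precedence over reads"
    if write_pending_pct >= 30:
        return "raise internal priority of writes above reads"
    return "normal read/write scheduling"

for pct in (10, 35, 72):
    print(f"{pct:2d}% write pending -> {destage_policy(pct)}")
```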

    Read versus Write Segments

There is an interesting event that happens after a Write Segment is processed by a destage operation. All cache slots, as well as their Read and Write 64KB segments, subsegments, and 2KB cache blocks, are actually dispersed across the entire cache space and managed at the 2KB cache block level. These 2KB cache blocks are dynamically remapped to various cache segments. A Write segment cannot be used for reads. So how does a fresh write that has been destaged become readable as a cache hit for follow-on read requests?

Suppose 8KB of data was written to eight 2KB cache blocks (four in one Write Segment, with four more in a Write Segment mirror). After the disk destage operation occurs, four of those blocks are marked as free (from the mirrored Write Segment) and four are remapped into a Read Segment (thus becoming readable for a cache hit). These four surrendered blocks from that Write Segment's address space are replaced by four unused 2KB cache blocks. The user data was not copied, but the logical location of those blocks did change. All of this mapping is managed in the cache map in the Shared Memory system and occurs at RAM speeds. In fact, nearly everything in the Universal Storage Platform V (internal LDEVs, external LDEVs, Hitachi Dynamic Provisioning pages, etc.) is a mapped entity with a high degree of flexibility on how to manage it.
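
The remapping step can be illustrated with a toy cache-map entry, shown below. The dictionary layout and field names are invented for the example; the real mapping lives in Shared Memory tables.

```python
# Toy model of the post-destage remap described above: the mirrored write
# blocks are freed and the primary blocks are remapped from a Write segment
# to a Read segment -- a mapping change only, with no data copy.

entry = {
    "segment_type": "write",
    "dirty": True,
    "primary_blocks": [0, 1, 2, 3],    # 4 x 2KB in Cache-A
    "mirror_blocks": [4, 5, 6, 7],     # 4 x 2KB in Cache-B
}

def destage(e):
    e["mirror_blocks"] = []            # mirrored copy is released
    e["segment_type"] = "read"         # the same blocks now serve read hits
    e["dirty"] = False                 # data is safely on disk

destage(entry)
print(entry)   # a clean, readable entry backed by the same four blocks
```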

    Sequential I/O Cache Operations and Sequential Detect

A separate set of 24 cache slots is allocated to an individual LDEV when a sequential I/O pattern is detected against that LDEV on a port owned by a FED MP (known as "sequential detect"). This set of 24 slots is released when a sequential I/O pattern is no longer detected. This is one reason a proper sequential detect state is important on the Universal Storage Platform V. If sequential I/O is broken up


across several ports on different FEDs (for example), it will look like random I/O to the several individual MPs controlling those host ports. As an example, the use of a host-based Logical Volume Manager that creates a large RAID-0 volume by striping across several LDEVs on several different host ports can defeat the sequential detection mechanism. Sequential detect is managed for each LDEV controlled by an MP on a FED for the one or two FED ports that it owns.

Unlike the cache slots assigned for random I/O, each 256KB cache slot is used as a single segment of 256KB for sequential I/O. This matches the RAID chunk size of 256KB on the disks. In sequential mode, an entire 256KB chunk is read from or written to each disk. Sequential prefetch will read several such chunks into cache slots without being instructed to do so by the controlling MP on the FED involved. Sequential writes will be performed as full stripe writes to a Parity Group, thus minimizing the write penalty for RAID levels 5 and 6. The data in these sequential cache slots can be reusable for a subsequent cache hit. In general, once the data has been processed, these dynamically allocated cache slots are released for the next sequential I/O operation against that LDEV, or are returned to the cache pool once sequential detect is no longer present (for some period of time) for that LDEV. However, certain lab tests do show a high cache hit rate where a small number of LUNs under test have high sequential read ratios and use large block sizes (such as 1MB).
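
As a small worked example of the full-stripe behavior, the sketch below computes the data written per full stripe from the 256KB chunk size, assuming the Parity Group shapes (7D+1P, 6D+2P) discussed later in this paper. It is illustrative only.

```python
# Worked example of the full-stripe write arithmetic for sequential I/O,
# using the 256KB RAID chunk size described above. Illustrative only.

CHUNK_KB = 256

def full_stripe_kb(data_disks):
    """User data written per full stripe (parity chunks excluded)."""
    return data_disks * CHUNK_KB

print("RAID-5 (7D+1P):", full_stripe_kb(7), "KB per full stripe")   # 1792 KB
print("RAID-6 (6D+2P):", full_stripe_kb(6), "KB per full stripe")   # 1536 KB
```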

    Figure 11. Sequential I/O Read Slot

    Figure 12. Sequential I/O Write Slot

    BED and FED Local RAM

There is a pool of RAM installed on each FED or BED PCB. This RAM is shared by the four MP processors per PCB. This is where the LUN management tables (such as MP Workload Sharing), sequential detect histograms, and limited local caching for highly reused data blocks are located. There is also NVRAM associated with each MP for holding the system microcode and optional software packages. Note that other designs locate both of these systems in a monolithic global cache system, and every access for these elements interferes with user data block movement. On the Universal Storage Platform V the degree of memory parallelism removes this burden from the data cache system.

When new firmware is to be flashed into the NVRAM owned by each MP, the new software is first copied into the local RAM (over the internal MP-to-SVP network), and then, one by one, each MP suspends its activities, copies the new microcode into its NVRAM, and then reboots. The next MP on that FED or BED board will then follow the same steps until all four MPs are updated.

Front-End Director Concepts

The Front-end Directors manage the access and scheduling of I/O requests to and from the attached servers. The FEDs contain the host ports that interface with the servers or (when using Open Fibre FEDs) attach to external storage. There are four types of FED features available: 8-port Open Fibre, 16-port Open Fibre, 8-port FICON, and 8-port ESCON. The various types of interface options can be supported simultaneously by mixing FED features within the Universal Storage Platform V.



    Table 4. Comparison: Universal Storage Platform V FED Features

FED Options     Features (USP V)    Total Ports (USP V)
Open Fibre      0-14                0-224
or ESCON        0-8                 0-64
or FICON        0-8                 0-64

Refer to the figure below for the following board components discussion.

- The FED processors (CHPs, more commonly called MPs) manage the host I/O requests, the Shared Memory and Cache areas, and execute the microcode and the optional software features such as Dynamic Provisioning, Universal Replicator, Volume Migrator, and TrueCopy.
- The DX4 chips are Fibre Channel encoder-decoder chips that manage the Fibre Channel protocol on the cables to the host ports.
- The Data Adapter (DTA) chip is a special ASIC for communication with the CSWs.
- The Microprocessor Adapter (MPA) is an ASIC for communication with the SMA PCBs.
- All I/O, whether it is reads or writes, internal or external storage, must pass through the Cache and Shared Memory systems.

    Figure 13. Example of a Universal Storage Platform V FED Board (16-port Open Fibre Feature)



    FED Microcode Updates

The Universal Storage Platform V microcode is updated by flashing each MP's NVRAM in the FED PCBs. A copy of this microcode is saved in the local RAM on each FED PCB, and each MP on the PCB (four MPs) will perform a hot upgrade in a rolling fashion. While host ports are still live, each MP will take itself offline, flash the new microcode from the local FED RAM, and then reboot itself. It then returns to an online state and resumes processing I/O requests and executing the optional installed software. The typical time for this process is 30 seconds per MP. This happens independently on each FED PCB and in parallel.
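
Conceptually the rolling upgrade is just a serial loop over the four MPs on a board, repeated in parallel across boards. The sketch below is a highly simplified illustration of that sequence (names and messages are invented); the real procedure is driven from the SVP.

```python
# Highly simplified illustration of the rolling per-MP flash sequence
# described above. Illustrative only.

def rolling_microcode_update(mps_on_pcb):
    for mp in mps_on_pcb:                          # one MP at a time
        print(f"{mp}: go offline, flash NVRAM from local RAM (~30 seconds)")
        print(f"{mp}: reboot, return online, resume I/O and optional software")

# The same loop runs independently, and in parallel, on every FED PCB.
rolling_microcode_update(["MP0", "MP1", "MP2", "MP3"])
```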

    FED FC-8 port, ESCON, and FICON: Summary of Features

Table 5 shows the Universal Storage Platform V options for the 8-port Fibre Channel, 8-port FICON, and 8-port ESCON features. The front-end port counts, the FED (CHA) PCB names, and the associated CSWs are also indicated.

    Table 5. 8-port FED Features

FC-8 port, FICON, ESCON Features

Pair        FED     Boards          Ports   CSW
Feature 1   FED1    FED00  FED08    8       0
Feature 2   FED2    FED02  FED0A    16      0
Feature 3   FED3    FED01  FED09    24      1
Feature 4   FED4    FED03  FED0B    32      1
Feature 5   FED5    FED04  FED0C    40      2
Feature 6   FED6    FED06  FED0E    48      2
Feature 7   FED7    FED05  FED0D    56      3
Feature 8   FED8    FED07  FED0F    64      3

    FED FC-16 port Feature

Table 6 shows the Universal Storage Platform V options for the 16-port Fibre Channel features. The front-end port counts, the FED PCB names, and the associated CSWs are also indicated. This port count can be increased up to 224 (using up to 14 FED features) if the number of BED features is minimal (two), in order to allow for more FED features (by taking 12 BED slots). This FED expansion would be used for adding a large amount of external storage to the Universal Storage Platform V, where less than 256 internal disks would be configured.

    Table 6. Universal Storage Platform V FED FC-16 Feature

FC-16 port Feature

Pair         FED      Boards          Ports   CSW
Feature 1    FED1     FED00  FED08    16      0
Feature 2    FED2     FED02  FED0A    32      0
Feature 3    FED3     FED01  FED09    48      1
Feature 4    FED4     FED03  FED0B    64      1
Feature 5    FED5     FED04  FED0C    80      2
Feature 6    FED6     FED06  FED0E    96      2


Feature 7    FED7     FED05  FED0D    112     3
Feature 8    FED8     FED07  FED0F    128     3
Feature 9    FED9     (BED3)          144     1
Feature 10   FED10    (BED4)          160     1
Feature 11   FED11    (BED5)          176     2
Feature 12   FED12    (BED6)          192     2
Feature 13   FED13    (BED7)          208     3
Feature 14   FED14    (BED8)          224     3

    I/O Request Limits and Queue Depths (Open Fibre)

Every fibre channel path between the host and the storage array has a specific maximum capacity, known as the Maximum I/O Request Limit. This is the limit to the aggregate number of requests being directed against the individual LUNs. For the Universal Storage Platform V, this limit is 4096, and is associated with each FED MP. On the 16-port feature, where there are two ports managed by each MP, 4096 is the aggregate limit across those two ports. [This is handled differently for FICON, where the MP's limit is 480 for Open Exchange messaging.] However, when a port (actually the associated MP) is placed in external storage mode (becomes an initiator), the port I/O Request Limit drops to 256.

Queue Depth is the maximum number of outstanding I/O requests per LUN (internal or external). This is separate from the MP's port I/O Request Limit. Note that there is no concept of Queue Depth for ESCON or FICON volumes.

On the Universal Storage Platform V, the per-LUN nominal queue depth is 32 but can be much higher (up to 4096) for internal LUNs in the absence of other workloads on that MP's ports. In the case of external (virtualized) LUNs, the queue depth can be set to 2 to 128 in the Universal Volume Manager GUI as shown below. Increasing the default queue depth value for external LUNs from 8 up to 32 can often have a very positive effect (especially on Response Time) for OLTP-like workloads. The actual limit will depend on how many other concurrently active external LUNs there are on that port.
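
The interaction between the two ceilings can be summarized as a simple admission check. The sketch below uses the limit values from this section; the function itself is an invented illustration, not firmware logic.

```python
# Illustration of the two separate ceilings described above: the per-MP
# I/O Request Limit and the per-LUN queue depth.

MP_REQUEST_LIMIT = 4096          # per FED MP, aggregate across its port(s)
EXTERNAL_MP_REQUEST_LIMIT = 256  # when the MP is in initiator (external) mode

def can_accept(mp_outstanding, lun_outstanding, lun_queue_depth=32, external=False):
    mp_limit = EXTERNAL_MP_REQUEST_LIMIT if external else MP_REQUEST_LIMIT
    return mp_outstanding < mp_limit and lun_outstanding < lun_queue_depth

print(can_accept(mp_outstanding=100, lun_outstanding=31))                  # True
print(can_accept(mp_outstanding=256, lun_outstanding=4, external=True))    # False
```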

    Figure 14. Storage Navigator Screen Showing External Port Queue Depth Control


    MP Distributed I/O (Open Fibre)

The Open Fibre FEDs on the Universal Storage Platform V (not available on FICON or ESCON FEDs) have a new feature known as MP Distributed I/O. Some of the work pertaining to an individual I/O on a single port may be processed by another MP on the same PCB rather than the MP that normally controls that port. [See Appendix 8 for the port-MP mappings.] A heavily loaded MP can hand off some tasks pertaining to data, metadata, software (SI, TC, HUR), and Hitachi Dynamic Provisioning Pool administrative tasks to another MP that is less busy. MPs can be configured to be in one of four modes: target (host), HUR-A (send), HUR-B (receive), and initiator (external) mode. MPs must be in the same mode in order to participate in Distributed I/O. For example, two MPs (on the same board) whose ports are configured in initiator (external) mode will work together.

The Distributed I/O feature is limited to the four MPs on an individual PCB. Some of this technique was actually begun in the Universal Storage Platform with code levels V08 and higher, but most of it is unique to the Universal Storage Platform V with its much higher powered MPs and faster Shared Memory system. The degree to which Distributed I/O is available is dependent on the pattern of I/O requests from the host as well as the internal status of work already in the system, back-end processing of pending requests, and so forth.

Distributed I/O on an individual MP begins when it reaches a sustained average 50% busy rate, and is managed on a round-robin basis with the other available MPs on the same FED board, taking into account their current busy state. A table is kept in local memory to assist in this workload dispatching. An MP that receives an I/O command over a host path and is currently more than 50% busy will look for another MP on that board to assist with the operation. This offload does not represent load balancing among host ports, just a degree of parallel processing of I/O commands to the back-end based on available cycles among the MPs on a board. [Note: When two or more MPs on a FED board are operated in external mode, we have observed that in some cases there is no minimum percent busy threshold required to trigger the Distributed I/O mode.]

If an MP is doing processing for other MPs and then receives its own host I/O command, this workload could be distributed among the other MPs on the board. As the processors all get busy, the degree of sharing rapidly drops off. MPs that currently have no host I/O of their own can work up to a 100% busy rate on Distributed I/O loads. In the presence of host I/O on a port associated with an MP, the sum of that MP's host I/O plus distributed I/O must be less than 50%. Once host I/O on an MP exceeds the 50% busy rate, there is no further acceptance of Distributed I/O by that MP.
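
Putting the preceding rules together, a simplified picture of the hand-off decision looks like the sketch below. The data structures and the helper-selection step are invented for illustration; the real dispatch is round-robin, driven by a table in local RAM.

```python
# Simplified sketch of the Distributed I/O hand-off rule described above.

def has_headroom(mp):
    # An MP with no host I/O of its own can run distributed work up to 100%;
    # one with host I/O only accepts more while its total stays under 50%.
    return mp["busy"] < (100 if mp["host_busy"] == 0 else 50)

def pick_helper(name, board):
    me = board[name]
    if me["busy"] < 50:                       # below threshold: keep the work
        return name
    helpers = [n for n, mp in board.items()
               if n != name and mp["mode"] == me["mode"] and has_headroom(mp)]
    return helpers[0] if helpers else name    # fall back to doing it ourselves

board = {
    "MP0": {"busy": 80, "host_busy": 80, "mode": "target"},
    "MP1": {"busy": 20, "host_busy": 20, "mode": "target"},
    "MP2": {"busy": 60, "host_busy": 0,  "mode": "target"},
    "MP3": {"busy": 10, "host_busy": 10, "mode": "initiator"},
}
print(pick_helper("MP0", board))   # -> MP1 (same mode, still has headroom)
```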

When Fibre Channel FED ports are configured for external storage, the associated owning MP no longer participates in the host-facing Distributed I/O mode. That MP, and the one or two ports it manages, is placed into initiator mode. (Note: every port on an MP is operated in the same mode.) If two or more MPs on a PCB are used for external attachment, then they will engage the Distributed I/O function. At least two MPs must be in external mode (thus all of their ports) in order to get Distributed I/O for external loads on a FED board.

    External Storage Mode I/O (Open Fibre)

When using the Open Fibre FEDs, each MP (and the one or two ports it owns) on the PCB may be configured in one of four modes: as a target (accepting host requests), as an initiator (driving storage requests), as TC/HUR-A (Copy products send), or as TC/HUR-B (Copy products receive). The initiator mode is how other storage arrays are attached to FED ports on a Universal Storage Platform V. Front-end ports from the external (secondary) array are attached to some number of Universal Storage


Platform V FED ports as if the Universal Storage Platform V were a server. If using the 16-port feature, then both ports managed by an MP are placed into this mode. In essence, that FED MP (and the one or two ports it owns) is operated as though it were a BED MP. LUNs visible on the external array's ports are then remapped by the Universal Storage Platform V out to FED ports attached to hosts.

As I/O requests arrive over other host-attached FED ports on the Universal Storage Platform V for those LUNs, the normal I/O operations within the FED occur. The request is managed by Shared Memory tables and the data blocks go into the data cache. But the request is then routed to the FED MP (not a BED MP) that controls the external paths where that LUN is located. The external array processes the request as though it were talking to a server instead of the Universal Storage Platform V.
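
A toy routing table makes the distinction concrete: internal LUNs are handed to a BED MP, while external LUNs are handed to the initiator-mode FED MP that owns the external paths. Everything in the sketch below is invented for illustration.

```python
# Toy routing table for the back-end hand-off described above. Illustrative only.

LUN_MAP = {
    "LUN_10": {"kind": "internal", "backend": "BED MP (disk loops)"},
    "LUN_20": {"kind": "external", "backend": "FED MP in initiator mode (external paths)"},
}

def route(lun):
    # Either way, the request is first staged through Shared Memory tables
    # and the data cache; only the back-end hand-off differs.
    return f"{lun}: cache-staged, then handed to {LUN_MAP[lun]['backend']}"

print(route("LUN_10"))
print(route("LUN_20"))
```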

    Cache Mode Settings

The external port (and all LUNs present on it) may be assigned either Cache Mode = ON or OFF when it is configured in the Universal Storage Platform V. The ON setting processes I/O and cache behavior identically to internal LDEVs. The OFF setting directs the Universal Storage Platform V to wait to report Write I/O completion until the data is accepted by the external device. The OFF setting does not change other cache handling behavior, such as reads to the external LUNs; however, this change makes a significant difference when writing to slower external storage that presents a risk for high write pending cases to develop. Cache Mode = ON will report I/O completion to the host for writes once the blocks are written to Universal Storage Platform V cache. Cache Mode = ON should normally not be used when high write rates are expected. In general, the rule of thumb is to use Cache Mode = OFF.
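
The practical difference between the two settings is where the host write acknowledgment happens, which the small sketch below illustrates; the function is invented for the example and is not Universal Volume Manager logic.

```python
# Illustration of where the host write acknowledgment happens under each
# Cache Mode setting for an external LUN. Illustrative only.

def write_ack_point(cache_mode_on):
    if cache_mode_on:
        # ON: behaves like an internal LDEV -- ack once duplexed into USP V cache.
        return "acknowledge after the blocks land in USP V cache"
    # OFF: hold the acknowledgment until the external array accepts the data.
    return "acknowledge only after the external storage accepts the write"

print("Cache Mode = ON :", write_ack_point(True))
print("Cache Mode = OFF:", write_ack_point(False))
```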

    Other External Mode Effects

Recall from above that the Queue Depth for an external LUN can be modified to be between 2 and 128 in the Universal Volume Manager GUI as shown above. Increasing the default value from 8 up to 32 (probably the best choice overall), 64, or perhaps even 128 can often have a very positive effect (especially on Response Time) for OLTP-like workloads. Along with this, recall that the maximum I/O Request Limit for an external port is 256 (not 4096).

When ports on a PCB are configured for use as external attachment, their owning MP is operated in initiator mode; there will be no Distributed I/O mode until at least one other MP is so configured.

The external port represents itself as a Windows server. Therefore, when configuring the host type on the ports of the virtualized storage array, they must be set to Windows mode.

Front-End Director Board Details

This section discusses the details of the four types of FED features available on the Universal Storage Platform V.

Open Fibre 8-Port Feature

The 8-port Open Fibre feature consists of two PCBs, each with four 4Gb/sec Open Fibre ports. Each PCB has four 800MHz Channel Host Interface Processors (CHP) and two Tachyon DX4 dual-ported chips. The Shared Memory (MPA) port count is now eight paths per PCB (double the Universal Storage Platform PCB). See Appendix 5 for ports, names, and MP associations.


    Figure 15. Universal Storage Platform V 8-port Fibre Channel Feature (showing both PCBs)

    Open Fibre 16-Port Feature

The 16-port Open Fibre feature consists of two PCBs, each with eight 4Gb/sec Open Fibre ports. Each PCB has four 800MHz Channel Host Interface Processors (CHP) and four Tachyon DX4 dual-ported chips. There are eight Shared Memory (MPA) paths per PCB (double the Universal Storage Platform PCB). See Appendix 6 for ports, names, and MP associations.

    Figure 16. Universal Storage Platform V 16-port Fibre Channel Feature (showing both PCBs)



    ESCON 8-port Feature

The 8-port ESCON feature consists of two PCBs, each with four 17MB/s ESCON ports. Each PCB has two 800MHz Channel Host Interface Processors (CHP) and one ESA0 interface. There are two Cache paths and eight Shared Memory (MPA) paths per PCB. See Appendix 7 for ports, names, and MP associations.

    Figure 17. Universal Storage Platform V 8-port ESCON Feature (both PCBs shown)

    FICON 8-port Feature

The 8-port FICON feature consists of two PCBs, each with four 4Gb/sec FICON ports. Each PCB has four 800MHz Channel Host Interface Processors (CHP) and two HTP interface chips. There are two Cache Switch paths and eight Shared Memory (MPA) paths per PCB. See Appendix 8 for ports, names, and MP associations.

    Figure 18. Universal Storage Platform V 8-port FICON Feature (both PCBs shown)



Back-end Director Concepts

The Back-end Directors manage access and scheduling of requests to and from the physical disks. The BEDs also monitor utilization of the loops, Parity Groups, processors, and status of the PCBs in a pair. The Universal Storage Platform V BED feature (2 PCBs) has four pairs of 4Gb/sec loops supporting up to 128 disks. Each PCB has four 800MHz DKP processors and four DRR RAID processors.

The BED features control the Fibre Channel loops that interface with the internal disks (but not with virtualized external storage). Every I/O operation to the disks will pass through the Cache and Shared Memory subsystems. Table 7 lists the BED features, loop counts, and disk capacities for the Universal Storage Platform V system. Note that BEDs must be installed as pairs of options (four PCBs) in order to provide 8 pairs of loops to support the 8-disk Parity Groups, these being RAID-5 (7D+1P) and RAID-6 (6D+2P).

Note that, while the first pair of BED features can actually support 384 disks (the 128 disks in the Control Frame are attached to BED1 and BED2 - see Figure 21 below), HDS has limited the field configuration to 256 disks until the second pair of BED features has been installed.

    Table 7. Universal Storage Platform V BED Features

BED Feature Count (2 PCBs)    Back-end Loops Available    Max Disks Configurable
1 BED / 2 BED                 8                           256 (PM limit)
3 BED / 4 BED                 16                          640
5 BED / 6 BED                 24                          896
7 BED / 8 BED                 32                          1152

    BED Microcode Updates

The Universal Storage Platform V microcode is updated by flashing the MPs in the BED PCBs. A copy of this microcode is saved in the local RAM on each BED PCB, and each MP on the PCB (four MPs) will perform a hot upgrade in a rolling fashion. While the disk loops are still live, each MP will take itself offline, flash the new microcode from the local BED RAM, and then reboot itself. It then returns to an online state and resumes processing I/O requests to the disks on its loops. The typical time for this process is 30 seconds per MP. This happens independently on each BED PCB and in parallel.

    BED Feature Summary

Table 8 shows the Universal Storage Platform V options for the BED feature. The total back-end loop counts, the names of the BED PCBs, and the associated CSWs are also indicated.


    Table 8. BED Features

BED Feature

Pair        BED     Boards          Loop Pairs   CSW Used
Basic       BED1    BED10  BED18                 0
            BED2    BED12  BED1A    8            0
Feature 2   BED3    BED11  BED19                 1
            BED4    BED13  BED1B    16           1
Feature 3   BED5    BED14  BED1C                 2
            BED6    BED16  BED1E    24           2
Feature 4   BED7    BED15  BED1D                 3
            BED8    BED17  BED1F    32           3

    Back-End-Director Board Details

BED Details

Figure 19 is a high-level diagram of the pair of PCBs for the Universal Storage Platform V BED feature.

    Figure 19. Universal Storage Platform V 8-port BED Feature (both PCBs shown)

    Back End RAID level Organization

Figure 20 is a view of the Universal Storage Platform V's BED-to-disk organization. This is also how the Lightning 9900V was organized. These two BED features (4 PCBs) can support 256 disks, with the exception being BED1 and BED2, which also control the 128 disks in the DKC control frame (hence 384 disks overall). It takes two BED options to support all of the Parity Group types due to the smaller 4-loop PCBs. If a back-end loop fails, the second loop from the other controller takes over the load from the failed partner loop, but a subsequent failure of that alternate loop would take down all of the RAID-10 (4D+4D) or RAID-5 (7D+1P) Parity Groups. Because of this, pairs of BEDs (4 PCBs) must always be installed.


    Figure 20. Universal Storage Platform V Back-end Disk Layout. (2 BED features)

Universal Storage Platform V: HDU and BED Associations by Frame

Figures 21 and 22 illustrate the layouts of the 64-disk containers (HDU), the Frames, and the associated BED ownerships of the Universal Storage Platform V. The names of the ranges of Array Groups are also shown. There are two views presented: a regular frontal view and a view of the back as seen from the back.

For example, looking at Figure 21, the bottom of Frame DKU-R1 shows the front of the HDU whose disks are controlled by the eight 4Gbit loops on BED1 (pg 3, yellow) and BED2 (pg 4, orange). The HDU is split in half by power domains, where the 32 disks on the left half (16 on the front and 16 on the back of the Frame) are attached to BED2, and the 32 on the right half go to BED1. Note the diagonal offsets of the HDU halves from front to rear of the two associated 16-disk groups by HDU power domains.

    Figure 21. Front View of the Universal Storage Platform V Frames, HDUs, with BED Ownership
