TRANSCRIPT
HEPiX Storage Task Force
Roger Jones, Lancaster
CHEP06, Mumbai, February 2006
February 13th 2006
Mandate
– Examine the current LHC experiment computing models.
– Attempt to determine the data volumes, access patterns and required data security for the various classes of data, as a function of Tier and of time.
– Consider the current storage technologies, their prices in various geographical regions and their suitability for various classes of data storage.
– Attempt to map the required storage capacities to suitable technologies.
– Formulate a plan to implement the required storage in a timely fashion.
Membership
• Roger Jones, Lancaster, ATLAS, [email protected]
• Andrew Sansum, RAL, [email protected]
• Bernd Panzer / Helge Meinhard, CERN, [email protected]
• David Stickland (latterly), CMS
• Peter Malzacher, GSI Tier-2, ALICE, [email protected]
• Andrei Maslennikov, CASPUR, [email protected]
• Jos van Wezel, GridKA, HEPiX, [email protected]
• Shadow 1: [email protected]
• Shadow 2: [email protected]
• Vincenzo Vagnoni, Bologna, LHCb, [email protected]
• Luca dell’Agnello
• Kors Bos, NIKHEF (by invitation)
Thanks to all members!
Degree of Success
• Assessment of Computing Models
– RJ shoulders the blame for this area!
– Computing TDRs help – see many talks at this conference
– Estimates of contention etc. are rough; toy simulations are exactly that, and we need to improve this area beyond the lifetime of the task force
• Disk
– Thorough discussion of disk issues
– Recommendations, prices, etc.
• Archival media
– Less complete discussion
– Final report at the April HEPiX/GDB meeting in Rome
• Procurement
– Useful guidelines to help Tier-1 and Tier-2 procurement
Outcome
• Interim document available through the GDB
• Current high-level recommendations:
– It is recommended that a better information exchange mechanism be established between (HEP) centres to mutually improve purchase procedures.
– An annual review should be made of the storage technologies and prices, and a report made publicly available.
– Particular further study of archival media is required, and tests should be made of the new technologies emerging.
– A similar regular report is required for CPU purchases. This is motivated by the many Tier-2 centres now making large purchases.
– People should note that the lead time from announcement to effective deployment of new technologies is up to a year.
– It is noted that the computing models assume that archived data is available at the time of attempted processing. This implies that the software layer allows pre-staging and pinning of data.
Inputs
• Informed by C-TDRs and computing model documents
– Have tried to estimate contention etc., but this requires much more detailed simulation work
– Have looked at data classes and their associated storage/access requirements, but this could be taken further
• E.g. the models often provide redundancy on disk, but some sites assume they still need to back disk up to tape in all cases
– Have included bandwidths to MSS from the LHC4 exercise, but more detail would be good
Storage Classes
1) Tape, archive, possibly offline (vault); access > 2 days; 100 MB/s
2) Tape, online in library; access > 1 hour; 400 MB/s
3) Disk, any type, in front of tape caches
4) Disk, SATA type, optimised for large files, sequential read-only IO
5) Disk, SCSI/FC type, optimised for small files, read/write random IO
6) Disk, high speed and reliability, RAID 1 or 6 (catalogues, home directories, etc.)
Disk
• Two common disk types
– SCSI/FibreChannel
• Higher speed and throughput
• Slightly longer lifetime (~4 years)
• More expensive
– SATA (II)
• Cheaper
• Available in storage arrays
• Lifetime > 3 years (judging by warranties!)
• RAID5 gives fair data security
– Could still have 10 TB per PB unavailable on any given day
• RAID6 looks more secure
– Some good initial experiences
– Care needed with drive and other support
• Interconnects
– Today
• SATA (300 MB/s) – good for disk to server, point to point
• Fibre Channel (400 MB/s) – high-speed IO interconnect, fabric
– Soon (2006)
• Serial Attached SCSI (SAS – multiple 300 MB/s)
• InfiniBand (IBA 900 MB/s)
Architectures
• Direct Attached Storage
– Disk is directly attached to the CPU
– Cheap, but administration is costly
• Network Attached Storage
– File servers on an Ethernet network
– Access by file-based protocols
– Slightly more expensive, but a smaller number of dedicated nodes
• Storage in a box – servers have internal disks
• Storage out of box – fibre- or SCSI-connected
• Storage Area Networks
– Block, not file, transport
– Flexible and redundant paths, but expensive
[Diagram: clients on an IP network connecting to storage nodes (node 1 … node n) through a storage controller]
Disk Data Access
• Access rates
– 50 streams per RAID group, or 2 MB/s per stream on a 1 Gbit interface
– Double this for SCSI
• Can be impaired by
– Software interface/SRM
– Non-optimal hardware configuration (CPU, kernel, network interfaces)
• Recommend 2 × nominal interfaces for read and 3 × nominal for write
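The interface headroom rule above can be sketched as a small sizing calculation. This is only an illustration of the arithmetic, assuming nominal 1 Gbit ≈ 125 MB/s links; the workload numbers in the example are hypothetical and not from the slides.

```python
GBIT_MB_S = 125          # nominal capacity of a 1 Gbit interface in MB/s
PER_STREAM_MB_S = 2      # throughput per concurrent stream (figure above)

def interfaces_needed(read_mb_s, write_mb_s):
    """Number of 1 Gbit interfaces, applying the 2x read / 3x write headroom rule."""
    needed = 2 * read_mb_s + 3 * write_mb_s
    return -(-needed // GBIT_MB_S)   # ceiling division

# Hypothetical example: a server sustaining 100 MB/s reads and 30 MB/s writes
print(interfaces_needed(100, 30))   # -> 3 (2*100 + 3*30 = 290 MB/s over 125 MB/s links)
```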
Disk Recommendations
• Storage in a box (DAS/NAS disks together with server logic in a single enclosure)
– Most storage for a fixed cost
– More experience with large SATA + PCI RAID deployments desirable
– More expensive solutions may require less labour/be more reliable (experiences differ)
– High-quality support may be the deciding factor
• Recommendation
– Sites should declare the products they have in use
• A possible central place would be the central repository set up at hepix.org
– Where possible, experience with trial systems should be shared (Tier-1s and CERN have a big role here)
Procurement Guidelines
• These come from H Meinhard
• Many useful suggestions for procurement
• May need to be modified to local rules
Disk Prices
• DAS/NAS: storage in a box (disks together with server logic in a single enclosure)
– 13,500–17,800 € per usable 10 TB
• SAN/S: SATA-based storage systems with high-speed interconnect
– 22,000–26,000 € per usable 10 TB
• SAN/F: FibreChannel/SCSI-based storage systems with high-speed interconnect
– ~55,000 € per usable 10 TB
• These numbers are reassuringly close to those from the PASTA reviews, but note there is a spread from geography and other situations
• Evolution (raw disks)
– Expect a Moore's-law density increase of 1.6×/year between 2006 and 2010
– Also consider the effect of an increase of only 1.4×/year
– Cost reduction of 30–40% per annum
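The quoted 30–40% per-annum cost reduction compounds quickly over the 2006–2010 planning window. A minimal sketch of that projection, using the low end of the DAS/NAS range (13,500 € per usable 10 TB) as a purely illustrative starting point:

```python
def projected_price(start_eur, annual_reduction, years):
    """Price after `years` years of compound annual cost reduction."""
    return start_eur * (1 - annual_reduction) ** years

# Illustrative: project the 2006 low-end DAS/NAS price to 2010
print(round(projected_price(13500, 0.30, 4)))  # 30%/yr -> 3241 EUR per usable 10 TB
print(round(projected_price(13500, 0.40, 4)))  # 40%/yr -> 1750 EUR per usable 10 TB
```

The factor-of-two spread between the two endpoints is why the slide recommends an annual review of storage prices rather than a one-off projection.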
Tape and Archival
• This area is ongoing and needs more work
– Less frequent procurements
• Disk system costs approach active tape system costs by ~2008
• Note the computing models generally only assume archive copies at the production site
• Initial price indications are similar to the LCG planning projections
– 40 CHF/TB for the medium
– A drive + server with 25 MB/s effective scheduled bandwidth is 15–35 kCHF
– Effective throughput is much lower for chaotic usage
– A 6000-slot silo is ~500 kCHF
• New possibilities include spin-on-demand disk, etc.
– Needs study by the T0 and T1s; this should start now
– It would be brave to change immediately
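The price indications above can be combined into a back-of-envelope archive cost. The unit prices are from the slide (media at 40 CHF/TB, ~500 kCHF per 6000-slot silo); the drive price is taken as the mid-range of the quoted 15–35 kCHF, and the archive size, cartridge capacity and drive count in the example are hypothetical assumptions, not from the slides.

```python
import math

def archive_cost_chf(archive_tb, cartridge_tb, n_drives,
                     media_chf_per_tb=40, drive_chf=25_000,
                     silo_chf=500_000, silo_slots=6_000):
    """Media + drive/server + silo cost for a tape archive of `archive_tb` TB."""
    cartridges = math.ceil(archive_tb / cartridge_tb)
    silos = math.ceil(cartridges / silo_slots)   # silos sized by slot count
    return archive_tb * media_chf_per_tb + n_drives * drive_chf + silos * silo_chf

# Hypothetical example: a 2 PB (2000 TB) archive on 0.5 TB cartridges, 10 drives
print(archive_cost_chf(2000, 0.5, 10))  # -> 830000 CHF (media 80k + drives 250k + 1 silo 500k)
```

Note how the silo dominates until the slot count is nearly used up, which is one reason the slide flags less frequent, larger tape procurements.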
Plans
• The group is now giving more consideration to archival
– Need to do more on archival media
– General need for more discussion of storage classes
– More detail to be added on computing model operational details
• Final report in April
• Further task forces needed every year or so