Update on the Spider II File System


Upload: insidehpc

Post on 24-May-2015


Category: Technology



DESCRIPTION

In this presentation from the DDN User Meeting at SC13, Sarp Oral provides an update on the Spider II file system at Oak Ridge National Laboratory. Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/

TRANSCRIPT

Page 1: Update on the Spider II File System

Sarp Oral | SC’13, DDN User Meeting

Spider II Specs

Test and Development System
•  1 SFA12K-40
•  IB FDR
•  10 60-disk enclosures
•  560 2 TB NL-SAS drives

Scalable Storage System
•  36 SFA12K-40
•  IB FDR
•  10 60-disk trays/couplet
•  560 2 TB NL-SAS drives/couplet
•  20,160 drives
•  40 PB capacity (raw)
•  > 1 TB/s performance

Facts
•  32 PB capacity (after RAID)
•  > 1 TB/s performance
•  288 Lustre OSS total
•  8 OSS per couplet
•  4 MDS and 2 MGS
•  Configured in 4 rows
•  2x 108-port FDR IB switches
•  36x 36-port FDR IB switches
•  440 Lustre Titan LNET routers (432 for OSS, 8 for MDS)
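The headline figures on this slide can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are made up for this example, and decimal units (1 PB = 1000 TB) are assumed.

```python
# Cross-check of the Spider II spec figures quoted on the slide.
# Assumption: decimal units (1 PB = 1000 TB); all names are illustrative.
COUPLETS = 36
DRIVES_PER_COUPLET = 560
DRIVE_TB = 2
OSS_PER_COUPLET = 8
ROUTERS_OSS, ROUTERS_MDS = 432, 8
USABLE_PB = 32  # the "after RAID" figure from the slide

total_drives = COUPLETS * DRIVES_PER_COUPLET    # slide: 20,160 drives
raw_pb = total_drives * DRIVE_TB / 1000         # slide: 40 PB (raw)
total_oss = COUPLETS * OSS_PER_COUPLET          # slide: 288 OSS
total_routers = ROUTERS_OSS + ROUTERS_MDS       # slide: 440 LNET routers
usable_fraction = USABLE_PB / raw_pb            # share of raw capacity left after RAID

print(total_drives, raw_pb, total_oss, total_routers, round(usable_fraction, 2))
```

The usable-to-raw ratio comes out just under 0.8. That would be consistent with a parity layout such as 8 data + 2 parity disks, but the slide does not state the RAID level, so treat that reading as a guess.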

Page 2: Update on the Spider II File System


Spider II - Architecture

Enterprise Storage: controllers and large racks of disks are connected via InfiniBand.
•  36 DataDirect SFA12K-40 controller pairs with 2 Tbyte NL-SAS drives and 8 InfiniBand FDR connections per pair

Storage Nodes: run parallel file system software and manage incoming FS traffic.
•  288 Dell servers with 64 GB of RAM each

SION II Network: provides connectivity between OLCF resources and primarily carries storage traffic.
•  1600 ports, 56 Gbit/sec InfiniBand switch complex

Lustre Router Nodes: run parallel file system client software and forward I/O operations from HPC clients.
•  432 XK7 XIO nodes configured as Lustre routers on Titan

Clients (diagram labels): Titan XK7; other OLCF resources

Link legend (diagram):
•  XK7 Gemini 3D Torus: 9.6 Gbytes/sec per direction
•  InfiniBand: 56 Gbit/sec
•  Serial ATA: 6 Gbit/sec
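As a rough sanity check on the architecture, the nominal InfiniBand bandwidth at the storage and router layers can be summed and compared with the > 1 TB/s target. This is a sketch under stated assumptions: it uses the raw 56 Gbit/sec link rate and ignores encoding and protocol overhead, and it assumes one FDR link per router node, which the slide does not state.

```python
# Nominal aggregate InfiniBand bandwidth at two layers of the architecture.
# Assumptions: 56 Gbit/s is the raw FDR rate per link; encoding and protocol
# overhead are ignored, so achievable throughput will be lower.
FDR_GBIT = 56
CONTROLLER_PAIRS = 36
LINKS_PER_PAIR = 8
ROUTER_NODES = 432


def aggregate_tbyte_per_s(links: int, gbit_per_link: float = FDR_GBIT) -> float:
    """Sum of nominal link rates, converted from Gbit/s to TB/s (decimal)."""
    return links * gbit_per_link / 8 / 1000


storage_side = aggregate_tbyte_per_s(CONTROLLER_PAIRS * LINKS_PER_PAIR)
# Hypothetical: one FDR link per router node (not stated in the slides).
router_side = aggregate_tbyte_per_s(ROUTER_NODES)

print(f"storage controllers: {storage_side:.3f} TB/s nominal")
print(f"Lustre routers:      {router_side:.3f} TB/s nominal")
```

Under these assumptions both layers have nominal headroom of roughly two to three times the 1 TB/s target.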

Page 3: Update on the Spider II File System


Spider II - Facilities

•  Sits on a 36-inch raised floor and is forced-air cooled
•  4 identical rows in hot-aisle/cold-aisle configuration
  –  9 racks for DDN SFA12K-40 equipment
  –  1 infrastructure rack
  –  Cold aisle is fully contained with overhead panels and sliding doors at each end of the rows
    •  Prevents hot-air/cold-air mixing and increases cooling efficiency
•  25% perforated tiles used to provide cold air to cold aisles
•  Fully compliant with the requisite National Fire Protection Association (NFPA) codes
•  Total space required is 672 square feet

Page 4: Update on the Spider II File System


Spider II - Facilities

•  Ran a series of tests on a DDN SFA12K-40 testbed unit under various I/O mode and load scenarios
  –  9 kW per DDN rack nominal load
•  Total file system load including infrastructure racks is 400 kW and total cooling load is 114 tons
•  Each rack is fed with a pair of 208VAC 3-phase electrical feeds, protected by a 50A 10%-rated breaker
  –  Fed from two different transformer sources
  –  DDN SFA12K power distribution system is both load balanced and supports fail-over; OLCF can conduct both scheduled and unscheduled maintenance on one transformer without disrupting the file system operation
  –  Neither electrical connection is protected by UPS
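The power and cooling numbers on the facilities slides hang together arithmetically. The sketch below uses the standard conversion of 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW, and the usual three-phase apparent-power formula. It assumes 208 V is the line-to-line voltage and does not apply any breaker derating; the variable names are illustrative.

```python
import math

# Figures taken from the facilities slides.
DDN_RACKS = 4 * 9        # 4 rows x 9 DDN racks per row
KW_PER_DDN_RACK = 9      # nominal load per DDN rack
TOTAL_LOAD_KW = 400      # including infrastructure racks
VOLTS, AMPS = 208, 50    # per 3-phase feed (assumed line-to-line voltage)

# Standard conversion: 1 ton of refrigeration = 12,000 BTU/h ~= 3.517 kW.
KW_PER_TON = 3.517

ddn_load_kw = DDN_RACKS * KW_PER_DDN_RACK          # load from DDN racks alone
other_load_kw = TOTAL_LOAD_KW - ddn_load_kw        # remainder: infrastructure racks etc.
cooling_tons = TOTAL_LOAD_KW / KW_PER_TON          # heat to remove, in tons
feed_kva = math.sqrt(3) * VOLTS * AMPS / 1000      # apparent power of one feed, no derating

print(f"DDN racks: {ddn_load_kw} kW, other: {other_load_kw} kW")
print(f"cooling: {cooling_tons:.1f} tons")
print(f"one feed: {feed_kva:.1f} kVA vs {KW_PER_DDN_RACK} kW nominal rack load")
```

The 400 kW load converts to roughly the 114 tons of cooling quoted on the slide, and a single feed has nominal capacity well above the 9 kW rack load, which fits the claim that one transformer can be taken out of service without disrupting the file system.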

Page 5: Update on the Spider II File System


Integration efforts

•  Lustre 2.4 testing
  –  Small-scale
    •  Round-the-clock testing for stability, regression, and performance on a single-cabinet Cray XK7 (Arthur)
    •  Home-built Cray Lustre 2.4 client as well as servers
    •  Early detection and correction of problems and bugs
  –  Large-scale
    •  Weekly testing on Titan
    •  Identified some number of problems at scale
•  IB FDR testing on Cray
  –  Cray and Mellanox

Page 6: Update on the Spider II File System


Schedule

•  System infrastructure delivery
  –  Completed
•  Block storage delivery
  –  Completed
•  Block acceptance
  –  Completed
  –  Achieved 1.3 TB/s for reads and 1.2 TB/s for writes at the block level
  –  Need to revisit for a few items in Q1’14
•  Lustre support with Intel
  –  Completed. Level 1, 2, and 3 support with Intel
•  File system integration
  –  Completed
•  Rolling into production
  –  Completed
•  Performance tuning
  –  Ongoing. To be completed by Q1’14.
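To put the block-level acceptance figures in perspective, they can be divided evenly across the hardware counts given on the specs slide (36 couplets, 288 OSS, 20,160 drives). This is only a back-of-the-envelope sketch: it assumes a perfectly uniform spread of load and decimal units, and the helper name is made up for this example.

```python
# Block-level acceptance figures from the schedule slide, split evenly
# across the hardware counts from the specs slide.
# Assumption: load is spread uniformly, which real workloads do not guarantee.
READ_TBPS, WRITE_TBPS = 1.3, 1.2
COUPLETS, OSS_TOTAL, DRIVES = 36, 288, 20_160


def per_unit_gbps(total_tbps: float, units: int) -> float:
    """Average GB/s each unit would need to sustain (decimal units)."""
    return total_tbps * 1000 / units


for label, rate in (("read", READ_TBPS), ("write", WRITE_TBPS)):
    print(
        f"{label:5s}: {per_unit_gbps(rate, COUPLETS):.1f} GB/s per couplet, "
        f"{per_unit_gbps(rate, OSS_TOTAL):.2f} GB/s per OSS, "
        f"{per_unit_gbps(rate, DRIVES) * 1000:.0f} MB/s per drive"
    )
```

Each couplet therefore averages somewhere in the mid-30s of GB/s, and each drive several tens of MB/s, at the measured block-level rates.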

Page 7: Update on the Spider II File System


Page 8: Update on the Spider II File System


Questions? [email protected]


The research and activities described in this presentation were performed using the resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.