Clemson: Solving the HPC Data Deluge

DESCRIPTION

In this presentation from the Dell booth at SC13, Boyd Wilson from Clemson describes how Big Data gets handled for HPC at the University. "As science drives a rapidly growing need for storage, existing environments face increasing pressure to expand capabilities while controlling costs. Many researchers, scientists and engineers find that they are outgrowing their current system, but fear their organizations may be too small to cover the cost and support needed for more storage. Join these experts for a lively discussion on how you can take control and solve the HPC data deluge." Watch the video presentation: http://insidehpc.com/2013/12/03/panel-discussion-solving-hpc-data-deluge/

TRANSCRIPT

Page 1: Clemson: Solving the HPC Data Deluge

Clemson HPC Storage, Dell Panel at SC13
Boyd Wilson, Software CTO, Clemson University

Page 2: Clemson: Solving the HPC Data Deluge

Outline  

• Palmetto Cluster
• Wide Area Storage Across the Innovation Platform
• Collective Cluster (Real-Time Data Aggregation and Analytics Cluster)
• Performance Numbers
• Research DMZ/Network

Page 3: Clemson: Solving the HPC Data Deluge

Palmetto Storage

Primary Research Cluster at Clemson
• 1,972 nodes
• 22,928 cores
• 998,400 CUDA cores
• 396 TF (only the newest GPU nodes benchmarked)
• ~120+ TF additional, not benchmarked
• Condominium model
• Home storage: SAMQFS backed by SL8500 (6 PB)
• Scratch: OrangeFS

Page 4: Clemson: Solving the HPC Data Deluge

Palmetto Storage

Scratch
• 32 R510
• 16 R720
• 512 TB OrangeFS (v2.8.8)

NFS Home/Archive
• SAMQFS over NFS
• 120 TB disk
• 6 PB tape

[Slide diagram: SAMQFS home and archive on the SL8500; 200 FDR IB nodes (400 NVIDIA K20, 396 TF), 1,622 MX nodes (96 TF), and 96 IB nodes connect over FDR IB, 10G MX, and 10G Ethernet to the scratch and home/archive storage.]
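
The scratch file system is OrangeFS 2.8.x, which compute nodes reach through the OrangeFS/PVFS2 client. The slides do not show the client setup; the following is only a rough sketch of a typical 2.8.x mount, with the server hostname, file system name, and paths being hypothetical (3334 is the default OrangeFS port):

# Hypothetical sketch, not Clemson's actual configuration.
modprobe pvfs2                                          # load the 2.8.x kernel module (if installed)
/usr/sbin/pvfs2-client -p /usr/sbin/pvfs2-client-core   # start the userspace client daemon
mount -t pvfs2 tcp://ofs-meta1:3334/scratch /mnt/scratch
df -h /mnt/scratch                                      # verify the capacity is visible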

Page 5: Clemson: Solving the HPC Data Deluge

Palmetto Scratch: Next Steps
• 32 Dell R720
• 520 TB scratch
• OrangeFS
• WebDAV to OrangeFS (a hedged access sketch follows below)
• Hadoop over OrangeFS with MyHadoop

[Slide diagram: Innovation Platform and campus data access paths; 200 FDR IB nodes (400 NVIDIA K20 GPUs, 396 TF) and 1,622 MX nodes (96 TF) reach scratch over FDR IPoIB and 10G IPoMX; WebDAV access comes in through the ScienceDMZ over multiple 10G Ethernet and 100G links.]
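
Because the WebDAV gateway exposes OrangeFS through standard HTTP verbs, any WebDAV-capable client can move data without an OrangeFS install. The endpoint below is hypothetical (the slides do not give the actual Clemson URL); this only sketches what the usual curl commands would look like:

# Hypothetical WebDAV endpoint fronting OrangeFS scratch.
WEBDAV=https://webdav.example.edu/scratch

# Upload a file (HTTP PUT); curl prompts for the password.
curl -u "$USER" -T results.h5 "$WEBDAV/project1/results.h5"

# List a directory (WebDAV PROPFIND, one level deep).
curl -u "$USER" -X PROPFIND -H "Depth: 1" "$WEBDAV/project1/"

# Download a file (HTTP GET).
curl -u "$USER" -o results.h5 "$WEBDAV/project1/results.h5"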

Page 6: Clemson: Solving the HPC Data Deluge

Clemson – USC 100 Gb Tests

12 Dell R720 OrangeFS servers, tested from OrangeFS clients
• File write: 37 Gb/s
• Server hardware problems and network packet loss during the tests
• perfSONAR: 49 Gb/s initially; a later retest reached ~70 Gb/s with tuning (a hedged test sketch follows below)
• Additional file testing planned (the initial testing systems had to move to production)
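
The slides report the perfSONAR throughput numbers but not the test commands. As a hedged illustration only (hostnames and parameters are hypothetical), a memory-to-memory test of the kind perfSONAR hosts run can be reproduced with iperf3 and parallel TCP streams:

# On the far-end perfSONAR host (e.g., at USC), start a listener.
iperf3 -s

# From the Clemson side: 30-second test, 8 parallel TCP streams.
iperf3 -c ps.usc.example.edu -P 8 -t 30

# The file-level number (37 Gb/s write) instead exercises the OrangeFS
# clients and servers, e.g., with iozone as on the SC13 benchmark slide.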

Page 7: Clemson: Solving the HPC Data Deluge

SC13 Demo
• 16 Dell R720 OrangeFS servers on the SC13 floor
• OrangeFS clients: Clemson, USC, I2, Omnibond

Page 8: Clemson: Solving the HPC Data Deluge

The "Collective" Cluster
• 12 R720
• 170 TB
• D3-based visualization toolkit called SocialTap
• Social media aggregation via GNIP
• Elasticsearch (a hedged query sketch follows below)
• Hadoop MapReduce
• OrangeFS
• WebDAV to OrangeFS

[Slide diagram: Innovation Platform data access, campus data access, and social data input feed the cluster; WebDAV access to Palmetto comes through the ScienceDMZ over multiple 10G Ethernet links.]
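
The slides name Elasticsearch as the search layer over the aggregated social data but show no queries. Purely as an illustration (the index name, field name, host, and search term are hypothetical, not Clemson's schema), a keyword search against an Elasticsearch index through its REST API looks like:

# Hypothetical index "posts" with a "text" field on a local node.
curl -s 'http://localhost:9200/posts/_search?q=text:tailgate&size=10'

# The same search written with the JSON query DSL.
curl -s -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"text": "tailgate"}}}' \
  'http://localhost:9200/posts/_search'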

Page 9: Clemson: Solving the HPC Data Deluge

OrangeFS on Dell R720s
• 16 Dell R720 servers connected with 10 Gb/s Ethernet
• 32 clients reached nearly 12 GB/s read and 8 GB/s write

# Write
iozone -i 0 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
# Read
iozone -i 1 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
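
For readers unfamiliar with iozone's cluster mode: -i 0 and -i 1 select the write and read tests, -c and -e include close() and flush time in the results, -w keeps the temporary files so the read pass can reuse them, -r sets the record size, -s 4g writes a 4 GB file per process, -t runs throughput mode with the given process count, -+n disables retests, and -+m names a client list file. The slide leaves the variables unset; the values below are hypothetical and only show the shape of a run:

# Hypothetical values, not from the slides.
RS=4m                  # record size
NUM_PROCESSES=32       # one process per client in this example
CLIENT_LIST=clients.txt
# Each clients.txt line: <hostname> <working dir on OrangeFS> <path to iozone>
#   node001 /mnt/scratch/iozone /usr/bin/iozone
iozone -i 0 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST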

Page 10: Clemson: Solving the HPC Data Deluge

MapReduce over OrangeFS
• 8 Dell R720 servers connected with 10 Gb/s Ethernet
• The remote case adds 8 additional identical servers and does all OrangeFS work remotely, leaving only local work on the compute nodes (the traditional HPC model)
• *25% improvement with OrangeFS running on separate nodes from MapReduce (a hedged job sketch follows below)
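
The slides do not show how the MapReduce jobs were wired to OrangeFS, so the following is only a stand-in, not necessarily Clemson's method: with OrangeFS kernel-mounted at the same path on every node, a stock Hadoop example job can be pointed at it with file:// URIs (the mount point and jar path are hypothetical):

# Assumes OrangeFS is mounted at /mnt/scratch on every Hadoop node.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount \
  file:///mnt/scratch/bench/input \
  file:///mnt/scratch/bench/output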

Page 11: Clemson: Solving the HPC Data Deluge

MapReduce over OrangeFS
• 16 Dell R720 servers connected with 10 Gb/s Ethernet
• Remote clients are Dell R720s with single SAS disks for local data (vs. the 12-disk arrays in the previous test)

Page 12: Clemson: Solving the HPC Data Deluge

Clemson Research Network

[Slide diagram: the campus research network and Science DMZ. A Brocade MLx32 core router carries a 100 Gig tagged trunk and peers with C-Light and collaborators; a firewall (ACL) with route filter and a perimeter firewall separate the campus and DMZ from the Science DMZ; Dell Z9000 and S4810 switches provide the Science DMZ and top-of-rack fabric; perfSONAR hosts sit at multiple measurement points; external connectivity includes Internet/I2/NLR, the I2 Innovation Platform, CC-NIE, and PalmettoNet; end systems run host firewalls; SamQFS storage attaches over Fibre Channel.]