Beyond Monitoring: Proactive Server Preservation in an HPC Environment

Chad Feller, University of Nevada, Reno. Thesis Defense, 8 May 2012.



Page 1: Beyond Monitoring:  Proactive Server Preservation in an HPC  Environment

Beyond Monitoring:

Proactive Server Preservation in an HPC Environment

Chad Feller
University of Nevada, Reno

8 May 2012
Thesis Defense


Acknowledgements

My wife, Veronica

My kids

My good friend, Derek Eiler

My committee, Dr. Harris, Dr. Dascalu, Dr. Schlauch


Background

Monitoring systems

Increasingly sophisticated

Still large holes in capabilities


9/9/9


9/9/9

Power failure sequence kicks in

UPS caught outage

Generator started up

Temperature rising

UPS only powers servers

Power switches to generators

Temperature still rising


9/9/9


9/10/9


9/10/9


HPC


Computing Density


ILOM


9/10/9


9/10/9


7/20/11


Environmental Considerations

ILOM/IPMI

Sun Grid Engine

Linux


Architecture


Frontend


Main Loop


Local Testing


Global Testing


Global Testing


Demo


Conclusion

Developed a temperature monitoring system

Local Perspective

Global Perspective

Intelligent Response

Designed for HPC & Enterprise servers

Modular Implementation

Can be easily adapted to other hardware

Software can be leveraged to other environments

Tested
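
The slides name a local perspective, a global perspective, and an intelligent response, but the transcript does not show how they are implemented. The Python sketch below is one possible reading, not the thesis code. Every name in it (`Thresholds`, `local_view`, `global_view`, `decide`, `poll_once`), the threshold values, and the choice of shutdown as the response are assumptions made for illustration. The temperature reader is passed in as a plain callable, so a different hardware backend could be swapped in without touching the decision logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Action(Enum):
    """Hypothetical responses; the slides only say 'Intelligent Response'."""
    NONE = "none"
    SHUTDOWN_NODE = "shutdown_node"
    SHUTDOWN_ALL = "shutdown_all"


@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative assumptions, not figures from the thesis.
    node_critical_c: float = 40.0       # a single node this hot is at risk
    node_warm_c: float = 32.0           # a node this warm counts toward the global view
    global_warm_fraction: float = 0.5   # share of warm nodes that signals a room-wide event


# Modular piece: any callable returning {node_name: degrees_celsius} can be used,
# for example a wrapper around a service-processor query (not shown here).
TemperatureReader = Callable[[], Dict[str, float]]


def local_view(temps: Dict[str, float], t: Thresholds) -> List[str]:
    """Local perspective: nodes whose own reading is critical."""
    return sorted(n for n, c in temps.items() if c >= t.node_critical_c)


def global_view(temps: Dict[str, float], t: Thresholds) -> bool:
    """Global perspective: True when enough nodes are warm at the same time."""
    if not temps:
        return False
    warm = sum(1 for c in temps.values() if c >= t.node_warm_c)
    return warm / len(temps) >= t.global_warm_fraction


def decide(temps: Dict[str, float], t: Thresholds = Thresholds()) -> Dict[str, Action]:
    """Combine both perspectives into a per-node action plan."""
    if global_view(temps, t):
        # Many nodes warming together suggests a cooling failure, not one bad node.
        return {n: Action.SHUTDOWN_ALL for n in temps}
    hot = set(local_view(temps, t))
    return {n: (Action.SHUTDOWN_NODE if n in hot else Action.NONE) for n in temps}


def poll_once(reader: TemperatureReader, t: Thresholds = Thresholds()) -> Dict[str, Action]:
    """One iteration of a monitoring loop: read temperatures, then decide."""
    return decide(reader(), t)
```

With these assumed thresholds, `decide({"n1": 25.0, "n2": 45.0, "n3": 26.0})` flags only `n2`, while `decide({"n1": 33.0, "n2": 34.0, "n3": 26.0})` treats the event as room-wide because two of the three nodes are warm.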
