Sanger HPC Infrastructure Report (2007)

DESCRIPTION

Overview of the Sanger Institute HPC infrastructure and management tools, given at the Spring 2007 HEPIX meeting.

TRANSCRIPT

1. Sanger Institute Site Report Nov 2007 Guy Coates [email_address]

2. About the Institute

  • Funded by Wellcome Trust.
    • 2nd largest research charity in the world.
    • ~700 employees.
  • Large scale genomic research.
    • Sequenced 1/3 of the human genome (largest single contributor).
    • We have active cancer, malaria, pathogen and genomic variation studies.
  • All data is made publicly available.
    • Websites, FTP, direct database access, programmatic APIs.

3. Why are we here?

  • HEPIX Themes:
    • We have particle accelerators which throw out massive amounts of data.
    • We need lots of storage.
    • We need lots of compute.
    • Managing it is hard.
  • Different science, same problems.

(Image: sequencing machines)

4. Managing Growth

  • We have exponential growth in storage and compute.
    • Storage doubles every 12 months (a rough projection is sketched after this list).
      • We will have at least 2 PB of disk next year.
  • New sequencing technologies are a huge challenge.
    • ~50x increase in data production in the space of 6 months.
  • New sequencing tech is still growing.
    • Known unknowns:
      • Higher data output from our current machines.
      • More machines.
    • Unknown unknowns:
      • New big science projects are just a good idea away...
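
A rough projection of that doubling, as a minimal Python sketch (the ~1 PB starting figure and five-year horizon are illustrative, not from the talk):

    # Illustrative capacity projection, assuming storage keeps doubling
    # every 12 months from roughly 1 PB in production today.
    start_pb = 1.0
    for year in range(5):
        print("year %d: ~%.0f PB" % (year, start_pb * 2 ** year))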

5. Data centre

  • 4x 250 m² data centres.
    • 2-4 kW/m² cooling.
    • 3.4 MW power draw (the arithmetic is checked in the sketch after this list).
  • Overhead aircon, power and networking.
    • Allows counter-current cooling.
    • More efficient.
  • Technology Refresh.
    • 1 data centre is an empty shell.
      • Rotate into the empty room every 4 years.
      • Refurb one of the in-use rooms with the current state of the art.
    • Fallow Field principle.
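
The quoted figures hang together as simple arithmetic; a minimal sketch using only the numbers on this slide:

    # Compare total cooling capacity (4 rooms x 250 m2 at 2-4 kW/m2)
    # against the quoted 3.4 MW power draw.
    rooms, area_m2 = 4, 250.0
    kw_per_m2_low, kw_per_m2_high = 2.0, 4.0
    total_m2 = rooms * area_m2
    print("cooling capacity: %.1f-%.1f MW" % (total_m2 * kw_per_m2_low / 1000,
                                              total_m2 * kw_per_m2_high / 1000))
    print("quoted power draw: 3.4 MW")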

6. Storage

  • SAN Fabric.
    • Dual Brocade fabric (27 switches per fabric).
    • ~1PB in production today.
  • HP EVA 5000/8000 arrays.
    • Holds the bulk of our data (~1 PB).
    • Dual controller, Fibre Channel disks, ~50 TB per array.
    • Virtual RAID 5 (effectively RAID 6).
      • No need to worry about RAID-set sizes being nice multiples of the physical disk size, etc.
      • Allows rapid allocation of storage to projects as required.
    • Storage is either directly attached or used with cluster file systems.
  • BlueArc Titan.
    • NFS serving for home directories and storage which needs concurrent Windows / Linux access.
    • EVA storage at the back end.
  • Backup.
    • Veritas NetBackup to a StorageTek SL8500 library.
    • 12 drives (LTO-2 and LTO-3), 1500 slots (rough native capacity sketched below).
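
For scale, a back-of-the-envelope estimate of the library's native capacity, assuming the usual LTO native capacities (200 GB for LTO-2, 400 GB for LTO-3); the all-of-one-type cases are just illustrative bounds:

    # Rough native capacity of a 1500-slot library with LTO-2/LTO-3 media.
    slots = 1500
    lto2_gb, lto3_gb = 200, 400
    print("all LTO-2: ~%d TB native" % (slots * lto2_gb // 1000))
    print("all LTO-3: ~%d TB native" % (slots * lto3_gb // 1000))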

7. Compute

  • 3800 cores in >1500 blades and rack mount servers.
    • Blades preferred due to ease of management, space and power efficiency.
    • Mostly x86_64 servers, some older x86 systems.
      • Single, dual and quad core.
    • A token number of ia64 machines for large-memory work.
      • (SGI Altix 350, 16 CPUs, 192 GB memory).
  • We use Debian Linux as primary OS.
    • Badly burned by proprietary OS and file-systems.
      • We still have legacy Alpha / Tru64 / AdvFS data and apps which require migration to Linux.
    • 99% of systems run Debian Sarge / Etch.
      • Run 64-bit on x86_64 CPUs.
      • SLES 9 on the Oracle servers to stay inside the support matrix.
  • Complex User-base.
    • ~300 users, diverse workload.
    • Typically IO-bound, integer-intensive and single-threaded.
      • Scales well on clusters (apart from the IO bit).

8. Infrastructure / Management

  • Deployment.
    • Debian FAI automated installer. Integrated with blade management systems for fire-and-forget deployment.
      • ~2 minutes for a complete OS install.
  • Updates.
    • cfengine and dsh (a dsh-style fan-out is sketched after this list).
  • Monitoring.
    • ganglia, nagios.
  • RequestTracker.
    • Used by many software development and science teams within the Institute as well as the System team.
    • External engineers and collaborators have access.
    • 30k tickets per year.
  • Heartbeat 2.
    • 2-8 node clusters for high availability.
    • Mostly MySQL + Apache, using the SAN for storage failover.
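
As a flavour of the dsh-style fan-out used for updates, here is a minimal Python sketch that runs one command over ssh on a list of hosts in parallel; the hostnames and command are invented for illustration, and this is not the actual tooling:

    # Run the same command on several hosts in parallel over ssh,
    # roughly what dsh does for us during updates.
    import subprocess

    hosts = ["blade001", "blade002", "blade003"]   # hypothetical node names
    command = "uptime"

    procs = [(h, subprocess.Popen(["ssh", h, command],
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT))
             for h in hosts]
    for host, proc in procs:
        out, _ = proc.communicate()
        print("%s: %s" % (host, out.decode().strip()))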

9. Tera-scale Oracle

  • Sequencing Trace archive.
    • Holds results from all DNA sequencing experiments, everywhere.
    • Mirrored with NCBI trace archive.
    • Currently ~60 TB / 8 billion traces.
    • Doubles in size every 12 months.
  • Originally, data was on the file system and metadata was in Oracle.
    • Billions of small files (20-80 kB).
      • The file-system worst case.
    • Hard to back up, hard to manage space.
    • All on Tru64 / AdvFS (a dead architecture).
  • We decided to move everything into Oracle (a sketch of the files-as-BLOBs idea follows this list).
    • How hard can it be?
      • Tera-scale databases are common (according to Oracle).
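
The files-as-BLOBs idea is simple to sketch; the example below uses Python's stdlib sqlite3 purely as a stand-in for Oracle, and the table and column names are made up for illustration:

    # Store each small trace file as a BLOB next to its name, instead of
    # keeping billions of 20-80 kB files on a file system.
    import sqlite3

    db = sqlite3.connect("traces.db")
    db.execute("CREATE TABLE IF NOT EXISTS trace (name TEXT PRIMARY KEY, data BLOB)")

    def store_trace(path):
        with open(path, "rb") as f:
            db.execute("INSERT OR REPLACE INTO trace VALUES (?, ?)",
                       (path, sqlite3.Binary(f.read())))
        db.commit()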

10. Tera-scale Oracle

  • Primary.
    • 4 node Oracle 10g RAC cluster (4 core x86_64, 16GB RAM).
    • 60TB of EVA / fibre-channel storage, Oracle ASM clustered file-system.
  • Backup Strategy.
    • Replicate the database to a secondary database with Oracle Data Guard.
    • 2 node RAC cluster with 60TB of MSA1000 storage (cheap-n-cheerful fibre-channel).
      • 15 minute delay in replication to protect against finger trouble.
    • Secondary database is the primary backup (disk-to-disk, fast).
      • We can run off the secondary if we need to.
    • Dumps to tape taken from the secondary.
  • Big Oracle is hard.
    • Oracle is not well tested (especially by Oracle!) on this scale.
    • How will we cope with future growth of the database?

11. Compute farm

  • Exclusively Blade.
    • 588 IBM HS20/LS20 (42 chassis), 128 HP BL460c (8 chassis).
    • 2224 cores (mix of 32- and 64-bit), 2 GB memory per core.
    • RAID 1 system disks.
    • Debian Sarge + custom kernel.
    • LSF used for job scheduling.
      • Typically 10k-100k jobs in the system (a job-array submission sketch follows this list).
  • Networking.
    • Extreme Networks.
    • 1-2x GigE edge.
    • 2-4x GigE trunks.
    • 2x 10GigE core.
    • Systems distributed across data centres.
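
A hedged sketch of how such job counts are typically fed to LSF as a job array; the job name, worker script and array size are invented, and the exact bsub options used on the farm are not from the talk:

    # Submit a 1000-element LSF job array; %J/%I in the output path expand
    # to the job ID and array index.
    import subprocess

    njobs = 1000                               # hypothetical batch size
    subprocess.check_call([
        "bsub",
        "-J", "align[1-%d]" % njobs,           # job array, one element per input
        "-o", "logs/align.%J.%I.out",          # per-element output file
        "./run_alignment.sh",                  # hypothetical worker script
    ])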

12. Farm lustre storage

  • HP SFS / Lustre v2.1.1.
    • Based on CFS Lustre 1.4.
    • In-house client port to Debian.
  • Lustre for work / scratch areas (a striping example follows this list).
    • 10 OSS / 20 OST.
      • 20 SFS20 arrays.
      • Dual-tailed SCSI (highly available).
      • 12x 250 GB SATA disks.
      • RAID 6 + 1 hot spare.
      • 35 TB usable storage.
    • Reliability sacrificed for performance.
  • We have NFS as well.
    • Lustre random access / meta-data performance is rubbish.
    • NFS for home directories.
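
Striping is what buys the streaming bandwidth on the work areas; a minimal illustration using the standard Lustre client tool (the path and stripe count are invented, not a recommendation from the talk):

    # Stripe new files in a scratch directory across 4 OSTs with
    # "lfs setstripe", so large sequential IO is spread over the servers.
    import subprocess

    scratch_dir = "/lustre/scratch/myproject"  # hypothetical path
    subprocess.check_call(["lfs", "setstripe", "-c", "4", scratch_dir])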

13. Lustre performance

  • Sustained 11-12 Gbit/s peak.
    • This is real work, not a benchmark.

14. Supporting New Sequencing

  • We have 20 Illumina (née Solexa) sequencing machines.
    • They will produce 20-30 TB per day.
    • Machines will run 24x7.
    • We need to keep raw data for ~2 weeks for analysis and QC.
  • 320 TB Lustre staging area (a rough capacity check is sketched at the end of this list).
    • 8 EVA8000 arrays, 28 OSSs, 160 OSTs.
      • (The 8-LUN-per-OSS limit required more OSSs than planned.)
    • 3x 100 TB file-systems for production + a 50 TB file-system for development.
      • Smaller file-systems hedge against EVA failure.
  • Compute.
    • 256 HP BL460c blades. 600 cores, mixture of dual / quad core.
    • Extreme networks Black Diamond 8810 switch (360 non-blocking GigE ports).
  • Scratch storage.
    • 25 TB SFS20 Lustre scratch area for ad hoc analysis.
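
A rough capacity check of the staging area against the stated data rates, using only the slide's own numbers:

    # 20-30 TB/day of raw data kept for ~2 weeks, versus the 320 TB
    # Lustre staging area.
    rate_low_tb, rate_high_tb = 20, 30
    retention_days = 14
    print("raw data held: %d-%d TB" % (rate_low_tb * retention_days,
                                       rate_high_tb * retention_days))
    print("staging area: 320 TB")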

15. Data pull

  • LSF reconfiguration allows processing and alignment capacity to be interchanged.
  • Lustre clients have 2x GigE; 4x GigE trunks run from each chassis to the core switch.
  • (Diagram: sequencers 1-20, "sucker" blade chassis, processing and alignment blade chassis, a 320 TB EVA Lustre datastore, a 25 TB SFS20 Lustre scratch area, and a final NFS repository.)

16. Acknowledgements

    • Tim Cutts
    • Simon Kelley
    • Pete Clapham
    • Mark Flint
    • James Beal

HP Life sciences / SFS

    • Jon Nicholson
    • Russell Vincent
    • Dave Holland
    • Martin Burton

Sanger Institute

    • Eamonn O'Toole
    • Gavin Brebner
    • Phil Butcher