
iCER User Meeting

3/26/10

Agenda

• What’s new in iCER (Wolfgang)
• What’s new in HPCC (Bill)
• Results of the recent cluster bid
• Discussion of buy-in (costs, scheduling)
• Other

What’s New in iCER

New iCER Website

• Part of VPRGS
– News
– Showcased Projects
– Supported Funding
– Recent Publications

http://icer.msu.edu

User Dashboard

• Common Portal to User Resources
– FAQ
– Documentation
– Forums
– Research Opportunities
– Known Issues

http://wiki.hpcc.msu.edu

Current Research Opportunities

• NSF Postdoc Fellowships for Transformative Computational Science using CyberInfrastructure

• Website
– Proposals
– Classes
– Seminars
– Papers
– Jobs

http://wiki.hpcc.msu.edu

Postdoc Matching

• 50/50 match from iCER for a postdoc for large grant proposals (multi-investigator, inter-disciplinary)

• Currently only three matches picked up
– Titus Brown
– Scott Pratt
– Eric Goodman

• Several other matches promised, but grants not decided yet

• More opportunities!

Personnel

• New Hire!
• Eric McDonald
– System Programmer
– Partnership with NSCL (Alex Brown et al.)

IGERT Grant Proposal

• Interdisciplinary graduate education in high-performance computing & science

• Big Data
• Leads:
– Dirk Colbry
– Bill Punch

BEACON

• NSF STC
– Funded, starting in June
– $5M/year for 5 years

• New joint space with iCER & HPCC
– First floor BPS
– Former BPS library space

What’s New in HPCC

Graphics Cluster

32-node cluster
• 2 x quad-core 2.4 GHz
• 18 GB RAM
• Two Nvidia M1060
• No Infiniband (Ethernet only)

Result of a Buy-in

• 21 of the nodes were purchased by funds from users

• Can be used by any HPCC user

Each nVidia Tesla M1060

• Streaming processor cores: 240
• Processor core frequency: 1.3 GHz
• Single-precision peak floating-point performance: 933 gigaflops
• Double-precision peak floating-point performance: 78 gigaflops
• Dedicated memory: 4 GB GDDR3
• Memory speed: 800 MHz
• Memory interface: 512-bit
• Memory bandwidth: 102 GB/sec
• System interface: PCIe
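
For context, the peak figures above follow from the usual cores-times-clock arithmetic. This is a sketch that assumes GT200 details not stated on the slide (dual-issue MAD+MUL on each single-precision core, and one double-precision unit on each of the 30 multiprocessors):

\[
240 \times 1.3\,\text{GHz} \times 3\,\tfrac{\text{FLOPs}}{\text{cycle}} \approx 933\ \text{GFLOPS},
\qquad
30 \times 1.3\,\text{GHz} \times 2\,\tfrac{\text{FLOPs}}{\text{cycle}} = 78\ \text{GFLOPS}.
\]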

Example Script

#!/bin/bash -login
# Request one core on a gfx10 (graphics cluster) node for one hour
#PBS -l nodes=1:ppn=1:gfx10,walltime=01:00:00
# Run in the gpgpu advance reservation and request one GPU
#PBS -l advres=gpgpu.6364,gres=gpu:1

# Run from the directory the job was submitted from
cd ${PBS_O_WORKDIR}
module load cuda
myprogram myarguments
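
A minimal usage sketch, assuming the script above is saved as gpu_job.sub (a placeholder name):

qsub gpu_job.sub   # submit the job script; the scheduler prints a job ID
qstat -u $USER     # check the status of your queued and running jobs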

CELL Processor

2 PlayStation 3s
• running Linux
• for experimenting with CELL
• dev-cell08 and test-cell08 (see the web for more details)
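
A minimal access sketch, assuming the machines accept interactive SSH logins with your usual HPCC credentials (not stated on the slide):

ssh dev-cell08   # log in to the CELL development box
uname -m         # should report the PS3's PowerPC architecture (ppc64)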

Green Restrictions

• The machine Green is still up and running, especially after some problematic memory was removed

• Mostly replaced by AMD fat nodes
• On April 1st, it will be reserved for jobs requesting 32 cores (or more) and/or 250 GB of memory (or more); see the example request below

• Hope to help people running larger jobs
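
A sketch of a resource request that would qualify for Green under the April 1st rule. The per-node core count and any Green-specific feature or queue name are not given in the slides, so these values are illustrative:

#!/bin/bash -login
# Either 32+ cores or 250+ GB of memory qualifies; this request asks for both.
#PBS -l nodes=1:ppn=32,mem=250gb,walltime=04:00:00

cd ${PBS_O_WORKDIR}
myprogram myarguments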

HPCC Stats

• Ganglia (off the main web page, under Status) is back and working. It gives you a snapshot of the system's current state.

• We are nearly done with a database of all jobs that have been run, which can be queried for all kinds of information. It should be up in the next couple of weeks.

Cluster Bid Results

How it was done

• HPCC submitted a Request for Quotes for a new cluster system.

• Targeted:
– performance vs. power the main concern
– Infiniband
– 3 GB of memory per core
– approximately $500K of cluster

Results

• Received 13 bids from 8 vendors
• Found 3 options that were suitable for the power, space, cooling, and performance we were looking for

• Looking for some guidance from you on a number of issues

Choice 1: Infiniband config

Two ways to configure Infiniband:
• a series of smaller switches configured in a hierarchy (leaf switches)
• one big switch (director)
• leaf switches are cheaper, but harder to expand (requires reconfiguration), with more wires and more points of failure
• a director is more expandable and convenient, but more expensive

Choice 2: Buy-in Cost

• Buy-in cost could reflect just the cost of the compute nodes themselves, with HPCC providing the infrastructure (switches, wires, racks, etc.)

• Buy-in cost could reflect the total hardware cost

• Obviously, subsidizing costs means cheaper buy-in but fewer general nodes

Remember

• HPCC is still subsidizing costs, even if hardware is not subsidized

• Still must buy air-conditioning equipment, OS licenses, MOAB (scheduler) licenses, and other software licenses (not to mention salaries and power)

• Combined, “other” hardware will run to about $75K

• The scheduler is about $100K for 3 years

Some Issues

• 1 node = 8 cores, 1 chassis = 4 nodes
• Buy-in will be at the chassis level (32 cores)

For 1024 cores

Vendor/config    Total    Per node/subsidized    Per node/full
Dell/leaf        $418K    $2,278 ($9,112)        $3,260 ($13,040)
HP/leaf          $460K    $2,482 ($9,928)        $3,594 ($14,376)
Dell/director    $523K    $2,278 ($9,112)        $4,086 ($16,344)

(Parenthesized figures are four times the per-node price, i.e. the cost of one 4-node chassis, the buy-in unit.)

Scheduling

• We are working on some better scheduling methods. We think they have promise and would be very useful to the user base

• For the moment, it will be the Purdue model: we guarantee buy-in users access to their nodes within 8 hours of a request. There is still a one-week maximum run time (though that can be changed); see the sketch below
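
A minimal sketch of a buy-in submission under that policy. The queue or account name tied to a buy-in allocation is not given in the slides, so it is omitted; the request simply shows one chassis worth of cores at the one-week maximum walltime:

#!/bin/bash -login
# One chassis (4 nodes x 8 cores = 32 cores) for the one-week maximum
# run time (168 hours); adjust to your own buy-in allocation.
#PBS -l nodes=4:ppn=8,walltime=168:00:00

cd ${PBS_O_WORKDIR}
myprogram myarguments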