21 May 2003 Fermi Linux Server Vendor Qualification--Steven Timm timm@fnal.gov

Fermi Linux Server Vendor Qualification

HEPiX

May 21, 2003

Steven C. Timm

For the Fermi Linux Vendor Qualification Taskforce

OUTLINE

Fermilab Hardware Procurement Strategy

Goals of Qualification

Procedures of Qualification

Results of Qualification

SUMMARY

The 2003 Fermi Linux Server Vendor Qualification focused on 1U Intel servers.

First phase was a technical evaluation which identified 18 technically qualified vendors.

All these vendors participate in a price-performance bid—the top five make the vendor list. (Currently ongoing).

We remember all technically qualified vendors and rotate them in as necessary.

We are not making a new qualified desktop vendor list at this time.

Public web page: http://www-oss.fnal.gov/scs/public/qualify2003

Members of Fermi Linux Server Vendor Qualification Taskforce:

The taskforce involved personnel from five different departments plus key members of management. All major purchasers of server hardware were represented. Also represented were the computer room logistics staff.

Members: Steven Timm (chair), Margaret Greaney, Troy Dawson, Lance Weems, Hans Wenzel, Bruce Karrels, Don Holmgren, Phil Lutz, Stan Naymola, Mark Kaletka, Gerry Bellendir.

Fermi Hardware Procurement Strategy

Buy a hardware solution as fully integrated as possible, including installation.

Identify vendors that know Fermilab requirements and are willing to work with Fermi Linux.

Replacement parts via 3 year warranty, service provided by Fermilab.

Fermi Linux Vendor List--History

Two previous Fermi Linux qualifications, 1999 and 2001.

– 1999: desktops as farm workers, 5 vendors

– 2001: separate vendor lists for desktops and 2U rackmount servers

Also two special evaluations for 2U rackmounts and AMD.

Vendor list used in all major Fermi acquisitions, ~1500 machines from 1999-2002.

Also used by outside groups: KEK, INFN, Northwestern, MIT, Geneva, Carnegie Mellon, Pittsburgh, Edinburgh, others

Evaluation: Performance/price

Overriding goal has been to get the best performance possible at the lowest price.

We have succeeded well—From 1999 to 2002 Fermi cycles per dollar increased by a factor of 6—Moore’s law should have only given us a factor of four.

Users are happy with quantity of computing that they got for their money.

But still, in this evaluation, we are looking for better long term reliability, not race to the bottom for price only.
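The Moore's-law comparison above can be checked with a quick calculation. This is a minimal sketch, assuming the commonly quoted 18-month doubling period and a three-year span (1999 to 2002); the doubling period is not stated on the slide, and the function name is illustrative.

```python
def moore_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor expected if capability doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# ASSUMPTION: 18-month doubling period; the talk only quotes the resulting factor.
expected = moore_factor(3)   # 1999 to 2002
observed = 6.0               # Fermi cycles per dollar gain quoted on the slide

print(f"Moore's law expectation: x{expected:.1f}")
print(f"Observed gain:           x{observed:.1f}")
print(f"Gain beyond Moore's law: x{observed / expected:.2f}")
```

Under that assumption, three years is two doubling periods, which reproduces the factor of four quoted on the slide.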

Evaluation: Performance/price

Problem: One node not the best test of long-term price/performance by a company.

Small businesses best able to take time to follow directions of evaluation process and give support.

Small businesses not always able to deliver large orders in timely manner with good initial quality.

Single node prices not a good predictor of bid level on a real bid—and we shouldn’t be asking anyway.

Address by: getting technical qualification done first, then doing a price/performance bid.

Evaluation: Vendor attrition

Some vendors on list have gone out of business

Others disqualified for bad performance

Others stopped bidding on their own, or bid ridiculously high

Address by:

– Select vendor list on performance/price basis from all those technically qualified.

– Keep track of all technically qualified vendors, add to list if necessary

– Supplement list if special hardware (AMD, blades, desktop) required.

Evaluation: Initial quality

Problem: Going too low on the price curve: Sometimes vendors bid too low and try to deliver poor quality systems

Addressed, from the beginning, with tough 30-day acceptance test and “lemon law”

In various cases Fermilab has required vendors to do swaps on all units of PS, case, motherboard, disk drives, and racks.

Cost of Fermi labor to resolve the problem less than difference between the winning bid and the next highest bid.

All issues have been resolved through this process and the systems have all had productive lives.

NOW—also address with references and hard numbers on initial quality.

Evaluation: Components

Problem: Rapidly changing components

In commodity market, components change rapidly.

From beginning of eval to issuance of purchase order: about six months

CPU speeds go up, cases change.

Impossible to track for laptop, difficult to track for desktop.

OK for server market but results in higher heat loads and current draws.

ADDRESS by thermal specs that are broad enough so that if there are problems, vendor still has to fix.

Goals

We want to identify vendors who are best able to deliver rackmounted solutions

– Competent in Linux

– Build quality 1U Servers

– Can integrate into rackmount environment with good thermals in a timely and professional manner

– Have high performance

– Have good support and troubleshooting

Vendor Selection

Existing vendors on Fermi Linux list

Sales to other Fermi Departments

Advertisements at trade shows

Survey of other DOE labs at HEPiX

Vendor’s direct contact to Fermilab asking to participate.

Chronology

We made contact with 45 vendors in all.

29 vendors attended the Jan. 28 info meeting

24 vendors submitted acceptable configuration on Feb. 4

21 vendors submitted acceptable benchmarks and were cleared to ship unit on Mar. 4; all got it here by Mar. 11.

18 vendors identified as technically qualified

Specifications

1U Dual Intel Xeon, 2.4 GHz or faster

400 MHz front side bus or faster

1 GB RAM (RDRAM or DDR SDRAM)

Disks: 1 x 20 GB system, 2 x 40 GB data

100 Mbit Ethernet

Video

CDROM, Floppy
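A specification list like this lends itself to an automated check of each submitted configuration. The following is only a sketch: the `NodeConfig` fields and the `spec_failures` helper are hypothetical names, and treating RAM, disk, and Ethernet figures as minimums is an assumption (the slide says "or faster" only for the CPU and front side bus).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeConfig:
    """Hypothetical description of a vendor's submitted evaluation node."""
    form_factor_u: int
    cpu_count: int
    cpu_ghz: float
    fsb_mhz: int
    ram_gb: float
    system_disk_gb: int
    data_disks_gb: Tuple[int, ...]
    ethernet_mbit: int


def spec_failures(cfg: NodeConfig) -> List[str]:
    """Return a list of unmet requirements; an empty list means the config passes."""
    failures = []
    if cfg.form_factor_u != 1:
        failures.append("chassis must be 1U")
    if cfg.cpu_count != 2:
        failures.append("must be dual CPU")
    if cfg.cpu_ghz < 2.4:
        failures.append("CPU must be 2.4 GHz or faster")
    if cfg.fsb_mhz < 400:
        failures.append("front side bus must be 400 MHz or faster")
    # ASSUMPTION: the remaining figures are treated as minimums.
    if cfg.ram_gb < 1:
        failures.append("needs at least 1 GB RAM")
    if cfg.system_disk_gb < 20:
        failures.append("system disk must be at least 20 GB")
    if sum(1 for d in cfg.data_disks_gb if d >= 40) < 2:
        failures.append("needs two data disks of at least 40 GB")
    if cfg.ethernet_mbit < 100:
        failures.append("needs 100 Mbit Ethernet")
    return failures


good = NodeConfig(1, 2, 2.8, 533, 1.0, 20, (40, 40), 100)
slow = NodeConfig(1, 2, 2.0, 400, 1.0, 20, (40, 40), 100)
print(spec_failures(good))
print(spec_failures(slow))
```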

Why just 1U Xeon

AMD hardware shows high initial failure rate, high current, high heat.

1U is most challenging thermal case…if they can build 1U we believe they can build 2U.

Intel chips are supposed to be faster than AMD at the moment

Intel chips supposed to run cooler, draw less current.

Simplicity: a platform we already mostly understand, just one from each vendor

Space: we don’t have space to put so many 2U.

Linux Competence

Vendor identifies hardware that’s compatible with Linux. (Much easier than it used to be).

Vendor loads Fermi Linux onto evaluation node

Have to configure lm_sensors on the node

Runs our supplied test to check and see if they did it right.

They are only allowed to ship the unit to Fermilab if it is right.
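The supplied test itself is not described in the talk. As an illustration only, one plausible check is that the output of the lm_sensors `sensors` command contains temperature readings in a believable range. The sketch below parses a captured output string; the sample text, the function names, and the 5 to 100 °C plausibility window are all assumptions, and the real output layout varies with the sensor chip and lm_sensors version.

```python
import re

# Matches lines such as "temp1:   +38°C  (limit = +60°C)".
# ASSUMPTION: readings appear as "<label>: <value>°C" at the start of a line.
TEMP_LINE = re.compile(
    r"^[ \t]*([^:\n]+):[ \t]*([+-]?\d+(?:\.\d+)?)[ \t]*°?C\b",
    re.MULTILINE,
)


def read_temperatures(sensors_output: str) -> dict:
    """Extract {label: degrees C} from text captured from `sensors`."""
    return {label.strip(): float(value)
            for label, value in TEMP_LINE.findall(sensors_output)}


def sensors_configured(sensors_output: str,
                       low: float = 5.0, high: float = 100.0) -> bool:
    """True if at least one temperature is reported and all are plausible."""
    temps = read_temperatures(sensors_output)
    return bool(temps) and all(low <= t <= high for t in temps.values())


# Hypothetical sample output, for illustration only.
SAMPLE = """\
w83627hf-isa-0290
Adapter: ISA adapter
VCore 1:   +1.47 V  (min =  +1.40 V, max =  +1.55 V)
fan1:      8437 RPM  (min = 3000 RPM, div = 2)
temp1:       +38°C  (limit =  +60°C)
temp2:     +45.5°C  (limit =  +60°C)
"""

print(read_temperatures(SAMPLE))
print(sensors_configured(SAMPLE))
```

On a live node the text could be captured with `subprocess.run(["sensors"], capture_output=True, text=True).stdout` before being passed to `sensors_configured`.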

Electrical

Electric current measured with ammeter at startup, idle, and full CPU load.

Current draw ranges: 2.4 GHz, 1.6-2.0 A; 2.8 GHz, 2.0-2.3 A; 3.06 GHz, 2.1-2.35 A

Likely that with purchase of 2.8 or 3.06GHz machines we can only have seven machines per circuit, not eight as in the past.

Those with higher current draw also tend to have more fans and be better internally cooled.

Bright side—This current similar to 750MHz machines bought 3 years ago, 2.5x the performance for the same current.
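The machines-per-circuit figure follows from dividing the usable current on a circuit by the full-load draw of one machine. The sketch below shows that arithmetic; the 16 A usable budget (a 20 A circuit derated to 80%) is an assumption, since the talk does not state the circuit rating.

```python
import math


def machines_per_circuit(usable_amps: float, amps_per_machine: float) -> int:
    """Whole machines that fit within the usable current of one circuit."""
    return math.floor(usable_amps / amps_per_machine)


# ASSUMPTION: 20 A circuit derated to 80% continuous load; not stated in the talk.
USABLE_AMPS = 16.0

for label, amps in [("2.0 A per machine", 2.0), ("2.2 A per machine", 2.2)]:
    print(f"{label}: {machines_per_circuit(USABLE_AMPS, amps)} machines")
```

With this assumed budget, 2.0 A per machine allows eight machines and 2.2 A allows seven, in line with the slide; the true limit depends on the actual circuit rating and on where in the measured range a purchased machine falls.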

Thermal

Measured ΔT from front to back of unit for all.

Used internal temperature probes on each unique type of case.

All units in evaluation much cooler than the 1U units bought in FY2002.

Due to better thermal characteristics of Intel chip and many more added internal fans and blowers.

“Northbridge” chipset chips in some machines ran hotter than the CPUs. Important to watch size of heatsink on these chips.

Still analyzing the data we took but confident that all units are acceptable.

Thermals continued

Quality 1U Servers

Open each machine to verify quality of construction

Run burn-in on each machine for two weeks

Thermal measurements in real rack situation

Electrical current measurements

Verify all components meet specs.

Integration capabilities contd.

Vendors are asked to submit sample proposal for full rack of systems

Standard Fermi rack configuration is base of proposal but they can suggest extras.

Goal is to (1) learn if they can integrate and (2) get new ideas on how to improve our setup.

Also they must submit info on clusters they have installed before, with real temperature and reliability numbers.

Performance

Vendors are supplied CD-ROM of CDF and D0 Benchmark

Performance measured in Fermi Cycles where PIII 1 GHz = 1000 Fermi Cycles.

We repeat test when machine gets here

QCD benchmark, seti@home, tiny also run.

Would be ideal to use SPEC CPU2000, but published results not repeatable with compilers used by Fermi.

Price doesn’t enter in technical evaluation.

Performance

3 CPU speeds measured: 2.4, 2.8, 3.06 GHz. 1000 Fermi Cycles = PIII 1 GHz.

Average performance: 1779, 2041, 2223 Fermi Cycles respectively.

400 MHz vs 533 MHz front side bus is 2.5% effect for farms software, much bigger for QCD.

AMD MP2200+: 1771 Fermi Cycles

Performance is projected to faster clock speeds in anticipation that some vendors will bid faster chips.
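The talk does not say how performance is projected to faster clock speeds. One simple possibility, shown here purely as a sketch, is a least-squares straight line through the three measured averages; the fitting method and the 3.2 GHz target are assumptions, not the taskforce's stated procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Measured averages from the slide.
CLOCK_GHZ = [2.4, 2.8, 3.06]
FERMI_CYCLES = [1779, 2041, 2223]

slope, intercept = fit_line(CLOCK_GHZ, FERMI_CYCLES)


def project(ghz: float) -> float:
    """Projected Fermi Cycles at a given clock speed (linear extrapolation)."""
    return slope * ghz + intercept


for ghz, cycles in zip(CLOCK_GHZ, FERMI_CYCLES):
    print(f"{ghz:.2f} GHz: {cycles} Fermi Cycles, {cycles / ghz:.0f} per GHz")

# ASSUMPTION: 3.2 GHz is a hypothetical faster chip a vendor might bid.
print(f"Projected at 3.20 GHz: {project(3.2):.0f} Fermi Cycles")
```

The per-GHz figures fall slightly as the clock rises, so scaling the 2.4 GHz result in direct proportion to clock speed would overestimate faster chips; a fitted line with a nonzero intercept accounts for that.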

Support and Troubleshooting

Each vendor gets software call—related to the configuration of Fermi Linux, solvable by E-mail or phone

Each vendor gets hardware call—designed to trigger an on-site service call.

We manufacture a hardware problem if necessary.

Points for prompt response, correct response.

Conclusions

18 technically qualified vendors, in alphabetical order: Ace, Angstrom, APPRO, ASA, Aspen, Atipa, Concentric, Dell, HP, IBM, Koi, Penguin, Promicro, PSSC, Rackable, Racksaver, Richardson, Western Scientific

Price/performance bid will weed them down to five.

21 vendors is too many to bring in, will be more discriminating next time.

Component issues:

Boards OK: Intel SE7501 series, Supermicro X5DPx series, Tyan 2721, Tyan 2723

Both Tyan S2721-533 (Thunder i7501 Pro) and Tyan S2723 (Tiger i7501) had issues with 10/100 ethernet…resolved by changing resistor value on the board

Some manufacturers offer cold-swap and hot-swap capabilities on drives, very nice.

Issues in Intel E7501 chipset—slower disk throughput than some earlier chipsets, but adequate for our needs.

Price/performance bid

All vendors who pass our technical requirements are participating in a price/performance bid on a small number of nodes (48)

Top five will be the Fermi Linux Qualified Vendors

We will keep track of all technically qualified vendors to replenish the list if

– A vendor goes out of business

– A vendor stops bidding, or bids consistently very high on Fermi RFPs

– A particular RFP requires special capacities: Myrinet, AMD, blade servers, desktop
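The scoring rule for the bid is not given in the talk. A minimal sketch, assuming bids are ranked by Fermi Cycles per dollar (the metric quoted in the evaluation slides) and the top five are kept, could look like this. The `Bid` type, the vendor labels, and every number below are hypothetical.

```python
from typing import List, NamedTuple


class Bid(NamedTuple):
    vendor: str
    fermi_cycles_per_node: float
    price_per_node: float  # dollars


def cycles_per_dollar(bid: Bid) -> float:
    return bid.fermi_cycles_per_node / bid.price_per_node


def rank_bids(bids: List[Bid], top_n: int = 5) -> List[Bid]:
    """Best price/performance first, truncated to the top_n bids."""
    return sorted(bids, key=cycles_per_dollar, reverse=True)[:top_n]


# Hypothetical bids, for illustration only.
bids = [
    Bid("Vendor A", 2041, 2500.0),
    Bid("Vendor B", 2223, 3000.0),
    Bid("Vendor C", 1779, 2000.0),
    Bid("Vendor D", 2041, 2300.0),
    Bid("Vendor E", 2223, 2600.0),
    Bid("Vendor F", 1779, 2400.0),
]

for bid in rank_bids(bids):
    print(f"{bid.vendor}: {cycles_per_dollar(bid):.3f} Fermi Cycles per dollar")
```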

Future Plans

Blade server evaluation coming up.

– Requires change in install philosophy…no floppy, CDROM, serial console available.

– Essential to address power and space concerns in Feynman and elsewhere.
