Vincenzo Vagnoni, LHCb Real Time Trigger Challenge Meeting, CERN, 24th February 2005


Page 1: Vincenzo Vagnoni, LHCb Real Time Trigger Challenge Meeting, CERN, 24th February 2005

Diskless booting and OS configuration

Vincenzo Vagnoni

LHCb Real Time Trigger Challenge Meeting

CERN, 24th February 2005

Page 2

Vincenzo Vagnoni LHCb RTTC Meeting, 24th February 2005 2

Diskless booting

Four ways (to my knowledge) to operate a Linux diskless machine:

- Removable device booting (e.g. à la Knoppix on CD-ROM)
  - Not flexible enough, option discarded!

- High-reliability mini-drive or Disk-On-Chip booting
  - Interesting “firmware”-oriented approach, like modern X terminals
  - Never tried; adopting such a solution would depend on the availability of low-cost devices of this kind
  - Not a real option at the moment, but to be kept in mind

- “Classic” network boot with “root over NFS”
  - Used, for example, for old X terminals, for CETIA motherboards, etc.
  - Used in production for four years in Bologna for the data analysis farm, and also for two years for the Bologna MC production farm
  - Works fine

- Network boot with root filesystem on ramdisk
  - I’m not aware of other people using it apart from us in the Bologna L1&HLT testbed
  - Root filesystem downloaded at boot time, together with the kernel, via the network
  - Application software directories mounted via NFS (or other network filesystem protocols)
  - Works fine

Page 3

Classic network boot

Requires a few basic services

- PXE, DHCP, TFTP servers on a control PC
- Requires root directories to be exported by a control PC
  - Each machine will have its root filesystem as a specific directory on the control PC
- Installation of a new node just requires updating the DHCP configuration, copying a template directory, and exporting the copied directory via the network filesystem

One drawback

- Reliability depends on the reliability of the network filesystem
  - If the network filesystem hangs, the root filesystem is frozen and the machine is no longer reachable in any way (the kernel freezes and waits for the root filesystem to come back online)
- A potential problem? For example, NFS at CNAF has shown some serious problems: in a complex network environment with large (unwanted!) variable latencies, deadlocks of the protocol (Linux implementation) showed up
- However, we shouldn’t have such uncontrolled latencies in our online network, otherwise the trigger is dead…
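The three installation steps listed above (DHCP update, template copy, export) could be scripted roughly as follows. This is only a minimal sketch: it assumes an ISC dhcpd-style configuration and a Linux `/etc/exports` file, neither of which is named in the slides, and all function names, paths and addresses are hypothetical.

```python
import shutil
from pathlib import Path


def dhcp_host_entry(name, mac, ip, tftp_server, boot_file="pxelinux.0"):
    """Build a host stanza in ISC dhcpd syntax (assumed DHCP server)."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {tftp_server};\n"
        f'  filename "{boot_file}";\n'
        "}\n"
    )


def nfs_export_line(root_dir, ip):
    """Build an /etc/exports line giving one node read-write access to its root."""
    return f"{root_dir} {ip}(rw,sync,no_root_squash)\n"


def prepare_node(name, mac, ip, tftp_server, template_dir, roots_base):
    """Copy the template root and return the DHCP and NFS snippets to append."""
    root_dir = Path(roots_base) / name
    # symlinks=True copies symbolic links as links instead of following them.
    shutil.copytree(template_dir, root_dir, symlinks=True)
    return dhcp_host_entry(name, mac, ip, tftp_server), nfs_export_line(root_dir, ip)
```

A real root tree would need a copy that also preserves ownership and device nodes; `shutil.copytree` is used here only to keep the sketch self-contained. The returned snippets would still have to be appended to the configuration files and the services reloaded.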

Page 4

Ramdisk network boot

Requires the same services as “root over NFS”

- PXE, DHCP, TFTP servers on a control PC
- Installation of a new node just requires updating the DHCP configuration
- Kernel and ramdisk with the root filesystem image are downloaded at boot time
- The root filesystem is memory resident

Advantage

- The machine is always operative and reachable (unless the memory breaks or a bit flip is triggered by a cosmic ray… but ECC memories are protected against single bit flips)
- Unwanted corruption (mistakes) of the filesystem files is automatically undone at reboot (changes to the filesystem are just temporary and are lost)

Drawbacks

- The ramdisk eats memory, typically of the order of 200 MB for a “normal” root filesystem (not a real problem, however, to lose just 200 MB)
- Requires a recompiled kernel with a large ramdisk size (not a real problem anyway)
- Application software is too large in any case and should be mounted via the network
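To make the boot-time download and the memory cost concrete, here is a minimal sketch of a boot entry and the corresponding memory budget. It assumes a PXELINUX-style boot loader and the `ramdisk_size=` kernel parameter (given in KiB); the slides mention only a recompiled kernel with a large ramdisk size, so both are assumptions, and the file names and function names are hypothetical.

```python
def pxelinux_ramdisk_config(kernel, initrd, ramdisk_mb):
    """Build a PXELINUX entry that boots with the root filesystem on a ramdisk."""
    ramdisk_kib = ramdisk_mb * 1024  # the ramdisk_size parameter is in KiB
    return (
        "DEFAULT diskless\n"
        "LABEL diskless\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND initrd={initrd} root=/dev/ram0 rw ramdisk_size={ramdisk_kib}\n"
    )


def memory_left_mb(total_mb, ramdisk_mb):
    """Memory remaining for applications once the ramdisk is allocated."""
    return total_mb - ramdisk_mb


config = pxelinux_ramdisk_config("vmlinuz", "rootfs.img.gz", 200)
remaining = memory_left_mb(2048, 200)  # hypothetical node with 2 GB of RAM
```

With the 200 MB figure quoted above, a hypothetical 2 GB node would keep 1848 MB for the application software.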

Page 5

Control PC

Scalability of the network boot and of the NFS exports shouldn’t be an issue

- A control PC serves just a few subfarms
- However, it is better that the control PC does not live too far (“networkly” speaking) from the served subfarms

An issue will be to keep up to date and synchronized all the control PCs’ operating systems, the application software served, the operating system(s) served, etc.

- The control PCs are “regular” “disked” machines, and have on their disks the core of the system
- The way these PCs are managed is core business for an efficient, flawless, costless operation of the farm
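One possible way to detect control PCs that have drifted out of sync is to compare a digest of the served directory trees. The sketch below is only an illustration of that idea, not a method proposed in the slides; the function names are hypothetical, and how the digests are collected from each control PC is left open.

```python
import hashlib
import os
from collections import Counter


def tree_digest(root):
    """Hash relative paths and contents of regular files and symlinks under root."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so digests are reproducible
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode() + b"\0")
            if os.path.islink(path):
                h.update(os.readlink(path).encode())
            elif os.path.isfile(path):
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            h.update(b"\0")
    return h.hexdigest()


def out_of_sync(digests):
    """Return the control PCs whose digest differs from the most common one."""
    if not digests:
        return []
    reference, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(pc for pc, d in digests.items() if d != reference)
```

Taking the most common digest as the reference is a design shortcut; in practice one would more likely compare against a designated master copy.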

Page 6

Preparation for the RTTC

Well in time for the RTTC, we should sit together and configure a testbed farm at CERN

- Administrator(s) should gain experience with this configuration

Several other issues should be addressed

- VLANs, e.g., might interfere with network boot
  - For example, when using unmanaged switches which dynamically learn the VLANs from the nodes… the node doesn’t know anything about the VLAN until it is booted with an “intelligent” operating system
  - Just an example, as we won’t use unmanaged switches to my knowledge; however, all the details, with possible problems and solutions, should be settled in time
- Thus, we should sit together and define in more detail the hardware/software infrastructure of the RTTC farm

Page 7

Relevant tools

All the operations to prepare a new node can be done in principle “by hand”

- Of course, not feasible for an 1800-PC farm

But feasible for the RTTC, as few nodes will be involved

- However (see Gianluca’s talk) it would be nice to have already for the RTTC a GUI that automatically triggers the work to be done (update of the configuration files on the control PC, e.g. DHCP, etc.)
- Needs some close interaction between administrator(s) and PVSS GUI developers

Remote control of electrical power

- Not really necessary for the RTTC, but still nice to have it in time
- IPMI solution (see Gianluca’s talk) controlled via PVSS GUIs
- Alternatively, remotely Ethernet-controlled power switches (e.g. those used at CNAF) controlled via PVSS GUIs
- Of course, an essential issue for the final online farm
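As an illustration of what a GUI back end might run for the IPMI option, the sketch below builds a power-control command line. It assumes the `ipmitool` command-line client, which the slides do not name; the host name, user name and function name are hypothetical, and the command is only built here, not executed.

```python
def ipmi_power_command(host, user, action, interface="lan"):
    """Build an ipmitool command line for a chassis power action on one node."""
    allowed = {"status", "on", "off", "cycle"}
    if action not in allowed:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool",
        "-I", interface,  # "lan" for IPMI 1.5, "lanplus" for IPMI 2.0
        "-H", host,       # address of the node's management controller
        "-U", user,
        "-E",             # read the password from the IPMI_PASSWORD variable
        "chassis", "power", action,
    ]


command = ipmi_power_command("node01-bmc", "admin", "cycle")
```

The resulting list could be passed to `subprocess.run(command, check=True)`. Restricting `action` to a fixed set keeps a GUI from forwarding arbitrary strings to the command line.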