Prof. Nectarios Koziris
Vice Chairman, GRNET
ICCS, NTUA
Building a nation-wide production Grid infrastructure:
the HELLASGRID project
HELLASGRID in a nutshell
• National Grid Initiative led by GRNET (2002), deployed on top of the 2.5 Gbps GRNET network
• HG-01 cluster @ Demokritos: 64 CPUs, 10 TB FC SAN, 12 TB Tape Library, LCG2/EGEE middleware
• HG-02 to HG-06 clusters located at the NDC, IASA, AUTH, FORTH and CTI research centers:
  ~800 CPUs (x86_64, 2 GB RAM, 80 GB HDD, 2x Gbit)
  ~30 TBytes total raw SAN storage capacity
  ~80 TBytes Tape Library
• 4 Access Grid nodes
[Figure: map of the GRNET network with the HellasGrid Grid nodes.
Link types: leased lambda 2.5 Gbps PoS; leased lambda 1.25 Gbps Gigabit Ethernet; Athens MAN (2.5 Gbps PoS); dark fibre (not yet lit).
Cities: Patra, Larissa, Heraclion, Syros, Athens, Chania, Rethymnon, Xanthi, Thessaloniki, Ioannina.
HellasGrid Grid nodes: HG-01-GRNET Isabella @ Demokritos, HG-02-ICS-FORTH, HG-03-CTI-CEID, HG-04-IASA, HG-05-NDC, HG-06-AUTH.]
http://www.hellasgrid.gr/infrastructure
HG structure
Main site: HG-01-GRNET (Isabella, cslab@ICCS/NTUA)
HG-02…HG-06 sites + operation centers (NDC, IASA, AUTH, FORTH, CTI)
5 smaller sites (AUTH, UoM, FORTH, Demokritos, HEP-NTUA)
HG CA and VOMS (GridAUTH, Dept. of Physics, AUTH)
HG helpdesk (CTI)
Regional monitoring tools (FORTH)
HG user support/apps (Demokritos + all site teams)
4 AccessGRID sites
HG membership: more than 20 Universities + 15 Research Institutes
Feb 06 update: 6+5 infrastructure sites, > 900 CPUs in total
HG-01-GRNET Isabella
HellasGrid Infrastructure, Phase II, NDC (2/2006)
HG-01: operations targets
High node availability through HW and SW redundancy
Security
Timely resolution of problems
Efficient collaboration between team members; ticketing system, interface with EGEE ticketing
Close cooperation with VOs
HW/SW Redundancy
RAID1 on Service Nodes and WNs
Reliable Data Storage Infrastructure
• RAID5 volumes on storage array
• Redundant FC controllers
• Redundant FC links in failover mode for GPFS storage nodes
• Node redundancy at the GPFS level
Redundant GPFS storage nodes
• One primary / one secondary per Network Shared Disk (NSD)
Redundant network service instances
• DNS: two on-site, two off-site servers
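The primary/secondary arrangement for the GPFS storage nodes on this slide can be illustrated with a minimal Python sketch. It is not GPFS code or configuration: the node and NSD names are hypothetical, and the only rule modelled is that each NSD is served by its primary storage node, or by its secondary when the primary is down.

```python
from typing import Optional, Set

# Hypothetical layout: each NSD lists its (primary, secondary) storage node.
NSD_SERVERS = {
    "nsd01": ("storage-a", "storage-b"),
    "nsd02": ("storage-b", "storage-a"),
}


def serving_node(nsd: str, failed: Set[str]) -> Optional[str]:
    """Return the node that serves `nsd`, or None if both servers are down."""
    primary, secondary = NSD_SERVERS[nsd]
    if primary not in failed:
        return primary
    if secondary not in failed:
        return secondary
    return None
```

Calling `serving_node("nsd01", {"storage-a"})` falls back to the secondary; swapping primary and secondary between NSDs, as in the hypothetical layout, also spreads the load across both nodes in normal operation.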
Security: OpenVPN
Management interfaces unreachable from the outside
Secure remote access to management VLAN using the free OpenVPN tool
Certificate-based authentication, SSL-based encryption
Security hierarchy with different levels
• Platinum: Backup server, Remote Console Access
• Gold: Management Server
• Copper: Worker Nodes for the Grid
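One way to read the security hierarchy is as ordered tiers, where reaching a host requires a clearance at least as high as the host's tier. That ordering rule and the role names below are assumptions made for illustration; the slide only lists the tiers and what belongs to each.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Tiers named on the slide; the numeric ordering is an assumption."""
    COPPER = 1    # Worker Nodes for the Grid
    GOLD = 2      # Management Server
    PLATINUM = 3  # Backup server, Remote Console Access


# Hypothetical role names, grouped as on the slide.
HOST_TIER = {
    "worker-node": Tier.COPPER,
    "management-server": Tier.GOLD,
    "backup-server": Tier.PLATINUM,
    "remote-console": Tier.PLATINUM,
}


def may_access(clearance: Tier, host_role: str) -> bool:
    """Allow access only when the clearance is at least the host's tier."""
    return clearance >= HOST_TIER[host_role]
```

Under these assumptions a Gold clearance reaches worker nodes and the management server, but not the backup server or the remote consoles.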
[Diagram: bridged OpenVPN setup giving a secure extension of the Management VLAN across the Internet.
aurum.isabella.grnet.gr (address assigned to the bridge interface): virtual Ethernet switch (Ethernet bridge) between tap1 and eth1; Management VLAN; Cisco router (DHCP server).
Workstation 2: virtual Ethernet switch (Ethernet bridge) between eth0 and tap1.
tap1: encrypted virtual TUN/TAP Ethernet interface; encrypted Ethernet frames are encapsulated in UDP/IP or TCP/IP.]
Day-to-day Operations
Operations in shifts, faster response
Use of global EGEE monitoring tools
Local monitoring tools: Ganglia, MRTG
Vendor-specific tools
IBM Cluster Systems Management
• Monitors various node health parameters
• Sends e-mail alerts which are routed to mobiles
Web-based ticketing system, also for archiving
Weekly meetings
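The alerting step on this slide (check node health, then send an e-mail alert that can be routed to a mobile) can be sketched in Python as follows. The sketch does not use or imitate IBM Cluster Systems Management; the metric names, thresholds and addresses are hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical limits for a few node health parameters.
THRESHOLDS = {"cpu_temp_c": 75.0, "disk_used_pct": 90.0}


def build_alert(node: str, metrics: dict) -> Optional[EmailMessage]:
    """Return an alert e-mail if any metric exceeds its limit, else None."""
    breaches = {
        name: value
        for name, value in metrics.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    }
    if not breaches:
        return None
    msg = EmailMessage()
    msg["From"] = "monitor@example.org"  # placeholder address
    msg["To"] = "oncall@example.org"     # placeholder, e.g. an e-mail-to-SMS gateway
    msg["Subject"] = f"[ALERT] {node}: " + ", ".join(sorted(breaches))
    msg.set_content(
        "\n".join(f"{name} = {value}" for name, value in sorted(breaches.items()))
    )
    return msg
```

Returning `None` for a healthy node keeps the caller simple: only a non-empty result needs to be handed to a mail transport such as `smtplib`.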
Day2Day Collaboration
Request Tracker (RT)
Web-based Ticketing System.
E-mails to [email protected] are automatically added to the Request Tracker.
Used mainly for day2day collaboration and maintenance. Problem reports are rare.
Permanent archive of information on all events during shifts.
Facilitates integration of new team members.
Acts as an HG Knowledge base.
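The e-mail-to-ticket flow described on this slide can be pictured with a small Python sketch. It is illustrative only and is not Request Tracker or its mail gateway: the in-memory list stands in for the ticket archive, and the sample message and address are hypothetical.

```python
from email import message_from_string

# Illustrative in-memory "ticket archive"; not Request Tracker itself.
tickets = []


def add_ticket_from_email(raw: str) -> dict:
    """Parse a raw e-mail and append it to the archive as a new ticket."""
    msg = message_from_string(raw)
    ticket = {
        "id": len(tickets) + 1,
        "requestor": msg["From"],
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),
    }
    tickets.append(ticket)
    return ticket


# Hypothetical shift note sent by e-mail.
RAW = (
    "From: shift@example.org\n"
    "Subject: Replaced disk on worker node\n"
    "\n"
    "Swapped the failed disk; RAID1 rebuild in progress.\n"
)
```

Because every message becomes a numbered record, the same store doubles as the permanent archive and knowledge base mentioned on the slide.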
RT ticketing system: the big picture
Introduction of HG-0x sites:
Streamlining of new site installations
• Guide for new HW installations
• Customized instructions for OS deployment
Certification Period
• Certification SFTs run by the HG-01-GRNET team for all yet uncertified sites
• Site enters production when the tests have run without problems for 5 days
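The production-entry rule on this slide can be written as a short check. This Python sketch assumes the 5 days are consecutive and that a day without a recorded result counts as a failure; neither detail is stated on the slide, and the function and variable names are hypothetical.

```python
from datetime import date, timedelta

REQUIRED_CLEAN_DAYS = 5  # from the slide: tests run without problems for 5 days


def ready_for_production(results: dict, today: date) -> bool:
    """True if each of the last 5 days up to `today` has a passing result.

    `results` maps a date to True (all SFTs passed) or False. A missing
    day is treated as a failure, which is an assumption of this sketch.
    """
    return all(
        results.get(today - timedelta(days=offset), False)
        for offset in range(REQUIRED_CLEAN_DAYS)
    )
```

A single failed or missing day inside the window resets the decision to "not yet", so a site has to accumulate a fresh run of clean days before it is certified.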
HG Local Users distribution per Discipline
CPU Hours per Site
HellasGrid CA statistics:
CPU time: distribution of overall EGEE VOs usage of HG infrastructure
[Pie chart: Normalised CPU Time. Legend: atlas, biomed, cms, lhcb, see, others. Slice labels as extracted: 28%, 10%, 12%, 40%, 8%, 2%.]
Cornerstones to Hellasgrid:
1. Widely available distributed e-Infrastructures (network, storage, computer nodes)
2. GRID aware communities
• GOCs
• Infrastructure integrators
• middleware developers
• end-users
3. Need for GRID enabled Applications!
“glue && cement” for the cornerstones above?
GRNET network
HG international role
Synergies: South East Europe
SEE Regional Operations Centre
Coordination of SEEGRID project by GRNET
EU Research Infrastructures (EGEE-I & EGEE-II)
EU GRID projects: operations and research (EumedGrid, EuChinaGrid, GRIDCC, e-IRGSP)
GRNET NGI role:
HG follows NGI-EGO paradigm:
GRNET:
• Provides/deploys/operates GRID research infrastructure (RI)
• Coordinates national GRID efforts/activities
• Has an application-neutral role
GRNET acts as an early champion for the SEE area
(See Tuesday’s talk (11:50) by Ognjen Prnjat, GRNET: “Towards Production Grids in Greenfield Regions”)
Conclusions/Experiences
Local “incubators” for GRID technology needed: experts in local sites
Training, Training, Training!
Users+Apps, Users+Apps, Users+Apps!
Pilot Applications: GRID-APP call (GSRT) received 45 proposals!
GOC & NOCs cooperation, NRENs play vital role
International coordination/concertation in middleware, VOs, infrastructures for Prod.Level.GRID
New application calls from GSRT should be planned
Welcome to Hellas...GRID!
For more: www.hellasgrid.gr