Computational grids and grid projects
DSS, 4.4.2005
[email protected]



Content

Grid computing (terminology)
EGEE grid elements, how it works
Gilda testbed (example of simple job)
Grid projects

Grid computing

model for solving massive computational problems
use of unused resources (CPU cycles, disk storage, ...)
supports computation across administrative domains
– unlike traditional clusters
creates a “virtual cluster” embedded in the network infrastructure
multi-user environment
issue of authorization – allowing remote users to control computing resources

Grid computing - resources

sharing heterogeneous resources
– different platforms
– hw / sw architectures
– computer languages
located in different places
– different administrative domains
– connected through the network
virtualizing computing resources

Grid x cluster

grids – heterogeneous
– can use ordinary desktops as well
cluster – homogeneous
– located in data centres
Grids are built from Computational Elements (CE)
A cluster can act as a CE of the whole grid system

Global Grid Forum

GGF – defines specifications for grid computing
Globus Alliance – implements the standards – GT
Globus Toolkit (GT) – middleware for building GT-based services; the de facto standard, but covers only part of the grid

Globus – implemented services

Resource management
– GRAM (Grid Resource Allocation Management)
Information services
– MDS (Monitoring and Discovery Services)
Security Services
– GSI (Grid Security Infrastructure)
Data Movement and Management
– GridFTP, GASS (Global Access to Secondary Storage)

EGEE grid components

UI (User Interface)
– user access to the computational grid
– logon, starting jobs, information about the state of jobs
– information about free resources
– management of the user’s data
CE (Computing Element)
– receives jobs for the given cluster or farm (homogeneous)
– provides information about computational power and installed sw
– passes the jobs to the local job management system (PBS, LSF, NQE, LoadLeveler, Condor); the LJMS later sends the job to the worker nodes
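The hand-off from the CE to the local job management system, and from there to the worker nodes, can be pictured as a queue. The following is a minimal Python sketch under stated assumptions: `LocalJobManager` and its methods are hypothetical names, jobs leave a single FIFO queue, and each worker node runs one job at a time. Real systems such as PBS or LSF use far richer scheduling policies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LocalJobManager:
    """Toy stand-in for a local job management system such as PBS or LSF.

    Hypothetical model: jobs wait in one FIFO queue and every worker
    node runs at most one job at a time.
    """
    worker_nodes: list[str]
    queue: deque = field(default_factory=deque)
    running: dict[str, str] = field(default_factory=dict)  # node -> job id

    def submit(self, job_id: str) -> None:
        """Called by the CE: the job only enters the local queue here."""
        self.queue.append(job_id)

    def dispatch(self) -> list[tuple[str, str]]:
        """Send queued jobs to idle worker nodes; return (job, node) pairs."""
        placed = []
        for node in self.worker_nodes:
            if not self.queue:
                break
            if node not in self.running:
                job_id = self.queue.popleft()
                self.running[node] = job_id
                placed.append((job_id, node))
        return placed

    def finish(self, node: str) -> str:
        """Mark the job on `node` as finished, free the node, return the job id."""
        return self.running.pop(node)


ljms = LocalJobManager(["wn1", "wn2"])
for job in ("job-a", "job-b", "job-c"):
    ljms.submit(job)
print(ljms.dispatch())  # job-a and job-b start, job-c waits
```

The worker nodes never talk to the CE's clients directly in this model, which mirrors the point that WNs are reachable only through the CE.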

EGEE grid components II.

SE (Storage Element)
– interface for storing user data inside the grid
– access to the files
– replication of files
– a file is registered inside the grid under an internal name (independent of the name and the location)
RC (Replica Catalog), RLS (Replica Location Server)
– information about file replicas, selection of the appropriate replica
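The idea of one internal (logical) name with several physical copies, plus choosing a suitable copy, can be sketched as a small lookup table. This is a toy model, not the RLS interface: the `lfn:` prefix, the host names and the per-SE cost used for selection are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    storage_element: str  # SE holding this copy (hypothetical host names below)
    path: str             # physical location of the copy on that SE


class ReplicaCatalog:
    """Toy catalog: one grid-wide logical file name -> several physical replicas."""

    def __init__(self) -> None:
        self._replicas: dict[str, list[Replica]] = {}

    def register(self, logical_name: str, replica: Replica) -> None:
        self._replicas.setdefault(logical_name, []).append(replica)

    def replicas(self, logical_name: str) -> list[Replica]:
        return list(self._replicas.get(logical_name, []))

    def select(self, logical_name: str, cost: dict[str, float]) -> Replica:
        """Return the replica on the cheapest SE.

        `cost` is an assumed per-SE metric (e.g. distance from the CE that
        runs the job); SEs missing from it are treated as least preferred.
        """
        candidates = self.replicas(logical_name)
        if not candidates:
            raise KeyError(f"no replica registered for {logical_name!r}")
        return min(candidates, key=lambda r: cost.get(r.storage_element, float("inf")))


catalog = ReplicaCatalog()
catalog.register("lfn:dataset1", Replica("se-a.example.org", "/data/dataset1"))
catalog.register("lfn:dataset1", Replica("se-b.example.org", "/store/dataset1"))
best = catalog.select("lfn:dataset1", {"se-a.example.org": 5, "se-b.example.org": 1})
print(best.storage_element, best.path)
```

The user only ever refers to `lfn:dataset1`; which physical copy is read is decided at selection time.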

EGEE grid components III.

WN (Worker Nodes)
– computation nodes, where the computation runs
– have access to the application software (mounted from a server)
– capable of manipulating data stored on the SE
– accessible only from the CE, not from the whole environment

EGEE grid components IV.

IS (Information Service)
– state information about grid elements (CE, SE, ...)
– monitoring of the state of the jobs
RB (Resource Broker)
– scheduler; finds the proper resources for the job requirements
– distributes jobs to the CEs, sending the JDL (Job Description Language) description
– uses the IS for its decisions
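The broker's task of matching job requirements against information from the IS can be illustrated by filtering and ranking CE descriptions. This sketch assumes a simplified CE record (name, free CPUs, installed software) and ranks by free CPUs; these fields, the ranking rule and the host names are hypothetical, and the real RB's matchmaking is considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CEInfo:
    """What the IS publishes about one Computing Element (simplified, hypothetical fields)."""
    name: str
    free_cpus: int
    software: frozenset


def match(ces: list[CEInfo], needs_software: set[str], min_free_cpus: int = 1) -> list[CEInfo]:
    """Return the CEs that satisfy the job, best candidate first.

    Requirements: all needed software installed and enough free CPUs.
    Ranking (an assumption for this sketch): more free CPUs is better.
    """
    eligible = [
        ce for ce in ces
        if set(needs_software) <= ce.software and ce.free_cpus >= min_free_cpus
    ]
    return sorted(eligible, key=lambda ce: ce.free_cpus, reverse=True)


# Hypothetical snapshot of the Information Service.
published = [
    CEInfo("ce1.example.org", 4, frozenset({"root"})),
    CEInfo("ce2.example.org", 10, frozenset({"root", "geant4"})),
    CEInfo("ce3.example.org", 0, frozenset({"root", "geant4"})),
]
ranked = match(published, {"root"})
print([ce.name for ce in ranked])
```

Separating the hard requirements (filter) from the preference (sort) keeps the two decisions independent, so either can change without touching the other.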

[Figure: GILDA testbed – students’ terminals (enterGrid), UI (PKI X.509 certificate keys, JDL files), RB, CE, SE, RLS and worker nodes (WN)]

How it all works together – step by step

User connects to the UI
– a time-limited proxy certificate is created
User defines the computational job and passes it to the resource broker
– by means of a JDL file
– the JDL file may contain some input data (more datasets – SE)
Resource broker talks to the IS and finds a proper CE
Resource broker creates the job and sends it to the CE
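Because the proxy certificate created at login is time-limited, tools acting on the user's behalf have to check whether it is still valid. The sketch below models only the validity window; the class, the subject name and the 12-hour lifetime are assumptions, and no real certificate handling or cryptography is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Proxy:
    """Toy model of a time-limited proxy made from the user's X.509 certificate.

    Only the validity window is modelled; no cryptography is involved.
    """
    subject: str
    not_before: datetime
    not_after: datetime

    @classmethod
    def create(cls, subject: str, now: datetime,
               lifetime: timedelta = timedelta(hours=12)) -> "Proxy":
        # The 12-hour lifetime is an assumed value for this sketch.
        return cls(subject, now, now + lifetime)

    def is_valid(self, now: datetime) -> bool:
        return self.not_before <= now < self.not_after

    def time_left(self, now: datetime) -> timedelta:
        return max(self.not_after - now, timedelta(0))


login = datetime(2005, 4, 4, 9, 0, tzinfo=timezone.utc)
proxy = Proxy.create("/O=Grid/CN=demo03", login)  # hypothetical subject name
later = login + timedelta(hours=10)
print(proxy.is_valid(later), proxy.time_left(later))
```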

How it all works together II.

CE receives the job and sends it to the local job management system
The job runs on the WN (worker nodes)
– using larger datasets – data are copied from the SE
– new large output data – copied to the SE and registered with the RLS (Replica Location Server)
At the end of the job, the output (stdout, stderr) is copied back to the RB
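The data flow on a worker node (stage in from the SE, compute, stage out and register, return only the small stdout) can be condensed into one function. Plain dictionaries stand in for the SE and the RLS here; the function name, the paths and the logical-name scheme are hypothetical.

```python
from typing import Callable


def run_on_worker_node(
    job_name: str,
    input_lfn: str,
    storage: dict[str, bytes],
    catalog: dict[str, list[str]],
    compute: Callable[[bytes], tuple[bytes, str]],
) -> tuple[str, str]:
    """Toy WN step: stage in from the SE, compute, stage large output back out.

    `storage` stands in for an SE (path -> content), `catalog` for the RLS
    (logical name -> SE paths); the naming scheme below is hypothetical.
    """
    # 1. Stage in: resolve the logical name, copy the first replica to the WN.
    input_data = storage[catalog[input_lfn][0]]
    # 2. Run the computation locally on the WN.
    output_data, stdout = compute(input_data)
    # 3. Stage out: store the large output on the SE and register it.
    out_lfn = f"lfn:{job_name}.out"
    out_path = f"/se/{job_name}.out"
    storage[out_path] = output_data
    catalog.setdefault(out_lfn, []).append(out_path)
    # 4. Only the small stdout is returned; it goes back via the RB.
    return stdout, out_lfn


se = {"/se/input.dat": b"grid data"}
rls = {"lfn:input": ["/se/input.dat"]}
stdout, out_lfn = run_on_worker_node("job1", "lfn:input", se, rls,
                                     lambda data: (data.upper(), "processed\n"))
print(stdout, out_lfn, rls[out_lfn])
```

Keeping large results on the SE and returning only a short text is what lets the output sandbox travel back through the RB cheaply.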

How to try it and participate

Genius portal – access to the grid
Gilda
– demo applications
– latest versions of the middleware sw
https://grid-demo.ct.infn.it/

Example – hostname.jdl

Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
RetryCount = 7;
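When many similar jobs are submitted, JDL text such as hostname.jdl can be generated instead of typed. The helper below is a hypothetical minimal writer, not part of the grid middleware; it covers only the value kinds used in this example (strings, integers and lists) and does not escape quotes.

```python
def jdl_value(value) -> str:
    """Render one attribute value (only strings, integers and lists of them)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, str):
        return f'"{value}"'  # embedded quotes are not escaped
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(jdl_value(v) for v in value) + "}"
    raise TypeError(f"unsupported JDL value: {value!r}")


def to_jdl(attributes: dict) -> str:
    """Produce 'Name = value;' lines in the order the attributes were given."""
    return "".join(f"{name} = {jdl_value(v)};\n" for name, v in attributes.items())


HOSTNAME_JOB = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "/bin/hostname",
    "StdOutput": "hostname.out",
    "StdError": "hostname.err",
    "OutputSandbox": ["hostname.err", "hostname.out"],
    "Arguments": "-f",
    "RetryCount": 7,
}

print(to_jdl(HOSTNAME_JOB), end="")
```

Printing `to_jdl(HOSTNAME_JOB)` reproduces the eight lines of hostname.jdl; changing `Executable` or `Arguments` in the dictionary yields a new job description.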

Example – log after job submission

Let the GILDA Resource Broker choose
Selected Virtual Organisation name (from UI conf file): gilda
Connecting to host grid004.ct.infn.it, port 7772
Logging to host grid004.ct.infn.it, port 9002
================================ edg-job-submit Success =====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg
The edg_jobId has been saved in the following file:
/home/demo03/.genius/.tmp_submittedjob_demo03
==================================================
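Scripts that drive the command-line tools usually need the job identifier from this output, for example to pass it to edg-job-status later. This sketch extracts it with a regular expression; the pattern assumes exactly the output layout shown above, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumes the layout of the edg-job-submit output shown above.
JOB_ID_RE = re.compile(r"\(edg_jobId\) is:\s*-?\s*(https://\S+)")


def extract_job_id(submit_output: str) -> Optional[str]:
    """Return the job identifier printed by a successful submission, if any."""
    match = JOB_ID_RE.search(submit_output)
    return match.group(1) if match else None


def job_id_endpoint(job_id: str) -> Tuple[str, int]:
    """Split the host name and port out of the URL-shaped job identifier."""
    parts = urlsplit(job_id)
    return parts.hostname, parts.port


SAMPLE = """\
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

- https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg

The edg_jobId has been saved in the following file:
"""

job_id = extract_job_id(SAMPLE)
print(job_id, job_id_endpoint(job_id))
```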

Example – job queue

Status of the job can be checked in the job queue
– ready
– scheduled
– running
– done – Get Output
– cleared (after Get Output)

Output (file sizes in bytes)
– hostname.err 0
– hostname.out.txt 24

Hostname.out.txt
– testbed010.cnaf.infn.it {Heureka! We got it!}
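The job-queue states form a simple lifecycle, which can be written as a small state machine. The sketch assumes the linear order from this slide (ready, scheduled, running, done, then cleared after Get Output); the class and method names are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    READY = "ready"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    DONE = "done"
    CLEARED = "cleared"


# Linear order taken from the job-queue slide; real middleware also has
# states for failed or cancelled jobs, which this sketch leaves out.
NEXT_STATE = {
    JobState.READY: JobState.SCHEDULED,
    JobState.SCHEDULED: JobState.RUNNING,
    JobState.RUNNING: JobState.DONE,
}


class Job:
    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = JobState.READY
        self.history = [self.state]

    def _move(self, new_state: JobState) -> JobState:
        self.state = new_state
        self.history.append(new_state)
        return new_state

    def advance(self) -> JobState:
        """Step ready -> scheduled -> running -> done."""
        if self.state not in NEXT_STATE:
            raise ValueError(f"cannot advance from {self.state.value}")
        return self._move(NEXT_STATE[self.state])

    def get_output(self) -> JobState:
        """Model output retrieval: allowed only when done; afterwards the job is cleared."""
        if self.state is not JobState.DONE:
            raise ValueError("output is available only for finished jobs")
        return self._move(JobState.CLEARED)


job = Job("https://grid004.ct.infn.it:9000/YWwYrwIircPajba_1pAdeg")
while job.state is not JobState.DONE:
    job.advance()
job.get_output()
print(" -> ".join(s.value for s in job.history))
```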

Grid Projects

EGEE (Enabling Grids for E-sciencE)
– connects European grids, creates a production grid
– started on 1 April 2004
– 70 partners (EU, USA, Russia)
– 7 federations (the Czech Rep. belongs to the Central Europe (CE) federation)
– CERN – a federation by itself
– CESNET – works on the scheduling and state-monitoring part of the middleware

Project Geneva

CoreGrid, Akogrimo, DataMiningGrid
GridCoord, HPC4U, IntelliGrid
K-WF Grid, NextGrid, OntoGrid
Provenance, SIMDAT, UniGridS

Literature, Materials

Wikipedia
http://egee.cesnet.cz
http://www.globus.org