Page 1: Grid Activity at CCS

Grid Activity at CCS

Toshiyuki Amagasa
Center for Computational Sciences, University of Tsukuba

Page 2: Grid Activity at CCS

About Myself

Name: Toshiyuki Amagasa
Affiliation:
  Division of Computational Informatics, Center for Computational Sciences
  Department of Computer Science, Graduate School of Systems and Information Engineering
Area of research:
  Data engineering
  Database systems
Recent topics:
  XML databases
    Parallel XML query processing
    OLAP analysis for XML
    Web information extraction for XML
  Databases in scientific applications
    Faceted navigation for QCDml
    Meteorological database

Page 3: Grid Activity at CCS

ILDG-JP Members

Prof. Mitsuhisa Sato (Director, CCS)
Prof. Tomoteru Yoshie (CCS)
Prof. Osamu Tatebe (CCS)
Dr. Naoya Ukita (CCS)
Prof. Toshiyuki Amagasa (CCS)

Page 4: Grid Activity at CCS

Talk Outline

Current Status of ILDG
  A Brief History of JLDG
  An Overview of JLDG
Development of a New ILDG Client
  Faceted Navigation of QCDml
Conclusions and Future Work

Page 5: Grid Activity at CCS

Current Status of JLDG


Page 6: Grid Activity at CCS

A Brief History of JLDG (1/3)

Hepnet-J/sc, 2002- (SINET GbE private network)
  Widely distributed file system
  Network backbone: Super SINET VPN
  Institutes / universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U.
Objective and implementation
  Data sharing among institutes / universities whose administrative policies are not homogeneous, while maintaining security
  Mirroring among FSs attached to SCs with administrative

[Diagram: file servers at four sites connected through Hepnet-J/sc: CCP@Tsukuba (CP-PACS), RCNP@Osaka (SX-5), CRC@KEK (SR8000), YITP@Kyoto (SX-5)]

Page 7: Grid Activity at CCS

A Brief History of JLDG (2/3)

Problems
  Growing cost of managing data location
    A dataset may be distributed over several disks.
    It is hard for users to remember the location of data and mirrors.
  No concept of users and user groups
    Hard to support multiple research groups.
Necessary functionalities
  A flat data sharing system that has no space limit (or can be extended at any time)
  User and user-group management across several organizations
Japan Lattice Data Grid (JLDG)
  Project launched in November 2005
  Operation started in March 2007

Page 8: Grid Activity at CCS

A Brief History of JLDG (3/3)

JLDG v1 started operation in May 2008
  Available datasets
    CP-PACS Nf=2 QCD configurations: 8,000 files, 1.5 TBytes
    CP-PACS/JLQCD Nf=2+1 QCD configurations: 21,000 files, 6 TBytes
    PACS-CS Nf=2+1 32^3 x 64 lattice QCD configurations: 2,600 files, 3 TBytes
JLDG v2 started operation in December 2009
  Storing and sharing research data generated in daily research activities
  Data sharing within a research group

Page 9: Grid Activity at CCS

An Overview of JLDG

A widely distributed file system with 100 TB-scale storage for domestic researchers in particle physics
  Sharing simulation data computed by SCs over several months to several years.
  Data files are distributed; replicas are created if necessary.
  A user does not need to know file locations. Files can be accessed very quickly if the site has replicas.
  Storage space can be added incrementally during operation.

[Diagram: Gfarm file system over the SINET3 network, with sites at Tsukuba, KEK, Kyoto, Osaka, Hiroshima, and Kanazawa, and a link to ILDG; www.jldg.org]

Page 10: Grid Activity at CCS

Software Components

Globus Toolkit V4 (ANL), www.globus.org
  GSI authentication, proxy user certificate creation
  GridFTP server / client
VOMS (EDG)
  VO management
Naregi-CA (Naregi), www.naregi.org
  User / host certificate creation
Gfarm file system (U. of Tsukuba), datafarm.apgrid.org
  Widely distributed file system
Uberftp (NCSA), http://dims.ncsa.uiuc.edu/set/uberftp/
  Interactive GridFTP client

Page 11: Grid Activity at CCS

Gfarm Distributed File System

An open-source distributed file system
A global namespace to unify storage systems
Scalable I/O performance exploiting data access locality
Automated replica selection for fault tolerance and load balancing

[Diagram: the Gfarm file system global namespace rooted at /gfarm, with entries ggf, jp, aist, gtrc, and file1 to file4; labels: global namespace, mapping, replica creation (file1, file2)]
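The idea of automated replica selection in a global namespace can be illustrated with a toy model. This is only a sketch under stated assumptions, not Gfarm's actual API or algorithm: the class `GlobalNamespace`, its methods, the site names, and the path `/gfarm/jp/file1` are all hypothetical, and the selection rule (prefer a replica at the client's own site, otherwise fall back to any live replica) is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlobalNamespace:
    """Toy model: one logical path maps to replicas held at several sites."""

    replicas: dict[str, set[str]] = field(default_factory=dict)

    def add_replica(self, path: str, site: str) -> None:
        """Record that `site` holds a copy of the file at logical `path`."""
        self.replicas.setdefault(path, set()).add(site)

    def select_replica(
        self, path: str, client_site: str, down: frozenset[str] = frozenset()
    ) -> str:
        """Pick a site to read from, skipping sites listed in `down`."""
        live = sorted(self.replicas.get(path, set()) - down)
        if not live:
            raise FileNotFoundError(path)
        # Locality first; otherwise any surviving replica (fault tolerance).
        return client_site if client_site in live else live[0]


ns = GlobalNamespace()
ns.add_replica("/gfarm/jp/file1", "tsukuba")  # hypothetical path and sites
ns.add_replica("/gfarm/jp/file1", "kek")
```

The caller only ever names the logical path; which physical copy serves the read is decided inside `select_replica`, which is the property the slide attributes to a global namespace.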

Page 12: Grid Activity at CCS

Summary

JLDG
  A brief history
  An overview
Used as an infrastructure for daily research activities
  Hands-on meeting on 27 Jan. 2009
    Successfully held with 19 attendees

Page 13: Grid Activity at CCS

Development of a New ILDG Client


Page 14: Grid Activity at CCS

Int’l Lattice Data Grid (ILDG)

A data grid for sharing lattice QCD configurations
File formats in ILDG
  Configuration binary
    LIME (Lattice QCD Interchange Message Encapsulation)
  Metadata (QCDml)
    Ensemble XML
    Configuration XML
  LFN (Logical File Name)
    Identifier for a configuration binary

[Diagram: one ensemble XML and several configuration XMLs, related by markovChainURI; each configuration XML is paired with its configuration (binary) through an LFN]

Page 15: Grid Activity at CCS

QCDml Ensemble XML

<markovChain xmlns="…">
  <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600</markovChainURI>
  <management>
    <revisions>1</revisions>
    <collaboration>CP-PACS</collaboration>
    <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName>
    <ensembleLabel>B1800</ensembleLabel>
    <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901</reference>
    <archiveHistory>
      <elem>
        <revision>1</revision>
        <revisionAction>add</revisionAction>
        <participant>
          <name>T.Yoshie</name>
          <institution>Center fof Computational Sciences, University of Tsukuba</institution>

Page 16: Grid Activity at CCS

Typical Use Case of ILDG

Find desired data by MDC
  Result: LFN (Logical File Name)
Find a nearby site by FC
  Result: SURL (Site URL)
Access the site
  Result: TURL (Transfer URL)
Data transfer

VOMS: authentication
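The chain of identifiers in this use case (LFN, then SURL, then TURL) can be sketched as three lookups. This is an illustration only: the two dictionaries stand in for the metadata and file catalogues, every identifier and host name below is made up, the `srm://` to `gsiftp://` rewrite is a placeholder for whatever the site's storage service actually returns, and the VOMS authentication step is left out.

```python
# Hypothetical in-memory stand-ins for the catalogues named on the slide.
METADATA_CATALOGUE = {  # MDC: search key -> LFNs
    "CP-PACS": ["lfn://example/cp-pacs/conf-0001"],
}
FILE_CATALOGUE = {  # FC: LFN -> {site: SURL}
    "lfn://example/cp-pacs/conf-0001": {
        "tsukuba": "srm://tsukuba.example/conf-0001",
        "kek": "srm://kek.example/conf-0001",
    },
}


def find_lfns(collaboration):
    """Step 1: find desired data in the metadata catalogue."""
    return METADATA_CATALOGUE.get(collaboration, [])


def find_surl(lfn, preferred_site):
    """Step 2: pick a site holding the file, preferring the nearby one."""
    sites = FILE_CATALOGUE[lfn]
    return sites.get(preferred_site) or sorted(sites.values())[0]


def to_turl(surl):
    """Step 3: turn the site URL into a transfer URL (placeholder rule)."""
    return surl.replace("srm://", "gsiftp://", 1)


def resolve(collaboration, preferred_site):
    """Run the whole chain and return the URLs to transfer from."""
    return [to_turl(find_surl(lfn, preferred_site)) for lfn in find_lfns(collaboration)]
```

For example, `resolve("CP-PACS", "kek")` walks all three steps and yields the transfer URL of the KEK copy.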

Page 17: Grid Activity at CCS

Difficulties in Finding Desired Configuration

Directly use a query language (XQuery / XPath)
  A simple example:
    /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4] and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
  Knowledge of XML, QCDml, and XQuery (XPath) is needed.
  Hard to get the whole picture of available data.
Hierarchical list
  Easy to use.
  Needs a huge screen to show the entire list.
  Still difficult to get the whole picture of the data.
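To make the XPath example concrete, the same condition (some descendant `beta` greater than 4 and some descendant `collaboration` equal to CSSM) can be spelled out in Python. This is a sketch, not the talk's implementation: the sample document, its namespace URI, and the element `exampleGluonAction` are invented placeholders rather than real QCDml, and unlike the XPath it trims whitespace around the text before comparing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ensemble document (not the real schema).
DOC = """<markovChain xmlns="urn:example:qcdml">
  <management><collaboration>CSSM</collaboration></management>
  <physics><action><gluon>
    <exampleGluonAction><beta>8.5</beta></exampleGluonAction>
  </gluon></action></physics>
</markovChain>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def has_descendant(root, name, predicate):
    """True if some descendant element called `name` has text accepted by `predicate`."""
    return any(
        elem is not root
        and local_name(elem.tag) == name
        and predicate((elem.text or "").strip())
        for elem in root.iter()
    )


def is_number_above(threshold):
    """Build a predicate that accepts numeric text greater than `threshold`."""
    def check(text):
        try:
            return float(text) > threshold
        except ValueError:
            return False
    return check


def matches(root, min_beta, collaboration):
    """Python counterpart of the XPath predicate on /markovChain."""
    return (
        local_name(root.tag) == "markovChain"
        and has_descendant(root, "beta", is_number_above(min_beta))
        and has_descendant(root, "collaboration", lambda text: text == collaboration)
    )


root = ET.fromstring(DOC)
```

Even in this form the user must know which element names to ask for, which is the difficulty the slide points out.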

Page 18: Grid Activity at CCS

Basic Idea

Apply a faceted-navigation interface to browse QCDml ensemble XML data.

Page 19: Grid Activity at CCS

Faceted-Navigation

What is “faceted-navigation”?
  A scheme for browsing objects with attributes.
  Successfully used in some applications, such as Apple iTunes.
Procedure
  A user selects a value in a facet
    To select a set of objects of interest
  The system updates the list of objects, the list of facets, and their respective values
  (Repeat)
Example
  The Flamenco Search
    http://flamenco.berkeley.edu/
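The select-then-update procedure can be sketched in a few lines. The records, facet names, and function names below are hypothetical and only meant to show the loop: `select` narrows the object list by one facet value, and `facet_counts` recomputes the values (with their populations) that remain available.

```python
from collections import Counter

# Hypothetical objects; each key is a facet, each value a facet value.
ENSEMBLES = [
    {"collaboration": "CSSM", "nf": "2", "date": "2007"},
    {"collaboration": "CSSM", "nf": "0", "date": "2006"},
    {"collaboration": "CP-PACS", "nf": "2", "date": "2005"},
]


def select(objects, facet, value):
    """The user picks one value in one facet; keep only matching objects."""
    return [obj for obj in objects if obj.get(facet) == value]


def facet_counts(objects):
    """The system recomputes every facet's values and how many objects carry each."""
    counts = {}
    for obj in objects:
        for facet, value in obj.items():
            counts.setdefault(facet, Counter())[value] += 1
    return counts


remaining = select(ENSEMBLES, "collaboration", "CSSM")
updated = facet_counts(remaining)
```

Repeating `select` on `remaining` with another facet corresponds to the "(Repeat)" step.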

Page 20: Grid Activity at CCS

The Flamenco Search
http://flamenco.berkeley.edu/

Page 21: Grid Activity at CCS

The Flamenco Search
http://flamenco.berkeley.edu/

Page 22: Grid Activity at CCS

The Flamenco Search
http://flamenco.berkeley.edu/

Page 23: Grid Activity at CCS

Faceted-Navigation

Good features
  Users have the freedom to choose a facet
    cf. hierarchical list
  Gives a big picture of the dataset
    Available values along with their population
  Effective
    Busch’s Law: 4 facets consisting of 10 values each are enough to deal with 10,000 objects.
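One way to read the figure in Busch's Law, assuming every object takes exactly one of the 10 values in each of the 4 facets, is as a count of distinct value combinations:

```latex
\[
  \underbrace{10 \times 10 \times 10 \times 10}_{4\ \text{facets}} = 10^{4} = 10{,}000
\]
```

Under that assumption, choosing one value in each facet can in principle narrow 10,000 objects down to a single one.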

Page 24: Grid Activity at CCS

Technical Challenges

How to define facets?
How to extract values according to the facets?
How to achieve quick response from the database to improve the user experience?

Page 25: Grid Activity at CCS

Choosing the Facets

Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe.
Selected elements from QCDml ensemble XML
  Regional grid
  Collaboration
  Project name
  Number of flavors
  Time
  Parameters
    Lattice size
    Gluon action
      Parameters
    Quark action
      Parameters

Page 26: Grid Activity at CCS

Extracting Values from a Facet (1/3)

Extract text values
  Collaboration
    CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, …
  Project name
    2+1 DWF, 2+1 Dynamical AsqTAD, Baryon Resonances, Dynamical FLIC Studies, Electromagnetic Form Factors, FLIC Overlap Studies, Flux Tube Test, Gluon Propagator, Long_aqstad_run, Pentaquark Volume Dependence, …
Need substring extraction
  Date
    2000, 2005, 2006, 2007, 2008
    e.g. <date>2007-02-26T21:39:33+09:00</date>
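The substring extraction for the date facet amounts to keeping the year of the timestamp. A minimal sketch, assuming the `<date>` text is always an ISO 8601 timestamp like the one shown (the function name `year_facet` is made up for illustration):

```python
from datetime import datetime


def year_facet(date_text):
    """Return the year of an ISO 8601 timestamp as a facet value string."""
    return str(datetime.fromisoformat(date_text.strip()).year)


year = year_facet("2007-02-26T21:39:33+09:00")
```

Parsing the timestamp rather than slicing the first four characters makes malformed dates fail loudly instead of producing a bogus facet value.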

Page 27: Grid Activity at CCS

Extracting Values from a Facet (2/3)

Need text value generation
  Lattice size
    e.g. 12 / 12 / 12 / 24

<physics>
  <size>
    <elem> <name>X</name> <length>12</length> </elem>
    <elem> <name>Y</name> <length>12</length> </elem>
    <elem> <name>Z</name> <length>12</length> </elem>
    <elem> <name>T</name> <length>24</length> </elem>
    …
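Generating the lattice-size text from the four `<elem>` entries can be sketched as follows. The sample below is a closed, namespace-free version of the fragment on the slide, which is an assumption made so that it parses on its own; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Closed, namespace-free variant of the slide's fragment (an assumption).
SIZE_XML = """<physics><size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size></physics>"""


def lattice_size_label(physics):
    """Join the X, Y, Z, T lengths into one display string."""
    lengths = {
        elem.findtext("name"): elem.findtext("length")
        for elem in physics.findall("./size/elem")
    }
    return " / ".join(lengths[axis] for axis in ("X", "Y", "Z", "T"))


label = lattice_size_label(ET.fromstring(SIZE_XML))
```

Looking the lengths up by axis name, instead of relying on document order, keeps the label stable if the elements appear in a different order.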

Page 28: Grid Activity at CCS

Extracting Values from a Facet (3/3)

Gluon action / Quark action
  An element name itself represents a value
  Extract the element name as the value of the facet

<action>
  <gluon>
    <iwasakiRGGluonAction>
      <glossary>http://www.jldg.org/JLDG/...

<action>
  <gluon>
    <DBW2GluonAction>
      <glossary>www.lqcd.org/ildg/pla...
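Using an element name as a facet value can be sketched like this. The sample is a closed, namespace-free stand-in for the first fragment above with the glossary content omitted (an assumption), and `action_facet` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Closed stand-in for the slide's fragment; glossary content is omitted.
ACTION_XML = """<action><gluon>
  <iwasakiRGGluonAction><glossary/></iwasakiRGGluonAction>
</gluon></action>"""


def action_facet(action, kind):
    """Return the name of the first child element under <gluon> or <quark>."""
    parent = action.find(kind)
    if parent is None or len(parent) == 0:
        return None
    # Drop a '{namespace}' prefix if the document uses one.
    return parent[0].tag.rsplit("}", 1)[-1]


gluon_action = action_facet(ET.fromstring(ACTION_XML), "gluon")
```

A missing `<quark>` block simply yields `None`, which matches ensembles that have no value for that facet.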

Page 29: Grid Activity at CCS

QCDml Faceted Navigation I/F: System Configuration

[Diagram components]
  Regional grids within ILDG: JLDG, CSSM, LDG, UKQCD, USQCD
  Downloading ensemble XML
  XML DB (eXist): QCDml ensemble (ILDG) & configuration (JLDG)
  Facet extraction (XQuery)
  Facet database: RDBMS (MySQL)
  Facet navigation system (PHP + SQL + XQuery)
  Web server (Apache)

Page 30: Grid Activity at CCS

Database Design (1/2)

Use an RDBMS for quick response
Use a fixed relational schema for extensibility

*************************** 1. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
   value: cssm
*************************** 2. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
   value: CSSM
*************************** 3. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
   value: Dynamical FLIC Studies
*************************** 4. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
   value: 2007
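The fixed (uri, property, value) layout shown in these rows can be tried out with a small sketch. SQLite is used here in place of MySQL purely so the example is self-contained; the table name `facet`, the helper `facet_values`, and the extra `nf` row are assumptions for illustration.

```python
import sqlite3

URI = "mc://cssm/su3b08500k1300s12t24DBW2FLIC"

conn = sqlite3.connect(":memory:")
# One fixed three-column table holds every facet of every ensemble.
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO facet VALUES (?, ?, ?)",
    [
        (URI, "rgrid", "cssm"),
        (URI, "collaboration", "CSSM"),
        (URI, "projectName", "Dynamical FLIC Studies"),
        (URI, "date", "2007"),
    ],
)


def facet_values(connection, prop):
    """List the values of one facet with the number of ensembles carrying each."""
    return connection.execute(
        "SELECT value, COUNT(DISTINCT uri) FROM facet "
        "WHERE property = ? GROUP BY value ORDER BY value",
        (prop,),
    ).fetchall()


# Adding a new facet needs no schema change, only new rows.
conn.execute("INSERT INTO facet VALUES (?, ?, ?)", (URI, "nf", "2"))
```

Because a facet is just a value in the `property` column, new facets can be introduced without altering the table, which is one reading of "fixed schema for extensibility".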

Page 31: Grid Activity at CCS

Database Design (2/2)

Store preformatted text to improve rendering performance

*************************** 1. row ***************************
collaboration: CSSM
         size: 12/12/12/24
          uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
           nf: 2
         gact: DBW2GluonAction (beta=8.5)
         qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
*************************** 2. row ***************************
collaboration: CSSM
         size: 8/8/8/16
          uri: mc://cssm/su3b09836s8t16DBW2
           nf:
         gact: DBW2GluonAction (beta=9.836)
         qact:
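The preformatted `gact` and `qact` strings in these rows could be produced once, at load time, by a helper along the following lines. This is a guess at the formatting rule inferred from the two rows shown, not the system's actual code; `format_action` is a hypothetical name.

```python
def format_action(name, params):
    """Render an action name and its parameters as one display string."""
    if not name:
        return ""  # ensembles without this action store an empty string
    if not params:
        return name
    joined = "/".join(f"{key}={value}" for key, value in params.items())
    return f"{name} ({joined})"


gact = format_action("DBW2GluonAction", {"beta": "8.5"})
qact = format_action(
    "fatLinkIrrelevantCloverQuarkAction", {"nf": "2", "kappa": "0.1300"}
)
```

Storing the rendered string means the web page can print the column as is, without reassembling parameters on every request.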

Page 32: Grid Activity at CCS

A Screenshot of the System


Page 33: Grid Activity at CCS

Conclusion and Future Work

Conclusion
  Current Status of ILDG
  Development of a New ILDG Client
Future work
  Exploring more chances to apply data engineering techniques in various e-Science fields.
    Data mining
    Data integration
    …

Page 34: Grid Activity at CCS

Thank you very much for your kind attention

Questions should be addressed to [email protected]
