Grid Activity at CCS
Toshiyuki Amagasa
Center for Computational Sciences, University of Tsukuba
About Myself
- Name: Toshiyuki Amagasa
- Affiliation:
  - Division of Computational Informatics, Center for Computational Sciences
  - Department of Computer Science, Graduate School of Systems and Information Engineering
- Area of research: data engineering, database systems
- Recent topics:
  - XML databases
    - Parallel XML query processing
    - OLAP analysis for XML
    - Web information extraction for XML
  - Databases in scientific applications
    - Faceted navigation for QCDml
    - Meteorological database
ILDG-JP Members
- Prof. Mitsuhisa Sato (Director, CCS)
- Prof. Tomoteru Yoshie (CCS)
- Prof. Osamu Tatebe (CCS)
- Dr. Naoya Ukita (CCS)
- Prof. Toshiyuki Amagasa (CCS)
Talk Outline
- Current Status of JLDG
  - A Brief History of JLDG
  - An Overview of JLDG
- Development of a New ILDG Client
  - Faceted Navigation of QCDml
- Conclusions and Future Work
Current Status of JLDG
A Brief History of JLDG (1/3)
- Hepnet-J/sc, 2002- (SINET GbE private network)
  - Widely-distributed file system
  - Network backbone: Super SINET VPN
  - Institutes / universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U.
- Objective and implementation
  - Data sharing among institutes / universities whose administrative policies are not homogeneous, while maintaining security
  - Mirroring among FSs attached to SCs with administrative
[Figure: Hepnet-J/sc linking the file servers at four sites: CCP@Tsukuba (CP-PACS), RCNP@Osaka (SX-5), CRC@KEK (SR8000), and YITP@Kyoto (SX-5)]
A Brief History of JLDG (2/3)
- Problems
  - Growing cost of managing data location
    - A dataset may be distributed over several disks.
    - It is hard for users to remember the locations of data and mirrors.
  - No concept of users and user groups
    - Hard to support multiple research groups.
- Necessary functionalities
  - A flat data sharing system with no space limit (or one that can be extended at any time)
  - User and user group management across several organizations
- Japan Lattice Data Grid (JLDG)
  - Project launched in November 2005
  - Operation started in March 2007
A Brief History of JLDG (3/3)
- JLDG v1 started operation in May 2008
  - Available datasets
    - CP-PACS Nf=2 QCD configurations: 8,000 files, 1.5 TBytes
    - CP-PACS/JLQCD Nf=2+1 QCD configurations: 21,000 files, 6 TBytes
    - PACS-CS Nf=2+1 32^3 x 64 lattice QCD configurations: 2,600 files, 3 TBytes
- JLDG v2 started operation in December 2009
  - Storing and sharing research data generated in daily research activities
  - Data sharing within a research group
An Overview of JLDG
- A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics
  - Sharing simulation data computed on SCs over several months to several years.
- Data files are distributed; replicas are created if necessary.
  - Users do not need to know file locations.
  - Files can be accessed very quickly if the site has replicas.
- Storage space can be added incrementally during operation.
[Figure: JLDG sites (Tsukuba, KEK, Kyoto, Osaka, Hiroshima, Kanazawa) connected over the SINET3 network and sharing a Gfarm file system, with a link to ILDG; www.jldg.org]
Software Components
- Globus Toolkit V4 (ANL), www.globus.org
  - GSI authentication, proxy user certificate creation
  - GridFTP server / client
- VOMS (EDG)
  - VO management
- Naregi-CA (Naregi), www.naregi.org
  - User / host certificate creation
- Gfarm file system (U. of Tsukuba), datafarm.apgrid.org
  - Widely-distributed file system
- Uberftp (NCSA), http://dims.ncsa.uiuc.edu/set/uberftp/
  - Interactive GridFTP client
Gfarm Distributed File System
- An open-source distributed file system
- A global namespace to unify storage systems
- Scalable I/O performance exploiting data access locality
- Automated replica selection for fault tolerance and load balancing
[Figure: Gfarm File System. A global namespace rooted at /gfarm (directories ggf, jp, aist, gtrc; files file1 to file4) is mapped onto storage nodes, with replica creation shown for file1 and file2]
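The replica selection idea above can be illustrated with a small sketch. This is not Gfarm's actual algorithm or API: the `Replica` record, the host names, the load figures, and the "same site first, then lowest load" policy are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str          # storage node holding a copy (hypothetical name)
    site: str          # site the node belongs to
    load: float        # current load on the node; lower is better
    alive: bool = True  # False models a failed node


def select_replica(replicas, client_site):
    """Skip dead hosts, prefer the client's own site, then the lowest load."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise LookupError("no live replica available")
    # False sorts before True, so same-site replicas win over remote ones.
    return min(candidates, key=lambda r: (r.site != client_site, r.load))


# One logical path in the global namespace mapped to two physical copies.
catalog = {
    "/gfarm/jp/file1": [
        Replica("fs1.tsukuba", "Tsukuba", load=0.7),
        Replica("fs1.kek", "KEK", load=0.2),
    ],
}

chosen = select_replica(catalog["/gfarm/jp/file1"], client_site="Tsukuba")
```

The client only ever names the logical path; locality gives the fast access mentioned in the overview, and skipping dead hosts gives the fault tolerance.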
Summary
- JLDG
  - A brief history
  - An overview
- Used as an infrastructure for daily research activity
  - Hands-on meeting on 27 Jan. 2009
    - Successfully held with 19 attendees
Development of a New ILDG Client
Int’l Lattice Data Grid (ILDG)
- A data grid for sharing lattice QCD configurations
- File formats in ILDG
  - Configuration binary: LIME (Lattice QCD Interchange Message Encapsulation)
  - Metadata (QCDml): ensemble XML, configuration XML
  - LFN (Logical File Name): identifier for a configuration binary
[Figure: one ensemble XML referenced by several configuration XML documents through markovChainURI; each configuration XML points to its configuration binary through an LFN]
QCDml Ensemble XML
<markovChain xmlns="…">
  <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600</markovChainURI>
  <management>
    <revisions>1</revisions>
    <collaboration>CP-PACS</collaboration>
    <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName>
    <ensembleLabel>B1800</ensembleLabel>
    <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901</reference>
    <archiveHistory>
      <elem>
        <revision>1</revision>
        <revisionAction>add</revisionAction>
        <participant>
          <name>T.Yoshie</name>
          <institution>Center for Computational Sciences, University of Tsukuba</institution>
          …
Typical Usecase of ILDG
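1. Find the desired data with the MDC (Metadata Catalogue); the result is an LFN (Logical File Name).
2. Find a nearby site with the FC (File Catalogue); the result is a SURL (Site URL).
3. Access the site; the result is a TURL (Transfer URL).
4. Transfer the data.
Authentication is handled through VOMS.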
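The chain of identifiers in this use case (LFN, then SURL, then TURL) can be sketched as follows. The dictionaries stand in for the MDC and FC, which in practice are remote services behind VOMS authentication; every URI, host name, and the `srm`/`gsiftp` schemes below are hypothetical placeholders chosen for illustration, not values taken from ILDG.

```python
# Hypothetical in-memory stand-ins for the two catalogues.
METADATA_CATALOGUE = {
    # ensemble (markovChainURI) -> LFNs of its configurations
    "mc://example/ensembleA": ["lfn://example/ensembleA/conf0001"],
}
FILE_CATALOGUE = {
    # LFN -> SURLs of every site that stores a copy
    "lfn://example/ensembleA/conf0001": [
        "srm://site-a.example.org/data/conf0001",
        "srm://site-b.example.org/data/conf0001",
    ],
}


def find_lfns(markov_chain_uri):
    """Step 1: metadata lookup returns logical file names."""
    return METADATA_CATALOGUE.get(markov_chain_uri, [])


def find_surl(lfn, preferred_host):
    """Step 2: pick the copy at the preferred (nearby) site, else the first one."""
    surls = FILE_CATALOGUE[lfn]
    for surl in surls:
        if preferred_host in surl:
            return surl
    return surls[0]


def to_turl(surl, protocol="gsiftp"):
    """Step 3: the site turns a SURL into a TURL; modelled here as a scheme swap."""
    _, rest = surl.split("://", 1)
    return f"{protocol}://{rest}"


def resolve(markov_chain_uri, preferred_host):
    """Steps 1 to 3 combined; step 4 (the transfer itself) is left out."""
    return [
        to_turl(find_surl(lfn, preferred_host))
        for lfn in find_lfns(markov_chain_uri)
    ]


turls = resolve("mc://example/ensembleA", "site-b.example.org")
```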
Difficulties in Finding Desired Configuration
- Directly using a query language (XQuery / XPath)
  - A simple example:
      /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4]
        and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
  - Knowledge of XML, QCDml, and XQuery (XPath) is needed.
  - Hard to get the whole picture of the available data.
- Hierarchical list
  - Easy to use.
  - Needs a huge screen to show the entire list.
  - Still difficult to get the whole picture of the data.
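To make the meaning of the XPath example concrete, the sketch below approximates the same test in plain Python with the standard library. The sample document is a made-up minimal fragment (its namespace and the `someGluonAction` element are placeholders, not real QCDml), and the function mirrors the query only roughly: some descendant named `beta` holds a number greater than 4, and some descendant named `collaboration` holds the text CSSM.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ensemble document used only for this sketch.
SAMPLE = """
<markovChain xmlns="urn:example:qcdml">
  <management><collaboration>CSSM</collaboration></management>
  <physics><action><gluon>
    <someGluonAction><beta>8.5</beta></someGluonAction>
  </gluon></action></physics>
</markovChain>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def is_number_greater(text, bound):
    """Mimic number(text()) > bound: non-numeric text never matches."""
    try:
        return float(text) > bound
    except (TypeError, ValueError):
        return False


def matches(root, collaboration="CSSM", min_beta=4.0):
    has_beta = False
    has_collaboration = False
    for element in root.iter():
        if element is root:
            continue  # the descendant axis excludes the element itself
        name = local_name(element.tag)
        if name == "beta" and is_number_greater(element.text, min_beta):
            has_beta = True
        if name == "collaboration" and element.text == collaboration:
            has_collaboration = True
    return local_name(root.tag) == "markovChain" and has_beta and has_collaboration


result = matches(ET.fromstring(SAMPLE))
```

Even this small condition requires knowing where `beta` and `collaboration` live in the schema, which is the usability problem the slide points out.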
Basic Idea
Apply a faceted-navigation interface to browse QCDml ensemble XML data.
Faceted-Navigation
- What is “faceted-navigation”?
  - A scheme for browsing objects with attributes.
  - Successfully used in applications such as Apple iTunes.
- Procedure
  1. A user selects a value in a facet to select a set of objects of interest.
  2. The system updates the list of objects, the list of facets, and their respective values.
  3. (Repeat)
- Example
  - The Flamenco Search, http://flamenco.berkeley.edu/
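The select-and-update loop described in the procedure can be sketched in a few lines. The three records and the facet names below are illustrative values, not actual catalogue contents, and the functions are a minimal model of the idea rather than the system presented in this talk.

```python
from collections import Counter

# Illustrative objects with attributes (facets); not real catalogue data.
ensembles = [
    {"collaboration": "CP-PACS", "nf": "2", "size": "12/12/12/24"},
    {"collaboration": "CSSM", "nf": "2", "size": "12/12/12/24"},
    {"collaboration": "CSSM", "nf": "0", "size": "8/8/8/16"},
]
FACETS = ["collaboration", "nf", "size"]


def facet_counts(objects, facets):
    """For every facet, list its available values with their populations."""
    return {
        facet: Counter(o[facet] for o in objects if o.get(facet) is not None)
        for facet in facets
    }


def select(objects, facet, value):
    """Keep only the objects whose facet has the chosen value."""
    return [o for o in objects if o.get(facet) == value]


# Initial view, then one navigation step: the user clicks collaboration = CSSM.
counts = facet_counts(ensembles, FACETS)
narrowed = select(ensembles, "collaboration", "CSSM")
narrowed_counts = facet_counts(narrowed, FACETS)
```

After each selection the remaining facets are recomputed from the narrowed set, so the user only ever sees values that still lead to at least one object.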
Faceted-Navigation
- Good features
  - Users are free to choose any facet (cf. hierarchical list)
  - Gives a big picture of the dataset
    - Available values are shown along with their populations
  - Effective
    - Busch’s Law: 4 facets consisting of 10 values each are enough to deal with 10,000 objects.
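The arithmetic behind this rule of thumb can be written out. The reading assumed here, which the slide does not state explicitly, is that every object carries exactly one value per facet, so the number of distinguishable combinations is the product of the facet sizes:

```latex
\[
  \underbrace{10 \times 10 \times 10 \times 10}_{4\ \text{facets}}
  = 10^{4} = 10{,}000
\]
```

Under that assumption, four selections are in principle enough to single out one object among 10,000.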
Technical Challenges
- How to define facets?
- How to extract values according to the facets?
- How to achieve quick response from the database to improve the user experience?
Choosing the Facets
- Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe.
- Selected elements from QCDml ensemble XML
  - Regional grid
  - Collaboration
  - Project name
  - Number of flavors
  - Time
  - Parameters
    - Lattice size
    - Gluon action
      - Parameters
    - Quark action
      - Parameters
Extracting Values from a Facet (1/3)
- Extract text values
  - Collaboration, e.g. CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, …
  - Project name, e.g. 2+1 DWF, 2+1 Dynamical AsqTAD, Baryon Resonances, Dynamical FLIC Studies, Electromagnetic Form Factors, FLIC Overlap Studies, Flux Tube Test, Gluon Propagator, Long_aqstad_run, Pentaquark Volume Dependence, …
- Need substring extraction
  - Date, e.g. 2000, 2005, 2006, 2007, 2008, taken from values such as
      <date>2007-02-26T21:39:33+09:00</date>
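The substring extraction for the date facet amounts to keeping the year of the timestamp. A minimal sketch follows; the function name `year_facet` and the choice to return `None` for unparsable input are assumptions, and the original system performs this step in XQuery rather than Python.

```python
import re

# Matches the leading YYYY of an ISO 8601 timestamp such as the <date> value above.
_YEAR = re.compile(r"^\s*(\d{4})-\d{2}-\d{2}T")


def year_facet(date_text):
    """Return the four-digit year as text, or None if the value is not a timestamp."""
    match = _YEAR.match(date_text)
    return match.group(1) if match else None


year = year_facet("2007-02-26T21:39:33+09:00")
```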
Extracting Values from a Facet (2/3)
- Need text value generation
  - Lattice size, e.g. 12 / 12 / 12 / 24

    <physics>
      <size>
        <elem> <name>X</name> <length>12</length> </elem>
        <elem> <name>Y</name> <length>12</length> </elem>
        <elem> <name>Z</name> <length>12</length> </elem>
        <elem> <name>T</name> <length>24</length> </elem>
        …
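Generating the lattice size text means collecting the four lengths and joining them in a fixed axis order. The sketch below does this with the standard library on a namespace-free copy of the fragment above; real QCDml documents declare a namespace, so the element lookups would need adjusting, and the function name is a hypothetical one.

```python
import xml.etree.ElementTree as ET

# Namespace-free version of the <size> fragment shown above (a simplification).
SIZE_XML = """
<size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size>
"""


def lattice_size_label(size_element, order=("X", "Y", "Z", "T"), sep=" / "):
    """Build a display string such as '12 / 12 / 12 / 24' from <elem> children."""
    lengths = {
        elem.findtext("name"): elem.findtext("length")
        for elem in size_element.findall("elem")
    }
    # Joining by axis name keeps the label stable even if the XML order differs.
    return sep.join(lengths[axis] for axis in order)


label = lattice_size_label(ET.fromstring(SIZE_XML))
```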
Extracting Values from a Facet (3/3)
- Gluon action / Quark action
  - An element name itself represents a value
  - Extract the element name as the value of the facet

    <action>
      <gluon>
        <iwasakiRGGluonAction>
          <glossary>http://www.jldg.org/JLDG/...

    <action>
      <gluon>
        <DBW2GluonAction>
          <glossary>www.lqcd.org/ildg/pla...
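Using an element name as a facet value can be sketched as reading the tag of the first child under `<gluon>` (or `<quark>`). The fragment below is a completed, namespace-free stand-in for the truncated example above, with a placeholder glossary URL; the helper name `action_name` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in fragment; the glossary URL is a placeholder, not the real one.
ACTION_XML = """
<action>
  <gluon>
    <iwasakiRGGluonAction>
      <glossary>http://example.org/glossary</glossary>
    </iwasakiRGGluonAction>
  </gluon>
</action>
"""


def action_name(action_element, kind):
    """Return the name of the first child of <gluon> or <quark>, or None if absent."""
    container = action_element.find(kind)
    if container is None or len(container) == 0:
        return None
    # Drop a '{namespace}' prefix if the document declares one.
    return container[0].tag.rsplit("}", 1)[-1]


gluon_action = action_name(ET.fromstring(ACTION_XML), "gluon")
```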
QCDml Faceted Navigation I/F: System Configuration
[Figure: ensemble XML is downloaded from the ILDG regional grids (JLDG, CSSM, LDG, UKQCD, USQCD) into an XML DB (eXist) holding QCDml ensemble (ILDG) and configuration (JLDG) metadata; facet extraction (XQuery) fills the facet database in an RDBMS (MySQL); the facet navigation system (PHP + SQL + XQuery) runs on a web server (Apache)]
Database Design (1/2)
- Use an RDBMS for quick response
- Use a fixed relational schema for extensibility

*************************** 1. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
   value: cssm
*************************** 2. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
   value: CSSM
*************************** 3. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
   value: Dynamical FLIC Studies
*************************** 4. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
   value: 2007
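The fixed (uri, property, value) schema can be tried out with a small sketch. SQLite from the Python standard library is swapped in here for MySQL, the table name `facet` and the index are assumptions, and the rows are the ones shown in the slides. Because every facet is just another `property` value, adding a new facet needs no schema change, which is the extensibility argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (ensemble, facet, value); column names follow the slide.
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO facet VALUES (?, ?, ?)",
    [
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "rgrid", "cssm"),
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "collaboration", "CSSM"),
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "projectName", "Dynamical FLIC Studies"),
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "date", "2007"),
        ("mc://cssm/su3b09836s8t16DBW2", "collaboration", "CSSM"),
    ],
)
# Assumed index to keep facet lookups fast.
conn.execute("CREATE INDEX facet_pv ON facet (property, value)")


def value_counts(conn, prop):
    """Values of one facet with the number of ensembles carrying each."""
    cursor = conn.execute(
        "SELECT value, COUNT(DISTINCT uri) FROM facet "
        "WHERE property = ? GROUP BY value ORDER BY value",
        (prop,),
    )
    return cursor.fetchall()


def uris_matching(conn, selections):
    """URIs of ensembles that carry every selected (property, value) pair."""
    if not selections:
        return [r[0] for r in conn.execute("SELECT DISTINCT uri FROM facet ORDER BY uri")]
    clauses = " OR ".join("(property = ? AND value = ?)" for _ in selections)
    params = [item for pair in selections.items() for item in pair]
    sql = (
        f"SELECT uri FROM facet WHERE {clauses} "
        "GROUP BY uri HAVING COUNT(DISTINCT property) = ? ORDER BY uri"
    )
    return [r[0] for r in conn.execute(sql, params + [len(selections)])]
```

A call such as `uris_matching(conn, {"collaboration": "CSSM", "date": "2007"})` models two successive facet selections.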
Database Design (2/2)
- Store preformatted text to improve rendering performance

*************************** 1. row ***************************
collaboration: CSSM
         size: 12/12/12/24
          uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
           nf: 2
         gact: DBW2GluonAction (beta=8.5)
         qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
*************************** 2. row ***************************
collaboration: CSSM
         size: 8/8/8/16
          uri: mc://cssm/su3b09836s8t16DBW2
           nf:
         gact: DBW2GluonAction (beta=9.836)
         qact:
Conclusion and Future Work
- Conclusion
  - Current status of JLDG
  - Development of a new ILDG client
- Future work
  - Exploring more chances to apply data engineering techniques in various e-Science fields
    - Data mining
    - Data integration
    - …