Metadata Working Group Report


• Members (fixed in mid-January)
  – G. Andronico, INFN, Italy
  – P. Coddington, Adelaide, Australia
  – R. Edwards, JLab, USA
  – C. Maynard, Edinburgh, UK
  – D. Pleiter, DESY, Germany
  – J. Simone, FNAL, USA
  – T. Yoshie, Tsukuba, Japan
  – B. Joo (observer), Edinburgh, UK
• Mailing list: [email protected]
  – about 80 mails circulated

• QCDML (QCD Markup Language) for ILDG

0. Introduction

1. QCDML: Strategy and Standard Configuration Format (T.Yoshie)

2. QCDML: Physics (C.Maynard)

3. QCDML: Machine and Management (D.Pleiter)

• My proposal for QCDML
  – will not be used in my talk, but may be useful for discussions

Strategy
• QCDML: XML schema for ILDG
  – write a QCDML document for each configuration
  – store QCDML documents in (a) database(s)
  – search/retrieve configurations
  design QCDML so that developing applications is easy
• QCDML defines a minimal set of XML tags
  – necessary for exchanging configurations
• tags which will be searched
  – researchers are usually interested in
    • required: physics parameters (beta, mq)
    • not included: random number seed

Strategy (cont.)

• Each collaboration can extend QCDML and use it for its own purposes
• Every collaboration is asked to provide values for all relevant QCDML tags

Categories of QCDML
Standard configuration format (SCF)
1. Physics and parameters
2. Algorithm and status
3. Code
4. Machine
5. Management
6. Miscellaneous
• finalized
• 4,5: almost finalized
• 1: discussions on-going (different opinions)

SCF: Strategy
• The standard format is an abstract (reference) format for exchanging configurations
  – collaborations submitting configurations to ILDG do not have to convert archived files
  – some groups have already archived many configurations in their own format
  – each format is chosen for convenience
• Conversions will be done on the user side
  – two methods to convert the format of configurations
    • a given format to the standard one via a C-library
    • one format to another using BinX technology (without referring to the standard format)

SCF: Format
• Definition of gauge configuration
\[ \bar\psi(n)\,U_\mu(n)\,\psi(n+\hat\mu) = \sum_{i,j=1}^{3} \bar\psi_i(n)\,U_\mu(n)_{i,j}\,\psi_j(n+\hat\mu) \]
  – i,j=1,2,3 color indices; mu=1,2,3,4 (x,y,z,t)
• employ NERSC (Gauge Connection) format
  – a sequence of 8-byte double precision real numbers
  – coded in 64-bit IEEE numerical format
  – endian is not specified

SCF: Format (cont.)
• Store first two rows (2x3) of 3x3 link matrix
  – U11, U12, U13, U21, U22, U23
• mu=1,2,3,4
• x=0,1,2,...,NX-1; y=0,1,2,...,NY-1; likewise z, t
• re=0 (real part), re=1 (imaginary part)
• In a C program
  – last index runs fastest, indices run from 0
  – double U[NT][NZ][NY][NX][4][2][3][2], accessed as U[t][z][y][x][mu-1][i-1][j-1][re] (Row-Column)
• Fortran-style declaration
  – Complex*16 U(3,2,4,NX,NY,NZ,NT) (Column-Row)
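The storage convention above can be sketched in C. The helper names `scf_index` and `scf_third_row` are hypothetical and do not come from any ILDG library. The first computes the position of one real number in the flat sequence of doubles. The second rebuilds the third row that the format omits, under the assumption that each link is an SU(3) matrix, so that row 3 is the complex conjugate of the cross product of rows 1 and 2.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in doubles) of U[t][z][y][x][mu][i][j][re] inside
 * double U[NT][NZ][NY][NX][4][2][3][2]. All indices are 0-based:
 * mu = 0..3, i = 0..1 (row), j = 0..2 (column), re = 0..1.
 * NT is not needed for the offset; it only bounds t. */
size_t scf_index(int NZ, int NY, int NX,
                 int t, int z, int y, int x,
                 int mu, int i, int j, int re)
{
    assert(z >= 0 && z < NZ && y >= 0 && y < NY && x >= 0 && x < NX);
    assert(t >= 0 && mu >= 0 && mu < 4 && i >= 0 && i < 2);
    assert(j >= 0 && j < 3 && re >= 0 && re < 2);

    size_t site = (((size_t)t * NZ + z) * NY + y) * NX + x;
    return (((site * 4 + mu) * 2 + i) * 3 + j) * 2 + re;
}

/* Rebuild the third row of an SU(3) link from the two stored rows.
 * rows[i][j][re] holds rows 1 and 2 (read only); row3[j][re] receives
 * the complex conjugate of the cross product row1 x row2. */
void scf_third_row(double rows[2][3][2], double row3[3][2])
{
    for (int j = 0; j < 3; j++) {
        int a = (j + 1) % 3, b = (j + 2) % 3;
        /* c = row1[a]*row2[b] - row1[b]*row2[a], in complex arithmetic */
        double cr = rows[0][a][0] * rows[1][b][0] - rows[0][a][1] * rows[1][b][1]
                  - rows[0][b][0] * rows[1][a][0] + rows[0][b][1] * rows[1][a][1];
        double ci = rows[0][a][0] * rows[1][b][1] + rows[0][a][1] * rows[1][b][0]
                  - rows[0][b][0] * rows[1][a][1] - rows[0][b][1] * rows[1][a][0];
        row3[j][0] = cr;   /* real part of conj(c) */
        row3[j][1] = -ci;  /* imaginary part of conj(c) */
    }
}
```

With this layout, neighbouring x sites are 48 doubles apart (4 x 2 x 3 x 2), which is why the C declaration and the Fortran-style declaration describe the same byte sequence.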

SCF: C-library
• Each collaboration submitting configurations to ILDG prepares a C-library to read their configurations in the standard format
  – pointer to the C-library is stored in QCDML document
• read a hyper-cubic region
  – (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice

void ILDG_read_conf(file, NX, ix0,ix1, NY, iy0,iy1, NZ, iz0,iz1, NT, it0,it1, endian, config)

SCF: C-library (cont.)

the region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice (0-7)*(0-7)*(0-7)*(0-15) will be read in big endian format and stored in U[16][4][4][4][4][2][3][2].

/* ILDG_read_conf() comes from the collaboration's C-library */
int main(void)
{
    int NX = 8, NY = 8, NZ = 8, NT = 16;
    int endian = 1;  /* 1 = big endian, 0 = little endian */
    double U[16][4][4][4][4][2][3][2];  /* [t][z][y][x][mu][i][j][re] */

    ILDG_read_conf("test-file", NX, 0, 3, NY, 4, 7, NZ, 4, 7, NT, 0, 15, endian, U);
    return 0;
}
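What a reader behind such an interface has to do can be sketched as follows. This is not the ILDG C-library: `scf_copy_region` and `swap8` are hypothetical helpers, and the sketch assumes the full configuration is already in memory in the standard order rather than being read from a file. The `swap` flag is an assumed stand-in for the endian handling, to be set when the file byte order differs from the host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SCF_SITE_DOUBLES = 4 * 2 * 3 * 2 };  /* mu x rows x columns x re/im */

/* Reverse the byte order of one 8-byte double in place. */
void swap8(double *v)
{
    unsigned char b[8], r[8];
    assert(sizeof(double) == 8);
    memcpy(b, v, 8);
    for (int k = 0; k < 8; k++)
        r[k] = b[7 - k];
    memcpy(v, r, 8);
}

/* Copy the region (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of a full
 * lattice stored as [t][z][y][x][mu][i][j][re] into `out`, which uses
 * the same index order with the region extents. If `swap` is nonzero,
 * every copied double is byte-swapped. Returns the doubles written. */
size_t scf_copy_region(const double *full, int NX, int NY, int NZ, int NT,
                       int ix0, int ix1, int iy0, int iy1,
                       int iz0, int iz1, int it0, int it1,
                       int swap, double *out)
{
    assert(full != NULL && out != NULL);
    assert(0 <= ix0 && ix0 <= ix1 && ix1 < NX);
    assert(0 <= iy0 && iy0 <= iy1 && iy1 < NY);
    assert(0 <= iz0 && iz0 <= iz1 && iz1 < NZ);
    assert(0 <= it0 && it0 <= it1 && it1 < NT);

    /* x is the fastest site index, so one x-range is contiguous */
    size_t run = (size_t)(ix1 - ix0 + 1) * SCF_SITE_DOUBLES;
    size_t n = 0;
    for (int t = it0; t <= it1; t++)
        for (int z = iz0; z <= iz1; z++)
            for (int y = iy0; y <= iy1; y++) {
                size_t site = (((size_t)t * NZ + z) * NY + y) * NX + ix0;
                memcpy(out + n, full + site * SCF_SITE_DOUBLES,
                       run * sizeof(double));
                n += run;
            }
    if (swap)
        for (size_t k = 0; k < n; k++)
            swap8(&out[k]);
    return n;
}
```

A file-based reader could follow the same loop with one seek and one read per x-range instead of `memcpy`, which keeps the memory needed close to the size of the requested region.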

SCF: C-library (cont.)

• In general, the conversion program requires a large amount of memory, 1-2 times the configuration size
  – this memory bottleneck cannot be avoided
• We propose the above interface:
  – simple
  – mainly for full QCD configurations: 32^3 x Nt lattices, expected for the next several years, can be handled by a high-end PC with 2GB of memory
• Some extension might be necessary in the future
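The 2GB estimate can be checked with a little arithmetic on the standard format. The function name `scf_config_bytes` is hypothetical, and Nt = 64 in the note below is an assumed example value that does not appear on the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by one configuration in the standard format:
 * sites x 4 directions x 2 rows x 3 columns x (re, im) x 8 bytes. */
uint64_t scf_config_bytes(int NX, int NY, int NZ, int NT)
{
    assert(NX > 0 && NY > 0 && NZ > 0 && NT > 0);
    uint64_t sites = (uint64_t)NX * NY * NZ * NT;
    return sites * 4 * 2 * 3 * 2 * 8;
}
```

Each site costs 384 bytes, so a 32^3 x 64 lattice takes 805,306,368 bytes (768 MiB). Two copies, the upper end of the "1-2 configuration size" range, come to about 1.5 GiB and still fit in 2GB.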

SCF: BinX
• BinX
  – an XML schema to describe the format of a binary file, developed by the edikt project (a part of OGSA-DAI) http://www.edikt.org/
  – software to convert one binary format to another will be available in May 2003
  – enables us to convert configurations without referring to the standard format
• Each collaboration submitting configurations to the ILDG describes its own format in BinX
  – users may write their favorite format in BinX

SCF: BinX (Cont.)
<dataset>
  <definitions>
    <typeDef typeName="complexDouble">
      <struct>
        <ieeeDouble-32 varName="Real"/>
        <ieeeDouble-32 varName="Imaginary"/>
      </struct>
    </typeDef>
    <typeDef typeName="matrix2x3">
      <arrayFixed>
        <defType typeName="complexDouble"/>
        <dim name="row" indexFrom="0" indexTo="1"/>
        <dim name="column" indexFrom="0" indexTo="2"/>
      </arrayFixed>
    </typeDef>
  </definitions>

SCF: BinX (Cont.)
  <file src="sample.configuration" byteOrder="bigEndian">
    <arrayFixed varName="StandardGaugeConfig">
      <defType typeName="matrix2x3"/>
      <dim name="t" indexFrom="0" indexTo="31"/>
      <dim name="z" indexFrom="0" indexTo="15"/>
      <dim name="y" indexFrom="0" indexTo="15"/>
      <dim name="x" indexFrom="0" indexTo="15"/>
      <dim name="mu" indexFrom="0" indexTo="3"/>
    </arrayFixed>
  </file>
</dataset>
• Mechanism for describing an array split across several files

Distribution

• SCF defines the format of the binary configuration only
  – no parameters (size, coupling, ...)
  – no management info (checksums, collaboration name, ...)
  – all of them are described in a QCDML document
• Keeping the identification of a configuration
  – encapsulate the configuration and the QCDML document into one file
  – distribute it via ILDG
  – (need opinions and help from the middleware working group)

Distribution (cont.)

• Candidate: DIME (Direct Internet Message Encapsulation)
  – format is fixed (different from MIME)
    header (fixed bytes)
    length (fixed bytes)
    body of data (QCDML document)
    length (fixed bytes)
    body of data (QCDML-BinX document)
    length (fixed bytes)
    body of data (configuration itself)
    footer (fixed bytes)

Distribution (cont.)

• Merits
  – files do not have to be unpacked before reading
  – file size is not increased (cf. MIME: factor 3/2 incl.)
• Discussions:
  – prepare a tool to extract the QCDML document
  – the C-library has to seek within the file to the origin (the first byte) of the binary configuration
  – compatibility with BinX
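The seek step in the discussion list can be sketched for the simplified layout drawn on the previous slide (header, then length and body pairs). This is not the real DIME record format, which is more elaborate. The slides give no field widths, so the 4-byte big-endian length field and the caller-supplied header size are assumptions, and `find_body` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: each length field is a 4-byte big-endian unsigned integer. */
enum { LEN_BYTES = 4 };

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Find body number `which` (0 = QCDML document, 1 = QCDML-BinX document,
 * 2 = configuration) in a buffer laid out as a header followed by
 * (length, body) pairs. On success stores the body length in *len and
 * returns the offset of its first byte; returns (size_t)-1 if the
 * buffer ends before that body is complete. */
size_t find_body(const unsigned char *buf, size_t size,
                 size_t header_bytes, int which, size_t *len)
{
    assert(buf != NULL && len != NULL && which >= 0);
    size_t pos = header_bytes;
    for (int k = 0; ; k++) {
        if (pos > size || size - pos < (size_t)LEN_BYTES)
            return (size_t)-1;
        size_t n = read_be32(buf + pos);
        pos += LEN_BYTES;
        if (size - pos < n)
            return (size_t)-1;
        if (k == which) {
            *len = n;
            return pos;
        }
        pos += n;  /* skip this body and move to the next length field */
    }
}
```

Because only the length fields are read on the way, the two XML documents are skipped without being parsed, which is the merit claimed for a fixed format over MIME.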

My opinion for QCDML

my opinion/proposal agreed by working group

• Physics
  – actions, physics parameters, lattice size
• Simulation
  – algorithm, machine, code, series, trajectory
• Management
  – revision, crc, reference, collaboration, project, action
• Pointers
  – site, file, C-library

Action
• a human readable document for each action
  – XML schema is powerful, but cannot completely describe the action
• Three versions
  – UKQCD Schema v0.5
  – A compromise proposal
  – My very simple version
• Problems in UKQCD schema
  – too complicated
    • Action consists of operators
    • Operators consist of coupling and fields
  – Action and operator names are XML tags

Action (cont.)

• My very simple version
  – just lists coupling names and values
• A compromise version http://www.rccp.tsukuba.ac.jp/people/yoshie/QCDML-my-sample2.xml
  – fields for each operator are removed
  – names of actions and operators are described by values
  – action is divided into gluon and quark sections
    • enables us to include boundary conditions

Simulation

• Algorithm section:
  – we may have to prepare a human readable document
  – simple version is sufficient
• Machine
• Code
• Series
  – several runs with the same parameter sets
  – distinguishes them
• Trajectory_or_Sweep

Management
• Action
• Checksums
  – CRC32 or MD5
  – for the binary configuration in its original format
• Collaboration name and Project Name
  – useful tags for searching configurations
• Reference
  – some information is not suitable for inclusion in QCDML
    • auto-correlation time
  – do not have to include all references
• Revision
  – to check whether the QCDML document has changed
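For the CRC32 option under Checksums, a minimal bitwise sketch is shown below. The slides do not say which CRC-32 variant is meant, so this assumes the common one (reflected polynomial 0xEDB88320, as used by zlib and PNG). The function name `crc32_update` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Update a CRC-32 (reflected polynomial 0xEDB88320) with n more bytes.
 * Start with crc = 0 and feed the file in as many chunks as needed;
 * the return value is the checksum of everything seen so far. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);
    crc = ~crc;  /* undo the final inversion of the previous call */
    for (size_t k = 0; k < n; k++) {
        crc ^= data[k];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The chunked interface matters here because a configuration can be hundreds of megabytes. The checksum can be accumulated while the file is streamed, without holding it in memory.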