Metadata Working Group Report
1
Metadata Working Group Report
• Members (fixed in mid-January)
– G. Andronico, INFN, Italy
– P. Coddington, Adelaide, Australia
– R. Edwards, JLab, USA
– C. Maynard, Edinburgh, UK
– D. Pleiter, DESY, Germany
– J. Simone, FNAL, USA
– T. Yoshie, Tsukuba, Japan
– B. Joo (observer), Edinburgh, UK
• Mailing list: [email protected]
– about 80 messages circulated
• QCDML (QCD Markup Language) for ILDG
2
0. Introduction
1. QCDML: Strategy and Standard Configuration Format (T.Yoshie)
2. QCDML: Physics (C.Maynard)
3. QCDML: Machine and Management (D.Pleiter)
• My proposal for QCDML
– will not be presented in this talk, but may be useful for discussions
3
Strategy
• QCDML: XML schema for ILDG
– write a QCDML document for each configuration
– store QCDML documents in (a) database(s)
– search/retrieve configurations
• design QCDML so that developing applications is easy
• QCDML defines a minimal set of XML tags
– tags necessary for exchanging configurations
– tags which will be searched, i.e. those researchers are usually interested in
• required: physics parameters (beta, mq)
• not included: random number seed
4
Strategy (cont.)
• Each collaboration can extend QCDML and use it for own purposes
• Every collaboration is asked to provide values for all relevant QCDML tags
5
Category of QCDML
Standard configuration format (SCF)
1. Physics and parameters
2. Algorithm and status
3. Code
4. Machine
5. Management
6. Miscellaneous
• finalized
• 4,5: almost finalized
• 1: discussions on-going (different opinions)
6
SCF: Strategy
• The standard format is an abstract (reference) format for exchanging configurations
– collaborations submitting configurations to ILDG do not have to convert archived files
– some groups have already archived many configurations in their own format
– each format was chosen for convenience
• Conversion is done on the user side
– two methods for converting the format of configurations
• from a given format to the standard one via a C-library
• from one format to another using BinX technology (without referring to the standard format)
7
SCF: Format
• Definition of gauge configuration
– i,j=1,2,3: color indices; mu=1,2,3,4: directions (x,y,z,t)
• employ the NERSC (Gauge Connection) format
– a sequence of 8-byte double precision real numbers
– coded in 32-bit IEEE numerical format
– endian is not specified
$$\bar{\psi}(n)\,U_\mu(n)\,\psi(n+\hat{\mu}) = \sum_{i,j=1}^{3} \bar{\psi}_i(n)\,U_\mu^{i,j}(n)\,\psi_j(n+\hat{\mu})$$
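Since the byte order is not fixed by the format, a reader has to compare the byte order of the file with that of the host and swap each 8-byte number when they differ. A minimal sketch in C, assuming `sizeof(double) == 8`; the function names `host_is_big_endian` and `swap_doubles` are illustrative and not part of any ILDG library:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* look at the byte stored first */
    return first == 0x01;
}

/* Reverse the byte order of n 8-byte values in place
   (hypothetical helper; assumes sizeof(double) == 8). */
static void swap_doubles(double *data, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        unsigned char b[8], r[8];
        memcpy(b, &data[k], 8);
        for (int m = 0; m < 8; m++)
            r[m] = b[7 - m];
        memcpy(&data[k], r, 8);
    }
}
```

A reader would call `swap_doubles` on the buffer only when the file's byte order differs from `host_is_big_endian()`.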
8
SCF: Format (cont.)
• In a C program
– the last index runs fastest; indices run from 0
• re=0 (real part), re=1 (imaginary part)
• Store the first two rows (2x3) of the 3x3 link matrix
– U11,U12,U13,U21,U22,U23
• mu=1,2,3,4
• x=0,1,2,...,NX-1  y=0,1,2,...,NY-1  z, t likewise
• C (Row-Column order): double U[NT][NZ][NY][NX][4][2][3][2], element U[t][z][y][x][mu-1][i-1][j-1][re]
• Fortran (Column-Row order): Complex*16 U(3,2,4,NX,NY,NZ,NT)
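The index convention above can be sketched in C as follows. The first function computes the linear offset of one real number in the C layout `double U[NT][NZ][NY][NX][4][2][3][2]`. The second rebuilds the row that is not stored, using the standard identity for SU(3) matrices, row3 = conj(row1 x row2); this assumes the stored rows belong to an exactly unitary matrix with determinant 1. The names `scf_offset`, `conj_cross_term` and `su3_third_row` are hypothetical.

```c
#include <stddef.h>

/* Linear offset (counted in doubles) of U[t][z][y][x][mu][i][j][re] inside
   double U[NT][NZ][NY][NX][4][2][3][2]; all indices are 0-based. */
static size_t scf_offset(size_t nx, size_t ny, size_t nz,
                         size_t t, size_t z, size_t y, size_t x,
                         size_t mu, size_t i, size_t j, size_t re)
{
    size_t site = ((t * nz + z) * ny + y) * nx + x;
    return (((site * 4 + mu) * 2 + i) * 3 + j) * 2 + re;
}

/* c = conj(a*b - p*q) for complex numbers stored as {re, im}. */
static void conj_cross_term(const double a[2], const double b[2],
                            const double p[2], const double q[2],
                            double c[2])
{
    c[0] =   (a[0] * b[0] - a[1] * b[1]) - (p[0] * q[0] - p[1] * q[1]);
    c[1] = -((a[0] * b[1] + a[1] * b[0]) - (p[0] * q[1] + p[1] * q[0]));
}

/* Rebuild the third row from the two stored rows, rows[i][j][re]. */
static void su3_third_row(double rows[2][3][2], double row3[3][2])
{
    conj_cross_term(rows[0][1], rows[1][2], rows[0][2], rows[1][1], row3[0]);
    conj_cross_term(rows[0][2], rows[1][0], rows[0][0], rows[1][2], row3[1]);
    conj_cross_term(rows[0][0], rows[1][1], rows[0][1], rows[1][0], row3[2]);
}
```

Storing only two rows reduces the file size by one third; the price is this small reconstruction step on the reader's side.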
9
SCF: C-library
• Each collaboration submitting configurations to ILDG prepares a C-library to read their configurations in the standard format
– a pointer to the C-library is stored in the QCDML document
• read a hyper-cubic region
– (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of the (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice
void ILDG_read_conf(file, NX, ix0,ix1, NY, iy0,iy1, NZ, iz0,iz1, NT, it0,it1, endian, config)
10
SCF: C-library (cont.)
• Example: the region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice (0-7)*(0-7)*(0-7)*(0-15) is read in big endian format and stored in U[16][4][4][4][4][2][3][2].
/* ILDG_read_conf is provided by the collaboration's C-library */
int main(void)
{
    int NX = 8, NY = 8, NZ = 8, NT = 16;
    int endian = 1;                     /* 1: big endian, 0: little endian */
    double U[16][4][4][4][4][2][3][2];  /* [t][z][y][x][mu][row][column][re/im] of the region */

    ILDG_read_conf("test-file", NX, 0, 3, NY, 4, 7, NZ, 4, 7, NT, 0, 15, endian, U);
    return 0;
}
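To illustrate what reading a hyper-cubic region involves, here is a minimal sketch for the simplest case: a file that is already laid out as `double U[NT][NZ][NY][NX][4][2][3][2]` in the byte order of the host. It is not the proposed `ILDG_read_conf`; the name `scf_read_region`, its argument list and the use of plain `fseek` are assumptions made for illustration. A production reader would also need byte swapping and a 64-bit file offset.

```c
#include <stdio.h>

#define SITE_DOUBLES (4 * 2 * 3 * 2)   /* mu, row, column, re/im */

/* Read the region [x0..x1] x [y0..y1] x [z0..z1] x [t0..t1] of an
   NX x NY x NZ x NT lattice from a file stored in the standard layout.
   `out` must hold (t1-t0+1)*(z1-z0+1)*(y1-y0+1)*(x1-x0+1)*48 doubles.
   Returns 0 on success, -1 on an I/O error. */
static int scf_read_region(FILE *fp, long nx, long ny, long nz,
                           long x0, long x1, long y0, long y1,
                           long z0, long z1, long t0, long t1,
                           double *out)
{
    const size_t row_doubles = (size_t)(x1 - x0 + 1) * SITE_DOUBLES;

    for (long t = t0; t <= t1; t++)
        for (long z = z0; z <= z1; z++)
            for (long y = y0; y <= y1; y++) {
                /* x is the fastest site index, so each x-row is contiguous */
                long site = ((t * nz + z) * ny + y) * nx + x0;
                long pos = site * SITE_DOUBLES * (long)sizeof(double);
                if (fseek(fp, pos, SEEK_SET) != 0)
                    return -1;
                if (fread(out, sizeof(double), row_doubles, fp) != row_doubles)
                    return -1;
                out += row_doubles;
            }
    return 0;
}
```

Reading one x-row per `fread` keeps the number of seeks at (number of t) x (number of z) x (number of y) and needs no buffer beyond the output array.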
11
SCF: C-library (cont.)
• In general, the conversion program needs memory for 1-2 full configurations
– this memory bottleneck cannot be avoided
• We propose the above interface
– simple
– mainly for full QCD configurations
– 32^3 x Nt lattices, expected for the next several years, can be handled by a high-end PC with 2GB of memory
• some extension might be necessary in the future
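The memory estimate can be checked with a short calculation. The sketch below computes the size of one configuration in the standard format; the function name `scf_config_bytes` is illustrative, and Nt=64 is only an assumed example value. For a 32^3 x 64 lattice it gives 805,306,368 bytes (768 MiB), so two configurations still fit within 2GB.

```c
#include <stdint.h>

/* Bytes occupied by one configuration in the standard format:
   NX*NY*NZ*NT sites x 4 directions x (2x3) complex numbers
   x 2 doubles per complex number x 8 bytes per double. */
static uint64_t scf_config_bytes(uint64_t nx, uint64_t ny,
                                 uint64_t nz, uint64_t nt)
{
    return nx * ny * nz * nt * 4u * 2u * 3u * 2u * 8u;
}
```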
12
SCF: BinX
• BinX
– an XML schema for describing the format of binary files, developed by the edikt project (a part of OGSA-DAI)
http://www.edikt.org/
– software to convert one binary format to another will be available in May 2003
– enables us to convert configurations without referring to the standard format
• Each collaboration submitting configurations to the ILDG describes its own format in BinX
– users may describe their favorite format in BinX
SCF: BinX (Cont.)
<dataset>
  <definitions>
    <typeDef typeName="complexDouble">
      <struct>
        <ieeeDouble-32 varName="Real"/>
        <ieeeDouble-32 varName="Imaginary"/>
      </struct>
    </typeDef>
    <typeDef typeName="matrix2x3">
      <arrayFixed>
        <defType typeName="complexDouble"/>
        <dim name="row" indexFrom="0" indexTo="1"/>
        <dim name="column" indexFrom="0" indexTo="2"/>
      </arrayFixed>
    </typeDef>
  </definitions>
14
SCF: BinX (Cont.)
  <file src="sample.configuration" byteOrder="bigEndian">
    <arrayFixed varName="StandardGaugeConfig">
      <defType typeName="matrix2x3"/>
      <dim name="t" indexFrom="0" indexTo="31"/>
      <dim name="z" indexFrom="0" indexTo="15"/>
      <dim name="y" indexFrom="0" indexTo="15"/>
      <dim name="x" indexFrom="0" indexTo="15"/>
      <dim name="mu" indexFrom="0" indexTo="3"/>
    </arrayFixed>
  </file>
</dataset>
• Mechanism for describing an array split across several files
15
Distribution
• SCF defines the format of the binary configuration only
– no parameters (size, coupling, ...)
– no management info (checksums, collaboration name, ...)
– all of these are described in a QCDML document
• Keeping the identification of a configuration
– encapsulate the configuration and the QCDML document into one file
– distribute it via ILDG
– (need opinions and help from the middleware working group)
16
Distribution (cont.)
• Candidate: DIME (Direct Internet Message Encapsulation)
– format is fixed (different from MIME)
header (fixed bytes)
length (fixed bytes)
body of data (QCDML document)
length (fixed bytes)
body of data (QCDML-BinX document)
length (fixed bytes)
body of data (configuration itself)
footer (fixed bytes)
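With a length-prefixed layout like the one listed above, the position of each body can be found without unpacking the file. The sketch below follows only the simplified picture on this slide and is not the actual DIME record format, which carries further fields. The 8-byte header, the 4-byte big-endian length fields and the names `be32` and `find_body` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADER_BYTES 8u   /* assumed width of the fixed header */

/* Decode a 4-byte big-endian unsigned integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Locate body number `index` (0 = QCDML document, 1 = QCDML-BinX
   document, 2 = configuration) in a buffer holding the whole file.
   On success stores the offset and length of the body and returns 0;
   returns -1 if the buffer ends before that body. */
static int find_body(const unsigned char *buf, size_t size, unsigned index,
                     size_t *offset, size_t *length)
{
    size_t pos = HEADER_BYTES;

    for (unsigned k = 0; ; k++) {
        if (size < 4 || pos > size - 4)
            return -1;                 /* no room for a length field */
        size_t len = be32(buf + pos);
        pos += 4;
        if (len > size - pos)
            return -1;                 /* body would run past the end */
        if (k == index) {
            *offset = pos;
            *length = len;
            return 0;
        }
        pos += len;                    /* skip this body */
    }
}
```

The offset returned for index 2 is the first byte of the binary configuration, which is the position a reading library would have to seek to.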
17
Distribution (cont.)
• Merits
– files do not have to be unpacked before reading
– file size is not increased (cf. MIME: factor 3/2 incl.)
• Discussions
– prepare a tool to extract the QCDML document
– the C-library has to seek within the file to find the origin (the first byte) of the binary configuration
– compatibility with BinX
18
My opinion for QCDML
my opinion/proposal agreed by working group
• Physics
– actions, physics parameters, lattice size
• Simulation
– algorithm, machine, code, series, trajectory
• Management
– revision, crc, reference, collaboration, project, action
• Pointers
– site, file, C-library
19
Action
• a human readable document for each action
– XML schema is powerful, but cannot describe the action completely
• Three versions
– UKQCD Schema v0.5
– A compromise proposal
– My very simple version
• Problems in UKQCD schema
– too complicated
• Action consists of operators
• Operators consist of coupling and fields
– Action and operator names are XML tags
20
Action (cont.)
• My very simple version
– just lists coupling names and values
• A compromise version http://www.rccp.tsukuba.ac.jp/people/yoshie/QCDML-my-sample2.xml
– fields for each operator are removed
– names of actions and operators are given as values
– action is divided into gluon and quark sections
• enables us to include boundary conditions
21
Simulation
• Algorithm section:
– we may have to prepare a human readable document
– a simple version is sufficient
• Machine
• Code
• Series
– distinguishes several runs with the same parameter set
• Trajectory_or_Sweep
22
Management
• Action
• Checksums
– CRC32 or MD5
– for the binary configuration in its original format
• Collaboration name and Project name
– useful tags for searching configurations
• Reference
– information not suitable for inclusion in QCDML
• auto-correlation time
– does not have to include all references
• Revision
– to check whether the QCDML document has been changed
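As an illustration of the checksum tag, the following is a minimal bitwise CRC-32 in C. The slide does not say which CRC32 variant is meant; this sketch assumes the common one used by zlib and PNG (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). The function name `crc32_bytes` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over n bytes (assumed variant: reflected polynomial
   0xEDB88320, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF). */
static uint32_t crc32_bytes(const unsigned char *data, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t k = 0; k < n; k++) {
        crc ^= data[k];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; apply the polynomial when the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For this variant the standard check value is 0xCBF43926 for the nine ASCII bytes "123456789". Because the checksum is taken over the file in its original format, it can be verified before any format conversion.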