Metadata (data about data) GridPP-15, Paul Millar

Page 1: Metadata (data about data) GridPP-15, Paul Millar

Metadata

(data about data)

GridPP-15, Paul Millar

Page 2: Metadata (data about data) GridPP-15, Paul Millar

2006-01-11 GridPP-15 Metadata 2

Contents

• Monitoring metadata service
• Event-level metadata
• Work improving AMI
• Cataloguing ATLAS metadata
• Conclusions

Page 3: Metadata (data about data) GridPP-15, Paul Millar

Monitoring a metadata service

• Requirements document has been released
• Why do people want to monitor metadata services?
• How to make that happen?
• What already exists out there?
• What still needs to be done?

Page 4: Metadata (data about data) GridPP-15, Paul Millar

JMX – servlet monitoring

Page 5: Metadata (data about data) GridPP-15, Paul Millar

Monitoring Architecture

Page 6: Metadata (data about data) GridPP-15, Paul Millar

MonAMI architecture

Page 7: Metadata (data about data) GridPP-15, Paul Millar

MonAMI summary

• Follows the UNIX “do one thing well” philosophy.
• Takes data from a site and pushes it somewhere.
• Plugin architecture: easy to add extra targets
• Easy to configure (really!)
• Multilingual: currently speaks Ganglia, but will (soon) speak Nagios, LEMON, R-GMA?, ...
• Monitors: Apache, Tomcat/JMX, MySQL, ...
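The "takes data from a site and pushes it somewhere" idea, with pluggable targets, can be illustrated with a minimal sketch. This is not MonAMI's actual code or configuration format: the names `Source`, `Target`, `StaticSource`, `MemoryTarget` and `relay` are hypothetical, and the stand-in classes only mimic real monitors and reporting systems.

```python
from typing import Dict, List, Protocol, Tuple


class Source(Protocol):
    """Anything that can report its current metrics (hypothetical interface)."""
    name: str

    def collect(self) -> Dict[str, float]: ...


class Target(Protocol):
    """Anywhere metrics can be pushed to (hypothetical interface)."""

    def publish(self, source: str, metrics: Dict[str, float]) -> None: ...


class StaticSource:
    """Stand-in for a real monitor such as Apache, Tomcat/JMX or MySQL."""

    def __init__(self, name: str, metrics: Dict[str, float]) -> None:
        self.name = name
        self._metrics = metrics

    def collect(self) -> Dict[str, float]:
        return dict(self._metrics)


class MemoryTarget:
    """Stand-in for a real reporting system such as Ganglia or Nagios."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, float]] = []

    def publish(self, source: str, metrics: Dict[str, float]) -> None:
        # Record each metric under a "<source>.<metric>" name, in sorted order.
        for key, value in sorted(metrics.items()):
            self.received.append((f"{source}.{key}", value))


def relay(sources: List[Source], targets: List[Target]) -> None:
    """Take data from every source and push it to every target."""
    for source in sources:
        metrics = source.collect()
        for target in targets:
            target.publish(source.name, metrics)
```

Under this assumed design, supporting a new reporting system means writing one more class with a `publish` method; the relay loop itself does not change.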

Page 8: Metadata (data about data) GridPP-15, Paul Millar

ATLAS Event Level Metadata

• Event tag infrastructure is being developed to:
– Allow physicists to exclude uninteresting events from data sample
– Extract samples of specific interest into a smaller fileset, for repeated running
– Provide a global view of the data, useful for data mining
• Event tag infrastructure is currently under review
• Content & usage of event tags under discussion with physics groups
• Tags produced from Rome data for 2.3 million events, stored at CERN
– https://uimon.cern.ch/twiki/bin/view/Atlas/RomeTagFAQ
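A small sketch may help show how event tags are meant to be used: filter on a few summary quantities per event, then work out which files hold the surviving events. The tag fields (`n_muons`, `missing_et`, `file_id`) and the helper names are assumptions made for illustration; the actual tag content was still under discussion at the time.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class EventTag:
    """Per-event summary record; the field names are illustrative only."""
    run: int
    event: int
    n_muons: int
    missing_et: float  # assumed to be in GeV
    file_id: str       # file that holds the full event


def select(tags: Iterable[EventTag],
           keep: Callable[[EventTag], bool]) -> List[EventTag]:
    """Exclude uninteresting events by applying a cut to the tags alone."""
    return [tag for tag in tags if keep(tag)]


def files_needed(selected: Iterable[EventTag]) -> Set[str]:
    """Return the smaller fileset that must be read for the selected events."""
    return {tag.file_id for tag in selected}
```

For example, `select(tags, lambda t: t.n_muons >= 2 and t.missing_et > 20.0)` followed by `files_needed(...)` gives the list of files worth reading, without opening any event data.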

Page 9: Metadata (data about data) GridPP-15, Paul Millar

Integration with DDM

• For event tag infrastructure to be used successfully, it must be able to work with ATLAS Distributed Data Management system (DQ2)
• DQ2 uses datasets as basic units of data manipulation
• Event tag tools currently refer only to files
• Currently working on ways to solve this problem

[Diagram: DQ2 with Dataset 1 containing File 1, File 2 and File 3; Tag browser listing File 1 and File 3.]
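One way to picture the mismatch between file-level tags and dataset-level data management is as a lookup from files back to datasets. The sketch below is only an illustration of the problem, not the solution the collaboration adopted and not the DQ2 API; `datasets_for_files` and the plain dictionary standing in for a dataset catalogue are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def datasets_for_files(dataset_contents: Dict[str, List[str]],
                       files: Iterable[str]) -> Dict[str, Set[str]]:
    """Group the requested files by the dataset(s) that contain them.

    dataset_contents maps a dataset name to the files it holds
    (a stand-in for a real dataset catalogue lookup).
    """
    wanted = set(files)
    result: Dict[str, Set[str]] = {}
    for dataset, members in dataset_contents.items():
        hits = wanted.intersection(members)
        if hits:
            result[dataset] = hits
    return result


# Values taken from the diagram: the tag browser refers to two of the
# three files that make up Dataset 1.
catalogue = {"Dataset 1": ["File 1", "File 2", "File 3"]}
needed = datasets_for_files(catalogue, ["File 1", "File 3"])
```

Here `needed` names Dataset 1 as the unit to request, even though only two of its three files are of interest; that gap is the open problem the slide refers to.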

Page 10: Metadata (data about data) GridPP-15, Paul Millar

Cataloguing Event Tags

• Event tags can now be built and written to a master database.
– This may be replicated to T1 and T2s
– Different physics groups may build their own tags
• Geographically diverse locations for data.
• How should these be catalogued?
– Many open questions (e.g., should AOD and Tag be distributed together, and how?)
– Aim to build on prototype CollectionCatalog, to design and implement Event Tag Catalogue

Page 11: Metadata (data about data) GridPP-15, Paul Millar

AMI VOMS

• Completed requirements analysis of the impact of VOMS on AMI.
• AMI will use ATLAS-wide agreed VOMS groups and roles.
• Mapping of VOMS groups to AMI decided.
• Problem of getting the proxy certificate into AMI ... using your web browser...
• Solutions? mod_gridsite or MyProxy.
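To make "mapping of VOMS groups to AMI" concrete, here is a hedged sketch of how a VOMS attribute string (an FQAN such as `/atlas/Role=production/Capability=NULL`) could be split into group and role and looked up in a table. The table contents and the AMI-side names (`reader`, `writer`) are invented for illustration; the mapping actually agreed for AMI is not given in the slides.

```python
from typing import Dict, Optional, Tuple


def parse_fqan(fqan: str) -> Tuple[str, Optional[str]]:
    """Split a VOMS FQAN into (group path, role).

    A role of "NULL" is treated as no role; the Capability field is ignored.
    """
    group_parts = []
    role: Optional[str] = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue
        else:
            group_parts.append(part)
    return "/" + "/".join(group_parts), role


# Hypothetical mapping table; the real AMI mapping is not stated in the talk.
AMI_ROLES: Dict[Tuple[str, Optional[str]], str] = {
    ("/atlas", None): "reader",
    ("/atlas", "production"): "writer",
}


def ami_role(fqan: str,
             mapping: Dict[Tuple[str, Optional[str]], str] = AMI_ROLES,
             default: str = "none") -> str:
    """Look up the (assumed) AMI role for a VOMS FQAN."""
    return mapping.get(parse_fqan(fqan), default)
```

The sketch deliberately leaves out the harder part raised on the slide: obtaining the proxy certificate, and hence the FQANs, from a user's web browser in the first place.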

Page 12: Metadata (data about data) GridPP-15, Paul Millar

AMI and SQLite

Page 13: Metadata (data about data) GridPP-15, Paul Millar

Cataloguing metadata

Looking at ATLAS, there's metadata in:
• DDM (several databases)
• Production system (ProdDB)
• Event-level (tag) database(s)
• COOL (used for correlation of DAQ and run numbers)
• AMI: catalogues datasets by physics metadata
• Detector Description (file catalogues, but should be able to ignore these)

Risk of information being duplicated, unknown or impossible to get at.

Implement easy navigation between different metadata

Page 14: Metadata (data about data) GridPP-15, Paul Millar

Summary

• The metadata collaboration's work is ongoing across various aspects of metadata.
• Monitoring requirements document released.
• Opportunity for working together on developing additional monitoring tools.
• AMI will soon support off-line analysis
• How to implement event-level metadata is going through the discussion stage and into prototyping.

Page 15: Metadata (data about data) GridPP-15, Paul Millar

Questions

Comments

Thoughts