The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA)

Kilian Schwarz, GSI; Christopher Jung, KIT


2 05.10.2012 Christopher Jung SCC, KIT

Overview

• Motivation
• Data Life Cycle
• LSDMA’s dual approach
• Facts and Numbers
• Initial Communities
• LSDMA, FAIR and ALICE


Why is Scientific Big Data important?

Honestly, I do not need to explain this to you.


Examples of Scientific Big Data in non-HEP

Examples of sciences with Big Data:
• Systems Biology: ~10 TB per day in high-throughput microscopy (zebrafish embryos)
• Climate simulation: 10-100 PB per year
• Brain research: 1 PB per year for brain mapping
• Photon Science: XFEL 10 PB/year
• and many other sciences which do not know their needs yet
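The figures above mix per-day and per-year rates, so a quick conversion helps when comparing them. The following is a minimal sketch, assuming decimal units (1 PB = 1000 TB) and continuous data taking on all 365 days of the year; neither assumption is stated on the slide, and the function and dictionary names are purely illustrative.

```python
# Rough comparison of the quoted data rates, normalized to PB per year.
# Assumptions (not from the slides): 1 PB = 1000 TB, 365 days of acquisition.

TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def tb_per_day_to_pb_per_year(tb_per_day: float) -> float:
    """Convert a daily rate in TB into a yearly volume in PB."""
    return tb_per_day * DAYS_PER_YEAR / TB_PER_PB


# Yearly volumes in PB as (low, high); single values repeat the same number.
microscopy = tb_per_day_to_pb_per_year(10)
yearly_pb = {
    "Systems Biology (microscopy)": (microscopy, microscopy),
    "Climate simulation": (10, 100),
    "Brain research": (1, 1),
    "Photon Science (XFEL)": (10, 10),
}

# Print the fields ordered by their lower bound.
for field, (low, high) in sorted(yearly_pb.items(), key=lambda kv: kv[1][0]):
    if low == high:
        print(f"{field}: ~{low:g} PB/year")
    else:
        print(f"{field}: {low:g}-{high:g} PB/year")
```

Under these assumptions, high-throughput microscopy alone lands at a few PB per year, the same order of magnitude as the other examples.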


Challenges of Big Data

• Non-reproducibility of scientific data (or reproducible only at high cost)
• Current analysis methods scale poorly
• Existing big data knowledge in the respective fields
• Each discipline has its specific needs
• Multidisciplinary research
• Metadata
• Authentication and authorization (single sign-on)
• Data privacy (incl. removal of private data)
• “Good scientific practice”
• Cost estimation for long-term archival (at different service levels)
• Data preservation
• Open Access
• …


Data Life Cycle

Inspiration for LSDMA: support the whole data life cycle!


Dual approach: community-specific and generic

Data Life Cycle Labs (DLCLs)
• Joint R&D with the scientific user communities
  – Optimization of the data life cycle
  – Community-specific data analysis tools and services

Data Services Integration Team (DSIT)
• Generic R&D
  – Interface between federated data infrastructures and DLCLs/communities
  – Integration of data services into the scientific working process


Facts and numbers

• Initial project period: 1.1.2012-31.12.2016
• Funded by Helmholtz Association (13 MEUR for 5 years)
• To become a part of the sustainable program-oriented funding of Helmholtz Association in 2015
• Partners: 4 Helmholtz research centers, 6 universities and the German climate research center
• Leading project partner: KIT


Initial communities

• Energy
  – Smart grids, battery research, fusion research
• Earth and Environment
  – Climate model, environmental satellite data
• Health
  – Virtual human brain map
• Key Technologies
  – Synchrotron radiation, nanoscopy, systems biology, electron-microscopical imaging techniques
• Structure of Matter
  – Photon Science: Petra 3, XFEL
  – FAIR@GSI (14 experiments with big and small communities)


LHC Computing – Prototype for FAIR

• FAIR profits from computing experience within an already running experiment

• ALICE can test new developments in FAIR

• New FAIR developments are on the way, and to some extent they already feed back into ALICE

• FAIR will play an increasing role (funding, network architecture, software development and more ...)


Goals for GSI/FAIR in LSDMA

• parallel and distributed computing
  – triggerless “online” system
    • porting of needed algorithms to GPU
  – Grid/Cloud infrastructure
    • enable the possibility to submit compute jobs to Clouds
  – create interfaces to existing environments (AliEn, ...)
• data archives
  – long term data archives
    • including concepts for xrootd and gStore
  – metadata catalog and data analysis

To be developed within LSDMA (DLCL: structure of matter) in collaboration with LSDMA-DSIT, the FAIR community, and ALICE (wherever synergy can be found)

• Metropolitan Area Systems
  – include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure
  – Federated Identity Management
• Global Federations
  – Global File System
  – Optimization of Data Storage
    • hot versus cold data
    • corrupt and incomplete data sets
    • parallel storage
    • 3rd party copy

Additional synergies via DSIT


Next Steps at GSI

• Advertise LSDMA positions (2 for FAIR DLCL)
  – do you know candidates?
  – GSI DSIT already started to hire people
• Discussion with FAIR experiments and ALICE
• Set-up of e-science infrastructures, first for PANDA and CBM, based on the experiences with ALICE (AliEn/xrootd/...)
• Include smaller FAIR experiments
• Continue to develop existing e-science infrastructure, also in close collaboration with DSIT and ALICE


Summary and Outlook

• There are many challenges in Scientific Big Data
• LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach
• FAIR is an important initial community in the research field ‘structure of matter’; several developments planned -> synergies with ALICE
• GSI has two open job positions for LSDMA