The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA)
Kilian Schwarz, GSI; Christopher Jung, KIT
2 05.10.2012 Christopher Jung SCC, KIT
Overview
• Motivation
• Data Life Cycle
• LSDMA’s dual approach
• Facts and Numbers
• Initial Communities
• LSDMA, FAIR and ALICE
Why is Scientific Big Data important?
Honestly, I do not need to explain this to you.
Examples of Scientific Big Data in non-HEP
Examples for sciences with Big Data:
• Systems Biology: ~10 TB per day in high-throughput microscopy (zebra fish embryos)
• Climate simulation: 10-100 PB per year
• Brain research: 1 PB per year for brain mapping
• Photon Science: XFEL 10 PB/year
• and many other sciences which do not yet know their needs
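The volumes above are quoted in different units (per day versus per year). A minimal sketch for putting them on a common scale, assuming decimal units (1 PB = 1000 TB) and continuous data taking over a 365-day year; the function name is illustrative and does not come from the slides:

```python
def tb_per_day_to_pb_per_year(tb_per_day: float, days_per_year: int = 365) -> float:
    """Convert a rate in TB/day to PB/year, assuming 1 PB = 1000 TB."""
    return tb_per_day * days_per_year / 1000.0


# High-throughput microscopy figure from the slide: ~10 TB per day.
microscopy_pb_per_year = tb_per_day_to_pb_per_year(10)
print(f"{microscopy_pb_per_year:.2f} PB/year")
```

Under these assumptions, ~10 TB per day corresponds to a few PB per year, the same order of magnitude as the brain mapping and XFEL figures. Real instruments do not run every day, so this is an upper estimate.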
Challenges of Big Data
• Scientific data often cannot be reproduced (or only at high cost)
• Current analysis methods scale poorly
• Existing big data knowledge in the respective fields
• Each discipline has its specific needs
• Multidisciplinary research
• Metadata
• Authentication and authorization (single sign-on)
• Data privacy (incl. removal of private data)
• “Good scientific practice”
• Cost estimation for long-term archival (at different service levels)
• Data preservation
• Open Access
• …
Data Life Cycle
Inspiration for LSDMA: support the whole data life cycle!
Dual approach: community-specific and generic
Data Life Cycle Labs (DLCLs)
• Joint R&D with the scientific user communities
  – Optimization of the data life cycle
  – Community-specific data analysis tools and services
Data Services Integration Team (DSIT)
• Generic R&D
  – Interface between federated data infrastructures and DLCLs/communities
  – Integration of data services into the scientific working process
Facts and numbers
• Initial project period: 1.1.2012-31.12.2016
• Funded by Helmholtz Association (13 MEUR for 5 years)
• To become a part of the sustainable program-oriented funding of Helmholtz Association in 2015
• Partners: 4 Helmholtz research centers, 6 universities and the German climate research center
• Leading project partner: KIT
Initial communities
• Energy
  – Smart grids, battery research, fusion research
• Earth and Environment
  – Climate model, environmental satellite data
• Health
  – Virtual human brain map
• Key Technologies
  – Synchrotron radiation, nanoscopy, systems biology, electron-microscopical imaging techniques
• Structure of Matter
  – Photon Science: Petra 3, XFEL
  – FAIR@GSI (14 experiments with big and small communities)
LHC Computing – Prototype for FAIR
• FAIR profits from computing experience within an already running experiment
• ALICE can test new developments in FAIR
• New FAIR developments are on the way, and to some extent they already flow back to ALICE
• FAIR will play an increasing role (funding, network architecture, software development and more ...)
Goals for GSI/FAIR in LSDMA
• Parallel and distributed computing
  – Triggerless “online” system
    • porting of needed algorithms to GPU
  – Grid/Cloud infrastructure
    • enable the possibility to submit compute jobs to Clouds
  – Create interfaces to existing environments (AliEn, ...)
• Data archives
  – Long-term data archives
    • including concepts for xrootd and gStore
  – Metadata catalog and data analysis
To be developed within LSDMA (DLCL: structure of matter) in collaboration with the LSDMA DSIT, the FAIR community, and ALICE (wherever synergy can be found)
• Metropolitan Area Systems
  – Include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure
  – Federated Identity Management
• Global Federations
  – Global File System
  – Optimization of Data Storage
    • hot versus cold data
    • corrupt and incomplete data sets
    • parallel storage
    • 3rd party copy
Additional synergies via DSIT
Next Steps at GSI
• Advertise LSDMA positions (2 for FAIR DLCL)
  – Do you know any candidates?
  – GSI DSIT already started to hire people
• Discussion with FAIR experiments and ALICE
• Set-up of e-science infrastructures, first for PANDA and CBM, based on the experiences with ALICE (AliEn/xrootd/...)
• Include smaller FAIR experiments
• Continue to develop existing e-science infrastructure, also in close collaboration with DSIT and ALICE
Summary and Outlook
• There are many challenges in Scientific Big Data
• LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach
• FAIR is an important initial community in the research field ‘structure of matter’; several developments are planned, with synergies with ALICE
• GSI has two open job positions for LSDMA