Future developments: storage
Wahid Bhimji
Xrootd testing
• Xrootd is a file access protocol used in HEP that offers both good file-access performance and failover / redirection. DPM support for it has recently improved.
• ECDF (and Glasgow) now use xrootd copying instead of DPM’s legacy rfio protocol (see the access sketch after this list).
• We are involved in testing the redirection aspects for ATLAS (“FAX”) too.
• HTTP / WebDAV offers a more widely used alternative.
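As a concrete illustration, here is a minimal PyROOT sketch of the two access modes; the hostnames and file path are placeholders, not real ECDF or Glasgow endpoints, and the rfio plugin is assumed to be available.

# Minimal sketch (hypothetical host and path) of opening the same DPM-hosted
# file via the legacy rfio protocol and directly over xrootd.
import ROOT

# Legacy DPM access via rfio (being phased out at ECDF and Glasgow)
f_rfio = ROOT.TFile.Open("rfio:///dpm/example.ac.uk/home/atlas/user/example.root")

# Direct xrootd access through the site's xrootd door; a federation
# redirector (e.g. FAX) can also hand back an alternative replica
f_xrd = ROOT.TFile.Open("root://dpm.example.ac.uk//dpm/example.ac.uk/home/atlas/user/example.root")

if f_xrd and not f_xrd.IsZombie():
    f_xrd.ls()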
Regional redirectors
• Federation traffic: modest levels now; will grow when in production
• In fact, including local traffic, UK sites dominate
• Oxford and ECDF switched to xrootd for local traffic
Systematic FDR load tests in progress
EU cloud results
Events/s (rows: source, columns: destination)

source      BNL-ATLAS  CERN-PROD    ECDF    ROMA1     QMUL
BNL-ATLAS      126.76      29.4     25.1     26.05    57.26
CERN-PROD       82.68     232.52   108.46   123.52   145.96
ECDF            80.68      56.06   252.39    62.83   145.18
ROMA1           32         73.66    23.95   197.01    49.72
QMUL            41.34      24.14    52.2     99.43   105.46

MB/s (rows: source, columns: destination)

source      BNL-ATLAS  CERN-PROD    ECDF    ROMA1     QMUL
BNL-ATLAS       13.07       3.03     2.61     2.65     5.84
CERN-PROD        8.36      23.26    11.02    12.71    14.68
ECDF             8.23       5.64    25.14     6.52    14.42
ROMA1            3.15       7.49     2.47    20.77     4.79
QMUL             4.26       2.6      5.33     9.65    10.38
[Bar chart: “Read 10% events, 30 MB TTC” – MB/s by source site (BNL-ATLAS, CERN-PROD, ECDF, ROMA1, QMUL); y-axis 0–30 MB/s]
Slide stolen from I. Vukotic
Absolute values are not important (affected by CPU / HT etc. and setup); the point is that remote read can be good, but it varies.
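For illustration, a rough PyROOT sketch of how a remote-read throughput figure like these could be measured is below; the URL, tree name and 10% sampling are assumptions, not the actual FDR load-test machinery.

# Rough sketch: read ~10% of the events in a remote tree over xrootd and
# report events/s and MB/s.  URL and tree name are placeholders.
import time
import ROOT

url = "root://redirector.example.org//atlas/data/sample.root"  # hypothetical
f = ROOT.TFile.Open(url)
tree = f.Get("physics")  # assumed tree name

start = time.time()
n_read = 0
for i in range(0, int(tree.GetEntries()), 10):  # every 10th event ~= 10%
    tree.GetEntry(i)
    n_read += 1
elapsed = time.time() - start

mb_read = f.GetBytesRead() / 1e6
print("events/s: %.1f   MB/s: %.2f" % (n_read / elapsed, mb_read / elapsed))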
Other stuff
• Puppet: testing DPM modules for ECDF storage
– But: we don’t use Puppet for WNs or anything else …
• S3 (with the Imperial Swift instance, not ECDF)
– DPM integration: some problems with accessing Swift storage; new development version to test…
– Access of files from the cluster via ROOT: not done yet (see the S3 sketch after this list)
• SRM: making ECDF a non-SRM site for ATLAS
– As part of the WLCG “Storage Interfaces” Working Group
– Stage-out, FTS3 copies, space reporting: all in progress
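As a rough illustration of what direct S3-style access to a Swift instance can look like from Python, here is a hedged boto3 sketch; the endpoint, bucket, object names and credentials are all placeholders, and this is not the DPM integration itself.

# Sketch of S3-style access to an OpenStack Swift instance through an S3
# gateway, using boto3 with a custom endpoint.  All names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://swift.example.ac.uk",   # hypothetical Swift S3 gateway
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# List the objects in a (hypothetical) bucket, then pull one file down locally
for obj in s3.list_objects_v2(Bucket="atlas-user").get("Contents", []):
    print(obj["Key"], obj["Size"])

s3.download_file("atlas-user", "ntuples/example.root", "/tmp/example.root")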
“HEPDOOP” – a proposal
• “Big data” – not a buzzword: plenty of industry activity
• HEP uses little of the same tools
• HEPDOOP bridges the divide
1st phase (1 year): technical review via demonstrators
• Workshops with interspersed development activities
• Use-case focused: deliver an ATLAS Higgs analysis with non-HEP tools
• Milestones:
– BigData Workshop, Imperial, 28th June
– CHEP 2013 (poster + possible birds-of-a-feather session)
2nd phase: possible ongoing activity providing a technical-level bridge between GridPP and the wider Big Data communities:
• Continuing interoperability in the case of common aims
• Delivering advanced data processing and management tools for HEP, wider academia, and industry.
Initial development areas
Processing step        HEP                      Possible technologies
Serialization          Ntuple / TTree making    Google protobuf, dremel (a4, drillbit)
Map / Reduce           Skimming / slimming      Hadoop
Data Mining            “Cut-and-count”, MVA     Apache Mahout; Python scikit-learn
Statistical Analysis   RooStats                 ?
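To give a feel for the “Data Mining” column, below is a small, self-contained scikit-learn sketch of an MVA-style classifier trained on synthetic signal and background arrays; the features, sample sizes and classifier choice are purely illustrative and not the actual H->bb analysis code.

# Illustrative scikit-learn MVA: train a boosted-decision-tree-like classifier
# to separate (synthetic) signal from background events.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
signal = rng.normal(loc=1.0, scale=1.0, size=(1000, 4))      # fake signal features
background = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))  # fake background features

X = np.vstack([signal, background])
y = np.concatenate([np.ones(1000), np.zeros(1000)])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))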
Typical HEP analysis flow: Ntuple making → Data filtering (skimming / slimming) → Data mining (cuts, multivariate analyses) → Statistical analysis → Visualisation
Starting with skimming and mining
Python (scikit) version of H->bb analysis implemented
Next step: map / reduce skimming code on a local Hadoop cluster (or cloud resources) – see the sketch below
Principle: focus on ease of use and access to a wide community, not (just) performance
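As a sketch of what such map / reduce skimming could look like, here is a minimal Hadoop Streaming mapper in Python; the CSV event format, variable names and cut values are assumptions standing in for a real HEP serialization such as the protobuf-based formats above.

#!/usr/bin/env python
# Minimal Hadoop Streaming mapper for event skimming: keep only events that
# pass a simple cut.  Events are assumed to arrive on stdin as CSV lines of
# the form "event_id,met,njets,..." -- an illustrative stand-in only.
import sys

MET_CUT = 50.0   # hypothetical missing-ET threshold in GeV

for line in sys.stdin:
    fields = line.strip().split(",")
    if len(fields) < 3:
        continue
    event_id, met, njets = fields[0], float(fields[1]), int(fields[2])
    if met > MET_CUT and njets >= 2:
        # emit the surviving event unchanged; an identity reducer (e.g. cat)
        # simply concatenates the skimmed output
        sys.stdout.write(line)

It could be run with something like: hadoop jar hadoop-streaming.jar -files skim_mapper.py -mapper skim_mapper.py -reducer cat -input events -output skim (all names illustrative).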