
Page 1: Future developments: storage

Future developments: storage

Wahid Bhimji

Page 2: Future developments: storage

Xrootd testing

• Xrootd is a file access protocol used in HEP that offers both high-performance file access and failover / redirection. DPM support for it has recently improved. (A minimal access sketch follows this list.)

• ECDF (and Glasgow) now use xrootd for copying instead of DPM’s legacy rfio protocol.

• We are involved in testing the redirection aspects for ATLAS (“FAX”) too.

• HTTP / WebDAV offers a more widely used alternative.
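As an illustration of xrootd-style file access, here is a minimal sketch using PyROOT. The redirector hostname and file path are hypothetical placeholders, not endpoints from these slides:

    # Minimal sketch: opening a file over xrootd with PyROOT.
    # "xrootd-redirector.example.org" and the file path are hypothetical.
    import ROOT

    # TFile::Open understands root:// URLs; a federation redirector (as in
    # FAX) can transparently point the client at another site's replica.
    f = ROOT.TFile.Open("root://xrootd-redirector.example.org//atlas/user/sample.root")
    if f and not f.IsZombie():
        f.ls()  # list the file's contents
        f.Close()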

Page 3: Future developments: storage

Regional redirectors

Page 4: Future developments: storage

Federation traffic

• Modest levels now; will grow when in production.
• In fact, including local traffic, UK sites dominate.
• Oxford and ECDF have switched to xrootd for local traffic.

Page 5: Future developments: storage

Systematic FDR load tests in progress

EU cloud results

Events/s (rows: source; columns: destination):

    source      BNL-ATLAS  CERN-PROD  ECDF    ROMA1   QMUL
    BNL-ATLAS   126.76     29.4       25.1    26.05   57.26
    CERN-PROD   82.68      232.52     108.46  123.52  145.96
    ECDF        80.68      56.06      252.39  62.83   145.18
    ROMA1       32         73.66      23.95   197.01  49.72
    QMUL        41.34      24.14      52.2    99.43   105.46

MB/s (rows: source; columns: destination):

    source      BNL-ATLAS  CERN-PROD  ECDF   ROMA1  QMUL
    BNL-ATLAS   13.07      3.03       2.61   2.65   5.84
    CERN-PROD   8.36       23.26      11.02  12.71  14.68
    ECDF        8.23       5.64       25.14  6.52   14.42
    ROMA1       3.15       7.49       2.47   20.77  4.79
    QMUL        4.26       2.6        5.33   9.65   10.38

[Bar chart: “Read 10% events, 30 MB TTC”; y-axis MB/s (0–30), grouped by destination (BNL-ATLAS, CERN-PROD, ECDF, ROMA1, QMUL), one bar per source site]

Slide stolen from I. Vukotic.

Absolute values are not important (they are affected by CPU, hyper-threading, setup, etc.).

The point is that remote read performance can be good, but it varies.

Page 6: Future developments: storage

Other stuff

• Puppet: testing DPM modules for ECDF storage
  – But we don’t use Puppet for WNs or anything else …

• S3 (with the Imperial Swift instance, not ECDF):
  – DPM integration: some problems with accessing Swift storage; a new development version to test …
  – Access of files from the cluster via ROOT: not done yet (a sketch of what this could look like follows this list)

• SRM: making ECDF a non-SRM site for ATLAS
  – As part of the WLCG “Storage Interfaces” Working Group
  – Stage-out, FTS3 copies, space reporting: all in progress
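For the ROOT-over-S3 point above, a minimal sketch of what such access could look like; the endpoint, bucket, and object names are hypothetical, and credential handling follows ROOT’s documented S3 environment variables:

    # Minimal sketch: reading a file from an S3-compatible (e.g. Swift)
    # endpoint with PyROOT. Endpoint, bucket, and object are hypothetical.
    import os
    import ROOT

    # ROOT's TS3WebFile reads credentials from these environment variables.
    os.environ["S3_ACCESS_KEY"] = "<access-key>"
    os.environ["S3_SECRET_KEY"] = "<secret-key>"

    f = ROOT.TFile.Open("s3://swift.example.ac.uk/mybucket/sample.root")
    if f and not f.IsZombie():
        f.ls()
        f.Close()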

Page 7: Future developments: storage

“HEPDOOP” – a proposal

• “Big data” is not just a buzzword: there is plenty of industry activity.
• HEP uses little of the same tooling.
• HEPDOOP bridges the divide.

1st phase (1 year): technical review via demonstrators
• Workshops with interspersed development activities
• Use-case focused: deliver an ATLAS Higgs analysis with non-HEP tools
• Milestones:
  – Big Data Workshop, Imperial, 28th June
  – CHEP2013 (poster + possible birds-of-a-feather session)

2nd phase: possible ongoing activity providing a technical-level bridge between GridPP and the wider Big Data communities:
• Continuing interoperability where there are common aims
• Delivering advanced data processing and management tools for HEP, wider academia, and industry

Page 8: Future developments: storage

Initial development areas

    Processing step        HEP                      Possible technologies
    Serialization          Ntuple / TTree making    Google protobuf, Dremel (a4, drillbit)
    Map / Reduce           Skimming / slimming      Hadoop
    Data mining            “Cut-and-count”, MVA     Apache Mahout
    Statistical analysis   RooStats                 Python scikit-learn

Typical HEP analysis flow: Ntuple making → Data filtering (skimming / slimming) → Data mining (cuts, multivariate analyses) → Statistical analysis → Visualisation

Starting with skimming and mining

A Python (scikit-learn) version of the H->bb analysis has been implemented (a minimal sketch follows).
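As an illustration only: a minimal sketch of the kind of scikit-learn MVA step such an analysis would use, with synthetic stand-ins for signal and background ntuple variables (the features and classifier choice here are assumptions, not the actual analysis code):

    # Minimal sketch of an MVA classification step with scikit-learn.
    # The Gaussian "kinematic" features are synthetic placeholders for
    # real H->bb signal and background ntuple variables.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    signal = rng.normal(loc=1.0, size=(1000, 4))      # toy signal events
    background = rng.normal(loc=0.0, size=(1000, 4))  # toy background events
    X = np.vstack([signal, background])
    y = np.concatenate([np.ones(1000), np.zeros(1000)])

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = GradientBoostingClassifier().fit(X_train, y_train)
    print("test accuracy: %.2f" % clf.score(X_test, y_test))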

Next step: map / reduce skimming code on a local Hadoop cluster (or cloud resources); a sketch of a streaming-style skim mapper follows.
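To make the map / reduce skimming idea concrete, a minimal sketch of a Hadoop-streaming-style mapper in Python; the CSV event layout and the pt threshold are illustrative assumptions, not the actual skimming code:

    #!/usr/bin/env python
    # Minimal sketch of a Hadoop streaming mapper that skims events.
    # Assumes one event per input line as comma-separated floats with the
    # leading jet pt in the first column; both the layout and the 25 GeV
    # threshold are hypothetical.
    import sys

    PT_CUT = 25.0  # hypothetical threshold in GeV

    for line in sys.stdin:
        fields = line.strip().split(",")
        try:
            pt = float(fields[0])
        except (ValueError, IndexError):
            continue  # drop malformed records
        if pt > PT_CUT:
            sys.stdout.write(line)  # event passes the skim

Such a script would be launched via Hadoop’s streaming jar (hadoop jar hadoop-streaming.jar -input … -output … -mapper skim_mapper.py), so the same cut runs in parallel over all input splits.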

Principle: focus on ease of use and access to a wide community, not (just) performance.
