ophidia ictp trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12....

25
The Ophidia framework: toward big data analy7cs for climate change Dr. Sandro Fiore Leader, Scientific data management research group Scientific Computing Division @ CMCC Prof. Giovanni Aloisio Director, Scientific Computing Division @ CMCC

Upload: others

Post on 20-Jan-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

The  Ophidia  framework:  toward  big  data  analy7cs  for  climate  change  

Dr. Sandro Fiore Leader, Scientific data management research group Scientific Computing Division @ CMCC Prof. Giovanni Aloisio Director, Scientific Computing Division @ CMCC

Page 2: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

A set of requirements have been discussed with our scientists to identify the key data analytics needs.    

Preliminary  requirements  and  needs  focus  on:    Time  series  analysis    Data  reduc6on  (e.g.  by  aggrega6on)    Data  subse<ng   Model  intercomparison   Mul6model  means    Data  transforma6on  (through  array-­‐based  primi6ves)    Param.  Sweep  experiments  (same  task  applied  on  a  set  of  data)    Comparison  between  historical  data  and  future  scenarios   Maps  genera6on    Ensemble  analysis    Data  analy6cs  worflow  support  But  also…    Performance,  re-­‐usability,  extensibility  

Data analytics requirements!

Page 3: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

The Ophidia Project  

S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams, G. Aloisio, “Ophidia: toward bigdata analytics for eScience”, ICCS2013 Conference, Procedia Elsevier, Barcelona, June 5-7, 2013

Ophidia is a research effort carried out at the Euro Mediterranean Centre on Climate Change (CMCC) to address “big data” challenges, issues and requirements for climate change data analytics

Page 4: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Ophidia Architecture 1.0  

Front end  

Compute layer  

I/O layer  

I/O server instance  

Storage layer  

System catalog  

Array-based primitives  

Analytics Framework

Standard interfaces

Partitioning/hierarchical data mng

Declarative language  

New storage model  

Page 5: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Storage model (dimension-independent) & implementation !Array-based support and hierarchical storage !

Parallel I/O  

Page 6: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

OPERATOR NAME OPERATOR DESCRIPTION Operators “Data processing” – Domain-agnostic

OPH_APPLY(datacube_in, datacube_out,

array_based_primitive)

Creates the datacube_out by applying the array-based primitive to the

datacube_in OPH_DUPLICATE(datacube_

in, datacube_out) Creates a copy of the datacube_in in

the datacube_out OPH_SUBSET(datacube_in, subset_string, datacube_out)

Creates the datacube_out by doing a sub-setting of the datacube_in by

applying the subset_string OPH_MERGE(datacube_in, merge_param, datacube_out)

Creates the datacube_out by merging groups of merge_param fragments

from datacube_in OPH_SPLIT(datacube_in, split_param, datacube_out)

Creates the datacube_out by splitting into groups of split_param fragments

each fragment of the datacube_in OPH_INTERCOMPARISON (datacube_in1, datacube_in2,

datacube_out)

Creates the datacube_out which is the element-wise difference between datacube_in1 and datacube_in2

OPH_DELETE(datacube_in) Removes the datacube_in

OPERATOR NAME OPERATOR DESCRIPTION Operators “Data processing” – Domain-oriented

OPH_EXPORT_NC (datacube_in, file_out)

Exports the datacube_in data into the file_out NetCDF file.

OPH_IMPORT_NC (file_in, datacube_out)

Imports the data stored into the file_in NetCDF file into the new datacube_in

datacube Operators “Data access”

OPH_INSPECT_FRAG (datacube_in, fragment_in)

Inspects the data stored in the fragment_in from the datacube_in

OPH_PUBLISH(datacube_in) Publishes the datacube_in fragments into HTML pages

Operators “Metadata” OPH_CUBE_ELEMENTS

(datacube_in) Provides the total number of the

elements in the datacube_in OPH_CUBE_SIZE

(datacube_in) Provides the disk space occupied by the

datacube_in OPH_LIST(void) Provides the list of available datacubes.

OPH_CUBEIO(datacube_in) Provides the provenance information related to the datacube_in

OPH_FIND(search_param) Provides the list of datacubes matching the search_param criteria

Metadata management (sequential and parallel operators)

Data processing (parallel operators, MPI &

OpenMP based)  

Data Access (sequential and parallel operators)  

The analytics framework: datacube operators (about 50)!

Import/Export (parallel operators)

Page 7: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

The analytics framework: datacube operators!

Page 8: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

The analytics framework: datacube operators!

Page 9: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

The analytics framework: datacube operators!

operator=oph_apply;datacube_input=cube_in;datacube_output=cube_out;query=array_based_primi/ve  

Page 10: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

•  The array data type support is not enough to provide scientific data

management capabilities… primitives are needed as well.

•  A set of array-based primitives have been implemented

•  A primitive is applied to a single array

•  They come in the form of plugins (I/O server extensions)

•  So far, Ophidia provides a wide set of plugins (about 100) to perform

data reduction (by aggregation), sub-setting, predicates evaluation,

statistical analysis, compression, and so forth. •  Support is provided both for byte-oriented and bit-oriented arrays

•  Plugins can be nested to get more complex functionalities

•  Compression is provided as a primitive too

Array based primitives!

Page 11: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Array based primitives: OPH_MATH (“SIGN”)  oph_math(measure, “OPH_SIGN”, "OPH_DOUBLE”)

Single chunk or fragment (input)   Single chunk or fragment (output)  

Page 12: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Array based primitives: OPH_BOXPLOT  oph_boxplot(measure, "OPH_DOUBLE”)

Single chunk or fragment (input)   Single chunk or fragment (output)  

Page 13: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

oph_boxplot(oph_subarray(oph_uncompress(measure), 1,18), "OPH_DOUBLE”)

subarray(measure, 1,18)  

Array based primitives: nesting feature  

Single chunk or fragment (input)   Single chunk or fragment (output)  

Page 14: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Array based primitives: oph_aggregate  oph_aggregate(measure,"oph_avg”)

Ver6cal  aggrega6on  

Single chunk or fragment (input)  

Single chunk or fragment (output)  

Page 15: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

•  Mathematical primitives:

oph_math, oph_mul_scalar, oph_sum_array, oph_sum_scalar, oph_sum_array_r, oph_gsl_sd, oph_gsl_complex_get_abs,

oph_gsl_complex_get_arg, oph_gsl_complex_get_imag, oph_gsl_complex_get_real, oph_gsl_complex_to_polar,

oph_gsl_complex_to_rect, oph_gsl_dwt, oph_gsl_fft, oph_gsl_idwt, oph_gsl_ifft, oph_gsl_quantile, oph_gsl_sort,

oph_gsl_stats, oph_gsl_histogram, oph_gsl_boxplot, oph_petsc_vec_norm, oph_petsc_vec, oph_petsc_vec_r,

oph_sum_scalar2, oph_mul_scalar2, oph_compare,…

•  Data transformations:

oph_to_bin, oph_to_bit, oph_permute, oph_reverse, oph_dump, oph_convert_d, oph_shift, oph_rotate, oph_concat,

oph_predicate, oph_roll_up, oph_get_subarray, oph_get_subarray2, oph_get_subarray3, oph_mask, oph_cast,

oph_drill_down (stored procedure), oph_reduce, oph_reduce2, oph_reduce3, oph_aggregate_operator, oph_operator,

oph_aggregate_stats, oph_extract,…

•  Compression:

oph_compress, oph_uncompress, oph_compress2, oph_uncompress2, oph_id_to_index, oph_size_array, oph_count_array,

oph_find, oph_find2, …

•  Bit measures processing:

oph_bit_aggregate, oph_bit_import, oph_bit_export, oph_bit_size, oph_bit_count, oph_bit_dump, oph_bit_operator,

oph_bit_not, oph_bit_shift, oph_bit_rotate, oph_bit_reverse, oph_bit_subarray, oph_bit_subarray2, oph_bit_concat,

oph_bit_find, oph_bit_reduce, …

•  Integration of numerical libraries: math, GNU GSL, PetsC.

Ophidia primitives  

Integration of some CDO operators

Page 16: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

•  The  Ophidia  terminal  provides  an  effec6ve  and  lightweight  way  to  interact  with  the  Ophidia  server  •  Bash-­‐like  environment  (commands  interpreter)  •  Terminal  with  history  management,  auto-­‐comple6on,  specific  environment  variables  and  commands  

with  integrated  help…  •  Easy  installa6on  as  an  only  one  executable  using  a  small  number  of  well-­‐known  and  open-­‐source  

libraries  •  More  than  15  KLOC  •  Simple  enough  for  a  novice  and  at  the  same  6me  powerful  enough  for  an  expert  

The Ophidia Terminal  

Page 17: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

PRE-DEFINED VARIABLES (user-defined variables possible) OPH_TERM_PS1  =  "color  of  the  prompt”  OPH_USER  =  "username”  OPH_PASSWD  =  "password”  OPH_SERVER_HOST  =  "server  address”  OPH_SERVER_PORT  =  "server  port”  OPH_SESSION_ID  =  "current  session”  OPH_EXEC_MODE  =  "execu6on  mode”  OPH_NCORES  =  "number  of  cores”  OPH_CWD  =  "current  working  directory”  OPH_DATACUBE  =  "current  datacube”  OPH_TERM_VIEWER  =  "output  renderer”  OPH_TERM_IMGS  =  "save  and/or  auto-­‐open  images"  

PRE-DEFINED COMMANDS (user-defined aliases possible) help  =  "get  the  descrip6on  of  a  command  or  variable”  history  =  "manage  the  history  of  commands”  env  =  "list  environment  variables”  setenv  =  "set  or  change  the  value  of  a  variable”  unsetenv  =  "unset  an  environment  variable”  getenv  =  "print  the  value  of  a  variable”  quit  =  "quit  Oph_Term”  exit  =  "quit  Oph_Term”  clear  =  "clear  screen”  update  =  "update  local  XML  files”  resume  =  "resume  session”  view  =  "view  jobs  output”  alias  =  "list  aliases”  setalias  =  "set  or  change  the  value  of  an  alias”  unsetalias  =  "unset  an  alias”  getalias  =  "print  the  value  of  an  alias"  

Variables and commands  

Page 18: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Terminal: (system) metadata information  

Page 19: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Terminal: looking into the storage back-end  

Page 20: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Terminal: provenance (II)  

Page 21: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

“n”D  base  cuboid  

“0”D  apex  cuboid  

Ideal  path  

FRAGMENT1  –    1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1   24,35  x1  

FRAGMENT1  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10  

FRAGMENT2  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10  

FRAGMENT3  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10  

FRAGMENT4  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10^  

x64  

1,95   8,64   10,47   …   16,11  

14,81   18,14   19,93   …   24,35  

…  

6,87   10,99   12,85   …   16,93  

1,95   8,64   10,47   …   16,11  

14,81   18,14   19,93   …   24,35  

…  

6,87   10,99   12,85   …   16,93  

FRAG64  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10^4  

1,95   8,64   …   16,11  

14,81   18,14   …   24,35  

…  

6,87   10,99   …   16,93  

Ideal  path  “3”D  base  cuboid  

“0”D  apex  cuboid  

½  Terabyte  Datacube  

Preliminary performance insights !on a massive “in-memory” data reduction workflow  

Page 22: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

AGGREGATE  ALL  MAX    

FRAGMENT1  –    1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1   24,35  x1  

FRAGMENT1  –  64  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

2  

…  

64  

19,78  

12,87  

…  

24,35  x1  

MERGE  ALL  FRAGMENT  1  –    

1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

x64  

FRAGMENT  2  –    1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

FRAGMENT  3  –    1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

FRAGMENT  4  –    1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1   24,35  

FRAGMENT  64  –    1  TUPLE  x  1  ELEM.  

ID     MEASURE  

1   24,35  

AGGREGATE  ALL  MAX    

FRAGMENT1  –  10^4  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

2  

…  

104  

FRAGMENT2  –  10^4  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

2  

…  

104  

FRAGMENT3  –  10^4  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

2  

…  

104  

FRAGMENT4  –  10^4  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

2  

…  

10^  

FRAGMENT  64  –  10^4  TUPLE  x  1  ELEM.  

ID     MEASURE  

1  

2  

…  

10^4  

16,11  

24,35  

…  

FRAGMENT1  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10  

FRAGMENT2  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10  

FRAGMENT3  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10  

FRAGMENT4  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10^  

x64  

1,95   8,64   10,47   …   16,11  

14,81   18,14   19,93   …   24,35  

…  

6,87   10,99   12,85   …   16,93  

1,95   8,64   10,47   …   16,11  

14,81   18,14   19,93   …   24,35  

…  

6,87   10,99   12,85   …   16,93  

FRAG64  –  10^4  TUPLE  x  10^5  ELEMENTS  

ID     MEASURE  

1  

2  

…  

10^4  

1,95   8,64   …   16,11  

14,81   18,14   …   24,35  

…  

6,87   10,99   …   16,93  

REDUCE  ALL  MAX    

16,93  x64  

Real  path  through  4  Ophidia    operators  

Ideal  path  “3”D  base  cuboid  

“0”D  apex  cuboid  

REDUCE AGGREGATE MERGE AGGREGATE TOTAL

2,30 sec 0,17 sec 0,90 sec 0,06 sec 3,43 sec 512GB     5,12  MB   1KB   1KB   8Bytes  

Preliminary performance insights on a !massive “in-memory” data reduction workflow  

Page 23: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

•  The Ophidia framework provides: –  an internal storage model for multidimensional data

–  about 100 array-based primitives

–  about 50 operators for data analytics and metadata management

–  a Web Service (WS-I+) based server interface

•  Client application available as user interface –  Complete bash-like environment, production-level, lightweight, well-documented

•  APIs available at 5 different levels of the stack –  server front-end, I/O server, storage device, primitives, operators

•  Performance evaluation still ongoing

•  Beta release available in December –  Website, documentation, github repository

Summary  

Page 24: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

[1] G. Aloisio, S. Fiore, I. Foster, D. N. Williams , “Scientific big data analytics challenges at large scale”, Big Data and Extreme-scale Computing (BDEC), April 30 to May 01, 2013, Charleston, USA (position paper).

[2] S. Fiore, A. D'Anca, C. Palazzo, I. Foster, Dean N. Williams, Giovanni Aloisio, “Ophidia: Toward Big Data Analytics for eScience”, ICCS 2013, June 5-7, 2013 Barcelona, Spain, Procedia Computer Science, Elsevier, pp. 2376-2385.

[3] S. Fiore, C. Palazzo, A. D’Anca, I. Foster, D. N. Williams, G. Aloisio, “A big data analytics framework for scientific data management”, Workshop on “Big Data and Science: Infrastructure and Services”, IEEE International Conference on BigData 2013, October 6-9, 2013, Santa Clara, USA, pp. 1-8.

[4] S.Fiore, A. D’Anca, D. Elia, C. Palazzo, I. Foster, D. N. Williams, G. Aloisio, “Ophidia: a full software stack for scientific data analytics”, Workshop on Big Data Principles, Architectures & Applications, HPCS2014, Bologna, USA, July 21-25, 2014. Contacts:

Sandro Fiore, Giovanni Aloisio CMCC and University of Salento [email protected] , [email protected],

References and contacts!

Page 25: Ophidia ICTP Trieste-shortindico.ictp.it/event/a13229/session/19/contribution/75/... · 2014. 12. 3. · The Ophidia Project! S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams,

Questions?!