workflows and data integration vision and …


11/9/17


WORKFLOWS AND DATA INTEGRATION: VISION AND SUSTAINABILITY

NICOLA MARZARI, EPFL

MARVEL

OUTLINE

1. Provenance and reproducibility of data

2. Materials’ properties from workflows, turnkey solutions

3. Interoperability of codes

4. Curation of data

5. Services to the community and scalability of efforts


COMPUTATIONAL SCIENCE AS A MIDDLE-AGES WORKSHOP

COMPUTATIONAL SCIENCE SHOULD RATHER BE…

- reproducible: currently often not possible from the data reported in papers
- searchable: find existing calculations, reuse them and data-mine
- reliable: results persisted in repositories, automated procedures to reduce errors and verify results
- shareable: a community to share results, cross-validate them, and boost scientific discovery


ADES MODEL FOR COMPUTATIONAL SCIENCE

G. Pizzi et al., Comp. Mat. Sci. 111, 218-230 (2016)

Low-level pillars:
- Automation: remote management, high-throughput
- Data: database, provenance, storage

User-level pillars:
- Environment: research environment, scientific workflows, data analytics
- Sharing: social, sharing, standards

AN OPERATING SYSTEM FOR SIMULATIONS

http://www.aiida.net (MIT/BSD license)

G. Pizzi et al., Comp. Mat. Sci. 111, 218 (2016)


AiiDA INFRASTRUCTURE

What is AiiDA?

PROVENANCE AND REPRODUCIBILITY


DIRECTED ACYCLIC GRAPHS

Nodes: calculations, codes, data

A REAL, SIMPLE DAG: ELASTIC TENSORS

[Figure: CIF → relaxation of input structure → 20 DFT calculations of deformed structures → fits → elastic tensors]
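The graph on these slides can be mimicked in a few lines of plain Python. This is a minimal sketch, not AiiDA's implementation: `ProvenanceGraph`, `add_node` and `add_link` are hypothetical names. It stores typed nodes, refuses links that would create a cycle, and reports everything a result was derived from.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance DAG with typed nodes (calculation, code, data)."""

    NODE_TYPES = {"calculation", "code", "data"}

    def __init__(self):
        self.node_type = {}              # node id -> node type
        self.parents = defaultdict(set)  # node id -> ids it was created from

    def add_node(self, node_id, node_type):
        if node_type not in self.NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.node_type[node_id] = node_type

    def ancestors(self, node_id):
        """Every node reachable by following links backwards."""
        seen, stack = set(), [node_id]
        while stack:
            for parent in self.parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_link(self, source, target):
        """Record that `target` was created from (or by) `source`."""
        for node_id in (source, target):
            if node_id not in self.node_type:
                raise KeyError(node_id)
        # source -> target closes a cycle if target is already upstream of source
        if source == target or target in self.ancestors(source):
            raise ValueError(f"link {source} -> {target} would create a cycle")
        self.parents[target].add(source)


graph = ProvenanceGraph()
for node_id, node_type in [("cif", "data"), ("pw", "code"),
                           ("relax", "calculation"), ("relaxed", "data")]:
    graph.add_node(node_id, node_type)
graph.add_link("cif", "relax")      # input structure of the calculation
graph.add_link("pw", "relax")       # code used by the calculation
graph.add_link("relax", "relaxed")  # output of the calculation
print(sorted(graph.ancestors("relaxed")))
```

Keeping only "parent" sets is enough here because the acyclicity check and the ancestor query both walk the graph backwards, from results towards their inputs.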


DATABASE STRUCTURE

NoSQL flexibility within an efficient SQL schema

Repository folder

DbNode: one entry for each node. DbLink: all links. Everything else in DbAttribute (plus DbExtra for metadata added later).
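To make "NoSQL flexibility within an SQL schema" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names echo DbNode, DbLink and DbAttribute, but the columns and helper functions are assumptions, not AiiDA's actual schema. Arbitrary per-node key/value attributes live in one generic table, so new properties need no schema change while queries remain plain SQL.

```python
import json
import sqlite3

# Table names echo DbNode/DbLink/DbAttribute; columns are illustrative only.
SCHEMA = """
CREATE TABLE db_node (id INTEGER PRIMARY KEY, node_type TEXT NOT NULL);
CREATE TABLE db_link (
    input_id  INTEGER NOT NULL REFERENCES db_node(id),
    output_id INTEGER NOT NULL REFERENCES db_node(id),
    label     TEXT
);
-- Key/value table: any attribute, JSON-encoded, without schema changes.
CREATE TABLE db_attribute (
    node_id INTEGER NOT NULL REFERENCES db_node(id),
    key     TEXT NOT NULL,
    value   TEXT NOT NULL,
    PRIMARY KEY (node_id, key)
);
"""


def add_node(conn, node_type, **attributes):
    """Insert a node and store each keyword argument as one attribute row."""
    node_id = conn.execute(
        "INSERT INTO db_node (node_type) VALUES (?)", (node_type,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO db_attribute (node_id, key, value) VALUES (?, ?, ?)",
        [(node_id, key, json.dumps(value)) for key, value in attributes.items()],
    )
    return node_id


def get_attributes(conn, node_id):
    rows = conn.execute(
        "SELECT key, value FROM db_attribute WHERE node_id = ?", (node_id,)
    )
    return {key: json.loads(value) for key, value in rows}


def find_nodes(conn, key, value):
    """Query by attribute: the flexible part still goes through plain SQL."""
    rows = conn.execute(
        "SELECT node_id FROM db_attribute WHERE key = ? AND value = ? ORDER BY node_id",
        (key, json.dumps(value)),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
structure = add_node(conn, "data.structure", formula="MoS2", natoms=3)
calc = add_node(conn, "calculation.pw", ecutwfc=45)
conn.execute("INSERT INTO db_link VALUES (?, ?, ?)", (structure, calc, "structure"))
print(get_attributes(conn, structure), find_nodes(conn, "formula", "MoS2"))
```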

MULTIPLE STORAGE-BACKEND SUPPORT

- AiiDA API decoupled from the object-relational mapper (ORM)
- Two ORMs implemented (Django and SQLAlchemy)
- Flexible choice of backend based on needs
- Designed to ease the incorporation of graph databases such as Neo4j and Titan
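Decoupling an API from its ORM usually means that user code talks to an abstract interface and each storage backend implements it. The following is a minimal sketch under that assumption. The `NodeStore` interface and both backends are hypothetical and unrelated to AiiDA's Django and SQLAlchemy code.

```python
import sqlite3
from abc import ABC, abstractmethod


class NodeStore(ABC):
    """Interface that user-level code relies on, independent of the backend."""

    @abstractmethod
    def put(self, node_id: str, node_type: str) -> None:
        ...

    @abstractmethod
    def node_type(self, node_id: str) -> str:
        ...


class MemoryStore(NodeStore):
    """Backend 1: a plain dictionary."""

    def __init__(self):
        self._nodes = {}

    def put(self, node_id, node_type):
        self._nodes[node_id] = node_type

    def node_type(self, node_id):
        return self._nodes[node_id]  # raises KeyError if missing


class SqliteStore(NodeStore):
    """Backend 2: an SQL table (an in-memory SQLite database by default)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS node (id TEXT PRIMARY KEY, type TEXT NOT NULL)"
        )

    def put(self, node_id, node_type):
        self._conn.execute(
            "INSERT OR REPLACE INTO node (id, type) VALUES (?, ?)", (node_id, node_type)
        )

    def node_type(self, node_id):
        row = self._conn.execute(
            "SELECT type FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            raise KeyError(node_id)
        return row[0]


def register_structure(store: NodeStore, node_id: str) -> str:
    """User-level code: works unchanged with any NodeStore implementation."""
    store.put(node_id, "data.structure")
    return store.node_type(node_id)


for backend in (MemoryStore(), SqliteStore()):
    print(type(backend).__name__, register_structure(backend, "n1"))
```

Adding a third backend, for example one wrapping a graph database, would only require another `NodeStore` subclass. `register_structure` would not change.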


WORKFLOWS AND TURNKEY SOLUTIONS


WORKFLOWS, WORKFUNCTIONS, WORKCHAINS

- Automatic provenance tracking in the DB: inputs, outputs and function calls are stored by adding a simple decorator to existing functions
- Serial and parallel execution support; launch long-running tasks in the background
- Control of provenance granularity: store the level of detail relevant to the workflows
- Progress checkpointing: restart from an arbitrary step, retry on failure
- Easy debugging, self-documenting
- Seamless mixing of local and remote jobs, background execution
- Daemon execution allows the machine to be shut down and to continue from the last point, essential for running long remote jobs
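The "simple decorator" idea can be illustrated without AiiDA. In AiiDA itself, this role is played by decorators such as `workfunction` (and `calcfunction` in later versions). The `track_provenance` decorator and the in-memory `PROVENANCE_LOG` below are hypothetical and only record call names, inputs and outputs in a list.

```python
import functools

# Hypothetical stand-in for the provenance database.
PROVENANCE_LOG = []


def track_provenance(func):
    """Decorator: record function name, inputs and output of every call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append(
            {"function": func.__name__, "args": args, "kwargs": kwargs, "result": result}
        )
        return result

    return wrapper


@track_provenance
def rescale(lattice_parameter, factor):
    return lattice_parameter * factor


@track_provenance
def cubic_volume(lattice_parameter):
    return lattice_parameter ** 3


volume = cubic_volume(rescale(4.0, factor=1.01))
for entry in PROVENANCE_LOG:
    print(entry["function"], entry["args"], entry["kwargs"], "->", entry["result"])
```

Because the output of `rescale` is logged and then appears as the input of `cubic_volume`, the log implicitly contains the chain of calls that produced `volume`.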

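# Imports as in aiida-core 1.x and later
from aiida.engine import WorkChain
from aiida.orm import Str, StructureData


class PwBandsWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('codename', valid_type=Str)
        spec.input('structure', valid_type=StructureData)
        # lambda default: a fresh Str node is created for each run
        spec.input('protocol', valid_type=Str, default=lambda: Str('standard'))
        # Steps run in this order; the step methods are not shown on the slide
        spec.outline(
            cls.setup_protocol,
            cls.setup_structure,
            cls.setup_kpoints,
            cls.setup_pseudo_potentials,
            cls.setup_parameters,
            cls.run_relax,
            cls.run_seekpath,
            cls.run_scf,
            cls.run_bands,
            cls.run_results,
        )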


DAG: MULTIPLE CODES LINKED

[Figure: provenance graph linking multiple codes. Nodes: CIF; relaxation of input structure, electronic bands, supercell, fitting of model parameters; result analysis and restart of another MD branching loop; multiple independent branched MD (NVE runs from an NVT run); final result]

WORKFLOW ABSTRACTION

[Figure: flowchart of an automated screening of Li-containing structures for Li-ion diffusion; numbers are the node counts at each step.]

- Import: COD (7228 Li-containing) and ICSD (8627 Li-containing); the importers reject entries with partial occupancies or attached H (COD: 3777 accepted, 3451 rejected; ICSD: 3956 accepted, 4670 rejected)
- CifCleaner (cleans and standardizes CIF files with codtools clean_cif): 7731
- Cif2Structure (parses CIF files into standardized structures): 7472
- Duplicate filter: 4963 accepted, 2509 rejected
- NiggliReduce (Niggli-reduced structure): 4963
- CompositionFilter (suitable composition?): 1429 accepted, 3534 rejected
- AtomicDistanceFilter (meaningful bond distances?): 1367 accepted, 62 rejected
- IonicityFilter (enough anions for Li?): 1362 accepted, 5 rejected
- FirstSCF (one SCF cycle to estimate occupations): 1009
- OccupationFilter (occupations compatible with a band gap?): 734 accepted, 235 rejected
- Rattle (shake atoms, sigma 0.1): 734
- VC-Relax-WF (relax atoms and cell): 718 launched, 689 finished
- CalculateBands (relax the structure and calculate the band gap): 348 relaxations, 350 band calculations
- BandGapFilter (does the relaxed structure have a band gap?): 284 accepted, 66 rejected
- PrepareSupercell (supercells of minimal dimension 8, built from 284 and 689 structures): 973
- PrepareFlipperStructure (prepare the structures for the pinball model): 975; 973 flipper-compatible, 11 whose diffusion run failed on bellatrix
- ChargeCalc (charge density with implicit lithium): 257 single SCF calculations on delithiated structures; 182 charge densities (bellatrix), 11 (deneb)
- Fitting (coefficients for the flipper; 5000 force components, random displacements with standard deviation 0.1): 185 fits, 182 coefficient sets
- Pinball dynamics: diffusion (180)
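The flowchart is essentially a funnel of filters, each of which splits its input into accepted and rejected sets. The following is a minimal sketch of that abstraction. The records and predicates are invented, and although the filter names echo the figure, their logic here is purely illustrative.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Filter = Tuple[str, Callable[[Record], bool]]


def run_funnel(records: List[Record], filters: List[Filter]):
    """Apply filters in order; return survivors and (name, accepted, rejected) per stage."""
    report = []
    current = records
    for name, predicate in filters:
        accepted = [r for r in current if predicate(r)]
        report.append((name, len(accepted), len(current) - len(accepted)))
        current = accepted
    return current, report


def make_duplicate_filter() -> Callable[[Record], bool]:
    """Reject records already seen (here: same formula; a real check compares structures)."""
    seen = set()

    def not_duplicate(record: Record) -> bool:
        key = record["formula"]
        if key in seen:
            return False
        seen.add(key)
        return True

    return not_duplicate


# Hypothetical toy records standing in for imported CIF entries (values invented).
records = [
    {"formula": "LiCoO2", "partial_occupancy": False, "min_distance": 1.9},
    {"formula": "LiFePO4", "partial_occupancy": True, "min_distance": 1.8},
    {"formula": "Li3N", "partial_occupancy": False, "min_distance": 0.4},
    {"formula": "LiCoO2", "partial_occupancy": False, "min_distance": 1.9},
]

FILTERS: List[Filter] = [
    ("Importer (no partial occupancies)", lambda r: not r["partial_occupancy"]),
    ("Duplicate filter", make_duplicate_filter()),
    ("AtomicDistanceFilter (> 0.5 Angstrom)", lambda r: r["min_distance"] > 0.5),
]

survivors, report = run_funnel(records, FILTERS)
for name, n_accepted, n_rejected in report:
    print(f"{name}: {n_accepted} accepted, {n_rejected} rejected")
```

Keeping the per-stage counts alongside the survivors is what allows a figure like the one above to be regenerated directly from the workflow.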


INTEROPERABILITY OF CODES

PLUGIN INFRASTRUCTURE
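AiiDA discovers plugins through Python entry points (groups such as `aiida.calculations` and `aiida.parsers`), so a package only has to declare them in its packaging metadata. On Python 3.10+, they can be inspected with `importlib.metadata.entry_points(group="aiida.calculations")`. The following sketch shows the same registry pattern in-process. The `register` and `load_plugin` helpers and the plugin names are hypothetical.

```python
from typing import Callable, Dict, Tuple

_REGISTRY: Dict[Tuple[str, str], type] = {}


def register(group: str, name: str) -> Callable[[type], type]:
    """Class decorator adding a plugin to the registry under (group, name)."""

    def decorator(cls: type) -> type:
        key = (group, name)
        if key in _REGISTRY:
            raise ValueError(f"plugin {name!r} already registered in {group!r}")
        _REGISTRY[key] = cls
        return cls

    return decorator


def load_plugin(group: str, name: str) -> type:
    """Look a plugin up by name, so callers never import it directly."""
    try:
        return _REGISTRY[(group, name)]
    except KeyError:
        raise KeyError(f"no plugin {name!r} in group {group!r}") from None


@register("calculations", "toy.pw")
class ToyPwCalculation:
    code = "pw.x"


@register("calculations", "toy.siesta")
class ToySiestaCalculation:
    code = "siesta"


for plugin_name in ("toy.pw", "toy.siesta"):
    print(plugin_name, "->", load_plugin("calculations", plugin_name).code)
```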


FLEUR: Interoperability through AiiDA

Fleur simulations can be chained with other codes through AiiDA StructureData nodes.

[Figure: Fleur workflows (wf) exchanging StructureData and ParameterData nodes through AiiDA: Fleur SCF wf, Fleur EOS wf, Fleur Band wf, Fleur DOS wf]

SIESTA: Interoperability through AiiDA

- Published on PyPI; can be integrated with AiiDA via `pip install aiida-siesta`
- Implements features such as:
  - band structure calculation
  - PDOS calculation
  - STM imaging
- Contains workflows for band structure and STM imaging, both built on top of the standard WorkChain class of the Siesta plugin
- Fully interoperable with other AiiDA modules, since no plugin-specific node types are introduced during workflow execution

Example: a band-structure calculation for graphene, carried out entirely in an interactive Python environment with AiiDA and the Siesta plugin.


QE-Yambo Interoperability through AiiDA

StructureData → PwCalculation (SCF + NSCF) → YamboCalculation (p2y + GW)

Code-agnostic AiiDA datatypes (StructureData, KpointsData, BandsData and more) allow seamless interoperability between flagship codes in the ecosystem of AiiDA plugins and workflows.

After structural optimisation of monolayer MoS2, performed with either FLEUR, SIESTA or QE, the G0W0 band gap is calculated using QE + Yambo.

[Figure: YamboWorkflow chaining the calculations through RemoteData, KpointsData, BandsData and ParameterData nodes; surrounding labels: Storing, Computing, Analysing, Sharing]
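Code-agnostic datatypes work because each code plugin translates the same shared object into its own input format. The following is a minimal sketch of that idea. The `Structure` dataclass and both writer functions are hypothetical, and their output only loosely imitates a Quantum ESPRESSO ATOMIC_POSITIONS card and a SIESTA fdf coordinate block.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Structure:
    """Code-agnostic structure: chemical symbols and Cartesian positions (angstrom)."""
    symbols: Tuple[str, ...]
    positions: Tuple[Vec3, ...]


def to_qe_positions(s: Structure) -> str:
    """Loosely modeled on a Quantum ESPRESSO ATOMIC_POSITIONS card."""
    lines = ["ATOMIC_POSITIONS angstrom"]
    for sym, (x, y, z) in zip(s.symbols, s.positions):
        lines.append(f"{sym} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines)


def to_siesta_positions(s: Structure) -> str:
    """Loosely modeled on a SIESTA fdf coordinate block (species numbered from 1)."""
    species = sorted(set(s.symbols), key=s.symbols.index)  # order of first appearance
    lines = ["AtomicCoordinatesFormat Ang", "%block AtomicCoordinatesAndAtomicSpecies"]
    for sym, (x, y, z) in zip(s.symbols, s.positions):
        lines.append(f"{x:.4f} {y:.4f} {z:.4f} {species.index(sym) + 1}")
    lines.append("%endblock AtomicCoordinatesAndAtomicSpecies")
    return "\n".join(lines)


# Illustrative coordinates only, not a relaxed geometry.
toy = Structure(
    symbols=("Mo", "S", "S"),
    positions=((0.0, 0.0, 0.0), (1.58, 0.91, 1.56), (1.58, 0.91, -1.56)),
)
print(to_qe_positions(toy))
print(to_siesta_positions(toy))
```

The structure object never depends on either code, so a relaxation from one code can be handed to another without any conversion step in user code.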


CURATION OF DATA


SOME THOUGHTS ON DATA

- In computational science, data are generated by the simulations themselves, so the workflows that compute properties and data from a structure are key
- Curated data are needed (e.g. for verification or for machine learning)
- A model of data-on-demand can be implemented (high-throughput calculations push the development of robust workflows that compute properties automatically)
- Full provenance allows a-posteriori decoration of metadata

A. Merkys et al., A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD, Journal of Cheminformatics, in press (2017).
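Because every result is linked to the calculations that produced it, metadata can be harvested after the fact by walking the provenance graph backwards. The following is a minimal sketch over a hypothetical dictionary-based graph. The node ids, the `INPUTS` and `ATTRIBUTES` mappings and the attribute names are assumptions, and this is not the AiiDA-TCOD exporter.

```python
from typing import Dict, List

# Hypothetical provenance graph: node id -> list of input node ids.
INPUTS: Dict[str, List[str]] = {
    "band_gap": ["bands_calc"],
    "bands_calc": ["relaxed_structure", "pw_code"],
    "relaxed_structure": ["relax_calc"],
    "relax_calc": ["cif", "pw_code"],
    "cif": [],
    "pw_code": [],
}

# Hypothetical attributes stored on some nodes at creation time.
ATTRIBUTES: Dict[str, dict] = {
    "bands_calc": {"type": "calculation", "ecutwfc": 45},
    "relax_calc": {"type": "calculation", "ecutwfc": 45, "calculation": "vc-relax"},
    "pw_code": {"type": "code", "executable": "pw.x"},
    "cif": {"type": "data", "source": "COD"},
}


def collect_metadata(node: str) -> Dict[str, dict]:
    """Gather the attributes of every ancestor of `node`, keyed by node id."""
    metadata, stack, seen = {}, [node], set()
    while stack:
        current = stack.pop()
        for parent in INPUTS.get(current, []):
            if parent in seen:
                continue
            seen.add(parent)
            stack.append(parent)
            if parent in ATTRIBUTES:
                metadata[parent] = ATTRIBUTES[parent]
    return metadata


meta = collect_metadata("band_gap")
codes = {k: v["executable"] for k, v in meta.items() if v["type"] == "code"}
print(sorted(meta), codes)
```

None of this metadata had to be attached to the band-gap node when it was created. It is recovered later from the links alone.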


MATERIALS REPOSITORIES


SERVICES TO THE COMMUNITY, SCALABILITY OF EFFORTS

SOME CONSIDERATIONS ON HTC

1) The need for massive high-throughput calculations pushes us to develop robust “turnkey” solutions for predicting materials properties

2) Such an effort automatically makes it possible to offer core capabilities to the community at large: fellow computational scientists, experimental groups, national laboratories, companies


MOVING TO THE CLOUD

Computing centres are moving from pure HPC providers to service providers.

Services needed in federated supercomputer centres:

- Database (to store and query information): PostgreSQL 9.5, supporting data-intensive queries, JSON and multiple users
- Object store (to store large files): OpenStack Swift, for efficient storage and retrieval of large objects
- Web backends (hosting of web services): Apache, for discovery and exploration of existing materials, calculations and workflows, and for launching new ones
- AAI (authentication and authorization infrastructure): in progress; Keystone and Shibboleth for identity management and authentication for federated access

THE MATERIALS CLOUD PLATFORM


SERVICES: AUTOMATED SPECTRA (ELECTRONS, PHONONS)

AN EXAMPLE: MATERIALS DISCOVERY


COMPUTATIONAL EXFOLIATION OF ALL KNOWN INORGANIC MATERIALS

HOW DO WE PRODUCE 2D MATERIALS?

Mechanical exfoliation (e.g. Geim/Novoselov; figure from Nature/NUS)...

…or liquid exfoliation (e.g. Nicolosi/Coleman; figure from Science).

Also bottom-up: CVD and wet-chemical synthesis.


HIGH-THROUGHPUT COMPUTATIONAL EXFOLIATION

1. Identification of layered materials among known experimental compounds

2. Automatic calculation of all properties (structure/electronic/magnetic)

3. Binding energies and shear elastic constants → can it be exfoliated?

4. Testing for mechanical, thermodynamic and (electro)chemical stability
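For reference, a common way to define the binding energy in step 3 (an assumption here, since the slides do not spell out the formula) is the energy per unit area needed to separate a bulk cell containing N layers into isolated layers:

```latex
E_b = \frac{E_{\mathrm{2D}} - E_{\mathrm{3D}}/N}{A}
```

Here, E_2D is the energy of one isolated layer, E_3D is the energy of the bulk cell with N layers, and A is the in-plane area of one layer. With energies in meV and areas in Å², E_b comes out in meV/Å², the unit used for the exfoliability thresholds in this talk.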

HIGH-THROUGHPUT COMPUTATIONAL EXFOLIATION

[Figure: screening funnel: external databases (ICSD, COD) → layered materials → exfoliable? → mechanically stable? → properties? → magnetic ground state?]

Building on a previous study of 92 two-dimensional compounds (S. Lebègue et al., PRX (2013)).


A FEW EXAMPLES

LET'S START WITH …


FROM ICSD TO A WORKING STRUCTURE

Primitive cell and structure symmetry refinement

3D RELAXATION

Lowdimfinder on relaxed structure

Pw calculations

PwWorkflow
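Lowdimfinder reportedly identifies low-dimensional units from the bonding network. As a much simpler stand-in that only handles layers stacked along the c axis, one can look for van der Waals gaps in the fractional z coordinates. The following is a minimal sketch under that assumption. `count_layers` and the 2.5 Å gap threshold are hypothetical, and this is not the Lowdimfinder algorithm.

```python
from typing import List


def count_layers(z_frac: List[float], c_length: float, min_gap: float = 2.5) -> int:
    """Count slabs along c separated by gaps wider than `min_gap` (angstrom).

    The cell is periodic, so the gap between the top-most atom and the
    bottom-most atom of the next cell is included. Returns 0 when no gap
    is found, i.e. no layered separation along c.
    """
    z = sorted((zf % 1.0) * c_length for zf in z_frac)
    gaps = [z[i + 1] - z[i] for i in range(len(z) - 1)]
    gaps.append(z[0] + c_length - z[-1])  # wrap-around gap
    return sum(1 for g in gaps if g > min_gap)


# Two three-atom slabs in a 12.3 angstrom cell (invented coordinates).
z_cart = (1.5, 3.1, 4.7, 7.65, 9.25, 10.85)
print(count_layers([z / 12.3 for z in z_cart], 12.3))
```

On a ring of slabs, the number of wide gaps equals the number of slabs. This is why the wrap-around gap is included.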


MAGNETIC SCREENING OF THE 2D MONOLAYER

ChronosWorkflow

REMOVING MECHANICAL INSTABILITIES

Stabilization procedure:

1. Γ-phonon calculation
2. Displace the atoms along the unstable eigenvectors
3. Final vc-relax
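Step 2 of the stabilization procedure can be sketched as follows. This sketch assumes that the Γ-point eigenvalues (ω², negative for unstable modes) and eigenvectors are already available from a phonon code. The function name, the 0.05 Å amplitude and the data layout are hypothetical, and mass-weighting of the eigenvectors is ignored for brevity.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def displace_along_unstable_modes(
    positions: Sequence[Vec3],
    omega2: Sequence[float],
    eigenvectors: Sequence[Sequence[Vec3]],
    amplitude: float = 0.05,
) -> List[Vec3]:
    """Shift atoms along every Gamma mode with omega^2 < 0 (imaginary frequency).

    eigenvectors[m][i] is the displacement of atom i in mode m. Each unstable
    mode is normalized so that `amplitude` (angstrom) is the norm of the
    total displacement of that mode.
    """
    new = [list(p) for p in positions]
    for w2, mode in zip(omega2, eigenvectors):
        if w2 >= 0:
            continue  # stable mode: nothing to do
        norm = sum(c * c for vec in mode for c in vec) ** 0.5
        if norm == 0.0:
            continue
        for atom, vec in zip(new, mode):
            for k in range(3):
                atom[k] += amplitude * vec[k] / norm
    return [(p[0], p[1], p[2]) for p in new]


# Two atoms; mode 0 is unstable (antiparallel motion along x), mode 1 is stable.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
omega2 = [-0.2, 1.0]
modes = [
    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
]
print(displace_along_unstable_modes(positions, omega2, modes))
```

The displaced structure would then be passed to the final vc-relax, which lets the system relax into the lower-symmetry minimum instead of remaining at the saddle point.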


FINALLY…


NOVEL EXFOLIABLE MATERIALS

At least 1800 structures are below this line.

Three groups:

- Eb < 30 meV/Å² (DF2-C09) or Eb < 35 meV/Å² (rVV10) → 2D, easily exfoliable
- Eb > 130 meV/Å² → not 2D (discarded)
- In between → 2D, potentially exfoliable

1053 monolayers

791 monolayers

Mounet et al., arXiv:1611.05234 (2016)
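The three groups above translate into a small classification rule. The following is a minimal sketch with the thresholds as listed. The function name is hypothetical, and because the list does not say which functional the 130 meV/Å² cut-off refers to, requiring both values to exceed it is an assumption.

```python
def classify_exfoliability(eb_df2_c09: float, eb_rvv10: float) -> str:
    """Group a layered compound by its binding energies (meV/Angstrom^2).

    Thresholds as listed above; applying the 130 meV/Angstrom^2 cut-off
    to both functionals is an assumption of this sketch.
    """
    if eb_df2_c09 < 30 or eb_rvv10 < 35:
        return "2D, easily exfoliable"
    if eb_df2_c09 > 130 and eb_rvv10 > 130:
        return "not 2D (discarded)"
    return "2D, potentially exfoliable"


for energies in [(20.0, 40.0), (60.0, 70.0), (140.0, 150.0)]:
    print(energies, "->", classify_exfoliability(*energies))
```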


Mounet et al., arXiv:1611.05234 (2016)

2D PROTOTYPES


OPTIMAL MATERIALS FOR ELECTRONICS

A. Kis (EPFL)


2D TOPOLOGICAL

[Figure: band structure, energy (eV) from -0.4 to 0.4]

Z2 topological insulators (9 in ~1000; 4 new, +3 under strain)


AiiDA THANKS

- Giovanni Pizzi (EPFL)
- Andrius Merkys (Vilnius)
- Nicolas Mounet (EPFL)
- Boris Kozinsky (BOSCH)
- Martin Uhrin (EPFL)
- Spyros Zoupanos (EPFL)
- Nicola Marzari (EPFL)
- Snehal P. Kumbhar (EPFL)
- Leonid Kahle (EPFL)
- Fernando Gargiulo (EPFL)
- Rico Häuselmann (EPFL)
- Sebastiaan P. Huber (EPFL)
- Andrea Cepellotti (EPFL)
- Marco Borelli (EPFL)
- Elsa Passaro (EPFL)
- Thomas Schulthess (ETHZ, CSCS)
- Leopold Talirz (EPFL)
- Ole Schütt (EMPA)


FUNDING

- Swiss National Centre for Computational Design and Discovery of Novel Materials (http://nccr-marvel.ch)
- H2020 Centre of Excellence MaX: Materials Design at the Exascale (http://max-centre.eu)
- H2020 Nanoscience Foundries and Fine Analysis (http://nffa.eu)
- H2020 European Materials Modelling Council (http://emmc.info)
- H2020 Graphene Flagship
- H2020 Marketplace
- H2020 Marie-Curie Cofund
- Max-Planck-EPFL Centre
- PASC
- Varinor
- Constellium
- Robert Bosch RTC