citrination-mrs fall meeting 2015

Citrination: Open Infrastructure for Ingesting, Storing, & Mining Materials Data Bryce Meredig & Greg Mulholland Citrine Informatics MRS Fall Meeting 2 December 2015

Upload: bmeredig

Post on 07-Apr-2017



TRANSCRIPT

Page 1: Citrination-MRS Fall Meeting 2015

Citrination: Open Infrastructure for Ingesting, Storing, & Mining Materials Data

Bryce Meredig & Greg Mulholland, Citrine Informatics

MRS Fall Meeting, 2 December 2015

Page 2: Citrination-MRS Fall Meeting 2015

Introduction

Page 3: Citrination-MRS Fall Meeting 2015

About Citrine

Data platform for the physical world: our software aggregates and mines materials data to aid R&D, manufacturing, and sales

Page 4: Citrination-MRS Fall Meeting 2015

Business Model

We sell enterprise-scale industrial deployments of our platform

We don’t charge academic or government labs for public data storage or access

Page 5: Citrination-MRS Fall Meeting 2015

Bold Assertion #1

Our platform is a one-line data management plan for everyone

- Funding agencies ask you to make data accessible, but do not specify how
- Anyone can store public data on our platform for free, today

Page 6: Citrination-MRS Fall Meeting 2015

Bold Assertion #2

Public data should be free and universally available

Page 7: Citrination-MRS Fall Meeting 2015

Bold Assertion #3

Scientists should focus on science, not IT

- Funding agencies don’t want an infrastructure mortgage
- Proliferation of unconnected data islands doesn’t serve the community

Page 8: Citrination-MRS Fall Meeting 2015

We’re Nice, But Not a Charity

More data make our platform smarter and more valuable

Users help us, and each other, by curating and organizing data

Page 9: Citrination-MRS Fall Meeting 2015

Statistics on Citrination

Users from >1k institutions

3.1m materials data records

3.2k distinct datasets

150k documents

Page 10: Citrination-MRS Fall Meeting 2015

Ingesting Data: Extraction from Documents

Page 11: Citrination-MRS Fall Meeting 2015

Platform Overview

Data extraction pipeline turns docs & files into a structured database

Structured data are far more discoverable, and also amenable to machine learning

Page 12: Citrination-MRS Fall Meeting 2015

Data Structure Continuum

Completely unstructured: documents (most materials-related arXiv papers indexed)

Highly structured: numerical data

Page 13: Citrination-MRS Fall Meeting 2015

Data Extraction: Text

cell parameter a = MATERIALS PROPERTY

5.82445(1) = NUMERICAL VALUE

angstrom = UNITS
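The labeling on this slide can be illustrated with a small pattern-matching sketch. This is not Citrine's extraction pipeline; it is a minimal regular-expression example, and the input sentence, the pattern, and the function name `extract_measurements` are all assumptions made for illustration. It only handles text of the form "property = value(uncertainty) units".

```python
import re

# Hypothetical pattern: "<property> = <number>(<uncertainty digits>) <units>"
PATTERN = re.compile(
    r"(?P<property>[A-Za-z][A-Za-z ]*?)\s*=\s*"   # materials property name
    r"(?P<value>\d+(?:\.\d+)?)"                    # numerical value
    r"(?:\((?P<uncertainty>\d+)\))?\s*"            # optional uncertainty in the last digit(s)
    r"(?P<units>[A-Za-z]+)"                        # units as a single word
)


def extract_measurements(text):
    """Return one dict per 'property = value(unc) units' match found in text."""
    results = []
    for match in PATTERN.finditer(text):
        results.append({
            "property": match.group("property").strip(),
            "value": float(match.group("value")),
            "uncertainty_digits": match.group("uncertainty"),
            "units": match.group("units"),
        })
    return results


# Assumed example sentence, assembled from the three labeled tokens on the slide.
sentence = "cell parameter a = 5.82445(1) angstrom"
records = extract_measurements(sentence)
```

A pattern this simple treats everything before the equals sign as the property name, so real documents would need considerably more context handling than is shown here.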

Page 14: Citrination-MRS Fall Meeting 2015

Extraction: Images & Tables

Page 15: Citrination-MRS Fall Meeting 2015

Extraction: Images & Tables

Image containing data → machine vision → Underlying x,y data

(actual extraction shown)
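One step in recovering x,y data from a plot image is converting pixel positions on a detected curve into data coordinates. The sketch below shows only that step for linear axes, and is not a description of the machine-vision method behind the slide. The calibration points, the pixel coordinates, and the function names are hypothetical.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value on a linear axis.

    The axis is calibrated from two known tick positions: pixel_a shows
    value_a and pixel_b shows value_b.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


def pixels_to_data(points_px, x_mapper, y_mapper):
    """Convert a list of (pixel_x, pixel_y) points into (x, y) data values."""
    return [(x_mapper(px), y_mapper(py)) for px, py in points_px]


# Hypothetical calibration: x pixel 100 is 0.0 and x pixel 500 is 10.0;
# y pixel 400 is 0.0 and y pixel 80 is 8.0 (image y grows downward).
x_map = make_axis_mapper(100, 0.0, 500, 10.0)
y_map = make_axis_mapper(400, 0.0, 80, 8.0)

curve_px = [(100, 400), (300, 240), (500, 80)]
curve_data = pixels_to_data(curve_px, x_map, y_map)
```

Logarithmic axes would need the same mapping applied to the logarithm of the tick values, which this sketch does not cover.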

Page 16: Citrination-MRS Fall Meeting 2015

Ingesting Data: Community Contributions

Page 17: Citrination-MRS Fall Meeting 2015

Contributors & Partners

Page 18: Citrination-MRS Fall Meeting 2015

Uploading Data

Page 19: Citrination-MRS Fall Meeting 2015

Uploading Data

Ingestion is instant if you create JSON or .csv files; see citrination.com/contributing

Otherwise, we figure it out!

Page 20: Citrination-MRS Fall Meeting 2015

Credit and Provenance

We acknowledge both the contributor (i.e., uploader) and the published work via the DOI

Page 21: Citrination-MRS Fall Meeting 2015

Incentives: Vanity Metrics

Weekly pageviews of the OQMD paper’s page (does not count individual datum views): comparable to high-impact journal metrics!

Page 22: Citrination-MRS Fall Meeting 2015

Incentives: Discoverability

Can scientists find your data via Google?

Page 23: Citrination-MRS Fall Meeting 2015

Why Upload?

- Data management plan
- Discoverability & impact
- Persist your raw data

Page 24: Citrination-MRS Fall Meeting 2015

Ingesting Data: Case Studies

Page 25: Citrination-MRS Fall Meeting 2015

Computational Data: OQMD

J.E. Saal et al., JOM 65, 1501 (2013)

Page 26: Citrination-MRS Fall Meeting 2015

Computational Data: Mat. Proj.

A. Jain et al., APL Materials 1, 011002 (2013)

B2O3 DOS

Page 27: Citrination-MRS Fall Meeting 2015

Experimental Data: JCAP

A. Shinde et al., J Mat Res 30, 442 (2014)

Page 28: Citrination-MRS Fall Meeting 2015

Data Partnerships

Implemented an API for Elsevier:

https://citrination.com/api/doi/banner/10.1016/j.jallcom.2014.11.091
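The URL on this slide appears to follow the pattern "base path plus DOI". A minimal sketch of building such a URL for an arbitrary DOI is below. The pattern is inferred from the single example shown, the helper name `banner_url` is hypothetical, and nothing here confirms what the endpoint returns or whether it is still available.

```python
from urllib.parse import quote

# Base path taken from the example URL on the slide.
BANNER_BASE = "https://citrination.com/api/doi/banner/"


def banner_url(doi):
    """Build a banner URL for a DOI, assuming the pattern shown on the slide."""
    doi = doi.strip()
    if not doi.startswith("10."):
        raise ValueError("expected a DOI beginning with '10.'")
    # Keep '/' unescaped so the DOI prefix and suffix stay readable in the path.
    return BANNER_BASE + quote(doi, safe="/")


url = banner_url("10.1016/j.jallcom.2014.11.091")
```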

Page 29: Citrination-MRS Fall Meeting 2015

Data Partnerships

Link to Citrination data will appear here

Page 30: Citrination-MRS Fall Meeting 2015

Data Partnerships

Page 31: Citrination-MRS Fall Meeting 2015

Storing Data

Page 32: Citrination-MRS Fall Meeting 2015

Data Standards

MIF (Materials Information File): general JSON schema for defining materials data

Open standard and open-source tools for working with it

Page 33: Citrination-MRS Fall Meeting 2015

MIF Sample

Schema available: http://citrineinformatics.github.io/mif-documentation

{ "sample": {
    "material": {
      "chemicalFormula": "LiF",
      "condition": [
        {
          "scalar": [
            { "value": "Single crystalline" }
          ],
          "name": "Crystallinity"
        }
      ]
      …
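The fragment above can be produced from code with nothing more than the standard `json` module. The sketch below reproduces only the fields visible on the slide. The rest of the record is elided in the original, so this is not a complete or schema-validated MIF, and the helper name `make_mif_sample` is hypothetical.

```python
import json


def make_mif_sample(chemical_formula, conditions):
    """Build the visible part of the sample record as a plain dict.

    conditions is a list of (name, value) pairs, each stored as one
    condition entry holding a single scalar value.
    """
    return {
        "sample": {
            "material": {
                "chemicalFormula": chemical_formula,
                "condition": [
                    {"scalar": [{"value": value}], "name": name}
                    for name, value in conditions
                ],
            }
        }
    }


mif = make_mif_sample("LiF", [("Crystallinity", "Single crystalline")])
mif_json = json.dumps(mif, indent=2)
```

Writing `mif_json` to a file gives a JSON document of the kind the upload slide describes, though the published schema should be checked before relying on these field names.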

Page 34: Citrination-MRS Fall Meeting 2015

Working with MIF

mifkit: open-source Python toolset for working with MIF

Create MIFs in your code, or import MIFs from Citrination into your code

Page 35: Citrination-MRS Fall Meeting 2015

Programmatic Data Access

API usage example:

# full documentation: http://citrineinformatics.github.io/api-documentation
# search the entire database
client.search(formula='CrFeSn')
# filter on values
client.search(formula='GaN', property='band gap', max_measurement='3')
# search a single data set
client.search(formula='CrFeSn', data_set_id='100')

Page 36: Citrination-MRS Fall Meeting 2015

Data Quality and Curation

Our philosophy: We are not arbiters of data quality; instead, we give the community tools to assess and discuss quality

Page 37: Citrination-MRS Fall Meeting 2015

Mining Data

Page 38: Citrination-MRS Fall Meeting 2015

Machine Learning: TE Case Study

Sparks, T. D., Gaultois, M. W., Oliynyk, A., Brgoch, J., & Meredig, B. “Data mining our way to the next generation of thermoelectrics.” Scripta Materialia (2015)

ML-based web app that predicts key thermoelectric properties for any bulk polycrystalline material

Heat map of thermal conductivity predicted by ML in the Ru-Dy-Ge system

Page 39: Citrination-MRS Fall Meeting 2015

TE Search: All Ternary Systems

Page 40: Citrination-MRS Fall Meeting 2015

Industrial Applications

In production at several Fortune 500 corporations and smaller companies:

- Coatings
- Alloys
- Crystallography
- Energy materials

Page 41: Citrination-MRS Fall Meeting 2015

Releases Every ~2 Weeks

- WebGL Crystal Structure Visualizations
- Autocomplete
- Profile Pages

Page 42: Citrination-MRS Fall Meeting 2015

Get Involved

• We’ll store your data today: easy data management plan template
• Join Citrination newsletter: bit.ly/1NGNgdb
• Access our public data
• Email me: [email protected]