Data Repositories & Linked Data
ARD Prasad, DRTC, Indian Statistical Institute
Looking Back
Open Access to Information (OAI)
• A fairly successful movement, which resulted in:
  • Open Access Repositories (> 2000)
  • Open Access Journals (> 5000)
• Partially bridging the digital divide in the social, physical, and natural sciences and the humanities
Nature of Publications
Many publications use data, but the published article may not contain all of the data used:
• For lack of space
• The author might have overlooked the data
• The author deliberately did not present the data, so that others cannot verify it
For Example
Some suspect that Sigmund Freud's case data describes fictitious persons, not just real patients under fictitious names
If data is available ...
• Others may draw different conclusions, contradicting those of the author
• Others may deal with other facets of the data
• Data transparency reinforces the objectivity and self-correcting character of science
If "case histories of patients" were openly available, they would contribute significantly to medical research
Digital Divide
• The social sciences do not require laboratory infrastructure
• However, the physical and natural sciences do require expensive infrastructure
• If experimental data is available to scientists who do not have such infrastructure, it will significantly reduce the digital divide in the physical and natural sciences
ODA is a step toward transparency and quality in science
For Example
• Human Genome data
• Data from accelerator labs (CERN)
• The recent controversy about neutrinos reportedly moving faster than light
• Not surprisingly, astronomy data was openly available even before the OA movement
Features of Open Data Repositories
• Metadata: specify who the owner, creator, etc. are
• License the data in a way that waives your rights, to facilitate bulk download of open data
• Technology tools: automate data extraction, preferably on the cloud
• Ontology: index the data
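The metadata, licence and ontology points above can be made concrete with a small machine-readable record. The Python sketch below is illustrative only: the `DatasetMetadata` class and its field names are hypothetical (not a standard schema such as Dublin Core or DCAT), and all values are placeholders rather than a real dataset.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record for a dataset in an open data repository."""
    title: str
    owner: str
    creator: str
    licence: str  # ideally an open data licence such as PDDL, ODbL or CCZero
    subjects: list[str] = field(default_factory=list)  # ontology terms used to index the data

    def to_json(self) -> str:
        # JSON keeps the record easy for other tools to harvest in bulk
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


record = DatasetMetadata(
    title="Example survey dataset",      # placeholder values, not a real dataset
    owner="Example Data Centre",
    creator="Example Research Group",
    licence="Open Data Commons Public Domain Dedication and Licence (PDDL)",
    subjects=["Demography", "Economics"],
)
print(record.to_json())
```

Keeping owner and creator as separate fields reflects the distinction the slide draws; a real repository would map these onto an agreed vocabulary.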
Licences
Creative Commons licences (apart from CCZero), GPL, BSD, etc. are NOT quite appropriate as open data licences
Open Data Licences
• Open Data Commons Public Domain Dedication and Licence (PDDL)
  • Dedicate to the Public Domain (all rights waived)
• Open Data Commons Attribution License
  • Attribution for data(bases)
• Open Data Commons Open Database License (ODbL)
  • Attribution-ShareAlike for data(bases)
• Creative Commons CCZero
  • Dedicate to the Public Domain (all rights waived)
Amazon Web Services (AWS)
Public Data Sets on AWS
• Annotated Human Genome Data provided by Ensembl
  • The Ensembl project produces genome databases for humans as well as almost 50 other species, and makes this information freely available.
• Various US Census Databases from the US Census Bureau
  • Demographic data
  • US Censuses
  • Summary information about Business and Industry
  • Economic Household Profile Data
• UniGene provided by the National Center for Biotechnology Information
Astronomy
Sloan Digital Sky Survey DR6 Subset
Biology
• Influenza Virus (including updated Swine Flu sequences)
• Ensembl Annotated Human Genome Data - for MySQL
• GenBank
Chemistry
• PubChem Library
  • A data set of information on the biological activities of small molecules
• 3D Version of the PubChem Library
• UGI Virtual Conformer Library
  • 500,000 molecules for virtual screening
Climate
Daily Global Weather Measurements, 1929-2009
Economics
• Federal Reserve Economic Data
• Transportation Databases
• Labor Statistics Databases
• US Census
• Business and Industry Summary Data
Digital Curation
• Collecting verifiable digital assets
• Providing digital asset search and retrieval
• Certification of the trustworthiness and integrity of the collection content
• Semantic and ontological continuity and comparability of the collection content
• Use of open standards (formats) for long-term preservation and future-proofing by migration of data
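Certifying the integrity of collection content is commonly done with fixity checks: record a checksum for every file at ingest and recompute it later to detect silent changes. The sketch below uses SHA-256 from Python's standard library; the manifest layout and the function names are illustrative assumptions, not the format of any particular repository.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks, so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under `root` (as a relative path) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches the manifest."""
    return [
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]


# Demonstration on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_text("year,value\n1955,100\n")
    manifest = build_manifest(root)
    before = verify(root, manifest)  # nothing has changed yet
    (root / "data.csv").write_text("year,value\n1955,999\n")
    after = verify(root, manifest)   # the edited file is flagged

print(before, after)
```

In practice the manifest itself would be stored separately from the data (or signed), since a checksum kept next to the file it protects can be altered along with it.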
Technology
• Data repositories are much larger than OA repositories
• Cloud computing is a good solution (AWS uses it)
• Semantic Web & Linked Data (linking data through various methods)
Resource Description in terms of Metadata and Ontology
RDF: Resource Description Framework SKOS: Simple Knowledge Organization
System OWL: Web Ontology Language
SPARQL: SPARQL Protocol and RDF Query Language
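RDF describes resources as (subject, predicate, object) triples, and a SPARQL query is built from triple patterns in which some terms are variables. The toy Python sketch below is not a SPARQL engine; it only illustrates how a conjunction of triple patterns (a "basic graph pattern") is matched against a small graph. The `cd:` shorthand and the helper names are assumptions made for the example.

```python
from __future__ import annotations

Triple = tuple[str, str, str]

CD = "http://www.hmv.com/cd/Dil-E-Naddan"

# A few statements from the CD example as (subject, predicate, object) triples;
# "cd:" is shorthand for the example namespace http://www.recshop.fake/cd#
GRAPH: list[Triple] = [
    (CD, "cd:artist", "Talat Mahamed"),
    (CD, "cd:artist", "Suraiya"),
    (CD, "cd:company", "HMV"),
    (CD, "cd:year", "1955"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are treated as variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: dict[str, str]) -> dict[str, str] | None:
    """Extend `binding` so that `pattern` matches `triple`, or return None if it cannot."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.get(p, t) != t:  # variable already bound to a different value
                return None
            new[p] = t
        elif p != t:  # constant terms must match exactly
            return None
    return new


def select(graph: list[Triple], patterns: list[Triple]) -> list[dict[str, str]]:
    """Evaluate a conjunction of triple patterns by nested-loop joins."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in graph
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


# Roughly: SELECT ?cd ?artist WHERE { ?cd cd:year "1955" . ?cd cd:artist ?artist }
results = select(GRAPH, [("?cd", "cd:year", "1955"), ("?cd", "cd:artist", "?artist")])
for row in results:
    print(row["?artist"])
```

The shared variable `?cd` is what joins the two patterns; real SPARQL engines use indexes and query planning rather than nested loops.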
RDF Example
Title: Dil-E-Naddan
Artist: Talat Mahamed
Artist: Suraiya
Company: HMV
Country: India
Price: Rs.100
Year: 1955
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.hmv.com/cd/Dil-E-Naddan">
    <cd:artist>Talat Mahamed</cd:artist>
    <cd:artist>Suraiya</cd:artist>
    <cd:country>India</cd:country>
    <cd:company>HMV</cd:company>
    <cd:price>Rs. 100</cd:price>
    <cd:year>1955</cd:year>
  </rdf:Description>
</rdf:RDF>
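To see that this RDF/XML is just a set of subject-predicate-object statements, the sketch below extracts the triples with Python's standard `xml.etree.ElementTree`. It is deliberately limited to the simple pattern used here (`rdf:Description` with `rdf:about` and literal-valued properties); full RDF/XML allows many other forms, so a real application would use a dedicated RDF parser. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.hmv.com/cd/Dil-E-Naddan">
    <cd:artist>Talat Mahamed</cd:artist>
    <cd:artist>Suraiya</cd:artist>
    <cd:country>India</cd:country>
    <cd:company>HMV</cd:company>
    <cd:price>Rs. 100</cd:price>
    <cd:year>1955</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def rdfxml_to_triples(xml_text: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) triples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}localname"; rejoin them into a full URI
            ns, local = prop.tag[1:].split("}", 1)
            triples.append((subject, ns + local, (prop.text or "").strip()))
    return triples


for s, p, o in rdfxml_to_triples(RDF_XML):
    print(s, p, o)
```

Each `<cd:...>` child becomes one triple, and the two `cd:artist` elements simply yield two triples with the same subject and predicate.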
SKOS Example
prefLabel - The preferred term
altLabel - Alternative terms (the "See" references that point to this record)
narrower - Contains the related narrower term
broader - Contains a sub-element for the authority type, which contains the related broader term
related - Contains a related term at the same level in the hierarchy
scopeNote - Note information
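Two SKOS rules are handy when maintaining a vocabulary: `skos:narrower` is the inverse of `skos:broader`, and `skos:related` is symmetric, so neither needs to be entered twice by hand. The Python sketch below applies those rules to a hypothetical toy vocabulary (the concept identifiers and labels are invented for illustration) and shows how an altLabel acts as a "See" reference to the preferred term.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy vocabulary keyed by concept identifier
PREF_LABEL = {"phys": "Physical sciences", "astro": "Astronomy", "chem": "Chemistry"}
ALT_LABEL = {"astro": {"Astronomical science"}, "chem": {"Chemical sciences"}}
BROADER = {"astro": {"phys"}, "chem": {"phys"}}
RELATED = {"astro": {"chem"}}


def resolve(label: str) -> str | None:
    """Follow a 'See' reference: map a preferred or alternative label to the preferred label."""
    for concept, pref in PREF_LABEL.items():
        if label == pref or label in ALT_LABEL.get(concept, set()):
            return pref
    return None


def infer_narrower(broader: dict[str, set[str]]) -> dict[str, set[str]]:
    """Derive skos:narrower as the inverse of skos:broader."""
    narrower: dict[str, set[str]] = defaultdict(set)
    for concept, parents in broader.items():
        for parent in parents:
            narrower[parent].add(concept)
    return dict(narrower)


def symmetric(related: dict[str, set[str]]) -> dict[str, set[str]]:
    """skos:related is symmetric: if A is related to B, then B is related to A."""
    closed: dict[str, set[str]] = defaultdict(set)
    for a, others in related.items():
        for b in others:
            closed[a].add(b)
            closed[b].add(a)
    return dict(closed)


print(resolve("Chemical sciences"))
print(infer_narrower(BROADER))
print(symmetric(RELATED))
```

Note that SKOS does not make `skos:broader` transitive, so the sketch deliberately derives only direct narrower links.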
DBpedia Data Set
• Multi-domain ontology derived from Wikipedia
• 3.77 million “things” (entities - Entitypedia)
• 400 million “facts”
• Uses YAGO (Yet Another Great Ontology)
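DBpedia's facts can be queried over the web with SPARQL. The sketch below only builds a SPARQL 1.1 Protocol "query via GET" request against DBpedia's public endpoint and does not send it. The resource URI for Suraiya is an assumption, and the endpoint's availability, limits and exact data are outside this example's control.

```python
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint (availability not guaranteed)
ENDPOINT = "https://dbpedia.org/sparql"

# dbo:Film and dbo:starring are DBpedia ontology terms; the Suraiya resource URI is assumed
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?film ?label WHERE {
  ?film a dbo:Film ;
        dbo:starring <http://dbpedia.org/resource/Suraiya> ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
print(req.full_url[:80])

# Sending it needs network access, so it is left commented out here:
# import json
# with urllib.request.urlopen(req, timeout=30) as resp:
#     for row in json.load(resp)["results"]["bindings"]:
#         print(row["label"]["value"])
```

The query text travels URL-encoded in the `query` parameter, and the `Accept` header selects the standard SPARQL JSON results format.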
Entitypedia
• Multilingual controlled vocabulary
• Entity matching
• Data quality and type checking
• Entity type specific services
• Semantic or faceted search and navigation on entities
• Summarization of entities and concepts
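Entity matching and type checking can be illustrated with a deliberately naive approach: normalise names, compare them only against candidates of the same entity type, and accept the best match above a similarity threshold. This Python sketch uses `difflib.SequenceMatcher` as a stand-in similarity measure; it is not Entitypedia's method, and the 0.85 threshold and candidate list are arbitrary assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def match_entity(name, entity_type, candidates, threshold=0.85):
    """Return the most similar candidate of the same type, or None if nothing is close enough.

    `candidates` is a list of (name, type) pairs; the type filter is a simple form of type checking.
    """
    key = normalise(name)
    best, best_score = None, 0.0
    for cand_name, cand_type in candidates:
        if cand_type != entity_type:
            continue  # never match, e.g., a Person against a Country
        score = SequenceMatcher(None, key, normalise(cand_name)).ratio()
        if score > best_score:
            best, best_score = cand_name, score
    return best if best_score >= threshold else None


KNOWN = [("Talat Mahamed", "Person"), ("Suraiya", "Person"), ("India", "Country")]
print(match_entity("TALAT  MAHAMED", "Person", KNOWN))  # Talat Mahamed
print(match_entity("India", "Person", KNOWN))           # None
```

Production entity matching typically combines several signals (aliases, context, types from an ontology) rather than a single string ratio.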
DRTC Projects
• Living Knowledge (EC-funded project)
• ITPAR: India-Trento Program for Advanced Research (work on Entitypedia)
• CHAIN-REDS (EC-funded project): Coordination and Harmonization of Advanced e-Infrastructures for Research & Education Data Sets
Thank You