Peter Fox
Data Science - CSCI-6961-01
Week 8, October 25, 2011
Academic Basis for Data and Information Science, Data Models, Schema, Data Tools and Data as Service Paradigms + Class Project Definition
Contents
• Informatics
• Data models
• Schema
• Tools
• Markup languages
• Data as service
• Projects!
Definitions (revisited)
• Data - are pieces of <x> that represent the qualitative or quantitative attributes of a variable or set of variables.
• Data (plural of "datum", which is seldom used) - are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables.
• Data - are often viewed as the lowest level of abstraction from which information and knowledge are derived
Definitions ctd.
• Information
– Representations (of facts? data?) in a form that lends itself to human use
• Knowledge
– … meaning
• Metadata – data about data
• Metainformation – information about information
• Data documentation – integrated collection of information and metadata intended to support all aspects of data (find, access, use…)
Data-Information-Knowledge Ecosystem
[Figure: data, information, and knowledge flowing between producers and consumers; labels: Creation, Gathering, Presentation, Organization, Integration, Conversation, Context, Experience]
Mind the gap
• As we aim to use modern technology to advance data science:
• There is often a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure comprises the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)
A moment of history
• In the late 1950s (around 1957-1958) the modern term "informatics" was coined
• It existed as one field for a while, then split into library science and computer science, which developed separately and became disconnected
• Now coming back to be relevant to science
• Informatics IS NOT just having a scientist work with an “IT/ICT” person (NOT, NOT, NOT)
Advertisement
• Spring 2012 – Xinformatics
• http://tw.rpi.edu/web/courses/Xinformatics/2012
Library science
• Curates the artifacts of knowledge
• Organizes and manages them for consumers
– Cataloging and classification
• Preservation
– ‘maintaining or restoring access to artifacts, documents and records through the study, diagnosis, treatment and prevention of decay and damage’ (Wikipedia)
• Digital age
– Curation and preservation
Cognitive Science
• Cognitive science is an interdisciplinary study of the mind and intelligence
• It operates at the intersection of psychology, philosophy, computer science, linguistics, anthropology, and neuroscience.
• Of relevance for data and information science are three significant theoretical underpinnings
– mental representation,
– the nature of expertise,
– and intuition
• Very relevant to model, data/metadata choice
Social Science
• Branch of humanities
• Especially as it relates to networks of scientists
• Exploits sociology of groups, teams
• Cultural norms as well as discipline norms
– Modes of what and how rewards are given
– Between those who produce and those who consume data (and information)
– More
Information theory
• Semiotics, also called semiotic studies or semiology, is the study of sign processes (semiosis), or signification and communication, signs and symbols. It is usually divided into three branches:
– Syntactics: Relation of signs to each other in formal structures
– Semantics: Relation between signs and the things to which they refer; their denotata
– Pragmatics: Relation of signs to their impacts on those who use them
(Information) Architecture
• Definition:
– “is the art of expressing a model or concept of information used in activities that require explicit details of complex systems” (Wikipedia)
– “… I mean architect as in the creating of systemic, structural, and orderly principles to make something work - the thoughtful making of either artifact, or idea, or policy that informs because it is clear.” (Wurman)
Information Models
• Conceptual models, sometimes called domain models, are typically used to explore domain concepts
• High-level conceptual models are often created as part of initial requirements envisioning efforts as they are used to explore the high-level static business or science or medicine structures and concepts.
• Conceptual models are often created as the precursor to logical models or as alternatives to them
• Followed by logical and physical models
Data Models
• Conceptual data models, sometimes called domain models, are typically used to explore domain concepts
• High-level conceptual models are often created as part of initial requirements envisioning efforts as they are used to explore the high-level static business structures and concepts.
• Conceptual data models are often created as the precursor to logical data models (LDMs) or as alternatives to them.
Conceptual model
Data Models
• Logical data models (LDMs).
• LDMs are used to explore the domain concepts, and their relationships, of your problem domain.
• This could be done for the scope of a single project or for your entire enterprise.
• LDMs depict the logical entity types, typically referred to simply as entity types, the data attributes describing those entities, and the relationships between the entities.
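As a rough illustration of what an LDM captures, the sketch below writes entity types, their attributes, and the relationships between them as Python dataclasses. The entity names (`Order`, `Address`, `Item`) and all sample values are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Address:
    """Entity type: where an order is shipped (hypothetical)."""
    name: str
    street: str
    city: str
    country: str


@dataclass
class Item:
    """Entity type: one line of an order (hypothetical)."""
    title: str
    quantity: int
    price: Decimal
    note: Optional[str] = None  # optional attribute


@dataclass
class Order:
    """Entity type: an order, related to one Address and many Items."""
    order_id: str
    order_person: str
    ship_to: Address                                 # one-to-one relationship
    items: List[Item] = field(default_factory=list)  # one-to-many relationship

    def total(self) -> Decimal:
        """Sum of quantity * price over all items."""
        return sum((i.price * i.quantity for i in self.items), Decimal("0"))


order = Order(
    order_id="A-1",
    order_person="Example Person",
    ship_to=Address("Example Recipient", "1 Example Street", "Example City", "Nowhere"),
    items=[
        Item("First title", 1, Decimal("10.90")),
        Item("Second title", 2, Decimal("9.90")),
    ],
)
print(order.total())
```

Nothing here says how the data are stored; that decision belongs to the physical model.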
Logical model
Data Models
• Physical data models (PDMs).
• PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables.
• PDMs often prove to be useful on a range of applications
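A physical model can be made concrete with a small database schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`ship_order`, `item`, and so on) and the sample rows are hypothetical, not taken from the lecture.

```python
import sqlite3

# Hypothetical physical model: two tables and a foreign-key relationship.
DDL = """
CREATE TABLE ship_order (
    order_id     TEXT PRIMARY KEY,
    order_person TEXT NOT NULL
);
CREATE TABLE item (
    item_id  INTEGER PRIMARY KEY,
    order_id TEXT NOT NULL REFERENCES ship_order(order_id),
    title    TEXT NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    price    REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(DDL)

conn.execute("INSERT INTO ship_order VALUES (?, ?)", ("A-1", "Example Person"))
conn.executemany(
    "INSERT INTO item (order_id, title, quantity, price) VALUES (?, ?, ?, ?)",
    [("A-1", "First title", 1, 10.90), ("A-1", "Second title", 2, 9.90)],
)

# The relationship between the tables is exercised with a join.
rows = conn.execute(
    "SELECT o.order_person, COUNT(*) FROM ship_order AS o "
    "JOIN item AS i ON i.order_id = o.order_id GROUP BY o.order_id"
).fetchall()
print(rows)
```

Column types, keys, and constraints are exactly the details a conceptual or logical model leaves out.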
Physical model
However, as a consumer
• Do you ever really see these data models?
• What’s the most common form of making data available to others?
• What’s the most common means? Second most common?
Example XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="shiporder.xsd">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>4000 Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item>
    <title>Empire </title>
    <note>Special Edition</note>
    <quantity>1</quantity>
    <price>10.90</price>
  </item>
  <item>
    <title>Hide your heart</title>
    <quantity>1</quantity>
    <price>9.90</price>
  </item>
</shiporder>
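To show how a consumer might read such a document, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The ship order is copied from the slide, with the `xsi` attributes left out for brevity; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Ship order from the slide, embedded as bytes so the declared encoding is honored.
SHIPORDER_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>4000 Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item>
    <title>Empire </title>
    <note>Special Edition</note>
    <quantity>1</quantity>
    <price>10.90</price>
  </item>
  <item>
    <title>Hide your heart</title>
    <quantity>1</quantity>
    <price>9.90</price>
  </item>
</shiporder>
"""

root = ET.fromstring(SHIPORDER_XML)
order_id = root.get("orderid")       # attribute on the root element
city = root.findtext("shipto/city")  # nested element reached by path

total = Decimal("0")
for item in root.findall("item"):
    title = item.findtext("title").strip()  # the first title has a trailing space
    note = item.findtext("note")            # None when <note> is absent
    quantity = int(item.findtext("quantity"))
    price = Decimal(item.findtext("price"))
    total += quantity * price
    print(title, note, quantity, price)

print(order_id, city, total)
```

Every value arrives as text; the conversions to `int` and `Decimal` are decisions the consumer has to make, guided by the schema if one is available.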
Very simple schema
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shiporder">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="orderperson" type="xs:string"/>
      <xs:element name="shipto">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="address" type="xs:string"/>
            <xs:element name="city" type="xs:string"/>
            <xs:element name="country" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="item" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="note" type="xs:string" minOccurs="0"/>
            <xs:element name="quantity" type="xs:positiveInteger"/>
            <xs:element name="price" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="orderid" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
</xs:schema>
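Because a schema is itself an XML document, it can be read with the same tools as the data. The sketch below does not validate anything (Python's standard library has no XSD validator); it only lists each declared element with its type and occurrence bounds. The helper name `declared_elements` is made up for this example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace prefix as ElementTree writes it

SCHEMA = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shiporder">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="orderperson" type="xs:string"/>
   <xs:element name="shipto">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
   <xs:element name="item" maxOccurs="unbounded">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
      <xs:element name="price" type="xs:decimal"/>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
  </xs:sequence>
  <xs:attribute name="orderid" type="xs:string" use="required"/>
 </xs:complexType>
</xs:element>
</xs:schema>
"""


def declared_elements(schema_root):
    """Return {element name: (type, minOccurs, maxOccurs)} for every xs:element."""
    out = {}
    for el in schema_root.iter(XS + "element"):
        out[el.get("name")] = (
            el.get("type", "(anonymous complex type)"),
            el.get("minOccurs", "1"),  # XSD default when the attribute is omitted
            el.get("maxOccurs", "1"),  # XSD default when the attribute is omitted
        )
    return out


elements = declared_elements(ET.fromstring(SCHEMA))
for name, (typ, lo, hi) in elements.items():
    print(f"{name:12} {typ:28} min={lo} max={hi}")
```

The listing makes the constraints visible at a glance: `note` is optional and `item` may repeat without limit.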
Markup Languages
• Reminder:
– Mixes data and metadata, and yes, information
– Tag structure does not always model the underlying data structure
– Modeling the XML itself, i.e. the schema, is another task
– Does have the potential benefit that it is more for use than storage
• Parsing the file:
– Incomplete versus complete tags
– Empty or optional fields
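The parsing concerns above can be sketched with Python's standard `xml.etree.ElementTree`. The function name `item_note` and the small test strings are invented for illustration; the point is only that a parser has to distinguish a malformed document, a missing optional field, and a field that is present but empty.

```python
import xml.etree.ElementTree as ET


def item_note(xml_text):
    """Classify the optional <note> of an <item>: malformed, missing, empty, or its text."""
    try:
        item = ET.fromstring(xml_text)
    except ET.ParseError:
        return "malformed"  # e.g. an opening tag with no matching close
    note = item.find("note")
    if note is None:
        return "missing"    # optional field left out entirely
    if note.text is None or not note.text.strip():
        return "empty"      # tag present but without content
    return note.text.strip()


print(item_note("<item><title>Empire</title><note>Special Edition</note></item>"))
print(item_note("<item><title>Hide your heart</title></item>"))
print(item_note("<item><title>Empire</title><note/></item>"))
print(item_note("<item><title>Empire</title><note>Special Edition</item>"))
```

Whether "missing" and "empty" should be treated alike is a modeling decision, not something the parser can settle.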
Data tools (just a few)
• Models
– http://www.datamodel.org/
– MSDN: http://msdn.microsoft.com/en-us/library/bb399249.aspx
• Schema
– The Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented which are inconvenient and difficult in grammar-based schema languages. If you know XPath or the XSLT expression language, you can start to use The Schematron immediately.
– http://www.schematron.com/
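The idea of checking tree patterns rather than a grammar can be imitated in a few lines of Python. This is not Schematron itself (real Schematron rules are written in XML and use full XPath); it is a sketch in the same spirit, using the limited path syntax of `xml.etree.ElementTree`. The rules, the `check` helper, and the deliberately broken sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Deliberately broken sample: the second item has no price and a quantity of 0.
DOC = ET.fromstring(
    "<shiporder orderid='889923'>"
    "<item><title>Empire</title><quantity>1</quantity><price>10.90</price></item>"
    "<item><title>Hide your heart</title><quantity>0</quantity></item>"
    "</shiporder>"
)

# Each rule: (context path, test applied to every matching node, message).
RULES = [
    (".//item", lambda n: n.find("price") is not None, "item must have a price"),
    (".//item", lambda n: int(n.findtext("quantity", "0")) > 0, "quantity must be positive"),
    (".", lambda n: n.get("orderid") is not None, "order must carry an orderid"),
]


def check(doc, rules):
    """Return one message per (node, rule) pair whose test fails."""
    failures = []
    for context, test, message in rules:
        for node in doc.findall(context):
            if not test(node):
                failures.append(f"{node.findtext('title', node.tag)}: {message}")
    return failures


for failure in check(DOC, RULES):
    print(failure)
```

A rule such as "quantity must be positive" is awkward to state in a purely grammar-based schema but natural as a pattern with an assertion.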
Markup Language tools
• Any context-sensitive editor
• XMLSpy, XML Notepad, XML Editor, oXygen
Data as Service
• Modern internet architectures allow for
– Service oriented architectures
– Resource oriented architectures
• Why is this important for data models, schema, etc.?
– Hides/obscures underlying model, schemas
– Service interfaces are often a poor/hybrid match for underlying models
• UML and ISO 19xxx family of standards, e.g. 19135, are changing the landscape
• Mature in certain settings.
Open Geospatial Consortium
• Web Feature Service (WFS)
– http://www.opengeospatial.org/standards/wfs
– Supports INSERT, UPDATE, DELETE, LOCK, QUERY and DISCOVERY operations on geographic features using HTTP as the distributed computing platform
– Built on Geography Markup Language (GML)
• Tutorial
– http://docs.codehaus.org/display/MAP/WFS+Tutorial
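A WFS query is typically sent as key-value pairs over HTTP. The sketch below only builds such a request URL and makes no network call; the endpoint `https://example.org/wfs` and the feature type `demo:rivers` are placeholders, not real services.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wfs_getfeature_url(endpoint, type_name, max_features=10, version="1.1.0"):
    """Build a key-value-pair WFS GetFeature request URL (no network access)."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeName": type_name,        # WFS 2.0 renames this parameter to typeNames
        "maxFeatures": max_features,  # WFS 2.0 uses count instead
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and feature type, for illustration only.
url = wfs_getfeature_url("https://example.org/wfs", "demo:rivers")
print(url)

# Decoding the query string shows what the service would receive.
query = parse_qs(urlsplit(url).query)
print(query["request"], query["typeName"])
```

The response to such a request would be GML, i.e. the features themselves rather than a picture of them.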
WFS examples
Open Geospatial Consortium
• Web Map Service (WMS)
– http://www.opengeospatial.org/standards/wms
– Produces maps of spatially referenced data dynamically from geographic information (a "map" is a portrayal of geographic information as a digital image file suitable for display on a computer screen). A map is not the data itself. WMS-produced maps are generally rendered in a pictorial format such as PNG, GIF or JPEG, or occasionally as vector-based graphical elements in Scalable Vector Graphics (SVG) format.
– http://www.intl-interfaces.com/cookbook/WMS/
– http://oceanesip.jpl.nasa.gov/esipde/guide.html
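Since a WMS returns a rendered image, the request has to state the image size and format along with the area of interest. A minimal sketch of a WMS 1.1.1 GetMap request follows; the endpoint and the layer name `demo:sst` are placeholders, and deriving the height from the bounding-box aspect ratio is a choice made for this example, not a requirement of the standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wms_getmap_url(endpoint, layer, bbox, width, image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL; height follows the bounding-box aspect ratio."""
    min_x, min_y, max_x, max_y = bbox
    height = round(width * (max_y - min_y) / (max_x - min_x))
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",                            # empty value asks for the default style
        "srs": "EPSG:4326",                      # WMS 1.3.0 calls this parameter crs
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "format": image_format,                  # an image type, not a data format
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer, for illustration only.
url = wms_getmap_url("https://example.org/wms", "demo:sst", (-180, -90, 180, 90), 720)
query = parse_qs(urlsplit(url).query)
print(query["bbox"], query["width"], query["height"], query["format"])
```

The `format` parameter is where "a map is not the data itself" becomes visible: what comes back is a PNG, not the values behind it.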
Open Geospatial Consortium
• Web Coverage Service (WCS)
– http://www.opengeospatial.org/standards/wcs
– Supports electronic interchange of geospatial data as "coverages" – that is, digital geospatial information representing space-varying phenomena
Open Geospatial Consortium
• Sensor Observation Service (SOS)
– http://www.opengeospatial.org/standards/sos
• SWE Common
– http://www.opengeospatial.org/projects/groups/swecommonswg
– Get_capabilities
IVOA (www.ivoa.net)
• Simple Image Access Protocol
– http://ivoa.net/Documents/SIA/20091008/PR-SIA-1.0-20091008.pdf
– This specification defines a protocol for retrieving image data from a variety of astronomical image repositories through a uniform interface. The interface is meant to be reasonably simple to implement by service providers. A query defining a rectangular region on the sky is used to query for candidate images.
– The service returns a list of candidate images formatted as a VOTable. For each candidate image an access reference URL may be used to retrieve the image. Images may be returned in a variety of formats including FITS and various graphics formats. Referenced images are often computed on the fly, e.g., as cutouts from larger images.
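The response side of this exchange can be sketched as follows: a VOTable is XML, so the list of candidate images can be pulled out with a standard parser. The table below is a hand-written, namespace-free stand-in, not real service output, and the example.org URLs are placeholders. The UCD strings are meant to follow the SIA 1.0 convention and should be checked against the specification. Real responses typically declare a VOTable XML namespace, in which case the paths would need namespace prefixes.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an SIA response (hypothetical rows and URLs).
VOTABLE = """
<VOTABLE>
 <RESOURCE type="results">
  <TABLE>
   <FIELD name="title" datatype="char" ucd="VOX:Image_Title"/>
   <FIELD name="format" datatype="char" ucd="VOX:Image_Format"/>
   <FIELD name="url" datatype="char" ucd="VOX:Image_AccessReference"/>
   <DATA><TABLEDATA>
    <TR><TD>cutout A</TD><TD>image/fits</TD><TD>https://example.org/img?id=1</TD></TR>
    <TR><TD>cutout B</TD><TD>image/jpeg</TD><TD>https://example.org/img?id=2</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>
"""


def candidate_images(votable_text):
    """Return one dict per table row, keyed by the UCD of each column."""
    table = ET.fromstring(votable_text).find("RESOURCE/TABLE")
    ucds = [f.get("ucd") for f in table.findall("FIELD")]
    rows = table.findall("DATA/TABLEDATA/TR")
    return [dict(zip(ucds, (td.text for td in tr.findall("TD")))) for tr in rows]


images = candidate_images(VOTABLE)
fits_urls = [
    image["VOX:Image_AccessReference"]
    for image in images
    if image["VOX:Image_Format"] == "image/fits"
]
print(fits_urls)
```

Keying columns by UCD rather than by position or name is what lets one client work against many repositories with differently laid-out tables.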
IVOA (www.ivoa.net)
• Simple Spectral Access Protocol
– http://ivoa.net/Documents/REC/DAL/SSA-20080201.pdf
– The Simple Spectral Access (SSA) Protocol (SSAP) defines a uniform interface to remotely discover and access one-dimensional spectra. SSA is a member of an integrated family of data access interfaces altogether comprising the Data Access Layer (DAL) of the IVOA.
– SSA is based on a more general data model capable of describing most tabular spectrophotometric data, including time series and spectral energy distributions (SEDs) as well as 1-D spectra; however the scope of the SSA interface as specified in this document is limited to simple 1-D spectra, including simple aggregations of 1-D spectra.
Discussion
• Theoretical concepts?
• Data models?
• Schema?
• Tools?
• Service paradigms?
• Relation to data management?
Summary
• Informatics as a new field
• Data models and schema and the tools that go with them are plentiful
• Modern use of XML and specific markup languages obscures the underlying data structure (physical and logical) but has other advantages
• Data as service carries this to another level
Week 8 – part 2
Class exercise - Working with someone else’s data
[Figure: Producers versus Consumers; paired labels: Fitness for Purpose / Fitness for Use, Trustee / Trustor; further labels: Quality Control, Quality Assessment]
Reading – data resources
• Department of Energy EIA
• Humanities - Digging into Data
• Environmental Protection Agency (EPA)
• US Geological Survey (and state surveys) (USGS)
• NASA Earth Observing System (EOS) and ECHO
• National Oceanic and Atmospheric Administration (NOAA) NODC, NGDC, NCDC
• Department of Energy (DoE)
• National Library of Medicine (NLM)
• Cancer Grid (CaBIG)
• OneGeology
• data.gov
• One of your own
Contents
• Is it possible to use someone else’s data – a quote from the trenches
• Understanding appropriate data sources
• Finding and accessing them
• Using them
• Defining a project
From Carole Goble (Manchester)
• “Scientists would rather share their toothbrush than their data”
• However, some are made to share…
Appropriate Data Sources
• Remember the data management principles
• Goal and investigation?
• What is of interest?
Finding and Accessing
• From the lists already provided?
• Web search?
Using
• The access interface
– Form?
– Web service?
– Limits?
• Formats?
• Metadata conventions?
Defining a Project: Class Exercise
• Using someone else’s data
– You will use someone else’s toothbrush
– 6 groups of 5+ people
– Let’s discuss some options
• Intent is to mix your skills/ interest and carry out a challenging data (collect/ manage/ use) exercise
• Remainder of this class is to search, formulate, develop your ideas and discuss and seek guidance
What is next
• Assignment 4 on the wiki to guide your efforts
– Write-up due Nov. 22; presentation due Nov. 29
• Note: the term assignment is also due Nov. 29, but it is written and individual; it will be handed out on Nov. 15 (week 11).
• Next week
– Um….
• Reading:
– None
Groups 2011
• A – Grace, Jay, Louis, Abhirami, Linyun
• B – Jason, Geok, Charisma, Naveen, Zack
• C – Stephanie, Michael, Bhavana, Thirun, Amruta, Buster
• D – Jeff, Anshuman, Nupoor, Yiyi, Daniel
• E – Ram, Peter, Nikut, Jon, Apurva
• F – Benno, Kiran, Akeem, Sapan, Randy, Misbah