data management david nathan & peter austin & robert munro

24
Data Management David Nathan & Peter Austin & Robert Munro

Upload: leon-roberts

Post on 12-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Management David Nathan & Peter Austin & Robert Munro

Data Management

David Nathan & Peter Austin

& Robert Munro

Page 2: Data Management David Nathan & Peter Austin & Robert Munro

This section

1. Data management

2. Properties of data

3. Relational data model

4. XML

5. Example

Page 3: Data Management David Nathan & Peter Austin & Robert Munro

Workflows - description vs documentation

something inscribed

something happened

you applied knowledge, made decisions

NOT OF INTEREST!

representations, lists, summaries, analysescleaned up,

selected, analysedarchived, presented, published

something happened

recordingyou applied knowledge, techniques FOCUS OF

INTEREST!

representations, eg transcription, annotationmade decisions,

applied linguistic knowledge archived & ... ??

recapitulates

Description

Documentation

Page 4: Data Management David Nathan & Peter Austin & Robert Munro

Choosing values/priorities

Standards & compliance Adeptness with tools Modelling of phenomena, architecture of data Dissemination/publishing Preserving Ethics, responsibility, protocol Range, comprehensiveness Intellectual rigour

Which are priorities? Which are dispensible?

Page 5: Data Management David Nathan & Peter Austin & Robert Munro

Data should be:

explicitconsistentrobustmeaningfulconventionaladaptable, convertible, machine readable etcuseful!

Page 6: Data Management David Nathan & Peter Austin & Robert Munro

“Portability”

Bird and Simons 2003:

language documentation data needs to have integrity, flexibility, longevity

Page 7: Data Management David Nathan & Peter Austin & Robert Munro

“Portability”

completeexplicitdocumentedpreservabletransferableaccessibleadaptablenot technology-specific(also appropriate, accurate, useful etc!!)

Page 8: Data Management David Nathan & Peter Austin & Robert Munro

Data management

the way that data is structured is also information, that may be complex

properly structured data allows:usage including manipulation, conversion, derivationpreservationmachine readability

Page 9: Data Management David Nathan & Peter Austin & Robert Munro

Data management systems

a data management system is a system you design for storing data and metadata:information about content and structuresrelationship between units of information

it is not necessarily tied to any particular software, or even a computer

Page 10: Data Management David Nathan & Peter Austin & Robert Munro

Naive managment using filenames

a (too) simple management system:information about a recording is captured in the

filenames:1st_int_john_5Aug.wav

market_conv_mj.wav

….

what does ‘int’ mean? what information about the recording is missing?

Page 11: Data Management David Nathan & Peter Austin & Robert Munro

Data modeling

World/universeDomainRelevant

entitiespropertiesrelationships

We also need formal ways to represent these

Page 12: Data Management David Nathan & Peter Austin & Robert Munro

Data modeling

data modelling is the process of designing your data management system:what information do you need to record?what are the units of information?what are their properties (attributes)?what are the relationships between the units of

information?how is the information etc likely to change in the future?how can all this be represented?

Page 13: Data Management David Nathan & Peter Austin & Robert Munro

Data management

two well-known formats for structured data:relational databaseeXtensible Markup Language (XML)

these are methods, not softwares or hardwaresany system for well-structured data could be OK,

but generally:smaller community of users so less tools and support ... so errors more likely

Page 14: Data Management David Nathan & Peter Austin & Robert Munro

Databases

Note that database has 3 senses:a body of related informationtype of software (eg Oracle, Access, Filemaker)a model for the domain of information (ie. formulation

of entities and relationships)

Page 15: Data Management David Nathan & Peter Austin & Robert Munro

Relational format

Uses tablesTable rows represent entities in a domainTable columns represent properties/attributes of

entitiesEach cell represents one atomic unit of dataThe order of rows and columns has no

significance

Page 16: Data Management David Nathan & Peter Austin & Robert Munro

Representing a relational design

field name

TABLE NAME

simplest example

Page 17: Data Management David Nathan & Peter Austin & Robert Munro

Representing a relational design

field 1

TABLE NAME

less trivial entity

field 2

Page 18: Data Management David Nathan & Peter Austin & Robert Munro

Representing a relational design

less trivial domain

name

CONTINENT

name

COUNTRY

= one to many

Page 19: Data Management David Nathan & Peter Austin & Robert Munro

Non-trivial domains

non-trivial domains have many-to-many relationships

name

AUTHOR

name

SUBJECT

.....

.....

Page 20: Data Management David Nathan & Peter Austin & Robert Munro

From model to implementation

implementing table relationships

name

CONTINENT

name

COUNTRY

id id continent_id

Page 21: Data Management David Nathan & Peter Austin & Robert Munro

Designing a database

Determine the domain, entities and relationshipsExperiment with scenariosAny non-trivial model will evolve as it is thought

out and testedNormalisation is the process of refining models

Page 22: Data Management David Nathan & Peter Austin & Robert Munro

Practical example

Create a database model for some audio metadata

Page 23: Data Management David Nathan & Peter Austin & Robert Munro

What does all this achieve?

conceptual/intellectual validity scalable, searchable, modular machine readable in fact, portable:

complete explicit documented preservable transferable accessible adaptable not technology-specific

Page 24: Data Management David Nathan & Peter Austin & Robert Munro

Stop here!