data management david nathan & peter austin & robert munro

Post on 12-Jan-2016

215 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Data Management

David Nathan & Peter Austin

& Robert Munro

This section

1. Data management

2. Properties of data

3. Relational data model

4. XML

5. Example

Workflows - description vs documentation

something inscribed

something happened

you applied knowledge, made decisions

NOT OF INTEREST!

representations, lists, summaries, analysescleaned up,

selected, analysedarchived, presented, published

something happened

recordingyou applied knowledge, techniques FOCUS OF

INTEREST!

representations, eg transcription, annotationmade decisions,

applied linguistic knowledge archived & ... ??

recapitulates

Description

Documentation

Choosing values/priorities

Standards & compliance Adeptness with tools Modelling of phenomena, architecture of data Dissemination/publishing Preserving Ethics, responsibility, protocol Range, comprehensiveness Intellectual rigour

Which are priorities? Which are dispensible?

Data should be:

explicitconsistentrobustmeaningfulconventionaladaptable, convertible, machine readable etcuseful!

“Portability”

Bird and Simons 2003:

language documentation data needs to have integrity, flexibility, longevity

“Portability”

completeexplicitdocumentedpreservabletransferableaccessibleadaptablenot technology-specific(also appropriate, accurate, useful etc!!)

Data management

the way that data is structured is also information, that may be complex

properly structured data allows:usage including manipulation, conversion, derivationpreservationmachine readability

Data management systems

a data management system is a system you design for storing data and metadata:information about content and structuresrelationship between units of information

it is not necessarily tied to any particular software, or even a computer

Naive managment using filenames

a (too) simple management system:information about a recording is captured in the

filenames:1st_int_john_5Aug.wav

market_conv_mj.wav

….

what does ‘int’ mean? what information about the recording is missing?

Data modeling

World/universeDomainRelevant

entitiespropertiesrelationships

We also need formal ways to represent these

Data modeling

data modelling is the process of designing your data management system:what information do you need to record?what are the units of information?what are their properties (attributes)?what are the relationships between the units of

information?how is the information etc likely to change in the future?how can all this be represented?

Data management

two well-known formats for structured data:relational databaseeXtensible Markup Language (XML)

these are methods, not softwares or hardwaresany system for well-structured data could be OK,

but generally:smaller community of users so less tools and support ... so errors more likely

Databases

Note that database has 3 senses:a body of related informationtype of software (eg Oracle, Access, Filemaker)a model for the domain of information (ie. formulation

of entities and relationships)

Relational format

Uses tablesTable rows represent entities in a domainTable columns represent properties/attributes of

entitiesEach cell represents one atomic unit of dataThe order of rows and columns has no

significance

Representing a relational design

field name

TABLE NAME

simplest example

Representing a relational design

field 1

TABLE NAME

less trivial entity

field 2

Representing a relational design

less trivial domain

name

CONTINENT

name

COUNTRY

= one to many

Non-trivial domains

non-trivial domains have many-to-many relationships

name

AUTHOR

name

SUBJECT

.....

.....

From model to implementation

implementing table relationships

name

CONTINENT

name

COUNTRY

id id continent_id

Designing a database

Determine the domain, entities and relationshipsExperiment with scenariosAny non-trivial model will evolve as it is thought

out and testedNormalisation is the process of refining models

Practical example

Create a database model for some audio metadata

What does all this achieve?

conceptual/intellectual validity scalable, searchable, modular machine readable in fact, portable:

complete explicit documented preservable transferable accessible adaptable not technology-specific

Stop here!

top related