Edinburgh Research Explorer Papers and Presentation Material Pertaining to Meta Data Citation for published version: Burnhill, P 2015, Papers and Presentation Material Pertaining to Meta Data: Visit to Statistics Canada September 25-27, 1989. EDINA, University of Edinburgh, Edinburgh. Link: Link to publication record in Edinburgh Research Explorer Document Version: Peer reviewed version General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 12. Mar. 2021


Edinburgh Research Explorer

Papers and Presentation Material Pertaining to Meta Data

Citation for published version: Burnhill, P 2015, Papers and Presentation Material Pertaining to Meta Data: Visit to Statistics Canada September 25-27, 1989. EDINA, University of Edinburgh, Edinburgh.

Link: Link to publication record in Edinburgh Research Explorer

Document Version: Peer reviewed version

General rights: Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy: The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 12. Mar. 2021


Statistics Canada

Statistique Canada

Papers and Presentation Material Pertaining to Meta Data

From Peter Burnhill, Manager Edinburgh University Data Library

Visit to Statistics Canada September 25-27, 1989



PAPERS AND PRESENTATION MATERIAL

PERTAINING TO META DATA

FOR DISCUSSION PURPOSES ONLY

PETER BURNHILL

MANAGER

EDINBURGH UNIVERSITY DATA LIBRARY

VISIT TO STATISTICS CANADA

SEPTEMBER 25-27, 1989


FOREWORD

I first met Peter Burnhill at the 1989 IASSIST Conference where he presented a paper on work being carried out in the U.K. with respect to the cataloguing of electronic information. I was impressed by his presentation, his manner and by the fact that someone other than a small group at Statistics Canada and a few people in certain international agencies understood the concept of Meta Data - he actually used these two four letter words!

On the occasion of a visit by Peter to New York to plan the 1990 IASSIST meeting, I invited him to Ottawa for a series of meetings. This small volume contains a complete collection of the papers and material used by Peter during his visit to Ottawa.

I would like to thank all of the people that helped to make these meetings a success.

ERNIE S. BOYKO

DIRECTOR

ELECTRONIC DATA DISSEMINATION DIVISION


VISIT BY

PETER BURNHILL

EDINBURGH UNIVERSITY

AT STATISTICS CANADA

September 25-27, 1989

OBJECTIVES


PETER BURNHILL'S VISIT AT STATISTICS CANADA

OBJECTIVES

The objectives of Peter Burnhill's visit at Statistics Canada are:

• to draw attention to the important role that data libraries play in the dissemination of information, especially social data in electronic form;

• to sensitize the staff of Statistics Canada as to the importance of Meta Data in the dissemination process and identify some of the issues that must be dealt with in the development of Meta Data information systems;

• to provide advice and opinions to a group in Statistics Canada regarding their plan to develop a Meta Data information system; and,

• to strengthen professional ties between Statistics Canada and the University of Edinburgh.


Ottawa, Canada

VISIT TO STATISTICS CANADA

PETER BURNHILL

September 25-27, 1989

AGENDA

Monday, September 25

9:00 Ernie Boyko, Director, Electronic Data Dissemination Division

11:45 Luncheon - Main Executive Dining Lounge

In attendance:

- Mr. Yvon Goulet, Assistant Chief Statistician, Communications and Operations Field

- Mr. Guy Labossière, Assistant Chief Statistician, Management Services Field

- Mr. Denis Desjardins, Director General, Marketing and Information Services Branch

- Ms. Marlene Levine, Director General, Regional Operations Branch

- Mr. Edvard Outrata, Director General, Informatics Branch

- Mr. Ernie Boyko, Director, Electronic Data Dissemination Division

- Ms. Louise Desramaux, Director, Data Access and Control Services Division

- Ms. Georgia Ellis, Director, Library Services

- Mr. Harold Nightingale, Director, Publications Division

- Ms. Shaila Nijhowne, Director, Standards Division

- Mr. Jerry Stinson, Associate Director, Electronic Data Dissemination Division

- Mr. Ralph Turvey, Consultant, Business and Trade Statistics Field

- Mr. Peter Nador, Coordinator, Steering Committee on Information Management

- Mr. Al Miller, Chief, Data Development Section, Electronic Data Dissemination Division

14:00 Lecture by Peter Burnhill: Information Dissemination: The Role of Data Libraries and Meta Data


VISIT TO STATISTICS CANADA

PETER BURNHILL

September 25-27, 1989

AGENDA

Tuesday, September 26

9:30 GROUP DISCUSSION:

Progress and Issues in the Area of Meta Data

(List of participants attached)

12:00 Lunch

13:30 GROUP DISCUSSION CONT'D:

Progress and Issues in the Area of Meta Data


GROUP DISCUSSION ON PROGRESS AND ISSUES IN THE AREA OF META DATA

September 26, 1989

In Attendance:

- Mr. Edvard Outrata, Director General, Informatics Branch

- Mr. Ernie Boyko, Director, Electronic Data Dissemination Division

- Mr. Jerry Stinson, Associate Director, Electronic Data Dissemination Division

- Ms. Louise Desramaux, Director, Data Access and Control Services Division

- Ms. Georgia Ellis, Director, Library Services

- Ms. Shaila Nijhowne, Director, Standards Division

- Mr. Peter Nador, Coordinator, Steering Committee on Information Management

- Mr. Ralph Turvey, Consultant, Business and Trade Statistics Field

- Ms. Raymonde Noel, Project Coordinator, Data Access and Control Services Division

- Mr. Al Miller, Chief, Data Development Section, Electronic Data Dissemination Division

- Mr. Chuck Lyon, Federal Information Collection Group, Social Survey Methods Division

- Mr. John Berigan, Research Assistant, Data Development Section, Electronic Data Dissemination Division

- Ms. Karen Davies, Data Access Officer, Data Development Section, Electronic Data Dissemination Division

- Ms. Fay Hjartarson, Head, Bibliographic Products, Library Services Division

- Mr. Robert Parenteau, Central Statistical Reference Centre

- Ms. Koushal Sehdev, Manager, Library Operations, Library Services Division

- Mr. Gérard Côté, Coordinator, Industrial and Occupational Standards Section, Standards Division

- Mr. Richard Godin, Senior Analyst, Standards Division

- Mr. Mike Webber, Senior Analyst, Standards Division

- Ms. Sunanda Palekar, Librarian, Standards Division


VISIT TO STATISTICS CANADA

PETER BURNHILL

September 25-27, 1989

AGENDA

Wednesday, September 27

9:30 GROUP DISCUSSION:

Software and Computing Issues with Respect to Meta Data

(List of participants attached)

12:00 Lunch

14:00 GROUP DISCUSSION CONT'D:

Software and Computing Issues with Respect to Meta Data


GROUP DISCUSSION ON

SOFTWARE AND COMPUTING ISSUES WITH RESPECT TO META DATA

September 27, 1989

In Attendance:

- Mr. Edvard Outrata, Director General, Informatics Branch

- Mr. Ernie Boyko, Director, Electronic Data Dissemination Division

- Mr. Jerry Stinson, Associate Director, Electronic Data Dissemination Division

- Ms. Georgia Ellis, Director, Library Services

- Mr. Peter Nador, Coordinator, Steering Committee on Information Management

- Mr. Ralph Turvey, Consultant, Business and Trade Statistics Field

- Mr. Al Miller, Chief, Data Development Section, Electronic Data Dissemination Division

- Mr. John Berigan, Research Assistant, Data Development Section, Electronic Data Dissemination Division

- Ms. Karen Davies, Data Access Officer, Data Development Section, Electronic Data Dissemination Division

- Ms. Fay Hjartarson, Head, Bibliographic Products, Library Services Division

- Ms. Koushal Sehdev, Manager, Library Operations, Library Services Division


DATA DISSEMINATION:

THE ROLE OF META DATA AND DATA LIBRARIES

September 1989


• PAPYRUS

• PEN

• PRINTING

PAPER

INK +

SYMBOLS RECORDED IN CONVENIENT MEDIA (MASSIVE IMPROVEMENT OVER CLAY TABLET)

PICTURES BECOME LETTERS

ORGANIZATION OF INFORMATION

MASS ACCESS AND DISTRIBUTION

• ELECTRO-MAGNETIC MEDIA & COMPUTATIONAL POWER


• INFORMATION AS A RESOURCE

• DATA AS POTENTIAL RESOURCE

• META DATA AS A KEY RESOURCE

ENTITY TO BE DESCRIBED

• SURVEY [COLLECTION METHOD] (Questions) - THAT GENERATES

• STUDY [ACTIVITY] (Topics) - THAT INSPIRES

• DATA SET (Variables) - THAT RESULTS

• PUBLICATION - THAT REPORTS


ESRC Regional Research Laboratory for Scotland

Data

Numeric ([illegible]) data: Survey data

Sound representation data: Musical, Speech

Textual data: Full text, Bibliographic data

Visual representation data: Maps (including ISM boundaries), Pictures (creative graphics), Animation, Sensed image, Moving images, Still frame(s), Technical drawings (eg CAD/CAM)

Mixed data and programs

Edinburgh University Data Library


in textual data, include the relationship to source, the encoding conventions used, and bibliographic citation of published coding schemes

for textual data, record use of a representation scheme to indicate structural subdivisions and interpretative categories (including systematic use of fonts and special characteristics).

for statistical databases, time series for example, record comments on indices used, including seasonal adjustments, etc.

record, where stated, a description of any sampling methods used in the data collection including reference, where stated, to the target sample size, and to the sampling fraction.

record the study design resulting from the sampling/measurement combination

record number of units and use of weights


Research data are

valuable but not necessarily priced

both a product of research activity and a resource for research activity

not exclusively generated or used within the research community

exchanged - ie published and acquired


Research

publication

'making finished work public'

acknowledgement for intellectual effort and achievement

citation in publication

peer review

libraries

acquire and manage published (and unpublished) works

make works available to a given community

data publication & data libraries


(BUYS)

BOOK MODEL/PUBLICATION MODEL

DELIVERS

LOANS

SELLS


research data as research resource

Purpose: used by analysts and decision-makers; multi-purpose, multi-discipline

wide variety of project data and data streams; large-scale, and very large scale

Conditions of Access: own use / 'privileged access' / publication; free / at cost / priced / license or royalty

Meta data as resource: to discover what's where; to drive 'intelligent' software systems


DATALIB

• CONTAINS 'META DATA' (DATA ABOUT DATA)

• SEARCHABLE THROUGH TITLE AND ABSTRACT (WHOLE-TEXT SEARCHING/INDEXING)

• EXPLAINS HOW TO REGISTER, HOW TO OBTAIN DOCUMENTATION

• PROVIDES FILENAME AND SOFTWARE DETAILS

• EDUCATES / INFORMS THE USER (WHAT IS CENSUS / SURVEY / TIME SERIES)
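
The title-and-abstract searching listed above can be illustrated with a small inverted index. This is only a sketch under stated assumptions, not a description of how DATALIB itself was built: the record layout, the identifiers `study1` and `study2`, the second record and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical catalogue records. Only the first title is taken from the
# study description in this volume; everything else is invented.
RECORDS = {
    "study1": {
        "title": "Assessment of Primary Four (P4) Mathematics Achievement in Scotland, 1982-83",
        "abstract": "Written tests, school questionnaire and parental occupation data.",
    },
    "study2": {
        "title": "Hypothetical small area census file",
        "abstract": "Population counts for areas of Scotland.",
    },
}


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map every word in a title or abstract to the records containing it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for word in tokenize(record["title"] + " " + record["abstract"]):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


INDEX = build_index(RECORDS)
```

A call such as `search(INDEX, "mathematics scotland")` requires all query words to appear in the same record, which is the simplest form of whole-text matching.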


Data: information in computer-readable form

words, numbers, pictures, sounds

Library

organised and 'managed' collection

'liber'

a suitable (computing) environment with 'reader (user) services'

user-interfaces, catalogues


"DATA LIBRARY"

Contains:

1 Computer-readable information,

2 in a software-rich environment

3 organised as a library

1 Words, numbers, pictures, (sounds)

2 Software to: Retrieve, Analyse, Display

3 Library: publication, catalogues, etc

"liber": rind between wood and tree


[Diagram: Reference Librarian, Statistician, Engineer]


* what exists and where can it be located?

* analyst also requires explicit and systematic statement about how data were generated

* descriptive fields for research data, to include

• bibliographic info on identity and availability

• info on subject coverage, intellectual content, methodology of data generation, etc

• technical requirements for use

• related documentation and management info

* standards for exchange of descriptive metadata
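
The four groups of descriptive fields listed above could be held in a simple record structure. The sketch below is illustrative only: the class name, attribute names and the use of plain dictionaries are assumptions, not a format proposed in these papers.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDescription:
    """One descriptive record, grouped into the four areas named on the slide."""

    # bibliographic info on identity and availability
    identification: dict = field(default_factory=dict)
    # subject coverage, intellectual content, methodology of data generation
    subject_content: dict = field(default_factory=dict)
    # technical requirements for use
    technical: dict = field(default_factory=dict)
    # related documentation and management info
    documentation_management: dict = field(default_factory=dict)

    def missing_areas(self):
        """List the descriptive areas that have not been filled in yet."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical usage: a record with only its identification area completed.
record = DatasetDescription(identification={"title": "Example data file"})
```

Here `record.missing_areas()` reports which areas a cataloguer still has to complete, which is one way a shared standard for exchange could be checked before a record is passed on.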


Under Subject & Content

we include information about the intellectual content or function of the item described:

an abstract (with a summary or systematic description of methodology or capabilities)

subject descriptors (key words and classification)

designation of the type of computer work


Under Identification & Availability

we include

title, responsibility, edition (with date), publication and distribution, copyright, standard numbers and terms of availability

the information required to identify the item, and indicate how a copy might be obtained


Catalogue Strategy

Three types of catalogue / inventory:

* (national) bibliography - based on 'publications'

* (shared) union catalogues - brief top-level info; details location(s) where item may be obtained or accessed

* holdings catalogues - for users/analysts and providing full information on filenames, relevant documentation etc


'the problem of efficient cataloguing of information is unsolved' Eric T... (ESRC Data Archive), 1976

'A major uncertainty haunts the user when a desired piece of statistical information is not found ... Is this because it does not exist or because the source has not been discovered?' Geoffrey Hamilton (British Library) at Library Association Reference, Special and Information Section, 1982

'(data facilities) usually existed outside the Library and were lacking in library procedures for collection management and bibliographic control' Judith Rowe, IASSIST, 1982

In 1983, the IASSIST Newsletter No 7(2) welcomed the book Cataloguing Machine-Readable Data Files (Sue Dodd, American Library Association, 1982) as 'this long-awaited contribution to cataloguing practices (which) moves MRDF to bibliographic legitimacy at last'.



Under Access & Information

we include information specific to the local implementation or usage of a dataset or program:

the name and geographic address of the location

any bibliographic or physical peculiarities of the item described

local administrative information (including registration information, access regulations, etc)

details of remote access procedures, such as electronic address

information about local support services and any locally-produced software or documentation

management and processing information


Under Physical and Technical Characteristics of the Media

we include information on

the extent of the computer work,

the computing environment required to make use of the item described (software, operating system and peripherals)

where relevant, a physical description of the carrier or container which holds the work (ie the file(s)) of interest,

a description of the various materials which accompany the work.


for textual data:

655 genre headings (not used in UKMARC)

    $a access term for genre (terms from ALA standard - not known)
    $y chronological subdivision
    $z geographic subdivision

041 languages (use 041.10 for translation without original text)

    $a language(s) of the main work
    $b language(s) of parts of the work, summaries, etc.

For map data and other geo-spatial representation

Record, where relevant, the scale, projection, and grid system used on map data.

256 mathematical data area (cartographic materials)

    $a statement of scale (eg 1:50,000)
    $b additional scale information
    $c statement of projection


UKMARC fields

Library of Congress Subject headings (see below)

Library of Congress Subject headings (see below)

79* data profile headings

    $a summary note
    $b population of interest (use 650, 651 or equivalent for time and place)
    $d areal units
    $f periodicity
    $g units of collection (including extent of aggregation)
    $i instrumentation
    $m method
    $n source of information
    $o relationship to source
    $p encoding conventions
    $r representation scheme
    $s sampling methods (including target sample size, sampling fraction, etc)
    $t study design
    $u use of weights, etc
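
To show how the 79* data profile subfields listed above might be read by software, here is a minimal sketch that splits a `$`-delimited field into labelled parts. It is an assumption-laden illustration, not a UKMARC parser: the function names and the sample field text are hypothetical, the subfield printed as `$0` on the slide is read here as `$o`, and values that themselves contain a `$` followed by a letter are not handled.

```python
import re

# Labels copied from the 79* data profile list; "$o" is an assumed reading
# of the subfield that appears as "$0" in the scanned slide.
SUBFIELD_LABELS = {
    "a": "summary note",
    "b": "population of interest",
    "d": "areal units",
    "f": "periodicity",
    "g": "units of collection",
    "i": "instrumentation",
    "m": "method",
    "n": "source of information",
    "o": "relationship to source",
    "p": "encoding conventions",
    "r": "representation scheme",
    "s": "sampling methods",
    "t": "study design",
    "u": "use of weights",
}


def parse_subfields(field_text):
    """Split '$a text $b text' into a dict keyed by subfield code."""
    parts = re.split(r"\$([a-z])\s*", field_text)
    # parts[0] is whatever precedes the first delimiter; codes and values
    # then alternate through the rest of the list.
    return {code: value.strip() for code, value in zip(parts[1::2], parts[2::2])}


def label_subfields(field_text):
    """Replace subfield codes by their descriptive labels where known."""
    return {
        SUBFIELD_LABELS.get(code, code): value
        for code, value in parse_subfields(field_text).items()
    }


# Hypothetical field content, loosely based on the P4 mathematics study.
SAMPLE_FIELD = "$a P4 mathematics survey $s stratified two-stage PPS $t cross-sectional"
```

Keeping the code-to-label table separate from the parsing step means the same routine could serve other fields, such as 041 or 256, by swapping the table.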


Study 1

Title: Assessment of Primary Four (P4) Mathematics Achievement in Scotland, 1982-83 (Machine readable data file)

Type: Numeric, Survey

Principal Investigators: G.J. Pollock and B. Duncan

Producer: Scottish Council for Research in Education (SCRE)

Sponsor: Scottish Education Department (SED)

Conducted by: SCRE

Files: Several data files and coding sheets and SPSS program listings

Summary: Forms part of SED's Assessment of Achievement Programme together with studies of Primary Seven and Secondary Two mathematics and of practical mathematics at these three stages. Supported by data on schools and on parents of pupils. Items drawn from Primary Item Bank at Godfrey Thomson Unit and Secondary Item Bank at Moray House College. Similar studies conducted in English language, Science, Home Economics and Technical Education.


Achieved sample size: 1934 pupils (in 115 schools) for written tests; 114 schools for school information; 113 schools for parental occupation

Date of data collection: May 1983

Time dimension: cross-sectional, single point survey

Instrumentation:

(1) Written tests: 5 tests covering the 'content areas' of addition/money/subtraction/shape/multiplication/length/area, volume/division/time/weight/relationships; multi-matrix sampling (each pupil sat 2 of 5) - administered by school staff to pupils from all schools in sample.

(2) School questionnaire (confidential): A school participates at one level (P4 or P7 or S2) only and the linking of a school to its number in the file is confidential and known only to SCRE. Information gathered on catchment area, socio-economic background of area, roll size, class organisation, textbooks in use, contact time, remedial/special, staffing, homework policy - self-completion by headmaster or deputy.

(3) Parental occupation (confidential): Information provided by school (by Head), coded on variant of Hall-Jones Occupational Scale. About 4.5% of replies were insufficiently detailed to use, resulting in information being available for only [illegible] of the sample.


Methodology:

Unit(s) of collection: pupils and schools

Source of information: pupils, head teachers (see Instrumentation)

Universe: Scotland, Primary School (P4) pupils in fourth year of compulsory schooling (ages about eight years old)

Target Population: Primary Four (P4) pupils in EA Schools, 1982-83.

Sampling: Stratified two-stage PPS (self-weighting) random sampling of pupils, schools sampled at first stage, stratified by education authority and with probability proportional to size; second stage selection of pupils by birth date with intention to achieve approximately equal numbers of pupils per school. Measurement by multi-matrix sampling (see Instrumentation).

Target sample size: 2100 pupils (in 120 schools) for written test; 120 schools for school information
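
The term 'self-weighting' in the sampling description can be checked with a little arithmetic: if schools are drawn with probability proportional to size and the same number of pupils is then taken in each selected school, every pupil has the same overall chance of selection. The sketch below demonstrates this under assumptions: the school rolls, the number of schools drawn and the pupils taken per school are hypothetical figures, not values from the study (whose target of 2100 pupils in 120 schools implies about 17.5 pupils per school).

```python
from fractions import Fraction


def pupil_inclusion_probabilities(school_rolls, n_schools, pupils_per_school):
    """Overall selection probability of a pupil in each school.

    Stage 1: a school is selected with probability n_schools * roll / total.
    Stage 2: a fixed number of pupils is selected within the school.
    Assumes no roll is so large that the stage 1 probability exceeds 1.
    """
    total = sum(school_rolls)
    probabilities = []
    for roll in school_rolls:
        p_school = Fraction(n_schools * roll, total)
        p_pupil_given_school = Fraction(pupils_per_school, roll)
        probabilities.append(p_school * p_pupil_given_school)
    return probabilities


# Hypothetical rolls for four schools; two schools drawn, ten pupils each.
ROLLS = [40, 80, 120, 160]
PROBABILITIES = pupil_inclusion_probabilities(ROLLS, n_schools=2, pupils_per_school=10)
```

The school roll cancels between the two stages, so each pupil's probability reduces to `n_schools * pupils_per_school / total` regardless of school size, and no weights are needed in analysis of the pupil sample.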


ESRC Regional Research Laboratory Initiative (NSF/ESRC Joint Report, 1985 and 'Chorley Report', 1987)

promoting use of GIS methods

managing large-scale data resources

'bibliographic control' of research data

(Cataloguing Computer Files in the UK: A Practical Guide to Standards, Review Draft 1988)


4. Forms of Catalogue

Three were identified:

(a) A National Bibliography of Computer Files

This would contain a complete bibliographic record for each entity (item or thing), perhaps with limited information on physical form. No information on who held each item.

(b) A Union Catalogue

This would contain a brief (first-tier) bibliographic record, but would fulfil the signposting function by indicating where the entity was held.

(c) Local Holdings Catalogue

This would ideally be the most comprehensive bibliographic record, and would include details of how the files were held (and accessed) locally.

In some respects (a), (b) and (c) corresponded to the three functions of a catalogue:

(i) to ascribe intellectual authority (or responsibility).

(ii) to ascertain what exists and where it is located.

(iii) to describe what a 'library' holds.
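
The difference between the three forms of catalogue can be sketched as three record types. This is an illustration only: the class names, field names and example values are hypothetical and are not taken from the descriptive fields proposed by the working parties.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BibliographicRecord:
    """(a) National bibliography: a full record of what exists, not of who holds it."""
    title: str
    responsibility: str          # intellectual authority for the work
    physical_form: str = ""      # limited information on physical form


@dataclass
class UnionEntry:
    """(b) Union catalogue: a brief first-tier record plus signposting."""
    title: str
    responsibility: str
    held_at: List[str] = field(default_factory=list)


@dataclass
class LocalHolding:
    """(c) Local holdings catalogue: the fullest record, with local access details."""
    record: BibliographicRecord
    storage: str                 # how the files are held locally
    access: str                  # how the files are accessed locally


def signpost(union: List[UnionEntry], title: str) -> List[str]:
    """Return every holder listed in the union catalogue for a given title."""
    return [place for entry in union if entry.title == title
            for place in entry.held_at]


# Hypothetical example entries (invented names, for illustration only).
union_catalogue = [
    UnionEntry("Example survey datafile", "Example Research Unit",
               held_at=["Data Library A", "Data Archive B"]),
]
holders = signpost(union_catalogue, "Example survey datafile")
```

Only the union entry carries `held_at`, which mirrors function (ii) above; the local holding wraps a full record and adds the details that matter to users of one particular library.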

5. Statement of mandate

After discussion we agreed that we should have two aims:

(i) to investigate the feasibility of establishing a national bibliography of computer files;

and (ii) to implement a trial union catalogue which included catalogued examples of the range and variety of computer files.

ESRC Research Activity and Publications Database

Title of dataset (not filename)

Note on use: e.g. technical dependence upon computing environment; size; title and description of relevant documentation; or online access arrangements

Other information

Dataset

[Handwritten slide, largely illegible in the scan. Legible fragment: VAX (VMS)]

SECONDARY ANALYSIS

1. would problem benefit from empirical evidence?

2. does suitable evidence exist, as data?

3. where is the database located?

4. how might I negotiate access?

5. what is the status & quality of the data?

6. how do I obtain codebooks and other documentation?

7. how might I reformulate problem, so that data can contribute?

8. what software is available for data retrieval, analysis, and display?

9. could I use the software myself?

10. how could I obtain 'hard copy' of results?

11. what would be the cost in time and money?

ESRC Regional Research Laboratory for Scotland

RRL Scotland, University of Edinburgh

(Co-directors: Peter Burnhill and Richard Healey)

Data Library & Dept. of Geography RRL GIS Laboratory

(with the Centre for Educational Sociology and the School of Agriculture)

*** ESRC Research Activity and Publications Database (RAPID Project)

Edinburgh University Data Library

[Title slide; decorative lettering illegible in the scan]

The RAPID Project

carried out at the University of Edinburgh

AWARD DATA

Basic Award Data: Reference Number; Name; Type (grant, centre, etc.); Committee; Institution; Department; Investigators; Period start and end; Amount

Collected Award Data: Primary and secondary disciplines; Primary and secondary subject areas; Methods (?); Spatial and historical focus (?)

Other and Administrative data: Notes (various); Contact details (name, address etc.); Survey and data preparation, administrative details

[Diagram; layout lost in the scan. Legible labels:]

Notes of Guidance

Janet

Personalised letter

Meta-Info on Award (award-specific)

Bibliographic Report for each Award

RAPID (in BASIS on VAX)

Inmagic Database Mk2

Inmagic Database Mk1

Confirm / Amend Award Records

Confirm / Amend Biblio Records

Add new Bibliographic data

Survey Admin Programme

Word Perfect Address & Award Info

'ESRC Research Supported 1986'

Bibliographic Data

'ESRC Newsletter'

Introduction

Datafiles are a valuable resource. Like other research materials, such as an atlas or a statistical source book, they are rarely an end in themselves. Moreover, only if their contents and terms of availability are widely known can their potential be realised in full. For printed materials a title, or perhaps just the surname of the author, is often all that is required to locate and then obtain what may be needed to further examine a given research question. Better still, editorial convention almost guarantees that a published article will provide a near-complete bibliographic citation to the key references which contain evidence on which an argument seemingly turns. This happy state of affairs, now taken for granted for printed works, does not exist for research data on which important conclusions increasingly now rest; nor sadly is it the case for the computer software that embodies the techniques and algorithms on which given reported results depend. We could assert in all seriousness that 'bibliographic control' over research data is vital to support modern-day claims to scholarship and to accountability in policy research. Less dramatically, we believe that initiatives to improve the research infrastructure, such as the Economic and Social Research Council's Regional Research Laboratories, must attend to the systematic use of agreed methods for the 'bibliographic description' of research data.

Any search for printed source material would involve the use of libraries and ultimately would depend upon publishers' lists. Analogously, a search for the datafiles that may lie behind the printed statistical table or high resolution map invokes such terms as 'data publication' and 'data library'. The metaphor requires modification, however. Libraries do not yet offer full facilities for prospective users of computer-readable research materials. Moreover, much of research interest is 'unpublished', or is not published in the conventional sense.

A researcher, motivated by the need to provide information for decision makers or for some longer-term research enterprise, asks something like the following series of questions:

.. is the problem in hand likely to benefit from empirical evidence?

.. are there data available which could shed light on the problem?

.. where is the database located?

.. how may I negotiate access or obtain a copy?

.. what is the provenance, status and quality of the data?

.. can I obtain code books and allied documentation?

.. how may I re-cast my problem so that these data can contribute?

.. what software is available for data retrieval, recoding, analysis and display?

.. will the data, documentation and software permit me to assess accuracy and reliability?

.. can I use this software myself?

.. how may I obtain hard copy of the results of the analysis?

.. what will be the cost in time and money?

The first and seventh questions cannot be addressed here, despite their obvious importance. In this paper we describe the rationale behind bibliographic control of research data, first exploring the terms data library and data publication, and drawing illustrations from current practice in RRL Scotland at the University of Edinburgh. We highlight the considerable progress that has been made recently, and point out some of the problems that remain. We argue that priority should be given to describing what exists and how to locate it. Moreover, we re-iterate arguments why any descriptive procedures adopted ought to include a systematic account of the methods by which data were generated.

Research Data

All data are of potential research interest, and we cannot confine our attention only to those data which were generated by or for researchers. Indeed for social scientists and policy analysts, data generated by government and commerce are of prime importance for investigation of the economic and social world at large.

Researchers regularly expect to have recourse to data that were generated with some purpose in mind other than their own. In general these are from administrative systems (especially those involved with eligibility criteria and the disposal of public funds), from regular monitoring operations, from sales information, as well as from the special surveys and censuses conducted by government, commerce and the academic community. The task of re-analysing such data is made more sensible by knowledge of why and how these data were generated.

In the physical sciences data can be generated in the laboratories of rival research groups given that details of principles and methods are published; although perhaps to a lesser extent than may generally be supposed. In the social sciences, data are rarely generated so readily. Few research groups can command the resources and breadth of expertise. In many instances the generation of the data requires the authority of government. How should one counter arguments that data are working capital, the value of which is lessened when access is shared? Is this attitude inevitable, and is it tolerable in anything more than the short term? We cannot answer with any greater authority than the next, but we would venture some comment.

Data are not disembodied facts. Rather they are a strange admixture of theory, method and a 'real world' that we only partially understand through our theory. Unless the reader of empirically-based statements is allowed to assess the uncertainty that necessarily surrounds the reported findings, there must be doubt about the truth of those reports. For conclusions, supposedly based upon empirical evidence, to be taken seriously, there must be genuine access to the data upon which the analysis and arguments rest.

This is not only because there may be technical concern about the appropriateness of the methods used in analysing data. Different theoretical viewpoints upon the same data suggest varying analytical approaches and these can yield very different conclusions (Aitkin, Anderson and Hinde, 1981). This is obviously true of different policy standpoints also.

Of course there may also be doubts about the technical means used to generate the data in question, and a recognition that the data are defined, as data, by a theory which may well be contested. This all suggests that where resources and expertise are scarce, the research community should demand 'publication' of the data that underpin published conclusions.

Data Publication

The publication of data necessarily invokes reflection upon what in practice is meant by 'publication'. Clearly it is the act of making something public. But the public in question may be general or restricted; and the terms of availability may vary widely.

With printed material the roles of author and publisher are separable. The author may write to please him/herself, but the publisher has a sense of a target public and arranges accordingly. The author is responsible for the intellectual content, organisation and intelligibility of the work. The publisher is responsible for final reproduction, distribution and ensuring that the terms of availability are acceptable to the chosen 'public'. There is nothing to stop authors acting as publishers, but there must be doubts about the effectiveness of this in the long run. Commercial publishers make commercial judgements. This may not always serve the research community, but there clearly are publishers who specialise in meeting academic needs. There are journals and other serials, and there is the grey world of the 'unpublished' mimeo and working paper.

Publishers declare title, authorship and the date of publication, the latter intended to define an edition of the work. This bibliographically describes the work, often providing ISBN reference number and using agreed cataloguing-in-publication information. This makes the work easy to cite in other work, helps the cataloguer, and thereby promotes the custom for the work.

What then is the current state of affairs in 'data publication'? Who are the authors? Who are the publishers? The situation is confused, for although one would like to say that research data were published, the term is hardly merited. The best one can say is that data are made available.

In the commercial sphere, information (and hence data) is a commodity that is packaged, consumed and has a price. In the academic sphere we might like to view it as a resource that is conserved, re-used and has value. Whichever, the research community could give more attention to 'data publication', and gain insights from studying current practice in the commercial world of publishing and the academic research libraries.

There is another side to data publication. Preparing research data for re-analysis demands expertise and determined endeavour. First, the data have to be organised into a coherent whole and examined for inconsistency and error. Second, the assembled data require documentation to decode their meaning. A third step is required prior to their general release. The datafiles and associated documentation are processed as a complete work that would sustain use by persons other than those involved in the generation and processing of the data. The result is a 'product' and the 'productivity' of the individuals concerned may merit acknowledgement. In the research community acknowledgement comes through peer evaluation and assessment. This in turn requires publication.

The findings which arise from data are published, and sometimes so is an account of the data generation and analysis. However, this is inadequate for two basic reasons. First, the individuals responsible for data generation and database manipulation are rarely the authors of the articles reporting the research findings. Second, as we have argued earlier, open access to sources is arguably one of the basic tenets of accountability in academic and policy research.

These issues are not new. The ESRC and the social science community in the UK have been concerned about access to research data for over 20 years. The official focus has been upon ensuring deposit of research data, improving methods of preservation and deriving tools to index holdings of the data archives. Datasets so collected were then distributed or 'lent' with terms of availability that sometimes seem very bureaucratic. This is best illustrated by the terms and amendments within the ESRC Research Funding booklet. The June 1984 edition read:

"Where a (funded) project or programme includes a sample survey generating machine-readable data, the investigator is required to deposit a copy of the coded data in the ESRC Data Archive, for eventual borrowing by the researchers" (page 10).

This has since been revised, and the March 1988 Edition states:

"Where a project or programme includes the use of machine-readable data, the award holder is required to offer to deposit a copy of the coded data in the ESRC Data Archive, for eventual borrowing by other researchers. In such cases ..... consult the archive at the earliest opportunity ..... to ensure that, if the offer is taken up, the appropriate technical procedures ..... are built in from the outset. The cost of preparing the copy will be borne by the ESRC ..... Specification for the data and documentation is laid down in the Archive's Data Preparation Manual ..... At the time of transfer the award holder will be asked to sign a licence contract specifying the degree of confidentiality to be observed in making the data available to others ....." (page 12)

The sense of enforced 'deposit' to ensure preservation for posterity and 'for eventual borrowing by other researchers' does not really amount to a positive call for data publication. Despite this, the ESRC Data Archive at the University of Essex does act as a publisher, at least to some extent. It would seem to combine the role of long-term archive with that of a national clearing-house and a distributor by mail order.

Data Libraries

Data Libraries are not publishing houses. They are, instead, intended as a means by which researchers can readily access research data. By analogy, the data library is not the wholesaler but the retail store, not an organisation providing inter-library loan but the library one (electronically) walks into.

The terms 'data archive' and 'data library' imply different and complementary functions, although any data facility is likely to have more than a single function. There is likely to be a particular mix to include an archival role (to ensure preservation), a clearing-house role (to facilitate distribution) and a library role (to provide direct end-user service). The Data Archive plays a national role for the first and second, and exercises the third through external distribution. More recently, in association with the University of London Computer Centre, the Archive now provides an online service to data from the General Household Survey. This is but one of the online facilities now available to the academic research community.

In 1985, the ESRC/NSF Joint Working Party on Large-Scale Data Resources for the Social Sciences noted that with the 'developments in telecommunications through which local, national and international networks... exist... allow... online access to remote computer systems ... it is possible to conceive of a "distributed archive" with major datasets held centrally and with smaller specialised sets being held in local data libraries.' This conception was already reality, not only because Edinburgh University's Data Library was providing researchers in the universities in Scotland's central belt with online access to survey and census data. Similar services, though not at that time describing themselves as data libraries, were in operation at the national and regional computing centres coordinated by the Inter-University Computing Committee. The ESRC Data Archive played a leading role in this and acted as wholesale distributor. This was not, however, a "distributed archive". Rather, a division of labour was in operation.

JANET (the Joint Academic Network) means that data libraries can hardly be 'local', except in the sense that they may serve specific, and sometimes regional or very localised clientele. But this does not imply that the data sets held there will be 'smaller'. For example, satisfactory regional analysis more often than not requires access to national data. Moreover, as with other university-based research libraries one anticipates that each will have particular specialisms and that these specialisms will attract national usage.

At a time when there are gross pressures on their funding we note that university and research libraries are a vital part in academic life, providing essential reference material for the UK research community. Of course, individual researchers do often arrange to buy their own texts directly but shared provision is cost-effective. Research data files can be copied so readily that libraries of data must be even more cost-effective.

In practice, and particularly at the present time, data libraries in the UK are operated by staff from research centres and computing centres, and seldom by staff from libraries. This has also been the case in the US although there is now a discernible trend towards the incorporation of data facilities within the structure of academic and research libraries, with the term 'data library services' giving way to 'library data services'.

A full definition of data is likely to overlap with a definition of 'information', and there can be little doubt that libraries have a mandate to manage information and make this available in a systematic and organised manner. At the same time, library schools are renamed, to become departments of information management, or even information science. But there has to be more to it than that. We are all still exploring the potential that this new medium offers and we need to give attention to what we should expect of the information centres of the 1990s.

Brewer's Phrase and Fable notes that the word library derives from the Latin word 'liber', meaning originally not a book but the rind between wood and bark. This was an early medium upon which to record information. Before paper and print that was the new technology, and special care had to be taken to preserve information recorded in that way. The skill of reading was scarce. Almost by definition this is pre-history and it is difficult to know how long it took to master and to make access to information held on that medium ordinary and every-day. We resist the temptation here to attempt analogous arguments about limits to 'data literacy', the power of a new 'priesthood', and the effect this may have upon society. The point is that a data library service, or a library data service, must now make information available in a suitable computing environment, and make this information accessible to those whose primary activity is not connected with computing.

Whether access is via multi-access mainframe computers over a wide area network (or indeed across the Joint Academic Network, Janet,) or via single-user workstations, the operation of such a service does itself require special skills in data management and a good appreciation of a computing world which is characterised by rapid changes in software and technology. These are major demands upon library staff, the vast majority of whom did not receive any experience of computing in their initial training. That said, 'librarians probably know more about computing than computer specialists know about libraries' (Battin, 1984).

There is yet another aspect. The Data Library at the University of Edinburgh is supported jointly by the University's Computing Service and Library. However, like many of its counterparts in the US and in Canada, it began as an adjunct to a large programme of research (or more accurately in Edinburgh's case to a 'consortium' of research programmes given research support by the University's Program Library Unit). As Judith Rowe of Princeton University states, 'Data Archives and ... data libraries were initially established by social scientists to provide widespread and economical access to data for secondary analysis ... '. It was social scientists who wanted access to these data but this also meant that data archives and data libraries were initially, and in Europe still almost exclusively run by individuals with a further set of skills, those associated with the generation and analysis of the data.

As remarked earlier, research data are not yet published in the conventional sense and their use often depends upon use of documentation which is both abbreviated and technical, and upon use of software to carry out elementary statistical procedures. While it can be argued that it is the end user who should possess the necessary skills for actual use and for the more sophisticated analysis, users of data libraries expect staff to be able to perform at least simple retrievals and produce the occasional summary tables and maps, if only to ensure the integrity of the data they make available. One would expect that a reference librarian would display comparable knowledge of printed source material, but we doubt whether more than a few library staff yet have such knowledge of computer-readable data or the skills described - although we acknowledge that we ourselves can cite no data on this. Clearly each data library service (or library data service) must resolve policy on the type of reference service it wishes to give to researchers. In addition to any bibliographic search services (considered later in the paper), the tasks associated with the acquisition of data and the management of the supporting documentation require not only the existing library skills and procedures; they also require some knowledge of how the data came about and of what the end-user intends.

Unless a service is limited to the operation of a tape library, as would seem the case in some library based services in the US, data need to be processed and manipulated into a form better suited for their subsequent re-analysis. The data may be 'static' with respect to its intellectual content, but the format of the data often has to change to suit particular software and operating environments. A dataset which has been extensively documented, within software as well as in print, is undoubtedly easier to use. This, if thorough-going, may result in the production of a further 'edition' of the dataset, but this investment may be reckoned to be cost-effective even in the medium term.

Certainly it lowers the barriers of entry for those wishing to re-analyse data. At Edinburgh we have paid particular attention to producing 'data library tools', early examples of the 'friendly user interface' to batch and interactive mainframe computing, in order to give easier access to complex data or software. These include an interface to a computer-readable Ordnance Survey Gazetteer, to the Postcode Directory for Scotland, to the Central Statistical Office's Macro-economic Time Series, to GIMMS (the cartographic software) and to a suite of software which provides access to census small area statistics. Perhaps these software products are not formally 'expert systems' but they perform the same function in many respects by embodying procedural expertise and making this available to end-users. Again this investment is cost-effective, as it lessens the number of (essentially trivial) enquiries about access.

More recently, as RRL Scotland, attention has been focussed upon methods of providing access to spatially-referenced data using graphical facilities. The RRL Browser currently exists as a prototype, using the new techniques of pointing-and-clicking on a succession of menus, the text of which indicates either the content of datafile or the function of software which may thereby be selected. This owes much to the development work that went into CARTONET and PHOTONET, which are software systems for bibliographically controlling sheet maps and air photographs, respectively. Both use graphical searching techniques to exploit the spatial referencing implicit in the item.

Bibliographic Control

The ESRC/NSF Joint Report noted that

'... the continuing lack of comprehensive bibliographic information on machine-readable data sets is a major cause for concern ... a serious impediment to academic users gaining access to data resources ... ' (page 13, op. cit.)

'Whereas effective access to relevant literature is seen as a sine qua non of scholarly research, a large source of equivalent information (research data) is practically inaccessible because of the inadequacy of guides to its existence.' (p24, Appendix 2, op. cit.)

As researchers, we need to know what data exists and how to locate it. Nasitir put it rather strongly:

'If we had to choose one function that was more important than any other, I suppose it would be the union listing of data resources' (Nasitir, 1982)

But before this could come to pass there had to be agreement on how to cope with computer-readable research files. In 1976 Tannenbaum had noted that 'the problem of efficient cataloguing ... is unsolved' (Tannenbaum, 1976). The book by Dodd was the 'long-awaited contribution' to Cataloguing Machine-Readable Data Files (Dodd, 1982). It brought research data files to the attention of librarians in the US, offering an interpretative guide to the revised Anglo-American Cataloguing Rules (AACR2). In the UK it went largely unnoticed. The ESRC Data Archive was collaborating with the European Archives on a standard 'study description'. There was work funded by the British Library but this was on microcomputer software (Tagg and Templeton, 1983). In 1985 Sue Dodd gave seminars in Edinburgh and London. These coincided with work on cataloguing digitised map data by Sarah Tyacke and by David Rhind for the 'Chorley Committee' and led to serious examination of Dodd's work.

Shortly after these seminars, the Computer Files Cataloguing Group was set up by Marcia Taylor and Bridget Winstanley with funding from the ESRC to co-ordinate developments. The result of about two years of discussion was a report from working parties chaired by Peter Burnhill and Ray Templeton. This is being published as a 'Guide' to cataloguing computer files in the UK. It contains further comment about the issues discussed by this group and, in a second part, it details the descriptive fields, making reference to AACR2 and to UKMARC, the coding scheme used by librarians to exchange bibliographic information.

A summary of the descriptive fields is set out in an appendix to this paper. The appendix also contains an abbreviated list of fields for 'bibliographically' describing data and software, respectively.

The argument so far is in some ways simple: in order to be able to find the data resources needed one has had to help create the information that will fill the catalogues. There are new demands to be made of university and research librarians. Hopefully in the near future it will also fall to those librarians to catalogue the materials and undertake bibliographic searches. It should be clear that we are not there yet. The fields for the description are available, as test models at least. There is therefore the prospect of creating local holding catalogues, shared union catalogues of data resources, and perhaps there may even be the makings of a national bibliography of computer files.

Why bother with all this? Researchers are busy people and such meta description takes time. There are several reasons. Conducting an inventory of available data will reveal exciting new possibilities for research. Cataloguing one's own holdings improves internal communication and saves time when there is staff turnover. It is also possible to turn such meta data to good effect in retrievals that integrate data across datasets. Even where data cannot be released, publicising its existence advertises expertise and promotes collaborative work. Cataloguing will require resources but it is a sound strategic investment. Funding and resources are scarce, but the alternative is time and effort wasted in duplicating programmes of map digitising and in trying to negotiate access to data which has already been secured for the academic domain, and that should clearly be avoided. Cataloguing also clarifies the terms on which data are made available, by making these terms explicit and ordinary. The benefits of a union catalogue shared between the Regional Research Laboratories would soon become evident.

The ESRC/NSF Joint Report recommended that the research councils should now amend their conditions of grant to include:

'The grantholder will deposit in the (appropriate) Archive a "title page", record description, and abstract for any data set produced as a result of the award of the grant, whether or not the data set itself is deposited in that Archive. The constituents of the record should follow the guidelines set out in (an attached document)' (page 24, op. cit.)

This sets aside the issue of data publication but it does call for full citation of data. As prospective users of the catalogue which may contain these meta descriptions, the Regional Research Laboratories would do well to examine how they would like the data described.

References

Aitkin, M., Anderson, D., and Hinde, J. 'Statistical modelling of data on teaching styles', J. Roy Statist Soc (A) Vol 144 (4) 1981

Battin, P. 'The Electronic Library - A Vision of the Future', EDUCOM Bulletin, Summer 1984 Vol 19 (2)

Burnhill, P. 'Towards the development of data libraries in the UK', unpublished mimeo, University of Edinburgh, April 1985 (Prepared for the Committee of Librarians and Statisticians and taken in evidence by the ESRC/NSF Joint Working Party op. cit.)

Dodd, S. Cataloguing Machine-Readable Data Files, American Library Association Chicago 1982

Finch, S. and Rhind, D. Cartographic and remote-sensing digital databases in the United Kingdom London: British Library, 1986 (British Library information guide 6)

Nasitir, M. 'Increasing availability of machine-readable data files through computerized networks', IASSIST Proceedings, Annual Conference, San Diego, California, May 1982

Rowe, J. in Foreword to Dodd (op. cit.)

Tagg, W. and Templeton, R. Computer software: supplying it and finding it London: British Library 1984 (Library and Information research report 28)

Tannebaum, E. SSRC Newsletter London 1976

Cataloguing computer files in the UK: a practical guide to standards (Review Draft, October 1988) Joint Report to ESRC Computer Files Cataloguing Group, Edinburgh and London

ESRC Research Funding, Edition 2, London, March 1988, ISBN 086226 2038

Large-Scale Data Resources for the Social Sciences, Report of the British American Joint Committee to the Economic and Social Research Council (UK) and the National Science Foundation (USA) June 1985

Appendix 1.1 An abbreviated list of fields required to 'bibliographically describe' computer-readable research data: an example 'data capture form'.

A 'multi-part computer work' may comprise more than just a single datafile, to include code-books and other documentation and sometimes several data and program files.

Title (not filename) + Sub-title, & other title info (Section 1.1 of 'Guide', see below)

'Author' ie responsibility for intellectual content, creation of dataset, etc (Section 1.1)

Edition (release, version, etc) + Date of edition + Responsibility for edition (Section 1.2)

Publisher/Distributor (Section 1.3)

Terms of availability (Section 1.5)

Date of deposit of data with ESRC Data Archive (+ Study Number, if known)

Description/Purpose ie intellectual content of item, with spatial and temporal coverage, and indicating how data were generated (Section 2)

Other comments eg technical dependency upon computing environment, indication of size or online access arrangements (Sections 3 & 4)

(A full list of fields and comment on their definition is set out in 'Cataloguing Computer Files in the UK: A Practical Guide to Standards'. Review Draft, ESRC Computer Files Cataloguing Group 1988. Copies may be obtained from the authors.)

Appendix 1.2 An abbreviated list of fields required to 'bibliographically describe' computer software: an example 'data capture form'.

A 'multi-part computer work' may comprise more than just a single program file, to include user documentation and sometimes several data and program files.

Title + Sub-title, & other title info (Section 1.1 of 'Guide', see below)

'Author' ie responsibility for intellectual content, creation of software, etc (Section 1.1)

Edition (release, version, etc) + Date of edition + Responsibility for edition (Section 1.2)

Publisher/Distributor (Section 1.3)

Terms of availability (Section 1.5)

Description/Purpose ie intellectual content of item, function, etc (Section 2)

Other comments eg technical dependency upon computing environment, indication of size, or online access arrangements (Sections 3 & 4)

(A full list of fields and comment on their definition is set out in 'Cataloguing Computer Files in the UK: A Practical Guide to Standards'. Review Draft, ESRC Computer Files Cataloguing Group 1988. Copies may be obtained from the authors.)

Appendix 2 A summary of fields for bibliographically describing computer files.

A full list of fields and comment on their definition is set out in 'Cataloguing Computer Files in the UK: A Practical Guide to Standards', Review Draft, ESRC Computer Files Cataloguing Group 1988. Copies may be obtained from the authors. The text below was adapted from an abstract made from the editor's pre-publication version.

1. IDENTIFICATION AND AVAILABILITY

This covers the information required to identify uniquely a dataset or program. It should be sufficient to enable someone to find out how to obtain a copy of the item.

The item to be described is first defined by a statement of title and responsibility.

Give the title of the item (dataset, utility or program) and include any other title(s) by which the item is generally known. Immediately after title add [computer file]. (If uncertain about the title, supply a title in square brackets, [ ]. This may be a brief description, or an informal title by which the work is commonly known. Constructed titles should attempt to be descriptive of the subject content: titles of data files might, for example, include reference to time and place.)

Give a statement of responsibility, that is the name or names of the person(s) or organisation(s) responsible for the intellectual content of the item. (Examples of principal responsibility might be: the creators or designers of a program, the principal investigators responsible for creating a data file. Examples of secondary responsibility might be: sponsors, authors of accompanying documentation, agency for data collection. A short word or phrase clarifying the nature of the responsibility can be added and should correspond to the terms which appear in the item - eg 'author', 'programmed by', 'graphics by', etc.)

Give a number or phrase which uniquely identifies this edition (eg 2.2). Words commonly used will be release, version, level, update or revision, or the edition may be defined by a date (eg Nov 1984 release).

('An edition occurs when there is any change in the intellectual content of the file, including additions or deletions; a change in the programming language; a change upgrading or improving the efficiency of the file. A change in the physical carrier or changes relating to the character code, or to blocking or recording densities would not constitute a new edition' ISBD (CF) 2.1.1. The edition statement may also be accompanied by a statement of responsibility for the revision, update, etc.)
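The ISBD (CF) rule quoted above amounts to a simple decision: some kinds of change call for a new edition statement and others do not. A minimal Python sketch of that reading follows. The change labels and the function name are illustrative assumptions, not terms taken from the Guide or from ISBD (CF).

```python
# Hypothetical labels for kinds of change. The grouping follows the
# ISBD (CF) 2.1.1 wording quoted above; the label names are invented.
NEW_EDITION_CHANGES = {
    "content_added",         # additions to the intellectual content
    "content_deleted",       # deletions from the intellectual content
    "content_changed",       # any other change in intellectual content
    "programming_language",  # a change in the programming language
    "efficiency_upgrade",    # a change upgrading or improving efficiency
}

SAME_EDITION_CHANGES = {
    "physical_carrier",      # a change in the physical carrier
    "character_code",        # a change relating to the character code
    "blocking",              # a change relating to blocking
    "recording_density",     # a change relating to recording density
}


def is_new_edition(changes):
    """Return True if any of the listed changes calls for a new edition."""
    unknown = set(changes) - NEW_EDITION_CHANGES - SAME_EDITION_CHANGES
    if unknown:
        raise ValueError(f"unrecognised change type(s): {sorted(unknown)}")
    return any(change in NEW_EDITION_CHANGES for change in changes)
```

On this reading, copying a file to a different carrier with a different character code leaves the edition unchanged, whereas adding records to the same file creates a new edition.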

Give details of the act of 'publication': that is, the means by which the computer work was first made public. Give the name of the publisher where this is clearly indicated.

(For works which are not formally published, for example data or program files which are circulated among universities, colleges and research centres, it may be necessary to refer both to the distributor and to the depositor.)

Give the series or collective title, where appropriate. A series is a 'group of separate items related to one another by the fact that each item bears, in addition to its own title proper, a collective title applying to the group as a whole'.

(The concept of a 'series' may be particularly apt for some computer works especially where, for example, many data files are produced by a single agency from a single source. One example might be the Census small area statistics files published by the Office of Population Censuses and Surveys.)

Give the International Standard Book Number (ISBN) or equivalent, if known.

Give the terms on which the item is made available; when for sale, include the price. Where appropriate include such statements as 'free for academic research and teaching', 'Shareware', 'Public Domain (PD) Software', etc.

2 SUBJECT AND CONTENT

This covers the intellectual content or function of the computer work, to meet the special needs of prospective users of research data and program software. Distinguish between: Program files, Data files, Mixed data and program files, then distinguish further, using the various qualifying terms listed in the 'Guide' to indicate content, function and purpose.

Provide an Abstract. This may be a brief summary of the content or function of the item described (normally in 150 words or less) or, preferably, a 'systematic summary'.

(Special libraries require greater descriptive detail about content and function, as has been found with map materials and audio-visual material. This is particularly true of data that are collected or made available for research purposes. The objective should be to provide a comprehensive description of the study or procedures which gave rise to the data and which govern their interpretation; or, in the case of computer software, of the functionality or embedded algorithms in software.)

For data:

Record, where relevant, the population of interest; that is, the aggregate of persons or objects described by the data; indicating geographical and chronological coverage (including areal units or periodicity, if appropriate), and the units of collection (or analysis).

Record the method or instrumentation used to collect or generate the data, including reference to source of information (or respondent), especially where this is not otherwise evident and, if appropriate, give bibliographic citation, or equivalent, of printed or other material. It will often be useful to describe or name the encoding conventions used, and sometimes to give the bibliographic citation of published coding schemes. Record a description of any sampling methods used in the data collection, including reference to the target sample size, and to the sampling fraction. Record the study design resulting from the sampling/measurement combination and any use of non-response or sampling weights, or any imputation scheme.

Record number of units described in the data (ie number of cases in sample survey).

For map data and other geo-spatial representations:

Record, where relevant, the scale, projection, and grid system used on map data.

If a widely accepted subject classification scheme is in use, provide this information, or else record subject keywords which describe the computer work.

3 TECHNICAL INFORMATION AND PHYSICAL CHARACTERISTICS

The purpose of this section is to supply information about characteristics of the file and technical requirements that govern its use.

(Such information includes details of hardware and software requirements. A description of physical characteristics is not required for data or software that are only available by remote access. The intention should be to provide a description which is sufficient to be helpful, rather than attempt to provide comprehensive descriptions covering all known conventions.)

Note both the designation of the item and the extent (size and format) of the item(s).

(Designation will also have been recorded under Subject & Content. The indication of size or extent should be recorded, as appropriate: for a data file, as the number of records and/or expressed as bytes (or kilobytes, etc); for a program, as the number of lines of code (or statements) and/or bytes. This may be recorded exactly or approximately.)

State the required computing environment and any other technical information. This may include limitations on peripheral equipment, both for input and output, and any computer modifications (eg graphics card, co-processor chip, etc.) necessary.

(For datasets, indicate dependency upon software. For software, indicate whether intended mode of use is batch, interactive or both.)

Describe, where appropriate, the physical characteristics of the item; that is, the name and number of physical parts (in Arabic numerals) constituting the item.

(Clearly, this is not required for files available only by remote access.)

Where documentation is not regarded as an integral part of a multi-part work, and therefore has not already been described or cited, a description of accompanying material may be included. Such documentation includes instruction manuals, codebooks, and any other printed or machine-readable documentation.

4 ACCESS AND MANAGEMENT INFORMATION

This section contains information about the location and the means by which the item described can be accessed.

(Whether all or only some of these fields are used would be a local policy decision, and depend upon the nature of the intended catalogue and the service environment. Provision is made for recording information on the management of the item and of relevant aspects of its processing history.)

Give information on the location and extent of the holding. This includes location name, location address, service name of host computer and locally-allocated name (which may be the filename).

Give details of registration procedures and conditions for access. This includes local access regulations and/or restrictions, including address for registration, electronic (email) address for registration, royalties or other charges, and a data acknowledgement (disclaimer) statement, if appropriate.

Give information on local documentation and support and describe how the item may be used online. Include postal and electronic address information (written as (Network) alphanumeric address), the local filename, and other useful information such as the local log-in and log-off and a statement about any online help on how to use the item described.
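The four headings of this appendix can be pictured as four groups of fields within a single catalogue record. The Python sketch below is only an illustration of that grouping. The class names, the attribute names, and the choice of which fields are optional are assumptions made for the example, and it covers a small subset of the fields described in the Guide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FILE_TYPES = {"program", "data", "mixed"}  # Section 2 distinction


@dataclass
class Identification:  # Section 1: identification and availability
    title: str
    responsibility: List[str]
    edition: Optional[str] = None
    publisher: Optional[str] = None
    series: Optional[str] = None
    standard_number: Optional[str] = None
    terms_of_availability: Optional[str] = None


@dataclass
class SubjectContent:  # Section 2: subject and content
    file_type: str
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.file_type not in FILE_TYPES:
            raise ValueError(f"file_type must be one of {sorted(FILE_TYPES)}")


@dataclass
class Technical:  # Section 3: technical information
    extent: Optional[str] = None
    computing_environment: Optional[str] = None


@dataclass
class Access:  # Section 4: access and management information
    location: Optional[str] = None
    conditions: Optional[str] = None


@dataclass
class CatalogueRecord:
    identification: Identification
    subject: SubjectContent
    technical: Technical = field(default_factory=Technical)
    access: Access = field(default_factory=Access)

    def display_title(self) -> str:
        # Section 1 asks for "[computer file]" immediately after the title.
        return f"{self.identification.title} [computer file]"
```

A record for a file held only by remote access could leave the `Technical` group empty, in line with the note in Section 3 that physical characteristics are not required in that case.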

Draft Guide to Managing Information as a Resource Version 2

Data Dictionaries

A data dictionary is a facility for maintaining descriptions about the data held by an organisation. Typically it records data item names, descriptions, size and structure. Data dictionaries were developed mainly to ensure standards in data naming conventions and to enable the impact of proposed changes to the database to be analysed.

In its simplest form the data dictionary may be a list of data items and definitions kept on paper in a folder. At the other end of the spectrum it will include the latest information (automatically updated as the database develops) about the data structure (in a relational database the tables, and possibly the files and records as well, together with their relationships), the users, security (access rights) and integrity rules, and complete definitions of every screen form, report, graph, chart and application.
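As an illustration of the simpler end of that spectrum, the Python sketch below holds a data dictionary as a plain mapping and uses it for the two purposes named above: checking a naming convention and tracing the impact of a proposed change. The entries, the `used_by` field and the lower-case naming rule are all invented for the example.

```python
import re

# A toy data dictionary: item name -> description, size and the
# applications that use the item. All entries are invented.
data_dictionary = {
    "staff_id": {"description": "Unique staff identifier", "size": 8,
                 "used_by": ["payroll", "directory"]},
    "post_title": {"description": "Title of the post held", "size": 40,
                   "used_by": ["directory"]},
    "DeptCode": {"description": "Department code", "size": 4,
                 "used_by": ["payroll"]},
}

# Hypothetical naming standard: lower-case words joined by underscores.
NAME_RULE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def naming_violations(dictionary):
    """Return the item names that break the naming standard."""
    return sorted(name for name in dictionary if not NAME_RULE.fullmatch(name))


def impact_of_change(dictionary, item_name):
    """List the applications affected if the named data item is changed."""
    if item_name not in dictionary:
        raise KeyError(f"no such data item: {item_name}")
    return sorted(dictionary[item_name]["used_by"])
```

Here `naming_violations(data_dictionary)` picks out `DeptCode`, and `impact_of_change(data_dictionary, "staff_id")` lists the two applications that would need checking before the item is altered.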

The information in a data dictionary is normally stored as a database. Its structure will depend on the product. If it has been specifically designed to manage a particular database product then it will usually have the same structure as that database; the dictionary supporting a relational database will, for example, be presented as two-dimensional tables, but there is no guarantee of this. An interface will be available providing access to question, manipulate and modify the dictionary, although the language may not be particularly user-friendly and a working knowledge of the database it supports will almost certainly be necessary.

4.4. For describing information

A common description language is needed both for describing information requirements and for describing the 'subject matter' or content fields on the information inventory, so that they may easily be compared. Internal consistency of terminology in the inventory is essential for two reasons: (i) duplications can be identified, and (ii) the inventory itself can be interrogated and understood by many types of enquirer.

Each inventory record will contain particulars of information content or subject matter. Access to the inventory, and its physical or logical ordering, will depend upon the "indexing terms" used to describe content. The sources and media included in the audit survey will, of course, be diverse, but the subject matter covered will be similar in many cases. It should also show a close match with the information defined as required by information needs studies. For this reason, it is important that the details about information content derived from those supplied by users in the audit survey are recorded in a consistent way and the same terminology is used for defining requirements.

The technique for describing information is called indexing. The different approaches to indexing that are available are outlined briefly below. A greater or lesser degree of precision and control can be imposed by different methods of indexing.

4.4.1. Approaches to indexing

Indexing is the shorthand expression of the ideas contained in a document or other information source. Concept or subject indexing involves analysing the subject matter of a document, and choosing appropriate terms to represent it. The effectiveness of subject indexing as a means of identifying and retrieving the material required by users depends upon having a properly-constructed index language. The choice of the right index terms, and the systematic arrangement of those terms, are therefore tasks which need careful thought. This is true whether or not the index is handled, and enquiries are made, using a computer. By contrast, word indexing is the description of the document as a whole by reference to individual words or phrases which occur in the text. The index to be found at the back of a book is an example of a word or phrase index in its simplest form.

Manual indexing methods rely on either skilled indexers, such as librarians, or the use of some form of controlled vocabulary. Computerised indexing may be either free-text or use a controlled vocabulary.

Free-text indexing

Documents are read or keyed onto a database, either in full, or as self-contained summaries or abstracts. Every word, except for short or common ones which may be excluded, then becomes an index term for the document in which it occurs. In searching, the free-text retrieval software interrogates the index and locates those documents which contain the chosen search-term(s) in whatever context they happen to be present. With many of the software retrieval packages now available, this search process can be refined by: restricting the search to a named field within the document, eg title; or by using Boolean logic (ie the use of the operators 'and', 'or' and 'not') to combine search terms; or both together.
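The free-text approach just described can be sketched in a few lines of Python: every word other than a short list of excluded common words becomes an index term, and the Boolean operators combine the resulting sets of documents. This is a minimal illustration, not a description of any particular retrieval package. The stop-word list, the function names and the three sample documents are assumptions made for the example.

```python
import re

# Common words excluded from the index (an illustrative list only).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "or", "the", "to"}


def build_index(documents):
    """Map each indexed word to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every one of the terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Documents containing at least one of the terms."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


# Invented sample documents, keyed by document id.
documents = {
    1: "Census small area statistics for Scotland",
    2: "Survey of school leavers in Scotland",
    3: "Digitised map data for the census",
}
index = build_index(documents)
```

With this index, `search_and(index, "census", "scotland")` finds only document 1, while `search_or(index, "survey", "map")` finds documents 2 and 3. Restricting a search to a named field would need a separate index for each field.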

Natural language

This is a form of free-text (word) indexing in which only pre-selected terms are indexed. It can be applied to either manual or computer-based systems. Indexing terms may be taken from the document title or heading, from an appended indexing field, from an abstract or from the main text itself. Although the selection process is straightforward and involves relatively little document analysis, natural language indexing will not necessarily result in effective retrieval unless the index terms are very carefully selected.

Controlled vocabulary

Controlled vocabulary indexing involves constructing a set of approved terms to describe all the concepts encompassed within a particular subject area. From this prescribed list, index terms, or 'descriptors', are assigned to each information item. Vocabulary control ensures a measure of discipline and consistency in the indexing process, and can help to avoid the retrieval problems that might arise, for example, from the indiscriminate use of synonyms. It can take the form of either:

(i) a straightforward list of prescribed or authorised terms; or

(ii) a thesaurus.

4.4.2. Indexing with a thesaurus

A thesaurus is a controlled vocabulary in which the indexing terms are arranged hierarchically to show the logical relationships between them. The usual relationships are:

- Synonym (equivalent)
- Broader term (higher level)
- Narrower term (lower level)
- Related term (similar)

A thesaurus can serve as a very useful aid to the detailed analysis of information content. It allows the same information held in different sources to be described consistently, and can therefore be applied equally to books, registered files and other non-bibliographic information media. Retrieval of a particular item of information will depend upon the terms which the indexer has allocated to it from the thesaurus. There is no theoretical limit to the number of terms that can be used: in this, as in other areas, judgement is required in deciding on the level of detail necessary to ensure that users' needs are met.
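The four relationships listed above can be held in a simple structure and used to widen an enquiry. The Python sketch below is an illustration under stated assumptions: the terms, the abbreviations BT, NT and RT used as keys, and the function names are invented for the example and are not taken from any published thesaurus.

```python
# Each approved term lists its broader (BT), narrower (NT) and related
# (RT) terms. Synonyms map a non-preferred word to its approved term.
# All terms here are invented for illustration.
THESAURUS = {
    "geographic information": {"BT": [], "NT": ["maps"], "RT": []},
    "maps": {"BT": ["geographic information"], "NT": ["digitised maps"],
             "RT": ["remote sensing"]},
    "digitised maps": {"BT": ["maps"], "NT": [], "RT": []},
    "remote sensing": {"BT": [], "NT": [], "RT": ["maps"]},
}
SYNONYMS = {"charts": "maps", "digital maps": "digitised maps"}


def preferred_term(term):
    """Resolve a word to the approved term, following any synonym."""
    term = term.lower()
    term = SYNONYMS.get(term, term)
    if term not in THESAURUS:
        raise KeyError(f"not in thesaurus: {term}")
    return term


def expand(term, include_related=False):
    """Return the approved term followed by all of its narrower terms."""
    start = preferred_term(term)
    found, queue = [], [start]
    while queue:
        current = queue.pop(0)
        if current in found:
            continue
        found.append(current)
        queue.extend(THESAURUS[current]["NT"])
    if include_related:
        for related in THESAURUS[start]["RT"]:
            if related not in found:
                found.append(related)
    return found
```

An enquiry for "charts" is thereby resolved to the approved term "maps" and widened to include "digitised maps", so that items indexed at the narrower level are not missed.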

A number of thesauri have been developed in government (a well-known example is used in POLIS, the parliamentary document database) and some of these have been published. There are a few which are of general application, but most have been tailored to suit a particular group of enquirers and are therefore limited in their subject scope. Standard or 'off-the-shelf' thesauri are unlikely to satisfy the specific needs of any government department, but the development of a customised thesaurus can be a long and intellectually-demanding exercise which may require the specialist skills of someone like a professional librarian. Nevertheless, in a scientific, technical or any other specialised environment, a well-structured thesaurus can provide the effective vocabulary control without which any retrieval system - computer-based or manual - will be unable to function fully effectively.

A thesaurus can best be developed, controlled and maintained on a computer. If it is to be paper-based, some thought should be given as to how best to manage and provide access to the volume of printed material that will accumulate as new terms are added and existing ones are reorganised. Where the thesaurus is held on computer and is available via on-line display, this will not be a problem, but it will still be important to be able to amend and manipulate the constituent terms as and when necessary.

4.4.3. Applications of indexing techniques

The information inventory to be found in the library, for example, is the catalogue, which holds details of every item (book, periodical, report, etc.) in stock. A traditional card catalogue will be arranged in alphabetical order of author's name or title or subject. The field chosen for ordering a card catalogue is significant since that becomes the "key" for retrieving the items. With a computerised catalogue many fields on the record may be searchable and used in combination for retrieval. A library catalogue record will usually contain

- title
- publisher
- publication date
- author
- number of pages
- classification number (implying shelf location)
- indexing terms.

The terms used by the librarian to describe content will normally be derived from both a classification scheme and from a related vocabulary of indexing terms. The classification scheme is used primarily for logical and physical location of items; the index terms are used by enquirers to identify items which are relevant to their area of interest or work.
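The contrast drawn above between a card catalogue, ordered on a single key, and a computerised catalogue, searchable on several fields in combination, can be sketched in Python as follows. The three records are built from entries in the reference list earlier in this collection. The `index_terms` values and the function names are invented for illustration.

```python
from operator import itemgetter

# Catalogue records; the "index_terms" values are invented.
records = [
    {"author": "Dodd, S.",
     "title": "Cataloguing Machine-Readable Data Files",
     "publisher": "American Library Association", "date": 1982,
     "index_terms": ["cataloguing", "data files"]},
    {"author": "Tagg, W. and Templeton, R.",
     "title": "Computer software: supplying it and finding it",
     "publisher": "British Library", "date": 1984,
     "index_terms": ["software", "cataloguing"]},
    {"author": "Finch, S. and Rhind, D.",
     "title": "Cartographic and remote-sensing digital databases in the United Kingdom",
     "publisher": "British Library", "date": 1986,
     "index_terms": ["map data", "remote sensing"]},
]


def card_order(records, key_field):
    """A card catalogue: one physical ordering on one chosen key."""
    return sorted(records, key=itemgetter(key_field))


def search(records, publisher=None, index_term=None, published_after=None):
    """A computerised catalogue: any combination of fields may be used."""
    hits = []
    for record in records:
        if publisher is not None and record["publisher"] != publisher:
            continue
        if index_term is not None and index_term not in record["index_terms"]:
            continue
        if published_after is not None and record["date"] <= published_after:
            continue
        hits.append(record)
    return hits
```

Ordering by title serves only the enquirer who already knows the title, whereas `search(records, publisher="British Library", index_term="cataloguing")` combines two fields that no single card ordering could serve at once.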

A library-style catalogue of computer files

A project was carried out by the Computer Files Cataloguing Group (CFCG) under the aegis of the Economic and Social Research Council, using trained librarians to catalogue computer files. The exercise is part of a broad strategy to encourage wider access to, and exploitation of, machine-readable databases. The CFCG's task was:

- to identify and develop cataloguing standards for computer files;

- to implement a trial union catalogue;

- to disseminate the Group's work to a wider audience.

The outcome of the Group's work was the publication in October 1988 of a draft report entitled 'Guide to cataloguing computer files in the UK: a practical guide to standards'. The Guide brings together a number of references to terms and practices recommended for describing material involving computer files. The approach has been to follow and extend the well-established library conventions used to create catalogue records for books, periodicals and other items requiring systematic bibliographic control.

The Guide concentrates on defining relevant descriptive fields of information about computer files. These fields are organised for cataloguing purposes under four broad headings:

(i) identification & availability

(ii) subject & content

(iii) physical characteristics & technical information

(iv) access & management information

The Guide does not attempt to address the problem of entering catalogue information into cataloguing software, or of the retrieval and display of the catalogue information.

[For further information about the CFCG Guide and its use contact: Peter Burnhill (Edinburgh University) 031-667 1011 extn 4371/2]

Towards the Development of Data Libraries in the U.K.

By

Peter Burnhill, Data Library Services, C.A.S.T., University of Edinburgh

Data has been described as "a general term used to denote any or all facts, numbers, letters and symbols which refer to or describe an object, idea, condition, situation or other factor" (S. Dodd 1982). Clearly this is quite wide and describes much that anyone would want to analyse. The word library is derived from the Latin word liber, originally the rind between the wood and the bark, the medium on which the information was recorded before the invention of paper. At one time the reader of a book had to know how to treat that particular medium, but after a while all that was needed were literacy and the right to use a library. Access software and analysis software now free the researcher from having to worry too much about the physical characteristics of machine-readable data held in a data library.

In this paper I look at data libraries from each of two directions: from the point of view of those who want to use the data, and from the point of view of those who generate the data; that is, from the point of view of data analysts and data producers. The paper also includes a rough historical sketch of the development of data libraries in the academic (mostly social scientific) sector; a discussion of the importance of bibliographic control and the provision of an on-line meta-database ('data about data'); and highlights the trend towards access to the data that produce statistical tables.

This paper was prepared in April 1985 for the Committee of Librarians and Statisticians, a body sponsored by the Library Association and the Royal Statistical Society.

Further enquiries about the content of the paper, or about the CAST/EUL Data Library Services, should be addressed to the author at The Data Library, CAST, 1 Roxburgh Street, Edinburgh, EH8 9TA (telephone 031-667-1011 X 4371/2).

A Data Analyst's View

When using a data library, the data analyst may be motivated either by the need to provide information for managers and decision makers, or by the wish to contribute towards some longer term research enterprise. Either way, the data analyst asks something like the following series of questions:

1. Would the problem in hand benefit from empirical evidence?

2. Are there data available which could shed light on this problem?

3. Where is the database located?

4. How may I negotiate access?

   - permissions
   - mode of access
   - payment or funding implications

5. What is the provenance, status and quality of the data?

   - questionnaire
   - target population
   - sampling scheme
   - non-response

6. Can I obtain codebooks and allied documentation?

7. How may I re-cast my problem so that these data can contribute?

8. What software is available for data retrieval, manipulation, analysis and presentation?

9. Could I use this software myself?

10. How may I obtain hard copy of the results from the analysis?

11. What would be the cost in time and money?

On-line data libraries and appropriate telecommunications are about enabling these questions to be satisfactorily answered, at one's desk and within a relatively short period of time.

Some definitions

Databases come in all shapes and sizes, but I would like to follow Cuadra Associates' Directory of On-line Databases in attempting to distinguish between reference databases and source databases (B. Cronin 1981), where reference databases comprise bibliographic databases and other 'referral' or signposting material; and source databases contain information which is itself the focus of the enquiry. (It is acknowledged that the distinction is not hard and fast.) Clearly data libraries need both. The data analyst is interested in the data in the source databases, but will need to consult some reference database during the search, in order to discover what data exist, to ascertain where the data are located, and, perhaps, to obtain the relevant codebook and allied documentation.
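
The role of the reference database as a signpost to the source database can be sketched in a few lines of Python. This is an illustration only: the catalogue entries, the field names and the function `find_datasets` are hypothetical, and are not drawn from any actual data library system.

```python
# Hypothetical reference database: each entry points towards a source
# database but does not itself hold the data that will be analysed.
REFERENCE_DB = [
    {
        "title": "Labour Force Survey",
        "keywords": {"employment", "households"},
        "location": "national data archive",        # illustrative value
        "codebook_available": True,
    },
    {
        "title": "Population Census small area statistics",
        "keywords": {"census", "population"},
        "location": "regional computing centre",    # illustrative value
        "codebook_available": True,
    },
]


def find_datasets(keyword, reference_db):
    """Return the catalogue entries whose keywords include `keyword`."""
    return [entry for entry in reference_db if keyword in entry["keywords"]]


# The analyst first learns what data exist and where they are held;
# only then is the source database itself approached.
for entry in find_datasets("census", REFERENCE_DB):
    print(entry["title"], "held at", entry["location"])
```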

Another useful demarcation is between text and numeric data, highlighting the notion that for some analysts text is taken as data in the enquiry, e.g. in content analysis, analysis of syntax, etc. The focus in this paper, however, is on numeric data, with one tangential remark later on about the potential value of databases containing survey questions. Cuadra Associates refer to numeric, text-numeric, properties and full text under the heading of source databases.

It may also be useful to make some other definitional points, before going on to look at data libraries from the perspective of data producers. The first is that on-line databases are those where the data can be retrieved interactively (or seemingly so) from a remote computer via a telecommunications network. The second is that public-use data are those data that are available to users outside the originating agency, although the 'public' for any given dataset may be restricted in some way: for example, where data have been made available for academic research purposes only, or where use is by subscription or some other means of payment. Sometimes only those databases which are available without restrictions are considered public (Meyers and Rockwell, 1980).

According to the second edition of Anglo-American Cataloguing Rules (AACR2) a machine-readable data file (MRDF) is defined as any information encoded by methods which require the use of a machine (typically, but not always, a computer) for translation. MRDFs encompass both the data and the programs used to process the data. (More recently the more elegant generic terms 'computer-readable file' or just 'computer file' are being used.)

The terms 'data archive' and 'data library' imply different but complementary functions. Any one particular institution is likely to be a mixture of the two. The ESRC Data Archive at the University of Essex increasingly plays the role of a national clearing house for MRDFs, analogous to the wholesaler or the British Library (Lending Division). Indeed it has become the major agency for the storage and dissemination of government-generated data. The distribution to researchers at different computing sites about the country is generally by magnetic tape, although there is now a service offered for distribution by floppy disk (suitable for microcomputers) and there are plans for data distribution across the Network.

Data libraries, on the other hand, provide on-line services to (local or external) 'remote' users who access the holdings of the data library through interactive programs. In many instances the access programs have some 'user-friendly' interface which sets up the command language for the computing task to be done in batch or interactive mode. The analogy here is with the retail store (in contrast to the wholesaler) or the walk-in library (in contrast to inter-library loan). Readers/users may browse or take books off the shelves themselves; similarly, user-friendly access software allows users to do their own analysis.

In data libraries then, a selective range of data files physically reside in a central location, and users 'travel' across telecommunication networks to directly access and analyse the data. Data archives, on the other hand, hold a (relatively) comprehensive range of data files and physically distribute data to the end user, or to a data library.

A Data Producer's View

National Statistical Offices and national Census Offices constitute a considerable cartel of data producers. They are not, of course, the only producers of "quality" data. Commerce, industry and other public industries generate data of great potential interest to researchers, quite apart from the considerable throughput of numeric data coming from the market research and academic research sectors. However, government statistics are produced at public expense and so it is reasonable that these data should be made publicly available, and in a form which is suitable for statistical analyses. In replying to the Royal Statistical Society's debate on the Rayner Review, Sir John Boreham (then Head of the Government Statistical Service, GSS) stressed that

"We will continue to make available the statistics that we compile and the analyses that we do. Also we shall use the new media (microfiche, magnetic tape, on-line access to data bases etc) to the limit of our ability so as to make the material available cheaply, quickly and conveniently"

(JRSS (A) 145, 1982)

J.R. Wetzel (of the U.S. Bureau of the Census) has referred to five timely, low-cost forms of data dissemination:

a) printed reports
b) microfiche (containing statistical tables)
c) customised or special tabulations
d) computer summary tapes
e) public-use microdata tape files

(IASSIST Proc. 1982)

The first form of data provision listed is traditional; the second is modern but still within the province of mainstream reference libraries. The third sometimes results in printed tables which are stored in libraries, but may be very familiar to government statisticians and survey analysts as wads of computer output. The last two forms of data provision listed are in computer-readable form, and therefore require some form of data library.

Micro- and aggregate-data

Statistical information in statistical reference sections of libraries or information offices has traditionally been held in summary form, most often in tables. In two-way tables the rows and columns are typically classifications of some "explanatory" or background factor (e.g. occupation, area of residence, etc.) and the cell entries are commonly counts or arithmetic means (including percentages). This holds true for multi-way tables also. Data analysts are, however, often interested in obtaining access to the individual data, as a 'case by variable' matrix array. This is sometimes referred to as 'micro-data', where the data-set accessed contains "information on individual respondents or organisations such as schools and companies" (Hakim, 1982). In contrast there are summary or 'aggregate data', which are data-sets containing statistics calculated from the micro-data. Summary statistics are, however, also products of particular theories (made explicit or not) which are invoked first to define certain phenomena as constituting data and then to legitimate the use of certain classifications and methods of summary. Moreover, the data which are recorded and summarised may be theorised to have been the product of specified underlying (social or physical) forces and processes. They are also products of unrecognised and ephemeral factors, including those associated with the measurement procedures. The aggregate data, as published in tables, are therefore not 'facts', but 'products'.

What the academic data analyst wishes is to get beneath the summary data and have access to micro-data (or to data at a low level of aggregation and classification), in order to investigate alternative explanations and different theories about the processes that gave rise to the data. This wish is of course only partially realised, as the micro-data are themselves products of theory and measurement, as well as the real world.
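
The relation between micro-data and aggregate data can be illustrated with a minimal Python sketch. The records, variable names and the function `two_way_table` are invented for the purpose and are not taken from any real survey; the point is only that a published table is one particular summary of the 'case by variable' array.

```python
from collections import defaultdict

# Hypothetical micro-data: one record per respondent, i.e. a
# 'case by variable' array. All values are invented.
MICRO_DATA = [
    {"occupation": "manual", "area": "urban", "income": 100},
    {"occupation": "manual", "area": "urban", "income": 120},
    {"occupation": "manual", "area": "rural", "income": 90},
    {"occupation": "non-manual", "area": "urban", "income": 150},
    {"occupation": "non-manual", "area": "rural", "income": 130},
    {"occupation": "non-manual", "area": "rural", "income": 110},
]


def two_way_table(cases, row_var, col_var, value_var):
    """Aggregate micro-data into cell counts and cell means."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for case in cases:
        cell = (case[row_var], case[col_var])
        counts[cell] += 1
        sums[cell] += case[value_var]
    return {
        cell: {"count": n, "mean": sums[cell] / n}
        for cell, n in counts.items()
    }


# The aggregate data: one summary chosen by whoever compiled the table.
table = two_way_table(MICRO_DATA, "occupation", "area", "income")
for cell in sorted(table):
    print(cell, table[cell])
```

With access to the micro-data the analyst could call the same function with a different classification; with only the published table that choice has already been made.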

Wetzel, of the U.S. Bureau of the Census, reported that

"Public-use micro-data samples present census or survey data in individual record form as desired by many policy analysts and researchers who want to create their own tables, regression analysis, simulation models and so forth. Certain geographic information and respondent characteristics information is suppressed to ensure that the identity of any particular person or housing unit is not disclosed... Public-use micro-data samples have proved enormously popular and have greatly advanced social science research".

(IASSIST Proc. 1982)
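
The suppression that Wetzel describes can be sketched, very roughly, as the removal of identifying fields before release. The field names and the function `public_use_record` below are hypothetical, and the sketch assumes that dropping whole fields is sufficient; disclosure control in practice involves considerably more than this.

```python
# Hypothetical list of fields judged to identify a person or housing unit.
IDENTIFYING_FIELDS = frozenset({"name", "address", "postcode"})


def public_use_record(record, suppressed=IDENTIFYING_FIELDS):
    """Return a copy of the record with the identifying fields removed."""
    return {key: value for key, value in record.items() if key not in suppressed}


# An invented census-style record, before and after suppression.
raw = {"name": "A. Person", "postcode": "XX1 1XX", "age": 34, "tenure": "rented"}
print(public_use_record(raw))
```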

The release of data in this form opens up the possibility of a more intensive re-analysis of the data, as the data may be re-classified according to many different perspectives. Persons other than academics may also wish to produce alternative accounts, thus enriching debate in a representative democracy (McPherson, Raab and Raffe, 1978). In 1978 the President of the Royal Statistical Society (RSS), Dr Henry Wynn, made "Freedom of Statistical Information" the subject of his inaugural address:

"The foundation for active and intelligent public debate and participation in policy-making is the release of information by Government. The public has a need for and a right to this information, just as much as the Government it elects. It needs the information to judge the performance of Government and to establish new areas of action."

Access to micro-data in many instances implies access to data in machine-readable form, in order that further statistical computation is feasible and cost-effective.

This is clearly consistent with the sentiments expressed by a former head of the Government Statistical Service (GSS), Sir Claus Moser, on the occasion of his presidential address to the RSS in 1979:

"National statistical offices will increasingly be pressed to make all the data on which policies are based, and by which they are judged, available to the public at large. Total openness will be expected, partly because there will be greater public participation in decision-making and partly because impatience with secrecy is likely to grow.

"This is all to the good and justifies our efforts in recent years to disseminate all our data (except in the rare cases where secrecy constraints stand in the way) openly and fully.

"Moreover, the public of the 1980's will be better educated and will more consistently challenge the decisions of government. They will expect to monitor government success and will expect readily accessible, convenient and intelligible statistics with guidance on quality of data and on meaning ...

"The aim must be to ensure effective marketing of priced statistical information and dissemination of free material; to provide a well-publicised clearing house service for outside enquiries; and to prepare for publication such digests, guides and news bulletins as are best organised centrally; to organise free and paid publicity for the encouragement of wider use of statistics and improvement of public understanding; to develop channels of communications with intermediary bodies, such as the press, libraries, professional organisations and publicity. Many technical developments can help in the various dissemination tasks, including Prestel, VDUs, linkages to data banks, etc.

"Data should be made freely available to academics so as to permit research which, in turn, can result in better statistics, better teaching and more suitable recruits for the GSS."

Modes of access/dissemination

Wetzel had spoken, as a data producer, of the dissemination of statistical summary data and micro-level data on magnetic tape. Consider next the modes of access that suit data analysts. These include:

a) the receipt of the entire (anonymised) data file, for a survey (say), on magnetic tape (or some other medium, e.g. floppy disk, video disk, etc.) for mounting on the end-user's local computer.

b) the despatch of summary statistical records, again on magnetic tape, or like, for local access.

c) the receipt of some sub-set, or extract file, of (a) or (b).

d) direct on-line access of the enquirer's computer terminal to the computer hosting the database, typically to a data library.

e) 'resident' or 'privileged' access to the data files, at the producer's site or through some appointed gatekeeper.

f) contracting with some intermediate agency for specific analyses, e.g. special tabulations, maps or other display material.

In Britain we have major uses of each mode. For example, the Labour Force Survey and Family Expenditure Survey are distributed as complete data files (less identifying information, of course) for secondary analysis. Data from the Population Censuses are widely distributed as cell totals in tables for each of several (small) areas, i.e. as summary statistical records. The Census Offices also provide sub-setting services (c) and are able to provide, at cost, special tabulations (f). OPCS and the Social Statistics Unit, City University are funded, by the ESRC, to offer 'resident' or 'accompanied' access to the Longitudinal Study (combining data for the same individuals from a 1% sample of the 1971 and 1981 Censuses, and from the NHS Central Register). The Scottish Education Data Archive, in its earlier operations, also offered this mode of access.

There are a few examples in this country of direct on-line access to public data residing on a computer other than one's local computer, but this mode is becoming more widespread. A good example outside this country is the use of the CANSIM database of Statistics Canada. This contains more than one third of a million time-series and many cross-sectional data-sets. Originally, in 1968, it was intended for use by the Statistics Canada staff; in the early 1970's access was extended to other public agencies; and in 1974 it was made available to the public at large, latterly through independent secondary distributors (W.M. Podehl, 1982; Hamilton, 1982). The example of direct on-line access in this country with which I am now most involved is that provided to academic research workers for access to the Population Census small area statistics, held on-line by the Regional Computer Centres. For example, the census data at the University of Manchester Regional Computing Centre are accessed from up to 32 different academic institutions. The CAST/EUL Data Library Service at Edinburgh, about which more is said below, provides access to census, time-series and cross-sectional survey data on the computing network of the Edinburgh Regional Computing Centre, principally to the academic staff and students in the central belt of Scotland.

The use of on-line access from 'remote' terminals has several advantages. First, the researcher need not go through the delay and considerable trouble of sending for a magnetic tape in a format suitable to the local computer (and these formats do vary widely), having the tape successfully mounted by local computing staff (which is not always straightforward), "making back-up copies to protect against catastrophe, and (in some cases) designing an analysis file. Moreover, the (Data Library) host computer often provides software that is particularly well suited to the database of interest, and there are often knowledgeable personnel at the end of the telephone line who can provide assistance." (Meyers and Rockwell, 1980.) An important feature of data libraries is the ease with which the data analyst is able to combine access to the relevant source database with the use of appropriate software, for retrieval, statistical analysis and high quality output of results, as tables, graphs or maps, for example.

One may contrast the apparent freedom and advantage associated with either magnetic tape distribution to the local site or direct on-line remote access (to a data library), with the commissioning of statistical analysis from the data originator or some other data centre. This is the most restricted mode of dissemination in that the "process is inflexible, the researcher cannot easily follow-up clues as they turn up in the data ...; is often slow and expensive ...; and is subject to misunderstandings that can increase cost and delay... In addition, the researcher is always a step removed from the research", which may result in a loss of understanding about the database (Meyers and Rockwell, 1980). However, this bureau service, if fast and relatively cheap, can be cost-effective for the experienced analyst and perhaps the only practical alternative for those who do not want to use a computer or conduct their own analysis.

The development of data libraries

The development of on-line information services takes advantage of the new computing and telecommunication technology through appropriate application software. However, future data library development will depend critically upon the combined efforts and expertise of three sorts of practitioner. These are the Reference Librarian, the Applied Statistician, and the Software Engineer. No particular order of importance is implied, although as data libraries mature in this country, the Reference Librarian may become the first point of contact for users, either 'over-the-desk' or 'across-the-network'. We should expect that the Reference Librarian will aid the user in the search and location of a data set which meets the objective of the research; that the expertise of the Applied Statistician will be in the summary, and sense, which is made of the information in the data sets; and that the Software Engineer will support both the Reference Librarian and the Applied Statistician by providing access to the medium on which the data are held. The Software Engineer also supports the user directly through software packages for reference purposes and for statistical analysis, and by implementing networked (tele-) communications.

However, to date, the history of data archiving and data libraries, in Canada and the United States as well as in the U.K., has been one of initiative on the part of quantitative social scientists:

'Data Archives and, because of the ease of transmitting machine-readable data, local data libraries, have existed for almost 30 years. They were initially established by social scientists to provide widespread and economical access to data for secondary analysis ....' (Judith Rowe, 1982).

In this country the Social Science Research Council began funding the SSRC Data Bank in 1967 as an archive at the University of Essex. In 1972 it changed its name to the Survey Archive. Although not the first data archive of its kind, the funding of the Archive at Essex was an innovative move for the (then) SSRC to take; it is only regrettable that since then the Archive has had to face the repeated uncertainties of grant funding reviews. In the mid-1970's, there were a number of other machine-readable data archives created. Commenting on the founding of the Rand Corporation's Data Facility in 1974, which was set up as a central clearing house for the acquisition, dissemination, control and storage of MRDFs, Jacqueline McGee suggested "that many other similar large data collection projects elsewhere had led to the establishment of today's archives" (IASSIST (4) 1982). About this time, in 1976, the Australian Consortium for Social and Political Research Inc. (ACSPRI) was formed to act as a clearing house and promote the interest of data libraries in the universities. In 1977 the Princeton-Rutgers Census Data Project also led to the formation of an archive.

It is acknowledged that any serious history of data archives would consider the contributions made by the Roper Centre (founded in the post-World War Two period at Williams College, Mass.), the Steinmetz Archive (Amsterdam, circa 1960), the Zentralarchiv für Empirische Sozialforschung (Cologne) and the Inter-university Consortium for Political and Social Research (Michigan).

About the same time as the Survey Archive was established, much influenced by the Department of Government at Essex, the Department of Politics at Strathclyde University established a library of machine-readable data files as part of a Data Processing Unit. From about 1971 this became the Social Statistics Laboratory, but still as part of the Politics Department, with the principal remit of serving members of staff in that department and allowing the Department to offer interesting data in its undergraduate and postgraduate teaching. There is other evidence that points toward political scientists as the driving force. In the U.K. during the 1970's the interest in quantitative methods and survey research did not have the same effect on Sociology (quite the reverse), except in one or two places. For example, in 1975 the Centre for Educational Sociology, University of Edinburgh, brought together three large national surveys of leavers from secondary schools to create the Scottish Education Data Archive, through which the data were made available to all with an interest for their analysis. It has since added data from the surveys it regularly conducts for the Manpower Services Commission and the Scottish Education Department.

There was also interest shown by the geographers, this time in the analysis of the small area statistics from the 1971 Population Census. Although there were considerable delays before the release of the machine-readable data from the Census, a number of research groups established local data facilities to store and access these data. These included the Census Research Unit at Durham and, as an initiative with the University Library, the Data Library in the Program Library Unit (PLU) at Edinburgh. The Data Library had to some extent already been in existence as part of an earlier PLU/Geography Department initiative to hold copies of the annual Agricultural Census.

The emergence of data archives and libraries in the U.K. has been far from planned. First, we "do not have a national data archive of a form similar to some other European countries, and the development of regional or local archives situated in universities or similar institutions is poorly advanced" (T. Jones, IASSIST Proc. 1982). Second, as has been the case internationally for data archives and data libraries, "they usually existed outside the Library and were totally lacking in library procedures for collection, management or bibliographic control" (Judith Rowe, 1982). Third, development of large scale secondary analysis in the U.K. has been closely associated with the attitude of particular government statistical offices, and in particular with the public release by the Census Offices of magnetic tapes containing the Small Area Statistics.

It would be instructive to consider the different strategies adopted by Census Offices on either side of the Atlantic, and the effect that government, and the American concept of the public domain, has had on data library development.

Documentation

Documentation is an elastic term, and for machine-readable data files the term covers a wide range of fields. Dodd has drawn attention to the potential value, for cataloguing, of printed abstracts to describe machine-readable data-sets. She also notes that "by examining the abstract a reader should be able to determine whether or not the file's complete technical documentation is needed or wanted" (S. Dodd, 1982). Unfortunately there is little agreement, or indeed discussion, in this country about the content or need for such abstracts. In addition to other bibliographic information, the abstract includes entries for the unit of analysis, source of information, universe or description of target population, type of sample and sample design, date or period of data collection, number of variables, observations or records, etc. For sample survey data, one might want to add (intended) target sample size and achieved sample size; highlight the need for weights for unbiased estimation (e.g. to counter non-response bias); special procedures for standard error computation (e.g. where the sampling scheme differs from simple random sampling); and sampling fraction. There is clearly scope here for a contribution from applied statisticians and survey methodologists.
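
The entries listed above could be held as a simple structured record. The following Python sketch is one possible layout, offered as an assumption rather than as any agreed standard: the class `DataSetAbstract`, its field names and the example survey are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSetAbstract:
    """Hypothetical abstract record for a machine-readable data file."""
    title: str
    unit_of_analysis: str
    source: str
    universe: str                 # description of the target population
    sample_design: str
    collection_period: str
    n_variables: int
    n_records: int
    # Additions suggested for sample survey data.
    target_sample_size: Optional[int] = None
    achieved_sample_size: Optional[int] = None
    weights_required: bool = False
    sampling_fraction: Optional[float] = None

    def response_rate(self) -> Optional[float]:
        """Achieved sample size as a proportion of the target, if both are known."""
        if not self.target_sample_size or self.achieved_sample_size is None:
            return None
        return self.achieved_sample_size / self.target_sample_size


# An invented example; none of these figures describes a real survey.
abstract = DataSetAbstract(
    title="Example School Leavers Survey",
    unit_of_analysis="individual school leaver",
    source="postal questionnaire",
    universe="leavers from secondary schools in one session",
    sample_design="simple random sample",
    collection_period="spring of the following year",
    n_variables=250,
    n_records=4000,
    target_sample_size=5000,
    achieved_sample_size=4000,
    weights_required=True,
    sampling_fraction=0.1,
)
print(abstract.response_rate())
```

A reader scanning such a record could see at once that weights are needed and how far the achieved sample fell short of the target, without sending for the full technical documentation.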

The data analyst requires this information about a data-set in order to make satisfactory use of it. These requirements make demands on the data producers, but they are demands which are quite consistent with the wishes of data producers that the data they have generated are properly respected in the secondary analysis. The quality of the documentation is the most critical aspect for secondary analysis:

"In the absence of good documentation an otherwise valuable data set is unusable for all practical purposes. Documentation tells the researcher what is in the data files and how it is organised and structured, how the data are coded, and how they may be accessed.... (T)he potential user must examine carefully for such surprises as the use of numeric codes (such as 99) to indicate missing data for one variable and valid for another; a screen question that totally alters a meaning of an item; a question that did not work, producing unreliable data; and so on." (Meyers and Rockwell, 1982.)

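The 'surprise' of a code such as 99 meaning missing for one variable and valid for another can be made concrete with a short Python sketch. The codebook, the variable names and the function `apply_missing_codes` are hypothetical; the sketch assumes only that the documentation lists the missing-data codes variable by variable.

```python
# Hypothetical codebook extract: the codes that denote missing data,
# recorded separately for each variable.
MISSING_CODES = {
    "age": {99},             # 99 = 'not stated' in this invented codebook
    "hours_worked": set(),   # no missing codes: 99 is a valid value here
}


def apply_missing_codes(record, missing_codes):
    """Replace documented missing-data codes with None, variable by variable."""
    return {
        variable: (None if value in missing_codes.get(variable, set()) else value)
        for variable, value in record.items()
    }


record = {"age": 99, "hours_worked": 99}
print(apply_missing_codes(record, MISSING_CODES))
```

Without the codebook the two values are indistinguishable, which is the practical sense in which an undocumented data set is unusable.
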
The Importance of Cataloguing MRDFs

One of the first Action Groups to be set up at the series of meetings held in Edinburgh during 1976 to formally establish IASSIST (International Association for Social Science Information Service and Technology) was on classification. In the six years prior to that, a sub-committee of the American Library Association had been meeting on the matter. The SSRC Survey Archive, which had played a major part in the formation of IASSIST, reported that "the problem of efficient cataloguing of information ... is unsolved" (Tannenbaum, 1976). The book Cataloguing Machine-Readable Data Files, by Sue Dodd, who was involved in the preparatory work just mentioned, has been described as "this long-awaited contribution to cataloguing practices (which) moves MRDF to bibliographic legitimacy at last" (IASSIST Newsletter 7 (2) 1983). In that work she asserts that:

'Information stored in a computer-readable form will soon become a legitimate library resource available to those patrons who need it. Cataloguing data files and computer programs is a first step. Several academic research libraries (Yale and Princeton Universities and the University of British Columbia) have already taken this step, and OCLC (On-line Computer Library Centre) may be the first bibliographic utility to incorporate the newly formatted MARC (Machine-readable Catalog) format for machine-readable data files into its network system.' (Sue Dodd, 1982.)

The ancillary documentation accompanying a computer file is important in cataloguing and, in the absence of adequate information about the machine-readable data file from the computer file itself or its carrier, it may be the only source of information. A sub-committee of the American Library Association reported:

"while the item in hand is usually the primary source of information in cataloguing, a machine-readable data file often affords little that provides useful bibliographic information. In contrast to that of some other non-book materials, the MRDF container is frequently not labelled at all or it has a casual or unreliable label. For these reasons, the most useful and reliable sources of information are usually found in documentation external to the MRDF and its container." (ALA 1976).

Dodd reports that in the absence of "an internal user label, documentation becomes the chief source of information and plays an important role in the cataloguing process." Documentation may be produced concurrently with the development of a particular file or program, or it may be revised and re-issued by a distributor once the data file or program has become an archival or marketable copy and made available to the general public. In either case, documentation should consist of the following sections: title page and related information; preface or introduction; processing summary; a list of variates; and appendices. The title page should contain information sufficient for the identification of printed documentation, the computer file (MRDF) being described, and the origins of both. This approach could embrace statistical tables also.

Analogous to the Cataloguing-in-Publication (CIP) scheme is a possible Cataloguing-in-Source scheme for MRDFs, whereby major data producers can provide cataloguing information at the early stage of a file's development. Clearly this is a situation in which the Government Statistical Service, the British Library, the Inter-Universities Software Committee and the Research Councils could play leading roles. Dodd reports that the National Opinion Research Centre (NORC), the U.S. Bureau of the Census and the Institute of Social Research at the University of Michigan have each initiated such a scheme.

Similar comment could be made on the promotion of a scheme for identifying the correct bibliographic reference which secondary analysts of data might cite in publications. The editors of learned journals could be encouraged to take a lead here.

A Union Catalogue for Data Libraries

Bibliographic control over the data files themselves would be a major step, but so would the provision of an easily accessed on-line catalogue of 'public' databases:

"In recent years, there has been a strong movement among data activists towards standardising documentation and increased bibliographic control of MRDFs with the hope that, ultimately, international union listings of available data may be produced." (R G Jones, 1982).

"What is needed is a union catalogue of all known disseminators of MRDF, and some efficient means to access information on what new data files are being created. The movement by ICPSR and the Roper Centre towards on-line remote access to their inventories is a major step towards information retrieval." (L G M Ruus, 1980).

"If we had to choose one function that was more important than any other, I suppose it would be the union listing of data resources." (M Nastir, 1982).

With the advances made towards the creation of cataloguing standards, networked communications and a Registrar for Data Protection this is a practical consideration for the UK.

Conclusion

"There is a great deal of numeric data already on-line, but Europe, and more especially the United Kingdom, has made little contribution to this abundance" (Collins 1982). "A major uncertainty haunts the user when a desired piece of statistical information is not found. Is this because it does not exist, or because the source has not been discovered?" (Hamilton 1982). The time may now be right to press for a cooperative venture which may include the following priorities:

(1) Agreement and adoption within the UK of a cataloguing system for machine-readable data files (MRDFs) and computer files in accordance with AACR2, see S Dodd (1982).

(2) Agreement and wide-scale adoption within the research community (broadly defined) of conventions for MRDF Abstracts.

(3) The compilation and provision of one or more on-line reference databases containing the machine-readable (MARC format) catalogue of MRDF which may be accessed via data libraries, and through the existing provision of on-line bibliographic search facilities currently offered by reference libraries.

(4) Subject matter search facilities for MRDF, including provision for indexing on variate-names ('variables') and questionnaire items.
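Priority (4) can be pictured as an inverted index from variate names to the catalogue records that contain them. The sketch below is an illustration only: the record fields, the helper names `build_variate_index` and `search`, and the sample entries are all hypothetical and are not drawn from AACR2, MARC or any existing catalogue.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    """Hypothetical catalogue entry for one machine-readable data file."""
    title: str
    producer: str
    variates: list[str] = field(default_factory=list)


def build_variate_index(records):
    """Map each lower-cased variate name to the titles of files holding it."""
    index = defaultdict(set)
    for record in records:
        for name in record.variates:
            index[name.lower()].add(record.title)
    return index


def search(index, variate_name):
    """Return the sorted titles of files that contain the named variate."""
    return sorted(index.get(variate_name.lower(), set()))


# Invented sample records, for illustration only.
records = [
    CatalogueRecord("Survey A", "Producer X", ["AGE", "SEX", "TENURE"]),
    CatalogueRecord("Survey B", "Producer Y", ["AGE", "OCCUPATION"]),
]
index = build_variate_index(records)
print(search(index, "age"))     # files holding an AGE variate
print(search(index, "tenure"))  # files holding a TENURE variate
```

Questionnaire items could be indexed in the same way, with the item text (or keywords taken from it) standing in for the variate name.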

Tailpiece

When I set out to write this for the Committee of Librarians and Statisticians I had intended to include some comments on computer networking and on the development of software for data libraries. This would have provided an opportunity to illustrate these comments with examples from the Data Library Service supported by the Centre for Applications Software and Technology (incorporating the Program Library Unit) and the Edinburgh University Library. In part fulfilment a description of the services of the Data Library is set out in Appendix 1. Appendix 2 contains a brief overview of SASPAC, the access software written for the 1981 Population Census small area statistics, which was to have been a prelude to a discussion of access software for tables held in machine-readable form. This software would of course differ from the more conventional statistical analysis packages which are used to produce tables from individual level data.

References

American Library Association (1976) "Final Report of the Cataloguing Code Revision Committee", ERIC ED 11972r

Sir John Boreham (1982) in written reply to "The Rayner Review of Government Statistical Services" (G. Hoinville and T.M.F. Smith), Journal of the Royal Statistical Society (A) 145 (2) 1982

B. Cronin (1981) "Databanks", Aslib Proceedings 33 (6) 1981

Maria Collins (1982) in Proceedings of the 30th Annual Study Group, Oxford, Library Association Reference, Special and Information Section

Sue Dodd (1982) Cataloguing Machine-Readable Data Files: An interpretative manual, American Library Association, Chicago 1982

Geoffrey Hamilton (1982) "Access to Statistics: A Survey and call for action" in Proceedings of the 30th Annual Study Group, Oxford, Library Association Reference, Special and Information Section

Catherine Hakim (1982) Secondary Analysis in Social Research: a guide to data sources and methods with examples, Contemporary Social Research (5), Geo. Allen

R.G. Jones (1982) "Establishing an Australian Social Science Data Archive: Progress and Plans", IASSIST Newsletter 6 (1)

T.W. Jones (1982) "Recent developments in the availability of machine-readable data in Britain", IASSIST Conference Proceedings 1982.

A.F. McPherson, C.D. Raab and D.J. Raffe (1978) "Social explanation and political accountability: two related problems with a single solution", paper presented to Annual Conference of British Educational Research Association, Leeds, 1978. (Also found, in part, in Chapter 17 of J. Gray, A.F. McPherson and D. Raffe, Reconstructions of Secondary Education: Theory, Myth and Practice Since the War, RKP 1983.)

D.E. Meyers and R.C. Rockwell (1984) "Large-scale Data Bases: who produces them, how to obtain them, what they contain" in D.J. Bowering (ed) Secondary Analysis of Available Data Bases, PE22, Jossey-Bass Inc. 1984

Sir Claus Moser (1980) "Statistics and Public Policy: The Address of the President", Journal of the Royal Statistical Society (A) ~ (7) 1980

Judith Rowe (1982) in foreword to S. Dodd op. cit.

E. Tannenbaum (1976) SSRC Newsletter July 1976

J.R. Wetzel (1982) "Data Dissemination Policies and Practices of the U.S. Bureau of the Census", IASSIST Conference Proceedings 1982.

APPENDIX 1

Edinburgh University Data Library

Introduction

The Data Library at the University of Edinburgh offers the research community on-line access to facilities for the analysis of census, time-series, sample survey, and related data. These data are held on the Edinburgh Regional Computing Centre's network of ICL 2900 and VAX mainframe computers, in an environment that is well suited to (remote) multi-access interactive computing, and is rich in software for statistical analysis and graphical display. The 'official statistics' data holdings of the Data Library include:

A very full range of (small) area statistics relating to the 1971 and 1981 Population Censuses for Scotland, England and Wales;

A series of Parish summary and grid square data from the Annual Agricultural Census, for England and Wales;

Survey data, including the General Household Survey, Scottish School Leaver Survey, British Social Attitude Survey, Women in Employment Survey;

The CSO Macroeconomic Databank;

The Scottish Input/Output Tables for 1979;

The OS Gazetteer;

The Postcode Directory for Scotland;

A full range of digitised boundary files.

Information about the holdings of the Data Library is held in an on-line view system, DATALIB. (CALL ED.BUSH (or some other locally determined call sign) and then enter DATALIB (to the prompt User:).) DATALIB and the holdings themselves are accessible on-line by users from the University communities of Edinburgh, Glasgow and Strathclyde, and by a range of users across the PSS/JANET computer networks, or across the telephone network (using an acoustic coupler/modem). In providing an on-line data library service, the Data Library relies upon the expertise, facilities and staff of the Edinburgh Regional Computing Centre (ERCC), the Edinburgh University Library (EUL) and the Centre for Applications Software and Technology (CAST). It also draws heavily upon the past experience of what was known as the Program Library Unit (PLU).

The Data Library and PLU

The Data Library formally came into being in the mid-1970's in order to provide computing facilities and access to the small area statistics from the 1971 Population Census (Scotland). The machine-readable data from this Census were purchased for Edinburgh University's academic research and teaching community by the University Library. The data were managed and were made accessible on the ERCC's network of mainframe computers through programs written by the staff of (what was then called) the Program Library Unit (PLU). PLU was a specialist unit within Edinburgh with a national role for the conversion and maintenance of statistical packages for ICL computers in universities. It had also gained considerable experience in writing census-access software and had familiarity with computer-aided mapping from work on the Annual Agricultural Census (England and Wales) which it had carried out in association with the Department of Geography. (The mapping programs, CAMAP and GIMMS, are in widespread use today.) Four programs were written for accessing the 1971 Population Census, together with DATAC, a user-friendly interface designed with the novice computer user ('whose expertise lies outwith computing') in mind.

The PLU played a significant U.K. role in the dissemination and promotion of secondary analysis of the (small) area statistics produced from the 1981 Population Censuses. First, the application software that was commissioned by LAMSAC (Local Authorities Management Services Advisory Council) for the retrieval and manipulation of the machine-readable tabular output from the Census was designed and written (under subcontract to Durham University) by staff at the PLU. The SASPAC project, as it was named, won the British Computer Society's Social Benefit Award, and is widely used by local government officers and academic research staff.

Second, PLU played an active part in the two consortia which, on behalf of the academic community, purchased the 1981 Population Census statistics from the offices of the two Registrars-General. In particular, PLU formed the consortium to purchase the data from the Scottish Census, and arranged for these to be deposited with the ESRC Data Archive for general distribution. Later, as part of the Inter-University Software Committee (IUSC)'s working party on census data, the Data Library at Edinburgh became one of the six regional and national computer centres to act as a census data library for academic research purposes. The General Register Office (Scotland) also granted a Census Agency Agreement to the University of Edinburgh in order that the Data Library could provide services to commercial users, including academics conducting contract research and policy analysts in central and local government. The Data Library can provide similar commercial services for the Annual Agricultural Censuses for Scotland and for England and Wales.

CAST, EUL and ERCC

The Centre for Applications Software and Technology (CAST) was established in 1983 as an outcome of the review of the Program Library Unit, occasioned by the retiral of its Director. CAST now has about 40 staff. In addition to what are now referred to as the Data Library Service and the Program Library Service, CAST also has expertise in database design and management, numerical algorithms, statistical computing, graphics, application software evaluation, conversion and development (for mainframe and micro computers) and survey methodology. (CAST continues to provide some software services under the name PLU.) The Data Library can therefore call upon specialists for advice or project work on a range of relevant activities. In particular, the provision of access software with attractive user interfacing (which has for example been written for the CSO Macroeconomic Databank, the Agricultural Census and the OS Gazetteer) is central to the development of the Data Library, which is why the service is housed at CAST.

The Library of the University (EUL) is one of the major university libraries in the U.K. It has about 200 full-time and part-time staff. It has a large reference collection including sections for official statistics and for maps. It is currently undertaking the on-line cataloguing of its holdings, and makes extensive use of on-line bibliographic search facilities (including DIALOG, ERIC, etc). It has begun to grasp the nettle of cataloguing machine-readable data files, and many of the computer terminals installed in the library for use by its 'users' are terminals also connected to the local computing network. Both the official statistics and the map collections provide necessary and complementary provision to the Data Library. For example, researchers wishing to use the small area statistics from the Census may consult the maps in the map reference section in order to discover where enumeration district boundaries lie, say, and consult the published tables from the Census in the statistical reference section prior to conducting an analysis of the machine-readable small area statistics, and perhaps producing a high quality schematic map of their own.

The Edinburgh Regional Computing Centre (ERCC) was founded in 1966, and has maintained on its network of mainframe computers an operating system called the Edinburgh Multi-Access System (EMAS) specifically designed to provide an interactive computing environment for a scattered population of users. This form of provision was considerably in advance of the 'star' network systems which were later developed at the national and regional sites. The ERCC therefore has considerable experience in interactive (and batch) computing, in the provision of on-line help information and in communications between different mainframes. The EMAS operating system can also claim to be a particularly 'friendly' operating system, when compared to the alternatives available, especially for the first time user from another computing site.

General Institutional and Technological Environment

The University of Edinburgh, with over 1500 teaching and research staff and over 10,000 students, has a widely dispersed campus. It is partly for this reason that Edinburgh has a tele-communications network that is arguably one of the most advanced in UK universities. The EDNET network is a multi-mode packet switching network linking EMAS with VMS and UNIX operating systems on a range of mini-computers. The network also offers access to a central (disk) filestore, printing and graphical facilities, an electronic mail service, and a (regional) gateway to the British Telecom Packet SwitchStream (PSS) and the UK academic network (JANET). This is shown in the figure below. ERCC is now experimenting with integrated speech and data facilities on its network, evaluating the merits of an early access to the full ISDN (Integrated Services Digital Network) being piloted by BT in London, Manchester and Birmingham.

ERCC and CAST (together with the Artificial Intelligence Applications Institute and the Wolfson Micro-Electronics Institute) form the 'applications and technology transfer' ring about the University's School of Information Technology. The latter is a federation of the Departments of Artificial Intelligence, Computer Science and Electrical Engineering, each of which is an established international leader in IT, and which together are attracting inward investment to 'Silicon Glen'. The 'applications' ring will benefit from the fundamental work which is taking place on Intelligent Knowledge Based Systems, 'intelligent front ends', the relation between databases and programming languages, graphics, digital communications, and chip and circuit design.

One area of application for these technological and engineering advances lies in the field of the retrieval of reference information. The University Library has embarked upon a full automation programme, using MARC format records. Specialised computing equipment is integrated into the ERCC communications network, thus making possible the world-wide access to the catalogues and reference material. Access in the reverse direction to such on-line bibliographic databases as DIALOG, ERIC, BLAISE, etc, is already part of the library's service. Another area of application, perhaps the most widespread and mainstream use of computers, is the retrieval, analysis and display of numerical information. This is the area of application software to which CAST makes its contribution, most obviously in the fields of data base management systems and statistical analysis packages, where it provides evaluation and conversion services (of packages originating from the university communities of the U.K. and U.S.A.), the design and development of its own products, and other collaborative projects, such as the Computer Board's graphics initiative (with Salford and Leicester Universities) and the 'computers and teaching' initiative (with the Departments of Economic and Social History and Meteorology).

[Figure: Edinburgh Regional Computing Centre, EDNET Communications Network]

Current Staff

The staff for the Data Library Service are drawn from the University Library (EUL) as well as from CAST. Staff with major responsibilities, and who regularly spend part of their time in this area, include a manager, a senior computing officer, an administrative/computing assistant, a librarian and three computing officers, the latter with specialist responsibilities for programming, computer-assisted cartography and spatial analysis. These staff have skills in database management, statistics, survey analysis and design, cartography and have experience of work in the social sciences, Government and the physical sciences.

Work-in-progress

The Data Library at Edinburgh is currently undergoing a re-organisation in order to provide an expanded range of services. This is partly in preparation for the launch of a Scottish Data Centre, but is also partly in response to the wish to foster data library development more generally within the UK. The cataloguing of machine-readable data files has been identified as a priority area, along with the provision of an on-line union catalogue of data collections, indicating the existence and location of data collections. Some preliminary discussions have been held with Sue Dodd (who is willing to give two workshops in the UK this autumn) and with the staff of the ESRC Data Archive who (it is now learnt) have made considerable progress in the compilation of MARC-compatible study descriptions of the Archive's data collections. We also recognise that the development of data libraries depends crucially upon the existence of a national data clearing house, with secure long-term funding, and upon agreement on the data library/clearing house relationship. We therefore have an interest in promoting an organisation like the Data Archive at Essex.

One of the features of Edinburgh's Data Library is the provision of user-friendly interfaces to access software. We have also become conscious of the value of friendly interfaces to mapping packages, and are in the process of designing UMapIT which will, for example, create the commands to create a GIMMS map, using digitised boundary data from a machine-readable library.

Funding for these activities comes from two major sources. First, the University provides finance for core staff. Second, CAST is able to generate revenue from various external activities. These include, for example, the data processing and access management for the Edinburgh District Council's 'Homeless Survey'; consultancy to the Scottish Office, and analysis of the Agricultural and Population Censuses. We are also to seek grant-funding for the development of a more 'public' data library.

Appendix 2

As stated in the main paper ('Towards the Development of Data Libraries in the U.K.'), the US Bureau of the Census actively sought to foster the development of data services in the private and public sectors. Towards this end, it made available CENSPAC, software by which the public-use small area data files could be read in machine-readable form. In the UK the initiative for developing the access software for the machine-readable tables from the Population Census was taken outside the Census Offices. As a result, a variety of home-grown retrieval programs were written at the different sites which took the machine-readable versions of the 1971 Population Census small area statistics. In Edinburgh, for example, a set of four programs was written to give access to the tables from the Scottish Census. Moreover, the data had to be pre-processed in order to unpack the fairly complex format in which the data were released by GRO(S). Also written was a highly user-friendly interface to those programs for geographers (and others) who had little or no experience of computing.

Prior to the 1981 Population Census, LAMSAC (representing a consortium of local authorities) invited tenders for a retrieval package for the 1981 Population Census small area statistics. Although the contract was initially placed at Durham University, where the Census Research Unit has been funded by the Social Science Research Council to advance computer-aided spatial analysis, the design and programming of SASPAC was subsequently carried out at the Program Library Unit (PLU), University of Edinburgh.

The SASPAC project won the British Computer Society's Social Benefit Award and SASPAC has been widely implemented. It is in use by local government throughout the UK, as well as by research workers and students in the universities and polytechnics. Somewhat ironically, it is also used within central government, although not yet (to my knowledge) within the Census Offices.

The design of SASPAC ensured that it is robust and highly portable, in that it does not require conversion to run on local sites. In order to be truly portable, and require little more than a rudimentary FORTRAN compiler, SASPAC was written using SASTRAN, a high-level 'language' which produced a 'primitive' sub-set of (well documented) ANSI 1966 FORTRAN. Towards this same aim SASPAC was written as a set of serial-access programs. It is written and distributed in a form in which there may be local enhancement, either to take advantage of local conditions (specific to machine and operating system) in a move towards local optima, or to incorporate extra user facilities.

An important but unintended outcome of the project was that consultations about the input format for SASPAC had the effect of persuading OPCS to make the data portable, providing the cell counts in a standard and easily accessible form of machine-code ('character'). This meant that the data could also be read by other packages.

There are two particular features of SASPAC which could feature in a general facility for retrieving machine-readable statistical tables. First, there are separate 'load' and 'analysis' programs. The Load program reads the data directly from the files on the magnetic tapes distributed by OPCS, in the original format. It then validates the data and 'saves' the data in a compact form, as a SASPAC system file. The economy in storage space can be considerable - as much as 80%. Restructuring can have the further advantage that retrieval by the Analysis program is faster. This would be especially true for a table retrieval package which stored and structured the data in a 'direct access' file.
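The load step described above (read character-format counts, validate them, save them compactly) can be sketched roughly as follows. This illustrates the general idea only and is not SASPAC's actual file format: the fixed-width layout, the 8-character field width, the 32-bit binary encoding and the function names are all assumptions made for the example.

```python
import struct

FIELD_WIDTH = 8  # assumed width of each character-format cell count


def load_record(line):
    """Split one character-format record into validated integer counts."""
    if len(line) % FIELD_WIDTH != 0:
        raise ValueError("record length is not a multiple of the field width")
    counts = []
    for start in range(0, len(line), FIELD_WIDTH):
        text = line[start:start + FIELD_WIDTH].strip()
        if not text.isdigit():
            raise ValueError(f"invalid cell count: {text!r}")
        counts.append(int(text))
    return counts


def save_compact(counts):
    """Pack counts as big-endian unsigned 32-bit integers."""
    return struct.pack(f">{len(counts)}I", *counts)


def read_compact(blob):
    """Unpack counts previously written by save_compact."""
    return list(struct.unpack(f">{len(blob) // 4}I", blob))


# An invented record of four right-justified counts.
record = "".join(f"{n:>{FIELD_WIDTH}}" for n in (12, 345, 0, 6789))
counts = load_record(record)
blob = save_compact(counts)
print(counts, len(record), len(blob))
```

In this toy layout a 32-character record packs into 16 bytes; the saving achieved in practice would depend on the real tape format and on the encoding chosen.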

The SASPAC Analysis program also includes facilities for some (limited) statistical computation, tabular output of the data, and for 'saving' a subset of the data in a separate file, either as another SASPAC file or as a 'matrix' of characters suitable for use with a statistical analysis package or a mapping program.

The second feature of SASPAC of relevance is the use of 'page-scanning' files. These contain the text which acts as labels describing the content of particular tables. For the Population Census tables, these were also read from the OPCS magnetic tape, along with the numeric data, and 'saved' in compact form using the SASPAC Load program. When pages of tables are required, via the SASPAC Analysis program, the text and numeric data pertaining to the cell entries for the relevant tables are merged, with the result that the printed tables which result are labelled intelligibly and correctly.
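The merging of label text with cell counts at output time might look something like the sketch below. It is a loose illustration, not the SASPAC page-scanning format: the dictionary layout, the `merge_labels` function, the table identifier and the sample figures are all invented for the example.

```python
def merge_labels(table_id, labels, cells):
    """Pair each row label for a table with its row of cell counts."""
    row_labels = labels[table_id]["rows"]
    values = cells[table_id]
    if len(row_labels) != len(values):
        raise ValueError("label and data rows do not match")
    lines = [labels[table_id]["title"]]
    for label, row in zip(row_labels, values):
        lines.append(f"{label:<20}" + "".join(f"{v:>8}" for v in row))
    return lines


# Hypothetical label text and cell counts, stored separately.
labels = {
    "T1": {
        "title": "Table T1: Households by tenure",
        "rows": ["Owner occupied", "Rented"],
    }
}
cells = {"T1": [[120, 45], [80, 30]]}

lines = merge_labels("T1", labels, cells)
for line in lines:
    print(line)
```

Keeping the labels apart from the counts means the numeric file stays compact, while the check on row numbers guards against a table being printed with the wrong labels.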

SASPAC was originally designed to access standard tables from the 1981 Population Census. However, the latest version, SASPAC4, can read some non-standard tables, and the range of machine-readable tabular data could be extended if the data were first put through a pre-processor and reformatted.

[Figure: SASPAC data flow. Labels recoverable from the diagram: table entries on magnetic tape (e.g. 1981 Population Census, OPCS), with or without pre-processor; LOAD; system files; PageScan table labels; ANALYSIS; user's system files; 'hard copy' (e.g. tables).]