Just for the record Bibliographic Data – where we were, where we are, where we’re going Huw Jones libraries@cambridge

Post on 22-Dec-2015


Just for the record

Bibliographic Data – where we were, where we are,

where we’re going

Huw Jones

libraries@cambridge

“Data about Data”

our metadata

Is it Newton?

NO

Is it Voyager?

NO

Databases!

• UL and Dependents

• Departments and Faculties A-E

• Departments and Faculties F-M

• Departments and Faculties O-Z

• Colleges A-N

• Colleges O-Z

• Affiliated Institutions

• Manuscripts

Hooke

Newton

Access Reports

Web Interfaces

Voyager

Where we were

8 databases

University Library: 4 M

Other libraries: 2.5 M

Data problems

Quality

Duplication

Quality - fullness

of 2.5 M records in our databases

1 M short records

Quality – coding

Duplication

Effects

• Difficulty in resource discovery

• Patchy retrieval

• Lack of authority control

• Difficulty with standard deduplication

• Burden on staff time

• Ties us to multiple database model

Where we are now

• Record sharing

• Short record enrichment

• Automated MARC correction

• Authority control

Record sharing

• Departments and Faculties A-E and O-Z moved to a record sharing model

• Drawing up of guidelines for Cataloguing

• Automated tools to change the ownership of 825,000 records

• Legacy duplication of records

Duplicates lists
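The slides do not say how the duplicates lists were produced. As a minimal sketch of one common approach, the code below groups records by a match key (ISBN where present, otherwise normalised title plus year) and reports every group with more than one record. The record layout (`id`, `isbn`, `title`, `year`) and the function names are assumptions made for illustration only.

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a comparison key for a record (hypothetical dict layout)."""
    # Prefer the ISBN, ignoring hyphens and spaces.
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn", ""))
    if isbn:
        return ("isbn", isbn.upper())
    # Fall back to a normalised title plus publication year.
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    title = re.sub(r"\s+", " ", title).strip()
    return ("title", title, record.get("year", ""))

def duplicates_list(records):
    """Return lists of record ids that share the same match key."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

records = [
    {"id": 1, "isbn": "978-0-9617511-1-1", "title": "How to build a habitable planet"},
    {"id": 2, "isbn": "9780961751111", "title": "How to build a habitable planet"},
    {"id": 3, "title": "How to build a habitable planet", "year": "1985"},
]
print(duplicates_list(records))
```

Record 3 is not reported here because it has no ISBN to match on, which illustrates why short records make standard deduplication difficult.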

Short record enrichment

Results

• Of 1M short records

• 200,000 records processed

• 106,175 records updated

• Will enrich half of our short records? 500,000?
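The "half of our short records" estimate can be checked with a small calculation. This sketch assumes that the remaining short records would be enriched at the same rate as the 200,000 processed so far, which the slides do not state.

```python
processed = 200_000        # short records processed so far
updated = 106_175          # of those, records successfully updated
short_records = 1_000_000  # total short records

# Observed success rate on the processed batch.
rate = updated / processed

# Assumption: the same rate holds for all short records.
projected = round(rate * short_records)

print(f"{rate:.1%} of processed records updated")
print(f"Projected enrichable short records: {projected:,}")
```

On these figures the success rate is a little over one half, which is consistent with the estimate of roughly 500,000.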

Automated MARC correction

• Corrects MARC coding errors where it can do so without ambiguity

• In testing, 70,000 records processed in 2 days

• Over 200,000 errors corrected

Automated MARC Correction

How to get from this …

=LDR 00472nam\\2200157\a\4500
=001 662002
=005 20071205064734.0
=008 071129s1985\\\\nyua\\\\\\\\\\001\0\eng\d
=020 \\$a9780961751111
=100 1\$aBroecker, W.S.,$d1931-
=245 10$aHow to build a habitable planet ;$cBy Wallace S. Broecker.
=260 \\$aNew York ;$bEldigio Press,$cc1985
=300 \\$a291p $bill $c23cm
=504 \\$aIncludes index.
=650 \0$aAstronomy.
=650 \0$aAstrophysics.

to this!

=LDR 00453nam 2200157 a 4500
=001 662002
=005 20071205064734.0
=008 071129s1985\\\\nyua\\\\\\\\\\001\0\eng\d
=020 \\$a9780961751111
=100 1\$aBroecker, W. S.,$d1931-
=245 10$aHow to build a habitable planet /$cby Wallace S. Broecker.
=260 \\$aNew York :$bEldigio Press,$cc1985.
=300 \\$a291 p. :$bill. ;$c23 cm.
=504 \\$aIncludes index.
=650 \0$aAstronomy.
=650 \0$aAstrophysics.

Output

Bib id: 662002
How to build a habitable planet ; By Wallace S. Broecker.
100: UPDATE: Spaces inserted between initials in subfield _a
245: UPDATE: By uncapitalised at start of subfield c
245: UPDATE: Space forward slash inserted before subfield _c
260: UPDATE: Full stop inserted at end of field
260: UPDATE: Space colon inserted before subfield _b
300: UPDATE: Full stop inserted after the p in pagination
300: UPDATE: Full stop inserted at end of field
300: UPDATE: Illustration abbreviation has been corrected
300: UPDATE: Space colon inserted before subfield _b
300: UPDATE: Space inserted between digits and cm
300: UPDATE: Space inserted between digits and p in pagination
300: UPDATE: Space semi-colon inserted before subfield c
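The corrections listed in this output can be sketched as a set of per-field rules. The code below is not the program described in the slides; it is a minimal illustration, assuming records are held as mnemonic-format text lines and using hypothetical function names. Its regular expressions cover only the cases in the example record and do not recalculate the leader.

```python
import re

def end_with_full_stop(field):
    # Simplified rule: real cataloguing rules have exceptions.
    return field if field.endswith(".") else field + "."

def fix_100(field):
    # Insert a space between adjacent initials: "W.S." -> "W. S."
    return re.sub(r"(?<=[A-Z]\.)(?=[A-Z]\.)", " ", field)

def fix_245(field):
    # Subfield c is preceded by space forward slash.
    field = re.sub(r"\s*[;:/,]?\s*\$c", " /$c", field)
    # Uncapitalise "By" at the start of subfield c.
    return re.sub(r"\$cBy\b", "$cby", field)

def fix_260(field):
    # Subfield b is preceded by space colon; field ends with a full stop.
    field = re.sub(r"\s*[;:,]?\s*\$b", " :$b", field)
    return end_with_full_stop(field)

def fix_300(field):
    field = re.sub(r"(\d+)\s*p\b\.?", r"\1 p.", field)   # "291p" -> "291 p."
    field = re.sub(r"\s*[;:,]?\s*\$b", " :$b", field)    # space colon before b
    field = re.sub(r"(\$bill)\b\.?", r"\1.", field)      # "ill" -> "ill."
    field = re.sub(r"\s*[;:,]?\s*\$c", " ;$c", field)    # space semi-colon before c
    field = re.sub(r"(\d+)\s*cm\b\.?", r"\1 cm", field)  # "23cm" -> "23 cm"
    return end_with_full_stop(field)

FIXERS = {"100": fix_100, "245": fix_245, "260": fix_260, "300": fix_300}

def correct_record(lines):
    """Apply the matching rule to each '=TAG ...' line; leave other fields alone."""
    return [FIXERS.get(line[1:4], lambda f: f)(line) for line in lines]

before = [
    r"=100 1\$aBroecker, W.S.,$d1931-",
    r"=245 10$aHow to build a habitable planet ;$cBy Wallace S. Broecker.",
    r"=260 \\$aNew York ;$bEldigio Press,$cc1985",
    r"=300 \\$a291p $bill $c23cm",
    r"=504 \\$aIncludes index.",
]
for line in correct_record(before):
    print(line)
```

Each rule only fires where the pattern is unambiguous, in the spirit of correcting coding errors "where it can do so without ambiguity"; anything it does not recognise passes through unchanged.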

Authority Control

• No authority control in libraries@cambridge databases

• Script written to identify unauthorised headings

• Used program to correct headings
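A check for unauthorised headings might look something like the sketch below. It is an illustration only, not the script described here: the function names, the in-memory list of authorised headings and the table of variant forms are all assumptions.

```python
import re

def normalise(heading):
    """Collapse whitespace, strip trailing punctuation and lower-case."""
    heading = re.sub(r"\s+", " ", heading).strip()
    return heading.rstrip(" .,;:").lower()

def check_headings(headings, authorised, variants):
    """Return (heading, suggestion) pairs for headings not in the authority list.

    `variants` maps known variant forms to their authorised form;
    the suggestion is None when no variant matches.
    """
    auth_index = {normalise(h) for h in authorised}
    variant_index = {normalise(v): a for v, a in variants.items()}
    report = []
    for heading in headings:
        key = normalise(heading)
        if key not in auth_index:
            report.append((heading, variant_index.get(key)))
    return report

authorised = ["Astronomy", "Astrophysics"]
variants = {"Astro-physics": "Astrophysics"}  # hypothetical variant form
headings = ["Astronomy.", "Astro-physics.", "Star gazing."]

for heading, suggestion in check_headings(headings, authorised, variants):
    print(heading, "->", suggestion or "no suggestion, needs manual review")
```

Headings with a suggestion could be corrected automatically, while the rest are left for a cataloguer, matching the mixture of automated solutions and traditional cataloguing.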

Results

• DepFacOZ – 2,243 name and subject headings changed, affecting 41,944 records

• DepFacAE – 620 subject headings corrected, affecting 6,841 records

• Authority check incorporated into Bib Check program

Where we are

Fewer of these:

More of these:

Fewer records

Better records

Where are we going?

• One fully deduplicated database of full, well coded records?

• Catalogue will always be a work in progress

• Improvements to Catalogue important not only to solve current problems but also to support future developments

• Data exists independently of Voyager

• Future developments will rely on quality of data to work effectively

– Pushing data out to, e.g., discovery layers (Primo, AquaBrowser) and platforms (WorldCat, Talis Platform)

– Linking to data from outside, e.g. RSS feeds, reading lists

– FRBR

• Mixture of automated solutions and traditional cataloguing

• Catalogue and the records it is made up of are useful tools for the discovery, location and use of our resources

• We will be ‘Cataloguing’ for a long time to come!