Bibliographic Data Spring Cleaning with Sierra DNA

Becky Yoose, Discovery and Integrated Systems Librarian, Grinnell College

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Bibliographic Data Spring Cleaning with Sierra DNA

Bibliographic Data Spring Cleaning with Sierra DNA

Becky Yoose

Discovery and Integrated Systems Librarian

Grinnell College

Page 2: Bibliographic Data Spring Cleaning with Sierra DNA

http://www.flickr.com/photos/alisonlongrigg/3641086760/

Page 3: Bibliographic Data Spring Cleaning with Sierra DNA

Table structure

https://secure.flickr.com/photos/37prime/750293493/

Page 4: Bibliographic Data Spring Cleaning with Sierra DNA
Page 5: Bibliographic Data Spring Cleaning with Sierra DNA

Bib

• Fixed fields

• Title? (sometimes)

• Author? (sometimes)

Page 6: Bibliographic Data Spring Cleaning with Sierra DNA
Page 7: Bibliographic Data Spring Cleaning with Sierra DNA
Page 8: Bibliographic Data Spring Cleaning with Sierra DNA

Generic Record

• Leader

• Control Fields

• Variable Fields

Page 9: Bibliographic Data Spring Cleaning with Sierra DNA
Page 10: Bibliographic Data Spring Cleaning with Sierra DNA

p07  p08  p09  p10
1    9    8    5
1    9    8    5
1    9    8    5
1    9    8    4
1    9    8    5
1    9    8    4
1    9    8    4
1    8    9    4
1    9    5    6
1    8    9    1

To make your life more interesting, the control_field table...
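Since each character position sits in its own column, reading a year back out means concatenating p07 through p10. A minimal Python sketch, assuming the rows have already been fetched as tuples of single characters; the name `rows_to_years` is illustrative only, and the sample rows are the ten shown on this slide:

```python
from collections import Counter


def rows_to_years(rows):
    """Join the four single-character columns (p07..p10) of each row into one string."""
    return ["".join(str(ch) for ch in row) for row in rows]


# (p07, p08, p09, p10) values copied from the slide.
sample_rows = [
    ("1", "9", "8", "5"),
    ("1", "9", "8", "5"),
    ("1", "9", "8", "5"),
    ("1", "9", "8", "4"),
    ("1", "9", "8", "5"),
    ("1", "9", "8", "4"),
    ("1", "9", "8", "4"),
    ("1", "8", "9", "4"),
    ("1", "9", "5", "6"),
    ("1", "8", "9", "1"),
]

years = rows_to_years(sample_rows)
year_counts = Counter(years)  # how often each reassembled year occurs
print(year_counts)
```

The same join could be done in SQL with string concatenation; doing it client-side here just keeps the sketch runnable without a database.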

Page 11: Bibliographic Data Spring Cleaning with Sierra DNA

To make your life more interesting, the control_field table...

EXPLAIN ANALYZE
SELECT *
FROM sierra_view.control_field
WHERE p07='2' AND p08='0' AND p09='0' AND p10='1'
LIMIT 10;

Total runtime: 123.627 ms
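Writing one predicate per character position by hand gets tedious. As a sketch, a small Python helper can generate the WHERE clause for any four-digit year. `year_filter` is a hypothetical name, and the p07 to p10 column mapping is taken from the query above rather than from Sierra documentation:

```python
def year_filter(year, first_pos=7):
    """Build a control_field predicate for a 4-digit year, one column per digit."""
    if not (len(year) == 4 and year.isascii() and year.isdigit()):
        raise ValueError("expected a four-digit year, got %r" % (year,))
    return " AND ".join(
        "p%02d='%s'" % (first_pos + offset, digit)
        for offset, digit in enumerate(year)
    )


# Rebuild the query from the slide for the year 2001.
query = (
    "SELECT * FROM sierra_view.control_field WHERE "
    + year_filter("2001")
    + " LIMIT 10"
)
print(query)
```

Restricting the input to four ASCII digits keeps the generated SQL safe to paste into a client without further quoting.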

Page 12: Bibliographic Data Spring Cleaning with Sierra DNA

subfield_view

Column            Data Type  Comment
record_id         bigint     Foreign key to record.
record_type_code  char       Record type code.
record_num        int        Record number.
varfield_id       bigint     Foreign key to varfield.
field_type_code   char       III field type tag.
marc_tag          varchar    MARC tag.
marc_ind1         char       First MARC indicator.
marc_ind2         char       Second MARC indicator.
occ_num           int        The occurrence number of the field among other fields with the same tag. Used when a record contains more than one field of the same type.
display_order     int        Integer to manage the display order of a list.
tag               char       Subfield tag.
content           varchar    Content of the subfield.

Page 13: Bibliographic Data Spring Cleaning with Sierra DNA

phrase_entry (selected fields)

Column              Data Type  Comment
record_id           bigint     Foreign key to record_metadata.
index_tag           varchar    The itag of an index string (e.g., 'a'=author, 't'=title, 'd'=subject, etc.) for an entry.
varfield_type_code  varchar    The tag of the variable-length field to index.
index_entry         varchar    The index entry string.
insert_title        varchar    A normalized form of the title used to sort index entries.
original_content    varchar    The non-normalized version of the index entry string.
parent_record_id    bigint     The system-generated ID of the parent of the phrase entry's source record.

Page 14: Bibliographic Data Spring Cleaning with Sierra DNA

Example: Typo of the day

phrase_entry + regular expressions

Page 15: Bibliographic Data Spring Cleaning with Sierra DNA

Example: a one-off word surrounded by spaces

SELECT index_entry, record_key
FROM sierra_view.phrase_entry
WHERE
  index_tag='t' AND
  varfield_type_code='t' AND
  type3='' AND
  index_entry ~* '(^|\s)fom\s' AND
  index_entry !~* '(^|\s)fom\ssic\s';
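To see what the two patterns accept and reject, they can be tried outside the database. The following is a rough Python sketch, not part of the original deck: `re.IGNORECASE` stands in for the case-insensitive `~*` and `!~*` operators, `is_candidate` is a hypothetical name, and two of the three test strings are made up for illustration.

```python
import re

# Python equivalents of the two PostgreSQL patterns from the query.
TYPO = re.compile(r"(^|\s)fom\s", re.IGNORECASE)
SIC = re.compile(r"(^|\s)fom\ssic\s", re.IGNORECASE)


def is_candidate(index_entry):
    """True when 'fom' appears as its own word and is not followed by 'sic'."""
    return bool(TYPO.search(index_entry)) and not SIC.search(index_entry)


entries = [
    # Real index entry from this deck's sample results.
    "multiple choices after school findings fom the extended service schools initiative",
    # Made-up entry: a transcribed typo already flagged with "sic".
    "letters fom sic a friend",
    # Made-up entry: 'fom' only as part of a longer word.
    "a history of fomites",
]

results = [is_candidate(entry) for entry in entries]
for flagged, entry in zip(results, entries):
    print(flagged, entry)
```

Because the pattern requires whitespace after the word, an entry that ends in "fom" would not be flagged; whether that matters depends on how index_entry values are terminated in a given system.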

Page 16: Bibliographic Data Spring Cleaning with Sierra DNA

woodhouse 1615 a plaine almanacke or prognostication for the yeare of our lord god 1615 being the third fom leape yeare conta | b1719695

countrey messenger or the faithfull foot post communicating his vveekly intelligence fom the severall parts of the kingdome a | b1819566

multiple choices after school findings fom the extended service schools initiative | b1439991

Which one?

Page 17: Bibliographic Data Spring Cleaning with Sierra DNA

http://www.flickr.com/photos/visionsbyvicky/3369136077/

Page 18: Bibliographic Data Spring Cleaning with Sierra DNA
Page 19: Bibliographic Data Spring Cleaning with Sierra DNA

$demo

https://github.com/GrinnellCollegeLibraries/typooftheday
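The query pattern from the earlier slide generalizes to other suspected typos. The sketch below is not the code in the repository linked above; it is a hypothetical Python helper (`typo_query` is an invented name, and "teh" is an invented second example) that reuses the column names and filters exactly as they appear in the "fom" query.

```python
def typo_query(word):
    """Return the phrase_entry query text for one plain alphabetic word."""
    # Only ASCII letters are accepted, so the word can be dropped into the
    # regular expression and the SQL string without any escaping.
    if not (word.isascii() and word.isalpha()):
        raise ValueError("only plain alphabetic words are supported: %r" % (word,))
    word = word.lower()
    return (
        "SELECT index_entry, record_key\n"
        "FROM sierra_view.phrase_entry\n"
        "WHERE index_tag='t' AND varfield_type_code='t' AND type3='' AND\n"
        "  index_entry ~* '(^|\\s)%s\\s' AND\n"
        "  index_entry !~* '(^|\\s)%s\\ssic\\s';" % (word, word)
    )


for typo in ["fom", "teh"]:
    print(typo_query(typo))
    print()
```

Rejecting anything other than letters is a deliberately blunt choice: it rules out multi-word phrases, but it also means the generated text can be pasted into a SQL client as-is.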

Page 20: Bibliographic Data Spring Cleaning with Sierra DNA

Possibilities

• Authority headings

• Subfield misbehavior

• Series statements

• Others...

Page 21: Bibliographic Data Spring Cleaning with Sierra DNA

Thanks! Questions?

@yo_bj