Bibliographic Data Spring Cleaning with Sierra DNA

Post on 15-Jul-2015

Becky Yoose

Discovery and Integrated Systems Librarian

Grinnell College

http://www.flickr.com/photos/alisonlongrigg/3641086760/

Table structure

https://secure.flickr.com/photos/37prime/750293493/

Bib

• Fixed fields

• Title? (sometimes)

• Author? (sometimes)

Generic Record

• Leader

• Control Fields

• Variable Fields

p07  p08  p09  p10
1    9    8    5
1    9    8    5
1    9    8    5
1    9    8    4
1    9    8    5
1    9    8    4
1    9    8    4
1    8    9    4
1    9    5    6
1    8    9    1
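The rows above hold one character per column, so a value only becomes readable once the four columns are joined. A minimal sketch of that step, assuming p07 through p10 are MARC 008 positions 07-10 (Date 1); the name join_positions is hypothetical and not part of Sierra:

```python
# Rows copied from the slide: one character per column (p07, p08, p09, p10).
ROWS = [
    ("1", "9", "8", "5"),
    ("1", "9", "8", "5"),
    ("1", "9", "8", "5"),
    ("1", "9", "8", "4"),
    ("1", "9", "8", "5"),
    ("1", "9", "8", "4"),
    ("1", "9", "8", "4"),
    ("1", "8", "9", "4"),
    ("1", "9", "5", "6"),
    ("1", "8", "9", "1"),
]


def join_positions(p07, p08, p09, p10):
    """Concatenate the four single-character columns into one string."""
    return f"{p07}{p08}{p09}{p10}"


# Assumption: the four positions together form a publication year.
YEARS = [join_positions(*row) for row in ROWS]
print(YEARS)
```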

To make your life more interesting, the control_field table...

EXPLAIN ANALYZE
SELECT *
FROM sierra_view.control_field
WHERE p07='2' AND p08='0' AND p09='0' AND p10='1'
LIMIT 10;

Total runtime: 123.627 ms
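Because each date character lives in its own column, a search for one year needs four comparisons. A hedged sketch of generating that predicate from a year string follows. The helper name date_predicate is hypothetical, the column names come from the query above, and the %s placeholders assume a psycopg2-style PostgreSQL driver:

```python
def date_predicate(year):
    """Return (where_clause, params) with one comparison per position.

    Hypothetical helper: column names p07..p10 are taken from the slide;
    the %s placeholder style assumes a psycopg2-style driver.
    """
    if len(year) != 4:
        raise ValueError("expected exactly four characters")
    columns = ["p07", "p08", "p09", "p10"]
    clause = " AND ".join(f"{col} = %s" for col in columns)
    # Each character of the year becomes one bound parameter.
    return clause, list(year)


clause, params = date_predicate("2001")
print(clause)
print(params)
```

Passing the characters as bound parameters, rather than pasting them into the SQL string, keeps the quoting out of the caller's hands.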

subfield_view

Column            Data Type  Comment
record_id         bigint     Foreign key to record.
record_type_code  char       Record type code.
record_num        int        Record number.
varfield_id       bigint     Foreign key to varfield.
field_type_code   char       III field type tag.
marc_tag          varchar    MARC tag.
marc_ind1         char       First MARC indicator.
marc_ind2         char       Second MARC indicator.
occ_num           int        The occurrence number of the field among other fields with the same tag. Used when a record contains more than one field of the same type.
display_order     int        Integer to manage the display order of a list.
tag               char       Subfield tag.
content           varchar    Content of the subfield.
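Each subfield_view row carries a single subfield, so rebuilding a readable field means grouping rows by varfield_id. A rough sketch of that grouping, under two assumptions: display_order gives the order of subfields within a field, and the sample rows are invented for illustration (they are not real Sierra data). The function assemble_fields is hypothetical:

```python
from collections import defaultdict

# Invented sample rows shaped like subfield_view (only the columns used here).
SAMPLE_ROWS = [
    {"varfield_id": 1, "marc_tag": "245", "marc_ind1": "1", "marc_ind2": "0",
     "display_order": 1, "tag": "b", "content": "a sample subtitle"},
    {"varfield_id": 1, "marc_tag": "245", "marc_ind1": "1", "marc_ind2": "0",
     "display_order": 0, "tag": "a", "content": "A sample title :"},
    {"varfield_id": 2, "marc_tag": "246", "marc_ind1": "3", "marc_ind2": "0",
     "display_order": 0, "tag": "a", "content": "Sample variant title"},
]


def assemble_fields(rows):
    """Group subfield rows by varfield_id and render one line per field."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["varfield_id"]].append(row)

    fields = []
    for varfield_id in sorted(grouped):
        # Assumption: display_order orders the subfields inside a field.
        subfields = sorted(grouped[varfield_id], key=lambda r: r["display_order"])
        first = subfields[0]
        body = " ".join(f"${r['tag']} {r['content']}" for r in subfields)
        fields.append(
            f"{first['marc_tag']} {first['marc_ind1']}{first['marc_ind2']} {body}"
        )
    return fields


FIELDS = assemble_fields(SAMPLE_ROWS)
for line in FIELDS:
    print(line)
```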

phrase_entry (selected fields)

Column              Data Type  Comment
record_id           bigint     Foreign key to record_metadata.
index_tag           varchar    The itag of an index string (e.g., 'a'=author, 't'=title, 'd'=subject, etc.) for an entry.
varfield_type_code  varchar    The tag of the variable-length field to index.
index_entry         varchar    The index entry string.
insert_title        varchar    A normalized form of the title used to sort index entries.
original_content    varchar    The non-normalized version of the index entry string.
parent_record_id    bigint     The system-generated ID of the parent of the phrase entry's source record.

Example: Typo of the day

phrase_entry + regular expressions

Example of a one-off word surrounded by spaces

SELECT index_entry, record_key
FROM sierra_view.phrase_entry
WHERE
  index_tag='t' AND
  varfield_type_code='t' AND
  type3='' AND
  index_entry ~* '(^|\s)fom\s' AND
  index_entry !~* '(^|\s)fom\ssic\s';

woodhouse 1615 a plaine almanacke or prognostication for the yeare of our lord god 1615 being the third fom leape yeare conta    b1719695

countrey messenger or the faithfull foot post communicating his vveekly intelligence fom the severall parts of the kingdome a    b1819566

multiple choices after school findings fom the extended service schools initiative    b1439991
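The two pattern tests in the query can be tried outside the database. The sketch below mirrors them with Python's re module, assuming PostgreSQL's case-insensitive ~* operator behaves like re.search with re.IGNORECASE for these simple patterns. The function is_candidate is hypothetical, and the second and third sample strings are invented:

```python
import re

# Same patterns as the query: the word "fom" preceded by start-of-string
# or whitespace and followed by whitespace.
TYPO = re.compile(r"(^|\s)fom\s", re.IGNORECASE)
# Entries where the typo is followed by "sic" are deliberate and are skipped.
TYPO_SIC = re.compile(r"(^|\s)fom\ssic\s", re.IGNORECASE)


def is_candidate(index_entry):
    """True when the entry contains 'fom' as a word and no 'fom sic'."""
    return bool(TYPO.search(index_entry)) and not TYPO_SIC.search(index_entry)


ENTRIES = [
    # Taken from the query results above.
    "multiple choices after school findings fom the extended service schools initiative",
    # Invented examples for illustration only.
    "letters fom sic the front",
    "a view from the bridge",
]

for entry in ENTRIES:
    print(is_candidate(entry), entry)
```

Both patterns require whitespace after the word, so an entry that ends in "fom" with nothing after it would not be matched by either the query or this sketch.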

Which one?

http://www.flickr.com/photos/visionsbyvicky/3369136077/

$demo

https://github.com/GrinnellCollegeLibraries/typooftheday

Possibilities

• Authority headings

• Subfield misbehavior

• Series statements

• Others...

Thanks! Questions?

yoosebec@grinnell.edu @yo_bj