Enhancing Access to Databases: MultiSearch and Database Relevancy, the Integration of Two Collaborative Projects

Shirley Rodgers and James M Jackson Sanborn

LITA Forum, Norfolk 2003



Database Access Problems

• Locating and selecting appropriate database

• Multiple searches through multiple database interfaces resulting in multiple result sets


Old Database Approach

Access to databases was clunky and non-intuitive.

– Alphabetical list

– Subject lists that were long and also alphabetical


Old Subject Page


Multiple Search Problem

[Diagram: each database has its own interface and requires its own search (Database, interface, search; repeated for every database)]


Multiple Search Problem

Patrons demanded solution

– Old “Locate Databases by Keywords” – 79% of searches failed (>6k)

• geodesic domes

• stem cell and optical nerve

• goat milk spider silk

• factors that explain marital happiness when spouse lives in nursing home


Two Problems, Two Solutions

Database Relevancy Project


Database Relevancy

Goals:

– Intuitive display of databases

– Improved subject access

– Maintainable solution

– Leverage existing data


Database Relevancy

Plan:

– Sort databases by relevancy within subject area

– Provide additional information for databases ‘important’ to a subject area

– Automatically generate lists


Technical Details

Data:

– Drawn from catalog

• MyLibrary Subject Headings (690 $x)

• Descriptive notes (520 $a)

• URL (856 $u)

– Three levels of relevancy assigned

• Core, Narrow, Broad (690 $R)

• Assigned at the subject level
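
The idea of sorting databases by relevancy within each subject can be sketched as follows. This is a minimal Python illustration, not the Perl/XSLT code the project used; the record layout, the sample titles, and the display order Core, Narrow, Broad are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed display order; the slides name the three levels but not how they rank.
RELEVANCY_RANK = {"Core": 0, "Narrow": 1, "Broad": 2}


def group_by_subject(records):
    """Group database titles by subject (690 $x), most relevant first.

    Each record is a dict with a title and a list of (subject, relevancy)
    pairs, mirroring one 690 field per subject with its $R level.
    """
    by_subject = defaultdict(list)
    for record in records:
        for subject, relevancy in record["subjects"]:
            by_subject[subject].append((RELEVANCY_RANK[relevancy], record["title"]))
    # Sort by relevancy rank, then alphabetically by title within a level.
    return {
        subject: [title for _, title in sorted(entries)]
        for subject, entries in by_subject.items()
    }


# Hypothetical sample data for illustration only.
records = [
    {"title": "Database B", "subjects": [("Biology", "Broad"), ("Chemistry", "Core")]},
    {"title": "Database A", "subjects": [("Biology", "Core")]},
    {"title": "Database C", "subjects": [("Biology", "Narrow")]},
]
lists = group_by_subject(records)
print(lists["Biology"])
```

Because relevancy is attached to each subject heading rather than to the database as a whole, the same database can be Core in one list and Broad in another.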


MARC transformed to XML using the Perl module MARC.pm
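
As a rough picture of what this conversion produces, the sketch below builds an XML record from MARC-style fields in Python. It is not the Perl module's actual output format; the element names `record`, `field`, and `subfield` with a `type` attribute are inferred from the XPath expressions shown later in the deck, and the sample record is hypothetical.

```python
import xml.etree.ElementTree as ET


def record_to_xml(fields):
    """Build a <record> element from (tag, [(subfield_code, value), ...]) pairs."""
    record = ET.Element("record")
    for tag, subfields in fields:
        field = ET.SubElement(record, "field", {"type": tag})
        for code, value in subfields:
            subfield = ET.SubElement(field, "subfield", {"type": code})
            subfield.text = value
    return record


# Hypothetical catalog record for one database.
fields = [
    ("245", [("a", "Example Database")]),
    ("520", [("a", "Indexes journal articles in an example field.")]),
    ("690", [("x", "Biology"), ("R", "Core")]),
    ("856", [("u", "http://example.org/db")]),
]
record = record_to_xml(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```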


XML transformed multiple times using XSLT - processed through Saxon, called by brief Perl scripts.


Why XML

• Much easier to manipulate using XSLT than using Perl to directly manipulate MARC

• Simpler to use than importing MARC into a 2nd database and using ColdFusion

• Easy to test on desktop then move to production


Limitations of XML/XSLT

• Multiple versions of MARC.XML

• XSLT has limited string processing functionality

• Need Perl to handle multiple file generation based on hash value pairs
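
The last limitation, producing many output files from key/value pairs, is the kind of step a short script handles easily. The sketch below shows the idea in Python rather than Perl; the file-naming rule and the sample subjects are assumptions, not details from the project.

```python
import tempfile
from pathlib import Path


def write_subject_files(pages, out_dir):
    """Write one file per key in `pages`, a dict of subject -> page content."""
    out_dir = Path(out_dir)
    written = []
    for subject, content in sorted(pages.items()):
        # Derive a simple file name from the subject heading (assumed convention).
        file_name = subject.lower().replace(" ", "_") + ".xml"
        (out_dir / file_name).write_text(content, encoding="utf-8")
        written.append(file_name)
    return written


# Hypothetical subject -> XML fragment pairs.
pages = {
    "Biology": "<subject name='Biology'/>",
    "Civil Engineering": "<subject name='Civil Engineering'/>",
}
with tempfile.TemporaryDirectory() as tmp:
    written = write_subject_files(pages, tmp)
    biology_text = (Path(tmp) / "biology.xml").read_text(encoding="utf-8")
print(written)
```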


Detail of Record



Collaboration

• Stakeholders brought in early

• Subject specialists from Collection Management and Reference

– Gave input on “look and feel” issues and functionality

– Given final say on database relevancy

• Technical development in DLI and Systems departments


Two Problems, Two Solutions

MultiSearch Project


The beginning of MultiSearch

• BlueAngel MetaStar for indexing in-house collections and GIS

• Wanted to learn Java and JSP

• Testing it with other Z39.50 servers


The beginning of MultiSearch

• Prototype of cross-searching 2 major database vendors

• How many vendors support Z39.50?


How can I use what I prototyped?

• Static list of databases – subject and alphabetical

• Database relevancy pages created using XML/XSLT

• JSP can access XML files

• The projects came together!


How is XML used?

• JSP Xtags can access XML files

• Subject pages use XML and XSLT to display information

<xtags:style xml='<%=xmlfile%>' xsl='<%=xslfile%>'/>


Databases listed using XML/XSLT


How is XML used?

• List of databases to search created from parsing XML file

<xtags:forEach select="//record">

  <xtags:variable id="url856" type="string" select="field[@type='856']/subfield[@type='u']"/>

  <xtags:variable id="dbtitle" type="string" select="./field[@type='245']/subfield[@type='a']"/>

  <%-- url856 and dbtitle are then used to build the list of databases to search --%>

</xtags:forEach>
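
For readers unfamiliar with XTags, the same loop can be sketched in Python using the standard library's XML parser. The XPath expressions are the ones from the slide; the sample XML, its `records` wrapper element, and the function name `list_targets` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML in the record/field/subfield shape the XPath above implies.
XML = """
<records>
  <record>
    <field type="245"><subfield type="a">Example Database One</subfield></field>
    <field type="856"><subfield type="u">http://example.org/one</subfield></field>
  </record>
  <record>
    <field type="245"><subfield type="a">Example Database Two</subfield></field>
    <field type="856"><subfield type="u">http://example.org/two</subfield></field>
  </record>
</records>
"""


def list_targets(xml_text):
    """Return (title, url) pairs, one per <record> element."""
    root = ET.fromstring(xml_text)
    targets = []
    for record in root.iter("record"):
        url856 = record.findtext("field[@type='856']/subfield[@type='u']", default="")
        dbtitle = record.findtext("field[@type='245']/subfield[@type='a']", default="")
        targets.append((dbtitle, url856))
    return targets


targets = list_targets(XML)
print(targets)
```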


Search targets obtained from XML file using Xtags


Querying the Z39.50 targets is easy!

Working with the data you get back is another story!


Vendor differences

• Authentication

– Username/passwords

– IP authentication

• Z39.50 attributes

– Word & WordList

– Any & Anywhere


Vendor differences: Data Formats

– MARC


Vendor differences: Data Formats

– SUTRS (Simple Unstructured Text Record Syntax)

• Requires special processing to parse the “blob” and display the data

• Can’t merge, de-dup or sort these records


Vendor differences: Source information (773 field)

• Contains the journal title, ISSN, year, volume, issue, and pages

• Used for E-Journal Finder and SFX

• Vendors use different subfields for this information


Vendor differences: Source information (773 field)

773$t Pet Product News
773$x 0899-2177
773$g May 1997, v51, n5, p64(2)


Vendor differences: Source information (773 field)

773$x 0003-0031
773$t American-Midland-Naturalist. 2003, 149: 1, 104-120; 39 ref.

[Callouts label the parts of the 773$t string: title, period, year, volume, issue, pages]
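
Splitting a string like this into its parts can be sketched with a regular expression. The Python below is illustrative only (the project's parsers were SFX source parsers, not this code); the pattern covers just this one vendor layout, and replacing hyphens in the title with spaces is an assumption.

```python
import re

# Layout: Title. YYYY, volume: issue, startpage-endpage; trailing notes
SOURCE_PATTERN = re.compile(
    r"^(?P<title>.+?)\. (?P<year>\d{4}), (?P<volume>\d+): (?P<issue>\d+), "
    r"(?P<spage>\d+)-(?P<epage>\d+)"
)


def parse_773t(text):
    """Return the citation parts as a dict, or None if the layout does not match."""
    match = SOURCE_PATTERN.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    # Assumed clean-up: this vendor joins title words with hyphens.
    parts["title"] = parts["title"].replace("-", " ")
    return parts


parsed = parse_773t("American-Midland-Naturalist. 2003, 149: 1, 104-120; 39 ref.")
print(parsed)
```

A string in a different layout, such as the Pet Product News example, returns None, which is why a separate pattern is needed for each vendor format.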


Vendor differences: Source information (773 field)

Aquatic Toxicology [Acquit. Toxicol.]. Vol. 59, no. 3-4, pp. 163-175. 24 Sep 2002.

Review of Palaeobotany and Palynology, 119 (1-2) pp. 93-112, 2002

Indian-Journal-of-Animal-Sciences. 2002, 72: 12, 1122-1124; 10 ref.

History-and-Theory. My 02; 41(2): 250-263


Vendor differences: Source information (773 field)

Challenge – Get from this:

773$x 0003-0031
773$t American-Midland-Naturalist. 2003, 149: 1, 104-120; 39 ref.


To This:


How is this accomplished?

• Study patterns in the 773 field for the database

• Write SFX source parsers for each format to parse the 773 field into separate fields for ISSN, ISBN, volume, issue, start page and end page
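
The one-parser-per-format approach can be pictured as a small table of patterns keyed by parser name. This Python sketch is an illustration, not the SFX parsers themselves; the parser names `comma_pp` and `dot_semicolon` are hypothetical, and the two patterns are written against two of the sample strings shown earlier in the deck.

```python
import re

# One pattern per vendor layout, keyed by a hypothetical parser name.
PARSERS = {
    # e.g. "Review of Palaeobotany and Palynology, 119 (1-2) pp. 93-112, 2002"
    "comma_pp": re.compile(
        r"^(?P<title>.+), (?P<volume>\d+) \((?P<issue>[\d-]+)\) "
        r"pp\. (?P<spage>\d+)-(?P<epage>\d+), (?P<year>\d{4})$"
    ),
    # e.g. "History-and-Theory. My 02; 41(2): 250-263"
    "dot_semicolon": re.compile(
        r"^(?P<title>.+?)\. (?P<date>.+?); (?P<volume>\d+)\((?P<issue>\d+)\): "
        r"(?P<spage>\d+)-(?P<epage>\d+)$"
    ),
}


def parse_source(parser_name, text):
    """Apply the named parser; return a dict of parts, or None on no match."""
    pattern = PARSERS.get(parser_name)
    if pattern is None:
        return None
    match = pattern.match(text)
    return match.groupdict() if match else None


first = parse_source(
    "comma_pp", "Review of Palaeobotany and Palynology, 119 (1-2) pp. 93-112, 2002"
)
second = parse_source("dot_semicolon", "History-and-Theory. My 02; 41(2): 250-263")
print(first)
print(second)
```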


How is this accomplished?

• Store parser name for each database in a database

• Lookup parser name and pass it to SFX in the OpenURL

sfx.lib.ncsu.edu:9003/ncsu?sid=MULTISEARCH:zsilver2&issn=1068-5472&isbn=&atitle=Phalaenopsis+orchid+plant+named+%27Anthura+Gold%27.&pid=US-pat-Plant.+%5BWashington%2C+D.C.+%3A+U.S.+Patent+and+Trademark+Office%2C+1976-.+May+21%2C+2002.+%2812%2C639%29+3+p.
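
Building such a link can be sketched in Python as below. The host, path, and parameter names come from the example URL; the `http` scheme, the lookup table, the database id `example_db`, and the idea that the parser name is the part of `sid` after the colon are assumptions that the example suggests but the slides do not state.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host and path are from the slide; the scheme is assumed.
SFX_BASE = "http://sfx.lib.ncsu.edu:9003/ncsu"

# Hypothetical lookup table: database id -> SFX source parser name.
PARSER_BY_DATABASE = {"example_db": "zsilver2"}


def build_openurl(database_id, issn, atitle, pid):
    """Look up the parser name for a database and build the SFX link."""
    parser_name = PARSER_BY_DATABASE[database_id]
    params = {
        "sid": "MULTISEARCH:" + parser_name,
        "issn": issn,
        "isbn": "",
        "atitle": atitle,
        "pid": pid,  # unparsed source string, left for the SFX parser
    }
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return SFX_BASE + "?" + urlencode(params)


url = build_openurl(
    "example_db",
    "1068-5472",
    "Phalaenopsis orchid plant named 'Anthura Gold'.",
    "US-pat-Plant. May 21, 2002. (12,639) 3 p.",
)
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
```

Note that `urlencode` also encodes the colon in `sid` as `%3A`, which decodes to the same value as the literal colon in the slide's URL.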


Vendor differences: Full Text

• 856$u - link or PDF

• 900$a Magazine: Horticulture, December 2002
900$a SLIP INTO THE HOLIDAYS
900$a Whether you're in a mood to celebrate or not,


Rollout of Service

• Group created to design page layouts and functionality

• Decided to display all databases on results page, not just ones with Z39.50 search capabilities

• Provide link to search the non-Z39.50 databases directly


Rollout of Service

• Load tests to measure performance with more users

• Production – August 19th, 2002


MultiSearch & Database Finder Usage Statistics

April 2003 - Hits

• Homepage 272,583

– Database Finder 44,813

• Subject Pages 38,676

– MultiSearch 13,372


Post Rollout

• Continued to work to add other vendors with Z39.50 access

• Changed the look of the subject page to make MultiSearch more noticeable


MultiSearch Version 2.0

• Converted from E-Journal Finder to SFX

• Advanced Search – allow users to select databases to search

• Merging, sorting, and de-duping results
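
Merging, sorting, and de-duplicating can be sketched as a single pass over the per-database result lists. This Python example is a guess at the general shape, not the MultiSearch implementation; the record fields, the sample data, and the duplicate key (ISSN plus normalized title) are assumptions.

```python
def merge_results(result_sets):
    """Merge per-database result lists, drop duplicates, and sort by title."""
    seen = set()
    merged = []
    for database, records in result_sets.items():
        for record in records:
            # Assumed duplicate key: ISSN plus a lower-cased,
            # whitespace-normalized title.
            key = (record["issn"], " ".join(record["title"].lower().split()))
            if key in seen:
                continue
            seen.add(key)
            merged.append({**record, "database": database})
    return sorted(merged, key=lambda r: r["title"].lower())


# Hypothetical result sets from two databases.
result_sets = {
    "Database One": [
        {"title": "Spider Silk Proteins", "issn": "1234-5678"},
        {"title": "Geodesic Domes", "issn": "2345-6789"},
    ],
    "Database Two": [
        {"title": "spider  silk proteins", "issn": "1234-5678"},
    ],
}
merged = merge_results(result_sets)
print([r["title"] for r in merged])
```

This only works for records with parsed fields, which is consistent with the earlier point that SUTRS records cannot be merged, de-duplicated, or sorted.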


MultiSearch Version 3

Non-Z39.50 databases

• Screen scraping vendor sites to get the number of results

• Link to vendor site for results

• Only do this for core databases

• Time consuming to maintain


MultiSearch Version 3

Download capabilities

• Mark citations for download to a file, email or bibliographic software


Looking forward

A good service and tool for patrons today

Technology is changing. New protocols are coming. It will only get better and, hopefully, easier


Two Problems, Two Solutions, One Service

Demo of Database Relevancy & MultiSearch