Resource Discovery (metadata and searching) Working Group Report

Upload: annabelle-black

Post on 26-Dec-2015

TRANSCRIPT

Page 1: Resource Discovery (metadata and searching) Working Group Report

Resource Discovery (metadata and searching)

Working Group Report

Page 2: Resource Discovery (metadata and searching) Working Group Report

Issues discussed

• What kinds of resources should EMELD provide search services for?

• What should the design be for an EMELD search interface?

• How can EMELD get good metadata into its search database?

• What level of metadata should be exposed?

Page 3: Resource Discovery (metadata and searching) Working Group Report

What resources?

• Anything that might be of value to a linguist working on endangered languages.
  – Language data
  – Tools
  – Advice (including reviews)
  – People
  – "Gateway" websites

Page 4: Resource Discovery (metadata and searching) Working Group Report

What resources?

• But, there's no reason to rely on this working group for "what".

• A questionnaire distributed via Linguist

Page 5: Resource Discovery (metadata and searching) Working Group Report

What resources?

• Two kinds of best practice resources

• Resources with best practice metadata
  – These resources can be discovered
  – Non-digital resources encouraged
  – Digital resources discouraged, but allowed

Page 6: Resource Discovery (metadata and searching) Working Group Report

What resources?

• Best practice digital resources

• All digital resources encouraged to be of this type

• Benefits
  – Enhanced search features (due to document interoperability)
  – Special "BP globe of approval" √

Page 7: Resource Discovery (metadata and searching) Working Group Report

What resources?

• Side Note
  – Best Practice "approval" system should be tied into a larger system through which digital resources could be listed as "publications"
  – A topic for another working group? (Perhaps OLAC?)

Page 8: Resource Discovery (metadata and searching) Working Group Report

What resources?

• Issues which need to be addressed

• Metadata for resources interesting to linguists but which are not linguistic data

• Needed: Best practice metadata standards for
  – Tools
  – Advice
  – People
  – ...

• Test: EMELD could see how it would classify everything in BPU.

Page 9: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Assumption: Metadata and data are distributed

• Query Language
  – Metadata: OLAC standard
  – Data from interoperable documents: A new standard

Page 10: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Resource Query Language Ideal
  – A generalized query protocol used across the linguistics community
  – A series of "methods" (yet to be defined) that can be called on these resources to retrieve structured linguistic data matching query parameters

Page 11: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Problems implementing ideal
  – No clear sense as to what "methods" are needed.
  – One solution: Examine results from questionnaire

Page 12: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Problems implementing ideal
  – Very few repositories allow their data to be accessed in a generalized way
  – First step: Encourage documentation of repository data access systems and develop a metadata standard for this

Page 13: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Long term implementation issues
  – An OLAC Query Language Protocol
    • A well-defined linguistic query language
    • A system for "packaging" queries
  – Linguistic data search registry
    • Linguistic sites register that they are data access sites
    • They also register implemented search methods
  – EMELD will archive best-practice documents for data access for data creators not capable of implementing the query protocol
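The registry idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not define a registry format, so the class names, the example endpoints, and the method names (`find_by_language`, `find_by_gloss`) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAccessSite:
    """A site that has registered itself as offering data access."""
    name: str
    endpoint: str  # hypothetical URL of the site's query service
    methods: set[str] = field(default_factory=set)


class SearchRegistry:
    """In-memory stand-in for the proposed linguistic data search registry."""

    def __init__(self) -> None:
        self._sites: dict[str, DataAccessSite] = {}

    def register_site(self, name: str, endpoint: str) -> None:
        # Step 1: a site declares that it is a data access site.
        self._sites.setdefault(name, DataAccessSite(name, endpoint))

    def register_method(self, name: str, method: str) -> None:
        # Step 2: the site lists a search method it has implemented.
        self._sites[name].methods.add(method)

    def sites_supporting(self, method: str) -> list[str]:
        # A search service asks which sites can answer a given kind of query.
        return sorted(s.name for s in self._sites.values() if method in s.methods)


registry = SearchRegistry()
registry.register_site("archive-a", "https://example.org/a/query")
registry.register_site("archive-b", "https://example.org/b/query")
registry.register_method("archive-a", "find_by_language")
registry.register_method("archive-a", "find_by_gloss")
registry.register_method("archive-b", "find_by_language")

print(registry.sites_supporting("find_by_language"))
print(registry.sites_supporting("find_by_gloss"))
```

Keeping the list of implemented methods per site is what would let a central search service send a "packaged" query only to the sites able to answer it.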

Page 14: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Pilot project
  – Take some small subset of resources
    • Data input via FIELD
    • Nijmegen? SIL? AIATSIS? AILLA?
  – Take FIELD search out of FIELD
  – Search over that small set of resources
  – Ideally, keep both resources in separate databases to begin to develop a query interchange protocol

Page 15: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Another project: Grammatical thesaurus
  – Develop a grammatical thesaurus that gives common synonyms for a given grammatical term (e.g., oral stop, plosive)
  – This could then be used to allow a user's search to be expanded to include synonyms for a given term.
  – In all likelihood, there are other applications of this.
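A minimal sketch of thesaurus-based query expansion, assuming the thesaurus is stored as sets of interchangeable terms. The only synonym pair used is the one given on the slide (oral stop, plosive); the function names and the sample document titles are hypothetical.

```python
# Hypothetical thesaurus: each set groups terms treated as synonyms.
SYNONYM_SETS = [
    {"oral stop", "plosive"},
]


def expand_term(term: str) -> set[str]:
    """Return the term plus any synonyms listed for it in the thesaurus."""
    normalized = term.strip().lower()
    expanded = {normalized}
    for group in SYNONYM_SETS:
        if normalized in group:
            expanded |= group
    return expanded


def search(term: str, documents: list[str]) -> list[str]:
    """Return documents mentioning the term or any of its synonyms."""
    terms = expand_term(term)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


documents = [
    "Plosive inventories in the languages of the region",
    "Notes on oral stop aspiration",
    "Vowel harmony overview",
]

for hit in search("plosive", documents):
    print(hit)
```

A search for "plosive" here also returns the document that only says "oral stop", which is the expansion behaviour the slide describes.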

Page 16: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Search interface
  – EMELD should implement a VISER-like service for access to its database
  – There are two distinct kinds of searches
    • Resource location
    • Resource data search

Page 17: Resource Discovery (metadata and searching) Working Group Report

How to search?

• Search interface
  – The details of the search interface implemented by EMELD are hard to conceive of until more resources can be accessed through it
  – A questionnaire can help with this area too.
    • EMELD could ask people to try the search and evaluate it
    • Starting with the people in this room

Page 18: Resource Discovery (metadata and searching) Working Group Report

Getting the data

• Sticks
  – EMELD Ambassadors
  – Assisted by Linguist Spider

Page 19: Resource Discovery (metadata and searching) Working Group Report

Getting the data

• Carrots
  – Support harvesting metadata in document headers for submitted URLs.
  – Resources with best practice metadata can be given a standard EMELD URI that can be used as a reference
  – These resources could be posted and advertised on Linguist (but consult Baden first)
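Harvesting metadata from document headers could look roughly like the sketch below. It assumes the submitted page is HTML and carries its metadata as `<meta name="..." content="...">` elements in `<head>`, using Dublin Core style names such as `DC.title`. The slides do not specify a header format, so the parser class, the `harvest` function, and the sample page are illustrative only.

```python
from html.parser import HTMLParser


class HeaderMetadataParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.metadata: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content:
                # A name may repeat (e.g. several subjects), so keep a list.
                self.metadata.setdefault(name, []).append(content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def harvest(html: str) -> dict[str, list[str]]:
    """Return the header metadata of one HTML document."""
    parser = HeaderMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Made-up page standing in for a submitted URL's content.
SAMPLE_PAGE = """
<html><head>
<title>Example word list</title>
<meta name="DC.title" content="Example word list">
<meta name="DC.language" content="en">
</head><body><meta name="DC.title" content="ignored"></body></html>
"""

print(harvest(SAMPLE_PAGE))
```

Fetching the page for a submitted URL is left out on purpose; only the header-parsing step is shown.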

Page 20: Resource Discovery (metadata and searching) Working Group Report

Getting the data

• Juiciest Carrots (Best Practice resources only)
  – "Preferred" EMELD URIs
  – Marked as such in a search
  – Could undergo "advanced" search techniques
  – Be peer-reviewed and vetted by LDRA (Linguistic Digital Resource Association)*

*This organization does not exist, as far as I know.

Page 21: Resource Discovery (metadata and searching) Working Group Report

Granularity

• Right now there are no recommendations for the granularity of exposed metadata records
  – Large archives, for example, have hierarchical structure, one level of which must be isolated (the IMDI session, for example)
  – Cutting-edge archives don't work well with the resource=object model. Their resources are "created" based on the user's needs
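To make the granularity question concrete, here is a small sketch of a hierarchical archive in which the exposed metadata records depend on which level is isolated. The three-level layout (archive, corpus, session) and every name in it are invented for illustration and do not describe IMDI's or OLAC's actual data models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hierarchical archive (archive, corpus, session, ...)."""
    title: str
    children: list[Node] = field(default_factory=list)


def records_at_level(root: Node, level: int) -> list[str]:
    """Return titles of the nodes exactly `level` steps below the root."""
    if level == 0:
        return [root.title]
    titles: list[str] = []
    for child in root.children:
        titles.extend(records_at_level(child, level - 1))
    return titles


archive = Node("Archive", [
    Node("Corpus A", [Node("Session A1"), Node("Session A2")]),
    Node("Corpus B", [Node("Session B1")]),
])

print(records_at_level(archive, 1))  # one record per corpus
print(records_at_level(archive, 2))  # one record per session
```

Exposing level 1 yields two searchable records, while level 2 yields three, so the same archive looks different to a search service depending on the level chosen.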

Page 22: Resource Discovery (metadata and searching) Working Group Report

Granularity

• The lack of recommendations on this issue inhibits metadata creation

• Granularity makes a big difference as to what content is searchable

• Two different audiences in need of advice
  – "Real" archives (a.k.a. trusted repositories)
  – Individuals

Page 23: Resource Discovery (metadata and searching) Working Group Report

Granularity

• Recommendation: EMELD should encourage IMDI and OLAC to devise best-practice recommendations for granularity

Page 24: Resource Discovery (metadata and searching) Working Group Report

The questionnaire

• Two broad kinds of questions:
  – What kinds of things would you like?
  – What kinds of things would you hate? (Dafydd's Corollary)

Page 25: Resource Discovery (metadata and searching) Working Group Report

The questionnaire

• Part One: Search capabilities
  – How do you want to conduct your search (Google-style, directory-style, pull-down menus...)?
  – What kinds of searches are you doing already on other sites?
  – Search within results? (We wanted this.)
  – Thesaurus-based search

Page 26: Resource Discovery (metadata and searching) Working Group Report

The questionnaire

• Part Two: Search content
  – Free entry (like Google)
  – Feature-based entry
  – Statistical questions
  – Phonetic characters
  – Geographical search
  – Time search
  – ...

Page 27: Resource Discovery (metadata and searching) Working Group Report

The questionnaire

• Part Three: Results
  – Google-like results
  – Journal abstract search-like results
  – Restricted results (only return web sites, .pdf documents, ...)
  – ...

Page 28: Resource Discovery (metadata and searching) Working Group Report

The questionnaire

• Format
  – Online submission
  – Combination multiple choice (for the uncreative) and free form (for the creative)
  – Encourage people to envision the search of the year 2503