Review of DBMS for Linguistic Purposes

E-MELD 2004, Linguistic Databases & Best Practice, July 15-18, 2004
Marisa Ferrara & Steven Moran, Eastern Michigan University

Page 1: Review of DBMS for Linguistic Purposes

July 15-18, 2004, E-MELD 2004

Linguistic Databases & Best Practice

Review of DBMS for Linguistic Purposes

Marisa Ferrara & Steven Moran

Eastern Michigan University

Page 2: Review of DBMS for Linguistic Purposes

Project Purpose

Linguists specializing in language documentation confront the problem of how to digitally organize and store their data

Best practice recommends that linguists create an archival copy in text format with XML markup

However, most linguists use database software to create a working format

A review of DBMS for linguistic purposes with best practice in mind has not, to our knowledge, been done

Page 3: Review of DBMS for Linguistic Purposes

Goals

Evaluate database software according to criteria relevant to linguistic documentation
  • Not to select the best software overall

Ongoing project
  • Feedback is appreciated on other software and our criteria

Page 4: Review of DBMS for Linguistic Purposes

DBMS and their interfaces

Database software
  • Can be developed explicitly for language data: Shoebox, FIELD
  • Can be developed for general purposes: FileMaker Pro, MS Access, MySQL

The first type is often an interface built on top of a general purpose DBMS

Both types are used by linguists and will be evaluated according to the same criteria

Page 5: Review of DBMS for Linguistic Purposes

Software

All software was tested on either Windows XP or Mac OS X
  • Unix software was left out of this evaluation

The software was chosen from our experience with E-MELD as well as recommendations from various field linguists

14 software applications were chosen for evaluation
  • Access 2003
  • askSam 5.1
  • Emdros 1.1.17
  • Excel 2003
  • eXist 1.0
  • FIELD
  • FileMaker Pro 7
  • Kura 2.0-1-2.1.2
  • LinguaLinks Workshops
  • MATES
  • MySQL 5.0
  • PostgreSQL 7.4
  • Shoebox 5.0
  • Word 2003

Page 6: Review of DBMS for Linguistic Purposes

Test data

We evaluated each application by entering original data

Data was from Sisaala Western [SSL] and was collected by Steven Moran (2003)

Typological features include
  • SVO word order
  • Left-headed NP
  • Contrasting High/Low tonal system

Data included a 20-entry subset of the lexicon that was archived in an Excel 2002 file with the following fields

ID    Form            Gloss              Gram Cat             Comment  Source         Ref  Date
2562  o fa ka poɔlla  sick; he was sick  vp:intrans:pst:3Psg           Cletus Basing  120  25-Jul-03
2573  bɛ/nna/         diarrhea           np                            Cletus Basing  120  25-Jul-03
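Since best practice recommends an archival copy in text format with XML markup, one of the sample entries can be sketched as XML. This is only an illustration: the element names and the `lexicon`/`entry` layout are assumptions, not a schema used in the evaluation, and the Comment field is assumed to be empty for this entry.

```python
import xml.etree.ElementTree as ET

# Field names follow the spreadsheet columns on this slide. The element
# names and nesting below are hypothetical, chosen only for illustration.
ENTRY = {
    "ID": "2573",
    "Form": "bɛ/nna/",
    "Gloss": "diarrhea",
    "GramCat": "np",
    "Comment": "",  # assumed empty in the source row
    "Source": "Cletus Basing",
    "Ref": "120",
    "Date": "25-Jul-03",
}


def entry_to_xml(entry):
    """Build an <entry> element with one child per non-empty field."""
    node = ET.Element("entry", id=entry["ID"])
    for field, value in entry.items():
        if field == "ID" or not value:
            continue
        ET.SubElement(node, field.lower()).text = value
    return node


lexicon = ET.Element("lexicon", language="Sisaala Western")
lexicon.append(entry_to_xml(ENTRY))

# encoding="unicode" returns a str and keeps characters such as ɛ unescaped.
xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```

Keeping the export in plain Unicode text means the archival copy stays readable without the working database that produced it.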

Page 7: Review of DBMS for Linguistic Purposes

Problems encountered

Unicode
  • We had to find either a way of inserting the characters or a function for cutting and pasting them

Syllabic Tone
  • More than one tone mark per lexical entry
  • Link tables

Morphological breakdown
  • Verb phrases had to be marked in a consistent manner
  • Author’s shorthand included common delimiters that posed a problem for importing and exporting in certain formats

Semantic categories
  • For consistency, link tables should be used
  • Could be problematic
  • FIELD worked best, but missing semantic fields cannot be added
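The delimiter problem can be illustrated with the two sample entries from the test data, whose gloss and grammatical category fields contain ";" and ":". The sketch below is not the procedure used in the evaluation; it only contrasts a naive delimited export with a quoted one using Python's standard `csv` module, with ";" chosen as the delimiter for the sake of the example.

```python
import csv
import io

# Fields from the test data that contain common delimiters
# (";" in the gloss, ":" in the grammatical category).
rows = [
    ["2562", "o fa ka poɔlla", "sick; he was sick", "vp:intrans:pst:3Psg"],
    ["2573", "bɛ/nna/", "diarrhea", "np"],
]

# Naive export: joining on ";" silently splits the first gloss in two.
naive_line = ";".join(rows[0])
naive_fields = naive_line.split(";")

# Quoted export: csv.writer wraps any field containing the delimiter in quotes.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

# Read the quoted export back in and compare it with the original rows.
buffer.seek(0)
round_trip = list(csv.reader(buffer, delimiter=";"))

print(len(naive_fields))   # 5 fields instead of the original 4
print(round_trip == rows)  # True
```

A tool that quotes fields on export and honors the quotes on import preserves the shorthand; one that does neither forces the linguist to change the notation or pick a delimiter that never occurs in the data.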

Page 8: Review of DBMS for Linguistic Purposes

Criteria

We developed three categories of criteria that we considered essential for databasing linguistic information
  • General Information
  • Technical Information
  • Ability to Handle Linguistic Data

Criteria include but are not limited to those used by other evaluations
  • BIFoCAL
  • Open Source Database Software Comparison

Page 9: Review of DBMS for Linguistic Purposes

General Information

Developer/Release Date
Price
Licensing options
Platforms
Other software needed
Help functionality/Tutorial availability
Support

Page 10: Review of DBMS for Linguistic Purposes

Technical Information

Database Type
Pre-defined DB design?
ACID-compliance
Data integrity
Collaborative or single user
  • Network connection necessary?
  • SSL Access?
Web accessible?
Programming Interfaces/API
Imports and Exports
  • XML (best practice)
  • Other formats
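To make the data integrity criterion concrete, here is a minimal sketch of a link table using Python's built-in `sqlite3` module. SQLite was not among the systems evaluated, and the table and column names are hypothetical; the point is only that a relational DBMS can refuse an entry whose category is not in the controlled list. The same pattern would apply to semantic categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: lexeme rows point at a controlled list of categories.
conn.executescript("""
CREATE TABLE gram_cat (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE lexeme (
    id          INTEGER PRIMARY KEY,
    form        TEXT NOT NULL,
    gloss       TEXT NOT NULL,
    gram_cat_id INTEGER NOT NULL REFERENCES gram_cat(id)
);
""")

conn.execute("INSERT INTO gram_cat (id, label) VALUES (1, 'np')")
conn.execute(
    "INSERT INTO lexeme (id, form, gloss, gram_cat_id) VALUES (?, ?, ?, ?)",
    (2573, "bɛ/nna/", "diarrhea", 1),
)

# A row that points at a category id that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO lexeme (id, form, gloss, gram_cat_id) VALUES (?, ?, ?, ?)",
        (9999, "placeholder", "placeholder", 42),  # made-up row for the test
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

linked = conn.execute(
    "SELECT lexeme.form, gram_cat.label FROM lexeme "
    "JOIN gram_cat ON gram_cat.id = lexeme.gram_cat_id"
).fetchall()

print(rejected)
print(linked)
```

Spreadsheets and word processors offer no equivalent check, which is one reason this criterion separates the general-purpose DBMS from the other tools on the list.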

Page 11: Review of DBMS for Linguistic Purposes

Ability to Handle Linguistic Data

Designed exclusively for linguists?
Unicode compatibility (best practice)
Special character input method/ease of input
Search functionality
Allows input and interlinearization of primary text
Ability to link primary text to lexicon
Multi-Dictionary Format
Ability to export lexicon in a presentation format
Ability to export grammar in a presentation format
Audio/Video/Image support
Ability to add missing features
Overall Assessment
  • Pros
  • Cons
  • Recommended for…
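Unicode compatibility and search functionality interact when tone is written with combining diacritics: the same visible form can be stored as different code point sequences, and a search may or may not need to ignore tone. The sketch below uses Python's standard `unicodedata` module. The acute accent as a tone mark is an assumption made for illustration, not a claim about the orthography of the test data, and the helper names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Return text in NFC so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_marks(text):
    """Drop combining marks (e.g. tone diacritics) for tone-insensitive search."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


precomposed = "\u00e1"   # a with acute as a single code point
combining = "a\u0301"    # a followed by COMBINING ACUTE ACCENT

print(precomposed == combining)                        # False: raw strings differ
print(normalize(precomposed) == normalize(combining))  # True after normalization

# ɔ has no precomposed form with an acute, so the mark stays a separate code point.
marked = "p\u0254\u0301"
print(len(normalize(marked)))  # 3 code points
print(strip_marks(marked))     # pɔ
```

A DBMS that compares raw code points without normalization can miss matches that look identical on screen, which is worth testing when judging search functionality.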

Page 12: Review of DBMS for Linguistic Purposes

Not evaluated

We could not evaluate all the software we chose, for a variety of reasons

Software that is too technical
  Emdros 1.1.17
    • User must program an interface
  PostgreSQL 7.4
    • GUI and Cygwin difficult to set up on a Windows machine

Software that is bad practice
  MS Word 2003
    • Already reviewed by BIFoCAL
  askSam 5.1
    • Not Unicode compliant

Software still in development or unavailable
  Kura 2.0-1-2.1.2
    • Still discussing installation with developer
  MATES
    • Unavailable on the web (still in development)?
  LinguaLinks Workshops
    • Will be reviewed shortly

Page 13: Review of DBMS for Linguistic Purposes

Ongoing Commitment

This evaluation of DBMS for linguistic purposes is ongoing

Our evaluation will be linked to the School of Best Practice
  • Toolroom
  • Software reviews
  • Users can add their opinions to this system

We welcome any suggestions regarding our criteria and other software that should be considered
  • Email us!

Page 14: Review of DBMS for Linguistic Purposes

Thank you!

Any questions?

Page 15: Review of DBMS for Linguistic Purposes

References

Anonymous. 2004. “Open Source Database Software Comparison”. Retrieved June 1, 2004 at http://www.geocities.com/mailsoftware42/db/

BIFoCAL. 2003a. “Software functionality for non-technical users”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareDims.html

BIFoCAL. 2003b. “Questions to Help Evaluate Linguistic Tools”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareQuestions.html

Buszard-Welcher, Laura. 2003. “Shoebox: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/ShoeboxRev.html

E-MELD School of Best Practice. 2004. Retrieved July 1, 2004 at http://www.emeld.org/school

Engelberg, Miriam. 2000. “Choosing between Microsoft Access and FileMaker Pro”. Retrieved June 30, 2004 at http://www.techsoup.org/howto/articlepage.cfm?ArticleId=207&topicid=6

Ethnologue. 2004. “Sisaala, Western: a language of Ghana”. Retrieved July 1, 2004 at http://www.ethnologue.com/show_language.asp?code=SSL

Frieb, Werner. 2003. “XML Databases compared”. Retrieved June 21, 2004 at http://www.studierstube.org/world/xml_databases_compared.html

Good, Jeff. 2003. “Microsoft Word: A Review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/WordRev.html

Holub, Martin and Pavel Míka. 2001. “MATES – an experimental linguistic database system”. Proceedings of the IRCS Workshop on Linguistic Databases. Retrieved June 1, 2004 at http://www.ldc.upenn.edu/annotation/database/papers/Mika_Holub/21.2.mika.pdf

Nerbonne, John. 1998. “Introduction to John Nerbonne (ed.) Linguistic Databases”. Stanford: CSLI. 1-12. Retrieved June 1, 2004 at http://odur.let.rug.nl/~nerbonne/papers/intro-db.pdf

Rempt, Boudewijn. 2002. “Kura”. Retrieved June 1, 2004 at http://www.ats.lmu.de/kura/manual.pdf

Sprouse, Ronald. 2003. “Filemaker Pro: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/FileMakerRev.html

Walkenbach, John. 2004b. “Excel 2003 Review”. Retrieved July 1, 2004 at http://www.j-walk.com/ss/excel/xl2003.htm

Webopedia. 2004a. “Database Management System”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/D/database_management_system_DBMS.html

Webopedia. 2004b. “SSL”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/S/SSL.html