July 15-18, 2004 E-MELD 2004
Linguistic Databases & Best Practice
Review of DBMS for Linguistic Purposes
Marisa Ferrara & Steven Moran
Eastern Michigan University
Project Purpose
Linguists specializing in language documentation confront the problem of how to digitally organize and store their data
Best practice recommends that linguists create an archival copy in text format with XML markup
However, most linguists use database software to create a working format
A review of DBMS for linguistic purposes with best practice in mind has not, to our knowledge, been done
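As a rough illustration of moving from a working format to an archival text copy with XML markup, the following Python sketch serializes one lexical record to XML. The element names (`lexicon`, `entry`, `form`, and so on) and the function name `rows_to_xml` are assumptions made for this example; they do not follow an E-MELD schema or the export of any package reviewed here.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Serialize lexicon rows (dicts) to a UTF-8 XML byte string.

    Element names are hypothetical; they do not follow any published schema.
    """
    root = ET.Element("lexicon")
    for row in rows:
        entry = ET.SubElement(root, "entry", id=str(row["id"]))
        for field in ("form", "gloss", "gram_cat", "source"):
            ET.SubElement(entry, field).text = row.get(field, "")
    return ET.tostring(root, encoding="utf-8")

# One record taken from the sample data shown later in the slides.
rows = [{"id": 2573, "form": "bɛ/nna/", "gloss": "diarrhea",
         "gram_cat": "np", "source": "Cletus Basing"}]
xml_bytes = rows_to_xml(rows)
print(xml_bytes.decode("utf-8"))
```

Because the output is plain UTF-8 text, it can be read back with any XML parser, independently of the database that produced it.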
Goals
Evaluate database software according to criteria relevant to linguistic documentation
  Not to select the best software overall
Ongoing project
  Feedback is appreciated on other software and our criteria
DBMS and their interfaces
Database software
  Can be developed explicitly for language data
    Shoebox, FIELD
  Can be developed for general purposes
    FileMaker Pro, MS Access, MySQL
The first type is often an interface built on top of a general purpose DBMS
Both types are used by linguists and will be evaluated according to the same criteria
Software
All software was tested on either Windows XP or Mac OS X
  Unix software was left out of this evaluation
The software was chosen from our experience with E-MELD as well as recommendations from various field linguists
14 software applications were chosen for evaluation
  Access 2003
  askSam 5.1
  Emdros 1.1.17
  Excel 2003
  eXist 1.0
  FIELD
  FileMaker Pro 7
  Kura 2.0-1-2.1.2
  LinguaLinks Workshops
  MATES
  MySQL 5.0
  PostgreSQL 7.4
  Shoebox 5.0
  Word 2003
Test data
We evaluated each software package by entering original data
Data came from Sisaala Western [SSL] and was collected by Steven Moran (2003)
Typological features include
  SVO word order
  Left-headed NP
  Contrasting High/Low tonal system
Data included a 20-entry subset of the lexicon that was archived in an Excel 2002 file with the following fields
ID | Form | Gloss | Gram Cat | Comment | Source | Ref | Date
2562 | o fa ka poɔlla | sick; he was sick | vp:intrans:pst:3Psg | | Cletus Basing | 120 | 25-Jul-03
2573 | bɛ/nna/ | diarrhea | np | | Cletus Basing | 120 | 25-Jul-03
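The sample rows contain characters such as ";" and "/" that also serve as field delimiters in some exchange formats. A minimal sketch, assuming the export target is delimiter-separated text: Python's standard `csv` module quotes any field that contains the delimiter, so the record survives a round trip. The semicolon delimiter, the variable names, and the empty Comment value are assumptions made for this example.

```python
import csv
import io

FIELDS = ["ID", "Form", "Gloss", "Gram Cat", "Comment", "Source", "Ref", "Date"]
# Comment is assumed to be empty for this record.
row = ["2562", "o fa ka poɔlla", "sick; he was sick",
       "vp:intrans:pst:3Psg", "", "Cletus Basing", "120", "25-Jul-03"]

# Write with ';' as the delimiter; the gloss contains ';' and so gets quoted.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL)
writer.writerow(FIELDS)
writer.writerow(row)
exported = buffer.getvalue()

# Read it back and confirm the gloss was not split into two fields.
reader = csv.reader(io.StringIO(exported), delimiter=";")
header, restored = list(reader)
print(restored[2])
```

A tool that splits on the delimiter without honoring quotes would instead break the gloss into two fields and shift every later column by one.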
Problems encountered
Unicode
  We had to find either a way of inserting the characters or a function for cutting and pasting them
Syllabic Tone
  More than one tone mark per lexical entry
  Link tables
Morphological breakdown
  Verb phrases had to be marked in a consistent manner
  Author’s shorthand included common delimiters that posed a problem for importing and exporting in certain formats
Semantic categories
  For consistency, link tables should be used
  Could be problematic
  FIELD worked best, but missing semantic fields cannot be added
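One way to read the link-table suggestion: keep semantic categories in their own table and connect them to lexical entries through a third table, so that each category is spelled once and an entry can carry several. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the table names, column names, and category labels are hypothetical and are not taken from any of the evaluated databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, form TEXT NOT NULL, gloss TEXT);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Link table: one row per (entry, category) pair.
CREATE TABLE entry_category (
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (entry_id, category_id)
);
""")
conn.execute("INSERT INTO entry VALUES (2573, 'bɛ/nna/', 'diarrhea')")
conn.executemany("INSERT INTO category (name) VALUES (?)",
                 [("illness",), ("body",)])  # invented category labels
conn.executemany(
    "INSERT INTO entry_category SELECT 2573, id FROM category WHERE name = ?",
    [("illness",), ("body",)])
conn.commit()

# List the categories attached to entry 2573 through the link table.
categories = [name for (name,) in conn.execute(
    """SELECT c.name FROM category c
       JOIN entry_category ec ON ec.category_id = c.id
       WHERE ec.entry_id = ? ORDER BY c.name""", (2573,))]
print(categories)
```

The same pattern would apply to tone: a separate table of tone marks linked to entries allows more than one tone mark per lexical entry.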
Criteria
We developed three different categories of criteria that we considered essential for databasing linguistic information
  General Information
  Technical Information
  Ability to Handle Linguistic Data
Criteria include but are not limited to those used by other evaluations
  BIFoCAL
  Open Source Database Software Comparison
General Information
Developer/Release Date
Price
Licensing options
Platforms
Other software needed
Help functionality/Tutorial availability
Support
Technical Information
Database Type
Pre-defined DB design?
ACID-compliance
Data integrity
Collaborative or single user
  Network connection necessary?
  SSL Access?
Web accessible?
Programming Interfaces/API
Imports and Exports
  XML (best practice)
  Other formats
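To make the ACID criterion concrete, this sketch shows atomicity, the "A" in ACID: if any statement in a transaction fails, none of that transaction's changes are kept. It uses SQLite through Python's `sqlite3` module purely for illustration; SQLite was not among the packages reviewed, and the table and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, form TEXT NOT NULL)")
conn.execute("INSERT INTO entry VALUES (1, 'form-a')")
conn.commit()

try:
    # 'with conn' commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("INSERT INTO entry VALUES (2, 'form-b')")
        conn.execute("INSERT INTO entry VALUES (1, 'duplicate id')")  # fails
except sqlite3.IntegrityError:
    pass

# The successful insert of id 2 was rolled back together with the failed one.
remaining = [row[0] for row in conn.execute("SELECT id FROM entry ORDER BY id")]
print(remaining)
```

A system without transactional guarantees could leave the database half-updated after the failure, which is the data-integrity risk this criterion is meant to capture.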
Ability to Handle Linguistic Data
Designed exclusively for linguists?
Unicode compatibility (best practice)
Special character input method/ease of input
Search functionality
Allows input and interlinearization of primary text
Ability to link primary text to lexicon
Multi-Dictionary Format
Ability to export lexicon in a presentation format
Ability to export grammar in a presentation format
Audio/Video/Image support
Ability to add missing features
Overall Assessment
  Pros
  Cons
  Recommended for…
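Unicode compatibility and search functionality interact: the same accented letter can be stored either as one precomposed code point or as a base letter plus a combining mark, and a naive search treats the two as different strings. A minimal sketch, assuming tone is marked with combining diacritics, normalizes both strings before comparing them. The function name `same_form` and the sample strings are illustrative only.

```python
import unicodedata

def same_form(a, b):
    """Compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e1"   # 'á' as a single code point
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # raw comparison
print(same_form(precomposed, decomposed))  # normalized comparison

# 'ɔ' (U+0254) has no precomposed accented form, so it stays two code points.
open_o_high = unicodedata.normalize("NFC", "\u0254\u0301")
print(len(open_o_high))
```

Characters such as ɔ and ɛ with tone marks therefore always rely on combining characters, which is one reason input method and rendering support matter when assessing a package.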
Not evaluated
We could not evaluate all of the software we chose, for a variety of reasons
Software that is too technical
  Emdros 1.1.17
    • User must program an interface
  PostgreSQL 7.4
    • GUI and Cygwin difficult to set up on a Windows machine
Software that is bad practice
  MS Word 2003
    • Already reviewed by BIFoCAL
  askSam 5.1
    • Not Unicode compliant
Software still in development or unavailable
  Kura 2.0-1-2.1.2
    • Still discussing installation with developer
  MATES
    • Unavailable on the web (still in development)?
  LinguaLinks Workshop
    • Will be reviewed shortly
Ongoing Commitment
This evaluation of DBMS for linguistic purposes is ongoing
Our evaluation will be linked to the School of Best Practice
  Toolroom
  Software reviews
  Users can add their opinions to this system
We welcome any suggestions regarding our criteria and other software that should be considered
  Email us!
Thank you!
Any questions?
References
Anonymous. 2004. “Open Source Database Software Comparison”. Retrieved June 1, 2004 at http://www.geocities.com/mailsoftware42/db/
BIFoCAL. 2003a. “Software functionality for non-technical users”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareDims.html
BIFoCAL. 2003b. “Questions to Help Evaluate Linguistic Tools”. Retrieved July 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/SoftwareQuestions.html
Buszard-Welcher, Laura. 2003. “Shoebox: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 1, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/ShoeboxRev.html
E-MELD School of Best Practice. 2004. Retrieved July 1, 2004 at http://www.emeld.org/school
Engelberg, Miriam. 2000. “Choosing between Microsoft Access and FileMaker Pro”. Retrieved June 30, 2004 at http://www.techsoup.org/howto/articlepage.cfm?ArticleId=207&topicid=6
Ethnologue. 2004. “Sisaala, Western: a language of Ghana”. Retrieved July 1, 2004 at http://www.ethnologue.com/show_language.asp?code=SSL
Frieb, Werner. 2003. “XML Databases compared”. Retrieved June 21, 2004 at http://www.studierstube.org/world/xml_databases_compared.html
Good, Jeff. 2003. “Microsoft Word: A Review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/WordRev.html
Holub, Martin and Pavel Míka. 2001. “MATES – an experimental linguistic database system”. Proceedings of the IRCS Workshop on Linguistic Databases. Retrieved June 1, 2004 at http://www.ldc.upenn.edu/annotation/database/papers/Mika_Holub/21.2.mika.pdf
Nerbonne, John. 1998. “Introduction to John Nerbonne (ed.) Linguistic Databases”. Stanford: CSLI. 1-12. Retrieved June 1, 2004 at http://odur.let.rug.nl/~nerbonne/papers/intro-db.pdf
Rempt, Boudewijn. 2002. “Kura”. Retrieved June 1, 2004 at http://www.ats.lmu.de/kura/manual.pdf
Sprouse, Ronald. 2003. “Filemaker Pro: A review of it as a tool for digitizing linguistic data”. Berkeley Initiative for Computer Assisted Linguistics (BIFoCAL). Retrieved June 30, 2004 at http://faust.linguistics.berkeley.edu/~jcgood/bifocal/FileMakerRev.html
Walkenbach, John. 2004b. “Excel 2003 Review”. Retrieved July 1, 2004 at http://www.j-walk.com/ss/excel/xl2003.htm
Webopedia. 2004a. “Database Management System”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/D/database_management_system_DBMS.html
Webopedia. 2004b. “SSL”. Retrieved July 6, 2004 at http://www.webopedia.com/TERM/S/SSL.html