Fulbright Grant: Research Proposal




STATEMENT OF PROPOSED STUDY OR RESEARCH
Pamela Fox | Australia | Computer Sciences
A World Wide Web of Comparative Linguistics

Overview: The goal of this project is to develop a visual interface that allows the user to enter a word in any supported language and interactively view that word's etymological history displayed on a map of the world. The project has several objectives: developing an intuitive visual interface that presents etymological information to non-linguists in an entertaining yet educational way; increasing a sense of global connectedness by showing how languages are intertwined with one another; finding the best techniques for describing to a computer database how words in different languages are connected to one another; and determining the most efficient method for storing words and discovering connections.
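The last two objectives concern how word-to-word connections are described to, and stored in, a database. As a minimal sketch, assuming a relational store (SQLite through Python's standard `sqlite3` module, chosen here only for illustration), one table could hold word forms and a second could hold typed parent links; a recursive query then recovers a word's full ancestry. The table names `word` and `link`, the `kind` column, the `ancestry` helper, and the toy rows based on the proposal's "pass" example are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per word form, one row per etymological link.
SCHEMA = """
CREATE TABLE word (
    id       INTEGER PRIMARY KEY,
    form     TEXT NOT NULL,
    language TEXT NOT NULL,
    meaning  TEXT
);
CREATE TABLE link (
    child_id  INTEGER NOT NULL REFERENCES word(id),
    parent_id INTEGER NOT NULL REFERENCES word(id),
    kind      TEXT NOT NULL CHECK (kind IN ('derivation', 'loan')),
    PRIMARY KEY (child_id, parent_id)
);
"""

# Toy rows following the proposal's "pass" example (not a curated data set).
WORDS = [
    (1, "pass", "English", None),
    (2, "passer", "French", None),
    (3, "passus", "Latin", "step"),
]
LINKS = [(1, 2, "derivation"), (2, 3, "derivation")]  # (child, parent, kind)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?)", WORDS)
    conn.executemany("INSERT INTO link VALUES (?, ?, ?)", LINKS)
    return conn


def ancestry(conn: sqlite3.Connection, form: str, language: str) -> list[tuple]:
    """Follow parent links upward; depth 0 is the word itself.

    Assumes the links contain no cycles, otherwise the recursion never ends.
    """
    query = """
        WITH RECURSIVE chain(id, depth) AS (
            SELECT id, 0 FROM word WHERE form = ? AND language = ?
            UNION ALL
            SELECT link.parent_id, chain.depth + 1
            FROM link JOIN chain ON link.child_id = chain.id
        )
        SELECT word.form, word.language, chain.depth
        FROM chain JOIN word ON word.id = chain.id
        ORDER BY chain.depth
    """
    return conn.execute(query, (form, language)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(ancestry(conn, "pass", "English"))
    # [('pass', 'English', 0), ('passer', 'French', 1), ('passus', 'Latin', 2)]
```

Keeping links in their own table means a descendant query is the same join run in the opposite direction (matching on `parent_id` instead of `child_id`).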

System Description: For example, after the user enters “pass” in English, the system would draw an arrow from England to France, which would display “passer,” and then to Italy, which would display “passus.” Mousing over any word would display information on its meaning and language (e.g. “step” and “Latin” for “passus”). If the user clicks on a word from one of the non-modern languages and chooses the “Find Descendants” option, the system would find all words derived from that word. For example, clicking on the Latin “passus” would show the user that the Spanish word “paso” and another English word, “pace,” were derived from it. The user would then realize that modern-day “pace” and “paso” share a related meaning because of their common ancestor and can therefore be called cognates: words or morphemes related by derivation, borrowing, or descent. The system would also include an option to automatically search for and display all possible cognates of a word.

For words that are simply borrowed from another language with no derivational change (called “loan words”), a visually distinct arrow would represent that type of connection. Notably, typical dictionaries give only the ancestral etymology of a word, but since this system would combine the etymological records of every word, it could easily provide descendant information as well (the forward links).
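The descendant and cognate searches described in the System Description amount to graph traversals. A minimal sketch, assuming etymological records arrive as backward (child-to-parent) links: inverting them yields the forward links, a depth-first walk in either direction gives ancestors or descendants, and a word's cognates are taken here to be the other descendants of its ancestors, excluding its own direct line. `Word`, `PARENTS`, `invert`, `ancestors`, `descendants`, and `cognates` are hypothetical names, and the toy data simplify the proposal's "passus" example by omitting intermediate historical forms.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    form: str
    language: str


# Toy backward links (child -> [(parent, kind)]), simplified from the proposal's
# example. kind is "derivation" or "loan" so loan words can be drawn differently.
PARENTS: dict[Word, list[tuple[Word, str]]] = {
    Word("pass", "English"): [(Word("passer", "French"), "derivation")],
    Word("passer", "French"): [(Word("passus", "Latin"), "derivation")],
    Word("paso", "Spanish"): [(Word("passus", "Latin"), "derivation")],
    Word("pace", "English"): [(Word("passus", "Latin"), "derivation")],
}


def invert(parents):
    """Build forward links (parent -> [(child, kind)]) from the backward links."""
    children = defaultdict(list)
    for child, links in parents.items():
        for parent, kind in links:
            children[parent].append((child, kind))
    return children


def _walk(start, edges):
    """Depth-first traversal: every word reachable from start, excluding start."""
    found, stack, seen = [], [start], {start}
    while stack:
        for nxt, _kind in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                stack.append(nxt)
    return found


def ancestors(word, parents):
    return _walk(word, parents)


def descendants(word, children):
    return _walk(word, children)


def cognates(word, parents, children):
    """Words sharing an ancestor with `word`, minus its ancestors and descendants."""
    own_line = {word, *ancestors(word, parents), *descendants(word, children)}
    related = set()
    for ancestor in ancestors(word, parents):
        related.update(descendants(ancestor, children))
    return related - own_line


if __name__ == "__main__":
    children = invert(PARENTS)
    print(sorted(cognates(Word("pass", "English"), PARENTS, children)))
    # [Word(form='pace', language='English'), Word(form='paso', language='Spanish')]
```

Each forward link keeps its `kind`, so a renderer could pick a distinct arrow style for loan-word connections without a separate lookup.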

System Extensions: The following are several extensions I have already considered; I expect more to arise while I am conducting the research. Where the data allow, a word could be broken down into its roots, and the user could trace the history of a single root. In some cases, this would reveal connections between words that would not otherwise be found when querying the entire word.
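The root-level extension can be pictured as an inverted index from roots to the words that contain them. A minimal sketch, assuming each word has already been split into roots: `TOY_ROOTS`, `build_root_index`, and `related_by_root` are hypothetical, and the decompositions are hand-written for illustration rather than taken from a curated morphological resource.

```python
from collections import defaultdict

# Toy decompositions written for illustration only; a real system would take
# them from a morphological or etymological resource.
TOY_ROOTS: dict[str, list[str]] = {
    "passport": ["pass", "port"],
    "trespass": ["tres", "pass"],
    "passage": ["pass", "age"],
}


def build_root_index(decompositions):
    """Map each root to the set of words that contain it (an inverted index)."""
    index = defaultdict(set)
    for word, roots in decompositions.items():
        for root in roots:
            index[root].add(word)
    return dict(index)


def related_by_root(word, decompositions, index):
    """For each root of `word`, collect the other words that share that root."""
    related = {}
    for root in decompositions.get(word, []):
        others = index.get(root, set()) - {word}
        if others:
            related[root] = others
    return related


if __name__ == "__main__":
    index = build_root_index(TOY_ROOTS)
    # Maps "pass" to the other two words; "port" is dropped because no other
    # toy word contains it.
    print(related_by_root("passport", TOY_ROOTS, index))
```

Querying by root surfaces words that share a component even when their whole-word etymologies are recorded as separate chains.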

In my search for related work, I was frustrated that I could not find linguistic visualization programs on my own until a veteran of the field sent me links. In my view, research of an interactive nature should be easily available for its audience to use and evaluate, so I would code my program’s visual interface in a web-accessible format (e.g. Flash, Java). To make it even more interactive, I would allow certain users (e.g. linguists) to enter their own linguistic knowledge into the system. The project would therefore also be an experiment in building dynamic, interactive programs, which I see as a new paradigm for research in the Internet age.

Related Work: The Tower of Babel (A) is an interactive etymological database, but it does not display the information visually, and it is clearly designed only for a technical audience: it assumes knowledge of specialized jargon and is not user-friendly.

The Visual Thesaurus (B) is an entertaining, interactive visual tool for displaying synonymy and antonymy between English words. Kirrkirr (C) is a similar project, a visual dictionary for indigenous languages that offers semantic and some limited translation information in various multimedia formats. VerbOcean (D) finds precedence links between English verbs by mining the web and displays the connections (e.g. discuss -> pursue -> support -> approve).

My proposed project would combine more dimensions of information into one view than any of the above visualization projects: temporal (word:time), spatial (word:location), semantic (word:meaning), and parent-child (word:word). The challenges of storing, processing, and visualizing cross-lingual data in so many dimensions are probably why such a project does not appear to have been attempted before, but I am prepared to face them and resolve the issues they raise.
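To make the four dimensions concrete, here is a minimal sketch, assuming each word carries a coarse period label, a map region, an optional meaning, and a single parent link; it turns one parent chain into the arrows the map would draw for the proposal's "pass" example. `Entry`, `map_arrows`, and the sample period labels are hypothetical placeholders, and real words can have several parents, which this simplification ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One word carrying the four dimensions named in the proposal."""
    form: str
    language: str
    period: str                       # temporal (word:time), a coarse label here
    region: str                       # spatial (word:location), a map region
    meaning: Optional[str]            # semantic (word:meaning)
    parent: Optional["Entry"] = None  # parent-child (word:word), one parent only


def map_arrows(entry: Entry) -> list[tuple[str, str, str]]:
    """Walk parent links and emit (from_region, to_region, label) per arrow."""
    arrows = []
    while entry.parent is not None:
        parent = entry.parent
        arrows.append((entry.region, parent.region, f"{parent.form} ({parent.language})"))
        entry = parent
    return arrows


# Toy chain mirroring the proposal's example; period labels are placeholders.
passus = Entry("passus", "Latin", "classical", "Italy", "step")
passer = Entry("passer", "French", "modern", "France", None, parent=passus)
pass_ = Entry("pass", "English", "modern", "England", None, parent=passer)

if __name__ == "__main__":
    for arrow in map_arrows(pass_):
        print(arrow)
    # ('England', 'France', 'passer (French)')
    # ('France', 'Italy', 'passus (Latin)')
```

The meaning and language fields travel with each entry, so the same records could also supply the mouse-over text for each displayed word.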

Sponsor: If granted a Fulbright, this project will be conducted at the University of Melbourne in Australia under the supervision of Professor ----, whose research focuses on computational models for linguistic information. I found the sponsor after a summer research mentor in computational linguistics recommended that I look into Melbourne’s “Human Language Technology” group. Prof. --- is director of that group, whose interests lie in discovering the best ways to digitally store, process, and perform computations on linguistic information, which aligns closely with my project’s interests. After researching related computer science and linguistics departments worldwide, I have concluded that this research center and professor will provide the best intellectual support and resources. Linguistically, Australia is an ideal country for this research: it is English-speaking today but also has well-studied Aboriginal languages. A primarily English-speaking country is best for the project because English is (in)famous for the extremely varied origins of its words, which makes its lexicon the most interesting starting point. Once the feasibility of storing and visualizing this information has been established for languages like English, whose ancestors spread across the globe, it will be an interesting extension to see how the system handles and displays the connectedness of the much more isolated Australian Aboriginal languages.

Timeline: I would conduct the research over two Australian semesters, beginning in February 2007 and ending in November 2007. This aligns with the University of Melbourne schedule, so I could also enroll in classes relevant to the research at the same time. I would begin by locating the necessary etymological resources (e.g. dictionaries), then determine the best way to convert those potentially diverse resources and store them in one database, and finally develop the connection algorithms and the visualization program itself. After developing an initial prototype, I would work on efficiency and on the extensions described above, which will add to the uniqueness of the research. As suggested by my sponsoring professor, I would submit at least one paper detailing the results of my research to an academic conference during that time.

Cited Works
A: Tower of Babel: An Etymological Database Project, http://starling.rinet.ru/
B: The Visual Thesaurus, http://www.visualthesaurus.com/
C: Kirrkirr, http://www-nlp.stanford.edu/kirrkirr/
D: VerbOcean, http://semantics.isi.edu/ocean/