2004 LTI Student Research Symposium
PREFACE
Welcome to the Language Technologies Institute's second annual Student Research
Symposium! Our first symposium, held last year, was very successful, with a collection
of high-quality student presentations spanning the wide spectrum of research problems in
language technology related areas. We think the program selected for this year's
symposium is equally exciting and high in quality, and we hope you will agree, once you
hear the talks!
The program this year was once again selected in a competitive process. We received a
total of 15 submissions, out of which nine abstracts were selected for presentation by the
selection committee. The committee consisted of four faculty members (Alan Black,
Robert Frederking, Alon Lavie, and Alex Rudnicky) and one graduate student (Benjamin
Han). We believe the selected program is a true reflection of the diverse high-quality
research in which the graduate students of the LTI are engaged.
The observant among you may note that we have added a few new features to the SRS
program this year. We have invited Betty Cheng, the prize winner of last year's
symposium, to give a keynote presentation, where we hope to hear about how her
interesting research on using language modeling tools for understanding the structure and
interactions of biological proteins has progressed. Another new feature is the printed
program, which you are holding in your hands. We have also added two honorable
mention cash prizes, in addition to the best presentation award. The awards will be
presented at a brief ceremony at the end of the day, so be sure to stick around!
We wish to thank the faculty and students who have volunteered to serve on the panel
that will select the winning presentations. We also thank Chris Koch for helping with the
logistics of the symposium. Special thanks to Catherine Copetas for her key role in
producing the programs, posters and publicity for the SRS.
We trust you will enjoy the presentations of the second LTI SRS, which we hope will
become an annual tradition in the years to come.
Benjamin Han and Vitor Carvalho, SRS Student Organizers
Alon Lavie, Faculty Advisor
PROGRAM
Time   Event                 Speaker               Title
8:30   Breakfast (Provided)
9:00   Talk 1                Jonathan Brown        Retrieval of Authentic Documents for Reader-Specific Lexical Practice
9:30   Talk 2                Wen Wu                Incremental Detection of Text on Road Signs from Video
10:00  Coffee Break (Provided)
10:30  Talk 3                Kenji Sagae           Using Dependencies for Easy, Fast and Accurate Grammatical/Functional Analysis
11:00  Talk 4                Guy Lebanon           Hyperplane Margin Classifiers on the Multinomial Manifold
11:30  Talk 5                Antoine Raux          Maximum Likelihood Adaptation of Semi-Continuous HMMs by Latent Variable Decomposition of State Distributions
12:00  Lunch (on your own)
13:30  Talk 6 (KEYNOTE)      Betty Yee-Man Cheng   Language Technologist's Approach to Understanding G-Protein-GPCR Interaction
14:00  Talk 7                John Kominek          On the Road to High Quality Universal Speech Synthesis
14:30  Talk 8                Nikesh Garera         Towards a Personal Briefing Assistant
15:00  Coffee Break (Provided)
15:30  Talk 9                Luo Si                Federated Search in Uncooperative Environments
16:00  Talk 10               Satanjeev Banerjee    Automatically Detecting the Structure of Human Meetings
16:30  Break
16:45  Best Presentation Award and Closing Ceremony
• SRS web site: http://www.cs.cmu.edu/~vitor/srs
Jonathan C. Brown [email protected]
Retrieval of Authentic Documents for Reader-Specific Lexical Practice
When a teacher gives a reading assignment in today’s language learning classrooms, all
of the students are almost always reading the same text. Although students have different
reading levels, it is impractical for a single teacher to seek out unique texts matched to
each student’s abilities. In this presentation, I describe REAP, a system designed to
assign each student individualized readings by combining new techniques in reading
difficulty estimation [1] and detailed student and curriculum modeling [2] with the large
amount of authentic materials on the Web. REAP is designed to be used as an additional
resource in teacher-led classes, as well as to be used by reading comprehension
researchers for testing hypotheses on how to improve reading skills for L1 as well as L2
learners. I describe how researchers can use this tool to get fine-grained control over
selection of reading materials, so that they can more easily test these new learning
hypotheses.
Vocabulary acquisition is the primary factor we use in matching texts to a student’s
abilities. These abilities are modeled as a histogram of words. We also model each
desired curriculum level as a histogram of words, learned from a corpus of texts that the
students would normally read. Differences between the student model and that of the
next desired skill level indicate where the student needs to focus. The system can also
prioritize different criteria during the search. For instance, the system can retrieve
documents based solely on the vocabulary terms needed to progress toward the next
level, thereby focusing on curriculum. REAP can also take into account other goals, such
as student interests, special topics, or an upcoming test, all represented as word
histograms. This allows teachers and researchers to decide what they want the students to
focus on for each session.
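The histogram matching described above can be illustrated with a small sketch. This is a minimal illustration, not the actual REAP implementation: the word-frequency representation, the deficit-based scoring, and all function names are assumptions made for this example.

```python
from collections import Counter

def word_histogram(text):
    """Represent a text (or a student/curriculum model) as a word-frequency histogram."""
    return Counter(text.lower().split())

def vocabulary_deficit(student_hist, level_hist):
    """Words that the next curriculum level emphasizes but the student model lacks."""
    return {w: c for w, c in level_hist.items() if student_hist.get(w, 0) < c}

def score_document(doc_text, deficit, interest_hist=None, w_deficit=1.0, w_interest=0.5):
    """Score a candidate Web document by how many 'needed' words it practices,
    optionally blended with a student-interest histogram."""
    doc_hist = word_histogram(doc_text)
    score = w_deficit * sum(min(doc_hist[w], c) for w, c in deficit.items())
    if interest_hist:
        score += w_interest * sum(min(doc_hist[w], c) for w, c in interest_hist.items())
    return score

# Usage sketch: rank retrieved documents for one student.
# deficit = vocabulary_deficit(student_model, next_level_model)
# ranked = sorted(candidate_docs, key=lambda d: score_document(d, deficit), reverse=True)
```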
1. K. Collins-Thompson and J. Callan. (2004.) "A language modeling approach to predicting
reading difficulty." In Proceedings of the HLT/NAACL 2004 Conference. Boston.
2. J. Brown and M. Eskenazi. (2004.) "Retrieval of Authentic Documents for Reader-Specific
Lexical Practice." In Proceedings of InSTIL/ICALL Symposium 2004. Venice, Italy.
Wen Wu [email protected]
Incremental Detection of Text on Road Signs from Video
Automatic detection of text from video is an essential task for video indexing and
understanding. In this talk, we focus on the task of automatically detecting text on road
signs from video. Text on road signs carries useful information necessary for safe driving
and efficient navigation. Automatically detecting text on road signs can help keep a driver
aware of the traffic situation and the surrounding environment. Such a multimedia system
can reduce the driver's cognitive load and enhance driving safety, which is especially
useful for elderly drivers with reduced visual acuity.
In this talk, I will present a fast and robust framework for incrementally detecting text on
road signs from natural scene video. The new framework makes two main contributions.
First, the framework applies a Divide-and-Conquer strategy to decompose the original
task into two sub-tasks: localization of road signs and detection of text.
Corresponding algorithms for the two sub-tasks are proposed and smoothly
incorporated into a unified framework through a real-time feature tracking algorithm.
Second, the framework provides a novel way for text detection from video by integrating
2D features in each video frame (e.g., color, edges, texture) with 3D information
available in a video sequence (e.g., object structure). The feasibility of the proposed
framework has been evaluated on 22 video sequences captured from a moving vehicle.
The new framework achieves an overall text detection rate of 88.9% and a false hit rate of
9.2%, making it suitable for use in a driving assistant system and other
video text detection tasks.
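As a rough, hedged illustration of the divide-and-conquer pipeline described above (sign localization, real-time feature tracking, then text detection on stable regions), here is a schematic sketch. The three callables are placeholders, not the authors' actual algorithms, and the persistence threshold is an arbitrary assumption.

```python
def detect_text_on_road_signs(frames, localize_signs, track_regions, detect_text):
    """Schematic incremental pipeline over a video sequence.
    localize_signs(frame)                -> candidate sign regions from 2D cues (color, edges, texture)
    track_regions(frame, regions, cands) -> regions linked across frames (adds structural evidence)
    detect_text(frame, region)           -> text bounding boxes inside one region"""
    tracked = []        # candidate sign regions carried across frames, each with an 'age' counter
    detections = []
    for t, frame in enumerate(frames):
        candidates = localize_signs(frame)                 # sub-task 1: road sign localization
        tracked = track_regions(frame, tracked, candidates)
        for region in tracked:
            if region["age"] >= 3:                         # only regions stable across several frames
                boxes = detect_text(frame, region)         # sub-task 2: text detection
                if boxes:
                    detections.append((t, region, boxes))
    return detections
```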
1. W. Wu, X. Chen and J. Yang. Incremental Detection of Text on Road Signs from Video with
Application to a Driving Assistant System. To appear in ACM Multimedia, New York, USA,
2004. (Oral Presentation).
Kenji Sagae [email protected]
Using Dependencies for Easy, Fast and Accurate Grammatical/Functional Analysis
Modern statistical syntactic parsers have achieved very high levels of accuracy over the
past ten years, and we have begun to see their impact on several areas of language
technologies, such as question answering, machine translation, and semantic-role
labeling. Because the Penn Treebank (PTB) is widely used for training of such parsers, it
is common to associate PTB-style constituent trees with statistical parsing. However,
there are instances where other syntactic representations would be easier to use, and just
as useful (if not more). One such instance is the assignment of grammatical relations (or
even PTB function tags) to words. In this case, dependencies are not only easier to
understand and faster to annotate, but also easier to process and largely just as effective.
I will discuss a simple representation based on lexical dependencies, which I have been
using in the syntactic analysis of parent-child dialogs. I will present a simple
deterministic algorithm for dependency parsing, and show that the accuracy of the
dependencies it produces is very close to the accuracy of current PTB constituent
statistical parsers (91% vs. 93%). Although PTB constituent parsers have a slight edge,
they are quite complex. I will show that a dependency parser that performs almost as
well can be surprisingly simple and fast.
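To make the idea of a simple, fast deterministic dependency parser concrete, here is a generic arc-standard shift-reduce sketch. This is an illustration of the general technique, not the specific algorithm presented in the talk; the action classifier is assumed to have been trained separately.

```python
def parse(words, choose_action):
    """Deterministic (single-pass) shift-reduce dependency parsing, arc-standard style.
    choose_action(stack, buffer, words) is a trained classifier returning
    'SHIFT', 'LEFT-ARC', or 'RIGHT-ARC' for the current configuration."""
    stack, buffer = [], list(range(len(words)))
    arcs = []                                   # (head_index, dependent_index) pairs
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer, words)
        if action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)                 # second item becomes dependent of the top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()                   # top becomes dependent of the item below it
            arcs.append((stack[-1], dep))
        elif buffer:                            # SHIFT (also the fallback to guarantee progress)
            stack.append(buffer.pop(0))
        else:
            break
    return arcs
```

Each arc produced this way can then be labeled with a grammatical relation by a separate classifier, as discussed below.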
I will also discuss how these dependencies can be used to determine PTB function tags
(such as subject, predicate, temporal, beneficiary, locative, etc). The current state-of-the-
art in assigning function tags to text is the work of Blaheta (2000, 2003), which uses
(among other features) PTB parse tree nodes. I will present very similar results
using no constituent information, only dependencies. Both methods achieve an overall
accuracy of about 87% in function tagging (not counting NULL tags). Blaheta's method
is slightly better on tags classified as "grammatical" (subject, predicate, etc.), while the
dependency approach is slightly better on "form/function" tags (temporal, locative,
manner, etc.).
This approach to function tagging can also be used to label all dependency arcs, when
training data is available. In fact, a relatively small training corpus (less than 10,000
words) can be used to produce a system that assigns a grammatical relation label to every
dependency arc with an accuracy of about 90% in a corpus of parent-child dialogs.
Guy Lebanon [email protected]
Hyperplane Margin Classifiers on the Multinomial Manifold
Linear classifiers are a mainstay of machine learning algorithms, forming the basis for
techniques such as the perceptron, logistic regression, boosting, and support vector
machines. A linear classifier, parameterized by a vector $\theta \in \mathbb{R}^n$, classifies examples
according to the decision rule
$$\hat{y}(x) = \operatorname{sign}\,\langle \theta, x \rangle,$$
following the common practice of identifying $x$ with the feature vector $\Phi(x)$. The
differences between different linear classifiers lie in the criteria and algorithms used for
selecting the parameter vector $\theta$ based on a training set.
Geometrically, the decision surface of a linear classifier is formed by a hyperplane or
linear subspace in n-dimensional Euclidean space,
$$H_\theta = \{\, x \in \mathbb{R}^n : \langle \theta, x \rangle = 0 \,\},$$
where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product. (In both the algebraic and geometric
formulations, a bias term is sometimes added; we prefer to absorb the bias into the
notation given by the inner product, by setting $x_n = 1$ for all $x$.) The linearity assumption
made by such classifiers may be justified as a solution to the fundamental learning
tradeoff between model complexity and restricted expressiveness.
However, we show that implicit in this argument is the presence of Euclidean geometry.
If the data is not well described by Euclidean geometry, the main motivation for linear
classifiers fails, and a generalization of linear classifiers, adapted to the geometry at hand,
is expected to perform better. In this work, we generalize the notion of linear hyperplane
and margin to arbitrary Riemannian geometries. The natural generalization of logistic
regression is then defined and its properties are examined. We focus our attention on the
Fisher geometry of the multinomial manifold that forms a natural geometric space for text
documents. The resulting generalization of logistic regression is shown to outperform its
Euclidean counterpart on several standard text classification tasks.
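For readers unfamiliar with the geometry involved, the following standard information-geometry facts (not restated in the abstract, but standard in this line of work) show what replaces the Euclidean inner product and distance on the multinomial manifold:

```latex
% The multinomial manifold: all positive probability distributions over n+1 outcomes
% (e.g., normalized term-frequency vectors of text documents).
\mathbb{P}_n = \Big\{\, \theta \in \mathbb{R}^{n+1} \;:\; \theta_i > 0,\ \textstyle\sum_i \theta_i = 1 \,\Big\}

% Fisher information metric at \theta, for tangent vectors u, v:
g_\theta(u, v) = \sum_{i=1}^{n+1} \frac{u_i \, v_i}{\theta_i}

% Under the isometry \theta \mapsto 2\sqrt{\theta} onto (the positive orthant of) a sphere
% of radius 2, the geodesic distance between two documents \theta and \theta' is
d(\theta, \theta') = 2 \arccos\!\Big( \sum_{i=1}^{n+1} \sqrt{\theta_i \, \theta'_i} \Big)
```

Roughly speaking, the generalized hyperplanes and margins are defined with respect to this geodesic distance rather than the Euclidean one.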
Antoine Raux [email protected]
Maximum Likelihood Adaptation of Semi-Continuous HMMs by Latent Variable Decomposition of State Distributions
Hidden Markov Models, the most widely used method for speech recognition, involve two
types of parameters: transition probabilities, which model the temporal aspect of speech,
and output distribution parameters (usually means, variances and weights of Gaussian
mixtures), which capture the spectral properties of sub-phonemic units, each unit being
equivalent to a state in the model. In Continuous Density HMMs (CDHMMs), each state
has its own output distribution, independent of that of other states. While this makes for
powerful models, it implies the use of a large number of Gaussians, since there are
typically on the order of several thousand states and tens or hundreds of Gaussians per
mixture. This requires a large amount of training data and makes the use of such models
computationally expensive. On the other hand, in Semi-Continuous HMMs (SCHMMs),
all the states share a single set of Gaussians and only the mixture weights depend on the
state. Compared to CDHMMs, SCHMMs are more compact in size, require less data to
train well and result in comparable recognition performance with much faster decoding
speeds. Nevertheless, the use of SCHMMs in large vocabulary speech recognition
systems has declined considerably in recent years. A significant factor that has
contributed to this is that systems that use SCHMMs cannot be easily adapted to new
acoustic (environmental or speaker) conditions. While maximum likelihood (ML)
adaptation techniques have been very successful for CDHMMs, these have not worked to
a usable degree for SCHMMs. In this talk, I will present a new framework for supervised
ML adaptation of SCHMMs, built upon the paradigm of Probabilistic Latent Semantic
Analysis (PLSA). We use PLSA to decompose the probability distribution of each
Gaussian given the state (i.e. the mixture weights) according to a latent variable. The
decomposition is performed using a variant of the Expectation Maximization algorithm. I
will show how our approach is equivalent to smoothing the mixture weight matrix
obtained by retraining the original model on a small amount of adaptation data.
Experiments on non-native speech recognition in the framework of the Let’s Go spoken
dialogue system demonstrate the effectiveness of this method.
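A minimal notational sketch of the latent-variable decomposition described above (the notation and the exact form of the adaptation step are my assumptions; the abstract does not spell them out):

```latex
% Semi-continuous HMM: all states share one codebook of Gaussians;
% only the mixture weights P(g | s) depend on the state s.
p(o \mid s) = \sum_{g} P(g \mid s)\, \mathcal{N}(o \mid \mu_g, \Sigma_g)

% PLSA-style decomposition of the mixture-weight matrix through a latent variable z:
P(g \mid s) = \sum_{z} P(g \mid z)\, P(z \mid s)

% P(g | z) and P(z | s) are estimated with an EM variant on the adaptation data;
% because z takes far fewer values than there are states or Gaussians, re-estimating
% only these factors acts as a smoothed update of the full weight matrix.
```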
Betty Yee-Man Cheng [email protected]
KEYNOTE: Language Technologist’s Approach to Understanding G-Protein-GPCR Interaction
String alignments and n-grams are commonly used in language technology applications,
such as machine translation, information retrieval, speech recognition and synthesis. In
machine translation, alignment can yield high accuracy if the source and target languages
have similar word order. However, if the two languages have very different word order,
getting a correct alignment can be difficult and an n-gram based MT system may perform
better. Likewise, a correct alignment of protein sequences can yield high accuracy in
prediction problems. But segments or “words” in the protein sequence can shuffle in
their linear order while preserving their orientation in 3D space and therefore the
protein’s function or “meaning” as well.
The superfamily of proteins in this study, the G-protein coupled receptors (GPCRs), is
important in pharmacological research as its members are the target of approximately 60% of
current drugs on the market (Muller, 2000). Coupling with G-proteins, these receptors
regulate much of the cell’s reactions to external stimuli. Abnormalities in this regulation
can lead to cancer, Alzheimer’s, Parkinson’s and other diseases. Identification of the
type of G-proteins that can bind to a particular GPCR can provide information on the
causes and symptoms of the disease the receptor is involved in.
Previous studies on predicting the family of G-proteins that can couple to a given GPCR
sequence have focused on the intracellular domains of the receptor sequence, either using
alignment-based features (Cao et al., 2003; Qian et al., 2003), n-gram features (Moller et
al., 2001) or physiochemical properties of the amino acids (Henriksson, 2003). From the
roles of alignments and n-grams in MT and their analogy to the protein language, we
have chosen to combine alignment and n-gram information in a hybrid prediction method
using a k-nearest neighbours (k-NN) classifier on sequence alignment similarity and a k-
NN classifier on Euclidean distance of n-gram counts. Our method outperforms the
current state-of-the-art in precision, recall and F1. Systematic experiments with our
prediction method were able to validate the biologists' hypothesis that most of the coupling
specificity information resides in the 2nd and 3rd intracellular loops of the receptor, while
providing evidence for a new hypothesis that the information is more localized to the
beginning of the 2nd intracellular loop.
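A toy sketch of the n-gram half of such a hybrid classifier: k-NN over Euclidean distance between amino-acid n-gram count vectors. The alignment-similarity k-NN, the combination of the two classifiers, and all names below are illustrative assumptions, not the method's actual code.

```python
from collections import Counter
from math import sqrt

def ngram_counts(sequence, n=2):
    """Count overlapping amino-acid n-grams in a protein sequence string."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

def euclidean(c1, c2):
    keys = set(c1) | set(c2)
    return sqrt(sum((c1.get(k, 0) - c2.get(k, 0)) ** 2 for k in keys))

def knn_predict(query_seq, labeled_seqs, k=3, n=2):
    """Predict the G-protein coupling family of a GPCR by majority vote among the
    k training sequences nearest in n-gram space.
    labeled_seqs: list of (sequence, family_label) pairs."""
    q = ngram_counts(query_seq, n)
    nearest = sorted(labeled_seqs,
                     key=lambda item: euclidean(q, ngram_counts(item[0], n)))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```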
1. Cao, J., R. Panetta, et al. (2003). "A naive Bayes model to predict coupling between seven
transmembrane domain receptors and G-proteins." Bioinformatics 19(2): 234-40.
2. Henriksson, A. (2003). Prediction of G-protein Coupling of GPCRs - A Chemometric
Approach. Engineering Biology. Linkoping, Linkoping University: 79.
3. Moller, S., J. Vilo, et al. (2001). "Prediction of the coupling specificity of G protein coupled
receptors to their G proteins." Bioinformatics 17 Suppl 1: S174-81.
4. Muller, G. (2000). "Towards 3D structures of G protein-coupled receptors: a multidisciplinary
approach." Curr Med Chem 7(9): 861-88.
5. Qian, B., O. S. Soyer, et al. (2003). "Depicting a protein's two faces: GPCR classification by
phylogenetic tree-based HMMs." FEBS Lett 554(1-2): 95-9.
John Kominek [email protected]
On the Road to High Quality Universal Speech Synthesis
Machine Translation has the Vauquois Triangle -- a famous high-level perspective that
delineates the major approaches to MT, as well as their limitations. You can have either
universality (through an Interlingua) or high quality (Direct translation), but not both. In
between, trying to find a happy medium, reside Transfer techniques.
The field of Speech Synthesis also has such a triangle, with similarly frustrating trade-
offs: either high quality or full flexibility, but not both. In this talk I begin by drawing the
corresponding parallels, explaining where the three major approaches fit in, and their
historical development. These three are unit-selection, spectrogram-based, and
articulatory synthesis.
By directly employing segments of recorded speech, unit-selection synthesis can achieve
excellent voice quality, but at the expense of flexibility. A universal synthesizer, ideally,
can mimic any person in any language, in a full range of styles. Achieving this, though,
demands precise modeling of the human vocal tract and articulators -- as yet an unsolved
problem. In between, spectrogram-based synthesizers offer good controllability, but do
not sound as natural as unit-selection techniques.
Two paths can thus be taken on the road to high quality universal synthesis. One can start
with a flexible synthesizer and attempt to make it sound better. Or one can start with a
good sounding synthesizer and try to make it more flexible. This talk will follow the
second path.
To illustrate, we tackle the problem of "accent transformation" -- changing the accent of
one person to sound more like that of another. This is made possible using CMU's
recently created "Arctic Speech Databases," a parallel corpus of carefully spoken English
sentences. Editions exist for American, Canadian, Scottish, Indian, and Japanese accented
English. Grafting a new accent onto an existing voice is desirable for localizing a
synthesizer to match a target region; moving in the opposite direction, it can make a
native voice sound foreign, hence "exotic".
Nikesh Garera [email protected]
Towards a Personal Briefing Assistant
The preparation of summary reports from raw information is a common task in research
projects. A tool that highlights useful items for a summary would allow report writers to
be more productive, by reducing the time needed to assess individual items. It has further
potential benefit in that it can be used to create user-specific or audience-specific digests.
In the latter case, multiple tailored reports could in principle be generated from the same
input information. With this motivation, we present a design of an adaptive system that
learns to extract important items from weekly interviews by observing the behavior of
human summary authors.
Our application scenario involves a report writer producing digests on a week-to-week
basis and our goal is to make this person more efficient over time. We propose to do this
by presenting the writer with successively better ordered lists of items (such that digest-
worthy items appear at the top of the ordered list).
We identified salient features used for learning in this new domain by studying the corpus
of project interviews. This corpus consisted of weekly progress interviews of project
members collected over a period of 4 months. The features were then annotated in the
corpus and were used as parameters in a regression model. This model is incrementally
trained from user input and is used to reorder items in successive weeks. We measure the
user effort in terms of how far down the user has to go in the list in order to select all
important items in a weekly set.
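A rough sketch of the reorder-and-update loop described above. The choice of scikit-learn's SGDRegressor, the 0/1 targets, and the feature extraction are assumptions for illustration, not the system's actual implementation.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()                 # incrementally trainable regression model

def rank_items(items, extract_features):
    """Order this week's interview items so that likely digest-worthy ones come first."""
    X = np.array([extract_features(it) for it in items])
    try:
        scores = model.predict(X)
    except Exception:                  # week 1: model not yet fitted, keep original order
        scores = np.zeros(len(items))
    return [items[i] for i in np.argsort(-scores)]

def update_from_selection(items, selected, extract_features):
    """After the writer picks items for the digest, treat selections as target 1.0
    and the rest as 0.0, and update the model incrementally."""
    X = np.array([extract_features(it) for it in items])
    y = np.array([1.0 if it in selected else 0.0 for it in items])
    model.partial_fit(X, y)
```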
In our evaluation study, 7 expert subjects (project members, managers) were asked to
create 5-item summaries for 12 successive weeks, using a selection interface. With the
assistance of our system, average precision improves by a factor of more than 2.21 by the
end of the learning period, compared to a no-learning baseline.
Other evaluation metrics also show significant improvement. A low inter-rater
agreement (Kappa=0.26) indicates that the subjects are selecting different items and the
learned models are individual. Moreover, the different feature weights in the regression
models for each subject identify their summarization differences. We also report our
ongoing work of automatic feature extraction to make this approach domain independent.
The talk will include a short demonstration of our system showing how the learned
models can be used to populate a template for a standard quarterly report.
Luo Si [email protected]
Federated Search in Uncooperative Environments
Conventional search engines such as Google or AltaVista are effective when an
information source allows its contents to be crawled and indexed in a centralized
database. However, a large amount of information cannot be crawled and searched by
conventional search engines, either due to intellectual property protection or frequent
information updates. This type of information is valuable. For example, hidden Web
content that cannot be searched by conventional search engines has been estimated to
be 2-50 times larger than the visible Web and is often created and maintained by
professionals.
Federated search provides a solution for searching the information that
cannot be searched by conventional search engines. It includes three sub-problems: i)
acquiring information about the contents of each information source (resource
representation), ii) ranking the sources and selecting a small number of them for a given
query (resource ranking), and iii) merging the results returned from the selected sources
into a single ranked list (result-merging).
This work addresses federated search problems in uncooperative environments such as
the Web, where information sources cannot be assumed to share their contents or use the
same type of search engine. Empirically effective solutions have been proposed for the full
range of federated search sub-problems such as new algorithms for information source
estimation, resource selection and results merging.
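To give a flavor of one of these sub-problems, here is a hedged sketch of regression-based result merging: each selected source's scores are mapped onto a common scale using documents that also appear in a small centralized sample index, then a single ranked list is produced. The linear mapping and all names are illustrative assumptions, not the exact published algorithms.

```python
import numpy as np

def fit_score_mapping(source_scores, central_scores):
    """Least-squares fit of a linear map a*s + b from one source's scores to the
    centralized-sample scale, using documents scored by both (the 'overlap' documents)."""
    A = np.vstack([source_scores, np.ones(len(source_scores))]).T
    a, b = np.linalg.lstsq(A, np.asarray(central_scores, dtype=float), rcond=None)[0]
    return lambda s: a * s + b

def merge_results(per_source_results, mappings, top_k=20):
    """per_source_results: {source: [(doc_id, source_score), ...]}
    mappings: {source: mapping fitted by fit_score_mapping}.
    Returns one globally ranked result list."""
    merged = [(doc_id, mappings[src](score))
              for src, results in per_source_results.items()
              for doc_id, score in results]
    merged.sort(key=lambda x: -x[1])
    return merged[:top_k]
```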
Furthermore, a unified utility maximization framework is proposed to combine the
separate solutions together to construct effective systems of different federated search
applications. This is the first probabilistic framework for integrating the different
components of a federated search system. This unified view of the federated search task
provides a new opportunity to utilize available information. It enables us to configure
individual components globally to obtain the desired overall results for different applications,
which is superior to the previous practice of simply combining individually effective
solutions.
This work advances the state of the art in federated search. Its stronger theoretical
foundation, better empirical results, and better modeling of real-world applications make
the new research a bridge that turns federated search from an appealing research topic
into a much more practical tool.
1. Si, L. & Callan, J. (2002a). Using sampled data and regression to merge search engine results.
   In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and
   Development in Information Retrieval.
2. Si, L. & Callan, J. (2003a). Relevant document distribution estimation method for resource
   selection. In Proceedings of the 26th Annual International ACM SIGIR Conference on
   Research and Development in Information Retrieval.
3. Si, L. & Callan, J. (2003b). A semi-supervised learning method to merge search engine
   results. ACM Transactions on Information Systems, 21(4).
4. Si, L. & Callan, J. (2004). The effect of database size distribution on resource selection
   algorithms. In Distributed Multimedia Information Retrieval, LNCS 2924, Springer.
5. Si, L. & Callan, J. (2004). Unified utility maximization for distributed information retrieval
   in uncooperative environments. In Proceedings of the 13th International Conference on
   Information and Knowledge Management, ACM.
Satanjeev Banerjee [email protected]
Automatically Detecting the Structure of Human Meetings
We are interested in automatically extracting the structure of meetings between humans.
Such structure includes the state of a meeting (presentation, discussion, etc), the roles of
each meeting participant (presenter, discussion participant, observer, etc.), the
onset/offset boundaries of agenda items, and the onset/offset boundaries of regions of
decisions (such as action items). In this talk we will describe our current research on
detecting these various aspects of human meetings.
In particular, we will present a simple taxonomy of meeting states and participant roles.
We trained a decision tree classifier that learns to detect these states and roles from
simple speech-based features such as the number of speakers and the lengths of
utterances and speech-overlaps. This classifier detects meeting states 18% absolute more
accurately than a random classifier, and detects participant roles 10% absolute more
accurately than a majority classifier. We will then report on the effect of adding more
advanced features such as the words in the utterances as output by an automatic speech
recognizer, as well as features drawn from other modalities such as the body positions
and face directions of the various participants relative to each other as output by a
camera-image processor.
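A minimal sketch of a classifier of the kind described above, trained on simple speech-based features. The feature set, the tiny toy data, and the use of scikit-learn are assumptions for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# One row per short window of the meeting: number of active speakers,
# mean utterance length (seconds), and fraction of time with overlapping speech.
X = np.array([
    [1, 22.0, 0.02],    # one long speaker, little overlap
    [3,  4.5, 0.30],    # several speakers, short turns, frequent overlap
    [2,  8.0, 0.10],
    [1, 30.0, 0.01],
])
y = np.array(["presentation", "discussion", "discussion", "presentation"])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Predict the meeting state for a new window of audio features.
print(clf.predict(np.array([[4, 3.0, 0.25]])))    # likely "discussion"
```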
Finally we will present initial research on agenda item and decision region boundary
detection. Unlike meeting state and participant role detection, the problem of detecting
agenda items and decision regions does not easily lend itself to a typical machine learning
approach, since there are no clear pre-defined classes. However, preliminary observations
of recorded meeting data suggest that different agenda items usually differ substantially both in
the patterns of words used to discuss them and in the identities of the
participants involved in those discussions. We will report on our ongoing research
where we draw upon ideas from the realm of topic tracking and leverage the above
characteristics to perform agenda item/decision region detection.