1
“Is the Sky Pure Today?” AwkChecker: An Assistive Tool for Detecting and Correcting Collocation Errors
Taehyun Park, Edward Lank, Pascal Poupart, Michael Terry
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada, N2L 3G1
ACM UIST 2008
2
Motivation
Writing aids for non-native speakers
Non-native speakers can learn a foreign language's rules for spelling and grammar, but word pairings (collocations) are not easy to learn.
Ex. "take their shoes down" vs. "take their shoes off" (the more common expression)
3
AwkChecker detects collocation errors and suggests alternatives.
4
Contributions
Define collocation errors as a function of the relative frequency of phrase usage within a corpus.
Present algorithms for suggesting alternatives based on the specific types of errors made by non-native speakers (NNSs):
1. Insertion (I went to home → I went home)
2. Deletion (I am student → I am a student)
3. Transposition (he’s talking with his full mouth → he’s talking with his mouth full)
4. Substitution (pure sky → clear sky)
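The four error types can be told apart with a simple word-level comparison between a learner phrase and its correction. The sketch below is only an illustration of the taxonomy, not AwkChecker's algorithm; the function name `classify_error` and the whitespace tokenization are assumptions.

```python
from collections import Counter


def classify_error(error, correction):
    """Name the single word-level edit separating a learner phrase from its correction.

    Hypothetical helper: it only illustrates the four error types on the slide.
    """
    e, c = error.split(), correction.split()
    if len(e) == len(c) + 1:
        # The learner inserted an extra word: dropping one word yields the correction.
        if any(e[:i] + e[i + 1:] == c for i in range(len(e))):
            return "insertion"
    elif len(e) + 1 == len(c):
        # The learner left a word out: dropping one word from the correction yields the error.
        if any(c[:i] + c[i + 1:] == e for i in range(len(c))):
            return "deletion"
    elif len(e) == len(c) and e != c:
        if Counter(e) == Counter(c):
            # Same words, different order.
            return "transposition"
        if sum(a != b for a, b in zip(e, c)) == 1:
            # Exactly one word was swapped for another.
            return "substitution"
    return "other"


print(classify_error("pure sky", "clear sky"))
```

Called on the slide's own examples, the function labels "I went to home" as an insertion and "I am student" as a deletion, matching the naming used above (the type describes what the learner did, not the fix).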
5
Detecting Collocation Errors
A(e): acceptability of a phrase e
g(e): frequency of the input phrase
g(c): frequency of an alternative phrase
f(e, c): edit distance between e and c
If A(e) is less than a user-customizable threshold, the phrase e is flagged as a collocation error.
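The exact formula for A(e) is not reproduced in this transcript, so the sketch below assumes one plausible relative-frequency form: the input phrase's frequency compared with its strongest alternative, where alternatives are discounted by their edit distance. The formula, the function names, the default threshold, and the toy counts are all assumptions, not the authors' definition.

```python
def acceptability(g_e, alternatives):
    """Assumed form: A(e) = g(e) / (g(e) + max_c g(c) / f(e, c)).

    g_e: corpus frequency of the input phrase e.
    alternatives: iterable of (g_c, f_ec) pairs, where g_c is the frequency of an
    alternative phrase c and f_ec >= 1 is its edit distance from e.
    """
    best = max((g_c / f_ec for g_c, f_ec in alternatives), default=0.0)
    total = g_e + best
    return g_e / total if total > 0 else 0.0


def is_collocation_error(g_e, alternatives, threshold=0.05):
    """Flag the phrase when its acceptability falls below the user-set threshold."""
    return acceptability(g_e, alternatives) < threshold


# Toy counts (made up): "pure sky" is rare, "clear sky" is common and one edit away.
print(is_collocation_error(12, [(5400, 1)]))
```

Whatever the precise formula, the design point is the same: a phrase is judged relative to nearby alternatives rather than by its raw frequency alone, and the threshold lets the user trade recall for fewer false alarms.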
6
Evaluation
- User Testing
Five non-native speakers took part; none had seen such a tool before.
Reactions were positive.
Participants employed AwkChecker to check articles and prepositions.
Ex. pass judgment (to/on) <noun>
7
Automatic Collocation Suggestion in Academic Writing
Jian-Cheng Wu (1), Yu-Chia Chang (1,*), Teruko Mitamura (2), Jason S. Chang (1)
(1) National Tsing Hua University, Hsinchu, Taiwan
(2) Carnegie Mellon University, Pittsburgh, United States
ACL 2010
8
Goals
Automate suggestions for verb-noun lexical collocations.
Verb-noun collocations are recognized as presenting the greatest challenge to students (Howarth, 1996; Liu, 2002).
The choice of verb in a collocation is considered the most difficult part for learners to master (Liu, 2002; Chang, 2008).
9
Collocation Inspector
10
Algorithm for Producing Suggestions
11
Collocation Extraction
Ex. We introduce a novel method for learning to find documents on the web.
We proposed that the web-based model would be more effective than corpus-based one.
Use a dependency parser (Stanford Parser)
dobj(introduce-2, method-5)
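Once a parser has printed typed dependencies in the `relation(governor-index, dependent-index)` style, verb-noun pairs can be read off the `dobj` relations. The sketch below works on already-parsed text lines and does not call the Stanford Parser itself; the function name `verb_noun_pairs` and the regular expression are assumptions.

```python
import re

# Matches lines such as "dobj(introduce-2, method-5)", with optional spaces.
DEP = re.compile(r"(\w+)\s*\(\s*(\S+?)-(\d+)\s*,\s*(\S+?)-(\d+)\s*\)")


def verb_noun_pairs(dependency_lines):
    """Collect (verb, noun) pairs from direct-object relations in parser output."""
    pairs = []
    for line in dependency_lines:
        match = DEP.fullmatch(line.strip())
        if match and match.group(1) == "dobj":
            # The governor of dobj is the verb, the dependent is its object noun.
            pairs.append((match.group(2), match.group(4)))
    return pairs


lines = [
    "nsubj(introduce-2, We-1)",
    "dobj(introduce-2, method-5)",
    "amod(method-5, novel-4)",
]
print(verb_noun_pairs(lines))
```

Keeping only `dobj` relations is what restricts the extraction to verb-noun collocations; other relations such as `nsubj` or `amod` are skipped.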
12
Using a Classifier for the Suggestion Task
13
Effective Feature Selection
Training algorithm: Maximum Entropy
- Use contextual features (head, n-gram)
Ex: We introduce a novel method for learning to find documents on the web.
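A rough sketch of what head and n-gram features could look like for the example sentence is given below. The slide names only the two feature families, so the window size, the feature naming, and the masking of the verb (the class to be predicted) are assumptions.

```python
def contextual_features(tokens, verb_index, noun_index, window=2):
    """Build head and n-gram features for predicting the verb at verb_index.

    Hypothetical feature scheme: the verb is masked as <V> because it is the
    label the classifier should predict, not an input.
    """
    feats = {"head=" + tokens[noun_index].lower(): 1}
    context = [t.lower() for t in tokens]
    context[verb_index] = "<V>"
    lo = max(0, verb_index - window)
    hi = min(len(context), verb_index + window + 1)
    for n in (1, 2):
        for i in range(lo, hi - n + 1):
            gram = context[i:i + n]
            if gram == ["<V>"]:
                continue  # the masked verb alone carries no information
            feats[f"{n}gram={'_'.join(gram)}"] = 1
    return feats


tokens = "We introduce a novel method for learning to find documents on the web".split()
print(sorted(contextual_features(tokens, verb_index=1, noun_index=4)))
```

A dictionary of binary features like this is the usual input format for maximum entropy (multinomial logistic regression) trainers, with the verbal collocate as the class label.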
14
Example
Input:
There are many investigations about wireless network communication, especially it is important to add Internet transfer calculation speeds.
Result
15
Experiment
Training corpus: CiteSeer (20,306 abstracts, 95,650 sentences)
790 verbal collocates are identified as tagged classes.
Test data: 600 randomly selected sentences that do not overlap with the training set.
16
The YouTube Video Recommendation System
James Davidson, Benjamin Liebald, Junning Liu, Taylor Van Vleet, Palash Nandy
Google Inc
ACM RecSys 2010
17
Personalized recommendations based on the user’s previous activity on the site
18
Goals
Help users find high quality videos relevant to their interests.
Recommendations should be updated regularly and reflect a user’s recent activity on the site.
Maintain user privacy.
19
Challenges
Videos as they are uploaded by users often have no or very poor metadata (title, description).
Videos on YouTube are mostly short form (under 10 minutes in length).
Many of the interesting videos on YouTube have a short life cycle.
20
System Design
[Diagram: the user's seed videos lead to (1) top-N candidate videos, which are then (2) ranked using relevance and diversity.]
21
Input Data (seed)
1. videos that were watched (potentially beyond a certain threshold)
2. videos that were explicitly favorited, “liked”, rated or added to playlists
22
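Related Videos (candidates)
Relatedness score: r(vi, vj) = cij / f(vi, vj)
cij: co-visitation count of videos vi and vj
ci, cj: total occurrence counts across all sessions for videos vi and vj
f(vi, vj): normalization for the global popularity of videos vi and vj, e.g. ci · cj
Threshold: overall view count
Top-N candidates of vi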
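A minimal sketch of co-visitation scoring follows. It assumes the relatedness score is the number of sessions in which two videos occur together, divided by the product of their individual occurrence counts as the popularity normalization. The function name, the `min_views` parameter (one guess at how the view-count threshold is applied), and the toy sessions are assumptions.

```python
from collections import Counter
from itertools import combinations


def related_videos(sessions, top_n=2, min_views=1):
    """Return the top-N related videos for each video, ranked by relatedness."""
    views = Counter()     # sessions in which each video occurs
    co_views = Counter()  # sessions in which both videos of a pair occur
    for session in sessions:
        watched = sorted(set(session))
        views.update(watched)
        co_views.update(combinations(watched, 2))

    scores = {}
    for (vi, vj), c_ij in co_views.items():
        if views[vi] < min_views or views[vj] < min_views:
            continue  # assumed threshold: skip videos with too few views
        r = c_ij / (views[vi] * views[vj])
        scores.setdefault(vi, []).append((r, vj))
        scores.setdefault(vj, []).append((r, vi))

    # Highest score first; ties broken by video id for a stable order.
    return {
        v: [vj for _, vj in sorted(cands, key=lambda p: (-p[0], p[1]))[:top_n]]
        for v, cands in scores.items()
    }


sessions = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]
print(related_videos(sessions)["a"])
```

Dividing by the product of the two counts keeps very popular videos from appearing related to everything simply because they show up in many sessions.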
23
Generating Recommendation Candidates
Candidate set:
S: seed set
R: related videos
n: distance from any video in the seed set
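Expanding the seed set through related-video links can be sketched as a breadth-first walk that stops at distance n. The sketch assumes the related-video lists are given as a dictionary and that the seeds themselves are excluded from the result; the function and parameter names are hypothetical.

```python
def candidate_set(seeds, related, max_distance=2):
    """Collect every video within max_distance related-video hops of a seed."""
    seeds = set(seeds)
    frontier = set(seeds)
    reached = set(seeds)
    for _ in range(max_distance):
        next_frontier = set()
        for video in frontier:
            next_frontier.update(related.get(video, ()))
        frontier = next_frontier - reached  # keep only newly discovered videos
        if not frontier:
            break
        reached |= frontier
    # Assumption: videos the user already interacted with are not recommended again.
    return reached - seeds


related = {"s": ["a", "b"], "a": ["c", "s"], "c": ["d"]}
print(sorted(candidate_set(["s"], related, max_distance=2)))
```

A larger distance widens the candidate pool beyond videos that are near-duplicates of the seeds, at the cost of weaker ties to the user's actual activity.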
24
Ranking
video quality (view count, commenting, sharing activity…)
user specificity (consider properties of the seed video)
diversification (videos that are too similar to each other are removed)
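The diversification step can be illustrated with a greedy filter over already-scored candidates. How quality and user-specificity signals are combined into a score is not stated on the slide, so the sketch takes one precomputed score per video; the similarity function, the cutoff, and all names are assumptions.

```python
def rank_with_diversity(candidates, similarity, top_k=3, max_similarity=0.8):
    """Pick the best-scoring videos, skipping any too similar to one already chosen.

    candidates: list of (video_id, score) pairs.
    similarity: function (video_a, video_b) -> value in [0, 1].
    """
    selected = []
    for video, _score in sorted(candidates, key=lambda c: -c[1]):
        if all(similarity(video, chosen) <= max_similarity for chosen in selected):
            selected.append(video)
        if len(selected) == top_k:
            break
    return selected


# Toy similarity (hypothetical): two videos are "too similar" if they share a channel.
channel = {"v1": "x", "v2": "x", "v3": "y", "v4": "z"}
same_channel = lambda a, b: 1.0 if channel[a] == channel[b] else 0.0

scored = [("v1", 0.9), ("v2", 0.8), ("v3", 0.7), ("v4", 0.6)]
print(rank_with_diversity(scored, same_channel))
```

Here "v2" is dropped despite its high score because it duplicates "v1", which is the trade of a little relevance for a more varied list.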
25
Evaluation
26
Text Cohesion Visualizer
Chakarida Nukoolkit, Praewphan Chansripiboon, Pornchai Mongkolnam, Richard Watson Todd*
Computer Science Program, School of Information Technology
School of Liberal Arts*
King Mongkut’s University of Technology Thonburi
Bangkok, 10140 Thailand
IEEE ICCSE 2011
27
Goals
Design a prototype system to help analyze the lexical coherence of essays.
Provide visualized output as writing feedback to users.
28
System Flowchart
Preprocessing (Stanford Part-of-Speech tagger)
Matching keywords
Creating bond table
29
Matching keywords
Count the number of matched words (links) between any two sentences.
Four types of matching:
1. repetition
2. complex repetition
3. paraphrase (synonyms, hypernyms)
4. pronoun
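Counting links between sentence pairs can be sketched for the simplest of the four types, plain repetition. Complex repetition, paraphrase, and pronoun matching would need a stemmer, a thesaurus, and coreference handling and are left out. The stopword list, tokenization, and function names are assumptions.

```python
import re

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to",
             "in", "on", "and", "it", "this", "that"}


def keywords(sentence):
    """Lowercased content words of a sentence."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def link_table(sentences):
    """links[i][j] = number of keywords shared by sentences i and j (repetition only)."""
    keys = [keywords(s) for s in sentences]
    n = len(keys)
    links = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            links[i][j] = links[j][i] = len(keys[i] & keys[j])
    return links


essay = [
    "The system checks essays for cohesion.",
    "Cohesion in essays is hard to judge.",
    "Students like pizza.",
]
print(link_table(essay))
```

The table is symmetric, so only the pairs with i < j need to be computed.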
30
Creating bond table
The bond table indicates whether or not there is a bond between each pair of sentences.
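A bond table can be derived from per-pair link counts by thresholding. The slide does not state how many links make a bond, so `min_links` below is a hypothetical parameter, and the input is assumed to be a symmetric matrix of link counts.

```python
def bond_table(links, min_links=3):
    """bonds[i][j] is True when sentences i and j share at least min_links links."""
    n = len(links)
    return [[i != j and links[i][j] >= min_links for j in range(n)] for i in range(n)]


def bonded_pairs(links, min_links=3):
    """List each bonded sentence pair once, as (i, j) with i < j."""
    bonds = bond_table(links, min_links)
    return [(i, j) for i in range(len(bonds)) for j in range(i + 1, len(bonds)) if bonds[i][j]]


link_counts = [
    [0, 3, 1],
    [3, 0, 0],
    [1, 0, 0],
]
print(bonded_pairs(link_counts))
```

Sentences that bond with few or no others are the natural candidates to highlight as weakly connected in the visual feedback.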
31
32
six types
33
Conclusion
We proposed an application that detects cohesion errors in text in line with those that experts indicate. The system’s accuracy is at an acceptable level according to expert opinion.
In future work, we first plan to improve the process of matching keywords for more accurate results by augmenting the existing process with more specific linguistic rules.
34
See-To-Retrieve: Efficient Processing of Spatio-Visual Keyword Queries
Chao Zhang, Lidan Shou, Ke Chen, Gang Chen
College of Computer Science, Zhejiang University, China
ACM SIGIR 2012
35
Spatio-Visual Keyword
Ex. A user searches for introductory information about a distant grand church within her eyesight.
36
Goals
visually conspicuous
semantically relevant
document space
physical space
WYRIWYS (What-You-Retrieve-Is-What-You-See)
37
Motivation
State-of-the-art spatial retrieval methods are mostly distance-based but overlook the visibility of objects.
Italian food
38
System Flowchart
Visibility Analysis
Ranking Mechanism
39
40
Experiment
Data sets:
1. Street objects in Los Angeles (contains 131,461 MBRs)
2. Gowalla (consists of 28,867 Web documents)
41
柏安
亞婷 家愷 冠中 ???