
1

“Is the Sky Pure Today?” AwkChecker: An Assistive Tool for Detecting and Correcting Collocation Errors

Taehyun Park, Edward Lank, Pascal Poupart, Michael Terry

David R. Cheriton School of Computer Science
University of Waterloo, Waterloo, ON, Canada, N2L 3G1

ACM UIST 2008

2

Motivation

Writing aids for non-native speakers

Non-native speakers can learn a foreign language’s rules for spelling and grammar, but it is not easy to learn which words go together (collocations).

Ex.

“take their shoes down” vs. “take their shoes off” (the more common expression)

3

AwkChecker detects collocation errors and suggests alternatives.

4

Contributions

Defines collocation errors as a function of the relative frequency of phrase usage within a corpus. Presents algorithms for suggesting alternatives based on the specific types of errors made by non-native speakers (NNSs).

1. Insertion (I went to home → I went home)
2. Deletion (I am student → I am a student)
3. Transposition (he’s talking with his full mouth → he’s talking with his mouth full)
4. Substitution (pure sky → clear sky)
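The four error types suggest a simple way to enumerate repair candidates: apply the inverse word-level edit to the input phrase. The sketch below is only an illustration under that assumption, not AwkChecker's actual algorithm; the function name `one_edit_candidates` and the `vocabulary` argument (words that may be inserted or substituted) are hypothetical.

```python
def one_edit_candidates(tokens, vocabulary):
    """Return every phrase that is one word-level edit away from `tokens`.

    tokens: list of words in the input phrase.
    vocabulary: hypothetical set of words allowed for insertion/substitution.
    """
    tokens = list(tokens)
    n = len(tokens)
    candidates = set()

    # Remove one word (repairs an insertion error: "went to home" -> "went home").
    for i in range(n):
        candidates.add(" ".join(tokens[:i] + tokens[i + 1:]))

    # Add one word (repairs a deletion error: "am student" -> "am a student").
    for i in range(n + 1):
        for word in vocabulary:
            candidates.add(" ".join(tokens[:i] + [word] + tokens[i:]))

    # Swap two adjacent words (repairs a transposition error).
    for i in range(n - 1):
        swapped = tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
        candidates.add(" ".join(swapped))

    # Replace one word (repairs a substitution error: "pure sky" -> "clear sky").
    for i in range(n):
        for word in vocabulary:
            if word != tokens[i]:
                candidates.add(" ".join(tokens[:i] + [word] + tokens[i + 1:]))

    # The input itself and the empty phrase are not useful suggestions.
    candidates.discard(" ".join(tokens))
    candidates.discard("")
    return candidates
```

In practice the candidates would then be filtered by how often each one occurs in the corpus, so that only common phrases are suggested.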

5

Detecting Collocation Errors

Acceptability of a phrase e: A(e)

g(e): frequency of the input phrase
g(c): frequency of an alternative phrase
f(e, c): edit distance between e and c

If A(e) is less than a user-customizable threshold, the phrase e is flagged as a collocation error.
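A minimal sketch of threshold-based flagging follows. It assumes one plausible form for A(e): the frequency of the input phrase relative to the frequencies of its alternatives, with each alternative discounted by its edit distance. This form, the `1 / (1 + distance)` discount, and the default threshold are assumptions for illustration, not necessarily the paper's definition.

```python
def acceptability(g_e, alternatives):
    """Assumed acceptability score in [0, 1].

    g_e: corpus frequency of the input phrase e.
    alternatives: list of (g_c, distance) pairs, where g_c is the corpus
        frequency of an alternative c and distance is f(e, c).
    """
    # Hypothetical discount: closer alternatives count more.
    weighted = sum(g_c / (1 + distance) for g_c, distance in alternatives)
    total = g_e + weighted
    if total == 0:
        return 1.0  # no evidence either way, so do not flag
    return g_e / total


def is_collocation_error(g_e, alternatives, threshold=0.1):
    """Flag the phrase when its acceptability falls below the threshold."""
    return acceptability(g_e, alternatives) < threshold
```

With made-up counts, a phrase seen 5 times whose one-edit alternative is seen 1,000 times would be flagged, while a phrase seen 900 times against an alternative seen 100 times would not.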

6

Evaluation

- User Testing

  - Five non-native speakers took part.
  - They had never seen such a tool before.
  - Reactions were positive.
  - Participants employed AwkChecker to check articles and prepositions.

pass judgment (to/on) <noun>

7

Automatic Collocation Suggestion in Academic Writing

Jian-Cheng Wu (1), Yu-Chia Chang (1,*), Teruko Mitamura (2), Jason S. Chang (1)

(1) National Tsing Hua University, Hsinchu, Taiwan
(2) Carnegie Mellon University, Pittsburgh, United States

ACL 2010

8

Goals

Automate suggestions for verb-noun lexical collocations.

Verb-noun collocations are recognized as presenting the greatest challenge to students (Howarth, 1996; Liu, 2002). Within these collocations, the choice of verb is considered the most difficult part for learners to master (Liu, 2002; Chang, 2008).

9

Collocation Inspector

10

Algorithm for Producing Suggestions

11

Collocation Extraction

Ex. We introduce a novel method for learning to find documents on the web.

We proposed that the web-based model would be more effective than corpus-based one.

Use a dependency parser (Stanford Parser)

dobj (introduce-2, method-4)
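Given parser output in the textual form shown above, verb-noun pairs can be read off the direct-object (`dobj`) relations. The sketch below only parses such text lines; it does not call the Stanford Parser, and the helper names `parse_dependency` and `verb_noun_pairs` are hypothetical.

```python
import re

# Matches lines such as "dobj (introduce-2, method-4)".
DEPENDENCY = re.compile(
    r"(\w+)\s*\(\s*([^\s,]+)-(\d+)\s*,\s*([^\s,]+)-(\d+)\s*\)"
)


def parse_dependency(line):
    """Return (relation, head, head_index, dependent, dependent_index) or None."""
    match = DEPENDENCY.fullmatch(line.strip())
    if match is None:
        return None
    relation, head, head_idx, dependent, dep_idx = match.groups()
    return relation, head, int(head_idx), dependent, int(dep_idx)


def verb_noun_pairs(lines):
    """Collect (verb, noun) pairs from direct-object relations."""
    pairs = []
    for line in lines:
        parsed = parse_dependency(line)
        if parsed is not None and parsed[0] == "dobj":
            pairs.append((parsed[1], parsed[3]))
    return pairs


print(verb_noun_pairs(["dobj (introduce-2, method-4)"]))
# prints [('introduce', 'method')]
```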

12

Using a Classifier for the Suggestion task

13

Effective Feature Selection

Training algorithm: Maximum Entropy

- Use contextual features (head, n-gram)

Ex: We introduce a novel method for learning to find documents on the web.
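For the example sentence, contextual features around the verb slot might be built as sketched below. The slide names only the feature families (head and n-gram); the exact templates, the window size, and the feature-name format used here are assumptions. The resulting dictionary is the kind of input a maximum entropy classifier would consume to predict the verb.

```python
def context_features(tokens, verb_index, head, window=2):
    """Build a sparse feature dict describing the context of a verb slot.

    tokens: the tokenized sentence.
    verb_index: position of the verb to be predicted (it is not a feature).
    head: the noun that the verb takes as its object.
    window: hypothetical number of words considered on each side.
    """
    features = {"head=" + head.lower(): 1}

    left = [w.lower() for w in tokens[max(0, verb_index - window):verb_index]]
    right = [w.lower() for w in tokens[verb_index + 1:verb_index + 1 + window]]

    # Unigram features, indexed by distance from the verb slot.
    for distance, word in enumerate(reversed(left), start=1):
        features["L%d=%s" % (distance, word)] = 1
    for distance, word in enumerate(right, start=1):
        features["R%d=%s" % (distance, word)] = 1

    # Bigram features over adjacent words on each side.
    for first, second in zip(left, left[1:]):
        features["Lbigram=%s_%s" % (first, second)] = 1
    for first, second in zip(right, right[1:]):
        features["Rbigram=%s_%s" % (first, second)] = 1

    return features


sentence = "We introduce a novel method for learning to find documents on the web".split()
print(sorted(context_features(sentence, verb_index=1, head="method")))
# prints ['L1=we', 'R1=a', 'R2=novel', 'Rbigram=a_novel', 'head=method']
```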

14

Example

Input:

There are many investigations about wireless network communication, especially it is important to add Internet transfer calculation speeds.

Result

15

Experiment

Training corpus: CiteSeer (20,306 abstracts, 95,650 sentences)

790 verbal collocates are identified as tagged classes.

Test data: 600 randomly selected sentences that do not overlap with the training set.

16

The YouTube Video Recommendation System

James Davidson, Benjamin Liebald, Junning Liu, Taylor Van Vleet, Palash Nandy

Google Inc

ACM RecSys 2010

17

Personalized recommendations based on the user’s previous activity on the site

18

Goals

Help users find high-quality videos relevant to their interests.

Recommendations should be updated regularly and reflect a user’s recent activity on the site.

Maintain user privacy.

19

Challenges

Videos uploaded by users often have no metadata or very poor metadata (title, description).

Videos on YouTube are mostly short form (under 10 minutes in length).

Many of the interesting videos on YouTube have a short life cycle.

20

System Design

[Figure: user → seed videos → Top-N candidates → videos ranked using relevance and diversity]

21

Input Data (seed)

1. videos that were watched (potentially beyond a certain threshold)

2. videos that were explicitly favorited, “liked”, rated or added to playlists

22

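Related Videos (candidates)

Relatedness score: r(v_i, v_j) = c_ij / f(v_i, v_j)

c_ij: co-visitation count of videos v_i and v_j

c_i, c_j: total occurrence counts across all sessions for videos v_i and v_j

f(v_i, v_j): normalization by the global popularity of videos v_i and v_j

Threshold: overall view count

Top-N candidates of v_i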
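A rough sketch of how such a relatedness score could be computed from watch sessions is given below. It assumes that the numerator is the number of sessions in which both videos appear and that the normalization is the product of the two videos' occurrence counts (one simple way to account for global popularity). The session format, the function names, and the `min_score` cut-off are hypothetical.

```python
from collections import Counter
from itertools import combinations


def relatedness_scores(sessions):
    """Score every co-watched pair of videos.

    sessions: iterable of lists of video ids watched in one session.
    Returns a dict mapping (vi, vj) to the score, stored in both directions.
    """
    occurrences = Counter()   # how many sessions contain each video
    co_visits = Counter()     # how many sessions contain both videos of a pair
    for session in sessions:
        videos = sorted(set(session))
        occurrences.update(videos)
        co_visits.update(combinations(videos, 2))

    scores = {}
    for (vi, vj), count in co_visits.items():
        # Assumed normalization: product of the two occurrence counts.
        score = count / (occurrences[vi] * occurrences[vj])
        scores[(vi, vj)] = score
        scores[(vj, vi)] = score
    return scores


def top_n_related(scores, seed, n, min_score=0.0):
    """Return up to n videos most related to `seed`, above a minimum score."""
    related = [(vj, s) for (vi, vj), s in scores.items()
               if vi == seed and s >= min_score]
    related.sort(key=lambda item: (-item[1], item[0]))
    return [vj for vj, _ in related[:n]]
```

Dividing by popularity keeps very popular videos from dominating every related list simply because they co-occur with everything.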

23

Generating Recommendation Candidates

Candidate set:

S: seed set
R: related videos
n: distance from any video in the seed set
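One way to read these definitions is as a bounded expansion over the related-videos graph: start from the seed set, follow related-video links up to distance n, and drop the seeds themselves from the result. The sketch below implements that reading; the `related` mapping and the function name are hypothetical, and excluding seed videos from the final set is an assumption.

```python
def candidate_set(seeds, related, max_distance):
    """Collect videos reachable from the seed set within `max_distance` hops.

    seeds: iterable of seed video ids (S).
    related: dict mapping a video id to its list of related video ids (R).
    max_distance: the limit n on the distance from any seed video.
    """
    seeds = set(seeds)
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(max_distance):
        # Expand one hop, keeping only videos not reached before.
        frontier = {r for video in frontier for r in related.get(video, ())}
        frontier -= visited
        visited |= frontier
    # Seed videos are not recommended back to the user.
    return visited - seeds
```

A larger n broadens the candidate pool at the cost of including videos that are less directly tied to what the user watched.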

24

Ranking

- video quality (view count, commenting, sharing activity, …)
- user specificity (considers properties of the seed video)
- diversification (videos that are too similar to each other are removed)
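The three signals could be combined as sketched below: score each candidate from its quality and user-specificity values, then walk the ranked list and skip any video that is too similar to one already chosen. The weights, the dictionary fields, and the `similar` predicate are placeholders for illustration; the slides do not specify how the signals are combined.

```python
def rank_candidates(candidates, similar, w_quality=0.5, w_specificity=0.5, top_n=10):
    """Rank candidates by a weighted score, then diversify greedily.

    candidates: list of dicts with hypothetical "quality" and "specificity"
        values in [0, 1].
    similar: function (a, b) -> bool deciding whether two videos are too
        similar to be shown together (a placeholder similarity test).
    """
    def score(video):
        return w_quality * video["quality"] + w_specificity * video["specificity"]

    ordered = sorted(candidates, key=score, reverse=True)

    chosen = []
    for video in ordered:
        # Diversification: drop videos too similar to a higher-ranked one.
        if any(similar(video, kept) for kept in chosen):
            continue
        chosen.append(video)
        if len(chosen) == top_n:
            break
    return chosen
```

For example, `similar` might compare the uploading channel, so that at most one video per channel survives.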

25

Evaluation

26

Text Cohesion Visualizer

Chakarida Nukoolkit, Praewphan Chansripiboon, Pornchai Mongkolnam, Richard Watson Todd*

Computer Science Program, School of Information Technology
School of Liberal Arts*
King Mongkut’s University of Technology Thonburi
Bangkok, 10140 Thailand

IEEE ICCSE 2011

27

Goals

Design a prototype system to help analyze the lexical cohesion of essays.

Provide visualized output as writing feedback to users.

28

System Flowchart

Preprocessing (Stanford Part-of-Speech tagger) → Matching keywords → Creating bond table

29

Matching keywords

Count the number of matched words (links) between any two sentences.

Four types of matching:
1. repetition
2. complex repetition
3. paraphrase (synonyms, hypernyms)
4. pronoun

30

Creating bond table

The bond table indicates whether or not there is a bond between each pair of sentences.
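The link-counting and bond-table steps can be sketched as follows. This version counts only simple repetition (identical keywords shared by two sentences); complex repetition, paraphrase, and pronoun matching are not covered. The minimum number of links required for a bond (`min_links`) is an assumed parameter, as are the function names.

```python
def link_count(keywords_a, keywords_b):
    """Count keywords shared by two sentences (simple repetition only)."""
    return len(set(keywords_a) & set(keywords_b))


def bond_table(sentences, min_links=3):
    """Build a symmetric table of bonds between sentences.

    sentences: list of keyword lists, one per sentence, after preprocessing.
    min_links: hypothetical number of links required to form a bond.
    Returns table[i][j] == True when sentences i and j are bonded.
    """
    n = len(sentences)
    table = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bonded = link_count(sentences[i], sentences[j]) >= min_links
            table[i][j] = bonded
            table[j][i] = bonded
    return table
```

Sentences that form no bond with any other sentence are natural places to look for breaks in cohesion.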

32

six types

33

Conclusion

We proposed an application that detects cohesion errors in text in agreement with expert judgments. According to expert opinion, the system’s accuracy is at an acceptable level.

In future work, we first plan to improve the keyword-matching process, for more accurate results, by augmenting it with more specific linguistic rules.

34

See-To-Retrieve: Efficient Processing of Spatio-Visual Keyword Queries

Chao Zhang, Lidan Shou, Ke Chen, Gang Chen

College of Computer Science
Zhejiang University, China

ACM SIGIR 2012

35

Spatio-Visual Keyword

Example: a user searches for introductory information about a distant grand church within her eyesight.

36

Goals

visually conspicuous

semantically relevant

document space, physical space

WYRIWYS (What-You-Retrieve-Is-What-You-See)

37

Motivation

State-of-the-art spatial retrieval methods are mostly distance-based but overlook the visibility of objects.

Italian food

38

Visibility Analysis

System Flowchart

Ranking Mechanism

40

Experiment

Data set:
1. Street objects in Los Angeles (contains 131,461 MBRs)
2. Gowalla (consists of 28,867 Web documents)

41

柏安

亞婷家愷冠中 ???
