Faculty: Dr. Chengcui Zhang
Students: Wei-Bang Chen, Song Gao, Richa Tiwari


Post on 26-Dec-2015


Page 1:

Faculty: Dr. Chengcui Zhang
Students: Wei-Bang Chen, Song Gao, Richa Tiwari

Page 2:

Past projects

• Image Spam Clustering Project
  – Cluster image spam through common visual features present in image attachments
  – Reveal common origins of image spam

Page 3:

Examples

These two spam images exemplify illustrations with similar color composition but different layouts.

This example demonstrates illustrations in spam with similar layouts but different color composition.
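The two example pairs suggest that color composition and layout can be described by separate features. The following is a minimal sketch of that idea, not the features used in the project: it assumes an image is a list of rows of RGB tuples, and the function names, the 2-bits-per-channel quantization, and the 2x2 grid are all illustrative choices.

```python
from collections import Counter

WHITE = (255, 255, 255)

def color_histogram(img, bits=2):
    """Global color distribution; ignores where each color appears."""
    shift = 8 - bits
    hist = Counter((r >> shift, g >> shift, b >> shift)
                   for row in img for r, g, b in row)
    total = sum(hist.values())
    return {color: n / total for color, n in hist.items()}

def histogram_intersection(h1, h2):
    """1.0 means identical color composition, 0.0 means no shared colors."""
    return sum(min(h1.get(c, 0.0), h2.get(c, 0.0)) for c in set(h1) | set(h2))

def layout_signature(img, grid=2):
    """Fraction of non-background pixels per grid cell; ignores hue.

    Assumes the most frequent pixel value is the background and that the
    image is at least `grid` pixels wide and tall.
    """
    background = Counter(px for row in img for px in row).most_common(1)[0][0]
    h, w = len(img), len(img[0])
    sig = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [img[y][x] != background
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            sig.append(sum(cell) / len(cell))
    return sig

def block_image(color, top, left, size=4):
    """Toy image: a 2x2 colored block on a white background."""
    return [[color if top <= y < top + 2 and left <= x < left + 2 else WHITE
             for x in range(size)] for y in range(size)]

red_top_left = block_image((255, 0, 0), 0, 0)
red_bottom_right = block_image((255, 0, 0), 2, 2)   # same colors, new layout
blue_top_left = block_image((0, 0, 255), 0, 0)      # same layout, new colors
```

On these toy images, the two red images share a color histogram but differ in layout signature, while the red and blue top-left images share a layout signature but differ in color histogram.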

Page 4:

• Ongoing projects:
  – Phishing website clustering by text and visual similarity

Page 5:

Nat West Helpful BonkingAccessibility I HelpGot a question? We can help…

Nat West Helpful Bonking Help 24x7can’t I log in?Accessibility I Help…

RBSThQ Roy& Bank cq3codandMake it happen…

Text Recognized by OCR

Page 6:

A Sample Cluster for PayPal

Page 7:

4 Clusters Related to PayPal

• Cluster ID: 15 (76 images)
• Cluster ID: 28 (20 images)
• Cluster ID: 49 (13 images)
• Cluster ID: 57 (22 images)

Page 8:

Dataset Statistics

• 8 days (7-10, 17-19 and 22 Feb. 2011)
• Total number of phishing website screenshot images: 1461
• Total number of produced clusters (cutoff similarity value = 60%): 156 + 1 (ungrouped)

[Figure: histogram of cluster sizes. X-axis: Cluster Size (Number of Images), with ticks 2 3 4 5 6 7 8 9 10 11 13 15 17 18 20 21 22 28 29 32 34 38 42 76 116. Y-axis: Count (Number of Clusters), from 0 to 50 in steps of 5.]
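To illustrate how a cutoff similarity value such as 60% can turn pairwise text similarity into clusters plus an ungrouped remainder, here is a minimal sketch. The slides do not say which similarity measure or linkage rule was used, so the Jaccard similarity over word tokens, the single-link grouping, and all function names below are assumptions.

```python
import re

def tokens(text):
    """Lowercased word tokens of the OCR text of one screenshot."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Share of tokens two pages have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_by_cutoff(texts, cutoff=0.6):
    """Single-link grouping: pages i and j are linked if similarity >= cutoff.

    Returns (clusters, ungrouped): clusters are lists of indices with at
    least two members; ungrouped collects the indices that matched nothing.
    """
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(t) for t in texts]
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(toks[i], toks[j]) >= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    ungrouped = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, ungrouped

# Made-up page texts, for illustration only.
pages = [
    "PayPal log in to your account",
    "PayPal log in to your account now",
    "NatWest online banking help",
    "Make it happen",
]
clusters, ungrouped = cluster_by_cutoff(pages, cutoff=0.6)
```

The all-pairs comparison is quadratic in the number of screenshots, which is manageable at the scale of about 1,500 images reported here but would need indexing or blocking for much larger collections.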

Page 9:

• Observations: high cluster purity
• Hard to measure completeness
• Next step:
  – Incorporate visual features such as visual layout
  – Brand
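Cluster purity can be computed once each image has a ground-truth label. The sketch below uses the standard definition (the fraction of images that carry the majority label of their cluster); treating the targeted brand as the label is an assumption, and the sample labels are made up.

```python
from collections import Counter

def cluster_purity(clusters):
    """clusters: one list of ground-truth labels per non-empty cluster."""
    total = sum(len(labels) for labels in clusters)
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in clusters)
    return majority / total

mixed = [["paypal", "paypal", "paypal"], ["natwest", "natwest", "rbs"]]
split = [["paypal", "paypal"], ["paypal"]]   # one brand spread over two clusters
```

`cluster_purity(split)` is 1.0 even though a single brand is divided across two clusters, which is one reason purity alone says nothing about completeness.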

Page 10:

• Ongoing projects:
  – Uncovering auction fraud from eBay transaction graph (initial study)

Page 11:

• Data set: eBay transaction feedback
  – A total of 220,000 users were crawled.
• Idea of belief propagation:
  – Fraudsters create two types of identities: fraud and accomplice. Fraud identities are the ones eventually used to carry out the actual fraud; accomplice identities are used to help build the reputation of the fraud identities. This pattern forms a near-bipartite core in the transaction graph.

Page 12:

• Algorithm:
  – Each vertex in the transaction graph is labeled as one of {fraud, accomplice, honest} based on its pattern of interaction with other vertices.
  – Belief propagation (BP) is used to optimize the labeling across the entire graph by maximizing the joint probability of all the vertices.
  – Honest user model: Barabási-Albert model
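The labeling step can be sketched as loopy belief propagation over three states. This is a minimal sketch, not the implementation behind these slides. It uses sum-product message passing, which gives per-vertex marginal beliefs rather than a single jointly optimal labeling. The compatibility matrix `PSI` and the value of `EPS` are illustrative assumptions, chosen so that fraud vertices mostly link to accomplices and accomplices link to both fraud and honest vertices.

```python
STATES = ("fraud", "accomplice", "honest")
EPS = 0.05  # assumed small probability for "unexpected" edges

# PSI[si][sj]: assumed compatibility of a vertex in state si with a
# neighbor in state sj. Values are illustrative, not taken from the slides.
PSI = [
    [EPS, 1 - 2 * EPS, EPS],               # fraud trades mostly with accomplices
    [0.5, 2 * EPS, 0.5 - 2 * EPS],         # accomplice trades with fraud and honest
    [EPS, (1 - EPS) / 2, (1 - EPS) / 2],   # honest trades with honest and accomplices
]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def belief_propagation(edges, priors, iterations=20):
    """edges: undirected (u, v) pairs; priors: {vertex: [p_fraud, p_acc, p_honest]}."""
    neighbors = {v: set() for v in priors}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # msg[(i, j)][s]: what vertex i believes about vertex j being in state s.
    msg = {(i, j): [1 / 3] * 3 for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        new_msg = {}
        for (i, j) in msg:
            out = []
            for sj in range(3):
                total = 0.0
                for si in range(3):
                    prod = priors[i][si] * PSI[si][sj]
                    for k in neighbors[i]:
                        if k != j:  # exclude the recipient's own message
                            prod *= msg[(k, i)][si]
                    total += prod
                out.append(total)
            new_msg[(i, j)] = normalize(out)
        msg = new_msg

    beliefs = {}
    for i in neighbors:
        b = list(priors[i])
        for k in neighbors[i]:
            b = [b[s] * msg[(k, i)][s] for s in range(3)]
        beliefs[i] = normalize(b)
    return beliefs

# Two-vertex example: "f" is a suspected fraudster, "a" has a uniform prior.
beliefs = belief_propagation(
    [("f", "a")],
    {"f": [0.9, 0.05, 0.05], "a": [1 / 3, 1 / 3, 1 / 3]},
)
```

In the two-vertex example the belief for "a" shifts toward accomplice purely because of its edge to a likely fraud vertex. Each iteration touches every directed edge and every neighbor of its source vertex, so the cost grows with both edge count and vertex degree.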

Page 13:

Page 14:

• Evaluation results on the sparse eBay transaction dataset
  – 20% accomplice
  – 50% fraud???
• What can be improved:
  – Network too sparse (average degree is ~5, ideally >= 10)
  – Initial probabilities (1/3, 1/3, 1/3) may not make sense.
  – BP does not seem to scale well to large graphs.

Page 15:

• Projects under plan:
  – Modeling online user navigation patterns and detecting anomalies using click stream data

Page 16:

• Idea #1: Each user session is represented by an n-dimensional feature vector, where n is the number of Web pages in the session.
  – The value of each feature is a weight, indicating the degree of interest of the user in the particular Web page.
  – Based on these vectors, clusters of similar sessions are produced and characterized by the Web pages with the highest associated weights.
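Idea #1 can be sketched as follows. The slides do not define the weight, so this sketch assumes it is the share of session time spent on each page. It also assumes all sessions are indexed over one shared page list so that their vectors are comparable. The function names and sample data are hypothetical.

```python
from math import sqrt

def session_vector(visits, pages):
    """visits: list of (page, seconds); every page must appear in `pages`.

    Returns one weight per entry of `pages`: the share of session time
    spent on that page (assumed proxy for degree of interest).
    """
    time_on = {p: 0.0 for p in pages}
    for page, seconds in visits:
        time_on[page] += seconds
    total = sum(time_on.values())
    return [time_on[p] / total if total else 0.0 for p in pages]

def cosine(u, v):
    """Similarity between two session vectors (1.0 = same interest profile)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_pages(cluster_vectors, pages, k=2):
    """Characterize a cluster by the k pages with the highest mean weight."""
    n = len(cluster_vectors)
    mean = [sum(v[i] for v in cluster_vectors) / n for i in range(len(pages))]
    ranked = sorted(range(len(pages)), key=lambda i: mean[i], reverse=True)
    return [pages[i] for i in ranked[:k]]

pages = ["home", "search", "item", "checkout"]
s1 = session_vector([("home", 10), ("item", 30)], pages)
s2 = session_vector([("home", 5), ("item", 15)], pages)
s3 = session_vector([("search", 20)], pages)
```

Here `s1` and `s2` have the same interest profile despite different session lengths, while `s3` shares no pages with either, so any similarity-based clustering would group the first two together.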

Page 17:

• Idea #2: Markov model
  – Pages (or page categories) as states
    • Or page+parameters as nodes
  – Transition probabilities between nodes
• Idea #3: Graph partitioning
  – Pages as nodes
  – Edges as connectivity/weight between a pair of pages
    • Co-occurrence, time difference, etc.
  – Graph partitioning to find groups of strongly correlated pages
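For Idea #2, the transition probabilities of a first-order Markov model can be estimated by counting consecutive page pairs in observed sessions. The sketch below also scores a session by its log-likelihood under the model as one possible anomaly signal. That scoring rule, the `floor` probability for unseen transitions, and the sample sessions are assumptions, not part of the plan described above.

```python
from collections import defaultdict
from math import log

def transition_probabilities(sessions):
    """Estimate P(next page | current page) from lists of visited pages."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(following.values())
                  for nxt, c in following.items()}
        for current, following in counts.items()
    }

def session_log_likelihood(session, probs, floor=1e-6):
    """Sum of log transition probabilities; lower means more unusual.

    Transitions never seen in training get the assumed `floor` probability.
    """
    return sum(log(probs.get(current, {}).get(nxt, floor))
               for current, nxt in zip(session, session[1:]))

training = [
    ["home", "search", "item"],
    ["home", "search", "checkout"],
    ["home", "item"],
]
probs = transition_probabilities(training)
typical = session_log_likelihood(["home", "search", "item"], probs)
unusual = session_log_likelihood(["checkout", "home"], probs)
```

Because longer sessions accumulate more negative log terms, comparing sessions of different lengths would likely need a per-transition average rather than the raw sum.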

Page 18:

• Projects under plan:
  – Novel biometrics

Page 19:

• Palm print photo

Page 20:

• Touch panel: hand drawing