Conversation Level Constraints on Pedophile Detection in Chat Rooms
PAN 2012: Sexual Predator Identification
Claudia Peersman, Frederik Vaassen, Vincent Van Asch and Walter Daelemans




Overview
• Task 1: Sexual Predator Identification
  • Preprocessing
  • Experimental Setup and Results
  • Test Run Results
• Task 2: Identifying Grooming Posts
  • Grooming Dictionary
  • Test Run Results
• Discussion


Task 1: Preprocessing of the Data
• Data: PAN 2012 competition training set
  • predator vs. non-predator
  • information at the conversation, user and post level
• Two splits: training and validation set
• No user was present in more than one split, to prevent overfitting on user-specific features
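A minimal sketch of a user-disjoint split, assuming each post is a simple record carrying a user ID. The `Post` tuple and `user_disjoint_split` are hypothetical names, not the authors' code. Users rather than posts are shuffled, so all posts by one user land in the same split; note that the two participants of one conversation can still end up in different splits.

```python
import random
from collections import namedtuple

# Hypothetical post record; the PAN 2012 data holds more fields per post.
Post = namedtuple("Post", ["conversation_id", "user_id", "text", "label"])


def user_disjoint_split(posts, val_fraction=0.2, seed=0):
    """Split posts so that every user ends up in exactly one split."""
    # Shuffle users, not posts, so user-specific features cannot leak
    # from the training split into the validation split.
    users = sorted({p.user_id for p in posts})
    rng = random.Random(seed)
    rng.shuffle(users)
    val_users = set(users[:round(len(users) * val_fraction)])
    train = [p for p in posts if p.user_id not in val_users]
    val = [p for p in posts if p.user_id in val_users]
    return train, val
```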


Experimental Setup

• Features: token unigrams

• LIBSVM

• Probability output

• Parameter optimization

• Experiments on 3 levels

• data resampling
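The feature setup can be illustrated by turning posts into token-unigram vectors in LIBSVM's sparse input format (`<label> <index>:<value> ...`, with indices ascending from 1). The tokenizer, the raw-count feature values and the 1/0 label encoding are assumptions for illustration; the slides do not say how tokens were weighted.

```python
import re
from collections import Counter

# Assumption: a simple lowercase word/punctuation tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(texts):
    """Map each token unigram seen in training to a 1-based feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1  # LIBSVM feature indices start at 1
    return vocab


def to_libsvm_line(label, text, vocab):
    """Encode one post as '<label> <index>:<count> ...' with ascending indices.

    Tokens that are not in the training vocabulary are dropped.
    """
    counts = Counter(vocab[t] for t in tokenize(text) if t in vocab)
    feats = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{label} {feats}".rstrip()
```

Building the vocabulary on the training split only keeps validation tokens from introducing unseen feature indices.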


Level 1: the Post Classifier

• Resample the number of posts: equal distribution of posts per class

• About 40,000 posts per class in training

• No resampling in the validation sets
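A sketch of class balancing on the training set only. It assumes the resampling undersamples a larger class and oversamples a smaller one with replacement; the slides only give the target of roughly 40,000 posts per class. `balance_classes` is a hypothetical helper.

```python
import random
from collections import defaultdict


def balance_classes(examples, per_class, seed=0):
    """Resample (label, item) pairs so that every class has `per_class` items.

    Larger classes are undersampled without replacement; smaller classes keep
    all their items and are topped up with random duplicates.
    Apply to the training split only, never to validation data.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, item in examples:
        by_label[label].append(item)
    balanced = []
    for label, items in sorted(by_label.items()):
        if len(items) >= per_class:
            chosen = rng.sample(items, per_class)
        else:
            extra = [rng.choice(items) for _ in range(per_class - len(items))]
            chosen = items + extra
        balanced.extend((label, item) for item in chosen)
    rng.shuffle(balanced)
    return balanced
```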


Level 1: the Post Classifier (2)
• Only produces output at the post level
• Aggregate the post-level predictions to the user level:
  • using LIBSVM's probability outputs
  • predator = average of the user's 10 highest predator-class probabilities ≥ 0.85
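The aggregation rule above, written out as a sketch. It assumes the post-level predator probabilities for each user are already available from the classifier. Users with fewer than 10 posts are averaged over all their posts, which is an assumption since the slides do not cover that case.

```python
def aggregate_user(post_probs, top_k=10, threshold=0.85):
    """Turn one user's post-level predator probabilities into a user decision.

    Returns (score, is_predator): score is the mean of the top_k highest
    probabilities (or of all of them if the user has fewer posts).
    """
    if not post_probs:
        return 0.0, False
    top = sorted(post_probs, reverse=True)[:top_k]
    score = sum(top) / len(top)
    return score, score >= threshold


def predict_users(probs_by_user, top_k=10, threshold=0.85):
    """Map user_id -> list of post probabilities to user_id -> predator flag."""
    return {user: aggregate_user(probs, top_k, threshold)[1]
            for user, probs in probs_by_user.items()}
```

Averaging only the highest-scoring posts lets a handful of strongly suspicious posts decide the label, even when most of a user's posts look harmless.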


Results for the Predator Class

Score       Post Classifier
Recall      0.93
Precision   0.36
F-score     0.52


Level 2: the User Classifier
• Resampling on the user level: exclude users with no suspicious posts
• Filter: dictionary of grooming vocabulary (see Task 2)
• Why?
  • reduce the amount of data
  • “hard” classification: higher precision?
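A sketch of the user-level filter with a placeholder vocabulary; the real grooming dictionary is described under Task 2 and is not reproduced here. Matching on single lowercase word tokens is an assumption, and the returned reduction (share of posts dropped) is only one possible way to measure data reduction.

```python
import re

WORD_RE = re.compile(r"\w+")

# Placeholder vocabulary, not the authors' dictionary.
VOCABULARY = {"phone", "address"}


def has_suspicious_post(posts, vocabulary):
    """True if at least one post contains a dictionary word."""
    return any(vocabulary & set(WORD_RE.findall(p.lower())) for p in posts)


def filter_users(posts_by_user, vocabulary=VOCABULARY):
    """Keep users with at least one dictionary match; also report the share of posts dropped."""
    kept = {u: ps for u, ps in posts_by_user.items() if has_suspicious_post(ps, vocabulary)}
    total = sum(len(ps) for ps in posts_by_user.values())
    remaining = sum(len(ps) for ps in kept.values())
    reduction = 1 - remaining / total if total else 0.0
    return kept, reduction
```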


Update Results (1)

Score       Post Classifier   User Classifier
Recall      0.93              0.82
Precision   0.36              0.88
F-score     0.52              0.84

Combine systems?

Data reduction by the dictionary filter: up to 48.4%


Combining the systems
• Weighted voting using LIBSVM's probability outputs
• 70% of the weight on the high-precision User Classifier
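A sketch of the weighted vote, assuming each classifier yields one predator probability per user (for the Post Classifier, the aggregated Level 1 score). The 0.5 decision threshold, and treating users removed by the User Classifier's filter as probability 0.0, are assumptions rather than details from the slides.

```python
def combine(post_probs, user_probs, user_weight=0.7, threshold=0.5):
    """Weighted vote over two classifiers' predator probabilities.

    post_probs / user_probs: dicts user_id -> predator probability.
    A user missing from one dict gets probability 0.0 for that classifier
    (assumption: e.g. users removed by the dictionary filter).
    """
    combined = {}
    for user in post_probs.keys() | user_probs.keys():
        combined[user] = (user_weight * user_probs.get(user, 0.0)
                          + (1 - user_weight) * post_probs.get(user, 0.0))
    predators = {u for u, p in combined.items() if p >= threshold}
    return combined, predators
```

Putting most of the weight on the high-precision classifier means a high Post Classifier score alone (at most 0.3 after weighting) cannot push a user over a 0.5 threshold.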


Update Results (2)

Score       Post Classifier   User Classifier   Combined Results
Recall      0.93              0.82              0.85
Precision   0.36              0.88              0.84
F-score     0.52              0.84              0.84


Level 3: Conversation Level Constraints

• Problem: both users in a conversation labeled as predators
• Our approach:
  • go back to the predator probability output
  • use the high-precision User Classifier
  • predator probability ≥ 0.75
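One reading of the constraint, sketched under assumptions: whenever both participants of a conversation are predicted predators, each keeps the label only if the User Classifier's predator probability is at least 0.75. Here a user dropped in one conflicting conversation is removed from the final list globally; the slides do not say how such cases were handled, and `apply_conversation_constraints` is a hypothetical name.

```python
def apply_conversation_constraints(conversations, predicted, user_probs, min_prob=0.75):
    """Resolve conversations in which both participants are labeled predators.

    conversations: dict conversation_id -> iterable of user_ids
    predicted:     set of user_ids labeled predator by the combined system
    user_probs:    user_id -> predator probability from the User Classifier
    Predictions outside such conflicts are kept unchanged.
    """
    final = set(predicted)
    for users in conversations.values():
        flagged = [u for u in users if u in predicted]
        if len(flagged) >= 2:
            for u in flagged:
                if user_probs.get(u, 0.0) < min_prob:
                    final.discard(u)
    return final
```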


System Overview

[Diagram: Post Classifier + User Classifier → Combined Prediction → Apply Conversation Level Constraints → Final Predator ID List]


Update Results (3)

Score       Post Classifier   User Classifier   Combined Results   Combined + Constraints
Recall      0.93              0.82              0.85               0.85
Precision   0.36              0.88              0.84               0.94
F-score     0.52              0.84              0.84               0.89


Results on the PAN 2012 Test Set

Score             Combined + Constraints (validation)   PAN Test Set
Recall            0.85                                  0.60
Precision         0.94                                  0.89
F-score (β = 1)   0.89                                  0.72

• Future research:
  • more splits
  • investigate ensembles


Task 2: Identifying Grooming Posts
• From the final predator ID list, detect posts expressing typical grooming behavior
• No gold-standard labels: what is grooming?

• Predator conversations have predictable stages (e.g. Lanning, 2010; McGhee et al., 2011)


Task 2: Identifying Grooming Posts (2)

• Dictionary containing references to 6 stages:
  • sexual topic
  • reframing
  • approach
  • data requests
  • isolation from adult supervision
  • age (difference)
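A sketch of dictionary-based post tagging for Task 2, restricted to authors on the final predator ID list. Only two of the six stages appear, with neutral placeholder terms; the actual dictionary entries are not reproduced, and `tag_grooming_posts` is a hypothetical name. Single-word matching is an assumption; a real dictionary may also need multi-word phrases.

```python
import re

WORD_RE = re.compile(r"\w+")

# Placeholder entries only, not the authors' dictionary.
STAGE_DICTIONARY = {
    "data requests": {"phone", "address"},
    "age (difference)": {"age", "old"},
}


def tag_grooming_posts(posts, predator_ids, dictionary=STAGE_DICTIONARY):
    """Return (post_index, sorted stage names) for predator posts matching any stage.

    posts: list of (user_id, text) pairs in conversation order.
    """
    hits = []
    for i, (user_id, text) in enumerate(posts):
        if user_id not in predator_ids:
            continue  # only posts by users on the final predator ID list
        tokens = set(WORD_RE.findall(text.lower()))
        stages = sorted(s for s, terms in dictionary.items() if tokens & terms)
        if stages:
            hits.append((i, stages))
    return hits
```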


Task 2: Identifying Grooming Posts (3)

• Resources:
  • McGhee et al. (2011)
  • English Urban Dictionary website: http://www.urbandictionary.com/
  • English Synonyms: http://www.synonym.net/
• cf. user classifier filter


Task 2: Results on the PAN 2012 Test Set

• Precision = 0.36

• Recall = 0.26

• F-score (β = 1) = 0.30


Discussion

• Use of β-factors to calculate the F-score:
  • Task 1: focus on precision (β = 0.5)
  • Task 2: focus on recall (β = 3.0)
• However, in practice the opposite emphasis is needed:
  • find all predators (recall in Task 1)
  • find the most striking posts (precision in Task 2)
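The β-weighted F-score behind this discussion, as a short sketch. With β = 1 and the rounded values on the slides it matches, for example, the Post Classifier's 0.52 and the two test-set F-scores (0.72 for Task 1, 0.30 for Task 2); β < 1 favours precision and β > 1 favours recall.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); 0.0 if both are 0.

    beta < 1 weights precision more heavily, beta > 1 weights recall.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom
```

Since Task 1's test-set precision (0.89) is well above its recall (0.60), a β of 0.5 yields a higher score for it than β = 1, while Task 2 with β = 3.0 is scored mostly on its recall (0.26).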


Questions?
