Using opinion mining techniques for early crisis detection

Adrian Iftene, Alexandru Lucian Gînscă. ICCCC 2012, 8-12 May, Băile Felix, Oradea, Romania. “Al. I. Cuza” University of Iasi, Faculty of Computer Science, Romania.

Post on 04-Jul-2015

TRANSCRIPT

Page 1: Using opinion mining techniques for early crisis detection

Adrian Iftene, Alexandru Lucian Gînscă

ICCCC 2012, 8-12 May, Băile Felix, Oradea, Romania

“Al. I. Cuza” University of Iasi, Romania

Faculty of Computer Science

Page 2: Using opinion mining techniques for early crisis detection

System overview

Data acquisition

Topic detection

Data processing

Identification of opinions

Results

Visualization

Conclusions

Page 3: Using opinion mining techniques for early crisis detection

Page 4: Using opinion mining techniques for early crisis detection

Scenario: Street protests in Romania (between 13 and 26 January, 2012)

Crawler component, RSS feeds

Scraping: removed links, photos, menus, special characters

Data locally stored
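The acquisition step can be sketched roughly as below. This is only an illustration, not the authors' crawler: the sample feed, the names `parse_rss_items` and `clean_html`, and the exact cleaning rules are assumptions, and a real crawler would download the feeds and write the results to local storage.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample feed; a real crawler would fetch this from a news site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example news</title>
  <item>
    <title>Protest in Bucuresti</title>
    <link>http://example.org/a1</link>
    <description><![CDATA[<p>People gathered <a href="http://example.org/x">downtown</a>.</p><img src="p.jpg"/>]]></description>
  </item>
</channel></rss>"""


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips text inside menu-like or script tags."""

    SKIPPED = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(fragment):
    """Drop tags (links, photos, menus) and normalize whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"https?://\S+", " ", text)  # bare URLs left in the text
    return re.sub(r"\s+", " ", text).strip()


def parse_rss_items(rss_text):
    """Return a list of {title, link, text} dicts from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": clean_html(item.findtext("description") or ""),
        })
    return items
```

Calling `parse_rss_items(SAMPLE_RSS)` yields one cleaned record per feed item, ready to be saved locally for the later processing steps.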

Page 5: Using opinion mining techniques for early crisis detection

The topic is a key signal for detecting articles that refer to a crisis situation

Latent Dirichlet Allocation (LDA): a state-of-the-art topic model

Problems:

• The number of topics must be specified in advance

• The results are lists of representative words for each topic, so human intervention is needed to interpret them

Solution: WordNet-based similarity measures

• Wu and Palmer

• Lin

• Resnik (best results)

Page 6: Using opinion mining techniques for early crisis detection

Computing the similarity between 2 sets of words

T1, T2 = two sets of words.

sim(t1, t2) = one of the Wu and Palmer, Resnik or Lin similarity measures.
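The slide's own formula did not survive in this transcript, so the sketch below shows only one plausible way to lift a word-level measure to two sets of words: for each word take its best match in the other set, average those scores, and symmetrize. The function name `set_similarity`, the averaging scheme and the toy `toy_sim` scores are all assumptions. In practice `sim` could wrap a WordNet measure; NLTK, for example, provides Wu-Palmer, Resnik and Lin similarity on synsets.

```python
def set_similarity(t1_words, t2_words, sim):
    """Symmetric average of best-match scores between two word sets.

    `sim(a, b)` is any word-level similarity (assumed to return a float).
    """
    t1, t2 = set(t1_words), set(t2_words)
    if not t1 or not t2:
        return 0.0

    def directed(src, dst):
        # For every source word, keep its best score against the other set.
        return sum(max(sim(a, b) for b in dst) for a in src) / len(src)

    return 0.5 * (directed(t1, t2) + directed(t2, t1))


# Toy stand-in for a WordNet-based measure; the scores are invented.
_RELATED = {frozenset(("protest", "demonstration")): 0.8}


def toy_sim(a, b):
    if a == b:
        return 1.0
    return _RELATED.get(frozenset((a, b)), 0.0)


score = set_similarity({"protest", "street"}, {"demonstration"}, toy_sim)
```

With such a score, an LDA topic (a list of representative words) could be compared against a set of crisis-related words without manual interpretation.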

Page 7: Using opinion mining techniques for early crisis detection

LDA results for our street protests corpus when tracking 3 topics

Page 8: Using opinion mining techniques for early crisis detection

Language-specific resources that contain cities (Iasi, Bucuresti, Ploiesti, etc.) and regions (Bucovina, Moldova, Transilvania, etc.) (Iftene et al., 2011)

Introducing a more localized approach: new resources and rules for identifying streets (Bulevardul Independentei in Iasi, Calea Victoriei in Bucuresti, etc.) and smaller inner-city regions (Pacurari district, center of Iasi, Arch of Triumph Square)

Examples of rules: to identify streets (Street + entity, Boulevard + entity, etc.); to identify small regions ("the area between street A and street B" or "the area of building A")
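Rules of the "Street + entity" kind can be approximated with regular expressions, as in the sketch below. The trigger list, the capitalized-word definition of an entity and the function names are assumptions made for illustration; the actual rule set is not given on the slide.

```python
import re

# Hypothetical trigger words (Romanian and English); not the authors' list.
STREET_TRIGGERS = ["Strada", "Str.", "Bulevardul", "Bd.", "Calea",
                   "Street", "Boulevard"]

_TRIGGER = "|".join(re.escape(t) for t in STREET_TRIGGERS)
# An "entity" is assumed to be one or more capitalized words.
_ENTITY = r"[A-ZĂÂÎȘȚ][\w-]*(?:\s+[A-ZĂÂÎȘȚ][\w-]*)*"
_STREET = rf"(?:{_TRIGGER})\s+{_ENTITY}"

STREET_RE = re.compile(rf"(?<!\w){_STREET}")
AREA_RE = re.compile(rf"area between\s+({_STREET})\s+and\s+({_STREET})")


def find_streets(text):
    """Rule: trigger word followed by a capitalized entity."""
    return STREET_RE.findall(text)


def find_areas(text):
    """Rule: 'the area between <street A> and <street B>'."""
    return AREA_RE.findall(text)


text = ("Protesters marched on Calea Victoriei and gathered in the area "
        "between Strada Pacurari and Strada Lapusneanu.")
streets = find_streets(text)
areas = find_areas(text)
```

A pattern like this cannot tell "Strada Mihai Eminescu" from the person it is named after, which is the kind of ambiguity the rules have to resolve from context.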

Page 9: Using opinion mining techniques for early crisis detection

538 files with 2,806 entities of the “street” and “area” types

The overall quality of the named entity (NE) identification component is around 92%, and that of the NE classification component is around 67%

Problems:

◦ incorrect spelling

◦ anaphora resolution

◦ ambiguous cases where the context does not show whether the NE is a person name or a street name

Page 10: Using opinion mining techniques for early crisis detection

Rule based opinion mining system (Gînscă et al., 2011)

Easily adaptable from one crisis scenario to another, in contrast to a statistical approach

Use of manually built resources to identify opinion keywords (good, bad, etc.), amplifiers (most, more, etc.), diminishers (less, etc.) and negations (not, never, etc.)

Calculate the valences for groups of feelings and pair named entities with scores based on distance, punctuation and context

Use a dedicated vocabulary for the specific crisis situation: 21 initial words (protest, conflict, fight, etc.) plus similar words from WordNet (synonyms, hypernyms, etc.)
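A rule-based scorer along these lines might look like the sketch below. The word lists reuse the examples from the slide, but every numeric weight, the inverse-distance pairing and the function names are assumptions. Punctuation and wider context, which the described system also uses, are left out.

```python
# Hypothetical resources; all weights are illustrative assumptions.
OPINION = {"good": 1.0, "bad": -1.0,
           "protest": -1.0, "conflict": -1.0, "fight": -1.0}
AMPLIFIERS = {"most": 2.0, "more": 1.5}
DIMINISHERS = {"less": 0.5}
NEGATIONS = {"not", "never"}


def opinion_hits(tokens):
    """Return (position, valence) for each opinion keyword.

    Amplifiers, diminishers and negations modify the next keyword.
    """
    hits = []
    factor, negate = 1.0, False
    for i, tok in enumerate(tokens):
        w = tok.lower()
        if w in NEGATIONS:
            negate = not negate
        elif w in AMPLIFIERS:
            factor *= AMPLIFIERS[w]
        elif w in DIMINISHERS:
            factor *= DIMINISHERS[w]
        elif w in OPINION:
            v = OPINION[w] * factor
            hits.append((i, -v if negate else v))
            factor, negate = 1.0, False  # modifiers are consumed
    return hits


def sentence_valence(tokens):
    """Sum of all keyword valences in the sentence."""
    return sum(v for _, v in opinion_hits(tokens))


def entity_score(tokens, entity_pos):
    """Pair an entity with nearby opinions, weighted by 1 / token distance."""
    return sum(v / abs(i - entity_pos)
               for i, v in opinion_hits(tokens) if i != entity_pos)
```

Adapting such a scorer to a new crisis scenario only requires swapping the dictionaries, which is the adaptability argument made above.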

Page 11: Using opinion mining techniques for early crisis detection

Greedy approach: iteratively add intermediate green points to the current path until the solution cannot be improved

Advantages: the search space for optimal routes is reduced and the greedy solution is obtained very fast

Disadvantages: the greedy solution is only close to the optimal solution, not guaranteed to be optimal
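The greedy loop can be sketched as follows. This is a simplified illustration under stated assumptions: points are plain 2D coordinates rather than map locations, and the demo cost (path length minus a fixed bonus per visited green point) is invented. The names `greedy_route` and `path_length` are hypothetical.

```python
import math


def path_length(path):
    """Total Euclidean length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def greedy_route(start, end, green_points, cost):
    """Insert green points one at a time while the path cost keeps dropping."""
    path = [start, end]
    remaining = list(green_points)
    best_cost = cost(path)
    while remaining:
        best = None  # (cost, point, insert position) of the best improvement
        for p in remaining:
            for pos in range(1, len(path)):
                c = cost(path[:pos] + [p] + path[pos:])
                if c < best_cost and (best is None or c < best[0]):
                    best = (c, p, pos)
        if best is None:
            break  # no insertion improves the current solution
        best_cost, p, pos = best
        path.insert(pos, p)
        remaining.remove(p)
    return path, best_cost


# Demo with an assumed cost: length minus a bonus of 2.0 per green point used.
greens = [(5.0, 1.0), (5.0, 20.0)]


def demo_cost(path):
    return path_length(path) - 2.0 * sum(p in greens for p in path)


route, route_cost = greedy_route((0.0, 0.0), (10.0, 0.0), greens, demo_cost)
```

Here the nearby green point is worth the small detour and gets inserted, while the distant one is rejected. Because each step keeps only the best single insertion, the result can miss a better combination of points.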

Page 12: Using opinion mining techniques for early crisis detection

Figure: cumulated sentiment values by day (x-axis: day of the month, ticks 13 to 25; y-axis: cumulated sentiment value, ticks -40 to 30)

Page 13: Using opinion mining techniques for early crisis detection

Figure: location-type entity mentions by day (x-axis: day of the month, ticks 13 to 25; y-axis: number of mentions, ticks 0 to 250)

Page 14: Using opinion mining techniques for early crisis detection

Google Maps API

Our algorithm is able to find another (longer) path that goes around the red islands and prefers routes near the green islands

Thus, at every step it is possible to apply penalties when the partial solution crosses red islands (potential risk) and to add bonuses when it crosses green islands (no potential risk)
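One way to express the penalty and bonus idea is a cost function over a partial path, sketched below. Modeling islands as circles given by (center, radius), the default penalty and bonus values, and the names `route_cost` and `dist_point_segment` are all assumptions; the slides do not say how islands are represented or weighted.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def route_cost(path, red_islands, green_islands, penalty=50.0, bonus=2.0):
    """Path length, plus a penalty for each segment that crosses a red
    island and minus a bonus for each segment that crosses a green one."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        total += math.dist(a, b)
        for center, radius in red_islands:
            if dist_point_segment(center, a, b) <= radius:
                total += penalty
        for center, radius in green_islands:
            if dist_point_segment(center, a, b) <= radius:
                total -= bonus
    return total


red = [((5.0, 0.0), 1.0)]    # risky area on the direct route
green = [((5.0, 3.0), 0.5)]  # safe area slightly off the direct route

direct = route_cost([(0.0, 0.0), (10.0, 0.0)], red, green)
detour = route_cost([(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)], red, green)
```

In this toy setup the detour is longer in distance but cheaper in cost, so a search guided by `route_cost` would prefer it over the direct route through the red island.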

Page 15: Using opinion mining techniques for early crisis detection

Page 16: Using opinion mining techniques for early crisis detection

Page 17: Using opinion mining techniques for early crisis detection

When there are no green islands, another method is needed to select intermediate points in order to improve the quality of the current solution

The Google Maps API can place streets and boulevards on the map, but not specific squares and areas. For those cases we built an additional resource that specifies their GIS coordinates

Page 18: Using opinion mining techniques for early crisis detection

We present a system that can be easily adapted from one crisis situation to another (by changing the dictionaries and the topics of interest)

Efficient topic identification using LDA

Suggestive visualization using the Google Maps API

Page 19: Using opinion mining techniques for early crisis detection
