1 M. Roche – Keynote Speaker – MISC’2016, Constantine, Algeria
Mathieu Roche
Cirad – TETIS and LIRMM – Montpellier, France
Web: http://www.textmining.biz Email: [email protected]
Knowledge Discovery from Texts
on Agriculture Domain
Outline
Part 1 Data Science and Big Data
Part 2 Heterogeneity and textual data
Part 3 Applications in agriculture domain
Part 4 Conclusions and future work
Part 1 Data Science and Big Data
Big Data
3V of Big Data: Volume, Velocity, Variety
Other Vs: Variability, Veracity, Value, Visualisation, Valorization
Part 2 Heterogeneity and textual data
Introduction
Logs
0/1 x B_1404 [WARNING]: "Asynchronous reset/set/load <%item> exists in module/unit"
0/1 x B_1405 [WARNING]: "<%value> asynchronous resets in this unit detected"
0/1 x B_1406 [WARNING]: "<%value> synchronous resets in this unit detected"
0/1 x B_1407 [ERROR]: "Do not use active high asynchronous reset/set/load"
// Total Module Instance Coverage Summary
TOTAL COVERED PERCENT
lines 501 158 31.54
statements 501 158 31.54
Policy: DESIGN Ruleset: RESETS
<violated>/<checked> x <label> [<severity>]:<message>
---------- ------------------- ----------------------
0/1 x NTL_RST04 [ERROR]: "A reset signal is not allowed to be used as
Et il meismes en descovri son corage a Lancelot et dist
que, a l'ore que la guerre commença, baoit il a tot le
monde conquerre: et bien i parut, kar il fu a vint et cinc
ans chevaliers et puis conquist il .XXVIII. roialmes [72d]
et a trente noef ans fu la fin de son aage. Mais de totes
ces choses le traist Lancelos ariere et il li mostra bien,
la ou il fist de sa grant honor sa grant honte, quant il
estoit au desus le roi Artu et il li ala merci crier...
Textual data and satellite images [Roche et al. SI'2014]
Heterogeneity of types of documents
How to match documents?
• Data and Issue
• Hard Disc (157 188 files)
How to match documents?
• Method: Extraction of features [Roche et al. CA'2015]
3 types of features:
- thematic features
- spatial entities
- temporal entities
How to match documents?
• (a) Extraction of features: thematic terms [Lossio Ventura et al. ISWC'2014]
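• Système de culture (cropping system) • Production (production) • Développement durable (sustainable development) • Eau (water) …
• Système de culture (cropping system) • Développement durable (sustainable development) • Ressources naturelles (natural resources) • Mise en œuvre (implementation) …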
How to match documents?
• (a) Extraction of features: spatial features (SF)
Model
• Global Model: a spatial feature (SF) is composed of at least one Named Entity (NE) and a variable number of spatial indicators specifying its location. An SF can be identified in two ways:
• Absolute spatial feature (A_SF): one NE with a geo-localization, such as <(spatialIndicator)*, NE of Location> (e.g. the city of Constantine).
• Relative spatial feature (R_SF): one or more spatial relations applied to at least one SF (e.g. in the south of the city of Constantine).
An R_SF is defined as <(spatial relation)1..*, A_SF> or <(spatial relation)1..*, R_SF>.
Five spatial relation types are considered: orientation, distance, adjacency, inclusion, and geometric (a union or intersection linking two SFs).
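The recursive definition of A_SF and R_SF above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and the `anchor` helper are hypothetical and do not come from the talk or from Text2Geo, and the geometric relation (which links two SFs) is simplified to a single target.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# The five relation types listed on the slide.
RELATION_TYPES = {"orientation", "distance", "adjacency", "inclusion", "geometric"}


@dataclass(frozen=True)
class AbsoluteSF:
    """A_SF: <(spatialIndicator)*, NE of Location>."""
    named_entity: str
    indicators: Tuple[str, ...] = ()  # zero or more spatial indicators


@dataclass(frozen=True)
class SpatialRelation:
    """One spatial relation, tagged with one of the five types."""
    kind: str
    text: str

    def __post_init__(self):
        if self.kind not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.kind}")


@dataclass(frozen=True)
class RelativeSF:
    """R_SF: <(spatial relation)1..*, A_SF> or <(spatial relation)1..*, R_SF>."""
    relations: Tuple[SpatialRelation, ...]
    target: Union[AbsoluteSF, "RelativeSF"]

    def __post_init__(self):
        if not self.relations:
            raise ValueError("an R_SF needs at least one spatial relation")


def anchor(sf):
    """Hypothetical helper: follow nested R_SFs down to the named entity."""
    while isinstance(sf, RelativeSF):
        sf = sf.target
    return sf.named_entity


# "the city of Constantine" and "in the south of the city of Constantine"
city = AbsoluteSF("Constantine", indicators=("the city of",))
south = RelativeSF((SpatialRelation("orientation", "in the south of"),), city)
```

Nesting one `RelativeSF` inside another mirrors the second form of the definition, where an R_SF is itself the target of further spatial relations.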
How to match documents?
• (a) Extraction of features: spatial features (SF)
Methods [Kergosien et al., IJGIS'2014]
- Symbolic approach: using rules (Text2Geo) to extract A_SF and R_SF
- Statistical approach: using context and IR methods to disambiguate spatial features
How to match documents?
• (b) Representation of documents
How to match documents?
• (c) Similarity
Global_Sim(vect1, vect2) = α × cosT(vect1, vect2) + (1 − α) × cosS(vect1, vect2), with α ∈ [0,1]
cosT: cosine based on thematic features (BioTex)
cosS: cosine based on spatial features
Perspective: adding temporal information
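The Global_Sim combination can be sketched in a few lines. This is a minimal illustration, assuming each document is represented by two sparse vectors stored as dictionaries; the keys "thematic" and "spatial", the function names, and the toy weights are assumptions made for the example, not part of the original system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


def global_sim(doc1, doc2, alpha=0.5):
    """alpha * cosT + (1 - alpha) * cosS, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    cos_t = cosine(doc1["thematic"], doc2["thematic"])
    cos_s = cosine(doc1["spatial"], doc2["spatial"])
    return alpha * cos_t + (1.0 - alpha) * cos_s


# Toy documents: same thematic term, different places (weights are made up).
doc_a = {"thematic": {"eau": 1.0}, "spatial": {"Constantine": 1.0}}
doc_b = {"thematic": {"eau": 2.0}, "spatial": {"Montpellier": 1.0}}
score = global_sim(doc_a, doc_b, alpha=0.5)
```

Setting alpha to 1 ranks documents on thematic features only, and alpha to 0 on spatial features only; intermediate values trade one off against the other.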
How to match documents?
• Extension: How to analyse documents with more precision?
Example: Disambiguation between location and organisation [Tahrat et al. WIMS'2013]
How to match documents?
• Disambiguation between location and organisation
Part 3 Applications in agricultural domain
Animal disease surveillance
In collaboration with CMAEE lab (Control of exotic and emerging animal diseases)
Why the need for Epidemic Intelligence? (1/3)
More than 60% of initial outbreak reports come from unofficial, informal, and heterogeneous sources, including sources other than the electronic media, which require verification [Arsevska et al. ISVEE'2015]
Why the need for Epidemic Intelligence? (3/3)
• Identify signals of new and exotic animal diseases
• Verify and analyse
• Be aware and take precautionary measures
How to detect disease outbreak on the Web?
• Four animal disease models: African swine fever (ASF), Foot-and-mouth disease (FMD), Bluetongue (BTV), and Schmallenberg virus (SBV)
• First model to study: ASF
Methodology
Methodology - step 1
• Step 1: Data acquisition
Methodology - step 2
• Step 2: Data classification
Methodology - step 3
• Step 3: Information extraction and management
Methodology - step 3
• Step 3: Information extraction (I)
Aim: Automatically detecting key information from Web news articles (country, species, diseases, number of cases, dates, …)
“Since its initial appearance in Poland in February 2014, 72 cases of African Swine Fever have been detected in wild boars and there have been three outbreaks in pigs.” - http://www.thenews.pl
• Use dictionaries (Geonames, HeidelTime, disease names, species names, etc.) and data mining techniques in order to learn extraction rules.
Rules associated with case numbers:
- (number)(species_name,1-3) with support 26% and confidence 83%
- (number)(species_name,1-2) with support 21% and confidence 100%
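To make the support and confidence figures concrete, here is a hedged sketch of how such values could be computed for one rule. The reading of the rule is an assumption: "(number)(species_name,1-k)" is taken to mean a number token followed by a species name within the next 1 to k tokens. The species lexicon, the toy sentences, and the function names are hypothetical, and the numbers produced have no relation to those reported on the slide.

```python
# Hypothetical species lexicon (a real system would use a curated dictionary).
SPECIES = {"pigs", "boars", "cattle", "sheep"}


def match_rule(tokens, max_gap):
    """Indices of number tokens followed by a species name within 1..max_gap tokens."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.isdigit():
            window = tokens[i + 1 : i + 1 + max_gap]
            if any(w.lower() in SPECIES for w in window):
                hits.append(i)
    return hits


def support_confidence(sentences, max_gap):
    """sentences: list of (tokens, gold index of the case number, or None).

    support    = share of sentences where the rule fires
    confidence = share of those sentences where it fires on the gold case number
    """
    matched = 0
    correct = 0
    for tokens, gold in sentences:
        hits = match_rule(tokens, max_gap)
        if hits:
            matched += 1
            if gold in hits:
                correct += 1
    support = matched / len(sentences)
    confidence = correct / matched if matched else 0.0
    return support, confidence


# Toy annotated corpus (made up for illustration).
TOY = [
    ("3 pigs died".split(), 0),
    ("In 2014 pigs were infected".split(), None),  # a year, not a case count
    ("No outbreak reported".split(), None),
    ("12 wild boars found".split(), 0),
]
wide = support_confidence(TOY, max_gap=3)
narrow = support_confidence(TOY, max_gap=1)
```

On this toy corpus the wider window fires on more sentences, including the sentence where the number is a year; how the two windows compare on real news articles depends entirely on the annotated corpus.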
Methodology - step 3
• Step 3: Information extraction (I)
- First results for the rule-based approach on the annotated corpus
- Classification based on SVM (features are rules)
- 3 classes: correct, incorrect, partial
- 10-fold cross validation

Type        Accuracy (%)
Locations   70.6
Dates       71.2
Diseases    93.6
Cases       78.1
Species     89.5
Julien Rabatel, LIRMM, Numev, France
Methodology - step 3
• Step 3: Information management (II)
Methodology - step 3
• II. Querying the Web
Methodology - step 3
• II. Querying the Web: (a) Terminology extraction
Methodology - step 3
• II. Querying the Web: (b) Terminology ranking
- BioTex Ranking [Lossio Ventura et al. IRJ'2016]:
- A new ranking function to take into account the heterogeneity of the sources (Si) [Arsevska et al. CEA'2016]:
Methodology - step 3
• II. Querying the Web: (c) Terminology validation
Use of the Delphi method [Arsevska et al. LREC'2016].
The Delphi method aims to reach group consensus among experts (5 to 7 experts for each disease) when knowledge is not sufficient to answer a given scientific question.
Methodology - step 3
• II. Querying the Web: (c) Terminology validation
List of extracted terms identified to characterize Bluetongue virus (BTV) emergence.
In bold are the terms proposed to experts for evaluation
Methodology - step 3
• II. Querying the Web: (d) Association of terms
DANDweb = (2 × hit(h AND cs)) / (hit(h) + hit(cs))
[Roche and Prince Informatica’2010 ; Arsevska et al. IJAEIS'2016]
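The DANDweb formula above has the shape of a Dice coefficient computed on search-engine hit counts. A minimal sketch follows, assuming the three hit counts have already been obtained; the function name, the zero-denominator convention, and the counts in the example are made up for illustration, and no real search API is called.

```python
def dand_web(hit_h, hit_cs, hit_both):
    """2 * hit(h AND cs) / (hit(h) + hit(cs)).

    hit_h    : number of pages returned for term h alone
    hit_cs   : number of pages returned for term cs alone
    hit_both : number of pages returned for the query "h AND cs"
    """
    denominator = hit_h + hit_cs
    if denominator == 0:
        return 0.0  # convention chosen here when neither term has any hit
    return 2 * hit_both / denominator


# Hypothetical hit counts for two terms and their conjunction.
association = dand_web(hit_h=100, hit_cs=50, hit_both=30)
```

The value is 0 when the two terms never co-occur and approaches 1 when nearly every page containing one term also contains the other.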
Part 3 Applications in agricultural domain
Sentiment analysis
Sentiment analysis
Methods to identify sentiments: towards a sentiment lexicon
Step 1: choice of seeds related to opinions
P = {good; nice; excellent; positive; fortunate; correct; superior}
N = {bad; nasty; poor; negative; unfortunate; wrong; inferior}
Construction of 14 corpora related to a specific domain
Step 2: PoS tagging, association rules, choice of a window
Sentiment analysis
Step 3: Statistical selection and web mining.
Statistical measures quantify the association between seed adjectives and candidate adjectives, based on "hits" from the web (i.e. search engine result counts) and contextual information
Examples of learnt adjectives: great, hilarious, funny, happy, perfect, important, beautiful, amazing, complete, major, helpful
Agriculture domain: {gmo; agricultural biotechnology; biotechnology for agriculture}
Examples of learnt adjectives: green, healthy, enthusiastic, creative, etc.
Laura Vanessa Cruz, San Agustin University, Peru
Part 3 Applications in agricultural domain
Information Extraction from experimental data
Food science domain
Aim: Knowledge management in food science domain
Challenging issue: Unit recognition and extraction [Berrahou et al. KDIR'2013 ; Berrahou et al. RNTI'2016]
Food science domain
Method:
- Locating units (machine learning)
- Extracting units (lexical similarity)
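The extraction step by lexical similarity can be illustrated as follows. This is a sketch under assumptions, not the authors' method: the slide does not name the similarity measure, so Python's standard `difflib.SequenceMatcher` ratio is swapped in, and the unit lexicon, the threshold, and the function name are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical lexicon of reference units.
UNIT_LEXICON = ["cm3 m-2 day-1", "kg m-3", "g/100g", "°C"]


def best_unit(candidate, lexicon, threshold=0.7):
    """Return (unit, score) for the closest lexicon entry, or (None, score)
    when even the best match falls below the threshold."""
    scored = [
        (SequenceMatcher(None, candidate.lower(), unit.lower()).ratio(), unit)
        for unit in lexicon
    ]
    score, unit = max(scored)
    if score >= threshold:
        return unit, score
    return None, score


# A variant spelling found in text is mapped to its closest reference unit.
unit, score = best_unit("kg.m-3", UNIT_LEXICON)
```

A character-based ratio tolerates small typographic variations (a dot instead of a space, for instance); the threshold controls how far a string may drift before it is no longer treated as a known unit.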
Part 4 Conclusions and future work
Conclusions
New challenges of Big Data:
- Matching different types of documents (image/text, video/text, and so forth)
- Integration of visual analytics skills [Fadloun, Inforsid'2016]