OW2con'14 - WebLab in the land of Big Data

© 2014 Airbus Defence and Space – All rights reserved. The reproduction, distribution and utilization of this document as well as the communication of its contents to others without express authorization is prohibited. Offenders will be held liable for the payment of damages. All rights reserved in the event of the grant of a patent, utility model or design.

WebLab in the land of Big Data
November 6th, 2014
CIS/TCOIC4 – Advanced Information Processing
Point of contact: Stéphan BRUNESSAUX, Phone: +33 2 32 63 40 55

Upload: ow2-consortium

Post on 25-Jun-2015


DESCRIPTION

Big data and mass analytical processing have gained a prime position over the past years in business and have touched almost all domains of society. According to the famous Gartner “3Vs” definition, big data deals with Volume, Velocity and Variety. It enables organizations to exploit information that was previously ignored because there was no reasonable way to process it. One talks about the big data revolution, but underpinning this revolution is the capacity to process unstructured data such as written documents, images, audio and videos, which make up more than 80 percent of available data today.

The WebLab platform aims at building systems providing intelligence solutions from the processing of open source data. Such systems face three major challenges:

•  Managing large amounts of unstructured data,
•  Managing the data diversity with the appropriate combination of processing tools,
•  Allowing the most effective tools to be selected for each processing step.

Like Big Data, WebLab addresses intelligence needs and deals with Volume, Variety and Velocity. Can we therefore say that Big Data and WebLab face the same challenges? In fact, it can be argued that Big Data and WebLab are complementary. How can we benefit, then, from the best of both worlds?

TRANSCRIPT

Page 1: OW2con'14 - Weblab in the land of Big Data


WebLab in the land of Big Data

November 6th, 2014

CIS/TCOIC4 – Advanced Information Processing Point of contact: Stéphan BRUNESSAUX, Phone: +33 2 32 63 40 55

Page 2: OW2con'14 - Weblab in the land of Big Data


What is WebLab?

An integration platform
•  based on recognised standards (SOA, Web Services, Semantic Web)
•  allowing the integration of a selection of software components (search engine, information extraction, translation, knowledge management, graphical representation using maps/networks, etc.)
•  allowing the interoperation of the selected components.

A set of media mining services for document processing to be reused in all the WebLab projects.
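As a rough illustration of what integration and interoperation of selected components can look like, the sketch below gives every component the same call signature over a shared document structure and lets a registry pick one component per capability. All names here (`Document`, `ServiceRegistry`, the capability strings, the toy components) are hypothetical and chosen for illustration only; this is not the WebLab API, which the slide describes as based on SOA, Web Services and Semantic Web standards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Shared exchange structure (hypothetical): raw content plus annotations."""
    content: str
    annotations: Dict[str, object] = field(default_factory=dict)


# Every component exposes the same signature, so components are interchangeable.
Service = Callable[[Document], Document]


class ServiceRegistry:
    """Maps a capability name to the component selected for it."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, capability: str, service: Service) -> None:
        self._services[capability] = service

    def run(self, capabilities: List[str], doc: Document) -> Document:
        # Apply the selected components in order; each receives the previous output.
        for name in capabilities:
            doc = self._services[name](doc)
        return doc


def toy_language_detector(doc: Document) -> Document:
    # Placeholder logic: a real component would call a language-identification tool.
    padded = f" {doc.content.lower()} "
    doc.annotations["language"] = "fr" if " le " in padded else "en"
    return doc


def toy_keyword_extractor(doc: Document) -> Document:
    # Placeholder logic: keep distinct words longer than six characters, in order.
    words = [w.strip(".,?!").lower() for w in doc.content.split()]
    doc.annotations["keywords"] = list(dict.fromkeys(w for w in words if len(w) > 6))
    return doc


registry = ServiceRegistry()
registry.register("language-detection", toy_language_detector)
registry.register("keyword-extraction", toy_keyword_extractor)

result = registry.run(
    ["language-detection", "keyword-extraction"],
    Document("WebLab integrates search, extraction and translation components."),
)
```

Swapping one component for another then only requires registering a different callable under the same capability name; the rest of the chain is unaffected.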

Page 3: OW2con'14 - Weblab in the land of Big Data


What is WebLab? A range of projects in the fields of media-mining, NLP & KM

[Figure: WebLab projects and internal studies, arranged by platform version (1.0, 1.1, 1.2, 2.0)]

Page 4: OW2con'14 - Weblab in the land of Big Data


WebLab Scope


From unstructured data...

… to structured and actionable knowledge

Page 5: OW2con'14 - Weblab in the land of Big Data


WebLab Scope


Page 6: OW2con'14 - Weblab in the land of Big Data


WebLab Layers

Studio

Resources and means for WebLab technology implementation:

-  Tutorials & guides
-  Services and portlets directory
-  Tools for chain editing & construction
-  Tools for application deployment
-  Tools for services & portlets development
-  Tools for system tests and performance assessment
-  etc.

Page 7: OW2con'14 - Weblab in the land of Big Data


WebLab Offer


Page 8: OW2con'14 - Weblab in the land of Big Data


An Example of a Processing Chain: From TV to annotations

[Figure: processing chain. Collect → Video → Video segmentation → Audio cleaning → Audio transcription → Text → Translation → Translated text → Information extraction → Enriched text → Alert. Tool label shown on the slide: MOSES.]

Text (source):

国際グリーンピース高山チームは富士山の頂上への支援と福島第一 に原子力災害の被害者のための希望のメッセージを配信します。 日本と世界中の何千人もの人々から収集した、グリーンピースは、 これらのメッセージは、原子力発電に反対する日本の人々を団結に役立つ、 日本当局はそれらに耳を傾けることを奨励することを期待しています。

Translated text / Enriched text:

An international Greenpeace alpine team delivers messages of support and hope for the victims of the nuclear disaster at Fukushima Daiichi to the summit of Mt Fuji. Collected from thousands of people in Japan and all over the world, Greenpeace hopes that these messages will help unite the people...

Annotated text (second sample on the slide, translated from French):

Provocation by Iran, or a wish to rally around it the Arab countries that are increasingly eager to develop nuclear energy? In any case, a probable reaction to recent remarks by Olmert implying that the Jewish state possessed nuclear weapons. Iranian President Mahmoud Ahmadinejad indeed offered on Saturday to share Iranian nuclear technology with the Arab countries of the Gulf, during a meeting with a senior Kuwaiti official. As a reminder, Morocco, Algeria and Egypt have made no secret of their ambition in this field.
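The TV-to-annotations chain can be read as a sequence of steps, each taking the output of the previous one and enriching it. The following is a minimal sketch of that idea under stated assumptions: the step functions are stubs with hypothetical names, the resource is a plain dictionary, and nothing here reflects the actual WebLab services or the MOSES interface.

```python
from typing import Callable, Dict, List

Resource = Dict[str, object]
Step = Callable[[Resource], Resource]


def segment_video(res: Resource) -> Resource:
    # Stub: a real step would split the broadcast into video segments.
    return {**res, "segments": ["segment-1"]}


def clean_audio(res: Resource) -> Resource:
    # Stub: a real step would isolate speech from the audio track.
    return {**res, "audio": "cleaned-speech"}


def transcribe_audio(res: Resource) -> Resource:
    # Stub: a real step would run speech-to-text.
    return {**res, "text": "<transcript>"}


def translate(res: Resource) -> Resource:
    # Stub: a real step would call a machine-translation engine.
    return {**res, "translated_text": "<translation>"}


def extract_information(res: Resource) -> Resource:
    # Stub: a real step would annotate entities, dates, events, etc.
    return {**res, "annotations": []}


def run_chain(steps: List[Step], resource: Resource) -> Resource:
    """Apply each step in order and record which steps were executed."""
    trace: List[str] = []
    for step in steps:
        resource = step(resource)
        trace.append(step.__name__)
    return {**resource, "trace": trace}


tv_chain = [segment_video, clean_audio, transcribe_audio, translate, extract_information]
output = run_chain(tv_chain, {"source": "tv-broadcast"})
```

Because the chain is just a list, a step can be replaced or reordered without touching the others, which is the property the slide's pipeline diagram suggests.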

Page 9: OW2con'14 - Weblab in the land of Big Data


General Pattern of a WebLab Processing Chain

•  Collection
•  Content Storage
•  Processing
•  Indexing
•  Knowledge Management
•  Analytics
•  Search
•  Decision & Reporting

Page 10: OW2con'14 - Weblab in the land of Big Data


Big Data technologies in a WebLab Processing Chain

•  Collection
•  Content Storage
•  Processing
•  Indexing
•  Knowledge Management
•  Analytics
•  Search
•  Decision & Reporting

[Figure: technology logos overlaid on the chain stages; only the labels “Virtuoso” and “BIG” survive in this transcript.]

Page 11: OW2con'14 - Weblab in the land of Big Data


The WebLab Triple « V »

VOLUME

WebLab makes it possible to exploit the largest repository of unstructured data available for structuring:
•  1992: 26 Web sites
•  1996: 100 000 Web sites in January, about 230 000 in June
•  2000: 19 823 296 Web sites
•  2011: 312 693 296 Web sites
•  2014: 1 000 000 000 Web sites
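To put these counts in perspective, the short calculation below derives a growth factor and a compound annual growth rate from the slide's figures. The counts are taken from the slide; the choice of a compound-rate summary and the function names are assumptions made for this illustration.

```python
# Web site counts as quoted on the slide (year -> number of Web sites).
web_sites = {
    1992: 26,
    2000: 19_823_296,
    2011: 312_693_296,
    2014: 1_000_000_000,
}


def growth_factor(start_year: int, end_year: int) -> float:
    """How many times larger the count is at end_year than at start_year."""
    return web_sites[end_year] / web_sites[start_year]


def annual_growth_rate(start_year: int, end_year: int) -> float:
    """Compound annual growth rate between two years, as a fraction."""
    years = end_year - start_year
    return growth_factor(start_year, end_year) ** (1 / years) - 1


factor_2000_2014 = growth_factor(2000, 2014)
rate_2000_2014 = annual_growth_rate(2000, 2014)
print(f"2000 to 2014: x{factor_2000_2014:.1f}, about {rate_2000_2014:.0%} per year")
```

Between 2000 and 2014 the count grows roughly fifty-fold, which corresponds to about 32% per year when compounded.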

Page 12: OW2con'14 - Weblab in the land of Big Data


The WebLab Triple « V »

VARIETY

WebLab collects and processes varied data from varied sources:

•  Web sites
•  Social-networking sites
•  News
•  Video sharing sites
•  Open data
•  On-line databases
•  Books, articles, papers
•  Broadcasting (TV, radio)
•  Social media
•  Etc.

Natural language processing technologies allow the extraction of structured information (dates, locations, keywords, personal attributes, events, etc.) and of data sets buried deep in informal documents (lists, tables, form fields, etc.).
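As a toy illustration of turning free text into structured information, the sketch below pulls dates written as "6 November 2014" out of a sentence with a regular expression. It is deliberately simplistic and is only an assumption about what such a step might look like; real information extraction components rely on far richer linguistic analysis, and the function name `extract_dates` is hypothetical.

```python
import re
from typing import Dict, List

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Matches dates such as "6 November 2014": day, month name, four-digit year.
DATE_PATTERN = re.compile(rf"\b(\d{{1,2}})\s+({MONTHS})\s+(\d{{4}})\b")


def extract_dates(text: str) -> List[Dict[str, str]]:
    """Return one structured record per date mention found in the text."""
    return [
        {"day": m.group(1), "month": m.group(2), "year": m.group(3)}
        for m in DATE_PATTERN.finditer(text)
    ]


sample = "The talk was given on 6 November 2014."
dates = extract_dates(sample)
```

The same pattern of "unstructured text in, list of typed records out" applies to the other kinds of information the slide mentions, such as locations or events, although those need more than a regular expression.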

Page 13: OW2con'14 - Weblab in the land of Big Data


The WebLab « Triple V »

VELOCITY

WebLab tries to absorb the increasing data flow and process it « on the fly ».
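Processing a flow "on the fly" generally means handling each item as it arrives instead of waiting for a complete batch. A minimal sketch of that idea with a Python generator follows; the source `incoming_documents` and the processing function are hypothetical stand-ins, not WebLab components.

```python
from typing import Callable, Iterable, Iterator


def process_on_the_fly(
    stream: Iterable[str], process: Callable[[str], str]
) -> Iterator[str]:
    """Lazily apply `process` to each item as it is pulled from the stream."""
    for item in stream:
        yield process(item)


def incoming_documents() -> Iterator[str]:
    # Stand-in for a live feed (crawler, broadcast capture, social media API...).
    yield from ["first item", "second item", "third item"]


results = process_on_the_fly(incoming_documents(), str.upper)
first = next(results)        # only the first item has been processed so far
remaining = list(results)    # the rest is processed when it is consumed
```

Because nothing is buffered beyond the current item, memory use stays flat however long the feed runs, at the cost of not being able to look back at earlier items without storing them explicitly.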

Page 14: OW2con'14 - Weblab in the land of Big Data


Thanks for your attention!
