Transcript
Page 1: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Hanmin JungHead of Dept. of Computer Intelligence Research

KISTI

Big Data Curation & Its Application

Page 2: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Let Me Introduce Myself :-)

2

Page 3: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

� Recent Social Activities (2012~)� 보건복지부빅데이터기술전문가패널� 한국연구재단융합연구우수성평가위원회위원� 국립전파연구원방송통신표준전문위원� 기술표준원빅데이터표준화프레임워크표준연구회위원� ITRC 과제기획위원회빅데이터분과, UI/UX 분과기획위원/기술간사� 교육과학기술부과학기술빅데이터포럼전문가위원회간사� 국가경쟁력강화위원회 ‘범부처 Giga KOREA 사업’전문가위원회위원� 국가과학기술위원회기술영향평가위원회위원� 방송통신위원회빅데이터포럼인력양성분과위원장/자문위원/운영위원� IBS 장비심의위원회, 지식경제부중앙장비심의위원회심의위원� 한국과학창의재단과학창의앰배서더� 지식경제부 ‘SW+인문포럼’멘토� IT VC 전문성강화프로그램전문강사� ISO/IEC JTC1/SC32 전문위원회위원, JTC 1/SC34 표준개발위원회위원� ISO 20022 TC68 Advisory Expert

� 한국정보통신기술협회정보통신표준화위원회 –디지털콘텐츠/SW품질평가/메타데이타/사물지능통신프로젝트그룹위원

� 교육과학기술부차세대정보컴퓨팅분과 –기획위원� 국무총리실정보지식인대회출제/평가위원� 문화체육관광부공공문화정보전문가위원

Let Me Introduce Myself :-)

3

Page 4: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

� Weekly IT Trends (NIPA)� 과학기술빅데이터동향및활용방안 (Vol. 1593)

� 모바일비즈니스인텔리전스기술동향 (Vol. 1573)

� 시맨틱소셜네트워크를구성하는온톨로지어휘기술현황 (Vol. 1560)

� 정부빅데이터분석기술적용동향 (Vol. 1556)

� 제품및기술수명주기활용동향 (Vol. 1550)

� 개체식별관점에서바라본링크드데이터동향 (Vol. 1524)

� 유망기술발굴모델및서비스동향 (Vol. 1521)

� 포지셔닝의측면에서살펴본모바일태블릿컴퓨터동향 (Vol. 1486)

� 테크놀로지인텔리전스동향 (Vol. 1477)

� 추론기술연구동향 (Vol. 1446)

� 트리플레파지토리벤치마킹 (Vol. 1439)

� 차세대 IT 기기와HCI 기술동향전망 (Vol. 1435)

� 시맨틱검색기술동향 (Vol. 1431)

� 시맨틱웹국내특허동향 (Vol. 1420)

� 감성분석과브랜드모니터링기술동향 (Vol. 1396)

� 시맨틱웹이경제⋅사회에미치는영향 (Vol. 1372)

� 웹매핑서비스비교분석 (Vol. 1352)

� 시맨틱웹 2.0 기술동향 (Vol. 1344)

� 국내포털검색시장및특허동향 (Vol. 1341)

� Open API 기술동향 (Vol. 1296)

� 엔터프라이즈검색기술동향 (Vol. 1276)

� 전자상거래검색기술동향 (Vol. 1273)

� 시맨틱웹포털기술동향 (Vol. 1264)

Trend Reports

4

Page 5: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Art Curator

5http://www.chrysler.org/files/resources/gallery-talk-3.jpg

Page 6: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung6

Press

Page 7: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

� Definition

� Active management of data over its lifecycle

� Ensuring that data is trustworthy, discoverable, accessible, reusable, and

fit for use

� E.g. data preparation for analytics

� Curation Types

� Manual curation

� (Semi-)automatic curation

� E.g. data cleansing, record duplication, classification

� Sheer curation (curation at source)

� Integrated in workflow for creating and managing data

Data Curation

7E. Curry, A. Freitas, and S. O’Riain, “Data Curation at the New York Times”, 2011.

Page 8: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

� Steps to Setup the Process

� Identify what data you need to curate

� Identity who will curate the data

� Define the curation workflow

� Identify appropriate data-in & data-out formats

� Identity the artifacts, tools, and processes needed to support the process

Data Curation

8E. Curry, A. Freitas, and S. O’Riain, “Data Curation at the New York Times”, 2011.

Page 9: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Case Studies

9

1. Apple Maps

2. Strategic Foresight

3. Big Data

Page 10: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

http://cdn0.sbnation.com/entry_photo_images/5667445/20120920-DSC_7192VERGE_large_verge_medium_landscape.jpg

Apple Maps

10

Page 11: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Google Maps vs. Apple Maps

11http://www.gadgetreview.com/2013/01/google-maps-vs-apple-maps-ios-comparison.html

Page 12: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Apple Siri

http://www.apple.com/ios/siri/

12

Page 13: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung13

We can see it as much about itas we know

http://investor.google.com/financial/tables.html

� Google’s Financial Table

Page 14: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung14

Nokia’s Burning Ships Strategy

http://www.brightsideofnews.com/Data/2011_5_5/Steven-Elop-Burn-Boats-Strategy-is-a-Cortez-Move-for-Nokia/Hernando_Cortez_BurningBoat.jpg

Page 15: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Result of the Strategy

� Worldwide Market Share

� Worldwide mobile device sales to end users in 2008 ~ 2013

Gartner, IDC Worldwide Mobile Phone Tracker

38.6, 117.937.8, 108.531.6,110.427.1, 106.618.7, 82.914.8, 61.9Nokia

3.5, 12.14.9, 19.13.1, 13.73.2, 13.5ZTE

418.6

41.9, 175.4

3.7, 15.4

8.9, 37.4

27.5, 115.0

1Q2013(%, M. Units)Company

3Q2012(%, M. Units)

3Q2011(%, M. Units)

3Q2010(%, M. Units)

3Q2009(%, M. Units)

3Q2008(%, M. Units)

Samsung 23.7, 105.4 22.3, 87.8 20.5, 71.4 21.0, 60.2 17.0, 52.0

Apple 6.1, 26.9 4.3, 17.1 4.0, 14.1

LG 3.1, 14.0 5.4, 21.1 8.1, 28.4 11.0, 31.6 7.5, 23.0

Sony Ericsson 4.9, 14.1 8.4, 25.7

Motorola 4.7, 13.6 8.3, 25.4

Others 45.3, 201.6 36.1, 142 32.2, 112.5 20.6, 59.1 20.1, 61.5

Total 444.5 393.7 348.9 287.1 305.4

15

Page 16: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung16

Sign of the Result

http://www.google.com/insights/search/

� Google Insights for Search

Page 17: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Strategic Foresight

R. Rohrbeck, H. Arnold, and J. Heuer, “Strategic Foresight in Multimedia Enterprises”, 2007.

17

Page 18: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung18

Hype Cycle

Emerging Technologies Hype Cycle 2012

Page 19: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung19

Social Data

http://bynoy.files.wordpress.com/2011/08/united-noy-weblife-60-seconds.jpg

Page 20: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung20

Machine Data

T. Baer, “What is Big Data? The Reality for Analytics”, OVUM, 2011.

Call data recordsCall data records

Sensory dataSensory data

Web log filesWeb log files

Financial Instrument TradeFinancial Instrument Trade

Page 21: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung21

Crop Yield Estimation

http://www.geovar.com/data/satellite/rapideye/rapideye_sample_image_bands543.jpg

Page 22: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung22

Crop Yield Estimation

https://www.agriskmanagementforum.org/content/improved-supply-intelligence-global-agriculture-markets

South American soybean production forecasts South American soybean production forecasts

Page 23: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung23

Soybeans Futures (CME)

Page 24: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung24

Data Mining Methods

A. Cheung, “Forecast Anything! The Seven Data Mining Models”

Decision Trees

Naive Bayes

Cluster Analysis

Sequence Clustering

Association Rules

Time Series

Neural Networks

Page 25: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung25

Value Pyramid

Search

Clustering

Extracting

DecisionSupport

Forecasting

ScenarioPlanning

Advising

Modified from D. Bousfield & P. Fooladi, “STM Information: 2009 Final Market Size and Share Report”, 2010.

InSciTe Advanced (2011)

InSciTe Adaptive (2012)

InSciTe Advisory (2013~)

1st Ta

rget o

f Data

Curat

ion

Page 26: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Data Curationfor Getting Insights & Foresights

26

1. Crawling Data

2. Extracting Information (Text Mining)

3. Resolving Identities

Page 27: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Crawling Data

� Crawling Full Texts from Google Scholar

27

Page 28: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Crawling Data

� Crawling Web Data by RSS & Google API

28

Page 29: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Crawling Data

� Web Data Statistics in InSciTe Adaptive

29

Source/Year 2001 … 2008 2009 2010 2011 2012 Total

IDC 0 … 1 13 21 313 250 598

Wikipedia 0 … 151,942 181,721 408,017 1,555,867 249,0209 4,975,178

InformationWeek 0 … 470 551 536 388 81 2,316

Gizmag 0 … 1,371 2,301 2,970 2,850 2,520 16,731

Technologyreview 89 … 646 693 758 873 405 5,330

IEEE spectrum 64 … 468 426 385 306 205 3,173

Technewsworld 31 … 696 742 1,396 2,332 2,485 9,843

DiscoverMagazine 2 … 104 51 79 57 49 606

NewYork Times 20,940 … 7,979 4,696 3,971 3,669 2,064 124,100

BBC 3,155 … 2,687 3,158 3,318 5,964 4,243 37,073

Fox News 0 … 1,577 1,965 1,123 2,015 1,872 10,749

CNN 3,594 … 1,154 1,499 1,071 1,308 814 19,769

Thomson Reuters 0 … 450 371 309 338 222 3,408

USA Today 458 … 4,616 3,878 4,630 6,579 4,276 38,750

EtnTws.com 447 … 1,964 2,241 2,099 1,585 844 14,259

Total 28,780 … 176,125 204,306 430,683 1,584,444 2,510,539 5,261,883

Page 30: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Concept

30

Page 31: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung31

� Mining Targets

� Named entities: e.g. PLOs

� Pattern-based entities: e.g. a-mail addresses, phone numbers

� Concepts: abstractions of entities

� Facts and relationships

� Concrete and abstract attributes: e.g. 10-year, expensive, comfortable

� Subjectivity in the forms of opinions, sentiments, and emotions: e.g. positive/negative, angry

S. Grimes, “Text Analytics Overview: Technology, Solutions, Market”, 2011.

Text Mining – Concept

Page 32: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung32

Text Mining – SINDI Architecture

Page 33: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Example

33

Page 34: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Example

34

Page 35: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Example

35

Page 36: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Example

36

Page 37: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Example

37

Page 38: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Text Mining – Application

38

Page 39: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Resolving Identities

Christian Bizer, Tom Heath, and Tim Berners-Lee, “Linked Data – The Story So Far”, 2009.

� URI Aliases

� URIs that refer to the same real-world objects

� E.g. http://dbpedia.org/resource/Berlin (for Berlin in DBpedia)

� E.g. http://sws.geonames.org/2950159 (for Berlin in Geonames)

� Information providers can set owl:sameAs links to URI aliases they know

about

� Resolution of Data Conflicts in Data Fusion

� Choosing a value in situations where multiple sources provide different

values for the same property of an object

39

Page 40: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung40

Resolving Identities

� Example

� 소녀시대, 소시, 少女時代, しょうじょじだい, 少時, Sho Jo Ji Dai, So Nyeo

Shi Dae, SoShi, Girl’s Generation, SNSD, So Nyuh Shi Dae, …

Page 41: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Resolving Identities

http://sindice.com/search?q=Hanmin+Jung

41

Page 42: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung42

OntoURIResolver

� Demonstration

Page 43: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

LOD Project

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.html

� Linked Data

� 31 billion RDF triples, 504 RDF links (2011.9)

43

Page 44: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

InSciTe Advanced (2011)

44

Page 45: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung45

InSciTe Adaptive (2012)

https://play.google.com/store/apps/details?id=net.xenix.inscite&feature=search_result#?t=W10.

Page 46: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung46

InSciTe Adaptive (2012)

Page 47: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

InSciTe Adaptive (2012)

� Data Fact Sheet

� Articles: 22.6 millions (9.8 millions for papers, 7.6 millions for patents, 5.3 millions for Web data)

� All technical areas (2001~2011)

� Named entities: 1.9 millions

� Authority dictionary: 1.5 millions entries

� LOD data: 290 GB (are being connected)

47

Page 48: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Press

48

Page 49: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

InSciTe Architecture

Analytics Models

ETD ModelEmerging Technology Discovery Model

TLCD ModelTechnology Life Cycle Discovery Model

TLC ModelTechnology Life Cycle Model

OntoRelFinder®

Relationship Path Finder

OntoReasoner®

Reasoning Engine

OntoURI®Semantic Knowledge Manager

OntoPipeliner®

Semantic Service Composer

SS&AESemantic Search & Analytics Engine

OntoURIResolver®

Identity Resolver

SINDI-CORE/LINKEntity & Relationship Extractor

TUC ModelTerminology Use Cycle Model

Ontology

Linked Data

OntoFrame

OntoVerifier®

Reasoning Verifier

Web Data CrawlerRSS/Google API

Web Data

Literatures

49

Page 50: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Ontology

DB

Ontology Schema

Ontology Instances

RDF Triples

Portability & Connectibility

For Storing & Managing

For ServicePlanning ServicesDefining ConceptsExploiting Relations

50

Page 51: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung

Big Data Processing in InSciTe

51

Page 52: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung52

InSciTe Homepage

http://inscite.kisti.re.kr

Page 53: Big Data Curation And Its Application

Copyright © 2013 Hanmin Jung53

Thank you

[email protected]

“A lot of times, people don’t know what they want until you show it to them.”

by Steve Jobs

“Many people won’t be convinced until they’ve seen it for themselves.”

by Jakob Nielsen


Top Related