1 foundations vi: discovery, access and semantic integration data mining and knowledge discovery -...

29
1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with Peter Fox and Li Ding CSCI-6962-01 Week 13, November 29, 2010

Upload: madeleine-atkins

Post on 03-Jan-2016

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

1

Foundations VI: Discovery, Access and Semantic Integration

Data Mining and Knowledge Discovery - Continued

Deborah McGuinness and Joanne Luciano

with Peter Fox and Li Ding

CSCI-6962-01

Week 13, November 29, 2010

Page 2: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Extra

Page 3: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Knowledge Discovery• Has a broad meaning

– Finding ontologies– Creating new knowledge from

• Previous knowledge• New sources (data, information)• Modeling

• We’ll look at a mining approach as an example

3

Page 4: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Mining• We will start with data but the ideas apply to

information and knowledge bases as well

• Definition

• History

• Our interest

4

Page 5: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

SAM: Smart Assistant for Earth Science Data Mining

PI: Rahul Ramachandran

Co-I: Peter Fox, Chris Lynnes, Robert Wolf, U.S. Nair

Page 6: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Science Motivation• Study the impact of natural iron fertilization process such as

dust storm on plankton growth and subsequent DMS production– Plankton plays an important role in the carbon cycle– Plankton growth is strongly influenced by nutrient availability (Fe/Ph)– Dust deposition is important source of Fe over ocean– Satellite data is an effective tool for monitoring the effects of dust

fertilization• Analysis entails

– Mine MODIS L1B data for dust storm events and identify the swath of area influenced by the passage of the dust storms.

– Examine correlations between fertilization, plankton growth and DMS production

Page 7: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Current Analysis Process

• MODIS aerosol products don’t provide speciation• Locate and download all the data to their local machine• Write code to classify and detect dust accurately [ 3-4

month effort]• Write code to classify and detect other dust aerosols [ 3-

4 month effort]• Write code to segment the detected region in order to

account for advection effect and correlation coefficient [2 months effort]

Page 8: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Analysis with SAM

• Create a workflow to perform classification using many different state of the art classifiers on distributed data

• Create a workflow to segment detected regions using image processing services on distributed data

Bottom line: • Scientist does not have to write all the code to perform

the analysis• Can compose workflows that utilize distributed

data/services• Can share the workflow with others to collaborate, reuse

and modify

Page 9: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Conducting Science using Internet as the Primary Computer

Page 10: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Mash-ups Example: Yahoo Pipes

Page 11: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Data Mining in the ‘new’ Distributed Data/Services Paradigm

Page 12: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Too many choices!!

•And that’s only part of the toolkit•ADaM-IVICS toolkit has over 100+ algorithms

Page 13: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

SAM Objectives• Improve usability of Earth Science data by

existing data mining services for research, by incorporating semantics into the workflow composition process.– Semantic search capable of mapping a

conceptual task– Assistance in mining workflow composition– Verification that services are connected in a

semantically correct fashion

Page 14: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Ontology Use

Page 15: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Semi-automated Workflow Composition

Filtering services basedon data format

Page 16: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Semi-automated Workflow Composition

Filtering service optionsbased on both data formatand task selected

Page 17: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Semi-automated Workflow Composition

Final Workflow

Page 18: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Science Motivation• Study the impact of natural iron fertilization process

such as dust storm on plankton growth and subsequent DMS production– Plankton plays an important role in the carbon cycle– Plankton growth is strongly influenced by nutrient

availability (Fe/Ph)– Dust deposition is important source of Fe over ocean– Satellite data is an effective tool for monitoring the effects

of dust fertilization

Page 19: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Hypothesis• In remote ocean locations there is a positive

correlation between the area averaged atmospheric aerosol loading and oceanic chlorophyll concentration

• There is a time lag between oceanic dust deposition and the photosynthetic activity

Page 20: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Primary source of ocean nutrients

WIND BLOWNDUS

T

SAHARA

SEDIMENTS FROM RIVER

OCEAN UPWELLIN

G

Page 21: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

SAHARA

DUST

SST

CLOUDS

NUTRIENTS

CHLOROPHYLL

Factors modulating dust-ocean photosynthetic effect

Page 22: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Objectives

• Use satellite data to determine, if atmospheric dust loading and phytoplankton photosynthetic activity are correlated.

• Determine physical processes responsible for observed relationship

Page 23: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Preliminary Results

Page 24: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Data and Method• Data sets obtained from SeaWiFS and

MODIS during 2000 – 2006 are employed

• MODIS derived AOT

Page 25: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

The areas of study

1

5

6

8

43

2

7

1-Tropical North Atlantic Ocean 2-West coast of Central Africa 3-Patagonia

4-South Atlantic Ocean 5-South Coast of Australia 6-Middle East 7- Coast of China 8-Arctic Ocean

*Figure: annual SeaWiFS chlorophyll image for 2001

Page 26: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Tropical North Atlantic Ocean dust from Sahara Desert

-0.68497

-0.1587

4

-0.856

11

-0.446

7

-0.75102

-0.6644

8

-0.72603

-0.17504 -0.0902 -0.328 -0.4595 -0.14019 -0.7253 -0.1095

Ch

loro

ph

yll

AOT

Page 27: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Arabian Sea Dust from Middle East

0.59895 0.66618 0.37991 0.45171 0.52250 0.36517 0.5618

0.76650

0.69797

0.75071

0.4412

0.8495

0.708625

0.65211

Ch

loro

ph

yll

AOT

Page 28: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Summary and future work• Dust impacts oceans photosynthetic activity,

positive correlations in some areas NEGATIVE correlation in other areas, especially in the Saharan basin

• Hypothesis for explaining observations of negative correlation: In areas that are not nutrient limited, dust reduces photosynthetic activity

• But also need to consider the effect of clouds, ocean currents. Also need to isolate the effects of dust. MODIS AOT product includes contribution from dust, DMS, biomass burning etc.

Page 29: 1 Foundations VI: Discovery, Access and Semantic Integration Data Mining and Knowledge Discovery - Continued Deborah McGuinness and Joanne Luciano with

Case for SAM

• MODIS aerosol products don’t provide speciation• Why performing this data analysis is hard?

– Need to classify and detect Dust accurately – Need to classify and detect other aerosols (eg. DMS accurately)– Need to segment the detected region in order to account for

advection effects and correlation coefficient.• What will SAM provide?

– Provide capability to create a workflow to perform classification– Provide capability to create a workflow to segment detected regions

Bottom line: • Scientist does not have to write all the code to perform the

analysis• Can compose workflows that utilize distributed data/services• Can share the workflow with others to collaborate, reuse and

modify