Information integration, life-cycle and visualization

Peter Fox, Xinformatics 4400/6400 – Week 6, March 4, 2014


Contents

• Review of last class, reading

• Information integration

• Information life-cycle

• Information visualization

• Projects?

• Next…

Information integration

• Involves combining information residing in different sources and providing users with a unified view of them.

• This process becomes significant in a variety of situations both commercial (e.g. when two similar companies need to merge their databases) and scientific (e.g. combining research results from different bioinformatics repositories).

• Integration appears with increasing frequency as the volume of existing information and the need to share it explode.
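
As a minimal sketch of the "unified view" idea, assume two hypothetical sources that describe the same customers with different field names. The field mappings, the sample records and the `unified_view` helper are illustrative assumptions, not something taken from the lecture.

```python
# Hypothetical per-source mappings from local field names to a shared schema.
SOURCE_A_MAP = {"cust_id": "id", "full_name": "name", "town": "city"}
SOURCE_B_MAP = {"customerId": "id", "name": "name", "city": "city"}


def to_shared_schema(record, field_map):
    """Rename a record's fields to the shared schema, dropping unmapped ones."""
    return {shared: record[local]
            for local, shared in field_map.items() if local in record}


def unified_view(sources):
    """Merge records from (records, field_map) pairs, keyed by the shared 'id'.

    Later sources only fill in fields that earlier sources lack;
    they never overwrite a value that is already present.
    """
    view = {}
    for records, field_map in sources:
        for record in records:
            row = to_shared_schema(record, field_map)
            merged = view.setdefault(row["id"], {})
            for key, value in row.items():
                merged.setdefault(key, value)
    return view


# Made-up example data for the two sources.
source_a = [{"cust_id": 1, "full_name": "Ada Lovelace"}]
source_b = [{"customerId": 1, "name": "A. Lovelace", "city": "London"},
            {"customerId": 2, "name": "Alan Turing", "city": "Wilmslow"}]

view = unified_view([(source_a, SOURCE_A_MAP), (source_b, SOURCE_B_MAP)])
```

The "first source wins" rule is only one possible conflict policy; a real integration would need to decide which source is authoritative for each field.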

Information integration

• It has become the focus of extensive theoretical work, and numerous open problems remain unsolved.

• In management circles, people frequently refer to data integration as "Enterprise Information Integration" (EII). (Source: Wikipedia)

• Is this an information management challenge (rhetorical question)?

• Integration discussion context
– Data Integration vs. Data Interoperability

An example - Geospatial

• Much of the work on information integration has focused on the dynamic integration of structured data sources, such as databases or XML data.

• With the more complex geospatial data types, such as imagery, maps, and vector data, researchers have focused on the integration of specific types of information, such as placing points or vectors on maps, but much of this integration is only partially automated.

• The challenge is that the dynamic integration of online data and geospatial data is beyond the state of the art of existing integration systems.

Geospatial

• The conflation process divides into the following tasks: (1) find a set of conjugate point pairs, termed "control point pairs", in both vector and image datasets, (2) filter control point pairs, and (3) utilize algorithms, such as triangulation and rubber-sheeting, to align the rest of the points and lines in the two datasets using the control point pairs.

• Typically, human input has been essential to find control point pairs and/or filter control points.
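
The filter and align steps can be sketched in a few lines. This is a deliberately simplified stand-in: it assumes the control point pairs are already given (step 1), filters them by distance from the median offset (step 2), and aligns the remaining points with a single global shift rather than triangulation or rubber-sheeting (step 3). All function names, the tolerance and the coordinates are hypothetical.

```python
from statistics import median


def offsets(pairs):
    """Return the (dx, dy) offset from vector point to image point for each pair."""
    return [(ix - vx, iy - vy) for (vx, vy), (ix, iy) in pairs]


def filter_pairs(pairs, tolerance):
    """Step 2: drop pairs whose offset is far from the median offset."""
    offs = offsets(pairs)
    mdx = median(dx for dx, _ in offs)
    mdy = median(dy for _, dy in offs)
    return [pair for pair, (dx, dy) in zip(pairs, offs)
            if abs(dx - mdx) <= tolerance and abs(dy - mdy) <= tolerance]


def estimate_shift(pairs):
    """Step 3 (simplified): average offset of the surviving control point pairs."""
    offs = offsets(pairs)
    n = len(offs)
    return (sum(dx for dx, _ in offs) / n, sum(dy for _, dy in offs) / n)


def align(points, shift):
    """Apply the estimated shift to the rest of the vector points."""
    return [(x + shift[0], y + shift[1]) for x, y in points]


# Made-up control point pairs: (vector point, image point); the last is an outlier.
control_pairs = [((0, 0), (2, 1)), ((10, 0), (12, 1)),
                 ((0, 10), (2, 11)), ((5, 5), (30, 40))]

good_pairs = filter_pairs(control_pairs, tolerance=0.5)
shift = estimate_shift(good_pairs)
aligned = align([(1, 1), (4, 6)], shift)
```

A global shift cannot correct local distortion, which is exactly what rubber-sheeting addresses; the sketch only shows where filtered control points enter the pipeline.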

Vectors on maps

Value Chain – data.gov – Integration Context

[Figure: value chain diagram around Data.gov with a Supply Side (Community of Suppliers) and a Use Side (Community of Users). Stage labels: Acquire Data, Build Dataset, Publish Dataset, Enable Discovery, Enable Use, Connect, Discover, Participate. Annotations: "Supply Chain Management – no geo integration focus" and "Access and Interoperability Focused".]

Courtesy Jim Barrett

Typical Spatial Integration

• Data and Information Quality
– Temporal – currentness, vintage…
– Semantic – meaning of the object and its attributes
– Spatial dimensions (X,Y,Z)
– Accuracy (positional)
– Topology/modeling
– Resolution
– Representation

• These are all important qualities – attaining them will require not only technology but also improvements in how we manage

Courtesy Jim Barrett

What do users need to know about obtaining geospatial data?

Courtesy Jim Barrett

Simple supply-side questions that are very hard to answer

• Who produces the information I need?

• Are they “the” recognized authority? How can I tell?

• How often will it be re-published?

– Is the supply predictable and reliable? Can I count on it?

• Do the data have a geospatial characteristic?

– What are its geospatial qualities (specs) and provenance?

– Is it consistently defined in its meaning?

– What is the scope of its coverage?

• Will the data be maintained?

– Geometry and models

– Attributes and metadata

• Where do I get it and in what forms?

Courtesy Jim Barrett

They should not have to ask whether it has been integrated.

Courtesy Jim Barrett

What is stopping us from answering these basic questions?

Courtesy Jim Barrett

Barriers to integration

• What is preventing our information from being integrated?
– Acquisition
• Uncoordinated acquisition strategies at national level
• Barrier between business data and geospatial data, e.g. schools, minerals
• Few means to broker and optimize requirements from consumers
– Production
• Quality of our metadata and when and how we get it
• Unclear operational roles in a national data framework (NSDI)
• Absence of a granular or meaningful trustworthy chain of authority
• Absence of a schedule to communicate what is going to be happening

Courtesy Jim Barrett

Where are the problems occurring in the Value Chain?

[Figure: the data.gov value chain diagram (Supply Side / Community of Suppliers, Use Side / Community of Users; Acquire Data, Build / Intra Dataset Integration, Publish Dataset, Enable Discovery, Enable Use, Connect, Discover, Participate), here labelled "Supply Chain Management Data Integration Focused" and "Access and Interoperability Focused". Problem callouts: "Downstream Integration $$$", "Ambiguous Cataloging and semantics", "Gap in planning view of Acquisition", "Gap in what gets integrated".]

Courtesy Jim Barrett

What we have is many value chains running in parallel. It is hard to do integration without a systematic collaborative approach.

Courtesy Jim Barrett


We resemble this!

Courtesy Jim Barrett

We need to integrate the supply chain. How can we think about the problem differently?

Courtesy Jim Barrett


Organizing Principles

• A supply chain is a system of organizations, people, technology, activities, information and resources involved in moving a product or service from supplier to customer.

• Supply chain activities transform natural resources, raw materials and components into a finished product that is delivered to the end customer. In our case, the product is information.

• In sophisticated supply chain systems, used products may re-enter the supply chain at any point where residual value is recyclable. Supply chains link value chains.

Courtesy Jim Barrett


Supply Chain Reference Model

Courtesy Jim Barrett

Value Chain – data.gov – Integration Context

[Figure: the data.gov value chain diagram shown earlier, repeated here with the annotations "Supply Chain Management – no geo integration focus" and "Access and Interoperability Focused".]

Courtesy Jim Barrett


Why we need to think differently!

Courtesy Jim Barrett

Architect and Design towards integration!

Courtesy Jim Barrett


Indonesian NSDI

Courtesy Jim Barrett


Recall elements/ forms of information

• Structured/ un-structured, content, context

• Presentation and organization

• Syntax-semantics-pragmatics

• Managed, designed and architected.

Recall elements/ forms of information

• Integration poses an important challenge here
– Two forms presented/organized differently
– Different structure, semantics…

• Information back to data back to information

Aiding integration

• Usually an integration capability is HIGHLY curated or left entirely to the end user

• If left to the user, the result is a new product which must also be managed and shared

• “I can’t integrate what I don’t understand”

• Key idea: provide for integratability !!!
– Standards – formats for sure but also
– Metadata
– Semantics

Different contexts?

• Relies especially on structural/ use metadata

• Provide different means/mode for integration
– E.g. geospatial, uses … well ‘space’, really surfaces (latitude, longitude)
– Geological data integration uses time and feature (of interest) – why? Yes, things move
– Atmospheric science, e.g. chemistry or structure of the atmosphere may use ‘layers’ or pressure as an indicator for position
– Comparing in-situ with remotely sensed information in many fields, e.g. medicine
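
To illustrate 'space' as the means of integration, here is a minimal sketch that pairs in-situ observations with remotely sensed ones falling in the same latitude/longitude grid cell. The record layout, the one-degree cell size and both function names are assumptions made for the example.

```python
import math


def grid_key(lat, lon, cell_deg=1.0):
    """Map a latitude/longitude to the index of the grid cell containing it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def integrate_by_location(in_situ, remote, cell_deg=1.0):
    """Pair each in-situ observation with remote observations in the same cell."""
    remote_by_cell = {}
    for obs in remote:
        key = grid_key(obs["lat"], obs["lon"], cell_deg)
        remote_by_cell.setdefault(key, []).append(obs)

    pairs = []
    for obs in in_situ:
        key = grid_key(obs["lat"], obs["lon"], cell_deg)
        for match in remote_by_cell.get(key, []):
            pairs.append((obs, match))
    return pairs


# Made-up observations; only the first remote record shares a cell with the station.
in_situ_obs = [{"lat": 42.7, "lon": -73.7, "temp_c": 3.1}]
remote_obs = [{"lat": 42.2, "lon": -73.1, "temp_c": 2.8},
              {"lat": 10.0, "lon": 10.0, "temp_c": 30.0}]

matched = integrate_by_location(in_situ_obs, remote_obs)
```

The same pattern applies with a different key: a pressure level for atmospheric data, or a time interval plus feature of interest for geological data.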

Informatics considerations

• Be aware of what the means for integration is and what can be used

• This is more often than not what leads to new findings, and abductive reasoning… one of our goals

Life Cycle

Life cycle - definitions

• Life-cycle elements

– Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction)

– Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future (http://www.dcc.ac.uk/FAQs/data-curator)

– Preservation: Process of retaining usability of data in some source form for intended and unintended use

– Stewardship: Process of maintaining integrity across acquisition, curation and preservation
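
One common way to make "maintaining integrity" concrete is a fixity check: record a checksum when an artefact is acquired and verify it again later. The sketch below assumes SHA-256 and an in-memory record; the function names and the record layout are hypothetical, not a prescribed stewardship procedure.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(name: str, data: bytes) -> dict:
    """Create a fixity record at acquisition time."""
    return {"name": name, "size": len(data), "sha256": fingerprint(data)}


def verify_fixity(entry: dict, data: bytes) -> bool:
    """Check later (curation, preservation) that the bytes are unchanged."""
    return entry["size"] == len(data) and entry["sha256"] == fingerprint(data)


# Made-up artefact contents for the example.
original = b"station,temp_c\nALB,3.1\n"
entry = record_fixity("observations.csv", original)

unchanged_ok = verify_fixity(entry, original)
tampered_ok = verify_fixity(entry, original.replace(b"3.1", b"9.9"))
```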

Definitions ctd.

• Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.

The nature of the challenge

• To architect information systems today
– You may play many roles
– You may not get all the metadata or information you need even if you get the data
– You will need skills that you were not taught

• To work with end-users today
– You may have lots of technical experience
– You will need new skills in addressing the changing use of data and information
– One ‘size’ does not fit all

Acquisition

• Learn / read what you can about the means of acquisition
– Documents may not be easy to find
– Bias is everywhere!!!

• Document things as you go (I know you hate it, but please get over that)

Curation

• From producer to consumer!

• Consider the organization and presentation of the data as information
– Design factors to reduce uncertainty
– Making use of semiotics – you should know how!

• Document what has been (and has not been) done
– Look to add metainformation

Preservation

• ‘Archiving’ is only one component

• Intent is that ‘you can open it any time in the future’ and that ‘it will be there’
– Where are your class notes from last term?
– This term?

• This involves steps that may not be conventionally thought of

• Think 10, 20, 50, 200 years…. looking historically gives some guide to future considerations

Remember

• The life cycle applies within and before and after your use case…

• So, let’s look at one in a little more detail

How the information is created

• Systemic

• Environmental

• Trial-and-error (or ad-hoc)

How is information delivered?

• White paper (a document)
• Web site FAQ
• Web site informational
• Web site directed (link sent with e-mail, and so on) to a specific Web site
• Application-based delivery via managed expert system
• One-to-one presentation:
– Word of mouth
– Ad-hoc communication

How the information is managed

• Complexity of the information

• Complexity of the creation process

• Complexity of the management system

Complexity=Uncertainty?

• Financial impact of creation

Type of information created

• Tacit (created and stored informally):
– Human memory
– Localized, e.g. hard drive of the computer
– Movement of tacit information into a formalized structure

• Explicit (created and stored formally):
– Network shared
– Network Web site/intranet
– Informal knowledge-management system
– Document-management system
– Formal KM system

For information creation:

• Consider the
– Value of the source
– Age of the information
– Proximity of the information to the consumer
– Source of the information, and previous interactions with that specific source

• Means for Re-creation??

Value of the source

• Age of the information

• Proximity of the information to the consumer

• Source of the information, and previous interactions with that specific source

Life cycle is a complex issue

• Must be managed

• Documented

• As part of the use case, but also often outside it

Next

• Visualize whirled peas

Information Visualization

• Defn: "to form a mental vision, image, or picture of (something not visible or present to sight, or of an abstraction); to make visible to the mind or imagination" [The Oxford English Dictionary, 1989]

• Direct link to cognition and mental representation

• Semiotics (again)

Why visualization?

• Reducing amount of data, quantization

• Patterns

• Features

• Events

• Trends

• Irregularities

• Exit points for analysis
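
As a small illustration of "reducing amount of data, quantization", the sketch below maps continuous values onto a fixed number of equal-width levels, the kind of step that precedes color coding. The equal-width scheme, the function name and the sample values are assumptions for the example; other binning schemes are equally valid.

```python
def quantize(values, levels):
    """Map each value to an integer level in [0, levels - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in the lowest level.
        return [0 for _ in values]
    width = (hi - lo) / levels
    # The maximum value would land in bin 'levels', so clamp it to the top bin.
    return [min(int((v - lo) / width), levels - 1) for v in values]


# Made-up measurements reduced to four levels.
readings = [0.0, 2.5, 5.0, 7.5, 10.0]
classes = quantize(readings, levels=4)
```

Each level can then be assigned one color, so the viewer sees four classes instead of a continuum of values.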

Types of visualization

• Color coding (including false color) – color theory from last week

• Classification of techniques is based on
– Dimensionality
– Information being sought, i.e. purpose (design)

• Line plots
• Contours
• Surface rendering techniques
• Volume rendering techniques

• Animation techniques

• Non-realistic, including ‘cartoon/ artist’ style

Visualization formats

• Many – vector, raster (image), animation, multi-dimensional, …

However, information cf. data…

• Think back to your presentations on semiotics and the visual representations of information systems – both good and bad

• Not just a matter of the ‘producer’ view… consider the ‘consumer’ view, i.e. what is the goal of the visualization?

• This is a time when
– Experience helps a lot
– But so does listening and gaining external feedback

Remember - metadata

• Many of these formats already contain metadata or fields for metadata, use them!

• How do you visualize:
– Metadata?

New modes

• http://www.visualizing.org/
• http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/
• http://agbeat.com/business-marketing/piktochart-simple-infographic-creator-online-for-the-busy-professional/
• http://ijustdid.org/2012/06/infographics-generators/

• Many modes:
– http://www.siggraph.org/education/materials/HyperVis/domik/folien.html

visualizing.org

Visualization

Managing visualization products

• The importance of a ‘self-describing’ product

• Visualization products are not just consumed by people

• How many images, graphics files do you have on your computer for which the origin, purpose, use is still known?

• How are these logically organized?
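
One way to approximate a ‘self-describing’ product, when the image format's own metadata fields are not used, is a sidecar file stored next to the image. The sketch below writes and reads such a file as JSON. The `.meta.json` suffix, the field names and both helper functions are hypothetical choices for illustration.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(image_path):
    """Return the path of the metadata file that sits next to the image."""
    image_path = Path(image_path)
    return image_path.with_name(image_path.name + ".meta.json")


def write_sidecar(image_path, origin, purpose, source_data):
    """Record where a visualization came from and what it is for."""
    metadata = {
        "file": Path(image_path).name,
        "origin": origin,
        "purpose": purpose,
        "source_data": source_data,
    }
    path = sidecar_path(image_path)
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


def read_sidecar(image_path):
    """Load the metadata so that software, not just people, can use it."""
    return json.loads(sidecar_path(image_path).read_text(encoding="utf-8"))


# Usage with a throwaway directory and an empty placeholder image file.
with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "temperature_map.png"
    image.write_bytes(b"")
    write_sidecar(image, origin="class project", purpose="week 6 demo",
                  source_data="observations.csv")
    loaded = read_sidecar(image)
```

Because the record is machine-readable, it also serves the point that visualization products are not consumed only by people.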

Discussion

• About integration

• About information life-cycle in general

• About visualization

• Degree to which these topics are part of your projects

Reading for this week

• Is retrospective and covers the topic areas
– Information Integration
– Information Life Cycle
– Information Visualization

Project Assignment

• A) Analysis of existing information system content and architecture, critique, redesign and prototype redeployment

• B) Pursuit of a detailed use case around a particular area of informatics, includes developing a prototype IS, architecture, design, etc.

• Due April 29 (write up) and May 6 (presentation)
• That’s 6 (7) weeks (after break)
• Check in every ~ 2 weeks
• Will set aside class time to meet

Teams (after Spring break)

• IR:
• Red:
• Orange:
• Yellow:
• Green:
• Blue:
• Indigo:
• Violet:

Let’s look at these:

• http://www.nws.noaa.gov/

• http://www.nodc.noaa.gov/, http://www.ngdc.noaa.gov/, http://www.ncdc.noaa.gov/

• http://www.bco-dmo.org/

• http://giovanni.gsfc.nasa.gov

• http://mirador.gsfc.nasa.gov

• 50 best web sites of 2012

• http://www.coolhomepages.com/

• Worst web sites…

What is next

• Spring break!

• Then Assignment 3 presentations