Data Mining 2 (ex Análisis Inteligente de Datos y Data Mining) Lluís A. Belanche

Post on 02-Jan-2016

Data Mining 2 (ex Análisis Inteligente de Datos y Data Mining)

Lluís A. Belanche

www.lsi.upc.edu/...

/~belanche/docencia/aiddm/aiddm.html

/~avellido/teaching/data_mining.htm

Contents of the course (hopefully)

1. Introduction & methodologies

2. Exploratory DM through visualization

3. Pattern recognition: introduction

4. Pattern recognition: the Gaussian case

5. Feature extraction

6. Feature selection & weighting

7. Error estimation

8. Linear methods are nice!

9. Probability in Data Mining

10. Latency, generativity, manifolds and all that

11. Application of GTM: from medicine to ecology

12. DM Case studies

Sorry guys! … no fuzzy systems …

Error estimation

Feature extraction, selection and weighting have many uses

Linear classifiers are nice! (I)

Linear classifiers are nice! (II) Transformation

Φ(x) = [ φ1(x), φ2(x), …, φm(x) ]

with x = [ x1, x2, …, xn ]

Useful for “ascending” (m > n) or “descending” (m < n)

with 0 < m, n < ∞ (integers) … an example?
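One possible answer to the request for an example, offered as a sketch rather than as the example used in the course: the XOR problem cannot be separated by a line in its n = 2 original inputs, but a linear classifier handles it after an “ascending” map to m = 3 features. The map phi, the weights W and the bias B are illustrative choices that do not come from the slides.

```python
# XOR labels: not linearly separable in the original 2-D input space.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def phi(x):
    """Hypothetical 'ascending' map from n = 2 inputs to m = 3 features."""
    x1, x2 = x
    return [x1, x2, x1 * x2]


# Hand-picked (assumed) weights and bias of a linear classifier in feature space.
W = [1.0, 1.0, -2.0]
B = -0.5


def classify(x):
    """Linear decision rule applied to phi(x) instead of x."""
    score = sum(w * f for w, f in zip(W, phi(x))) + B
    return 1 if score > 0 else 0


if __name__ == "__main__":
    for x, label in XOR.items():
        print(x, "predicted:", classify(x), "target:", label)
```

The classifier itself stays linear; all the non-linearity lives in the product feature x1 * x2.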

Linear classifiers are nice! (III) Nets

Φ(x) = [ φ1(x), φ2(x), …, φm(x) ]

with x = [ x1, x2, …, xn ]

x → Φ(x)

Utility

• This is a very powerful setting

• Let us suppose:

– r > s: increase in dimension, which increases expressive power and eases the task for almost any learning machine

– r < s: decrease in dimension, which serves visualization, compaction, noise reduction, and removal of useless information

Contradictory !?
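The decrease-in-dimension case can be sketched with a projection of 2-D points onto their direction of largest variance (the first principal component). This is one illustrative technique chosen here, not necessarily the one the course has in mind; the function names are hypothetical, and the closed-form angle applies only to the 2-D case.

```python
import math


def first_principal_direction(points):
    """Return the mean and a unit vector along the largest-variance direction."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project(points):
    """'Descending' map: each 2-D point becomes a single coordinate."""
    (mx, my), (ux, uy) = first_principal_direction(points)
    return [(p[0] - mx) * ux + (p[1] - my) * uy for p in points]


if __name__ == "__main__":
    data = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(project(data))
```

For points that lie exactly on a line, as in the usage above, the single coordinate loses no information; for noisy data it keeps the dominant direction and discards the rest.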

On intelligence …

• What is Intelligence?

• What is the function of Intelligence?

to ensure survival in nature

• What are the ingredients of intelligence?

– Perceive in a changing world
– Reason under partial truth
– Plan & prioritize under uncertainty
– Coordinate different simultaneous tasks
– Learn under noisy experiences

Parking a Car (difficult or easy?)

“Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours of maneuvering and precise measurements of distance and angular position to solve the problem.”

High precision carries a high cost.

The primordial soup

Soft Computing

Rough Sets

Fuzzy Logic

Neural Networks

Evolutionary Algorithms

Chaos & Fractals

Belief Networks

What could MACHINE LEARNING possibly be?

In the beginning, there was a set of examples …

• To exploit imprecision, uncertainty, robustness, data dependencies, learning and/or optimization ability, to achieve a working solution to a problem which is hard to solve.

• To find an exact (approximate) solution to an imprecisely (precisely) formulated problem.

So what is the aim?

The challenge is to put these capabilities into use by devising methods of computation which lead to an acceptable solution at the lowest possible cost.

This should be the guiding principle

Different methods = different roles

Fuzzy Logic: the algorithms for dealing with imprecision and uncertainty

Neural Networks: the machinery for learning and function approximation with noise

Evolutionary Algorithms: the algorithms for adaptive search and optimization

Rough Sets: uncertainty arising from the granularity in the domain of discourse

Examples of soft computing

• TSP with 10^5 cities:

– accuracy within 0.75%: 7 months
– accuracy within 1%: 2 days

• Compare “absolute best for sure” with “very good with very high probability”
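The trade-off between a guaranteed optimum and a cheap good-enough answer can be tried on a tiny travelling-salesman instance. The sketch below compares exhaustive search with the nearest-neighbour heuristic. That heuristic is a stand-in picked for brevity: it is deterministic, carries no quality guarantee, and is not claimed to be the method behind the timings quoted above. The function names are hypothetical.

```python
import itertools
import math


def tour_length(tour, cities):
    """Length of the closed tour visiting the cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def exact_tour(cities):
    """'Absolute best for sure': try every ordering, factorial running time."""
    rest = range(1, len(cities))
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length((0,) + p, cities),
    )
    return (0,) + best


def greedy_tour(cities):
    """Nearest-neighbour heuristic: fast, no optimality guarantee."""
    unvisited = set(range(1, len(cities)))
    tour = [0]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, cities[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tuple(tour)


if __name__ == "__main__":
    cities = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0.5), (0.5, 2), (3, 3)]
    print("exact :", tour_length(exact_tour(cities), cities))
    print("greedy:", tour_length(greedy_tour(cities), cities))
```

Exhaustive search is already impractical at a few dozen cities, while the heuristic's cost grows only quadratically with the number of cities.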

Are you one of the top guns?

• Consider …

– Search space of size s
– Draw N random samples
– What is the probability p that at least one of them is in the top t?

• Answer: p = 1 - (1 - t/s)^N

• Example: s = 10^12, N = 100,000, t = 1,000 gives p ≈ 1 in 10,000!
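The answer follows if the samples are assumed to be drawn uniformly, independently and with replacement (the slide does not state this): a single draw misses the top t with probability 1 - t/s, so all N draws miss with probability (1 - t/s)^N. A minimal numerical check of the example, where the function name is hypothetical:

```python
import math


def prob_hit_top(s, n, t):
    """P(at least one of n uniform draws with replacement lands in the top t of s)."""
    if t >= s:
        return 1.0
    # 1 - (1 - t/s)**n, computed via log1p/expm1 to limit round-off when t/s is tiny.
    return -math.expm1(n * math.log1p(-t / s))


if __name__ == "__main__":
    p = prob_hit_top(s=10**12, n=100_000, t=1_000)
    print(f"p = {p:.6e}, i.e. about 1 in {1 / p:,.0f}")
```

When N·t/s is small, p is close to N·t/s, which is where the figure of 1 in 10,000 comes from.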

On Algorithms

• What is worthwhile?

[Figure: efficiency (vertical axis) against problems (horizontal axis), with labels “P”, “Specialized Algo.” and “Generic Algorithms”]

• Specialized algorithms: best performance for special problems

• Generic algorithms: good performance over a wide range of problems

Words are important !

• What is a theory ?

• What is an algorithm ?

• What is an implementation ?

• What is a model ?

• What does “non-linear” mean ?

• What does “non-parametric” mean ?

The problem of induction

• Classical problem in Philosophy

• Example: 1, 2, 3, 4, 5, ?

• A more thorough example: JT
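The sequence example can be made concrete: for any candidate next value there is a polynomial rule that reproduces 1, 2, 3, 4, 5 and then continues with that value, so the data alone cannot settle the answer. The sketch below builds such a rule by Lagrange interpolation with exact fractions. It is one illustration of the point, with hypothetical function names, and not material from the course.

```python
from fractions import Fraction


def lagrange_value(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def rule_continuing_with(next_value):
    """A polynomial rule that fits 1..5 and then continues with next_value."""
    points = [(k, k) for k in range(1, 6)] + [(6, next_value)]
    return lambda x: lagrange_value(points, x)


if __name__ == "__main__":
    for guess in (6, 42, -7):
        rule = rule_continuing_with(guess)
        print([int(rule(k)) for k in range(1, 7)])
```

Preferring 6 over 42 is therefore a matter of learning bias, such as a preference for the simplest rule, and not something the five observations imply.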

What are the conditions for successful learning?

• Training data (sufficiently) representative

• Principle of similarity

• Target function within capacity of the learner

• Non-dull learning algorithm

• Enough computational resources

• A correct (or close to) learning bias

And the Oscar goes to …

The real problem is not whether machines think, but whether men do.

B.F. Skinner, Contingencies of Reinforcement