modeling ultra-high dimensional feature selection as a slow intelligence system wang yingze cs 2650...

27
Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Upload: nathaniel-osborne

Post on 17-Jan-2016

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Modeling Ultra-high Dimensional Feature Selection as a Slow

Intelligence System

Wang Yingze

CS 2650 Project

Page 2: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Outline

Midway results

Framework of Slow Intelligence System

Tasks for project

Iterative feature selection

IntroductionIntroduction

Page 3: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Introduction

UltraHigh-dimensional variable selection is the hot topic in statistics and machine learning.

Model relationship between one response and associated features , based on a sample of size n.

Y1, , pX X

Y X

Page 4: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Application

Associated studies between phenotypes and SNPs. Gene selection or disease classification in bioinformatics.

n Patients’ degree of disease sickness

each patient’s data with p genes

Important genes selected

one Gene expression level

Page 5: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Challenges

Dimensionality grows rapidly with interactions of the features

Portfolio selection and networking modeling: 2000 stocks involves over 2 millions unknown parameters in the covariance matrix.

Protein-protein interaction: the sample size may be in the order of thousands, but the number of features can be in the order of millions.

To construct effective method to learn relationship between features and response in high dimension for scientific purposes.

Page 6: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Outline

Midway results

Framework of Slow Intelligence System

Tasks for project

Iterative feature selection

Introduction

Iterative feature selection

Page 7: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Existing methods

1. LASSO : L1 regularization linear regression

2. Forward regression: sequentially add variables

3. Backward regression: start with them all then delete them on the bases of smallest change in

4. Stepwise regression: at each step one can be entered (on basis of greatest improvement in but one also may be removed if the change (reduction) in is not significant.

5. Least-angle regression: estimated parameters are increased in a direction equiangular to each one's correlations with the residual.

2R

2R2R

Page 8: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

State-of-the-Art Approach

Interactive feature selection method proposed by Jianqing Fan in Princeton University

“Ultrahigh dimensional feature selection: beyond the linear model”

Contribution:

Ultrahigh dimensional data Accuracy slow

Page 9: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Step 1: Large-scale screening

Apply Pearson correlation and ranking to pick a set 1

P

1A

Page 10: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Step 2 : Moderate-scale selection

Employ an existing regression method to select a subset

of these indices .1M

P1A

1M

1A

Page 11: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Step 3: Large-scale screening

Adding other features one each time with to the regression model:

11 { }{1 }\ ( , )j M jj P M L Loss Y X

P1A

1M

j

1M

Page 12: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Step 3 (con’t)

Ranking j features according to , select the top numbers of features. And add to forming the new feature set

Repeats Steps 2-3, select new from , then form new

until

jL

1 1| | | |A M 1M

2A

P2A

1M

2M 2A 3M

1l lM M P

lM

P2A

1M

Page 13: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Outline

Midway results

Framework of Slow Intelligence System

Tasks for project

Iterative feature selection

Introduction

Framework of Slow Intelligence System

Page 14: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Slow Intelligence System

“A General Framework for Slow Intelligence Systems”, by S.K. Chang, International Journal of Software Engineering and Knowledge Engineering

Page 15: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Time Controller

Slow decision cycle(s) to complement quick decision cycle(s): SIS possesses at least two decision cycles. Therefore, Slow Intelligence Systems work usually correctly but not always fast.

Time Controller Design Panic Button Petri-net model

Page 16: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Motivation

“Modeling Human Intelligence as A Slow Intelligence System”

by Tiansi Dong, DMS2010

SIS for object mapping between scenes Two object tracing results due to two different priorities

1. Priority on spatial changes (minimal spatial changes)

2. Priority on object categories (objects are mapped within same categories)

Page 17: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

SIS1 for Object tracing (priority on spatial changes)

Enumerate all possible mapping

Elimination and concentration the mapping with the minimal spatial changes

Page 18: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

SIS2 for Object tracing (priority on object category)

Enumerate all possible mapping

Elimination and concentration the mapping with the same category

Page 19: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Outline

Midway results

Framework of Slow Intelligence System

Tasks for project

Iterative feature selection

Introduction

Tasks for project

Page 20: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Task one

Modeling Ultra-high Dimensional Feature Selection as a

Slow Intelligence System

Use SIS to model Iterative feature selection method to five phases: Enumeration, Elimination, Adaptation, Propagation, Concentration.

The whole SIS system contains additional Sub-SIS system.

Represent it in Mathematical formulation

Page 21: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Task two

Design time controller in term of Petri Net and introduce

Knowledge base

Time controller controls the time to evoke each phase of

SIS. Knowledge base contains five different moderate-scale

selection algorithms. KB can be changed and updated in the slow cycle.

Represent Time controller in Petri Net using ReNew Editor

Page 22: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Future work

Experiment: Use some real data (colon cancer data) to do the experiment and compare the results with some existing feature selection method like LASSO, forward, backward regression, etc.

Weka: Demo

I will use some visualization tool to visualize the result and

the process of feature selection.

Page 23: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Outline

Midway results

Framework of Slow Intelligence System

Tasks for project

Iterative feature selection

Introduction

Midway results

Page 24: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Diagram: Main SIS System

Page 25: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Diagram: Sub SIS system

Page 26: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Diagram: Petri-Net Model

Page 27: Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project