Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System
Wang Yingze
CS 2650 Project
Outline
Introduction
Iterative feature selection
Framework of Slow Intelligence System
Tasks for project
Midway results
Introduction
Ultrahigh-dimensional variable selection is a hot topic in statistics and machine learning.
Model the relationship between one response $Y$ and its associated features $X_1, \dots, X_p$, based on a sample of size $n$.
Application
Association studies between phenotypes and SNPs. Gene selection or disease classification in bioinformatics.
Diagram labels: n patients' degree of disease severity; each patient's data with p genes; one gene expression level; important genes selected.
Challenges
Dimensionality grows rapidly with interactions of the features
Portfolio selection and network modeling: 2,000 stocks involve over 2 million unknown parameters in the covariance matrix.
Protein-protein interaction: the sample size may be on the order of thousands, but the number of features can be on the order of millions.
Goal: construct an effective method to learn the relationship between features and response in high dimensions for scientific purposes.
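The covariance-matrix count can be checked by counting the free entries of a symmetric p-by-p matrix, p(p+1)/2. A minimal sketch of that arithmetic follows; the function name is illustrative and not from the slides.

```python
def covariance_param_count(p: int) -> int:
    """Number of free parameters in a symmetric p x p covariance matrix."""
    # p variances on the diagonal plus p*(p-1)/2 distinct off-diagonal covariances.
    return p * (p + 1) // 2


print(covariance_param_count(2000))  # 2001000, i.e. just over 2 million
```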
Iterative feature selection
Existing methods
1. LASSO: L1-regularized linear regression.
2. Forward regression: sequentially add variables.
3. Backward regression: start with all variables, then delete them on the basis of the smallest change in $R^2$.
4. Stepwise regression: at each step a variable can be entered (on the basis of the greatest improvement in $R^2$), but one may also be removed if the change (reduction) in $R^2$ is not significant.
5. Least-angle regression: estimated parameters are increased in a direction equiangular to each one's correlations with the residual.
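As an illustration of method 2, here is a minimal pure-Python sketch of forward regression driven by the gain in $R^2$. It assumes an intercept is always in the model and that features are given as a list of columns; the function names and the `min_gain` stopping threshold are hypothetical choices, not part of the slides.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def remove_component(x, q):
    """Subtract from x its least-squares projection onto q."""
    c = dot(x, q) / dot(q, q)
    return [a - c * b for a, b in zip(x, q)]


def forward_regression(columns, y, max_features, min_gain=1e-9):
    """Greedy forward selection; returns (selected indices, cumulative R^2)."""
    resid = center(y)              # residual of the intercept-only model
    tss = dot(resid, resid)        # total sum of squares
    if tss == 0:
        return [], 0.0
    # Candidates are kept orthogonal to the intercept and to all selected columns,
    # so each candidate's R^2 gain has a closed form.
    cands = {j: center(col) for j, col in enumerate(columns)}
    selected, r2 = [], 0.0
    while cands and len(selected) < max_features:
        gains = {}
        for j, x in cands.items():
            norm = dot(x, x)
            if norm > 1e-12:       # skip columns already explained by the model
                gains[j] = dot(resid, x) ** 2 / (norm * tss)
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break                  # no remaining variable improves R^2
        q = cands.pop(best)
        resid = remove_component(resid, q)
        cands = {j: remove_component(x, q) for j, x in cands.items()}
        selected.append(best)
        r2 += gains[best]
    return selected, r2


cols = [[1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 1, 0], [2, 1, 4, 3, 6, 5]]
y = [3, 1, 3, 1, 3, 1]             # y = 2 * cols[1] + 1 exactly
print(forward_regression(cols, y, max_features=3))
```

Backward and stepwise regression would reuse the same $R^2$ bookkeeping, with a removal step added.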
State-of-the-Art Approach
Iterative feature selection method proposed by Jianqing Fan at Princeton University
“Ultrahigh dimensional feature selection: beyond the linear model”
Contribution:
Ultrahigh-dimensional data; accuracy; slow
Step 1: Large-scale screening
Apply Pearson correlation and ranking to pick a set $A_1$ of features from the full feature index set $P$.
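The screening step can be sketched as ranking features by absolute Pearson correlation with the response and keeping the top k. This is a minimal illustration under assumptions: features are stored as a list of columns, constant columns get a score of 0, and the names `pearson`, `screen`, and `k` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen(columns, y, k):
    """Return the indices of the k features with the largest |correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


y = [1, 2, 3, 4, 5]
columns = [
    [2, 1, 2, 1, 2],        # uncorrelated with y
    [1, 2, 3, 4, 5],        # perfectly correlated
    [-2, -4, -6, -8, -10],  # perfectly anti-correlated
    [7, 7, 7, 7, 7],        # constant
]
print(screen(columns, y, k=2))
```

Taking the absolute value means strongly negatively correlated features are kept as well.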
Step 2: Moderate-scale selection
Employ an existing regression method to select a subset $M_1$ of the screened indices $A_1$.
Step 3: Large-scale screening
Add each of the other features, one at a time, to the regression model containing $M_1$:
$L_j = \mathrm{Loss}(Y, X_{M_1 \cup \{j\}})$ for each $j \in P \setminus M_1$, where $P = \{1, \dots, p\}$.
Step 3 (cont'd)
Rank the features $j$ according to $L_j$ and select the top $|A_1| - |M_1|$ features. Add them to $M_1$, forming the new feature set $A_2$.
Repeat Steps 2-3: select a new $M_2$ from $A_2$, then form a new $M_3$ in the same way, and so on, until $M_l = M_{l-1}$.
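Steps 1 to 3 can be put together in a short pure-Python sketch. Several pieces are assumptions made only for illustration and are not from the slides or the cited paper: the loss is the residual sum of squares of a least-squares fit with intercept, the "existing regression method" of Step 2 is replaced by a simple greedy forward selection with a hypothetical tolerance `tol`, and a `max_iter` cap is added as a safeguard.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def loss(y, cols):
    """Assumed loss: residual sum of squares of y regressed on an intercept plus cols."""
    basis = []
    for col in cols:
        q = center(col)
        for b in basis:                       # Gram-Schmidt against earlier columns
            c = dot(q, b) / dot(b, b)
            q = [a - c * x for a, x in zip(q, b)]
        if dot(q, q) > 1e-12:                 # drop numerically redundant columns
            basis.append(q)
    r = center(y)
    for b in basis:
        c = dot(r, b) / dot(b, b)
        r = [a - c * x for a, x in zip(r, b)]
    return dot(r, r)


def moderate_select(X, y, A, tol=0.01):
    """Step 2 stand-in (hypothetical): greedy forward selection inside A."""
    tss, M = loss(y, []), []
    while len(M) < len(A):
        cur = loss(y, [X[j] for j in M])
        rest = [j for j in A if j not in M]
        best = min(rest, key=lambda j: loss(y, [X[i] for i in M + [j]]))
        if cur - loss(y, [X[i] for i in M + [best]]) <= tol * tss:
            break                             # no candidate lowers the loss enough
        M.append(best)
    return sorted(M)


def iterative_selection(X, y, k, max_iter=10):
    """X is a list of p feature columns and k = |A_1|; returns the final index set."""
    p, yc = len(X), center(y)

    def score(j):                             # proportional to |Pearson correlation| with y
        xc = center(X[j])
        norm = math.sqrt(dot(xc, xc))
        return abs(dot(xc, yc)) / norm if norm > 0 else 0.0

    A = sorted(range(p), key=score, reverse=True)[:k]         # Step 1: screening
    M_prev, M = None, []
    for _ in range(max_iter):
        M = moderate_select(X, y, A)                          # Step 2: selection
        if M == M_prev:                                       # stop when M_l = M_{l-1}
            break
        rest = [j for j in range(p) if j not in M]            # Step 3: j in P \ M
        rest.sort(key=lambda j: loss(y, [X[i] for i in M + [j]]))
        A = M + rest[:max(k - len(M), 0)]                     # recruit top |A| - |M|
        M_prev = M
    return M


X = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, -1, -1, 1, 1],
    [1, -1, -1, 1, 1, -1, -1, 1],
]
y = [2 * a - 3 * b for a, b in zip(X[0], X[3])]   # depends only on features 0 and 3
print(iterative_selection(X, y, k=2))
```

In practice Step 2 would call one of the existing methods listed earlier (for example LASSO), and the loss would match the model being fitted.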
Framework of Slow Intelligence System
Slow Intelligence System
“A General Framework for Slow Intelligence Systems”, by S.K. Chang, International Journal of Software Engineering and Knowledge Engineering
Time Controller
Slow decision cycle(s) to complement quick decision cycle(s): an SIS possesses at least two decision cycles. Therefore, Slow Intelligence Systems usually work correctly, but not always fast.
Time Controller Design: panic button, Petri-net model
Motivation
“Modeling Human Intelligence as A Slow Intelligence System”
by Tiansi Dong, DMS2010
SIS for object mapping between scenes: two different priorities lead to two different object tracing results.
1. Priority on spatial changes (minimal spatial changes)
2. Priority on object categories (objects are mapped within same categories)
SIS1 for object tracing (priority on spatial changes)
Enumerate all possible mappings
Elimination and concentration: keep the mapping with the minimal spatial changes
SIS2 for object tracing (priority on object category)
Enumerate all possible mappings
Elimination and concentration: keep the mapping within the same category
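The two tracing systems can be illustrated with a toy sketch: enumerate every one-to-one mapping between two scenes, then keep the one preferred by each priority. The data layout (dicts with `cat` and `pos`), the function names, the assumption that both scenes hold the same number of objects, and the spatial tie-break inside SIS2 are all hypothetical choices for this example, not taken from the cited paper.

```python
from itertools import permutations
import math


def enumerate_mappings(n):
    """Enumeration: every one-to-one mapping from n objects in scene 1 to scene 2."""
    return list(permutations(range(n)))


def spatial_change(mapping, before, after):
    """Total distance moved if object i in scene 1 is mapped to object mapping[i] in scene 2."""
    return sum(math.dist(before[i]["pos"], after[j]["pos"]) for i, j in enumerate(mapping))


def same_categories(mapping, before, after):
    return all(before[i]["cat"] == after[j]["cat"] for i, j in enumerate(mapping))


def trace_sis1(before, after):
    """Priority on spatial changes: keep the mapping with minimal total movement."""
    candidates = enumerate_mappings(len(before))
    return min(candidates, key=lambda m: spatial_change(m, before, after))


def trace_sis2(before, after):
    """Priority on category: eliminate cross-category mappings first.

    Ties among the survivors are broken by spatial change (an assumption).
    """
    candidates = [m for m in enumerate_mappings(len(before))
                  if same_categories(m, before, after)]
    if not candidates:
        return None
    return min(candidates, key=lambda m: spatial_change(m, before, after))


before = [{"cat": "cup", "pos": (0, 0)}, {"cat": "ball", "pos": (5, 0)}]
after = [{"cat": "ball", "pos": (0, 1)}, {"cat": "cup", "pos": (5, 1)}]
print(trace_sis1(before, after))  # (0, 1): nearest objects, categories ignored
print(trace_sis2(before, after))  # (1, 0): cup maps to cup, ball to ball
```

In this example the two priorities disagree, which is the situation the slide describes.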
Tasks for project
Task one
Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System
Use SIS to model the iterative feature selection method in five phases: Enumeration, Elimination, Adaptation, Propagation, Concentration.
The whole SIS system contains an additional sub-SIS system.
Represent it in a mathematical formulation.
Task two
Design the time controller in terms of a Petri net and introduce a knowledge base
The time controller controls when each phase of the SIS is invoked. The knowledge base (KB) contains five different moderate-scale selection algorithms. The KB can be changed and updated in the slow cycle.
Represent the time controller as a Petri net using ReNew Editor
Future work
Experiment: use real data (colon cancer data) and compare the results with existing feature selection methods such as LASSO, forward regression, and backward regression.
Weka: Demo
I will use a visualization tool to visualize the results and the process of feature selection.
Midway results
Diagram: Main SIS System
Diagram: Sub SIS system
Diagram: Petri-Net Model