multi-label prediction via compressed sensing

Multi-Label Prediction via Compressed Sensing By Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang (NIPS 2009) Presented by: Lingbo Li ECE, Duke University 01-22-2010 * Some notes are directly copied from the original paper.


Page 1: Multi-Label Prediction via Compressed Sensing

Multi-Label Prediction via Compressed Sensing


Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang

(NIPS 2009)

Presented by: Lingbo Li, ECE, Duke University


* Some notes are directly copied from the original paper.

Page 2: Outline


• Introduction

• Preliminaries

• Learning Reduction

• Compression and Reconstruction

• Empirical Results

• Conclusion

Page 3: Introduction


• Large database of images.

• Goal: predict who or what is in a given image.

Samples: images $x$ with corresponding label vectors $y = (y_1, y_2, \ldots, y_d) \in \{0,1\}^d$, where $d$ is the total number of entities in the whole database.

• One-against-all algorithm:

Learn a binary predictor for each label (class).

Computation is expensive when $d$ is large, e.g. $d = 10^3$ or $d = 10^4$.

• Assume the output vector $y$ is sparse.
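To make the cost of one-against-all concrete, here is a minimal sketch that trains one scorer per label, so the work grows linearly in $d$. It assumes ridge regression as the per-label scorer (the slide does not name a learner), and the function name `one_against_all` and parameter `reg` are hypothetical.

```python
import numpy as np

def one_against_all(X, Y, reg=1e-3):
    """Train one ridge-regression scorer per label column of Y (n x d).

    The explicit loop over all d labels is the point: training cost grows
    linearly in d, which is what becomes expensive for d = 10^3 to 10^4.
    """
    p = X.shape[1]
    gram = X.T @ X + reg * np.eye(p)  # shared p x p system matrix
    weights = []
    for j in range(Y.shape[1]):       # one predictor per label
        weights.append(np.linalg.solve(gram, X.T @ Y[:, j]))
    return np.column_stack(weights)   # p x d weight matrix

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
Y = (rng.random((200, 50)) < 0.05).astype(float)  # sparse 0/1 labels, d = 50
W = one_against_all(X, Y)
print(W.shape)  # (20, 50)
```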



Page 4: Introduction (continued)


[Figure: an image $x$ with candidate labels {Mike, James, Julie, Nick, Joe, Linda}]

Main idea: “Learn to predict compressed label vectors, and then use sparse reconstruction algorithm to recover uncompressed labels from these predictions”

Compressed sensing: for any sparse vector $y \in \mathbb{R}^d$, it is possible (with high probability over a random compression) to compress $y$ to a length logarithmic in the dimension $d$ and still reconstruct $y$ perfectly.
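The "logarithmic in $d$" claim is usually stated as $m = O(k \log d)$ measurements for a $k$-sparse vector. The arithmetic below sketches how slowly that grows; the constant `c` is hypothetical (the true constant depends on the compression and reconstruction method), and $k = 4$ is borrowed from the average number of labels per image in the image experiment.

```python
import math

def compressed_size(k, d, c=1.0):
    """Hypothetical compressed length m = ceil(c * k * ln d); c is an assumed constant."""
    return math.ceil(c * k * math.log(d))

for d in (10**3, 10**4, 10**6):
    print(d, compressed_size(k=4, d=d))
# prints 1000 28, then 10000 37, then 1000000 56
```

Going from a thousand to a million labels only doubles $m$ under this assumed scaling, whereas one-against-all needs $d$ predictors.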

Page 5: Preliminaries


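• $\mathcal{X}$: input space;

• $\mathcal{Y} = \mathbb{R}^d$: output (label) space;

• Training data: $(x_1, y_1), \ldots, (x_n, y_n)$;

• Goal: to learn a predictor $F: \mathcal{X} \to \mathcal{Y}$ with low mean-squared error $\mathbb{E}\,\|F(x) - y\|_2^2$.

Assume:

• $d$ is very large;

• the expected value $\mathbb{E}[y \mid x]$ is sparse, with only a few non-zero entries.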



Page 6: Learning Reduction

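• Linear compression function $A: \mathbb{R}^d \to \mathbb{R}^m$, where $m \ll d$

• Goal: to learn a predictor $H: \mathcal{X} \to \mathbb{R}^m$

Original problem: predict the label $y$ with the predictor $F$, from samples $(x_i, y_i)$, to minimize $\mathbb{E}\,\|F(x) - y\|_2^2$.

Compressed problem: predict the compressed label $Ay$ with the predictor $H$, from compressed samples $(x_i, Ay_i)$, to minimize $\mathbb{E}\,\|H(x) - Ay\|_2^2$.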
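Why regressing onto $Ay$ is the right target: under squared loss the best predictor is the conditional mean, and by linearity of expectation $\mathbb{E}[Ay \mid x] = A\,\mathbb{E}[y \mid x]$. The numerical check below compares the two at one fixed $x$, using empirical means over simulated label draws; the Gaussian $A$ and the label distribution are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 100, 20, 500
A = rng.standard_normal((m, d)) / np.sqrt(m)   # hypothetical Gaussian compression matrix
Y = (rng.random((n, d)) < 0.03).astype(float)  # n label draws observed at one fixed x

mean_then_compress = A @ Y.mean(axis=0)        # A * E[y | x] (empirical)
compress_then_mean = (Y @ A.T).mean(axis=0)    # E[A y | x] (empirical)
print(np.allclose(mean_then_compress, compress_then_mean))  # True
```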

Page 7: Reduction: Training and Prediction

Reconstruction algorithm $R$:

If $H(x)$ is close to $A\,\mathbb{E}[y \mid x]$, then $R(H(x))$ should be close to $\mathbb{E}[y \mid x]$.
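The training and prediction steps can be sketched end to end. This is a minimal illustration under stated assumptions, not the authors' implementation: $H$ is a linear least-squares fit on the compressed samples $(x_i, Ay_i)$, Orthogonal Matching Pursuit stands in for $R$, and the toy data (four one-hot "scenes", each with a fixed 2-sparse label vector) are hypothetical. The $16 \times 32$ matrix $[I \mid \tfrac{1}{4}H_{16}]$ is also hypothetical; it is chosen because its mutual coherence is $1/4 < 1/(2k-1) = 1/3$ for $k = 2$, a known sufficient condition for OMP to recover every 2-sparse vector exactly.

```python
import numpy as np

def omp(A, h, k):
    """Orthogonal Matching Pursuit: greedy k-sparse y_hat with A @ y_hat close to h."""
    support, coef = [], np.zeros(0)
    residual = h.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))  # column most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], h, rcond=None)  # refit on the support
        residual = h - A[:, support] @ coef
    y_hat = np.zeros(A.shape[1])
    y_hat[support] = coef
    return y_hat

def train(X, Y, A):
    """Training: build compressed samples (x_i, A y_i) and fit a linear H by least squares."""
    Z = Y @ A.T                                # row i is A y_i
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)  # H(x) = x @ W
    return W

def predict(x, W, A, k):
    """Prediction: y_hat = R(H(x)), with OMP standing in for R."""
    return omp(A, x @ W, k)

def sylvester_hadamard(n):
    """n x n Hadamard matrix, n a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

m, d, k = 16, 32, 2
A = np.hstack([np.eye(m), sylvester_hadamard(m) / 4.0])  # unit-norm columns, coherence 1/4
B = np.zeros((4, d))                                      # row s: labels of scene s
for s, labels in enumerate([(0, 5), (3, 20), (7, 31), (16, 17)]):
    B[s, list(labels)] = 1.0
X = np.repeat(np.eye(4), 10, axis=0)  # 10 samples per scene, one-hot inputs
Y = X @ B                             # y_i = B^T x_i

W = train(X, Y, A)
for s in range(4):
    print(s, np.flatnonzero(np.round(predict(np.eye(4)[s], W, A, k), 6)))
```

In practice $A$ would be random (for example, random rows of a Hadamard matrix), and recovery then holds with high probability rather than by construction.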

Page 8: Compression Functions

Examples of valid compression functions:
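Three compression matrices commonly used in compressed sensing are sketched here: i.i.d. Gaussian entries, i.i.d. random signs, and randomly chosen rows of a Hadamard matrix (the last matches the choice made in the experiments). Whether these are exactly the examples on the original slide is an assumption, and the function names are hypothetical. Each is scaled by $1/\sqrt{m}$ so that $\mathbb{E}\,\|Ay\|_2^2 = \|y\|_2^2$.

```python
import numpy as np

def gaussian_compression(m, d, rng):
    """A with i.i.d. N(0, 1/m) entries."""
    return rng.standard_normal((m, d)) / np.sqrt(m)

def rademacher_compression(m, d, rng):
    """A with i.i.d. entries +1/sqrt(m) or -1/sqrt(m)."""
    return rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)

def hadamard_rows_compression(m, d, rng):
    """m random rows of the d x d Sylvester Hadamard matrix, scaled by 1/sqrt(m)."""
    assert d & (d - 1) == 0, "d must be a power of 2"
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    rows = rng.choice(d, size=m, replace=False)  # distinct rows
    return H[rows] / np.sqrt(m)

rng = np.random.default_rng(0)
y = np.zeros(64)
y[[3, 40]] = 1.0  # a 2-sparse label vector
for make in (gaussian_compression, rademacher_compression, hadamard_rows_compression):
    A = make(16, 64, rng)
    print(make.__name__, A.shape, (A @ y).shape)
```

The Hadamard option has a practical advantage: $Ay$ can be computed with a fast transform instead of a dense matrix product.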

Page 9: Reconstruction Algorithms

Examples of valid reconstruction algorithms: iterative and greedy algorithms

• Orthogonal Matching Pursuit (OMP)

• Forward-Backward Greedy (FoBa)

• Compressive Sampling Matching Pursuit (CoSaMP)
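CoSaMP, the third algorithm listed, alternates between identifying candidate columns from the residual and pruning a least-squares fit back to $k$ entries. The sketch follows the standard Needell and Tropp description; the iteration cap `iters` and stopping tolerance `tol` are hypothetical parameters, and the demo data are random.

```python
import numpy as np

def cosamp(A, h, k, iters=20, tol=1e-10):
    """Compressive Sampling Matching Pursuit: k-sparse y_hat with A @ y_hat close to h."""
    d = A.shape[1]
    y_hat = np.zeros(d)
    residual = h.copy()
    for _ in range(iters):
        proxy = A.T @ residual
        omega = np.argsort(np.abs(proxy))[-2 * k:]        # 2k largest proxy entries
        T = np.union1d(omega, np.flatnonzero(y_hat))       # merge with current support
        sol, *_ = np.linalg.lstsq(A[:, T], h, rcond=None)  # least squares on merged support
        b = np.zeros(d)
        b[T] = sol
        keep = np.argsort(np.abs(b))[-k:]                   # prune to the k largest entries
        y_hat = np.zeros(d)
        y_hat[keep] = b[keep]
        residual = h - A @ y_hat
        if np.linalg.norm(residual) <= tol * np.linalg.norm(h):
            break
    return y_hat

rng = np.random.default_rng(0)
d, m, k = 256, 100, 4
A = rng.standard_normal((m, d)) / np.sqrt(m)
y = np.zeros(d)
y[rng.choice(d, size=k, replace=False)] = 1.0
y_hat = cosamp(A, A @ y, k)
print(np.linalg.norm(y_hat - y))  # small when recovery succeeds
```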

Page 10: General Robustness Guarantees

What if the reduction creates a problem that is harder to solve than the original problem?

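Sparsity error is defined as

$\mathrm{sperr}(k, y) = \|y - y_{(1:k)}\|_2^2 + \frac{1}{k}\,\|y - y_{(1:k)}\|_1^2$,

where $y_{(1:k)}$ is the best $k$-sparse approximation of $y$.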
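The sparsity error measures how far $y$ is from being exactly $k$-sparse. The sketch below assumes the form $\mathrm{sperr}(k, y) = \|y - y_{(1:k)}\|_2^2 + \frac{1}{k}\|y - y_{(1:k)}\|_1^2$, with $y_{(1:k)}$ obtained by keeping the $k$ largest-magnitude entries; the example vector is made up.

```python
def best_k_sparse(y, k):
    """Best k-sparse approximation y_(1:k): keep the k largest-magnitude entries of y."""
    keep = set(sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)[:k])
    return [y[i] if i in keep else 0.0 for i in range(len(y))]

def sparsity_error(y, k):
    """Assumed form: ||y - y_(1:k)||_2^2 + (1/k) * ||y - y_(1:k)||_1^2."""
    tail = [a - b for a, b in zip(y, best_k_sparse(y, k))]
    l2_sq = sum(t * t for t in tail)
    l1 = sum(abs(t) for t in tail)
    return l2_sq + l1 * l1 / k

y = [0.9, 0.05, 0.8, 0.0, 0.1, 0.05]
print(sparsity_error(y, 2))  # ≈ 0.035
```

An exactly $k$-sparse vector has sparsity error 0, so the guarantee then depends only on how well $H$ predicts the compressed labels.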

Page 11: Linear Prediction

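• If there is a perfect linear predictor of $\mathbb{E}[y \mid x]$, then there is also a perfect linear predictor of $A\,\mathbb{E}[y \mid x]$: if $\mathbb{E}[y \mid x] = Bx$, then $A\,\mathbb{E}[y \mid x] = (AB)\,x$.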
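A quick numerical check of this statement: if labels are exactly linear in $x$, least squares on the compressed targets $Ay_i$ recovers the compressed predictor $AB$. All dimensions and the random $B$ are hypothetical, and the labels here are real-valued rather than sparse; only the linear-algebra identity is being illustrated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d, m = 300, 10, 60, 15
B = rng.standard_normal((d, p))               # hypothetical perfect linear predictor: E[y|x] = B x
A = rng.standard_normal((m, d)) / np.sqrt(m)  # compression matrix
X = rng.standard_normal((n, p))
Y = X @ B.T                                   # noiseless labels y_i = B x_i

# Least squares on the compressed targets A y_i finds the linear map x -> (A B) x.
W, *_ = np.linalg.lstsq(X, Y @ A.T, rcond=None)
print(np.allclose(W.T, A @ B))                # True
```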

Page 12: Experimental Results

• Experiment 1: Image data (collected by the ESP Game)

65k images, 22k unique labels; keep the 1k most frequent labels;

the least frequent kept label occurs 39 times, while the most frequent occurs about 12k times; 4 labels on average per image;

Half of the data is used for training and half for testing.

• Experiment 2: Text data (collected from

16k labeled web pages, 983 unique labels;

the least frequent label occurs 21 times and the most frequent about 6500 times; 19 labels on average per web page;

Half of the data is used for training and half for testing.

• Compression function A: select m random rows of the Hadamard matrix.

• Reconstruction algorithms tested: the greedy and iterative algorithms OMP, FoBa, and CoSaMP, as well as Lasso.

• Use correlation decoding (CD) as a baseline method for comparisons.
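Correlation decoding is a much cheaper baseline than sparse reconstruction. The sketch assumes it scores each label by its correlation with the prediction, i.e. by the entries of $A^\top h$, and keeps the $k$ highest-scoring labels; this is an assumed reading of the baseline, and the function name is hypothetical. The compression uses $m$ random rows of a Hadamard matrix, as in the experiments.

```python
import numpy as np

def correlation_decode(A, h, k):
    """Assumed correlation decoding: score labels by A^T h and keep the top k."""
    scores = A.T @ h
    top = np.argsort(scores)[-k:]  # indices of the k largest scores
    y_hat = np.zeros(A.shape[1])
    y_hat[top] = 1.0
    return y_hat

rng = np.random.default_rng(0)
d, m, k = 64, 32, 3
H = np.array([[1.0]])
while H.shape[0] < d:              # Sylvester Hadamard matrix, d a power of 2
    H = np.block([[H, H], [H, -H]])
A = H[rng.choice(d, size=m, replace=False)] / np.sqrt(m)  # m random Hadamard rows
y = np.zeros(d)
y[[2, 17, 40]] = 1.0
y_hat = correlation_decode(A, A @ y, k)
print(np.flatnonzero(y_hat))
```

Unlike OMP or CoSaMP, this never refits coefficients, so interference between the columns of $A$ goes uncorrected; that is what makes it a useful lower baseline.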


Page 13: Experimental Results

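Measures: precision, and the squared $\ell_2$ distance $\|\hat{y} - y\|_2^2$ between the predicted and true label vectors.

[Figure: results plots; top two panels: image data; bottom: text data]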
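The two evaluation measures can be sketched as follows. This assumes precision is computed over the $k$ highest-scoring predicted labels (the slide's exact definition did not survive extraction); the function names and example vectors are hypothetical.

```python
def precision_at_k(scores, true_labels, k):
    """Fraction of the k highest-scoring labels that are truly present."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(true_labels[i] for i in top) / k

def squared_l2_distance(y_hat, y):
    """||y_hat - y||_2^2."""
    return sum((a - b) ** 2 for a, b in zip(y_hat, y))

y = [1, 0, 1, 0, 0]
y_hat = [0.9, 0.4, 0.2, 0.0, 0.1]
print(precision_at_k(y_hat, y, 2))  # 0.5
print(squared_l2_distance(y_hat, y))
```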

Page 14: Conclusion


• Application of compressed sensing to the multi-label prediction problem with output sparsity;

• Efficient reduction algorithm in which the number of predictions is logarithmic in the number of original labels;

• Robustness guarantees from the compressed problem to the original problem, and vice versa in the linear prediction setting.