Feature Selection through Lasso


March 6, 2006


Bin Yu


Statistics Department, UC Berkeley


Advances in information technology are making data collection possible in most, if not all, fields of science and engineering and beyond. Statistics as a scientific discipline is challenged and enriched by the new opportunities resulting from these high-dimensional data sets. Often data reduction or feature selection is the first step toward solving these massive data problems. However, data reduction through model selection or l0-constrained optimization leads to combinatorial searches that are computationally expensive or infeasible for massive data problems. A computationally more efficient alternative to model selection is l1-constrained optimization, or Lasso optimization.
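The abstract itself contains no code; as an illustration of the l1-constrained (Lasso) formulation it refers to, a minimal sketch of a Lasso fit via coordinate descent with soft-thresholding might look like the following. The function names and parameters are of course our own, not part of the talk.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of the l1 penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_b (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-coordinate curvature X_j'X_j / n
    r = y - X @ beta                    # running residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]      # remove coordinate j's contribution
            rho = X[:, j] @ r / n       # correlation with partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is why the l1 penalty performs feature selection while remaining a convex (and hence efficiently solvable) problem, in contrast to the combinatorial l0 search.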

In this talk, we will describe the Boosted Lasso (BLasso) algorithm, which produces an approximation to the complete regularization path for general Lasso problems. BLasso consists of both a forward step and a backward step. The forward step is similar to Boosting and Forward Stagewise Fitting, but the backward step is new and is crucial for BLasso to approximate the Lasso path in all situations. For cases with a finite number of base learners, the BLasso path is shown to converge to the Lasso path as the step size goes to zero. Experimental results are also provided to demonstrate the difference between BLasso and Boosting or Forward Stagewise Fitting. We can extend BLasso to the case of a general convex loss penalized by a general convex function, and we illustrate this extended BLasso with examples.
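The forward/backward iteration described above can be sketched for squared loss as follows. This is a simplified illustration based on the description in the abstract, not the authors' reference implementation; the step size `eps`, tolerance `xi`, and the exact backward-step acceptance rule are our assumptions.

```python
import numpy as np

def blasso_path(X, y, eps=0.05, n_steps=200, xi=1e-10):
    """Simplified BLasso sketch: small forward (boosting-like) steps grow
    coefficients; backward steps shrink them when doing so lowers the
    lam-penalized objective. Returns the coefficient path."""
    n, p = X.shape
    loss = lambda b: 0.5 * np.mean((y - X @ b) ** 2)
    beta = np.zeros(p)
    # initial forward step: best single coordinate move of size eps
    fl, fj, fs = np.inf, 0, eps
    for j in range(p):
        for s in (-eps, eps):
            b2 = np.zeros(p); b2[j] = s
            if loss(b2) < fl:
                fl, fj, fs = loss(b2), j, s
    beta[fj] = fs
    lam = (loss(np.zeros(p)) - fl) / eps   # running regularization level
    path = [beta.copy()]
    for _ in range(n_steps):
        # backward candidate: shrink a nonzero coordinate toward zero
        bl, bj = np.inf, None
        for j in np.nonzero(beta)[0]:
            b2 = beta.copy()
            b2[j] -= np.sign(beta[j]) * eps
            if loss(b2) < bl:
                bl, bj = loss(b2), j
        if bj is not None and bl - loss(beta) <= lam * eps - xi:
            # backward step: penalized objective decreases
            beta[bj] -= np.sign(beta[bj]) * eps
        else:
            # forward step: best coordinate move of size eps
            fl, fj, fs = np.inf, 0, eps
            for j in range(p):
                for s in (-eps, eps):
                    b2 = beta.copy(); b2[j] += s
                    if loss(b2) < fl:
                        fl, fj, fs = loss(b2), j, s
            lam_new = (loss(beta) - fl) / eps
            if lam_new <= 0:
                break   # no forward step helps: unregularized end of the path
            lam = min(lam, lam_new)
            beta[fj] += fs
        path.append(beta.copy())
    return np.array(path), lam
```

Without the backward step this reduces to a Forward Stagewise Fitting sketch; the backward step is what lets the path undo early, overly greedy moves and thereby track the Lasso path.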

Since Lasso is used as a computationally more efficient alternative to model selection, it is important to study the model selection property of Lasso. I will present some (almost) necessary and sufficient conditions for Lasso to be model selection consistent in the classical case of a small number of features and a large sample size. (This is joint work with Peng Zhao at UC Berkeley.)
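The abstract does not spell the conditions out; in the published joint work with Peng Zhao they take the form of an "irrepresentable condition" on the design. Assuming that form, a sketch of a numerical check might look like this (the function and its interface are illustrative, not from the talk).

```python
import numpy as np

def irrepresentable_ok(C, S, signs):
    """Check the strong irrepresentable condition
        || C[Sc,S] @ inv(C[S,S]) @ sign(beta_S) ||_inf < 1,
    where C is the Gram matrix X'X/n, S is the true support, and
    signs holds the signs of the true nonzero coefficients.
    Roughly: irrelevant features must not be too correlated with
    the relevant ones for Lasso to recover the true model."""
    p = C.shape[0]
    Sc = [j for j in range(p) if j not in S]
    if not Sc:
        return True
    v = C[np.ix_(Sc, S)] @ np.linalg.solve(C[np.ix_(S, S)],
                                           np.asarray(signs, dtype=float))
    return bool(np.max(np.abs(v)) < 1)
```

For an orthogonal design the condition holds trivially (the cross term is zero), while a design whose irrelevant column is strongly correlated with the relevant ones can violate it, and Lasso then fails to select the true model no matter how the penalty is tuned.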


Bin Yu

Bin Yu is Professor of Statistics at the University of California at Berkeley. She also holds a ChangJiang Chair Professorship at Peking University, and is the founding co-director of the Microsoft Lab on Statistics and Information Technology at Peking University. Her current research interests are machine learning, information theory, and statistical problems from remote sensing, internet tomography, sensor networks, neuroscience, finance, and bioinformatics. She was in the S.S. Chern Mathematics exchange program between China and the US in 1985, and obtained her B.S. (1984) from Peking University and M.A. (1987) and Ph.D. (1990) from the University of California at Berkeley. She has held regular or visiting faculty positions at the University of Wisconsin at Madison, Yale University, MIT, and ETH Zurich. She was a Member of Technical Staff at Lucent Bell Labs from 1998 to 2000 and holds two U.S. patents. She is a Fellow of the IEEE, IMS (Institute of Mathematical Statistics), and ASA (American Statistical Association), was a Special Invited (now Medallion) Lecturer of the IMS in 1999, and was a Miller Research Professor at Berkeley in spring 2004. She has served and is serving on many editorial boards, including the Annals of Statistics, J. Amer. Statist. Assoc., and J. Machine Learning Research. She was a guest co-editor for Statistica Sinica on bioinformatics and for IEEE Trans. Signal Processing on machine learning, and a co-editor of a Springer book on nonlinear regression and classification.