Lasso-based Factor Screening Analysis and Optimal Supersaturated Designs
School of Industrial Engineering
Complex systems such as large-scale computer simulation models typically involve a large number of factors. When investigating such a system, screening experiments are commonly used to sift through these factors and identify the subset that most strongly influences the response of interest. The identified factors are then subjected to further, more careful investigation. A good screening experiment allocates experimental resources efficiently to the important factors, and can therefore greatly expand the ability of analysts and decision makers to gain insight into a complicated system in a reasonable amount of time. This talk will discuss our current work on L1-penalty-based screening strategies. L1-based shrinkage methods can be cast as penalized (regularized) least squares problems with a continuous objective function. Meinshausen and Bühlmann (2006) and Zhao and Yu (2006) independently identified a sufficient and almost necessary condition, called the irrepresentable condition, under which Lasso is sign consistent in variable selection. This condition is determined only by the design or data matrix, not by the actual observations. The relationship between the variable selection performance of these methods and the irrepresentable condition can be further derived for both finite and asymptotic samples. When the sample size is sufficiently large, the asymptotic results show that the probability of correct selection converges to 1 if the irrepresentable condition is satisfied, and shrinks to 0 otherwise. For small samples, however, Lasso selects the correct variables only with probability less than 1. While past work is fundamental to understanding the performance of Lasso in variable selection, its focus has been on the asymptotic regime. Variable selection performance for finite samples, which is the prevailing case in practice, is not well studied. In this work, we define and derive the probability of correct selection for Lasso with a finite sample.
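Because the irrepresentable condition depends only on the design matrix, it can be checked before any response is observed. The following is a minimal sketch of such a check, assuming the form of the condition given in Zhao and Yu (2006): with C = X'X/n partitioned into the active columns (block C11) and the inactive columns (cross block C21), every entry of |C21 C11^{-1} sign(beta_active)| must be strictly below 1. The function name `irrepresentable_value` and the small example design are illustrative assumptions, not part of the work described here.

```python
import numpy as np

def irrepresentable_value(X, support, signs):
    """Return max_j |C21 C11^{-1} s|_j for design X (hypothetical helper).

    support : indices of the active (truly nonzero) factors
    signs   : signs (+1/-1) of the active coefficients, in the same order
    The condition is taken to hold when the returned value is below 1.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    support = np.asarray(support)
    inactive = np.setdiff1d(np.arange(p), support)
    C = X.T @ X / n                       # scaled Gram matrix of the design
    C11 = C[np.ix_(support, support)]     # active-by-active block
    C21 = C[np.ix_(inactive, support)]    # inactive-by-active block
    v = C21 @ np.linalg.solve(C11, np.asarray(signs, dtype=float))
    return float(np.max(np.abs(v)))

# Illustrative 4-run design: the third column is the sum of the first two,
# so it is strongly aliased with the two active factors.
x1 = np.array([1.0, 1.0, -1.0, -1.0])
x2 = np.array([1.0, -1.0, 1.0, -1.0])
X = np.column_stack([x1, x2, x1 + x2])

same_sign = irrepresentable_value(X, [0, 1], [1, 1])     # effects agree in sign
opposite = irrepresentable_value(X, [0, 1], [1, -1])     # effects disagree
print(same_sign, opposite)
```

In this toy design the check depends on the signs of the active effects as well as on the design itself, which is why the condition is stated in terms of sign(beta_active).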
Specifically, we prove that, for a finite sample and a given λ, there always exist lower and upper bounds on the probability of correct selection. This result motivates an algorithm that iteratively updates λ to maximize the probability of correct selection. Compared with other prevailing model selection methods, such as cross validation and Cp, the proposed framework focuses directly on the probability of correct variable selection rather than on the variance of the model fit. It should therefore be more appropriate for model selection purposes. We offer a MATLAB package implementing the proposed method, and the numerical results show that it significantly outperforms other selection methods. To the best of our knowledge, this is the first work focusing on finite samples, and the results can be applied directly in practice. Based on this result, we further propose optimality criteria for constructing supersaturated designs that guarantee variable selection performance. The generated designs are optimal for Lasso-based screening experiments.
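To make the idea of choosing λ by the probability of correct selection concrete, the sketch below estimates that probability by Monte Carlo simulation for a known design and a hypothesized coefficient vector, and then compares a few candidate values of λ. This is a simulation-based stand-in written for illustration only: it is not the bound-based iterative algorithm described above, nor the MATLAB package. The function names, the coordinate-descent solver, the objective scaling (1/(2n))||y - Xb||^2 + λ||b||_1, and all numerical settings are assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Lasso by cyclic coordinate descent (assumes no all-zero columns).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            old = beta[j]
            resid += X[:, j] * old                  # remove factor j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]              # put the updated factor back
            max_delta = max(max_delta, abs(beta[j] - old))
        if max_delta < tol:                         # stop once a sweep changes nothing
            break
    return beta

def selection_probability(X, beta_true, sigma, lam, n_rep=100, seed=0):
    """Monte Carlo estimate of P(sign(beta_hat) == sign(beta_true))."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    target = np.sign(beta_true)
    hits = 0
    for _ in range(n_rep):
        y = X @ beta_true + sigma * rng.standard_normal(X.shape[0])
        hits += int(np.array_equal(np.sign(lasso_cd(X, y, lam)), target))
    return hits / n_rep

# Illustrative supersaturated setting: 12 runs, 16 two-level factors, 2 active.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(12, 16))
beta_true = np.zeros(16)
beta_true[[0, 5]] = [3.0, -3.0]

grid = [0.25, 0.5, 1.0, 1.5, 2.0]
probs = [selection_probability(X, beta_true, 1.0, lam, n_rep=40) for lam in grid]
best_lam = grid[int(np.argmax(probs))]
print(list(zip(grid, probs)), best_lam)
```

Unlike cross validation, which scores each λ by prediction error, this comparison scores each λ by how often the estimated signs match the hypothesized ones. It requires assumed values for the coefficients and the noise level, which is the main limitation of a purely simulation-based approach.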