A Nonparametric Bayesian Technique for High-Dimensional Regression

Abstract

This paper proposes a nonparametric Bayesian framework called VariScan for simultaneous clustering, variable selection, and prediction in high-throughput regression settings. Poisson-Dirichlet processes are utilized to detect lower-dimensional latent clusters of covariates. An adaptive nonlinear prediction model is constructed for the response, achieving a balance between model parsimony and flexibility. Contrary to conventional belief, cluster detection is shown to be aposteriori consistent for a general class of models as the number of covariates and subjects grows. Simulation studies and data analyses demonstrate that VariScan often outperforms several well-known statistical methods.

Publication
Electronic Journal of Statistics