Rotterdam Seminars Econometric Institute

Speaker(s)
Hannes Leeb (University of Vienna, Austria)
Date
Thursday, 1 November 2018
Location
Rotterdam

We study linear subset regression in the context of a high-dimensional linear model. Consider y = a + b'z + e with univariate response y and a d-vector of random regressors z, and a submodel where y is regressed on a set of p explanatory variables given by x = M'z, for some d × p matrix M. Here, ‘high-dimensional’ means that the number d of available explanatory variables in the overall model is much larger than the number p of variables in the submodel. In this paper, we present Pinsker-type results for prediction of y given x. In particular, we show that the mean squared prediction error of the best linear predictor of y given x is close to the mean squared prediction error of the corresponding Bayes predictor E[y|x], provided only that p/log(d) is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from n independent observations of (y,x) is close to that of the Bayes predictor, provided only that both p/log(d) and p/n are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables z.
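
The setting can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes a standard Gaussian design z, a selection matrix M that picks the first p coordinates, and arbitrary illustrative values for d, p, n, a, b and the noise level. Under these assumptions the Bayes predictor E[y|x] is itself linear and equals a + (M'b)'x, so the simulation only illustrates the p/n part of the statement: the out-of-sample error of the least-squares predictor is compared with that of the best linear predictor. The function name simulate and all of its parameters are hypothetical.

```python
import numpy as np


def simulate(d=500, p=5, n=500, n_test=5000, sigma=1.0, seed=0):
    """Compare least-squares and best-linear prediction error in a submodel.

    Illustrative assumptions (not from the paper): z ~ N(0, I_d), M selects
    the first p coordinates of z, and b is scaled so that ||b||^2 is near 1.
    """
    rng = np.random.default_rng(seed)
    a = 1.0
    b = rng.normal(size=d) / np.sqrt(d)
    M = np.eye(d)[:, :p]  # d x p selection matrix, so x = M'z

    def draw(m):
        # Draw m independent observations of (x, y) from the full model.
        z = rng.normal(size=(m, d))
        y = a + z @ b + sigma * rng.normal(size=m)
        return z @ M, y

    x_train, y_train = draw(n)
    x_test, y_test = draw(n_test)

    # Feasible least-squares predictor (with intercept) fitted on n observations.
    design = np.column_stack([np.ones(n), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    pred_ls = np.column_stack([np.ones(n_test), x_test]) @ coef

    # Best linear predictor of y given x. With E[z] = 0, Cov(z) = I and
    # M'M = I_p it is a + (M'b)'x; for Gaussian z it coincides with E[y|x].
    pred_best = a + x_test @ (M.T @ b)

    mse_ls = float(np.mean((y_test - pred_ls) ** 2))
    mse_best = float(np.mean((y_test - pred_best) ** 2))
    return mse_ls, mse_best


if __name__ == "__main__":
    mse_ls, mse_best = simulate()
    print(f"least squares: {mse_ls:.4f}  best linear: {mse_best:.4f}")
```

With p/n small, the two estimated errors should be close. A non-Gaussian design with a non-selection matrix M would be needed to probe the gap between the best linear predictor and E[y|x], which this sketch does not attempt.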