High dimensional data challenges in estimating multiple linear regression.

Jassim N Hussain
Department of Statistics, College of Administration & Economics, University of Kerbala, Iraq.
Email: jasim.nasir@uokerbala.edu.iq

Abstract
Nowadays, high-dimensional data are growing rapidly in many fields because new technologies make it possible to collect data on a large number of variables in order to better understand a phenomenon of interest. Multiple linear regression is a well-known technique for investigating the relationship between one dependent variable and one or more independent variables and for analyzing their effects. Fitting this model rests on several assumptions, one of which is a large sample size. High-dimensional data do not satisfy this assumption because the sample size is small compared with the number of explanatory variables (k). Consequently, the results of traditional estimation methods such as ordinary least squares (OLS) can be misleading. Regularization or shrinkage techniques (e.g., LASSO) have been proposed to estimate the model in this setting. A nonparametric method is also proposed to estimate the model. The average mean square error and root mean square error criteria are used to assess the performance of the nonparametric, LASSO, and OLS methods in a simulation study and in the analysis of a real dataset. The results of both the simulation study and the real-data analysis show that the nonparametric regression method outperforms the LASSO and OLS methods in fitting this model to high-dimensional data.