A Proposed Two-Parameter Estimator for the Linear Regression Model with an Application

A thesis

Submitted to the Council of the College of Administration and Economics, University of Karbala, in partial fulfillment of the requirements for the degree of Master of Science in Statistics

By
 Noor Alzahraa Naeem Abd Ali

Supervised by
Prof. Dr. Shrooq Abdul Redha Sa’aed Al Sabah

Abstract

Multicollinearity is the presence of an exact or near-exact linear relationship among all or some of the explanatory variables in a regression model. It violates one of the assumptions of ordinary least squares (OLS), and of the Gauss-Markov theorem, namely that the explanatory variables are not linearly related, and it makes it impossible to separate the effects of the individual variables from one another. As a result, the estimates become inaccurate, unstable, and unrepresentative of the phenomenon under study.

The aim of this thesis is to propose a two-parameter estimator for the parameters of the linear regression model that can cope with multicollinearity by exploiting prior information about the parameters to be estimated, and to compare it with the Ridge Regression estimator, the Modified Ridge Regression estimator, the Bayesian Ridge Regression estimator, the Liu estimator, the Modified Liu estimator, the Shrinkage estimator, Kaciranlar's two-parameter estimator for the linear regression model, and Lokman et al.'s two-parameter estimator for the linear regression model. The comparison uses the mean squared error (MSE) criterion, together with the AIC, BIC, and HQIC criteria for each model, through Monte Carlo simulations with small, medium, and large samples designed to study the behavior of the proposed method.

The proposed method proved to be the best of the estimation methods considered, achieving the lowest values of the comparison criteria; it was close to ordinary least squares only when the correlation between the explanatory variables was very weak, and its advantage grew as the sample size increased. The least squares method failed to overcome the problem when multicollinearity among the explanatory variables was high, whereas the proposed method handled it very effectively.

In light of these results, a simple random sample of 100 women was drawn to study the factors affecting the number of children born, which represents the response variable Y, together with seventeen explanatory variables X. The least squares method and the proposed method were applied to these data, which were analyzed in Matlab. The proposed method again outperformed ordinary least squares, attaining the lowest values of the criteria in addition to a significant model. The analysis revealed the presence of multicollinearity in the applied data and a significant relationship between the dependent variable and the independent variables, with a coefficient of determination of R2 = 0.95 and a correlation coefficient of r = 0.97.
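For reference, the main benchmark estimators named above have standard closed forms; the following is a brief sketch in the usual notation (the proposed estimator's own form is given in the body of the thesis), where X is the design matrix, y the response vector, and \hat{\beta}_{OLS} = (X'X)^{-1}X'y the ordinary least squares estimator:

    Ridge:                            \hat{\beta}(k)   = (X'X + kI)^{-1} X'y,                        k > 0
    Liu:                              \hat{\beta}(d)   = (X'X + I)^{-1} (X'y + d \hat{\beta}_{OLS}), 0 < d < 1
    Two-parameter (Kaciranlar type):  \hat{\beta}(k,d) = (X'X + kI)^{-1} (X'y + kd \hat{\beta}_{OLS})

Setting d = 0 in the two-parameter form recovers the ridge estimator with parameter k, and setting k = 1 recovers the Liu estimator with parameter d, which is why such two-parameter estimators nest both shrinkage schemes.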
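As a minimal illustration of the Monte Carlo comparison described above (not the thesis code, which was written in Matlab), the following Python sketch estimates the MSE of the OLS and ridge coefficient estimators under controllable multicollinearity; the regressor-generation scheme, the sample size n, the correlation parameter rho, and the ridge parameter k are illustrative assumptions:

    # Minimal Monte Carlo sketch: MSE of OLS vs. ridge under multicollinearity.
    # The design x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1}, a common choice in
    # such simulation studies, makes every pair of regressors correlate at rho^2.
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_mse(n=50, p=4, rho=0.99, sigma=1.0, reps=1000, k=0.5):
        beta = np.ones(p)                    # illustrative true coefficients
        mse_ols = mse_ridge = 0.0
        for _ in range(reps):
            z = rng.standard_normal((n, p + 1))
            X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
            y = X @ beta + sigma * rng.standard_normal(n)
            XtX = X.T @ X
            b_ols = np.linalg.solve(XtX, X.T @ y)                    # OLS
            b_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)  # ridge
            mse_ols += np.sum((b_ols - beta) ** 2)
            mse_ridge += np.sum((b_ridge - beta) ** 2)
        return mse_ols / reps, mse_ridge / reps

    print(simulate_mse())          # ridge MSE far below OLS MSE at rho = 0.99
    print(simulate_mse(rho=0.1))   # nearly equal when correlation is very weak

This mirrors the qualitative findings reported above: shrinkage-type estimators dominate OLS under strong multicollinearity and lose their advantage when the correlation among the explanatory variables is very weak.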