Robust Classification Using Nonparametric Kernel Discriminant Analysis with an Application
A thesis
Submitted to the Council of the College of Administration and Economics, University of Karbala, in partial fulfillment of the requirements for the Master's degree in Statistical Sciences
By
Ja’afar Ali Farhan
Supervised by
Asst. Prof. Dr. Enas Abdul Hafedh Mohammed
Abstract
Most real-world data deviate from the ideal assumptions required by traditional statistical methods: the normality assumption may be violated, or the collected data may be nonlinear, and as a result classification becomes problematic. Traditional discriminant analysis cannot handle this problem, so a robust method that can deal with it must be sought. This thesis therefore aims to use the Robust Kernel Discriminant Analysis (RKDA) method when the data deviate from normality and to compare it with the traditional discriminant analysis methods (linear, quadratic, and kernel discriminant analysis), using the classification error rate criterion (M͡R) to choose the best classification method. The comparison covers two sides: the experimental side, based on Monte Carlo simulation experiments, and the applied side. The simulation showed that linear discriminant analysis is better than the other discriminant analysis methods when the target density functions are normally distributed (D, E), and that kernel discriminant analysis gained an advantage for the Gaussian densities (D, E) at sample sizes n = 1000 and n = 5000. Kernel discriminant analysis also outperformed the other methods, by a small margin, for density function (K). Robust kernel discriminant analysis outperformed the other methods, by a large margin, when the density functions deviate from the normal distribution.
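To make the idea of a kernel discriminant rule concrete, the following is a minimal sketch of the plain (non-robust) version for one variable and two groups: each group density is estimated with a Gaussian kernel, and a point is assigned to the group with the larger prior-weighted density. Everything here is an illustrative assumption rather than the thesis's procedure: the function names, the normal-reference bandwidth rule, the simulated bimodal density, and the sample sizes are chosen only for the example, and the robust modification used in RKDA is not shown.

```python
import math
import random


def gaussian_kde(sample, h):
    """Return x -> kernel density estimate (Gaussian kernel, bandwidth h)."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (assumed; other selectors exist)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def fit_kda(train_1, train_2):
    """Build a two-group kernel discriminant rule from univariate samples."""
    f1 = gaussian_kde(train_1, normal_reference_bandwidth(train_1))
    f2 = gaussian_kde(train_2, normal_reference_bandwidth(train_2))
    n1, n2 = len(train_1), len(train_2)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)  # priors taken from group sizes

    def classify(x):
        # Assign to the group with the larger prior-weighted density estimate.
        return 1 if p1 * f1(x) >= p2 * f2(x) else 2

    return classify


def misclassification_rate(classify, points, true_group):
    """Share of points from true_group that the rule assigns elsewhere."""
    wrong = sum(1 for x in points if classify(x) != true_group)
    return wrong / len(points)


def draw_bimodal(rng, n):
    """Non-normal group: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)."""
    return [rng.gauss(rng.choice([-2.0, 2.0]), 0.5) for _ in range(n)]


rng = random.Random(0)
train_1 = draw_bimodal(rng, 200)                      # group 1: bimodal
train_2 = [rng.gauss(0.0, 0.5) for _ in range(200)]   # group 2: normal
test_1 = draw_bimodal(rng, 200)
test_2 = [rng.gauss(0.0, 0.5) for _ in range(200)]

classify = fit_kda(train_1, train_2)
mr1 = misclassification_rate(classify, test_1, 1)
mr2 = misclassification_rate(classify, test_2, 2)
# Overall rate: size-weighted average of the two group rates.
mr = (len(test_1) * mr1 + len(test_2) * mr2) / (len(test_1) + len(test_2))
print(f"MR1={mr1:.3f}  MR2={mr2:.3f}  MR={mr:.3f}")
```

In this simulated setting both groups have means near zero, so a rule based on group means has little to separate on, while the density-based rule can still tell the groups apart. A Monte Carlo comparison would repeat the draw, fit, and evaluate steps many times and average the resulting error rates.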
On the applied side, the study relied on reports from the laboratory unit at Al-Hussein Teaching Hospital in the Holy Governorate of Karbala to obtain variables related to lymphocytic leukemia, covering 100 observations from males and females. The observations were divided into two groups: the first comprised 50 people without the disease, and the second comprised 50 people with the disease. The response variable was Y: presence or absence of the disease. The explanatory variables were X1: sex of the person, X2: WBC (White Blood Cells), X3: RBC (Red Blood Cells), and X4: HGB (Hemoglobin) percentage. The classification error rate was M͡R1 = 0.12 for the first group and M͡R2 = 0.56 for the second, giving an overall classification error rate of M͡R = 0.34, which is a small error rate that indicates the accuracy of the classification.
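The overall rate is consistent with the two group rates when it is read as a size-weighted average of them. The symbols n_1 and n_2 for the group sizes are notation introduced here for illustration only:

```latex
\widehat{MR}
  = \frac{n_1\,\widehat{MR}_1 + n_2\,\widehat{MR}_2}{n_1 + n_2}
  = \frac{50(0.12) + 50(0.56)}{100}
  = 0.34
```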