## Comparing the Performance of Zero Mean Classification Functions under Unequal Misclassification Cost

**Michael Asamoah-Boaheng**^{1}, **Atinuke O. Adebanji**^{2}, **Nkansah Ababio**^{3}

^{1}School of Graduate Studies Research and Innovation, Kumasi Polytechnic, Box 854, Kumasi, Ghana

^{2}Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana

^{3}Kofi Agyei Senior High School, P.O Box AN 2471 Ash-Town Kumasi, Bampenase-Ashanti

### Abstract

In this study, the performance of the Minimum Expected Cost of Misclassification (MECM) method and the Quadratic Discriminant Function (QDF) approach was compared and evaluated for the case of equal-mean discrimination under unequal misclassification costs. Thirty pairs of female like-sex twins extracted from Stocks' (1933) ten-variate data on 832 twin children were used for the evaluation. Discriminant functions were derived under each of the misclassification cost ratios 1:1, 1:2, 1:3 and 1:4, and their error rates were estimated using the Cross Validation (CV) and Balanced Error Rate (BER) methods. The QDF recorded lower mean error rates than the MECM. The error rate estimates showed the QDF outperforming the MECM in separating the two groups. Both classification rules were also found to be sensitive to misclassification cost ratios exceeding 1:2.

**Keywords:** classification rules, error rates, discriminant functions, minimum expected cost of misclassification

*American Journal of Applied Mathematics and Statistics*, 2014, 2(6), pp 409-415.

DOI: 10.12691/ajams-2-6-9

Received October 22, 2014; Revised December 04, 2014; Accepted December 19, 2014

**Copyright** © 2013 Science and Education Publishing. All Rights Reserved.

### Cite this article:

- Asamoah-Boaheng, Michael, Atinuke O. Adebanji, and Nkansah Ababio. "Comparing the Performance of Zero Mean Classification Functions under Unequal Misclassification Cost." *American Journal of Applied Mathematics and Statistics* 2.6 (2014): 409-415.

- Asamoah-Boaheng, M., Adebanji, A. O., & Ababio, N. (2014). Comparing the Performance of Zero Mean Classification Functions under Unequal Misclassification Cost. *American Journal of Applied Mathematics and Statistics*, *2*(6), 409-415.

- Asamoah-Boaheng, Michael, Atinuke O. Adebanji, and Nkansah Ababio. "Comparing the Performance of Zero Mean Classification Functions under Unequal Misclassification Cost." *American Journal of Applied Mathematics and Statistics* 2, no. 6 (2014): 409-415.


### 1. Introduction

The problem of equal-mean classification, or zero mean difference, has long challenged researchers, and several attempts have been made to derive parsimonious rules that address it. This study considers the equal-mean discrimination problem by evaluating the performance of the MECM and the QDF under unequal misclassification costs. The problem of discrimination was first addressed by Fisher ^{[5]} in his paper titled “The use of multiple measurements in taxonomic problems”. Fisher's approach to classification with two populations arrived at a linear classification statistic. His idea was to transform multivariate observations *x* into univariate observations *y* such that the *y* values derived from population one and population two were separated as much as possible ^{[9]}. Hyodo and Kubokawa ^{[8]} studied a variable selection criterion for the linear discriminant rule and its optimality in high-dimensional data, developing a new procedure for selecting the true variable set. Maharaj and Alonso ^{[11]} studied discriminant analysis of multivariate time series and its application to diagnosis. They demonstrated with real and synthetic ECG data that their approach to classifying multivariate time series outperforms other well-known approaches.

Okamoto ^{[13]} first studied the problem of discrimination between two multidimensional normal populations with a common mean vector and different covariance structures. Bartlett and Please ^{[3]} considered the specific problem of zero-mean uniform discrimination. The problem was investigated for the case where the variance-covariance matrices of the two groups were known to have uniform structures, under assumptions of equal and unequal correlation coefficients.

A discriminant function based on an equal correlation coefficient and a uniform covariance matrix was obtained for classificatory purposes. Desu and Geisser ^{[4]} studied the same problem of common-mean discrimination, focusing on the case where the difference between the means of the two groups is the zero vector. Geisser and Desu ^{[7]} compared the performance of several methods of discrimination: the classical approach, the semi-Bayesian and complete Bayesian approaches, and discrimination with equal and unequal covariance matrices. Lachenbruch ^{[10]} derived a simple model for discriminating among equal-mean data for two populations, assigning observations to one of the two populations by investigating the covariance matrices of the populations using absolute deviations. He observed that the Absolute Linear Discriminant Function (ALDF) was almost as good as the QDF when the two groups are close. Overall, the derived absolute linear discriminant rule performed slightly worse than the quadratic discriminant rule used earlier by ^{[3]} and ^{[4]}. The absolute linear rule performed reasonably well after the data were contaminated by the introduction of outliers into the two groups, whereas the quadratic discriminant rule performed poorly after the contamination.

Marco et al. ^{[12]} studied the asymptotic expected error rate for the equal-mean uniform-covariance discrimination problem. They approximated the unconditional expected error rate of the sample discriminant function up to the second-order term and compared it with Monte Carlo simulations at several combinations of the parameters to check the accuracy of the approximation. Ariyo and Adebanji ^{[2]} compared the performance of linear and quadratic classifiers under unequal costs of misclassification and concluded that both procedures are insensitive to cost ratios exceeding 1:2. Adebanji et al. ^{[1]} investigated the performance of the homoscedastic discriminant function (HDF) under the non-optimal condition of unequal group representation (prior probabilities) in the population, together with the asymptotic performance of the classification function. Misclassification of observations from the smaller group increased when the sample size ratio exceeded 1:2; this raised the error rate and was not corrected by increases in the sample size.

### 2. Materials and Methods

Discrimination and classification are concerned with separating objects from different populations into different groups and with allocating new observations to one of these groups. Discriminant analysis is rather exploratory in nature and is used as a separative procedure, normally employed on a one-time basis. This section outlines the materials and methods employed in obtaining the results of the study.

**2.1. Data Used**

Female like-sex twins comprising thirty (30) pairs of monozygotic twins and 25 pairs of dizygotic twins were used for the data analyses. A sample of fifteen (15) pairs from each group was selected by simple random sampling, replicated 10 times, and the parameter estimates were obtained from the mean estimates of the 10 replicated samples. The final sample was selected based on the closeness of its estimated parameters to the mean values of the 10 replicated estimates. The 10 variables selected from the data were: Height (Ht), Weight (Wt), Head length (HL), Head breadth (HB), Head circumference (HC), Interpupillary distance (ID), Systolic blood pressure (SBP), Pulse interval (PI), Strength of left grip (SGL) and Strength of right grip (SGR). The difference between the first and the second recorded twin was taken as an observation for each variate ^{[14]}. R console version 2.15.1 was used to analyse the data.

**2.2. Discrimination and Classification for two Populations**

Let $\pi_1$ and $\pi_2$ be two *p*-variate groups with density functions $f_1(x)$ and $f_2(x)$ respectively, and consider an observed value $x$ of the random vector $X$. The observed vector must be assigned to either population $\pi_1$ or $\pi_2$. Denote by $\Omega$ the sample space (the collection of all possible outcomes of $X$) and partition it as $\Omega = R_1 \cup R_2$, where $R_1$ is the subspace of outcomes classified as belonging to population $\pi_1$ and $R_2 = \Omega - R_1$ is the subspace of outcomes classified as belonging to $\pi_2$. It follows that the (conditional) probability of classifying an object as belonging to $\pi_2$ when it really comes from $\pi_1$ equals

$$P(2 \mid 1) = P(X \in R_2 \mid \pi_1) = \int_{R_2} f_1(x)\,dx \tag{1}$$

The conditional probability $P(1 \mid 2)$ is obtained in the same way, by integrating $f_2(x)$ over $R_1$. Prior class probabilities are needed to obtain the overall probabilities of correctly and incorrectly classifying an observation. Let $p_i$ be the prior probability of $\pi_i$, where $i = 1, 2$ and $p_1 + p_2 = 1$. The overall probabilities of correctly and incorrectly classifying observations are:

*P*(object is correctly classified as $\pi_i$) = $P(i \mid i)\,p_i$, where $i = 1, 2$.

*P*(object is misclassified as $\pi_i$) = $P(i \mid k)\,p_k$, where $i \neq k$.

**2.3. Cost of Misclassification**

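Denote by $c(i \mid k)$ the cost of classifying an observation into $\pi_i$ when it actually belongs to $\pi_k$. The expected cost of misclassification (ECM) is

$$ECM = c(2 \mid 1)\,P(2 \mid 1)\,p_1 + c(1 \mid 2)\,P(1 \mid 2)\,p_2 \tag{2}$$

where $p_1$ and $p_2$ are the prior probabilities of the two populations. The regions $R_1$ and $R_2$ that minimize the expected cost of misclassification are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1} \tag{3}$$

$$R_2: \frac{f_1(x)}{f_2(x)} < \frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1} \tag{4}$$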
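The minimum-ECM rule compares the density ratio $f_1(x)/f_2(x)$ with the threshold $[c(1|2)/c(2|1)]\,(p_2/p_1)$, following the standard two-population form in ^{[9]}. The sketch below illustrates that comparison only; the univariate normal densities, the observation `x` and the cost values are hypothetical choices made for the example and are not taken from the twin data.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Univariate normal density, used only to make the example concrete."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def ecm_threshold(c12, c21, p1, p2):
    """Threshold [c(1|2) / c(2|1)] * (p2 / p1) applied to the density ratio."""
    return (c12 / c21) * (p2 / p1)


def classify_min_ecm(f1_x, f2_x, c12, c21, p1=0.5, p2=0.5):
    """Return 1 if x falls in R1 (density ratio >= threshold), otherwise 2.

    The comparison is written as f1 >= threshold * f2 to avoid dividing by
    a density that may be numerically zero.
    """
    return 1 if f1_x >= ecm_threshold(c12, c21, p1, p2) * f2_x else 2


# Hypothetical equal-mean populations that differ only in spread.
x = 1.2
f1 = normal_pdf(x, mean=0.0, sd=1.0)
f2 = normal_pdf(x, mean=0.0, sd=2.0)

label_equal_cost = classify_min_ecm(f1, f2, c12=1.0, c21=1.0)
label_costly_c12 = classify_min_ecm(f1, f2, c12=4.0, c21=1.0)
print(label_equal_cost, label_costly_c12)
```

Raising $c(1|2)$ relative to $c(2|1)$ raises the threshold, so the same observation can move from $R_1$ to $R_2$ without any change in the densities.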

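**2.4. Classification of Normal Populations when $\Sigma_1 \neq \Sigma_2$ (Quadratic Classification Rule)**

The regions of minimum ECM and minimum Total Probability of Misclassification (TPM) depend on the ratio of the densities. Substituting normal densities with mean vectors $\mu_1, \mu_2$ and different covariance matrices $\Sigma_1, \Sigma_2$ into equations (3) and (4) and taking natural logarithms gives the following classification rule. Allocate $x$ to $\pi_1$ if

$$-\frac{1}{2}\,x'\left(\Sigma_1^{-1} - \Sigma_2^{-1}\right)x + \left(\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1}\right)x - k \ge \ln\left[\frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1}\right] \tag{5}$$

and to $\pi_2$ otherwise, where

$$k = \frac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \frac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right)$$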
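When both mean vectors are zero, the linear term and the mean-dependent part of $k$ in the standard quadratic rule of ^{[9]} vanish, leaving the score $-\frac{1}{2}x'(\Sigma_1^{-1} - \Sigma_2^{-1})x - \frac{1}{2}\ln(|\Sigma_1|/|\Sigma_2|)$ to be compared with $\ln[(c(1|2)/c(2|1))(p_2/p_1)]$. The following is a minimal sketch of that special case for $p = 2$, written with plain tuples so that it runs without extra libraries. The covariance matrices and test points are hypothetical, and the helper names are introduced only for this illustration.

```python
from math import log


def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c


def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = det2(m)
    return ((d / det, -b / det), (-c / det, a / det))


def quad_form(m, x):
    """Quadratic form x' M x for a 2-vector x."""
    return (x[0] * (m[0][0] * x[0] + m[0][1] * x[1])
            + x[1] * (m[1][0] * x[0] + m[1][1] * x[1]))


def qdf_score_zero_mean(x, s1, s2):
    """Quadratic score when both population mean vectors are zero."""
    q = quad_form(inv2(s1), x) - quad_form(inv2(s2), x)
    return -0.5 * q - 0.5 * log(det2(s1) / det2(s2))


def allocate(x, s1, s2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """Allocate x to population 1 if the score reaches the log threshold."""
    threshold = log((c12 / c21) * (p2 / p1))
    return 1 if qdf_score_zero_mean(x, s1, s2) >= threshold else 2


# Hypothetical covariance matrices: population 2 is more dispersed.
S1 = ((1.0, 0.0), (0.0, 1.0))
S2 = ((4.0, 0.0), (0.0, 4.0))

near_origin = allocate((0.5, 0.5), S1, S2)
far_from_origin = allocate((3.0, 3.0), S1, S2)
near_origin_costly = allocate((0.5, 0.5), S1, S2, c12=4.0)
print(near_origin, far_from_origin, near_origin_costly)
```

With equal means the rule separates the groups purely by dispersion: points close to the origin go to the less variable population, and a higher $c(1|2)$ shrinks that region.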

**2.5. The Minimum Expected Cost of Misclassification Method (MECM)**

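Let $f_i(x)$ be the multivariate normal density associated with population $\pi_i$, $i = 1, 2$. Let $p_i$ be the prior probability of population $\pi_i$ and $c(k \mid i)$ the cost of allocating an item to $\pi_k$ when in fact it belongs to $\pi_i$, for $k, i = 1, 2$. For $k = i$, $c(i \mid i) = 0$. Finally, let $R_k$ be the set of *x*’s classified as $\pi_k$ and *P*(classifying item as $\pi_k \mid \pi_i$) = $P(k \mid i) = \int_{R_k} f_i(x)\,dx$ for $k, i = 1, 2$. Hence the Expected Cost of Misclassification (ECM) becomes:

$$ECM = p_1\,c(2 \mid 1)\,P(2 \mid 1) + p_2\,c(1 \mid 2)\,P(1 \mid 2) \tag{6}$$

We seek rules that minimize the ECM. This leads to an optimal classification rule: classify an object ($x_0$) into $\pi_1$ if

$$\frac{f_1(x_0)}{f_2(x_0)} \ge \frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1} \tag{7}$$

and assign $x_0$ to $\pi_2$ if

$$\frac{f_1(x_0)}{f_2(x_0)} < \frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1} \tag{8}$$

where $f_1$ and $f_2$ are the density functions for the monozygotic and dizygotic twin groups respectively, with $\pi_1$ and $\pi_2$ denoting the monozygotic and dizygotic twin populations ^{[9]}.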

**2.6. Error Rate Estimation**

The performance of any classification procedure is based on the error rates or misclassification probabilities.

**2.6.1. Balanced Error Rate (BER)**

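The Balanced Error Rate (BER) statistic is the average of the misclassification rates on samples drawn from populations $\pi_1$ and $\pi_2$, where *a, b, c, d* are the entries in the confusion matrix shown below.

| | Classified as $\pi_1$ | Classified as $\pi_2$ |
|---|---|---|
| Actual $\pi_1$ | *a* | *b* |
| Actual $\pi_2$ | *c* | *d* |

The balanced error rate is given mathematically as

$$BER = \frac{1}{2}\left(\frac{b}{a + b} + \frac{c}{c + d}\right) \tag{9}$$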
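As a small worked illustration, the BER can be computed from the four confusion-matrix counts. The layout assumed here is the conventional one (*a* and *d* are correct classifications in groups 1 and 2, *b* and *c* are the misclassifications), which is an assumption about the missing table rather than something confirmed by it. The counts use the equal-cost MECM result reported in Section 3.2: 2 of 15 monozygotic and 9 of 15 dizygotic pairs misclassified.

```python
def balanced_error_rate(a, b, c, d):
    """Average of the per-group misclassification rates.

    Assumed layout: a, b = group 1 classified as group 1, group 2;
                    c, d = group 2 classified as group 1, group 2.
    """
    return 0.5 * (b / (a + b) + c / (c + d))


# Equal-cost MECM counts from Section 3.2 (15 pairs per group).
ber = balanced_error_rate(a=13, b=2, c=9, d=6)
overall_error = (2 + 9) / 30
print(round(ber, 3), round(overall_error, 3))
```

Because the two groups have the same size here, the BER coincides with the overall misclassification proportion (11 of 30); the two measures differ only when the group sizes are unequal.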

**2.6.2. Cross Validation**

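Let $n_{1M}^{(H)}$ and $n_{2M}^{(H)}$ be the numbers of left-out (holdout) observations misclassified in groups 1 and 2 respectively, with $n_1$ and $n_2$ the group sample sizes. The cross-validation estimate of the error rate is given by ^{[9]}

$$\hat{E}(AER) = \frac{n_{1M}^{(H)} + n_{2M}^{(H)}}{n_1 + n_2} \tag{10}$$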
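The holdout procedure behind this estimate can be sketched as follows: each observation is left out in turn, the rule is rebuilt from the remaining data, and the left-out observation is classified. To keep the sketch short it uses a univariate equal-mean quadratic rule with equal costs and priors, and variances estimated about a known mean of zero; both the rule and the data are simplifying assumptions for illustration, not the ten-variate functions fitted in this study.

```python
from math import log


def zero_mean_variance(values):
    """Variance estimate about a known mean of zero."""
    return sum(v * v for v in values) / len(values)


def classify(x, var1, var2):
    """Univariate equal-mean quadratic rule, equal costs and priors."""
    score = -0.5 * x * x * (1.0 / var1 - 1.0 / var2) - 0.5 * log(var1 / var2)
    return 1 if score >= 0.0 else 2


def holdout_error_rate(group1, group2):
    """Leave-one-out counts of misclassified observations and error rate."""
    n1m = 0
    for i, x in enumerate(group1):
        rest = group1[:i] + group1[i + 1:]  # refit without observation i
        if classify(x, zero_mean_variance(rest), zero_mean_variance(group2)) != 1:
            n1m += 1
    n2m = 0
    for i, x in enumerate(group2):
        rest = group2[:i] + group2[i + 1:]
        if classify(x, zero_mean_variance(group1), zero_mean_variance(rest)) != 2:
            n2m += 1
    return n1m, n2m, (n1m + n2m) / (len(group1) + len(group2))


# Hypothetical within-pair differences: group 2 is more dispersed.
g1 = [-0.5, 0.3, -0.2, 0.4, 0.1, -0.3]
g2 = [-3.0, 2.5, -0.2, 3.5, -2.8, 0.1]

n1m, n2m, rate = holdout_error_rate(g1, g2)
print(n1m, n2m, rate)
```

Refitting without the held-out observation is what distinguishes this estimate from the apparent error rate, which classifies the same data used to build the rule and tends to be optimistic.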

### 3. Results and Discussion

After the application of the methods in section 2 above, the following results were obtained in studying the effect of unequal misclassification cost on the two classification functions namely QDF and Minimum Expected Cost of Misclassification (MECM).

The equality of the two mean vectors for the monozygotic and dizygotic twin groups was tested with Hotelling's $T^2$ to ensure that the equal-mean assumption was not violated. The test was not significant, indicating no significant difference between the two mean vectors.

**3.1. Application with QDF**

From the methodology section, the QDF obtained under equal prior probabilities ($p_1 = p_2$) and equal misclassification costs ($c(1 \mid 2) = c(2 \mid 1)$) was derived as: allocate *x* to $\pi_1$, otherwise to $\pi_2$, if

(11) |

The QDF and the quadratic classification rules for equal and unequal misclassification cost ratios c(1|2): c(2|1), in the order 1:1, 1:2, 1:3 and 1:4, were obtained as follows:

(12) |

(13) |

(14) |

(15) |

The following QDFs were obtained when the cost ratios for the two groups were alternated, for the misclassification cost ratios 1:2, 1:3 and 1:4.

(16) |

(17) |

(18) |
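In quadratic rules of the standard form given in ^{[9]}, the cost ratio enters only through the right-hand side $\ln[(c(1|2)/c(2|1))(p_2/p_1)]$, so changing the ratio shifts the cut-off while leaving the quadratic score itself unchanged. The short sketch below tabulates that cut-off for the ratios considered here, assuming equal priors and assuming that a ratio written 1:2 means c(1|2): c(2|1) = 1:2, with the alternated order being its reciprocal.

```python
from math import log


def log_cost_threshold(c12, c21, p1=0.5, p2=0.5):
    """Right-hand side ln[(c(1|2) / c(2|1)) * (p2 / p1)] of the rule."""
    return log((c12 / c21) * (p2 / p1))


ratios = [(1, 1), (1, 2), (1, 3), (1, 4)]

# Ratios read as c(1|2) : c(2|1).
thresholds = {r: log_cost_threshold(r[0], r[1]) for r in ratios}

# Alternated order: the same ratios read as c(2|1) : c(1|2).
alternated = {r: log_cost_threshold(r[1], r[0]) for r in ratios[1:]}

for r in ratios:
    print(r, round(thresholds[r], 3))
for r in ratios[1:]:
    print("alternated", r, round(alternated[r], 3))
```

The cut-offs for the two orderings are mirror images about zero, which is why alternating the ratios moves observations between groups in opposite directions.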

Based on the above functions under the two orderings of the misclassification cost ratios, the discriminant scores shown in Table 1a and Table 1b were obtained.

From Table 1a and Table 1b, under the 1:1 ratio no observations were misclassified from the monozygotic twin group, whilst three twin pairs were misclassified from the dizygotic group; the proportion of correct classification was recorded as 0.80. When the cost of misclassifying an observation as a monozygotic twin was twice the cost of misclassifying an observation as a dizygotic twin, five (5) and three (3) observations were misclassified from the monozygotic and dizygotic groups respectively, giving approximately 73 percent correct classification. For ratio 1:3, the proportion of correct classification was 0.77, since 5 and 2 twin observations were misclassified from the monozygotic and dizygotic groups. For ratio 1:4, seven (7) and nine (9) were misclassified from the monozygotic and dizygotic groups, with 70 percent correct classification. After alternating the cost ratios, the proportions of correct classification were 0.73, 0.83 and 0.80 for the 1:2, 1:3 and 1:4 misclassification cost ratios respectively (see Table 1a and Table 1b).

**3.2. Results for the Various Misclassification Cost Ratios Using MECM Classification Rule**

The optimal classification rule derived in the methodology section (equations 7 and 8) was used to derive the discriminant scores shown in Table 2 and Table 2a below.

From Table 2, the discriminant scores generated under equal costs of misclassification (i.e. with cost ratio 1:1) misclassified 2 and 9 twin pair observations from the monozygotic and dizygotic groups respectively, giving a proportion of correct classification of 0.633. The proportions of correct classification under the cost ratios 1:2, 1:3 and 1:4 were 0.70, 0.70 and 0.80 respectively. Hence, as the cost of misclassifying a twin observation into the dizygotic group increases, better separation is achieved: fewer observations are misclassified and the error rates are lower. The misclassification cost ratios were then alternated to the order c(2|1): c(1|2) and the effect on the classification rule was assessed. As shown in Table 2a, the proportions of correct classification for the misclassification cost ratios 1:2, 1:3 and 1:4 were 0.33, 0.20 and 0.133. These results indicate that, as the cost of allocating a twin observation to the monozygotic group when it actually belongs to the dizygotic group increases, the proportion of correct classification falls and the number of misclassified observations rises.

**3.3. Evaluating the Classification Rules of QDF and MECM under Various Misclassification Costs**

The performance of the two classification functions, QDF and MECM, was evaluated by estimating their error rates from the misclassified observations with the CV and BER methods. From Table 3, the error estimates obtained using the CV estimator were generally somewhat higher than those of the Balanced Error Rate (BER). However, the mean error rates for the QDF under cost ratios in the order c(1|2): c(2|1) were generally lower than when the cost ratios were alternated to the order c(2|1): c(1|2). This means that the discriminant function separates the two twin groups better when the cost assigned to misclassifying an observation into population 2 (dizygotic group) increases. The function also performed slightly worse once the cost ratios exceeded 1:1 and 1:2, with further increases having little effect on the classification rule under both orderings. This is broadly consistent with Adebanji et al. ^{[1]}, who found that misclassification of observations from the smaller group increased when the sample size ratio exceeded 1:2, raising the error rate in a way that increases in the sample size did not correct. Ariyo and Adebanji ^{[2]} likewise reported that their three derived linear classifiers were insensitive to cost ratios exceeding 1:2.

Similarly, the classification rule obtained under the MECM performed slightly better when the cost of misclassifying an observation into the dizygotic group was increased beyond one in the order c(1|2): c(2|1) than when the order was alternated.

Comparing the two methods on their mean error rates in Table 3, the error rate for the MECM was appreciably higher, ranging from approximately 17% to 82%, whilst the QDF recorded the lowest error rates, ranging from approximately 17% to 28% under the various misclassification cost ratios. Hence the QDF outperformed the MECM in separating the two twin groups. The results conform to the work of Lachenbruch ^{[10]}, whose derived absolute linear discriminant rule performed slightly worse than the quadratic discriminant rule used by ^{[3]} and ^{[4]}.

### 4. Conclusion

Two classification methods, QDF and MECM, were studied when the assumption of equal misclassification costs was violated. Generally, more twin observations were correctly classified under the misclassification cost ratios in the order c(1|2): c(2|1) than when the order was alternated. Hence increasing the cost of misclassifying a dizygotic observation provided better separation than increasing the cost of misclassifying an observation into the monozygotic twin group. Both classification methods were also found to be sensitive to misclassification cost ratios exceeding 1:2. The QDF recorded the lowest mean error rates, while the MECM recorded high mean error rates; the QDF thus outperformed the MECM. Hence the best separation between the two twin groups (monozygotic and dizygotic), with equal group mean vectors assumed, was provided by the Quadratic Discriminant Function (QDF) classification method.

### References

[1] Adebanji, A.O., Adeyemi, S., and Iyaniwura, J.O., “Effect of sample size Ratio on the Performance of the Linear Discriminant function”, International Journal of Modern Mathematics, 3 (1), 97-108. 2008.

[2] Ariyo, O.S., and Adebanji, A.O., “Effect of Misclassification Costs on the Performance Functions”, Paper presented at the 45^{th} Annual Conference of the Science Association of Nigeria (SAN) held at Niger Delta University, Wilberforce Island, Bayelsa State, 2010.

[3] Bartlett, M.S., and Please, N.W., “Discrimination in the case of zero mean differences”, Biometrika, 50 (1/2): 17-21. 1963.

[4] Desu, M.M. and Geisser, S., “Methods and applications of equal-mean discrimination”, Discriminant Analysis and Applications, pages 139-161. 1973.

[5] Fisher, R.A., “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, 7, 179-188. 1936.

[6] Ganesalingam, S., Nanthakumar, A., and Ganesh, S., “A comparison of quadratic discriminant function with discriminant function based on absolute deviation from the mean”, Journal of Statistics and Management Studies, 9: 441-457. 2006.

[7] Geisser, S. and Desu, M.M., “Predictive zero-mean uniform discrimination”, Biometrika, 55 (3): 519-524. 1968.

[8] Hyodo, M. and Kubokawa, T., “A variable selection criterion for linear discriminant rule and its optimality in high dimensional and large sample data”, Journal of Multivariate Analysis, 123, 364-379. 2014.

[9] Johnson, R.A. and Wichern, D.W., Applied Multivariate Statistical Analysis (Sixth ed.). NJ: Pearson Education, Inc. 575-621. 2007.

[10] Lachenbruch, P.A., “Zero-mean difference discrimination and the absolute linear discriminant function”, Biometrika, 62 (2): 397-401. 1975.

[11] Maharaj, E.A., and Alonso, A.M., “Discriminant analysis of multivariate time series: Application to diagnosis based on ECG signals”, Computational Statistics and Data Analysis, 70: 67-87. 2014.

[12] Marco, V.R., Young, D.M., and Turner, D.W., “Asymptotic expansions and estimation of the expected error rate for equal-mean discrimination with uniform covariance structure”, Biometrika, 29 (1): 103-111. 1987.

[13] Okamoto, M., “Discrimination for variance matrices”, Osaka Math., 13: 1-39. 1961.

[14] Stocks, P., “A biometric investigation of twins, Part II”, Ann. Eugen., 5, 1-55. 1933.