Comparing the Performance of Zero Mean Classification Functions under Unequal Misclassification Cost

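The balanced error rate (BER) averages the misclassification rates of the two groups, so that each group receives equal weight regardless of its size. It is given mathematically as

$$\mathrm{BER} = \frac{1}{2}\left(\frac{n_{1M}}{n_1} + \frac{n_{2M}}{n_2}\right) \qquad (9)$$

where $n_{1M}$ and $n_{2M}$ are the numbers of misclassified observations from groups 1 and 2, and $n_1$ and $n_2$ are the group sizes.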
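As a minimal sketch, the balanced error rate can be computed directly from the misclassification counts of the two groups. The function names and the example counts below are hypothetical; the only assumption is the standard definition of the BER as the unweighted average of the two within-group error rates. The pooled (overall) error rate is shown next to it for contrast.

```python
def balanced_error_rate(n1_miss: int, n1: int, n2_miss: int, n2: int) -> float:
    """BER: unweighted mean of the two within-group misclassification rates."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    return 0.5 * (n1_miss / n1 + n2_miss / n2)


def overall_error_rate(n1_miss: int, n1: int, n2_miss: int, n2: int) -> float:
    """Pooled error rate: total misclassified over total observations."""
    return (n1_miss + n2_miss) / (n1 + n2)


# Hypothetical, unbalanced groups: the pooled rate weights each group by its size,
# while the BER gives each group's error rate equal weight.
print(balanced_error_rate(1, 10, 8, 20))  # 0.5 * (1/10 + 8/20)
print(overall_error_rate(1, 10, 8, 20))   # (1 + 8) / 30
```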

2.6.2. Cross Validation

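Let $n_{1M}^{(H)}$ and $n_{2M}^{(H)}$ be the numbers of left-out (holdout) observations misclassified in groups 1 and 2, respectively. The cross-validation estimate of the error rate is then given by ^[9]

$$\hat{E}(\mathrm{AER}) = \frac{n_{1M}^{(H)} + n_{2M}^{(H)}}{n_1 + n_2} \qquad (10)$$

where $n_1$ and $n_2$ are the group sizes.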
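The holdout procedure can be sketched as a loop that removes one observation at a time, rebuilds the classification rule from the remaining data and classifies the removed point. In this sketch the `fit` argument is a hypothetical interface, and the nearest-mean rule is only a stand-in to keep the example short; in this study it would be replaced by the QDF or MECM rule.

```python
import numpy as np


def holdout_error_rate(X1, X2, fit):
    """Lachenbruch holdout (leave-one-out) estimate of the error rate.

    `fit(A, B)` builds a rule from group-1 data A and group-2 data B and
    returns a callable mapping one observation to 1 or 2.
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1_miss = 0
    for i in range(len(X1)):
        rule = fit(np.delete(X1, i, axis=0), X2)  # refit without observation i
        n1_miss += int(rule(X1[i]) != 1)
    n2_miss = 0
    for j in range(len(X2)):
        rule = fit(X1, np.delete(X2, j, axis=0))  # refit without observation j
        n2_miss += int(rule(X2[j]) != 2)
    return n1_miss, n2_miss, (n1_miss + n2_miss) / (len(X1) + len(X2))


def nearest_mean_fit(A, B):
    """Hypothetical stand-in rule for the demo: allocate to the closer group mean."""
    m1, m2 = A.mean(axis=0), B.mean(axis=0)
    return lambda x: 1 if np.sum((x - m1) ** 2) <= np.sum((x - m2) ** 2) else 2


X1 = [[0.0], [1.0], [2.0]]  # hypothetical group-1 data
X2 = [[1.5], [3.0], [4.0]]  # hypothetical group-2 data
print(holdout_error_rate(X1, X2, nearest_mean_fit))
```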

3. Results and Discussion

Applying the methods of Section 2, we obtained the following results on the effect of unequal misclassification costs on two classification functions, namely the quadratic discriminant function (QDF) and the minimum expected cost of misclassification (MECM) rule.

The equality of the mean vectors of the monozygotic and dizygotic twin groups was tested with Hotelling's T² test to check that the equal-mean assumption is not violated. The test was not significant, indicating no significant difference between the two mean vectors.
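A minimal two-sample Hotelling's T² computation is sketched below, assuming both groups share a common covariance matrix estimated by the pooled sample covariance. The function name and data are hypothetical. The p-value can be read from the F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf(F, df1, df2)`.

```python
import numpy as np


def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling's T^2 for H0: mu1 == mu2, assuming a common covariance.

    Returns (T2, F, (df1, df2)); under H0, F follows an F(df1, df2) distribution.
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Pooled covariance: weighted average of the two sample covariance matrices.
    S_pooled = np.atleast_2d(((n1 - 1) * np.cov(X1, rowvar=False)
                              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2))
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(S_pooled, diff))
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


X1 = np.array([[1, 2], [2, 1], [3, 4], [4, 3]], dtype=float)  # hypothetical group 1
X2 = X1 + np.array([1.0, 0.0])                                 # hypothetical group 2
print(hotelling_two_sample(X1, X2))
```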

3.1. Application with QDF

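From the methodology section, the QDF obtained under equal prior probabilities (p1 = p2) and equal misclassification costs (c(1|2) = c(2|1)) was derived as follows: allocate x to population 1 (monozygotic twins) if the following holds, otherwise allocate it to population 2 (dizygotic twins):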

(11)
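For reference, the sample quadratic discriminant score for two multivariate normal groups can be sketched as below, assuming the standard plug-in form with group means and covariance matrices estimated from the data. It equals the difference of the two estimated log densities, so under equal priors and costs an observation is allocated to group 1 when the score is non-negative. The data and names are hypothetical and do not reproduce the fitted function in (11).

```python
import numpy as np


def fit_qdf(X1, X2):
    """Sample quadratic discriminant score Q(x) = ln f1(x) - ln f2(x) for normal groups.

    Q(x) = -1/2 x'(S1^-1 - S2^-1)x + (m1'S1^-1 - m2'S2^-1)x - k,
    k    = 1/2 ln(|S1|/|S2|) + 1/2 (m1'S1^-1 m1 - m2'S2^-1 m2).
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    logdet1 = np.linalg.slogdet(S1)[1]
    logdet2 = np.linalg.slogdet(S2)[1]
    k = 0.5 * (logdet1 - logdet2) + 0.5 * (m1 @ S1_inv @ m1 - m2 @ S2_inv @ m2)

    def score(x):
        x = np.asarray(x, dtype=float)
        return -0.5 * x @ (S1_inv - S2_inv) @ x + (m1 @ S1_inv - m2 @ S2_inv) @ x - k

    return score


X1 = np.array([[1, 2], [2, 1], [3, 4], [4, 3]], dtype=float)          # hypothetical group 1
X2 = np.array([[0, 1], [2, 0], [1, 3], [5, 4], [3, 2]], dtype=float)  # hypothetical group 2
q = fit_qdf(X1, X2)
x_new = np.array([2.0, 3.0])
print(1 if q(x_new) >= 0 else 2)  # equal priors and costs: the cutoff is 0
```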

The QDF and the quadratic classification rules for equal and unequal misclassification cost ratios c(1|2):c(2|1) of 1:1, 1:2, 1:3 and 1:4 were obtained as follows:

(12)

(13)

(14)

(15)
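Assuming the rules in (12) to (15) take the textbook minimum expected cost form, unequal costs and priors change only the cutoff of the quadratic rule: allocate x to group 1 when the QDF score is at least ln[(c(1|2)/c(2|1))(p2/p1)]. The sketch below assumes equal priors, as in (11), and prints this cutoff for the ratios c(1|2):c(2|1) used above; the helper names are hypothetical.

```python
import math


def cost_cutoff(c12: float, c21: float, p1: float = 0.5, p2: float = 0.5) -> float:
    """ln[(c(1|2)/c(2|1)) * (p2/p1)], the right-hand side of the minimum-ECM rule."""
    return math.log((c12 / c21) * (p2 / p1))


def allocate(qdf_score: float, c12: float, c21: float, p1: float = 0.5, p2: float = 0.5) -> int:
    """Return 1 (monozygotic) if the QDF score reaches the cutoff, else 2 (dizygotic)."""
    return 1 if qdf_score >= cost_cutoff(c12, c21, p1, p2) else 2


# Cost ratios c(1|2):c(2|1) = 1:k with equal priors.
for k in (1, 2, 3, 4):
    print(f"1:{k}  cutoff = {cost_cutoff(1, k):+.3f}")

# A hypothetical score of -0.5 goes to group 2 under equal costs but to group 1 at 1:2.
print(allocate(-0.5, 1, 1), allocate(-0.5, 1, 2))
```

Raising c(2|1) relative to c(1|2) lowers the cutoff, so more observations are allocated to group 1.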

The following QDFs were obtained when the cost ratios for the two groups were alternated, that is, taken in the order c(2|1):c(1|2), for the ratios 1:2, 1:3 and 1:4.

(16)

(17)

(18)

Table 1a. Discriminant scores for the various misclassification cost ratios using QDF

Applying the above functions under the two orderings of the misclassification cost ratios gave the discriminant scores reported in Table 1a and Table 1b.

Table 1b. Discriminant scores for the various misclassification cost ratios using QDF

From Table 1a and Table 1b, under equal costs (1:1) no observations from the monozygotic twin group were misclassified, whilst three twin pairs from the dizygotic group were misclassified, giving a proportion of correct classification of 0.80. When the cost of misclassifying an observation as a monozygotic twin was twice the cost of misclassifying it as a dizygotic twin (1:2), five (5) and three (3) observations were misclassified from the monozygotic and dizygotic twin groups, respectively, representing approximately 73 percent correct classification. For the ratio 1:3, the proportion of correct classification was 0.77, with 5 and 2 twin observations misclassified from the monozygotic and dizygotic groups. For the ratio 1:4, seven (7) and nine (9) observations were misclassified from the monozygotic and dizygotic groups, with 70 percent correct classification. After the cost ratios were alternated, the proportions of correct classification were 0.73, 0.83 and 0.80 for the 1:2, 1:3 and 1:4 misclassification cost ratios, respectively (see Table 1a and Table 1b).
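The reported proportions follow from simple arithmetic on the misclassification counts. The sketch below assumes a total of 30 classified twin observations, a figure not stated in this section and used here only for illustration; under that assumption, the counts reported for the 1:2 and 1:3 ratios give the proportions 0.73 and 0.77.

```python
def proportion_correct(n1_miss: int, n2_miss: int, n_total: int) -> float:
    """Share of observations allocated to their true group."""
    return 1 - (n1_miss + n2_miss) / n_total


N_TOTAL = 30  # ASSUMPTION: total number of classified twin observations (not stated in this section)

# Misclassified (monozygotic, dizygotic) counts reported for the QDF in the text.
counts = {"1:2": (5, 3), "1:3": (5, 2)}
for ratio, (mz_miss, dz_miss) in counts.items():
    print(ratio, round(proportion_correct(mz_miss, dz_miss, N_TOTAL), 2))
```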

3.2. Results for the Various Misclassification Cost Ratios Using MECM Classification Rule

The optimal classification rule derived in the methodology section (equations 7 and 8) was used to compute the discriminant scores shown in Table 2 and Table 2a below.
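Since equations 7 and 8 are not reproduced here, the following is only a sketch of the textbook minimum-ECM rule for two multivariate normal groups with a common covariance matrix: allocate x to group 1 when f1(x)/f2(x) ≥ (c(1|2)/c(2|1))(p2/p1), which with a pooled covariance estimate reduces to a linear score compared with ln[(c(1|2)/c(2|1))(p2/p1)]. The common-covariance assumption, the function name and the data are ours and may differ from the rule used in the paper.

```python
import numpy as np


def fit_mecm_linear(X1, X2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """Minimum-ECM rule for two normal groups with a common covariance (sketch).

    Allocates x to group 1 when
        (m1 - m2)' S_pooled^-1 x - 1/2 (m1 - m2)' S_pooled^-1 (m1 + m2)
            >= ln[(c(1|2)/c(2|1)) * (p2/p1)],
    and to group 2 otherwise. c12 = c(1|2), c21 = c(2|1).
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_pooled = np.atleast_2d(((n1 - 1) * np.cov(X1, rowvar=False)
                              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2))
    a = np.linalg.solve(S_pooled, m1 - m2)  # discriminant coefficients
    centre = 0.5 * (a @ (m1 + m2))          # score midpoint between the group means
    cutoff = np.log((c12 / c21) * (p2 / p1))

    def classify(x):
        score = a @ np.asarray(x, dtype=float) - centre
        return 1 if score >= cutoff else 2

    return classify


X1 = np.array([[1, 2], [2, 1], [3, 4], [4, 3]], dtype=float)  # hypothetical group 1
X2 = X1 + 3.0                                                 # hypothetical group 2
for c12, c21 in [(1, 1), (1, 4), (4, 1)]:
    rule = fit_mecm_linear(X1, X2, c12=c12, c21=c21)
    print(f"c(1|2):c(2|1) = {c12}:{c21}", [rule(x) for x in X1], [rule(x) for x in X2])
```

In this toy example, raising c(2|1) pulls some group-2 points into group 1, and raising c(1|2) pushes some group-1 points into group 2.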

Table 2. Scores obtained from the optimum classification rule based on the MECM

Table 2a. Scores obtained from the MECM classification rule under various costs in the ratio order c(2|1):c(1|2)

From Table 2, the discriminant scores generated under equal misclassification costs (i.e., with cost ratio 1:1) misclassified 2 and 9 twin pair observations from the monozygotic and dizygotic groups, respectively, giving a proportion of correct classification of 0.633. The proportions of correct classification under the cost ratios 1:2, 1:3 and 1:4 were 0.70, 0.70 and 0.80, respectively. Hence, as the cost of misclassifying a twin observation into the dizygotic group increases, the separation between the groups improves: fewer observations are misclassified and the error rate is lowest at 1:4. The misclassification cost ratios were then alternated to the order c(2|1):c(1|2) and the effect of the misclassification cost on the classification rule was assessed. As shown in Table 2a, the proportions of correct classification for the cost ratios 1:2, 1:3 and 1:4 were 0.33, 0.20 and 0.133, respectively. These results indicate that as the cost of allocating a twin observation to the monozygotic group when it actually belongs to the dizygotic group increases, the proportion of correct classification decreases and the number of misclassified observations increases.
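The direction of these effects can be illustrated with the minimum-ECM cutoff ln[(c(1|2)/c(2|1))(p2/p1)]: for a fixed set of scores, raising c(2|1) lowers the cutoff and sends more observations to group 1, while raising c(1|2) raises it and sends more observations to group 2. The scores below are hypothetical, equal priors are assumed, and the helper name is ours.

```python
import math


def n_allocated_to_group1(scores, c12, c21, p1=0.5, p2=0.5):
    """Count scores at or above the minimum-ECM cutoff ln[(c(1|2)/c(2|1)) * (p2/p1)]."""
    cutoff = math.log((c12 / c21) * (p2 / p1))
    return sum(1 for s in scores if s >= cutoff)


scores = [-1.5, -1.2, -0.8, -0.3, 0.1, 0.4, 0.9, 1.3]  # hypothetical discriminant scores

for k in (1, 2, 3, 4):
    first_order = n_allocated_to_group1(scores, c12=1, c21=k)  # c(1|2):c(2|1) = 1:k
    alternated = n_allocated_to_group1(scores, c12=k, c21=1)   # c(2|1):c(1|2) = 1:k
    print(f"1:{k}  group 1 count: {first_order} (c(1|2):c(2|1)), {alternated} (alternated)")
```

Whether such a shift raises or lowers the proportion of correct classification depends on which group's observations lie near the cutoff, which is why the two orderings can move the results in opposite directions.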

3.3. Evaluating the Classification Rules of QDF and MECM under Various Misclassification Costs

The performance of the two classification functions, QDF and MECM, was evaluated by estimating their error rates from the misclassified observations with the CV and BER methods. Table 3 shows that the CV error estimates were generally somewhat higher than the Balanced Error Rate (BER) estimates. The mean error rates of the QDF under the cost ratios in the order c(1|2):c(2|1) were generally lower than when the cost ratios were alternated to the order c(2|1):c(1|2). This means that the discriminant function provides better separation between the two twin groups when the cost assigned to misclassifying an observation into population 2 (dizygotic group) increases. The function also performed slightly worse when the cost ratios exceeded 1:1 and 1:2, with the larger ratios having little effect on the classification rule under both orderings of the costs. This is partly consistent with the work of ^[1], who found that misclassification of observations from the smaller group increased when the sample size ratio exceeded 1:2, resulting in a higher error rate that was not corrected by increasing the sample size. Similarly, ^[2] reported that their three derived linear classifiers were insensitive to cost ratios exceeding 1:2.

Similarly, the MECM classification rule performed slightly better when the cost of misclassifying an observation into the dizygotic group was increased above 1 in the ratio order c(1|2):c(2|1) than when the order was alternated.

Table 3. Error rate estimates for QDF and MECM under unequal misclassification cost ratios
