Selecting Appropriate Variables for DEA Using Genetic Algorithm (GA) Search Procedure

2.2. Search Procedure of Subset Using GA and RM Criteria

The best subset of variables for the study was obtained with the subselect R package contributed by ^[19]; the essence of its GA search procedure is given here. A full discussion of the search procedure can be found in ^{[19, 20]}.

The selection procedure is simple and proceeds as follows. For any subset size (say, r), a population of r-variable subsets is randomly selected from the full set of variables as the initial population (N), where (). In each iteration, the number of child-bearing couples (parents) to be formed is half the size of the population (i.e., N/2), and each couple generates a child (a new r-variable subset) that inherits the properties of its parents. Each father is selected from among the members of the population with probability proportional to his value of the criterion. For each father F, a mother M is selected with equal probability from among the members of the population that have at least two variables not belonging to F. The child produced by each pair (F, M) includes all the variables that belong to both parents. The remaining variables are selected with equal probability from the parents’ symmetric difference, with the additional restriction that at least one variable from M\F and one from F\M is selected. Each offspring may optionally undergo a mutation, in the form of a local improvement algorithm, with a user-specified probability. The parents and offspring are then ranked according to their criterion value, and the best N of these r-subsets make up the next generation, which is used as the current population in the subsequent iteration. The search stops after a preset number of generations.
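The parent selection and crossover steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the subselect package: the function names `pick_parents` and `crossover`, the variable labels `x1` to `x6`, and the criterion values are hypothetical, and the mutation and ranking steps are left out.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Subset = FrozenSet[str]


def pick_parents(population: Sequence[Subset], criterion: Sequence[float],
                 rng: random.Random) -> Tuple[Subset, Subset]:
    """Father: probability proportional to his criterion value.
    Mother: uniform among members with at least two variables not in the father."""
    father = rng.choices(population, weights=criterion, k=1)[0]
    eligible = [m for m in population if len(m - father) >= 2]
    if not eligible:
        raise ValueError("no eligible mother for this father")
    return father, rng.choice(eligible)


def crossover(father: Subset, mother: Subset, rng: random.Random) -> Subset:
    """Child keeps the common variables and fills up from the symmetric difference."""
    if len(father) != len(mother) or len(mother - father) < 2:
        raise ValueError("parents must have equal size and differ in >= 2 variables")
    common = father & mother
    only_f, only_m = father - mother, mother - father
    pool: List[str] = sorted(only_f | only_m)
    need = len(father) - len(common)
    while True:
        # Uniform draw from the symmetric difference; redraw unless
        # both F\M and M\F contribute at least one variable.
        picked = set(rng.sample(pool, need))
        if picked & only_f and picked & only_m:
            return frozenset(common | picked)


# Hypothetical example with r = 4 variables per subset.
rng = random.Random(0)
father = frozenset({"x1", "x2", "x3", "x4"})
mother = frozenset({"x1", "x2", "x5", "x6"})
child = crossover(father, mother, rng)
```

The redraw loop keeps the choice uniform over all admissible children; because at most two of the possible draws are inadmissible (all variables from one parent only), it terminates quickly in practice.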

The RM criterion was used to measure the quality of a variable subset. ^{[21, 22]} defined four different criteria for measuring the quality of a subset of variables; the RM criterion is equivalent to the second of these. The concept is simple: it is the square root of a weighted average of the squared multiple correlations between each principal component (PC) of the full data set and the r-variable subset, with the PC eigenvalues as weights. The RM criterion has also been referred to by ^{[19, 23]}. The value of the RM coefficient lies between 0 and 1.

2.3. RM Coefficient

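$$
RM = \operatorname{corr}(A, P_R A) = \sqrt{\frac{\operatorname{tr}(A^{T} P_R A)}{\operatorname{tr}(A^{T} A)}} = \sqrt{\frac{\sum_{i=1}^{p} \lambda_i (r_m)_i^{2}}{\sum_{j=1}^{p} \lambda_j}} = \sqrt{\frac{\operatorname{tr}\left([S^{2}]_{(R)} S_R^{-1}\right)}{\operatorname{tr}(S)}}
$$

where

$A$ is the full data matrix;

$P_R$ is the orthogonal projection matrix on the subspace spanned by a given variable subset;

$S$ is the covariance (or correlation) matrix of the full data set;

$R$ is the index set of the variables in the variable subset;

$S_R$ is the principal submatrix of $S$ which results from retaining the rows and columns whose indices belong to $R$;

$[S^{2}]_{(R)}$ is the principal submatrix of $S^{2}$ obtained by retaining the rows and columns associated with set $R$;

$\lambda_i$ is the $i$-th largest eigenvalue of the covariance (or correlation) matrix defined by $A$ ($i = 1, \dots, p$, with $p$ the number of variables in the full data set);

$(r_m)_i$ is the multiple correlation between the $i$-th principal component of the full data set and the $r$-variable subset.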
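As a small numerical illustration, the sketch below evaluates the RM coefficient in its trace form, RM = sqrt(tr([S²]_R S_R⁻¹) / tr(S)), for a toy correlation matrix. It assumes that this trace form is the one intended here; the helper names (`matmul`, `submatrix`, `solve`, `rm_coefficient`) and the 2 × 2 matrix are hypothetical, and the code is not the subselect implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def submatrix(m: Matrix, idx: Sequence[int]) -> Matrix:
    """Principal submatrix: keep the rows and columns listed in idx."""
    return [[m[i][j] for j in idx] for i in idx]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a X = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("submatrix is singular (perfectly correlated variables?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rm_coefficient(s: Matrix, idx: Sequence[int]) -> float:
    """RM for the variable subset idx, given covariance/correlation matrix s."""
    s2_sub = submatrix(matmul(s, s), idx)
    s_sub = submatrix(s, idx)
    # tr([S^2]_R S_R^{-1}) equals tr(S_R^{-1} [S^2]_R), so one linear solve suffices.
    x = solve(s_sub, s2_sub)
    num = sum(x[i][i] for i in range(len(idx)))
    den = sum(s[i][i] for i in range(len(s)))
    return math.sqrt(num / den)


# Toy correlation matrix of two variables with correlation 0.5.
S = [[1.0, 0.5], [0.5, 1.0]]
rm_one = rm_coefficient(S, [0])      # subset with the first variable only
rm_all = rm_coefficient(S, [0, 1])   # full set of variables
```

For this toy matrix the single-variable subset gives RM² = 0.625, which matches the eigenvalue form (1.5 × 0.75 + 0.5 × 0.25) / 2, and the full set gives RM = 1.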

3. Case Study: Indian Banking Sector

3.1. Initial Variables Selection

According to ^[24], the banking theory literature offers two approaches for selecting the inputs and outputs for DEA, viz., the production approach and the intermediation approach. The production approach defines bank activity as the production of services and views banks as using physical inputs such as labor and capital to provide deposit and loan accounts. The intermediation approach, on the other hand, views banks as intermediaries of funds between savers and investors: banks collect deposits, using labor and capital, and then intermediate those sources of funds into loans and other earning assets. The intermediation approach is reported by ^[25] to be the more suitable and the most widely used in the banking literature. The production approach is more suitable for the analysis of bank branch efficiency, whereas the intermediation approach is more suitable for cross-sectional bank studies and is also quite popular in empirical research (^{[26, 27]}). Therefore, the intermediation approach is followed in the present study to estimate the efficiency of banks. Even so, different authors use different variables for the same problem. The initial variables for the study were selected after carefully examining the literature on efficiency estimation for Indian commercial banks; the variables that recur most often in the recent literature were taken as the initial variables. In the intermediation approach, deposits are used as an input. Table 1 shows the initial variables and their codes.

Table 1. Initial variables and their codes

3.2. Bank and Data Selection

The commercial banks in India (DMUs) for the present study were chosen on the following criteria: i) the bank should have been active in the Indian business market for a minimum period of five years (2008 – 2012); ii) the bank should have more than 3 branches and 100 employees; and iii) the bank should not have been continuously in loss for 2 years. Based on these conditions, 55 commercial banks were selected for the study, of which 26 are public sector banks (six from SBI and its associates and twenty nationalized banks), 20 are private sector banks (thirteen old and seven new private banks) and 9 are foreign banks.

The present study uses secondary data for the year 2012, published on the web pages of the Reserve Bank of India (RBI) and the Indian Banks’ Association (IBA), to analyse the efficiency of commercial banks in India. The initial data set consists of 55 banks and 15 variables (both inputs and outputs).

4. Result and Discussion

4.1. Selection of Best Subset of Inputs and Outputs Using GA

As stated earlier, the subselect R package contributed by ^[19] was used to obtain the best subset of variables for the study. The process of selecting the best subset of variables was subjective in nature. Several solutions were generated using different values of r, viz. (1, 2, 3, …, n-1). Subsets of variables were obtained separately for inputs and outputs using the data set for the year 2012. Initially, eight input variables were selected for the present study. The variable TCO was removed before executing the GA because of a correlation error that affected the search algorithm while obtaining subsets. Similarly, TIE was removed from the output variables for the same reason. Therefore, the maximum subset size for inputs and outputs became six and five, respectively. For DEA, ^[28] provides two rules of thumb for the selection of sample size: a) n ≥ S × P, which states that the sample size should be greater than or equal to the product of the number of inputs and outputs; and b) n ≥ 3(S + P), which states that the number of observations in the data should be at least three times the sum of the number of inputs and outputs, where n is the sample size (number of DMUs), S is the number of inputs and P is the number of outputs. Based on these conditions, the present study uses the largest subsets available, because the number of commercial banks (DMUs) was 55, which is greater than both S × P = 6 × 5 = 30 and 3(S + P) = 3(6 + 5) = 33.
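The two sample-size rules of thumb are easy to check mechanically. The short Python sketch below does so for the figures reported in this study (55 DMUs, 6 inputs, 5 outputs); the function name `meets_dea_sample_rules` is hypothetical and introduced only for illustration.

```python
from typing import Tuple


def meets_dea_sample_rules(n_dmus: int, n_inputs: int, n_outputs: int) -> Tuple[bool, bool]:
    """Return (product rule, sum rule): n >= S * P and n >= 3 * (S + P)."""
    product_rule = n_dmus >= n_inputs * n_outputs
    sum_rule = n_dmus >= 3 * (n_inputs + n_outputs)
    return product_rule, sum_rule


# 55 banks, 6 inputs, 5 outputs: thresholds are 30 and 33.
result = meets_dea_sample_rules(55, 6, 5)
print(result)  # (True, True)
```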

Table 2. Subsets of inputs and outputs and their best values

Table 2 shows the subsets of input and output variables and their best values obtained from the GA search procedure for different values of r. The maximum best values (0.99955 and 0.99988) are obtained for r = 6 in the inputs and r = 5 in the outputs, respectively.

4.2. Selection of Best DEA Model

The efficiency of the banks was computed with DEA (input-oriented, VRS) for different combinations of input and output subsets. The analysis started with r = 1 for both input and output (input variable DEP and output variable LAA); this model is named M11. The computation was then carried out by keeping the same input and increasing the r value for the output (2, 3, 4 and 5), giving the models M12, M13, M14 and M15. The same methodology was followed for the remaining subsets of both inputs and outputs, reported elsewhere. A total of 35 models were constructed in the search process of the present study.

Table 3 shows the variables used in the different models, the number of efficient banks, the average efficiency scores and the percentage of banks whose efficiency changed by 10%. The selection process was as follows. First, the percentage difference in efficiency scores between models M11 and M12 was computed; a difference of approximately 84% was found, which is greater than 10%, so model M12 was retained. The percentage difference between M12 and M13 was then found to be approximately 2%, so M12 was again retained and kept as the base model until a later model differed from it by more than 10%. The percentage difference between M12 and M15 was approximately 46%, which is greater than 10%; as a result, M15 was retained as the base model for computing the difference with the other models. This process was carried on to the last model (M65); none of the remaining models differed from M15 by more than 10%, and M15 was finally chosen as the best model for further study.
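The sequential comparison against a base model can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not spell out how the percentage difference is computed, so the measure used here (the percentage of banks whose score changes by more than 10% relative to the base model) is an assumption, and the function names and the four-bank scores are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Scores = Sequence[float]


def pct_banks_changed(base: Scores, cand: Scores, tol: float = 0.10) -> float:
    """Assumed measure: percentage of banks whose score moves by more than tol (relative)."""
    changed = sum(1 for b, c in zip(base, cand) if abs(c - b) > tol * b)
    return 100.0 * changed / len(base)


def select_model(scores: Dict[str, Scores], order: List[str],
                 measure: Callable[[Scores, Scores], float] = pct_banks_changed,
                 threshold: float = 10.0) -> str:
    """Walk through the models in order; a model replaces the base model
    only if it differs from the current base by more than the threshold."""
    base = order[0]
    for cand in order[1:]:
        if measure(scores[base], scores[cand]) > threshold:
            base = cand
    return base


# Hypothetical efficiency scores for four banks under four models.
toy_scores = {
    "M11": [0.50, 0.50, 0.50, 1.00],
    "M12": [0.80, 0.90, 0.50, 1.00],
    "M13": [0.80, 0.90, 0.52, 1.00],
    "M14": [1.00, 0.90, 0.52, 1.00],
}
best = select_model(toy_scores, ["M11", "M12", "M13", "M14"])
```

Passing the measure as a parameter keeps the walk itself independent of how the difference between two models is defined, so a different measure can be swapped in without touching `select_model`.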

Table 3. Results of model specification search

The final model for DEA is shown in Table 4.

Table 4. Final model for DEA
