ISSN(Print): 2328-7306
ISSN(Online): 2328-7292

Research Article

Open Access Peer-reviewed

Pius Miri Ng’ang’a, Antony Waititu Gichuhi, Antony Wanjoya, Thomas Mageto

Received August 10, 2018; Revised September 17, 2018; Accepted October 07, 2018

An Artificial Neural Network (ANN) is a parallel connection of a set of nodes, called neurons, which mimics the biological neural system. Statistically, an ANN represents a class of non-parametric models capable of approximating a non-linear function by a composition of low-dimensional ridge functions. This study aimed at modeling diabetes mellitus among the adult Kenyan population using 2015 STEPwise survey data from the Kenya National Bureau of Statistics. Data analysis was carried out using R statistical software version 3.5.0. Among the input variables, Age, Sex, Alcoholic status, Sugar consumption, Physical Inactivity, Obesity status, and Systolic and Diastolic blood pressure had a significant relationship with diabetic status at the 5% level of significance. A multi-layered feed-forward neural network with a back propagation algorithm and a logistic activation function was used. For parsimony, the model selected had the eight input variables and two neurons in the hidden layer, since it gave the minimum MSE of 0.0580. 75% of the data was used for training while 25% was used for testing. The sensitivity of the trained network was 75% while the specificity was 94.29%. The overall accuracy of the model was 84.64%, meaning that the model correctly classified an individual as either diabetic or not in 84.64% of cases. A 10-fold cross validation gave an average MSE of 0.0686. A Kolmogorov-Smirnov test of normality was carried out and, at the 5% level of significance, for most parameter estimates we failed to reject the null hypothesis and concluded that the network parameter estimates were asymptotically normal and consistent. With a good choice of risk factors for diabetes, neural network structures could be successfully used to accurately model diabetes mellitus among the Kenyan adult population.

Artificial Neural Networks have recently received a great deal of attention in many fields of study. This is because ANNs attempt to model the capabilities of the human brain. They have been used in a variety of applications where statistical methods are traditionally employed. Globally, an estimated 422 million adults were living with diabetes in 2014, compared to 108 million in 1980. The global prevalence (age-standardized) of diabetes has nearly doubled since 1980, rising from 4.7% to 8.5% in the adult population. This reflects an increase in associated risk factors such as being overweight or obese. Over the past decade, diabetes prevalence has risen faster in low- and middle-income countries than in high-income countries ^{ 1}. Diabetes caused 1.5 million deaths in 2012. Higher-than-optimal blood glucose caused an additional 2.2 million deaths by increasing the risks of cardiovascular and other diseases. Forty-three percent of these 3.7 million deaths occurred before the age of 70 years. The percentage of deaths attributable to high blood glucose or diabetes that occur before the age of 70 is higher in low- and middle-income countries than in high-income countries ^{ 1}.

Diabetes can be classified as type 1 (which requires insulin injections for survival) and type 2 (where the body cannot properly use the insulin it produces). The majority of people with diabetes are affected by type 2 diabetes. This used to occur almost entirely among adults, but now occurs in children too.

Sophisticated laboratory tests are usually required to diagnose diabetes. To complement these, researchers are increasingly turning to computer-based diagnosis, which can sometimes be more accurate than clinical diagnosis. One such computer-based approach is the Artificial Neural Network. The neural network, first developed in 1943, is a branch of artificial intelligence used to predict an outcome from input data. When the output of the network is discrete, the network performs classification, and when the output takes continuous values, it performs prediction ^{ 2}. It is a suitable and powerful tool for helping doctors in the medical field, with advantages such as the ability to deal with large amounts of data and reduced time to diagnosis. The ability of neural networks to produce good prediction results in classification and regression problems has motivated their use on data related to health outcomes such as death or illness diagnosis ^{ 3}, ^{ 4}. In such studies, the dependent variable of interest is a class label, and the explanatory variables, which are the inputs to the neural network, may be binary or continuous. In this study, an ANN was used to classify each individual as either diabetic or non-diabetic based on the input variables. The input variables were the physical risk factors for diabetes (Age, Sex, Smoking behavior, Alcoholic status, Salt consumption, Sugar consumption, Physical Inactivity) and secondary risk factors (Obesity status, Systolic and Diastolic blood pressure).

Disease diagnosis is a broad and challenging area. Its task is to identify the disease that a patient presenting with given symptoms has. This process is very complicated, because not all symptoms are specific to a single disease and the symptoms of different diseases often overlap. Errors caused by the human factor are not rare in this process. To reduce human error, modern medicine uses different technologies, among them clinical decision support systems. Using information about a patient's condition in a mathematical model, the probable diagnosis can be determined. These mathematical models include Artificial Neural Networks (ANNs), which play a vital role in the medical field in solving various health problems, such as validating the clinical diagnosis of various diseases.

The main objective of this study was to apply an artificial neural network in diagnosing diabetes mellitus among the Kenyan adult population. Specifically, the study aimed at: (1) determining the relationship between diabetes mellitus status and various risk factors, (2) exploring the asymptotic properties of Artificial Neural Network parameter estimates and (3) ascertaining the best Artificial Neural Network models for diagnosing diabetes.

In Kenya, few studies have used ANNs to diagnose diabetes mellitus in the adult population. However, a lot of research has been done using ANNs in medical diagnosis worldwide.

Some of the classical statistical tools applied for prediction and diagnosis in many disciplines are discriminant analysis ^{ 5, 6}; logistic regression ^{ 7}; the Bayesian approach ^{ 8} and multiple regression ^{ 9, 10, 11, 12}. All these models have proven very effective for solving relatively simple statistical problems ^{ 13}. Real-world problems, on the other hand, are very complex in nature, and classical models rely heavily on a priori assumptions that such problems may not satisfy.

To overcome this problem, Artificial Neural networks are increasingly becoming important due to the following reasons. First, as opposed to the classical model-based methods, ANNs are data-driven self-adaptive methods in that there are few a priori assumptions about the models for problems under study. They learn from examples and capture very complex functional relationships among the data even if the underlying relationships are unknown or hard to describe ^{ 14}. Second, ANNs can generalize. After learning the data presented to them (a sample), ANNs can often correctly infer the unseen part of a population even if the sample data contain noisy information. Third, ANNs are universal functional approximators. It has been shown that a network can approximate any continuous function to any desired accuracy ^{ 15, 16, 17, 18, 19}. ANNs have more general and flexible functional forms than the traditional statistical methods can effectively deal with. Due to these properties, ANN is increasingly becoming popular as compared to traditional statistical models.

Artificial neural networks provide a powerful tool to help doctors analyze, model and make sense of complex clinical data across a broad range of medical applications. Most applications of artificial neural networks to medicine are classification problems; that is, the task is to assign the patient to one of a small set of classes on the basis of the measured features ^{ 20}.

There are several reviews concerning the application of ANNs in medical diagnosis. The concept was first outlined in 1988 in the pioneering work of ^{ 21} and since then many papers have been published. Reference ^{ 22} used artificial neural networks to find a potent combination of key variables which accurately identified specific analytes and their level of toxicity. He found that an ANN can find potent biomarkers embedded in any type of expression data, mainly proteins, which systematically identify the treatment classes of interest with near 100% accuracy. Whether these proteins are useful in actual diagnosis is tested by presenting the computer model with unknown classes.

Reference ^{ 23} developed one of the most successful applications of ANN in the clinical diagnosis of myocardial infarction. He trained an ANN on a group of 356 patients with and without acute myocardial infarction in a cardiac intensive care setting. Using a multi-layer feed-forward network trained with a back propagation algorithm, the ANN had an unprecedented sensitivity of 92% and a specificity of 96%.

Artificial Neural Networks have been applied extensively to the diagnosis of diabetes mellitus by various authors, specifically using the Pima Indian data set taken from the UCI machine learning repository. This database is a well validated data resource for exploring the prediction and classification of diabetes mellitus. The data set has eight attributes, namely: number of times pregnant, plasma glucose concentration (at 2 h in an oral glucose tolerance test), diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), 2-h serum insulin (μU/ml), body mass index (weight in kg/(height in metres)^{2}), diabetes pedigree function and age (years).

Various researchers have used different algorithms and techniques and compared the classification accuracies obtained. Reference ^{ 24} applied neural network classification to the Pima Indian diabetes dataset. Using various combinations of pre-processing and missing value techniques, the experimental system achieved a classification accuracy of 99%, which is among the best reported.

Reference ^{ 25} applied an artificial neural network using the Levenberg-Marquardt (LM) algorithm and a probabilistic neural network (PNN) structure to the Pima Indian data set to diagnose diabetes. They obtained accuracies of 82.37% and 78.13% using the Multi-Layer Neural Network (LM algorithm) and the PNN respectively.

Reference ^{ 26} used the same Pima data set for diagnosing diabetes onset. They used a multilayer feed-forward neural network with the back propagation training algorithm to classify patients as diabetic or not diabetic. Using a sigmoid transfer function for the hidden and the output layer, a momentum rate of 0.66 and a learning rate of 0.33, they obtained a classification accuracy of 82%. This classification accuracy of the multilayer feed-forward network trained with back propagation was higher than that of other algorithms, such as nearest neighbor with backward sequential selection of features.

Reference ^{ 27} developed Artificial Neural Network models using both classification and predictive neural networks for the rapid diagnosis of diabetes mellitus. They used a dataset with 465 records, which were divided into 440 training records and 25 testing records. The classification network, which was trained using genetic learning, had 19 input variables, and the target output variable was the “Diagnosis”. The classification results showed that 88.41% of the training data set and 76% of the test set were correctly classified. Generally, both neural network models were able to learn the problem, with the predictive network giving a better performance of 84% correctly classified records as opposed to 76% achieved by the classifier network on the same data set.

Reference ^{ 28} proposed a method to predict diabetes mellitus using the back propagation algorithm of an Artificial Neural Network. They treated the problem of diagnosing diabetes as a binary classification, i.e. those predicted to be diabetic fall under category 1 and non-diabetic under category 0. They used the supervised multilayer feed-forward network architecture with the back propagation algorithm. The input parameters used were: Random Blood Sugar test result, Fasting Blood Sugar test result, Post Plasma Blood Sugar test, age, sex and occupation. They measured the performance of the network in terms of the absolute error between the network response and the desired target. The network achieved a classification accuracy of 92.5%, i.e. the model was able to predict whether a person was diabetic or not with 92.5% accuracy.

Reference ^{ 29} used a neural network based rule discovery system to determine the presence of hypoglycemic episodes based on type 1 diabetic patients’ physiological parameters, rate of change of heart rate, corrected QT interval of the electrocardiogram signal and rate of change of corrected QT interval. He used a sample size of 420 patients, with 320 data sets used to develop the neural network based rule discovery system and 100 data sets used to validate its performance. The sensitivity and specificity were found to be 79.30% and 60.53% respectively, which are considered reasonable and better than those found by the commonly used methods of statistical regression, genetic programming and fuzzy regression.

The study utilized secondary data from the 2015 Kenya STEPwise survey for Non Communicable Diseases risk factors. An Artificial Neural Network was used to classify diabetic and non-diabetic patients using several input variables (diabetes risk factors). More specifically, a multi-layered feed-forward neural network with a logistic activation function was used. A 10-fold cross validation was carried out to validate the model.

The study was carried out in all the forty seven counties of Kenya as shown in Figure 2. A nationally representative sample was selected from the fifth National Sample Survey and Evaluation Programme (NASSEP V) Frame.

Following the recommendations detailed in the STEPwise approach to surveillance (STEPS) manual, the survey drew its sample from the target population by use of age-sex groups. The age groups used intervals of 12 years for individuals aged 18 years to 69 years, resulting in eight age-sex groups. The population covered by the 2015 Kenya STEPS survey was defined as the universe of the non-institutionalized population of men and women aged 18 - 69 years. A sample of households was selected, and one person within the age groups of interest in each household was eligible for interview and measurements ^{ 30}.

The sample size was calculated using the formula

$$n = \frac{Z^{2} P (1 - P)}{e^{2}}$$

where

$n$ = sample size,

$Z$ = level of confidence,

$P$ = baseline level of the selected indicator,

$e$ = margin of error.

Using the values $Z = 1.96$ (95 percent confidence interval), $P = 50$ percent (as recommended by WHO for countries that have not conducted a STEPS survey before) and $e = 0.05$, the initial estimated sample size was 384. Further adjustments, namely multiplication of the sample by 1.5 (design effect to cater for the complex survey design), 8 (the number of 12-year age-sex groups) and 1.25 (to cater for 20 percent non-response), yielded a sample of 5,760. The sample was further adjusted to ease allocation into various strata.
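The arithmetic behind these figures can be checked with a short sketch. The function name and argument names below are illustrative assumptions, not part of the survey documentation; the formula is the usual single-proportion sample size $Z^2 P(1-P)/e^2$, rounded down to match the reported 384.

```python
import math

def steps_sample_size(z=1.96, p=0.5, e=0.05, design_effect=1.5,
                      n_groups=8, nonresponse_factor=1.25):
    """Return (base, adjusted) sample sizes for the calculation in the text."""
    # Base size from Z^2 * P * (1 - P) / e^2, rounded down as reported.
    base = math.floor(z ** 2 * p * (1 - p) / e ** 2)
    # Apply the design effect, the number of age-sex groups and the
    # non-response inflation factor in turn.
    adjusted = base * design_effect * n_groups * nonresponse_factor
    return base, round(adjusted)

base, adjusted = steps_sample_size()
print(base, adjusted)
```

With the default values this reproduces the 384 and 5,760 quoted above.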

The sample was allocated into all the 92 strata in the NASSEP V frame, ensuring that a minimum of two clusters were selected per stratum. This was achieved using the power allocation method.

The sample size for 2015 Kenya STEPS survey was 6,000 individuals selected from a total of 200 clusters (100 in urban and 100 in rural) with a uniform sample of 30 individuals per cluster ^{ 30}.

The inclusion criteria were:

i). Individuals aged between 18 and 69 years.

ii). Willing and able to provide informed consent for participation.

The exclusion criteria were:

i). Individuals not aged between 18 and 69 years.

ii). Unable or unwilling to provide informed consent or assent.

**a) Sample Frame**

Administratively, Kenya is divided into 47 Counties. In turn, each county is subdivided into Sub-Counties. Prior to the enactment of the current constitution in 2010, the sub-counties had not been created, and the comparable units were the districts. Each district was divided into divisions, each division into locations and each location into sub-locations. In addition to these administrative units, prior to the 2009 population census, each sub-location was subdivided into census enumeration areas (EAs), i.e. small geographic units with clearly defined boundaries. A total of 96,251 EAs were developed. The list of EAs is grouped by administrative units and includes information on the number of households and population. This information was used in 2010 to design a master sample known as the fifth National Sample Survey and Evaluation Programme (NASSEP V) with a total of 5,360 selected EAs ^{ 30}.

The NASSEP V master frame follows a two-stage stratified cluster sample format. The first stage involved selection of Primary Sampling Units (PSUs), which were the EAs, using the probability proportional to size (PPS) method, with the measure of size being the number of households from the 2009 census. The second stage involves the selection of households for various surveys. The frame was designed in a multi-tiered structure with four sub-samples (C1, C2, C3 and C4), each consisting of 1,340 EAs that can serve as independent frames. The NASSEP V frame used the counties as the first level of stratification, further subdivided into rural and urban sub-domains. The sampling was done independently within the rural and urban sub-domains. Each sampled EA was developed into a cluster and underwent a listing and mapping process; clusters have an average measure of size of 100 households (between 50 and 149 households) ^{ 30}.

**b) Sample Selection**

The 2015 Kenya STEPS survey sample was selected in three stages, which involved the selection of PSUs (i.e. clusters), households and individuals, respectively.

**c) Selection of PSUs**

The selection of clusters was done using the Equal Probability Selection Method (EPSEM). The clusters were selected systematically from NASSEP V frame with equal probability independently within the urban-rural domains. The process involved ordering the clusters by county, then by urban/rural, and finally by unique geocode. The resulting sample retained properties of PPS as used in creation of the frame.

**d) Household selection**

Using the total number of households in each sampled cluster available from NASSEP V, a uniform sample of 30 households per cluster was selected using the systematic sampling method. The sample households were selected with a random start according to the following procedure:

Let $L$ be the total number of households listed in the cluster;

Let $R$ be a random number between (0, 1);

Let $n$ be the number of households selected in the cluster;

Let $I = L/n$ be the sampling interval.

1. The first selected sample household is $k$ ($k$ is the serial number of the household in the listing) if and only if: $(k - 1) < R \times I \le k$.

2. The subsequent selected households are those having serial numbers: $k + (j - 1) \times I$ (rounded to integers) for $j = 2, 3, \dots, n$. Random numbers were different and independent from cluster to cluster ^{ 30}.
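The following is a minimal sketch of systematic selection with a random start, written for a cluster of `L` listed households from which `n` are drawn with interval `L / n`. The function name, the seed and the example cluster size are assumptions for illustration. As a variant of the rounding rule, the sketch carries the fractional start forward and rounds each selection point up, which keeps every serial number between 1 and `L`.

```python
import math
import random

def systematic_sample(L, n, rng):
    """Select n of L listed households (serial numbers 1..L) systematically."""
    if not 0 < n <= L:
        raise ValueError("need 0 < n <= L")
    interval = L / n                 # sampling interval I = L / n
    r = 1.0 - rng.random()           # random number in (0, 1]
    start = r * interval             # fractional random start in (0, I]
    # Round each selection point up to a serial number; the clamp only
    # guards against floating-point error at the very end of the list.
    return [min(L, math.ceil(start + j * interval)) for j in range(n)]

rng = random.Random(2015)            # hypothetical seed, for reproducibility
households = systematic_sample(L=100, n=30, rng=rng)
print(households)
```

Because the interval is at least 1 whenever `n <= L`, consecutive selection points fall on different serial numbers, so the 30 households are distinct.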

**e) Individual selection**

All the selected clusters and corresponding households were loaded into Personal Digital Assistants (PDAs). During interviews, all the eligible household members were listed down and PDA used to randomly select one for interviews using the inbuilt Kish Grid method ^{ 30}.

An Artificial Neural Network (ANN) was used to classify individuals as either diabetic or not, based on physical and behavioural characteristics as input variables. Since secondary data was used in this study, it was first cleaned by checking for missing data and outliers. Outliers were excluded from the final analysis for the model. A Chi-square test was carried out to determine the relationship between diabetes mellitus status and various risk factors.

At the inferential stage, a multi-layered feed-forward neural network model with a logistic activation function was used to fit the data. The Schwarz Information Criterion (SIC) was used for model selection. The classification accuracy rate and mean squared error (MSE) were reported. To validate the diagnosis model, a 10-fold cross validation was also carried out.
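As an illustration of how a 10-fold cross validation partitions the data, the sketch below generates the train/test index sets. It is not the study's R code; the function name, the seed and the record count are assumptions. In practice the model would be fitted on each training set and the MSE on the held-out fold averaged over the ten folds.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k disjoint folds covering all indices
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical data set of 20 records split into 10 folds.
splits = list(k_fold_indices(n=20, k=10))
for train, test in splits[:2]:
    print(len(train), sorted(test))
```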

In order to determine the relationship between diabetes mellitus status and various risk factors, a Chi-square test of independence (no relationship) was carried out. Two variables $X$ and $Y$ are said to be statistically independent if the population conditional distributions of $Y$ are identical at each level of $X$. When two variables are independent, the probability of any particular column outcome $j$ is the same in each row. Statistical independence is, equivalently, the property that all joint probabilities equal the product of their marginal probabilities, $\pi_{ij} = \pi_{i+} \pi_{+j}$ for $i = 1, \dots, I$ and $j = 1, \dots, J$; that is, the probability that $X$ falls in row $i$ and $Y$ falls in column $j$ is the product of the probability that $X$ falls in row $i$ with the probability that $Y$ falls in column $j$ ^{ 31}.

Consider the null hypothesis $H_0$ that cell probabilities equal certain fixed values $\{\pi_{ij}\}$. For a sample of size $n$ with cell counts $\{n_{ij}\}$, the values $\{\mu_{ij} = n \pi_{ij}\}$ are expected frequencies. They represent the values of the expectations $\{E(n_{ij})\}$ when $H_0$ is true. To judge whether the data contradict $H_0$, we compare $\{n_{ij}\}$ to $\{\mu_{ij}\}$. If $H_0$ is true, $n_{ij}$ should be close to $\mu_{ij}$ in each cell. The larger the differences $\{n_{ij} - \mu_{ij}\}$, the stronger the evidence against $H_0$. The Pearson test statistic is used to make such comparisons, and it has a large-sample chi-squared distribution ^{ 31}.

The Pearson chi-squared statistic for testing $H_0$ is:

$$X^{2} = \sum_{i} \sum_{j} \frac{(n_{ij} - \mu_{ij})^{2}}{\mu_{ij}}$$

This statistic takes its minimum value of zero when all $n_{ij} = \mu_{ij}$. For a fixed sample size, greater differences $\{n_{ij} - \mu_{ij}\}$ produce larger $X^{2}$ values and stronger evidence against $H_0$. Since larger $X^{2}$ values are more contradictory to $H_0$, the P-value is the null probability that $X^{2}$ is at least as large as the observed value. The $X^{2}$ statistic has approximately a chi-squared distribution for large $n$. The P-value is the chi-squared right-tail probability above the observed $X^{2}$ value. The chi-squared approximation improves as the $\{\mu_{ij}\}$ increase, and $\{\mu_{ij} \ge 5\}$ is usually sufficient for a decent approximation, as discussed in ^{ 31}.

The chi-squared distribution is concentrated over nonnegative values. It has mean equal to its degrees of freedom ($df$), and its standard deviation equals $\sqrt{2\,df}$. As $df$ increases, the distribution concentrates around larger values and is more spread out.
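A small worked example may help fix ideas. The sketch below computes the Pearson statistic for a test of independence, where the expected frequencies are estimated from the margins as (row total × column total) / n. The counts are hypothetical and are not taken from the survey data, and the closed-form P-value used here is valid only for one degree of freedom.

```python
import math

def pearson_chi2(table):
    """Pearson X^2 statistic and degrees of freedom for a table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu_ij = row_tot[i] * col_tot[j] / n   # estimated expected frequency
            x2 += (n_ij - mu_ij) ** 2 / mu_ij
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, df

# Hypothetical counts: rows = obese / not obese, columns = diabetic / not diabetic.
table = [[30, 70], [20, 180]]
x2, df = pearson_chi2(table)
# Right-tail probability of a chi-squared variable with df = 1 only.
p_value = math.erfc(math.sqrt(x2 / 2))
print(round(x2, 3), df, p_value < 0.05)
```

For tables with more than one degree of freedom, the right-tail probability would have to come from a chi-squared distribution function in a statistics library rather than from `math.erfc`.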

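The relative risk is a ratio of two proportions. For $2 \times 2$ tables, the relative risk is the ratio

$$\text{relative risk} = \frac{\pi_{1}}{\pi_{2}}$$

where $\pi_{1}$ and $\pi_{2}$ are the probabilities of the response in the two groups. A relative risk of 1 occurs when $\pi_{1} = \pi_{2}$, i.e. when the response is independent of the group ^{ 31}.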
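As a quick numerical illustration, the sketch below estimates the relative risk from a 2 × 2 table of counts by using the sample proportions in each row. The counts and the function name are hypothetical.

```python
def relative_risk(table):
    """Relative risk for a 2x2 table [[a, b], [c, d]] whose rows are the groups."""
    (a, b), (c, d) = table
    pi_1 = a / (a + b)     # proportion with the response in group 1
    pi_2 = c / (c + d)     # proportion with the response in group 2
    return pi_1 / pi_2

# Hypothetical counts: rows = obese / not obese, columns = diabetic / not diabetic.
table = [[30, 70], [20, 180]]
rr = relative_risk(table)
print(round(rr, 2))
```

Here the proportions are 0.30 and 0.10, so the response is about three times as likely in the first group.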

An artificial neural network (ANN) is a parallel connection of a set of nodes, called neurons, which mimics the biological neural system. Statistically, an ANN represents a class of non-parametric models capable of approximating a non-linear function by a composition of low-dimensional ridge functions ^{ 33}. It represents a function of explanatory variables which is composed of simple building blocks and which may be used to provide an approximation of conditional expectations or, in particular, probabilities in regression ^{ 34}. ANNs are widely used in classification, regression and statistical pattern recognition problems.

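Consider a feed-forward net with $d$ input nodes, one layer of $H$ hidden nodes, one output node and an activation function $\psi$. The input and hidden layer nodes are connected by weights $\beta_{hj}$ for $h \in \{1, \dots, H\}$ and $j \in \{0, 1, \dots, d\}$, where $\beta_{h0}$ is the weight from the bias node to the $h$-th hidden node. The hidden and output layers are connected by weights $\alpha_{h}$ for $h \in \{0, 1, \dots, H\}$, where $\alpha_{0}$ is the weight from the bias node to the output node ^{ 34}. Considering an input vector $x = (x_{1}, \dots, x_{d})'$, the input to the $h$-th hidden node is the value

$$p_{h}(x) = \beta_{h0} + \sum_{j=1}^{d} \beta_{hj} x_{j} \tag{1}$$

The output of the $h$-th hidden node is the value

$$q_{h}(x) = \psi(p_{h}(x)) \tag{2}$$

The net input to the output node is the value

$$z(x) = \alpha_{0} + \sum_{h=1}^{H} \alpha_{h} q_{h}(x) \tag{3}$$

Finally, the output of the network is the value

$$f(x, \theta) = \psi(z(x)) = \psi\left(\alpha_{0} + \sum_{h=1}^{H} \alpha_{h}\, \psi\left(\beta_{h0} + \sum_{j=1}^{d} \beta_{hj} x_{j}\right)\right) \tag{4}$$

We note that $\theta$ stands for all the parameters $\alpha_{h}$ and $\beta_{hj}$ of the network ^{ 34}. We also write $\alpha = (\alpha_{0}, \alpha_{1}, \dots, \alpha_{H})'$ and $\beta_{h} = (\beta_{h1}, \dots, \beta_{hd})'$, $h = 1, \dots, H$,

denoting them as vectors. In prediction and classification problems, the activation function $\psi$ is usually chosen to be a symmetric sigmoidal function, i.e. a fixed, bounded, continuous, non-decreasing function with

$$\lim_{u \to -\infty} \psi(u) = 0, \qquad \lim_{u \to \infty} \psi(u) = 1, \qquad \psi(u) + \psi(-u) = 1 \tag{5}$$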

The most appropriate choice of the activation function above is the logistic function given as

where is the learning rate while is called the bias.
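The forward computation of such a network can be sketched in a few lines. This sketch assumes the standard logistic function 1 / (1 + exp(-u)) at both the hidden and the output nodes, with no extra slope or shift parameters. The names `alpha` (hidden-to-output weights, bias first) and `beta` (one row of input-to-hidden weights per hidden node, bias first) and the example weights are illustrative assumptions.

```python
import math

def logistic(u):
    """Numerically stable standard logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def network_output(x, alpha, beta):
    """Output of a one-hidden-layer feed-forward net for input vector x.

    alpha: [alpha_0, alpha_1, ..., alpha_H], with alpha_0 the output bias
    beta:  H rows [beta_h0, beta_h1, ..., beta_hd], with beta_h0 the hidden bias
    """
    # Hidden node outputs: activation applied to bias + weighted inputs.
    hidden = [logistic(b[0] + sum(w * xj for w, xj in zip(b[1:], x))) for b in beta]
    # Net input to the output node, then the output activation.
    net = alpha[0] + sum(a * q for a, q in zip(alpha[1:], hidden))
    return logistic(net)

# Hypothetical weights: d = 2 inputs, H = 2 hidden nodes.
alpha = [0.1, 0.8, -0.5]
beta = [[0.2, 0.4, -0.6], [-0.3, 0.7, 0.1]]
out = network_output([1.0, 2.0], alpha, beta)
print(out)
```

Because the output node also uses the logistic function, the result always lies strictly between 0 and 1 and can be read as an estimated class probability.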

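In this study, we assumed a statistical model that relates $Y$ and $X$ as follows:

$$Y = f(X, \theta) + \varepsilon \tag{6}$$

where $f(x, \theta)$ is the network output function and $\varepsilon$ is the error term.

The network is trained on the dataset

$$\{(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})\},$$

i.e. these data are used to come up with an estimator $\hat{\theta}_{n}$ for $\theta$ ^{ 34}.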

There are two types of network training, i.e. supervised and unsupervised learning. In this study, supervised training was used. The supervised training of a neural net requires the following:

1. A sample of $n$ input vectors, $x_{1}, \dots, x_{n}$, of size $d$ each and an associated output vector $y = (y_{1}, \dots, y_{n})'$.

2. The selection of an initial weight set.

3. A repetitive method to update the current weights to optimize the input-output map.

4. A stopping rule

The maximum likelihood method is used to find the optimal estimator for the network ^{ 34}.

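The task here is to minimize the error in equation (6). Assuming a normally distributed error term with variance $\sigma^{2}$, the conditional density of $Y$ given $X = x$ is given as:

$$p(y \mid x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(y - f(x, \theta))^{2}}{2 \sigma^{2}}\right)$$

so that the log-likelihood function is given by

$$\log L(\theta) = -\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n} (y_{i} - f(x_{i}, \theta))^{2} - \frac{n}{2} \log \sigma^{2} - \frac{n}{2} \log (2 \pi) \tag{7}$$

The second and third terms of the above equation are independent of the weights and can therefore be omitted, so that maximizing equation (7) is equivalent to minimizing

$$\sum_{i=1}^{n} (y_{i} - f(x_{i}, \theta))^{2} \tag{8}$$

The weights are then adjusted in such a way that the error function in equation (8) is minimized. However, this study is on classification and the target variable is binary. The probability weights of $Y$ given $X = x$ are

$$P(Y = y \mid X = x) = f(x, \theta)^{y} (1 - f(x, \theta))^{1 - y}, \quad y \in \{0, 1\} \tag{9}$$

and the likelihood of equation (9) is given by

$$L(\theta) = \prod_{i=1}^{n} f(x_{i}, \theta)^{y_{i}} (1 - f(x_{i}, \theta))^{1 - y_{i}} \tag{10}$$

and the negative of the log likelihood is given as

$$-\log L(\theta) = -\sum_{i=1}^{n} \left[ y_{i} \log f(x_{i}, \theta) + (1 - y_{i}) \log (1 - f(x_{i}, \theta)) \right] \tag{11}$$

where

$$f(x, \theta) = \psi\left(\alpha_{0} + \sum_{h=1}^{H} \alpha_{h}\, \psi\left(\beta_{h0} + \sum_{j=1}^{d} \beta_{hj} x_{j}\right)\right) \tag{12}$$

$\hat{\theta}_{n}$ is the value of $\theta$ that maximizes the likelihood (10) or, equivalently, minimizes equation (11), i.e.

$$\hat{\theta}_{n} = \arg\min_{\theta \in \Theta} \{-\log L(\theta)\} \tag{13}$$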
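For binary targets, the negative log-likelihood is the familiar cross-entropy error. The sketch below evaluates it for a list of targets and the corresponding network outputs. The function name, the clipping constant `eps` and the example values are assumptions added for illustration; clipping only protects against taking the logarithm of zero.

```python
import math

def negative_log_likelihood(y, p, eps=1e-12):
    """Sum over cases of -[y*log(p) + (1-y)*log(1-p)] for binary targets y."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total

# Hypothetical targets with two sets of predicted probabilities.
y = [1, 0, 1, 0]
uninformative = negative_log_likelihood(y, [0.5, 0.5, 0.5, 0.5])
confident = negative_log_likelihood(y, [0.9, 0.1, 0.8, 0.2])
print(uninformative, confident)
```

Outputs that sit closer to the observed targets give a smaller value, which is the quantity the training procedure drives down.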

In equation (11), the weights are adjusted in such a way that the error between the targets and the actual output is minimized. The goodness of the network approximation can be evaluated using a penalty function, $\rho(y, o)$, that measures how well the network output $o$ matches the “target” output $y$ corresponding to given inputs $x$. Since the output is binary, negative entropy is a good penalty ^{ 35}. Performance as a function of $\theta$ for given $y$ and $x$ can be measured as $\rho(y, f(x, \theta))$. A measure of overall network performance is given by the expected penalty, $\lambda(\theta) = E[\rho(Y, f(X, \theta))]$, where the random target/input pair $(Y, X)$ is drawn from the population distribution governing the phenomenon of interest. Choosing $\theta^{*}$ to solve $\min_{\theta \in \Theta} \lambda(\theta)$ yields a network producing the smallest average penalty, given an input randomly drawn from the operating environment. This provides an objective way to choose the “best” approximation and formalizes the requirement that the network “generalizes” well. There are various methods of minimizing equation (8). These include backpropagation, the Quasi-Newton method and the simulated annealing method.

In this study, back propagation method was used to minimize the error.

This is a kind of coordinate-wise gradient descent method. The goal is to find a set of weights $\alpha_{h}$ and $\beta_{hj}$ that minimizes our objective function, equation (11), denoted here by $D(\theta)$. The partial derivative of the objective function with respect to a weight represents the rate of change of the error function with respect to that weight (it is the slope of the objective function). Moving the weights in a direction down this slope will result in a decrease in the objective function. This intuitively suggests a method to iteratively find values for the weights. We evaluate the partial derivative of the objective function with respect to the weights, and then move the weights in a direction down the slope, continuing until the error function no longer decreases ^{ 36}. Mathematically, the weights are adjusted as follows, taking a unipolar activation function $\psi$:

$$\theta^{(k+1)} = \theta^{(k)} - \eta\, \nabla_{\theta} D(\theta^{(k)})$$

Taking individual weights, we have the iteration weight as

$$\alpha_{h}^{(k+1)} = \alpha_{h}^{(k)} - \eta\, \frac{\partial D(\theta^{(k)})}{\partial \alpha_{h}}$$

for $h = 0, 1, \dots, H$ and $k = 0, 1, 2, \dots$.

Similarly,

$$\beta_{hj}^{(k+1)} = \beta_{hj}^{(k)} - \eta\, \frac{\partial D(\theta^{(k)})}{\partial \beta_{hj}}$$

for $h = 1, \dots, H$ and $j = 0, 1, \dots, d$ and $k = 0, 1, 2, \dots$, with $\eta > 0$ representing the step gain ^{ 34}.
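To make the update rule concrete, the sketch below performs batch gradient descent on the negative log-likelihood of a one-hidden-layer logistic network, with the gradients obtained by propagating the output error backwards. It is a minimal sketch under stated assumptions, not the study's R implementation: the toy data, the seed, the step gain `eta = 0.1`, the number of iterations and all function names are hypothetical.

```python
import math
import random

def logistic(u):
    """Numerically stable standard logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def forward(x, alpha, beta):
    """Return hidden outputs q and network output f for one input vector x."""
    q = [logistic(b[0] + sum(w * xj for w, xj in zip(b[1:], x))) for b in beta]
    f = logistic(alpha[0] + sum(a * qh for a, qh in zip(alpha[1:], q)))
    return q, f

def nll(data, alpha, beta):
    """Negative log-likelihood of the binary targets under the network."""
    total = 0.0
    for x, y in data:
        _, f = forward(x, alpha, beta)
        total -= y * math.log(f) + (1 - y) * math.log(1.0 - f)
    return total

def gradient_step(data, alpha, beta, eta):
    """One batch gradient-descent update of all weights with step gain eta."""
    g_alpha = [0.0] * len(alpha)
    g_beta = [[0.0] * len(b) for b in beta]
    for x, y in data:
        q, f = forward(x, alpha, beta)
        delta_out = f - y                     # derivative w.r.t. the output net input
        g_alpha[0] += delta_out
        for h, qh in enumerate(q):
            g_alpha[h + 1] += delta_out * qh
            # Error propagated back through hidden node h.
            delta_h = delta_out * alpha[h + 1] * qh * (1.0 - qh)
            g_beta[h][0] += delta_h
            for j, xj in enumerate(x):
                g_beta[h][j + 1] += delta_h * xj
    new_alpha = [a - eta * g for a, g in zip(alpha, g_alpha)]
    new_beta = [[w - eta * g for w, g in zip(b, gb)] for b, gb in zip(beta, g_beta)]
    return new_alpha, new_beta

# Hypothetical toy data: the target equals the first input.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
rng = random.Random(1)
alpha = [rng.uniform(-0.5, 0.5) for _ in range(3)]                      # H = 2
beta = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]   # d = 2

initial_loss = nll(data, alpha, beta)
for _ in range(500):
    alpha, beta = gradient_step(data, alpha, beta, eta=0.1)
final_loss = nll(data, alpha, beta)
print(initial_loss, final_loss)
```

The quantity `f - y` is the derivative of the negative log-likelihood with respect to the net input of the output node when the output activation is logistic, which is why no extra derivative factor appears at the output layer.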

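We first discuss the concept of existence of the estimator $\hat{\theta}_{n}$. Existence of a solution to equation (13) is guaranteed by the following lemma, under the assumption that the parameter set $\Theta$ is compact ^{ 34}.

**Lemma 1**. *Assume *(11)* and *(12)* hold, then there exists a solution $\hat{\theta}_{n}$ of the maximum likelihood equation* (13).

**Proof.** By our choice of $\psi$ and $f$, the function $f(x, \theta)$ given by (12) is continuous in $x$ and $\theta$, and $0 < f(x, \theta) < 1$ for all $x, \theta$. Therefore, the negative log-likelihood (11) is continuous in $\theta$ for all $(x_{1}, y_{1}), \dots, (x_{n}, y_{n})$, and it assumes its minimum on compact sets.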

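Next we discuss the concept of model irreducibility/redundancy.

We say that a neural network (with a fixed set of parameters) is “redundant” if there exists another network that represents exactly the same relationship function $f(x, \theta)$. A related definition is the reducibility of $\theta$ stated by ^{ 37} as follows.

**Definition:** For $\psi$ satisfying equation (5),

$$\theta = (\alpha_{0}, \alpha_{1}, \dots, \alpha_{H}, \beta_{10}, \beta_{1}', \dots, \beta_{H0}, \beta_{H}')'$$

and

$$\beta_{h} = (\beta_{h1}, \dots, \beta_{hd})', \quad h = 1, \dots, H,$$

$\theta$ is called reducible if one of the following three conditions holds for $i \neq 0$ and $j \neq 0$.

a) $\alpha_{j} = 0$ for some $j = 1, \dots, H$

b) $\beta_{j} = 0$ for some $j = 1, \dots, H$ or

c) $(\beta_{i0}, \beta_{i}') = \pm (\beta_{j0}, \beta_{j}')$ for some $i \neq j$, where $0$ denotes the zero vector of the appropriate size.

A reducible $\theta$ with symmetric sigmoidal $\psi$ leads to a redundant network because it gives a function that can be represented by another network by deleting the $j$-th neuron, where $j$ is described in the conditions above. For condition (a), it is obvious. For (b), delete the $j$-th neuron and replace $\alpha_{0}$ by $\alpha_{0} + \alpha_{j} \psi(\beta_{j0})$. In (c), if $(\beta_{i0}, \beta_{i}') = (\beta_{j0}, \beta_{j}')$, then we can delete the $j$-th neuron and replace $\alpha_{i}$ by $\alpha_{i} + \alpha_{j}$. On the other hand, if $(\beta_{i0}, \beta_{i}') = -(\beta_{j0}, \beta_{j}')$, then we can delete the $j$-th neuron and replace $\alpha_{0}$ and $\alpha_{i}$ by $\alpha_{0} + \alpha_{j}$ and $\alpha_{i} - \alpha_{j}$ because

$$\psi(-u) = 1 - \psi(u) \tag{14}$$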

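This is a fundamental problem in neural networks. The parameters are not unique, since a different set of parameters can give an identical distribution of $Y$ given $X$ ^{ 38}. Let the weights be represented as follows:

$$\theta = (\alpha_{0}, \alpha_{1}, \dots, \alpha_{H}, \beta_{1}^{*\prime}, \dots, \beta_{H}^{*\prime})' \tag{15}$$

where $\beta_{h}^{*} = (\beta_{h0}, \beta_{h}')'$.

At this point we note two kinds of transformations that leave the input-output map invariant:

i) The function $f(x, \theta)$ is unchanged if we permute the hidden neurons. For example, if $(\alpha_{1}, \beta_{1}^{*})$ and $(\alpha_{2}, \beta_{2}^{*})$ are interchanged, $f(x, \theta)$ remains unchanged.

ii) Equation (14) can be used to establish that the parameters $\theta$ and

$$(\alpha_{0} + \alpha_{1}, -\alpha_{1}, \alpha_{2}, \dots, \alpha_{H}, -\beta_{1}^{*\prime}, \beta_{2}^{*\prime}, \dots, \beta_{H}^{*\prime})'$$

give exactly the same value of $f(x, \theta)$ and hence the same distribution of $Y$ given $X$.

The transformations described above generate a family with $2^{H} H!$ elements. Call this family of transformations $\mathcal{T}$. For every transformation $\tau$ in this family, $f(x, \tau(\theta)) = f(x, \theta)$.

Each transformation can be characterized as a composite function of the sign changes $\tau_{h}$ and the interchanges $\sigma_{hl}$, where

$$\tau_{h}(\theta) = (\alpha_{0} + \alpha_{h}, \alpha_{1}, \dots, -\alpha_{h}, \dots, \alpha_{H}, \beta_{1}^{*\prime}, \dots, -\beta_{h}^{*\prime}, \dots, \beta_{H}^{*\prime})', \qquad \sigma_{hl}(\theta) = \theta \text{ with } (\alpha_{h}, \beta_{h}^{*}) \text{ and } (\alpha_{l}, \beta_{l}^{*}) \text{ interchanged} \tag{16}$$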
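The invariance can be checked numerically. The sketch below applies every combination of a reordering of the hidden neurons and a sign change of individual neurons to a small network with three hidden neurons, and confirms that the net input to the output node (and therefore the network output) does not change. The weights, the input vector and the function names are hypothetical, and the standard logistic function is assumed as the activation.

```python
import math
from itertools import permutations, product

def logistic(u):
    """Standard logistic function, which satisfies psi(-u) = 1 - psi(u)."""
    return 1.0 / (1.0 + math.exp(-u))

def net(x, alpha, beta):
    """alpha_0 + sum over h of alpha_h * psi(beta_h0 + beta_h' x)."""
    return alpha[0] + sum(
        a * logistic(b[0] + sum(w * xj for w, xj in zip(b[1:], x)))
        for a, b in zip(alpha[1:], beta)
    )

def flip(alpha, beta, h):
    """Change the sign of hidden neuron h (0-based) and compensate in alpha_0."""
    alpha, beta = list(alpha), [list(b) for b in beta]
    alpha[0] += alpha[h + 1]
    alpha[h + 1] = -alpha[h + 1]
    beta[h] = [-w for w in beta[h]]
    return alpha, beta

def permute(alpha, beta, order):
    """Reorder the hidden neurons according to `order`."""
    return [alpha[0]] + [alpha[h + 1] for h in order], [list(beta[h]) for h in order]

# Hypothetical weights for H = 3 hidden neurons and d = 2 inputs.
alpha = [0.2, 1.0, -0.7, 0.4]
beta = [[0.1, 0.5, -0.3], [-0.2, 0.8, 0.6], [0.3, -0.4, 0.9]]
x = [1.5, -2.0]

reference = net(x, alpha, beta)
count, max_diff = 0, 0.0
for order in permutations(range(3)):
    for signs in product([False, True], repeat=3):
        a, b = permute(alpha, beta, order)
        for h, do_flip in enumerate(signs):
            if do_flip:
                a, b = flip(a, b, h)
        max_diff = max(max_diff, abs(net(x, a, b) - reference))
        count += 1
print(count, max_diff)
```

The loop visits 3! orderings times 2^3 sign patterns, which is the family size stated in the text for three hidden neurons.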

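The following two conditions must be satisfied by the activation function.

1. Condition A: The class of functions

$$\{\psi(bx + b_{0}),\ b > 0\} \cup \{\psi \equiv 1\}$$

is linearly independent. More precisely, for any positive integer $k$ and any scalars $\alpha_{0}$, $\alpha_{i}$ and $(b_{i}, b_{i0})$ with $b_{i} > 0$ and $(b_{i}, b_{i0}) \neq (b_{j}, b_{j0})$ for every $i \neq j$, the condition

$$\alpha_{0} + \sum_{i=1}^{k} \alpha_{i} \psi(b_{i} x + b_{i0}) = 0 \quad \text{for all } x \in \mathbb{R}$$

implies that

$$\alpha_{0} = \alpha_{1} = \dots = \alpha_{k} = 0.$$

2. Condition B: Assume that $\psi$ is differentiable and $\psi'$ is its derivative. The class of functions

$$\{\psi(bx + b_{0}),\ \psi'(bx + b_{0}),\ x \psi'(bx + b_{0}),\ b > 0\} \cup \{\psi \equiv 1\}$$

is linearly independent.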

As a result of the above two conditions and assuming models (4), (5) and (6) with a continuous function satisfy condition A. (NB: ). Suppose that is irreducible. Also assume that the distribution of has the support . Then the following apply as discussed by ^{ 38}:

a) is **identifiable** up to the family of transformations generated by (16). That is, if there exists another such that , then there exist a transformation generated by (16) that transforms to .

b) Under further assumption that is continuously differentiable and satisfies condition B, the matrix is non singular. Here is a column vector and denotes the gradient of hence is a square matrix. Also, the expectation is taken with respect to the random vector .

Any nondecreasing symmetric sigmoidal function that satisfies condition B also satisfies condition A ^{ 38}. Also, any nondecreasing function satisfying the first two properties of equation (5) must be the cumulative distribution function (cdf) of a one-dimensional random variable. Condition A says that are independent, which is equivalent to the mixture probability density functions being **identifiable**.

Assume that the set are i.i.d with conditional probability distribution

(17) |

we will fit a neural network output function to by minimizing the negative log likelihood equation (11) multiplied by .

(18) |
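As an illustration of the objective being minimized, the sketch below computes an averaged negative log likelihood for binary responses. It assumes the standard Bernoulli form with a 1/n scaling, since the article's equations (11) and (18) are not reproduced here; the function name and the clipping constant `eps` are hypothetical.

```python
import math

def mean_negative_log_likelihood(y, p, eps=1e-12):
    """Average Bernoulli negative log likelihood.

    y: observed 0/1 responses.
    p: fitted probabilities (network outputs) for the same cases.
    eps: clipping bound that keeps log() finite (an assumption of this sketch).
    """
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)

# Uninformative predictions of 0.5 give log(2) per observation.
print(mean_negative_log_likelihood([1, 0], [0.5, 0.5]))
```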

Let denote the expectation of the target function . Since are i.i.d, we have

(19) |

Assume that has a unique minimum if ranges over a given compact set . Then this minimum is characterized by

(20) |

By the fact that equation (4) is continuous in and continuously differentiable in , we may interchange expectation and differentiation.

Since in this study we are dealing with classification problem, the correctly classified case where for some , equation (20) is solved for . i.e is minimized at the true parameter value . In general if there is no true value, we may define as

(21) |

By minimizing equation (18), we get the estimator . Consistency of this estimator therefore means that converges in probability to as the sample size tends to infinity ^{ 34}.

Next, we discuss the asymptotic normality of the network parameters. In a classical context, our model can be written as follows:

(22) |

From the above equation, the residuals can therefore be expressed as;

(23) |

Since are i.i.d and

the residuals are also i.i.d implying that and

(24) |

Also,

We note that does not depend on .

Since the residuals are not only i.i.d but also bounded in absolute value by 1, their assumptions reduce to,

A1). The activation function is bounded and twice continuously differentiable with bounded derivatives.

A2). has a global minimum at lying in the interior of and with a positive definite Hessian

A3). Let be chosen such that for some , we have , for all .

A4). be i.i.d with unknown density whose support is .

A5). is continuous in and for some .

Having discussed the necessary theory, we now discuss the asymptotic normality of the network parameter estimates.

Let be i.i.d with

Suppose that assumptions A1 to A5 are satisfied. Then, for , with as above

where

with

and

We note that the asymptotic covariance matrix, reflects the two sources of error in and contains the squared modeling bias which vanishes in the correctly specified case while contains which reflects the randomness in the response variable ^{ 34}.

The asymptotic normality of the network parameter estimates was assessed using normal quantile-quantile (Q-Q) plots.

The Kolmogorov-Smirnov test of normality was used to test our hypothesis. Suppose we have an i.i.d. sample $X_1, \ldots, X_n$ with some unknown distribution $P$, and we would like to test the hypothesis that $P$ is equal to a normal distribution $P_0$.

Let $F(x) = P(X_1 \le x)$ denote the c.d.f. of the true underlying distribution of the data. We define the empirical c.d.f. by $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$, which counts the proportion of the sample points below level $x$. For any fixed point $x \in \mathbb{R}$, the law of large numbers implies that

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x) \to E\, I(X_1 \le x) = F(x),$$

i.e., the proportion of the sample in the set $(-\infty, x]$ approximates the probability of this set. It can be shown that this approximation holds uniformly over all $x \in \mathbb{R}$, i.e., the largest difference between $F_n$ and $F$, $\sup_{x} |F_n(x) - F(x)|$, goes to 0 in probability ^{ 39}. The key observation in the Kolmogorov-Smirnov test is that the distribution of this supremum does not depend on the 'unknown' distribution $P$ of the sample if $P$ is a continuous distribution.

For a fixed point $x$, the central limit theorem implies that

$$\sqrt{n}\,\bigl(F_n(x) - F(x)\bigr) \xrightarrow{d} N\bigl(0,\; F(x)(1 - F(x))\bigr),$$

because $F(x)(1 - F(x))$ is the variance of $I(X_1 \le x)$. It turns out that $\sqrt{n}\,\sup_{x} |F_n(x) - F(x)|$ also converges in distribution; this quantity is the KS statistic ^{ 39}.
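The KS statistic described above can be computed directly from a sorted sample. This is a minimal sketch against a normal reference distribution with known mean and standard deviation; the function names and the default parameters are assumptions made for illustration, and no p-value is computed.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """C.d.f. of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu=0.0, sigma=1.0):
    """Largest absolute gap between the empirical c.d.f. and the normal c.d.f."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical c.d.f. jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def scaled_ks_statistic(sample, mu=0.0, sigma=1.0):
    """sqrt(n) times the supremum distance."""
    return math.sqrt(len(sample)) * ks_statistic(sample, mu, sigma)

print(ks_statistic([-1.0, 1.0]))
```

Because the empirical c.d.f. is a step function, checking the gap just before and just after each sample point is enough to find the supremum.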

A network model with a sufficiently large number of hidden units can approximate any unknown function. When the training sample is fixed, a complex network with a large number of hidden units may overfit the data. Thus, there is a trade-off between approximation capability and overfitting when implementing ANN models. One simple approach to regularizing the network complexity is to use model selection criteria. Two such criteria are the Schwarz Information Criterion (SIC), as applied to neural networks by ^{ 25}, and the Predictive Stochastic Complexity criterion (PSC) introduced by ^{ 40}.

In this study, we used the Schwarz Information Criterion (SIC) which is given as;

(25) |

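The first term is the goodness-of-fit measure (regression mean squared error), while the second term penalizes model complexity. The Mean Squared Error (MSE) is given by

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2,$$

where $y_i$ is the observed response, $\hat{y}_i$ is the corresponding network output and $n$ is the number of observations.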

This MSE was also used to determine the number of hidden neurons but comparison was made with SIC. Using the SIC criterion, we started with a single hidden neuron and determined SIC(1). Then the second hidden neuron was added and SIC(2) determined. The process continued until an extra hidden neuron did not improve the SIC. We therefore estimated models in order to choose a model with neurons ^{ 34}.
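The stepwise search over hidden neurons can be sketched as follows. Since equation (25) is not reproduced here, the sketch assumes one common way of writing the Schwarz criterion, SIC = ln(MSE) + p ln(n) / n, with p the number of weights and n the sample size; the MSE values, weight counts and sample size in the usage line are hypothetical, not results from the study.

```python
import math

def sic(mse, n_params, n_obs):
    """Schwarz criterion in the assumed form ln(MSE) + p * ln(n) / n."""
    return math.log(mse) + n_params * math.log(n_obs) / n_obs

def select_by_sic(mse_by_h, params_by_h, n_obs):
    """Add hidden neurons one at a time until the SIC stops improving.

    mse_by_h[0] and params_by_h[0] belong to the model with one hidden
    neuron, index 1 to the model with two, and so on.
    """
    best_h = 1
    best_sic = sic(mse_by_h[0], params_by_h[0], n_obs)
    for h in range(2, len(mse_by_h) + 1):
        current = sic(mse_by_h[h - 1], params_by_h[h - 1], n_obs)
        if current >= best_sic:
            break  # the extra neuron did not improve the criterion
        best_h, best_sic = h, current
    return best_h

# Hypothetical MSEs for 1, 2 and 3 hidden neurons with eight inputs.
print(select_by_sic([0.070, 0.058, 0.060], [11, 21, 31], 3000))
```

The early stop mirrors the procedure in the text: the search ends at the first extra neuron that fails to lower the criterion, so later models are never fitted.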

Cross-validation is a process that can be used to estimate the quality of a neural network. When applied to several neural networks with different free parameter values (such as the number of hidden nodes and the back-propagation learning rate), the results of cross-validation can be used to select the best set of parameter values. The initial data set is divided into $k$ subsets of approximately equal size. The model is then estimated $k$ times, each time leaving out one of the subsets. A series of mean squared errors is computed on the basis of the omitted subsets. This method is called $k$-fold cross validation; when each subset contains a single observation, it reduces to leave-one-out cross validation ^{ 41}.
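The fold-wise procedure can be sketched in a few lines. This is an illustrative outline only: the `fit` and `predict` arguments stand for any model, and the mean-only model used here is a hypothetical placeholder, not the neural network from the study.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_mse(x, y, fit, predict, k=10, seed=0):
    """Return the average MSE over the k held-out folds and the per-fold MSEs."""
    fold_mse = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - predict(model, x[i])) ** 2 for i in held_out]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k, fold_mse

# Placeholder model: always predicts the training mean of y.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, xi):
    return model

x = list(range(20))
y = [1.0] * 20
average_mse, per_fold = cross_validated_mse(x, y, fit_mean, predict_mean, k=10)
print(average_mse, len(per_fold))
```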

In order to assess the fitness of the model, accuracy, sensitivity and specificity were reported. The accuracy of a diagnostic test is often assessed with two conditional probabilities. Given that a subject has the disease, the probability that the diagnostic test (prediction) is positive is called sensitivity ^{ 31}. Given that the subject does not have the disease, the probability that the test is negative is called specificity. In this study, the overall accuracy of the model is taken as the average of specificity and sensitivity.

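Consider a $2 \times 2$ table that cross-classifies the actual disease status against the predicted status, with $TP$ (true positives), $FN$ (false negatives), $FP$ (false positives) and $TN$ (true negatives) denoting the cell counts.

The sensitivity, specificity and accuracy are calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}.$$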
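A short sketch of these calculations is given below. The function name and the cell counts are hypothetical; the counts were chosen only because they reproduce rates of the same size as those reported later in the article, and they are not the study's confusion matrix.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and their average from confusion-matrix counts.

    tp: diseased cases predicted positive    fn: diseased cases predicted negative
    fp: healthy cases predicted positive     tn: healthy cases predicted negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (sensitivity + specificity) / 2  # average, as defined in the text
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only.
sens, spec, acc = diagnostic_metrics(tp=3, fn=1, fp=2, tn=33)
print(round(100 * sens, 2), round(100 * spec, 2), round(100 * acc, 2))
```

With these illustrative counts the three rates come out at 75%, 94.29% and 84.64%. Note that the average of sensitivity and specificity is often called balanced accuracy, and it differs from the proportion of all cases classified correctly when the classes are unbalanced.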

The study utilized secondary data from the 2015 Kenya STEPwise survey for non-communicable diseases risk factors. The input variables were the physical risk factors, i.e., age, sex, smoking behavior, alcoholic status, salt consumption, sugar consumption, physical activity/inactivity, obesity status, and systolic and diastolic blood pressure, while the output variable was diabetic status (diabetic or not diabetic). An obese person in this study is any person whose Body Mass Index was greater than or equal to 30, while a diabetic person is someone whose fasting glucose was greater than or equal to 6.1 mmol/l.

The table below summarizes the variables and their measurements.

The table below gives descriptive statistics for the continuous variables.

From Table 2, it is clear that the age of respondents was well within the survey inclusion criteria. The minimum age was 18 years and the maximum age was 69 years, while the mean age of respondents was approximately 38 years. The average systolic blood pressure (SBP) was 126.6 mmHg while the average diastolic blood pressure (DBP) was 82.05 mmHg. The SBP ranged from 80 mmHg to 218 mmHg while the DBP ranged from 48 mmHg to 129 mmHg.

Table 3 shows frequency distribution of the categorical input variables. The results from the study showed that 91.2% of respondents did not smoke or had never smoked. It is also clear that, of all the respondents, 10.3% were obese. Only 7.0% of the respondents were diabetic.

In order to find the relationship between diabetic status and the various input variables, a cross tabulation was carried out and the summary results are presented in the table below.

At the 5% level of significance, sex of respondent, alcohol consumption, sugar consumption, physical inactivity and obesity were significant, while smoking and salt consumption were not. This implies that there is a statistically significant association between diabetic status and each of the significant factors, while there is no evidence of an association between diabetic status and smoking or salt consumption.

From this analysis, all the significant variables have a relative risk greater than one. The risk of having diabetes is at least 29% higher for females than for males. For those who consume alcohol, the risk of diabetes is at least 33% higher than for those who do not. Those who consume excess sugar are 2.2 times as likely to have diabetes as those who do not consume excess sugar. Those who are physically inactive have a 73% higher risk of diabetes than those who are physically active, and those who are obese are 2.18 times as likely to have diabetes as those who are not obese.
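For readers unfamiliar with the measure, a relative risk is the ratio of the disease proportion in the exposed group to that in the unexposed group. The sketch below shows the arithmetic; the function name and the counts are hypothetical and are not taken from the study's cross tabulation.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the disease risk among the exposed to that among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical counts: 22 of 200 exposed and 10 of 200 unexposed are diabetic.
rr = relative_risk(22, 200, 10, 200)
print(round(rr, 2))
```

A value of about 2.2, as in this made-up example, reads as "2.2 times as likely", while a value of 1.29 reads as "29% higher risk".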

In fitting the neural network model, smoking status and salt consumption were not considered, since they did not have a significant relationship with diabetes mellitus.

The model with the least MSE was selected, as shown in Table 5 below.

From Figure 1, it is clear that the MSE tends to increase as the number of hidden nodes grows. The MSE is at its minimum at two nodes, implying that, in order to regularize the network, a model with two hidden nodes should be chosen.

We now train our model with eight input variables and two hidden nodes.

Before training the network, the data set was split into two, i.e., a training set and a test set. 75% of the data set was used for training the network while 25% was used for testing and validating. A plot of the network with weights is shown in Figure 2.
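A 75/25 split of this kind can be sketched as below. The function name, the fixed seed and the use of simple random shuffling are assumptions; the article does not state how the rows were assigned to the two sets.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows and cut them into a training part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(100))
print(len(train), len(test))
```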

The trained network had twenty-one weights. The training process needed 401 steps until all absolute partial derivatives of the error function were smaller than 0.01 (the default threshold). The estimated weights range from -39.2000 to 3.6194. For instance, the intercepts of the first hidden layer are -1.3032 and 3.0729, and the weights leading to the first hidden neuron from the eight covariates (age, sex, alcohol, sugar, inactive, sbp, dbp and obese) include 3.6194, -27.3465, -35.4901, 0.2369, -0.7540, -8.7735 and 0.9512.
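The weight count follows from the architecture: each layer contributes one weight per incoming connection plus one intercept per receiving neuron. A small sketch (the function name is hypothetical) makes the arithmetic explicit for eight inputs, two hidden neurons and one output.

```python
def count_weights(layer_sizes):
    """Number of weights, intercepts included, in a fully connected feed-forward net."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# (8 + 1) * 2 weights into the hidden layer, (2 + 1) * 1 into the output.
print(count_weights([8, 2, 1]))
```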

The weights are summarized in Table 6.

To assess the fitness of the model, a cross classification of the actual data and the predicted outcome using test data set was reported. Table 7 below shows the results of the confusion matrix.

The sensitivity of the trained network was 75% while the specificity was 94.29%, giving an overall accuracy rate of 84.64%. This implies that the model could correctly classify an individual as either diabetic or not with an accuracy rate of 84.64%. These results are consistent with other neural network models for binary classification.

After the model was trained, a 10-fold cross validation was carried out in order to test the generalization of the model. The MSE for each fold is reported in Table 8. The average of these results estimates the test error of the algorithm. In this study, the cross-validated average MSE was 0.0686, that is, an average squared error of about 6.86% on the 0-1 outcome scale.

In order to test our hypothesis, the Kolmogorov-Smirnov test of normality was used. The hypothesis stated that:

The Artificial Neural Network parameter estimates are asymptotically normal and consistent.

Table 9 gives the results of the test for the various parameters.

At the 5% significance level, we do not reject the null hypothesis for any parameter except w71. We therefore conclude that most parameter estimates did not show a significant departure from normality. Only the estimator w71 has a significant departure from normality at the 5% level, with a p-value of 0.0325.

The Normal Q-Q plot, or Normal quantile-quantile plot, is a graphical tool used to assess whether a set of data plausibly came from a Normal distribution. It allows one to see at a glance whether the assumption of normality is plausible and, if not, how the assumption is violated and which data points contribute to the violation. If both sets of quantiles came from the same distribution, the points should form a line that is roughly straight. A Q-Q plot of the ANN parameter estimates, based on a simulation with a large sample, shows that the parameter estimates aligned themselves along a straight line, suggesting that the ANN parameter estimates were approximately normally distributed and that the normality assumption was not violated. This is demonstrated in Figure 3 and Figure 4.
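The coordinates behind a normal Q-Q plot can be computed without any plotting library. The sketch below pairs standardized sample quantiles with standard normal quantiles; the function name and the plotting positions (i - 0.5) / n are assumptions, since several conventions exist and the article does not say which was used.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Pairs (theoretical quantile, standardized sample quantile) for a Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        theoretical = std_normal.inv_cdf((i - 0.5) / n)
        points.append((theoretical, (x - m) / s))
    return points

for theoretical, observed in normal_qq_points([1.0, 2.0, 3.0, 4.0, 5.0]):
    print(round(theoretical, 3), round(observed, 3))
```

If the sample is close to normal, the pairs fall near the line with unit slope through the origin.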

Advancement in modern computing has led to the use of artificial neural networks, which mimic the human brain. Combined with statistical analysis, artificial neural networks are used to identify complex patterns in data. This study aimed at modeling diabetes mellitus using ANN. Such a model, combined with clinical diagnosis, can assist clinicians and doctors in correctly diagnosing the underlying disease. The accuracy obtained from the trained model is a good indication that, with investment and further research in this field, the classification accuracy can be improved and the model can be used in future.

In this study, the diagnosis of diabetes mellitus has been modeled using a neural network classifier. In order to come up with a network architecture, a chi-square test was first carried out to establish the input variables that had a significant relationship with diabetes mellitus. For variables that were continuous, stepwise model building was carried out and the network's MSE did not increase, indicating that they were also significant. In order to determine the appropriate number of hidden nodes (size of the network), MSE and SIC were used to determine the parsimonious model. Models with more than seven nodes did not converge. Among the models that converged, the model with two hidden nodes had the minimum MSE and was thus chosen as the model that fit the data well.

The ANN had eight input neurons, one hidden layer with two neurons and an output layer with one neuron. The hidden and output layers used the sigmoid transfer function, and the network was trained using the back propagation algorithm. The data were split into a training set (75%) and a test set (25%) used for testing and validation. A 10-fold cross validation was carried out in order to test the classification accuracy of the model. The sensitivity of the trained network was reported as 75% while specificity was 94.29%. The overall accuracy of the model was 84.64%. In conclusion, with a good choice of risk factors for diabetes, neural network structures could be successfully used to help diagnose diabetes among the Kenyan adult population.

This study sets a precedent in modeling diabetes mellitus using artificial neural networks among the adult Kenyan population. With increasing interest in artificial intelligence, I would recommend that future research focus on embracing this new field of statistical computing. More important would be to integrate clinical/medical diagnosis with artificial intelligence. More complex machine learning algorithms, such as support vector machines and self-organizing maps, should be applied in diagnosing diabetes.

I thank the Almighty God for enabling me to come this far. Without His grace and favour, I would not have made it. Special regards to my supervisors Prof. Anthony G. Waititu, Dr. Anthony Wanjoya and Dr. Thomas Mageto for their guidance throughout this work. Special thanks to my entire family for their moral support, which kept me going. I sincerely thank my wife Beth and my daughter Salome for their unending support. My gratitude goes to my employer, Kenya National Bureau of Statistics (KNBS), for fully sponsoring me to undertake my Masters studies. Finally, I thank Alexander Kasyoki Muoka for encouraging me to finish this project, and all who supported me in one way or the other. God bless you richly.

[1] World Health Organization. Global Report on Diabetes, WHO Press, Geneva, 2006, page 6.

[2] Zainab A, et al. Using Neural Network to predict the Hypertension. International Journal of Scientific Development and Research, 2(2): 35-38, 2017.

[3] Ripley, B. D. Pattern Recognition and Neural Networks, Oxford Press, London, 1996.

[4] Robert, S. “Artificial Intelligence: its use in Medical Diagnosis”, The Journal of Nuclear Medicine, 34(3): 510-514, 1993.

[5] Flury, B., & Riedwyl, H. Multivariate statistics: A practical approach. London: Chapman and Hall, 1999.

[6] Press, S. J., & Wilson, S. “Choosing between logistic regression and discriminant analysis”, Journal of the American Statistical Association, 73(364): 699-705, 1978.

[7] Hosmer, D. W., & Lemeshow, S. Applied logistic regression. New York: Wiley Series, 1989.

[8] Buntine, W. L., & Weigend, A. S. “Bayesian Back-propagation”, Complex Systems, 5(6): 603-643, 1991.

[9] Menard, S. Applied logistic regression analysis, series: Quantitative applications in the social sciences. Thousand Oaks, CA: Sage, 1993.

[10] Myers, R. H. Classical and modern regression with applications (2nd edition). PWS-KENT Publishing Company, Boston, Massachusetts, 1990.

[11] Neter, J., Li, W., Nachtsheim, C. J., & Kutner, M. H. Applied linear statistical models (5th edition), McGraw-Hill/Irwin, New York, 2005.

[12] Snedecor, G. W., & Cochran, W. G. Statistical methods (7th edition). Ames, IA: The Iowa State University Press, 1980.

[13] Razi, M. A., & Athappilly, K. “A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models”, Expert Systems with Applications, 29(1): 69-74, 2005.

[14] Zhang, G., Patuwo, B. E., & Hu, M. Y. “Forecasting with artificial neural networks: The state of the art”, International Journal of Forecasting, 14(1): 35-62, 1998.

[15] Cybenko, G. “Approximation by superpositions of a sigmoidal function”. Mathematics of Control, Signals and Systems, 2(4): 303-314, 1989.

[16] Funahashi, K. “On the approximate realization of continuous mappings by neural networks”. Neural Networks, 2(3): 183-192, 1989.

[17] Hornik, K., Stinchcombe, M., & White, H. “Multilayer feedforward networks are universal approximators”. Neural Networks, 2(5): 359-366, 1989.

[18] Hornik, K. “Approximation capabilities of multilayer feed-forward networks”. Neural Networks, 4(2): 251-257, 1991.

[19] Irie, B., & Miyake, S. “Capabilities of three-layered perceptrons”, In: Proceedings of the IEEE Second International Conference on Neural Networks, July 1988, San Diego, California, USA.

[20] Dybowski, R., & Gant, V. Clinical Applications of Artificial Neural Networks, Cambridge University Press, London, 2007.

[21] Szolovits, P., Patil, S., & Schwartz, W. “Artificial Intelligence in Medical Diagnosis”, Annals of Internal Medicine, 108(1): 80-87, 1988.

[22] Bradley, B. “Finding Biomarkers is Getting Easier”, Ecotoxicology, 21(3): 631-636, 2012.

[23] Baxt, W. G. “Use of an artificial neural network for the diagnosis of myocardial infarction”. Annals of Internal Medicine, 115(11): 843-848, 1991.

[24] Jayalakshmi, T., & Santhakumaran, A. “A novel classification method for classification of diabetes mellitus using artificial neural networks”, In: International Conference on Data Storage and Data Engineering, February, 2010, Bangalore, India.

[25] Swanson, N. R., & White, H. “A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks”, Journal of Business Economic Statistics, 13(3): 265-275, 1995.

[26] Olaniyi, E. O., & Adnan, K. “Onset Diabetes Diagnosis Using Artificial Neural Network”, International Journal of Scientific & Engineering Research, 5(10): 754-759, 2014.

[27] Adeyemo, A., & Akinwonmi, A. “On the Diagnosis of Diabetes Mellitus Using Artificial Neural Network Models”. African Journal of Computing & ICT, 4(1): 1-8, 2011.

[28] Rajib D, V. Bajpai, G. Gandhi & B. Dey. “Application of artificial neural network technique for diagnosing diabetes mellitus”, In: 2008 IEEE Region 10 Colloquium and the Third ICIIS, 8-10 December, 2008, Kharagpur, India.

[29] Chan K, Ling S, Dillon T, & Nguyen H. “Diagnosis of hypoglycemic episodes using a neural network based rule discovery system”, Expert System Applications, 38(8): 9799-9808, 2011.

[30] Kenya National Bureau of Statistics. Kenya STEPwise Survey For Non Communicable Diseases Risk Factors, KNBS, MOH and WHO, Nairobi, page 137-140, 2015.

[31] Agresti, A. An Introduction to Categorical Data Analysis (2nd edition). New York: Wiley Series, 2007.

[32] Amato F, et al. “Artificial neural networks in medical diagnosis”. Journal of Applied Biomedicine, 11(2): 47-58, 2013.

[33] Intrator, O., & Intrator, N. “Interpreting Neural Networks Results: A simulation study”, Computational Statistics and Data Analysis, 37(3): 373-393, 2001.

[34] Waititu, A. G. Nonparametric Change point Analysis for Bernoulli Random Variables Based on Neural Networks, PhD Thesis, Kaiserslautern University, Germany. (https://kluedo.ub.uni-kl.de), 2008.

[35] White, H. “Some Asymptotic Results for Learning in Single Hidden-Layer Feedforward Network Models”, Journal of the American Statistical Association, 84(408): 1003-1013, 1989.

[36] Warner, B., & Misra, M. “Understanding neural networks as statistical tools”, The American Statistician, 50(4): 284-293, 1996.

[37] Sussmann, H. J. “Uniqueness of the weights for minimal feed-forward nets with a given input-output map”, Neural Networks, 5(4): 589-593, 1992.

[38] Hwang, J. T., & Ding, A. A. “Prediction intervals for artificial neural networks”. Journal of American Statistical Association, 92(438): 748-757, 1997.

[39] Berger, V., & Zhou, Y. “Kolmogorov Smirnov test: Overview”. In Wiley statsref: Statistics reference online. New York: John Wiley & Sons, Ltd.

[40] Rissanen, J. “Stochastic complexity and modeling”, Annals of Statistics, 14(3): 1080-1100, 1986.

[41] Haykin, S. Neural Networks and Learning Machines (3rd edition). New Jersey: Pearson Education, 2009.

[42] Muhammad, A. R., & Kuriakose, A. “A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models”, Expert Systems with Applications, 29(1): 65-74, 2005.

[43] Temurtas, H., Yumusak, N., & Temurtas, F. “A comparative study on diabetes disease diagnosis using neural networks”, Expert System Applications, 36(4): 8610-8615, 2009.

[44] White, H. Artificial neural networks: Approximation and learning theory. Oxford: Basil Blackwell, 1992.

Published with license by Science and Education Publishing, Copyright © 2018 Pius Miri Ng’ang’a, Antony Waititu Gichuhi, Antony Wanjoya and Thomas Mageto

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Pius Miri Ng’ang’a, Antony Waititu Gichuhi, Antony Wanjoya, Thomas Mageto. Modelling Diabetes Mellitus among Adult Kenyan Population Using Artificial Neural Network. *American Journal of Applied Mathematics and Statistics*. Vol. 6, No. 5, 2018, pp 186-200. http://pubs.sciepub.com/ajams/6/5/3

