Artificial Neural Network for Solving Fuzzy Differential Equations under the Generalized H-Derivative

The aim of this work is to present a novel approach, based on artificial neural networks, for finding numerical solutions of first-order fuzzy differential equations under the generalized H-derivative. The differentiability concept used in this paper is generalized differentiability, since a fuzzy differential equation under this notion can have two solutions. The fuzzy trial solution of the fuzzy initial value problem is written as the sum of two parts. The first part satisfies the fuzzy initial condition and contains no adjustable parameters. The second part involves a feed-forward neural network containing adjustable parameters. Under some conditions the proposed method provides numerical solutions with high accuracy.


Introduction
Nowadays, fuzzy differential equations (FDEs) are a popular topic studied by many researchers, since they are widely used for modeling problems in science and engineering. Most practical problems require the solution of an FDE satisfying fuzzy initial or fuzzy boundary conditions, so a fuzzy initial or boundary value problem must be solved. However, many such problems cannot be solved exactly; sometimes it is even impossible to find their analytical solutions. Thus, considering their approximate solutions is becoming more important [1].
The theory of FDEs was first formulated by Kaleva and Seikkala. Kaleva formulated FDEs in terms of the Hukuhara derivative (H-derivative). Buckley and Feuring have given a very general formulation of the first-order fuzzy initial value problem: they first find the crisp solution, fuzzify it, and then check whether it satisfies the fuzzy differential equation [2].
In recent years, artificial neural networks (ANNs) have been used for approximating the solutions of ordinary differential equations (ODEs) and partial differential equations (PDEs). We briefly review some articles in the literature concerning differential equations. In 1990, Lee and Kang [3] used parallel processor computers to solve a first-order differential equation with Hopfield neural network models. In 1994, Meade and Fernandez [4,5] solved linear and non-linear ODEs using feed-forward neural network (FFNN) architectures and B-splines of degree one. In 1997, Lagaris, Likas, et al. [6,7] used ANNs for solving ODEs and PDEs with initial/boundary value problems. In 1999, Liu and Jammes [8] developed some properties of the trial solution for solving ODEs with ANNs. In 2003, Ali, Ucar, et al. [9] solved vibration control problems using ANNs. In 2004, Tawfiq [10] presented and developed supervised and unsupervised algorithms for solving ODEs and PDEs. In 2006, Malek and Shekari [11] presented a numerical method based on ANNs and optimization techniques in which the solutions of higher-order ODEs are approximated in an analytical closed form built from specific functions. In 2008, Pattanaik and Mishra [12] applied and developed some properties of ANNs for the solution of PDEs in RF engineering. In 2010, Baymani, Kerayechian, et al. [13] proposed an ANN approach for solving Stokes problems. In 2011, Oraibi [14] designed an FFNN for solving ordinary initial value problems. In 2012, Ali [15] designed a fast FFNN to solve two-point boundary value problems. In 2013, Hussein [16] designed a fast FFNN to solve singular boundary value problems. In 2014, Tawfiq and Al-Abrahemee [17] designed an ANN to solve singular perturbation problems; other researchers have followed.
Numerical solution of FDEs using ANNs is a very recent subject, going back only to 2010. In 2010, Effati and Pakdaman [18] used ANNs for solving FDEs; they were the first to apply ANNs to approximate fuzzy initial value problems. In 2012, Mosleh and Otadi [19] used ANNs for solving fuzzy Fredholm integro-differential equations. In 2013, Ezadi, Parandin, et al. [20] used ANNs based on semi-Taylor series to solve first-order FDEs. In 2016, Suhhiem [21] developed and used fuzzy ANNs for solving fuzzy and non-fuzzy differential equations.
International Journal of Partial Differential Equations and Applications

In 2008, the concept of generalized Hukuhara differentiability was studied by Chalco-Cano and Román-Flores [22,23] to solve FDEs. In this work, for solving FDEs under the generalized H-derivative, we present a modified method which relies on the function approximation capabilities of FFNNs and results in the construction of a solution written in a differentiable, closed analytic form. This form employs an FFNN as the basic approximation element, whose parameters (weights and biases) are adjusted to minimize an appropriate error function. To train the ANN we design, we employ optimization techniques, which in turn require the computation of the gradient of the error with respect to the network parameters. In this proposed approach the model function is expressed as the sum of two terms: the first term satisfies the fuzzy initial/boundary conditions and contains no adjustable parameters; the second term is found using an FFNN, which is trained so as to satisfy the FDE. It is necessary to note that the solution of the FDE using an ANN is based on converting the FDE into a system of ODEs.

Basic Definitions
In this section, the basic notions used in fuzzy calculus are introduced.

Definition (1), [19]: The r-level (or r-cut) set of a fuzzy set Ã, labeled by A_r, is the crisp set of all x in X (the universal set) such that:

A_r = {x ∈ X : μ_Ã(x) ≥ r}, r ∈ (0, 1].

Definition (2), [20]: Extension Principle
Let X be the Cartesian product of the universes X₁, X₂, …, X_m, and let Ã₁, Ã₂, …, Ã_m be m fuzzy subsets of X₁, X₂, …, X_m respectively, with Cartesian product Ã = Ã₁ × Ã₂ × … × Ã_m. Let f be a function from X to a universe Y, and f⁻¹ the inverse image under f. For m = 1, the extension principle reads:

μ_B̃(y) = sup{μ_Ã(x) : x ∈ f⁻¹(y)} if f⁻¹(y) ≠ ∅, and μ_B̃(y) = 0 otherwise,

where B̃ = f(Ã).
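On a finite, discretized universe the extension principle for m = 1 can be sketched in a few lines of code; the names mu_A and extend, and the example membership grades below, are illustrative and not taken from the paper:

```python
# Sketch of the extension principle for m = 1 on a discretized universe.
# The fuzzy set is represented by a dict mapping each x to its membership grade.

def extend(mu_A, f):
    """Compute mu_B(y) = sup{ mu_A(x) : f(x) = y } over a finite universe."""
    mu_B = {}
    for x, grade in mu_A.items():
        y = f(x)
        mu_B[y] = max(mu_B.get(y, 0.0), grade)
    return mu_B

# A symmetric fuzzy set around 0, pushed through f(x) = x^2.
mu_A = {-2: 0.3, -1: 0.7, 0: 1.0, 1: 0.7, 2: 0.3}
mu_B = extend(mu_A, lambda x: x * x)
# mu_B[4] = max(mu_A[-2], mu_A[2]) = 0.3, mu_B[1] = 0.7, mu_B[0] = 1.0
```

The supremum over the preimage becomes a running max over all x mapping to the same y.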

Definition (3), [1]: Fuzzy Number
A fuzzy number ũ is completely determined by an ordered pair of functions (u̲(r), ū(r)), 0 ≤ r ≤ 1, which satisfy the following requirements:
1) u̲(r) is a bounded, left-continuous and non-decreasing function on [0,1].
2) ū(r) is a bounded, left-continuous and non-increasing function on [0,1].
3) u̲(r) ≤ ū(r), 0 ≤ r ≤ 1.
The crisp number a is simply represented by u̲(r) = ū(r) = a, 0 ≤ r ≤ 1. The set of all fuzzy numbers is denoted by E¹.
Remark (1), [19]: For arbitrary ũ = (u̲, ū), ṽ = (v̲, v̄) and k ∈ R, addition and multiplication by k can be defined as:
(ũ + ṽ)(r) = (u̲(r) + v̲(r), ū(r) + v̄(r)),
(k ũ)(r) = (k u̲(r), k ū(r)) if k ≥ 0 and (k ū(r), k u̲(r)) if k < 0,
for all r ∈ [0,1].
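As a quick illustration of Remark (1), the r-level arithmetic can be sketched by representing a fuzzy number at a fixed r by its pair of endpoints; the function names add and scale are hypothetical:

```python
# Sketch of Remark (1): addition and scalar multiplication of fuzzy numbers
# in parametric form, at a fixed r-level. u and v are pairs (lower, upper).

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    # For k >= 0 the endpoints keep their order; for k < 0 they swap,
    # so that the result is again a valid (lower, upper) pair.
    return (k * u[0], k * u[1]) if k >= 0 else (k * u[1], k * u[0])

u, v = (1.0, 3.0), (0.5, 2.0)
assert add(u, v) == (1.5, 5.0)
assert scale(-2.0, u) == (-6.0, -2.0)   # endpoints swap under a negative scalar
```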

In the case 0 ≤ a₂ and 0 ≤ b₂, the multiplication operation can be simplified. When the previous sets A = [a₁, a₂] and B = [b₁, b₂] are defined on the positive real numbers R⁺, the operations of multiplication, division and inverse are written as:

A · B = [a₁b₁, a₂b₂], A / B = [a₁/b₂, a₂/b₁], A⁻¹ = [1/a₂, 1/a₁].

Definition (4), [20]: Triangular Fuzzy Number
Among the various shapes of fuzzy numbers, the triangular fuzzy number is the most popular one. A triangular fuzzy number is represented by three points, Ã = (a₁, a₂, a₃). This representation is interpreted as the membership function:

μ_Ã(x) = (x − a₁)/(a₂ − a₁) for a₁ ≤ x ≤ a₂; μ_Ã(x) = (a₃ − x)/(a₃ − a₂) for a₂ ≤ x ≤ a₃; μ_Ã(x) = 0 otherwise.

If we obtain the crisp interval by the r-cut operation, the interval [A]_r is obtained for all r ∈ [0,1] from:

A̲ = (a₂ − a₁)r + a₁, Ā = (a₂ − a₃)r + a₃.

Thus [A]_r = [(a₂ − a₁)r + a₁, (a₂ − a₃)r + a₃], which is the parametric form of the triangular fuzzy number Ã.
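The parametric form above can be checked with a short sketch (the function name r_cut is illustrative):

```python
def r_cut(a1, a2, a3, r):
    """r-cut of the triangular fuzzy number (a1, a2, a3):
    [ (a2 - a1) r + a1 , (a2 - a3) r + a3 ]."""
    return ((a2 - a1) * r + a1, (a2 - a3) * r + a3)

# r = 1 collapses the interval to the peak a2; r = 0 gives the support [a1, a3].
assert r_cut(0.0, 1.0, 2.0, 1.0) == (1.0, 1.0)
assert r_cut(0.0, 1.0, 2.0, 0.0) == (0.0, 2.0)
```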
Definition (5), [18]: Fuzzy Function
Every function defined from a set Ã ⊆ E¹ to B̃ ⊆ E¹ is called a fuzzy function.
Definition (6), [18]: The fuzzy function F: R → E¹ is said to be continuous if for an arbitrary t₁ ∈ R and ε > 0 there exists a δ > 0 such that:
|t − t₁| < δ ⟹ D(F(t), F(t₁)) < ε,
where D is the distance between two fuzzy numbers.
Definition (7), [18]: Let I be a real interval. The r-level set of the fuzzy function y : I → E¹ can be denoted by:
[y(t)]_r = [y̲(t, r), ȳ(t, r)], t ∈ I, r ∈ [0,1].
The Seikkala derivative y′(t) of the fuzzy function y(t) is defined by:
[y′(t)]_r = [y̲′(t, r), ȳ′(t, r)], t ∈ I, r ∈ [0,1].
Definition (8), [18]: Let u, v ∈ E¹. If there exists w ∈ E¹ such that u = v + w, then w is called the H-difference (Hukuhara difference) of u and v, and it is denoted by w = u ⊝ v.
In this work the ⊝ sign always stands for the H-difference; let us remark that u ⊝ v ≠ u + (−1)v.
Definition (9), [18]: H-Derivative
Let F : I → E¹ and t₀ ∈ I. F is H-differentiable at t₀ if there exists an element F′(t₀) ∈ E¹ such that, for all h > 0 sufficiently small, the H-differences F(t₀ + h) ⊝ F(t₀) and F(t₀) ⊝ F(t₀ − h) exist and
lim_{h→0⁺} (F(t₀ + h) ⊝ F(t₀))/h = lim_{h→0⁺} (F(t₀) ⊝ F(t₀ − h))/h = F′(t₀).
It is necessary to note that definition (9) is the classical definition of the H-derivative (differentiability in the sense of Hukuhara).

Definition (10), [22,23]: Generalized H -Differentiability
Let F : I → E¹ and t₀ ∈ I. We say that F is generalized differentiable at t₀ if F′(t₀) ∈ E¹ exists in one of the following two forms:
(1) for all h > 0 sufficiently small, the H-differences F(t₀ + h) ⊝ F(t₀) and F(t₀) ⊝ F(t₀ − h) exist and
lim_{h→0⁺} (F(t₀ + h) ⊝ F(t₀))/h = lim_{h→0⁺} (F(t₀) ⊝ F(t₀ − h))/h = F′(t₀); or
(2) for all h > 0 sufficiently small, the H-differences F(t₀) ⊝ F(t₀ + h) and F(t₀ − h) ⊝ F(t₀) exist and
lim_{h→0⁺} (F(t₀) ⊝ F(t₀ + h))/(−h) = lim_{h→0⁺} (F(t₀ − h) ⊝ F(t₀))/(−h) = F′(t₀).
Theorem (1), [22]: Let F : I → E¹ with [F(t)]_r = [f_r(t), g_r(t)] for each r ∈ [0,1]. Then:
(i) If F is differentiable in the first form (1) of definition (10), then f_r and g_r are differentiable functions and [F′(t)]_r = [f_r′(t), g_r′(t)].
(ii) If F is differentiable in the second form (2) of definition (10), then f_r and g_r are differentiable functions and [F′(t)]_r = [g_r′(t), f_r′(t)].
Proof: see [22].

Artificial Neural Networks [10]
Artificial neural networks (ANNs) are learning machines that can learn any arbitrary functional mapping between input and output. They are fast machines and can be implemented in parallel, either in software or in hardware. In fact, the computational complexity of an ANN is polynomial in the number of neurons used in the network. Parallelism also brings with it the advantages of robustness and fault tolerance. In other words, an ANN is a simplified mathematical model of the human brain. It can be implemented by both electronic elements and computer software. It is a parallel distributed processor with a large number of connections, and an information processing system that has certain performance characteristics in common with biological neural networks. ANNs have been developed as generalizations of mathematical models of human cognition or neural biology, based on the following assumptions:
1) Information processing occurs at many simple elements called neurons; this is fundamental to the operation of ANNs.
2) Signals are passed between neurons over connection links.
3) Each connection link has an associated weight which, in a typical neural net, multiplies the signal transmitted.
4) Each neuron applies an activation function (usually nonlinear) to its net input (the sum of weighted input signals) to determine its output signal.
Note: The units in a network are organized into a given topology by a set of connections, or weights, shown as lines in a diagram.

Characteristics of an Artificial Neural Network [10]
An ANN is characterized by:
1) Architecture: the pattern of connections between the neurons.
2) Training (learning) algorithm: the method of determining the weights on the connections.
3) Activation function: the output of a neuron depends on the neuron's input and on its activation function.

Typical Architecture of ANN [10]
ANNs are often classified as single-layer or multilayer. In determining the number of layers, the input units are not counted as a layer, because they perform no computation. Equivalently, the number of layers in the net can be defined as the number of layers of weighted interconnection links between the slabs of neurons. This view is motivated by the fact that the weights in a net contain extremely important information.

The Bias [21]
In sections (3.4) and (3.5), we describe the main implementation of the back-propagation algorithm for the multilayer feed-forward neural network (FFNN). Most implementations of this algorithm employ an additional class of weights known as biases (Figure 1). Biases are values added to the sums calculated at each node (except input nodes) during the feed-forward phase. The negative of a bias is sometimes called a threshold. For simplicity, biases are commonly visualized as values associated with each node in the intermediate and output layers of a network, but in practice they are treated in exactly the same manner as other weights, with all biases simply being weights on connections that lead from a single node whose location is outside of the main network.

Multilayer Feed Forward Architecture [21]
In a layered neural network the neurons are organized in the form of layers. We have at least two layers: an input and an output layer. The layers between the input and the output layer (if any) are called hidden layers, and their computation nodes are correspondingly called hidden neurons or hidden units. Extra hidden neurons raise the network's ability to extract higher-order statistics from the (input) data. The source nodes in the input layer of the network supply the respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network. A layer of nodes projects onto the next layer of neurons (computation nodes), but not vice versa. In other words, this network is a feed-forward neural network (Figure 1): when no output of a neuron is an input to a neuron of the same level or of a preceding level, the network is described as feed-forward; if there is at least one output connected as an input to a neuron of a previous level or of the same level, including itself, the network is denominated a feedback network. Feedback networks that have at least one closed loop of back propagation are called recurrent. The neurons in each layer of the network have as their inputs the output signals of the preceding layer only. The set of output signals of the neurons in the output (final) layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer. The ANN is said to be totally connected if every node in each layer of the network is connected to every node in the adjacent forward layer; otherwise the network is called partially connected. In this work, a totally connected multilayer FFNN is used.

Back propagation Training Algorithm [21]
Training a network by back propagation involves three stages:
1) The feed-forward of the input training pattern.
2) The back propagation of the associated error.
3) The adjustment of the weights.
The term back propagation refers to the process by which derivatives of the neural network error with respect to the neural network weights and biases are computed. This process can be used with a number of different optimization strategies. In other words, standard back propagation is based on gradient descent; back propagation is also known as the generalized delta rule. It is the most widely used supervised training algorithm for ANNs. Back propagation is a well-known training method for the multilayer FFNN and has many industrial applications in function approximation, pattern association, and pattern classification. Because of its importance, we will discuss it in some detail.

Activation Function [21]
The activation function (sometimes called a transfer function) can be a linear or nonlinear function. There are many different types of activation functions; the selection of one type over another depends on the particular problem that the neuron (or ANN) is to solve. The activation function, denoted by s: R → R, defines the output of a neuron; it is bounded, monotonically increasing, differentiable and satisfies:
lim_{x→+∞} s(x) = 1 and lim_{x→−∞} s(x) = 0.
The sigmoid function is by far the most common form of activation function used in the construction of ANNs. An example of the sigmoid function is the logistic function s(x) = 1/(1 + e^(−x)), whose range is from 0 to 1; an important feature of the sigmoid function is that it is differentiable.
It is sometimes desirable to have the activation function range from −1 to 1, allowing an activation function of the sigmoid type to assume negative values; an example is the hyperbolic tangent function, which is a smooth function.
Throughout this work, we take s(x) = tanh(x) as the activation function, based on the results of [21], which indicate that the tanh(x) transfer function enables the training algorithm to learn faster.

Theorem (2): The Universal Approximation Theorem
The multilayer perceptron (MLP) network with one hidden layer, with sigmoid functions in the hidden layer and linear transfer functions in the output layer, is able to approximate any square-integrable function to any desired degree of accuracy (see [3]).

First Order Fuzzy Differential Equation
A fuzzy differential equation of the first order has the form:

y′(x) = F(x, y(x)), x ∈ [a, b],   (16)

with the fuzzy initial condition y(a) = y₀, where y is a fuzzy function of x, F(x, y(x)) is a fuzzy function of the crisp variable x and the fuzzy variable y, y′ is the fuzzy derivative of y (which, according to our proposed method, we consider in the second form (2) of definition (10)), and y₀ is a fuzzy number. It is clear that the fuzzy function F(x, y) is a mapping F: R × E¹ → E¹ [18]. Now it is possible to replace (16) by the following equivalent system, with endpoint functions:

F̲(x, y̲, ȳ) = min{F(x, u) : u ∈ [y̲, ȳ]},  F̄(x, y̲, ȳ) = max{F(x, u) : u ∈ [y̲, ȳ]}.   (17)
The parametric form of system (17), under the second form of differentiability, is given by:

y̲′(x, r) = F̄(x, y̲(x, r), ȳ(x, r)), y̲(a, r) = y̲₀(r),
ȳ′(x, r) = F̲(x, y̲(x, r), ȳ(x, r)), ȳ(a, r) = ȳ₀(r),   (19)

where x ∈ [a, b] and r ∈ [0, 1]. Now, with a discretization of the interval [a, b], a set of points x_i, i = 1, 2, 3, …, g is obtained. Thus for an arbitrary x_i ∈ [a, b], system (19) can be rewritten as:

y̲′(x_i, r) − F̄(x_i, y̲(x_i, r), ȳ(x_i, r)) = 0,
ȳ′(x_i, r) − F̲(x_i, y̲(x_i, r), ȳ(x_i, r)) = 0,   (20)

with the initial conditions y̲(a, r) = y̲₀(r), ȳ(a, r) = ȳ₀(r), r ∈ [0, 1].
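Numerically, the endpoint functions in (17) can be approximated by sampling u over the r-cut; the following sketch (with hypothetical names, and a sampling step that stands in for the analytic min/max) illustrates the idea:

```python
# Numerical sketch of system (17): the endpoint functions are the min and max
# of F(x, u) as u ranges over the r-cut [y_lo, y_hi]. Sampling the interval is
# an illustrative approximation, not the paper's analytic reduction.

def endpoint_funcs(F, x, y_lo, y_hi, n=201):
    us = [y_lo + (y_hi - y_lo) * i / (n - 1) for i in range(n)]
    vals = [F(x, u) for u in us]
    return min(vals), max(vals)

# Example: F(x, u) = -u is decreasing in u, so the min/max swap the endpoints.
lo, hi = endpoint_funcs(lambda x, u: -u, 0.0, 1.0, 2.0)
assert abs(lo - (-2.0)) < 1e-9 and abs(hi - (-1.0)) < 1e-9
```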
In this work, the function approximation capabilities of feed-forward neural networks are used by expressing the trial solution for system (19) as the sum of two terms (see eq. (22)). The first term satisfies the initial/boundary conditions and contains no adjustable parameters. The second term involves a feed-forward neural network to be trained so as to satisfy the fuzzy differential equations. Since it is known that a multilayer perceptron with one hidden layer can approximate any function to arbitrary accuracy, the multilayer perceptron is used as the network architecture.
Let y̲_t(x, r, p̲) be a trial solution for the first equation in system (19) and ȳ_t(x, r, p̄) a trial solution for the second equation, where p̲ and p̄ are the adjustable parameters. Indeed, y̲_t(x, r, p̲) and ȳ_t(x, r, p̄) are approximations of y̲(x, r) and ȳ(x, r) respectively. Then a discretized version of system (19) can be converted to the following optimization problem:

min over p⃗ of Σ_i { [y̲_t′(x_i, r, p̲) − F̄(x_i, y̲_t(x_i, r, p̲), ȳ_t(x_i, r, p̄))]² + [ȳ_t′(x_i, r, p̄) − F̲(x_i, y̲_t(x_i, r, p̲), ȳ_t(x_i, r, p̄))]² }   (21)

(here p⃗ = (p̲, p̄) contains all adjustable parameters), subject to the initial conditions y̲_t(a, r, p̲) = y̲₀(r), ȳ_t(a, r, p̄) = ȳ₀(r). Each trial solution y̲_t(x, r, p̲) and ȳ_t(x, r, p̄) employs one feed-forward neural network, the corresponding networks being denoted by N̲(x, r, p̲) and N̄(x, r, p̄) with adjustable parameters p̲ and p̄ respectively. The trial solutions y̲_t and ȳ_t should satisfy the initial conditions, and the networks must be trained to satisfy the differential equations. Thus y̲_t and ȳ_t can be chosen as follows:

y̲_t(x, r, p̲) = y̲₀(r) + (x − a) N̲(x, r, p̲),
ȳ_t(x, r, p̄) = ȳ₀(r) + (x − a) N̄(x, r, p̄),   (22)

where N̲(x, r, p̲) and N̄(x, r, p̄) are single-output feed-forward neural networks with adjustable parameters p̲ and p̄ respectively. Here x and r are the network inputs.
It is easy to see that in (22), y̲_t and ȳ_t satisfy the initial conditions.
Thus the corresponding error function to be minimized over all adjustable neural network parameters will be:

E(p⃗) = Σ_i { [y̲_t′(x_i, r, p̲) − F̄(x_i, y̲_t(x_i, r, p̲), ȳ_t(x_i, r, p̄))]² + [ȳ_t′(x_i, r, p̄) − F̲(x_i, y̲_t(x_i, r, p̲), ȳ_t(x_i, r, p̄))]² },   (23)

where the x_i are points in [a, b]. For solving the FDE described in this subsection we use two ANNs, each of dimension 2 × m × 1: two input units (x and r), one hidden layer with m units, and one linear output unit.
For every pair of entries x and r, the input neurons make no changes to their inputs, so the input to the jth hidden neuron is:

n_j = w_j x + u_j r + b_j,

where w_j and u_j are the weights from the input units x and r to the jth hidden unit and b_j is its bias; the network output is then N(x, r, p) = Σ_j v_j s(n_j), where v_j and v̄_j are the weight parameters from the jth unit in the hidden layer to the output layer in the first and second networks.
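A minimal sketch of one of these 2 × m × 1 networks and its trial solution, assuming the standard Lagaris-type form y_t(x, r, p) = y0(r) + (x − a) N(x, r, p) for eq. (22); the parameter names w, u, b, v are illustrative:

```python
import math

def N(x, r, p):
    # p = (w, u, b, v): weights from inputs x and r, hidden biases, output weights
    w, u, b, v = p
    return sum(v[j] * math.tanh(w[j] * x + u[j] * r + b[j])
               for j in range(len(v)))

def y_trial(x, r, p, a=0.0, y0=1.0):
    # Trial solution: the first term fixes the initial value, the second
    # vanishes at x = a, so the initial condition holds for any parameters.
    return y0 + (x - a) * N(x, r, p)

m = 3
p = ([0.1] * m, [0.2] * m, [0.0] * m, [0.5] * m)
assert y_trial(0.0, 0.5, p) == 1.0      # initial condition satisfied exactly
```

Because the network term is multiplied by (x − a), training only has to shape the solution away from x = a; the initial condition is built in by construction.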

Reducing an FDE to a System of ODEs [22,23]
The solution of the fuzzy differential equation (16) depends on the type of differentiability considered. Case I: if y is (1)-differentiable, (16) reduces to the system y̲′(x, r) = F̲(x, y̲, ȳ), ȳ′(x, r) = F̄(x, y̲, ȳ). Case II: if y is (2)-differentiable, (16) reduces to the system y̲′(x, r) = F̄(x, y̲, ȳ), ȳ′(x, r) = F̲(x, y̲, ȳ). The existence and uniqueness of the two solutions of problem (16) described above are given by the following theorem.
Theorem (3): Let F : I × E¹ → E¹ be a continuous fuzzy function and suppose there exists k > 0 such that D(F(x, w), F(x, z)) ≤ k D(w, z) for all x ∈ I and w, z ∈ E¹. Then problem (16) has two solutions (one (1)-differentiable and the other (2)-differentiable) on I, where I = [a, b]. Proof: see [23].
To illustrate how we can find the two solutions of a fuzzy differential equation under the generalized H-derivative, we present the following example. According to subsection (4.2), Case II, after reducing the above problem we have the following system of ODEs.
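To make the two cases concrete, the following sketch integrates both reduced systems with Euler's method for the hypothetical equation y′ = −y with initial r-cut [0.5, 1.5] (this example is ours, not the paper's): under Case I the length of the r-cut grows with x, while under Case II it shrinks.

```python
# Euler integration of the two reduced ODE systems for y' = -y on [0, 1].
# Here F(x, u) = -u is decreasing in u, so F_lower = -y_hi and F_upper = -y_lo.

def euler(f_lo, f_hi, y_lo, y_hi, x_end=1.0, n=1000):
    h = x_end / n
    for _ in range(n):
        y_lo, y_hi = (y_lo + h * f_lo(y_lo, y_hi),
                      y_hi + h * f_hi(y_lo, y_hi))
    return y_lo, y_hi

# Case I ((1)-differentiable): y_lo' = -y_hi, y_hi' = -y_lo
c1 = euler(lambda a, b: -b, lambda a, b: -a, 0.5, 1.5)
# Case II ((2)-differentiable): y_lo' = -y_lo, y_hi' = -y_hi
c2 = euler(lambda a, b: -a, lambda a, b: -b, 0.5, 1.5)
# In Case II both endpoints decay like exp(-x), so the r-cut narrows;
# in Case I the difference y_hi - y_lo grows like exp(x).
```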

Numerical Example
To show the behavior and properties of the proposed method, one problem will be solved in this section. We have used a multilayer perceptron having one hidden layer with ten hidden units and one output unit. The activation function of each hidden unit is the hyperbolic tangent. The analytical solutions y̲_a(x, r) and ȳ_a(x, r) are known in advance; therefore, we test the accuracy of the obtained solutions by computing the deviation (absolute error):

e̲(x, r) = |y̲_t(x, r) − y̲_a(x, r)|, ē(x, r) = |ȳ_t(x, r) − ȳ_a(x, r)|,

where y̲_t(x, r) and ȳ_t(x, r) are the trial solutions.
In order to obtain better results, more hidden units or more training points may be used. To minimize the error function we have used the BFGS quasi-Newton method (for more details, see [21]). The ANN was trained using a grid of ten equidistant points in [0, 1], and the error function (23) was minimized for this problem. We then use (28) to update the weights and biases. The analytical and trial solutions for this problem can be found in Table 1 and Table 2.
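The training loop can be sketched as follows; since the paper's example was not legible here, we use the hypothetical crisp test equation y′ = −y, y(0) = 1 for one endpoint, and plain finite-difference gradient descent in place of the BFGS method the paper employs:

```python
import math

def net(x, p, m=5):
    # p packs input weights w, hidden biases b and output weights v of a 1 x m x 1 net
    w, b, v = p[:m], p[m:2 * m], p[2 * m:]
    return sum(v[j] * math.tanh(w[j] * x + b[j]) for j in range(m))

def y_t(x, p):
    return 1.0 + x * net(x, p)          # trial solution; y_t(0) = 1 exactly

def error(p, xs):
    # Sum of squared residuals of y' = -y at the training points (in the
    # spirit of eq. (23)); dy_t/dx is taken by central finite differences.
    e, h = 0.0, 1e-5
    for x in xs:
        dy = (y_t(x + h, p) - y_t(x - h, p)) / (2 * h)
        e += (dy + y_t(x, p)) ** 2
    return e

xs = [i / 9 for i in range(10)]         # ten equidistant points in [0, 1]
p = [0.1 * (j % 7 - 3) for j in range(15)]
e0 = error(p, xs)
for _ in range(500):                    # crude gradient descent on the error
    base = error(p, xs)
    g = [(error(p[:j] + [p[j] + 1e-6] + p[j + 1:], xs) - base) / 1e-6
         for j in range(len(p))]
    p = [pj - 0.01 * gj for pj, gj in zip(p, g)]
# After training the error has decreased and y_t approximates exp(-x) on [0, 1].
```

A quasi-Newton method such as BFGS would replace the plain descent step with a curvature-corrected one and typically converge in far fewer iterations.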

Conclusion
In this paper, we have presented a numerical method based on artificial neural networks for solving first-order fuzzy initial value problems under the generalized H-derivative. The method we have used allows us to translate the FDE into a system of ODEs and then solve this system, and we demonstrated the ability of ANNs to approximate the solutions of FDEs. We can therefore conclude that the proposed method can handle FDEs effectively and provide an accurate approximate solution throughout the whole domain, not only at the training set. As well, one can use interpolation techniques to find the approximate solution at points between the training points or at points outside the training set. Further research is in progress to apply and extend this method to solve higher-order FDEs.