The Design of Robust Soft Sensor Using ANFIS Network


$$O_{1,i} = \mu_{A_i}(x_1), \; i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(x_2), \; i = 3, 4 \tag{1}$$

where O_{1,i} is the output of node i in layer 1, x₁ (or x₂) is the input to node i and A_i (or B_{i−2}) is a linguistic label associated with this node. The membership function for A₁, A₂, B₁ or B₂ can be any appropriate parameterized membership function, for example the generalized bell membership function:

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}} \tag{2}$$

where a_i, b_i and c_i are the parameters of the membership function. Parameters in this layer are referred to as premise (antecedent) parameters.

Layer 2. Every node in this layer is a fixed node, whose output is the product of all the incoming signals:

$$w_i = \mu_{A_i}(x_1)\,\mu_{B_i}(x_2), \quad i = 1, 2 \tag{3}$$

Each node output represents the firing strength of a rule. In general, any other T-norm operator that performs fuzzy AND can be used as the node function in this layer.

Layer 3. Every node in this layer is a fixed node labelled N. The i-th node calculates the ratio of the i-th rule's firing strength w_i to the sum of all rules' firing strengths:

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{4}$$

For convenience, the outputs of this layer are called normalized firing strengths.

Layer 4. Every node in this layer is an adaptive node with a node function:

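$$\bar{w}_i f_i = \bar{w}_i \,(p_i x_1 + q_i x_2 + r_i) \tag{5}$$

where w̄_i is the normalized firing strength from layer 3 and p_i, q_i and r_i are the design parameters. Parameters in this layer are referred to as consequent parameters.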

Layer 5. The single node in this layer is a fixed node, which computes the overall output as the summation of all incoming signals ^[7]:

$$f = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{6}$$
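The five layers can be traced in a few lines of code. The following is a minimal sketch, assuming the two-input, two-rule first-order Sugeno structure described above with generalized bell membership functions. The function names, the dictionary layout of the premise parameters and the ordering (p, q, r) of the consequent parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell membership function with parameters a, b, c."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2.0 * b))

def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise:    {"A": [(a, b, c), (a, b, c)], "B": [(a, b, c), (a, b, c)]}
    consequent: [(p, q, r), (p, q, r)]  (assumed ordering: f = p*x1 + q*x2 + r)
    """
    # Layer 1: membership grades of each input in its linguistic labels
    mu_a = [gbell(x1, *premise["A"][i]) for i in range(2)]
    mu_b = [gbell(x2, *premise["B"][i]) for i in range(2)]
    # Layer 2: firing strength of each rule (product T-norm)
    w = np.array([mu_a[i] * mu_b[i] for i in range(2)])
    # Layer 3: normalized firing strengths
    w_bar = w / w.sum()
    # Layer 4: first-order rule outputs, weighted in the next line
    f = np.array([p * x1 + q * x2 + r for (p, q, r) in consequent])
    # Layer 5: overall output as the sum of the weighted rule outputs
    return float(np.sum(w_bar * f)), w_bar

# Example with symmetric labels, so both rules fire equally at the origin
premise = {"A": [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)],
           "B": [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)]}
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
output, weights = anfis_forward(0.0, 0.0, premise, consequent)
```

With these example parameters both rules receive the same normalized firing strength, so the output is simply the average of the two rule outputs.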

2.2. Subtractive Clustering

The subtractive clustering method was proposed by Chiu ^[8]. It clusters a data set in an unsupervised manner by measuring the potential of the data points in the feature space. Subtractive clustering assumes that each data point is a potential cluster center and calculates the potential of each data point from the density of the surrounding data points. The point with the highest potential becomes the first cluster center, and the potential of the data points near it (within the radius of influence) is then reduced. The radius of influence is important in determining the number of clusters: a smaller radius leads to many smaller clusters in the data space, which results in more rules, and vice versa. It is therefore important to choose a suitable radius of influence for clustering the data space. In this work, the subtractive clustering technique is used to determine the ANFIS structure. The subtractive clustering algorithm has the following steps:

1. Select the data point with the highest potential to be the first cluster center.

2. All data points in the vicinity of the first cluster center (as determined by radii) are removed in order to determine the next data cluster and its center location.

3. This process is iterated until all of the data is within radii of a cluster center.

The optimal point defining a cluster center is found with proper cluster radii ^[9].
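The steps above can be sketched in code. This is a simplified illustration, not the implementation used in the paper: it follows the density-based potential of Chiu's method, with the commonly used constants 4/r_a² and 4/r_b² and r_b = 1.5 r_a, but replaces the original accept/reject tests with a single stopping threshold. The function name and the parameters `squash` and `stop_ratio` are assumptions made for this example, and the data are assumed to be scaled to the unit hypercube.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, squash=1.5, stop_ratio=0.15):
    """Return cluster centers chosen among the rows of X.

    ra:         radius of influence (data assumed scaled to [0, 1])
    squash:     ratio r_b / r_a controlling how widely potential is reduced
    stop_ratio: stop when the best remaining potential falls below
                stop_ratio times the first peak (simplified criterion)
    """
    X = np.asarray(X, dtype=float)
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    # Pairwise squared distances between all data points
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Potential of each point: density of its neighbourhood
    potential = np.exp(-alpha * d2).sum(axis=1)
    first_peak = potential.max()
    centers = []
    while True:
        k = int(np.argmax(potential))
        peak = potential[k]
        if peak < stop_ratio * first_peak:
            break
        centers.append(X[k])
        # Reduce the potential of points near the new center
        potential = potential - peak * np.exp(-beta * d2[k])
    return np.array(centers)

# Two well-separated blobs in the unit square
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.2, scale=0.02, size=(20, 2))
blob_b = rng.normal(loc=0.8, scale=0.02, size=(20, 2))
centers = subtractive_clustering(np.vstack([blob_a, blob_b]), ra=0.5)
```

Each center found this way becomes one fuzzy rule, which is why the radius of influence controls the size of the rule base.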

2.3. The Outliers

Outliers are cases whose data values differ from those of the majority of the cases in the data set. They are patterns in the data that do not conform to a well-defined notion of normal behavior. Figure 2 illustrates outliers in a simple 2-dimensional data set. The data contain two types of outliers: single and batch.

Figure 2. Some examples of outliers in 2-dimensional data set


Points that lie sufficiently far away from the rest of the data set are outliers ^[10]. In analytical chemistry, empirical data often contain outliers of one type (single outlier) or another (batch outlier). The statistical techniques that are often used are sensitive to such outliers, and their results can be adversely affected by them. More robust and resistant methods, which are less sensitive to outliers, have been developed since 1960. Robustness is the key issue for modeling and identification.

Figure 3. Quadratic cost function


2.3. Absolute cost function as robust cost function

Robust cost functions are alternative cost functions that are robust against outliers and noise. The quadratic cost function is the conventional cost function used in optimization. Figure 3 shows the quadratic cost function.

(7)

The absolute cost function is one type of robust cost function. It is based on the absolute error (also called 'total variation'), and it is convex and non-differentiable at the origin. Figure 4 shows the absolute cost function.

Figure 4. Absolute cost function


(8)
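A small numerical experiment illustrates why the absolute cost is less sensitive to outliers than the quadratic cost. The sketch below assumes the costs are plain sums of squared and absolute errors (any scaling factor used in the paper is not reproduced here), and fits the simplest possible model, a single constant, to data containing one outlier. The helper names are hypothetical.

```python
import numpy as np

def quadratic_cost(errors):
    """Sum of squared errors (assumed form of the quadratic cost)."""
    e = np.asarray(errors, dtype=float)
    return float(np.sum(e ** 2))

def absolute_cost(errors):
    """Sum of absolute errors (assumed form of the absolute cost)."""
    e = np.asarray(errors, dtype=float)
    return float(np.sum(np.abs(e)))

def best_constant(y, cost, grid):
    """Grid search for the constant prediction that minimizes the cost."""
    costs = [cost(y - c) for c in grid]
    return float(grid[int(np.argmin(costs))])

# Four consistent measurements and one single outlier
y = np.array([1.0, 1.1, 0.9, 1.0, 10.0])
grid = np.linspace(0.0, 10.0, 1001)

c_quad = best_constant(y, quadratic_cost, grid)  # pulled toward the outlier
c_abs = best_constant(y, absolute_cost, grid)    # stays with the majority
```

The quadratic cost is minimized by the mean of the data, which the outlier drags away from the bulk of the measurements, while the absolute cost is minimized by the median, which stays with the majority.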

2.4. Particle Swarm Optimization (PSO) Algorithm

This section gives a brief overview of the PSO method and describes the procedure for minimizing the robust cost function with the PSO algorithm.

Particle Swarm Optimization (PSO) is a method used to explore the search space of a given problem to find the settings or parameters required to maximize (or minimize) a particular objective. This method, first described by James Kennedy and Russell C. Eberhart in 1995 ^[11], originates from two separate concepts: the idea of swarm intelligence, based on the observation of the swarming habits of certain kinds of animals such as birds and fish, and the field of evolutionary computation. It uses a number of particles that constitute a swarm moving around in the search space looking for the best solution. Each particle is treated as a point in an N-dimensional space which adjusts its "flying" according to its own flying experience as well as the flying experience of other particles.

Each particle keeps track of the coordinates in the solution space that are associated with the best solution (fitness) it has achieved so far. This value is called the personal best (pbest).

Another best value tracked by the PSO is the best value obtained so far by any particle in the neighborhood of that particle. This value is called the global best (gbest).

In the PSO algorithm, the velocity of each particle is updated according to the following equation:

(9)

Each individual moves from the current position to the next one by the modified velocity in (9) using the following equation:

(10)

Using the modified velocity and position of particle based on (9) and (10), the search mechanism of the PSO is demonstrated in Figure 5.

Figure 5. The search mechanism of the particle swarm optimization


The general PSO algorithm can be summarized as follows:

1. Initialize a population (array) of particles with random positions and velocities in the N-dimensional problem space.

2. Evaluate the desired optimization fitness function in N variables for each particle.

3. Compare each particle's fitness evaluation with its pbest. If the current value is better than pbest, set the pbest value equal to the current value and the pbest location equal to the current location in N-dimensional space.

4. Compare the fitness evaluation with the population's overall previous best. If the current value is better than gbest, reset gbest to the current particle's array index and value.

5. Change the velocity and position of the particle according to (9) and (10), respectively. V_i and X_i represent the velocity and position of the i-th particle in N dimensions, and rand₁ and rand₂ are two uniform random functions.

6. Go to Step 2 until a stopping criterion is satisfied, usually a sufficiently good fitness or a maximum number of iterations (epochs) ^[9].
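The steps above can be sketched as follows. This is a minimal illustration of a global-best PSO written for minimization, since the goal here is to minimize a cost function. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds and the swarm size are assumed values chosen for the example and are not the settings reported in the paper.

```python
import numpy as np

def pso(cost, dim, n_particles=30, n_iter=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-1.0, 1.0), seed=0):
    """Minimize cost(x) over a dim-dimensional space with global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Step 1: random initial positions and (small) velocities
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = 0.1 * rng.uniform(-(hi - lo), hi - lo, size=(n_particles, dim))
    # Step 2: evaluate the fitness of every particle
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    for _ in range(n_iter):
        # Step 5: velocity and position updates with two uniform random terms
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        costs = np.array([cost(p) for p in x])
        # Step 3: update each particle's personal best
        improved = costs < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = costs[improved]
        # Step 4: update the global best
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    # Step 6: here the stopping criterion is a fixed number of iterations
    return gbest, gbest_cost

# Usage: minimize a simple sphere function in three dimensions
best_x, best_cost = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```

Because the swarm only needs cost values, the same routine can be applied to non-differentiable objectives such as the absolute cost function.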

The PSO method has several advantages: it is simple, fast and easy to code. Another advantage is that the initial population of the PSO is maintained, so there is no need to apply operators to the population, a process that consumes both time and memory.

2.5. Selection of Parameters for PSO Algorithm

The selection of the PSO parameters plays an important role in the optimization ^[12]. Even a single PSO parameter choice can have a great effect on the rate of convergence. In this paper, the PSO parameters are determined by trial and error.

A smaller radius leads to many smaller clusters in the data space, which results in more rules, and vice versa. Hence it is important to select a suitable radius of influence for clustering the data space.

The radius used here, 0.6, is determined by trial and error.

2.6. Training ANFIS with PSO

In this section, the PSO method is employed to minimize the robust cost function in order to update the parameters of the ANFIS structure. The ANFIS has two types of parameters that need to be updated: the antecedent parameters and the conclusion (consequent) parameters. The membership functions are assumed to be Gaussian, as in equation (11), and their parameters are C_i and σ_i, where σ_i is the variance of the membership function and C_i is its center.

(11)

The parameters of the conclusion part are p_i, r_i and q_i, as shown in (12) and (13).

(12)

(13)
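To make the training setup concrete, the sketch below shows one way the antecedent and conclusion parameters could be packed into a single vector and scored with the absolute cost, so that a PSO particle corresponds to one candidate ANFIS. It is an illustration under stated assumptions: the Gaussian form exp(−(x − C)² / (2σ²)) with σ used as a width, one membership function per input per rule, the product T-norm and the parameter layout are all choices made for this example, and the names `unpack`, `anfis_predict` and `anfis_absolute_cost` are hypothetical.

```python
import numpy as np

def unpack(theta, n_rules, n_inputs):
    """Split a flat parameter vector into centers, widths and consequents."""
    k = n_rules * n_inputs
    centers = theta[:k].reshape(n_rules, n_inputs)
    # Widths are kept strictly positive so the Gaussians stay well defined
    sigmas = np.abs(theta[k:2 * k]).reshape(n_rules, n_inputs) + 1e-6
    conseq = theta[2 * k:].reshape(n_rules, n_inputs + 1)
    return centers, sigmas, conseq

def anfis_predict(theta, X, n_rules):
    """First-order Sugeno output for every row of X."""
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    centers, sigmas, conseq = unpack(theta, n_rules, X.shape[1])
    # Gaussian membership grades, shape (n_samples, n_rules, n_inputs)
    z = (X[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    mu = np.exp(-0.5 * z ** 2)
    # Firing strengths (product T-norm) and their normalization
    w = mu.prod(axis=2)
    w_bar = w / (w.sum(axis=1, keepdims=True) + 1e-12)
    # Linear consequent of each rule: coefficients plus a constant term
    f = X @ conseq[:, :-1].T + conseq[:, -1]
    return np.sum(w_bar * f, axis=1)

def anfis_absolute_cost(theta, X, y, n_rules):
    """Sum of absolute prediction errors, the objective a particle is scored on."""
    return float(np.sum(np.abs(np.asarray(y) - anfis_predict(theta, X, n_rules))))

# One rule, two inputs: 2 centers + 2 widths + 3 consequent parameters
theta = np.array([0.5, 0.5, 1.0, 1.0, 1.0, 2.0, 3.0])
X = np.array([[0.2, 0.4], [0.6, 0.8]])
pred = anfis_predict(theta, X, n_rules=1)
```

With R rules and d inputs this layout gives a search space of 2Rd + R(d + 1) dimensions, which is why restricting each Gaussian to two parameters keeps the optimization manageable.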

3. Case Study, Implementation and Results

3.1. Case Study

A model of a highly nonlinear chemical plant, consisting of two cascaded Continuous Stirred Tank Reactors (CSTRs) followed by a nonadiabatic flash separator with a recycle, has been selected for this paper ^[13].

A schematic of the plant is shown in Figure 6. This combination is prevalent in chemical industries such as the styrene polymerization process. The desired product B is produced by an irreversible first-order reaction.

An undesirable side reaction occurs and leads to the consumption of B and the production of the superfluous side product C.

The product stream from CSTR-2 is directed into a flash to separate the excess A from the product B and the side product C. Reactant A has the highest relative volatility and is the major component in the vapor phase. A fraction of the vapor phase is purged and the residual stream is condensed and recycled back to CSTR-1. The liquid phase exiting from the flash consists predominantly of B and C.

In order to test the ability of the soft sensor to track process changes, we define some realistic scenarios of changes on the plant as follows:

1. A step change in the concentration of the input feed. At first, it is supposed that the input feed purely consists of component A. After the application of the step change, the input feed contains both components A and B with fractions of 90% and 10%, respectively.

2. Random changes in Q_r, T₀, D and F₀ each with an amplitude of about 10% of its operating point value. The changes happen slowly and randomly, which is closer to the real case events.

All the changes are applied to the model during the test stage.

The inputs of soft sensor are: D, F₀, F₁, H_b, H_m, H_r, Q_b, Q_m, Q_r, T₀, T_b, T_d, T_m and T_r which can easily be measured and the output is the product mass fraction X_Bb.

As mentioned before, the proposed approach is capable of robust training. This means that it reduces the influence of noise and outliers on the performance of the soft sensor.

The model is implemented in Simulink, and about 600 samples are generated from it. All the variables of the training set are scaled to [0, 1]. Of these samples, 240 data points are used as the training set and 360 data points are used for testing the model.
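The data preparation described above can be sketched as follows. The paper does not state how the 240 training points are chosen or which statistics are used to scale the test data, so this example assumes that the first 240 samples form the training set and that the minimum and maximum of the training set are reused for the test set. The function name and the random placeholder data are hypothetical.

```python
import numpy as np

def split_and_scale(data, n_train=240):
    """Split samples into train/test and min-max scale with training statistics."""
    data = np.asarray(data, dtype=float)
    train, test = data[:n_train], data[n_train:]
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (train - lo) / span, (test - lo) / span

# Placeholder data: 600 samples of 14 inputs plus 1 output
rng = np.random.default_rng(0)
samples = rng.normal(size=(600, 15))
train_scaled, test_scaled = split_and_scale(samples)
```

Under this assumption the training variables lie exactly in [0, 1], while test values may fall slightly outside that range when the test conditions differ from the training conditions.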

Figure 6. Two CSTRs plus flash separator


3.2. Performance Indices

In this study, in addition to graphical demonstration we use two common numerical measures for performance evaluation. The Root Mean Square Error (RMSE) (also called the Root Mean Square Deviation, RMSD) is a frequently used measure of the difference between values predicted by a model and the values actually observed from the environment that is being modelled. These individual differences are also called residuals, and the RMSE serves to aggregate them into a single measure of predictive power.

The RMSE of a model prediction with respect to the estimated variable X_model is defined as the square root of the mean squared error:

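$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{obs,i} - X_{model,i} \right)^2} \tag{14}$$

where X_obs,i is the observed value and X_model,i is the modelled value at time/place i, and n is the number of samples.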

The calculated RMSE values carry the units of the estimated variable, so RMSE values for different quantities (for example, phosphorus concentrations and chlorophyll a concentrations) cannot be compared directly. However, RMSE values can be used to compare model performance in a calibration period with that in a validation period, as well as to compare the performance of an individual model with that of other predictive models.

RMSE demonstrates how well the predicted output fits the true output but it does not necessarily reflect whether the two sets of data move in the same direction. For instance, by simply scaling the system output, we can change the RMSE without changing the directionality of data. The correlation coefficient (R) solves this problem. The correlation coefficient is calculated by:

$$R = \frac{\sum_{i=1}^{n} \left( X_{obs,i} - \bar{X}_{obs} \right) \left( X_{model,i} - \bar{X}_{model} \right)}{\sqrt{\sum_{i=1}^{n} \left( X_{obs,i} - \bar{X}_{obs} \right)^2 \, \sum_{i=1}^{n} \left( X_{model,i} - \bar{X}_{model} \right)^2}} \tag{15}$$

where the bars denote sample means. The correlation coefficient tells us the strength and direction of the relationship between two variables. The closer the number is to 0, the weaker the relationship; the closer it is to ±1.00, the stronger the relationship. R therefore lies in the range [-1, 1]. When R = 1, there is a perfect positive linear correlation between the observed and modelled values, that is, they vary together. When R = -1, there is a perfect negative linear correlation between them, that is, they vary in opposite ways (when one increases, the other decreases proportionally). When R = 0, there is no linear correlation between them, and the variables are called uncorrelated. Intermediate values describe partial correlations.
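Both performance indices are easy to compute, and a short example shows the point made above about scaling. The sketch below uses the standard definitions of the RMSE and the Pearson correlation coefficient. The sample values are made up for illustration and are not results from the paper.

```python
import numpy as np

def rmse(obs, model):
    """Root mean square error between observed and modelled values."""
    obs, model = np.asarray(obs, dtype=float), np.asarray(model, dtype=float)
    return float(np.sqrt(np.mean((obs - model) ** 2)))

def correlation(obs, model):
    """Pearson correlation coefficient R between observed and modelled values."""
    obs, model = np.asarray(obs, dtype=float), np.asarray(model, dtype=float)
    d_obs = obs - obs.mean()
    d_model = model - model.mean()
    return float(np.sum(d_obs * d_model)
                 / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_model ** 2)))

# Illustrative values only
obs = np.array([0.1, 0.4, 0.5, 0.9])
model = np.array([0.2, 0.35, 0.55, 0.8])
scaled_model = 2.0 * model  # same trend, different scale

rmse_original = rmse(obs, model)
rmse_scaled = rmse(obs, scaled_model)          # changes with the scaling
r_original = correlation(obs, model)
r_scaled = correlation(obs, scaled_model)      # unchanged by the scaling
```

Scaling the model output changes the RMSE but leaves R untouched, which is why the two indices are reported together.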

3.3. Implementation of the methodology

In the first stage of the proposed algorithm, we generate an ANFIS structure by subtractive clustering. The first-order Sugeno fuzzy model is used in the consequent part. In addition, in order to reduce the number of parameters to be optimized by the robust training method, only two parameters are used for the Gaussian membership functions in the premise part of each rule. We then use the absolute cost function as the robust cost function to update the parameters of this structure, and employ the PSO algorithm to minimize the cost functions.

After the training step, the test data sets are presented to the network in order to evaluate the performance of the trained ANFIS model and to monitor how well the network has been trained.

In this study, in addition to graphical demonstration we use two common numerical measures for performance evaluation: The Root Mean Square Error (RMSE) and correlation coefficient (R).

RMSE demonstrates how well the predicted output fits the true output, but it does not necessarily reflect whether the two sets of data move in the same direction. The correlation coefficient (R) therefore gives more information about the training of the network.

The results of using quadratic and absolute cost function for training ANFIS are shown in the following table and also the plot of the training data, testing data and correlation coefficient (R) for each method is displayed.

According to this study the results of the new method are very close to the experimental results and the correlation coefficient (R) value is 0.88128 for testing data sets.

The stopping criterion is a maximum of 100 iterations.

3.4. Results and discussion

The simulation results show a much higher prediction accuracy for the proposed method than for the conventional ANFIS method for designing the soft sensor. Figure 7 shows the data set of the soft sensor. Figures 8, 9 and 10 show the training-data results, testing-data results and correlation coefficient of the ANFIS approach, and Figures 11, 12 and 13 show the same results for the new method. The results of using the quadratic and robust cost functions for training the ANFIS are given in Table 1. As the figures and Table 1 show, the new method for the design of the soft sensor is robust against Gaussian noise and single/batch outliers, while other methods such as ANFIS and neural networks are strongly affected by Gaussian noise and single/batch outliers. The closer the correlation coefficient is to one, the better. Mean square error is used as the quantitative measure for performance evaluation of the training method.

Figure 7. Data set


Figure 8. Result of ANFIS conventional training approach for CSTR training data set


Figure 9. Result of ANFIS conventional training approach for CSTR testing data set


Figure 10. Correlation coefficient of ANFIS conventional training approach for CSTR data set


Figure 11. Results of the proposed robust training approach for CSTR training data set


Figure 12. Results of the proposed robust training approach for CSTR testing data set


Figure 13. Correlation coefficient of the proposed robust approach for CSTR data set


Table 1. Results (RMSE and Correlation Coefficient)
