Unification of Well-known Numeric Methods for Solving Nonlinear Equations
Sharif University of Thechnology, IranAbstract | |
1. | Introduction |
2. | Classifying Methods |
3. | Criteria for Comparison of the Methods |
4. | Systems of Nonlinear Equations |
5. | Conclusion |
References |
Abstract
This article is a fairly comprehensive document on the numerical solution of nonlinear equations. The aim of this paper is to unify all numerical methods for solving nonlinear equations and complete the Najafi-Nikkhah method [1,2] and generalize the famous methods for solving systems of nonlinear equations. So, the available methods in this field, are being investigated and it will be indicated that how these techniques, despite the apparent dispersion, all are obtained from a unified idea, and this unified pattern would help find new techniques in a systematic way. All current methods require that the initial starting point or points to be close to the solution appropriately, but for the equations with complicated appearance finding the initial guess would not be easy. So, this article intends to provide an appropriate response for this fundamental issue for the first time. An algorithm is proposed to complete the Najafi-Nikkhah technique [1,2], and declares the procedure to make the initial guess become closer to the solution, even if they are far away from each other. Then, the procedure could be completed by one of the common methods available in this field. Dispersion of the current methods causes the confusion in the case of using them. Therefore, these methods should be compared with some criteria to determine the use of them in the practical applications. In the next section of this article, these criteria together with some comparisons including a series of tables and diagram would be provided. Finally, in the last section, generalization of the current methods for solving the nonlinear equation with a singular unknown to the methods for solving the systems of nonlinear equations would be investigated. Then, the generalization of the modified Najafi-Nikkhah method for the systems of nonlinear equations will be presented which can reduce the dependency of the initial guess to the solution, significantly.
Keywords: unification, nonlinear equations, classifying methods, suitable starting point, stability, number of iterations, criteria for comparison of the methods
Copyright © 2015 Science and Education Publishing. All Rights Reserved.Cite this article:
- Amin Najafi Amin. Unification of Well-known Numeric Methods for Solving Nonlinear Equations. American Journal of Numerical Analysis. Vol. 3, No. 3, 2015, pp 65-76. https://pubs.sciepub.com/ajna/3/3/2
- Amin, Amin Najafi. "Unification of Well-known Numeric Methods for Solving Nonlinear Equations." American Journal of Numerical Analysis 3.3 (2015): 65-76.
- Amin, A. N. (2015). Unification of Well-known Numeric Methods for Solving Nonlinear Equations. American Journal of Numerical Analysis, 3(3), 65-76.
- Amin, Amin Najafi. "Unification of Well-known Numeric Methods for Solving Nonlinear Equations." American Journal of Numerical Analysis 3, no. 3 (2015): 65-76.
Import into BibTeX | Import into EndNote | Import into RefMan | Import into RefWorks |
At a glance: Figures
1. Introduction
Since now, many different techniques have been proposed for solving nonlinear equation of f(x) =0. But, in practice, the question is how to unify all these techniques.
First, general information about the available methods would be discussed. The Available numerical methods can be classified into two main categories:
1. A set of techniques that uses successive derivatives of f(x) function in the specified point.
2. A set of techniques that uses the function value in some different points.
However, a person might be interested in offering a new technique which uses the combination of two aforementioned techniques. The first class generally has better performance, meaning convergence for more initial points and higher convergence speed. The problem is performing the derivative operation, which is not a simple task in all cases. The latter is usually used when the objective is finding the real roots merely. The first class usually generates both real and complex root, however, it could not be used in a controlled manner for finding the real roots merely.
If the all points on the second class become close to each other, they could be converted to their equivalent in the first class. As an example, consider the false position technique [3]:
(1) |
This method requires two initial points. If these two point become closer to each other, that is:
(2) |
The method converts to the Newton-Raphson method [4] immediately:
(3) |
In general, in order to convert the techniques of the first class to their equivalent in the latter, it should be referred to the numerical approximation of the derivative, the examples of which is provided in the following table:
So, in order to calculate the new order of derivative in the first class, at least a new point in the equivalent method of the second class is required. The rate of the convergence for the method using the n-th derivative is n+1. However, this would not mean the same speed of convergence for all these methods.
2. Classifying Methods
The available methods can be categorized from another outlook.
1. A class which uses the first order derivative in one or two initial points like, secant [5], false position [3] and Newton-Raphson [4] methods.
2. A class which uses two sequential derivatives in a point or three separate points [6-13][6].
3. A class which combines the first class idea in each iteration several times and use them simultaneously [14-20][14], the Ostrowski method [14, 15, 16] is the most famous method of this class.
2.1. Discussion first classIf y = f (x), then x can be written in terms of the inverse Taylor series around an arbitrary point x0 as follows:
(4) |
In which:
So, in the case of y=0, the aforementioned series can be rewritten as follows, which in fact is the root of the equation in terms of the initial guess x0
(5) |
This relationship is the basis of unifying all the existing methods.
2.2. Discussion Second ClassThe first three terms of the equation (5) are the unifying factors of the second class methods. Indeed, all these methods somehow should ultimately be a generator of the three. For clarity, it is better to analyze the various methods of this category further.
2.2.1. Householder Technique [7]
This technique generates the first three terms of the equation (5) exactly, but it is not an efficient method compared to other methods in this class.
2.2.2. Schroder Technique [8]
In Newton-Raphson equation, if the alternative term of is replaced instead of f(x), the Schroder relation is produced as follows:
(6) |
According to the following series:
(7) |
The equation (6) would be approximated as follows:
(8) |
If Equation (8) is compared with equation (5), it becomes evident that equation (8), instead of generating the third term of the equation (5), that is , generates , and that is why this method become divergent usually. However, in the case of conveyance, the rate is almost appropriate compared to other methods.
2.2.3. Halley Method [9]
This method is similar to Schroder method [8], except that in Newton- Raphson method [4], is replaced with and so, we have:
(9) |
If the above equation is approximated by equation (7):
(10) |
As it can be seen, in this method, unlike the Schroder method, the first three term of equation (5) will appear. This method is more stable compared to the other two methods, Householder [7] and Schroder [8].
2.2.4. Ridder Method [11]
This method is applied for finding the real roots, and in this context, it is fairy appropriate and acceptable method in comparison with the other methods:
(11) |
In which and the values of and should have different signs [11]. Referred to the same idea discussed in the introduction, if the three points are completely close to each other, the equivalent relation could be obtained as follows:
(12) |
If the following series is used, this relation serves better results than Halley relation [9].
(13) |
It can be written:
(14) |
2.2.5. Muller Method [12] and Its Relation with Halley’s Irrational Formula [10]
Similar to the Ridder relation [11], three points are used in this method, so according to the computational cost in each iteration and the convergence rate, it gains less score compared to Ridder method.
(15) |
Just like the Ridder method, by making the points closer to each other, an equivalent method can be derived as follows:
(16) |
The above equation is exactly Halley’s Irrational Formula [10]. In comparison to the other similar methods, this is a relatively good method. In order to relate this equation to equation (5), a more general case will be discussed in Laguerr method [13].
2.2.6. Laguerr Method [13]
Although this method has been proposed for polynomials, but it can be used in the general case, too:
(17) |
m is an arbitrary value, but usually the best case is achieved when m=2, which gives the Halley’s Irrational Formula. In the following discussion, it would be clear that why the above equation is correct for each m, also the relation with equation (5) would be observed:
(18) |
We have:
(19) |
2.2.7. Najafi-Nikkhah Method [1,2]
In general, the method is as follows:
(20) |
In which, G(x) is an arbitrary function, that should satisfy the following condition [1, 2]:
(21) |
Two particular cases have been offered for G(x) [1, 2]:
1_ G(x)=xk:
(22) |
This method is the fastest method for solving polynomial equation. However, in comparison to other similar methods, it is considered a good method.
2_ G(x)=kx:
(23) |
In the case of the exponential function appears in the equations, this method would be more favorable than other methods.
(24) |
So, this method generates the first three terms of the equation (5).
2.2.7. Modified Najafi-Nikkhah Method
2.2.7.1. Modification of the method in the case of small values for
Due to the existence of in the denominator, if it reaches close to the zero, the method would be slowing down or even divergent. To fix this problem, first the Taylor expansion of the function f (x) in terms of function G(x) should be considered:
(25) |
In which:
In this case, the term of and also the value of can be neglected. So:
(26) |
By putting the above expression equal to zero, we can achieve:
(27) |
2.2.7.2. Algorithm for finding suitable starting point
One of the main problems that can be seen in all of the existing methods is finding the starting point that ensures the convergence of the method. In this section an algorithm is proposed that in most cases solve this problem. It is assumed that the value of the is very great for an initial guess, meaning that the initial guess is so far from the root, since otherwise, there is no need for these procedures.
In the equations of (22) and (23), the proposed functions for G(x), were the particular cases. However, they might be appropriate for polynomials and exponential functions, but in general, they would not be appropriate. In the general case, selection of the G (x) is dependent on the main equation and is determined based on the behavior of the f (x). Two more general cases can be considered:
1. G(x) =A(x) k. in this case, with respect to the condition (21) we have:
(28) |
2. G(x) =exp (k.A(x)). Again, due to the condition (21) we have:
(29) |
Suppose that f(x) is the sum of a constant number such as , and several functions, that each of them is the product of a number of other functions.
(30) |
The basic idea is that the function to be viewed as a vector that is the sum of the vectors in the form of . So, if among these vectors, a vector like could be found that has the largest absolute value of projection on the , with the assistance of the equation (23), a function in the form of below can be used instead of f(x) for finding the initial guess:
(31) |
In which the value of the k can be calculated from equation (28) or (29). However, in most cases, k will be approximately equal to 1. So, in order to simplify the calculation, the procedure of finding k can be discarded. By putting the equation (29) equal to zero, the value of the would obtained, which can be assumed to be equal to C1. Now, according to equation (30) we can write:
(32) |
By comparing the equations (30) and (32), it can be observed that the equation similar to the first one is reached, except that the value of the new function is significantly close to zero due to the usage of Ln function. In this case, there are two choices. The new equation can be solved by the same conventional methods or, once again, this algorithm is used and these procedures be repeated. The producer is continued up to equations in the form of the below is reached:
(33) |
In which, Ct is a constant value, and M(x) is function, the inverse of which is a well-known function. With an example, it would be clearer:
(34) |
The initial point is considered x0=34. For this initial guess we have:
It Can be seen that with this choice, has a considerable distance to zero and none of the conventional methods would converge simply. So, the proposed modified algorithm should be applied. Considering that the largest is , according to the equation (32) we can write:
Solving this equation with the conventional methods is very simple and quickly converges to x1=5.20219, for which:
It can be seen that by using only one step of the proposed algorithm, how close the initial guess become to the root. But in some cases, a function like could not be found among that is a main constituent of f(x). in these cases, G(x) can be chosen as below:
(35) |
In which, is the ratio of the magnitude to the sum of the at the point , which represents the influence of in generating the at the desired point:
Then, according to the equation (25), could be approximated as follows:
(36) |
And by using the idea applied in obtaining the equation (32), and according to the equations (30), (35) and (36), we can write:
(37) |
For the simplicity, for each i that leads to , it can be assumed zero. The continuation of the procedure is exactly the same as the procedure mentioned for solving equation (32). The algorithm was in fact an attempt to convert the sum of a series of functions to the product of functions, and using the properties of the Ln(x). In this way, the equations are projected to a new space, in which the absolute of the functions are smaller. An example would clarify the compliance of and H(x) in the selected range for .
Example:
As it was mentioned earlier, in order to simplify the calculation, could be considered zero.
2.2.8. Unifying the Proposed Techniques
According to the all explanation that was introduced up to this section, and considering equation (5), a quite unique relation for generating and explaining the various methods can be proposed, suppose the following relation:
(38) |
In which, for :
(39) |
By choosing different M(x), new methods can be obtained easily. For instance, consider the following few examples:
(40) |
(41) |
(42) |
It should be noted that the majority the available methods include functions such as M (x) which generate the two first terms of the equation (39) exactly and the third term, approximately. This explanation is true about equations (41) and (42), too. M(x) can be named the generating function. This generating function can be observed for the available methods in the following table:
The methods of the third class, use more than one recursive relation in each iteration. The most famous method of this class is Ostrowski method [14, 15, 16] that for the order of convergence of 4 is written as follow [14, 15]:
(43) |
Both of the aforementioned equations are written based on the Newton-Raphson method, however, in the second step, both of the previous point and obtained value from the first relation have been used. In the second equation, is replaced with .The basis of this approximation is the following equation:
(44) |
Ostrowski equation for the order of convergence of 6 is as follows [16]:
(45) |
The general idea of such methods is the use of the Newton-Raphson relation, repeatedly. Similar methods have been proposed by Sharma and Guha [17], Chun and Ham [18], Kou and Wang [19] based on this general idea. In general, the convergence rate of the third class methods is lower than the second class methods, which have been discussed in a dimensionless comparison in the following section. In order to clarify the discussion, comparison criteria for the methods are proposed.
3. Criteria for Comparison of the Methods
For comparing the methods, some criteria should be considered, some of them are as follows:
3.1. Is the Intended Method be able to Generate only the Real Roots?Sometimes only the evaluation of the real roots is required. As was explained in the beginning of the context, all the methods of first and second class can be converted to each other regarding to equation (2) and Table 1, so, this criterion is neglected.
3.2. Order of ConvergenceConvergence order associated with the convergence rate directly, however, this does not mean that if the convergence order is bigger, the method is necessarily faster. Based on the classification of this article, the convergence order of the first and second class is maximum 2 and 3, respectively. For the third class, the minimum order is 4. But in general, if it is used alone, this criterion is not appropriate.
3.3. Number of Required Starting Point and Number of Differentiation OrderAccording to Table 1, if the order of differentiation lowers one order, at least a new initial point is required. So, for example, if the method is used one starting point and two step differentiation, it can be converted to a method that uses three starting point without any need for differentiation. Parametrical differentiation is sometimes difficult, but if it is possible, its performance would be better. On the other hand, finding starting points that satisfy , is sometimes difficult, too. In these cases, methods using first and second order differentiation and they do not require checking this condition are superior.
3.4. The Number of Operations in Each IterationThe main criterion can be proposed, is the number of the operation to converge to one of the roots for different starting points on the complex plane or real axis (in the case of real roots only), and also for which values of starting points, it becomes divergent. So the number of the operations in each method should be determined. The operations can be classified as follows:
1- Addition and subtraction
2. Multiplication and division
3. Power (Over 2) and roots
4. Computing the functions (exp, Ln…)
5. Computing the successive derivatives for initial points , ,
For each of these five categories, weight should be determined. Suppose that the number of operations of the above class is n1 to n5, and the assigned weight is W1 to W5, respectively.
The column 5 in the above table is an unknown value, which depends on the complexity of the studied function. For determining this value, the number of the iterations corresponding to the four previous case for , and should be computed separately. So, for each iteration, a parameter like the below one can be obtained:
(46) |
The coefficient of can be an appropriate criterion for comparing the methods due to the required operation in each iteration. Comparison with this criterion is performed in a dimensionless space. Coefficient of is multiplied to place the values of in a more suitable range. A table has been set that implies these operations for the considered class. It should be noted that in this calculation, addition, subtraction, multiplication and division of numbers is ignored. Also, optimization by changing the variables that might repeat in the method has been done, if possible.
Example: suppose that f(x) is a polynomial that we intend to determine the roots:
We suppose , that and . In the following plots, the vertical axis indicates . These figures are plotted for the mentioned methods and are arranged in terms of their performance for this example. The criteria for the performance of the methods are the smaller and stability of the method. That’s why the schroder method is in the fifth position despite the smaller value of, because of its instability near the roots of the equation.
Different methods can be compared in terms of the convergence range in the complex plane. For this purpose, the example of section 2.2.7.2, equation (34) can be used. Stability of these methods for the mentioned example is investigated for <50. The following figures are plotted in the range of and and arranged due to the stability of the method and smaller values for . It can be observed that the order of the plots is quite different with the previous case, in which the number of the iterations in the dimensionless space was considered. All these diagrams indicate instability in particular ranges, or show an ascending trend by increasing the real part of the initial guess, which implies the increasing number of the operations that is not preferable.
Now, notice the following diagram plotted based on the explained method in section 2.2.7.2:
In this example, in order to solve the equation (32), maximum 5 iterations have been used in Najafi-Nikkhah method, which has been considered in computation of . As it can be observed, although the diagram is plotted in the larger range (X∈[0,180]) compared to previous diagrams, number of operations remain constant by increasing X, in addition, no instability is observed and the number of the operations is less than other methods. In fact, the proposed algorithm in section 2.2.7.2 ensures stability in most cases.
4. Systems of Nonlinear Equations
Consider the following nonlinear equation set:
(47) |
Or its equivalent in the vector form:
(48) |
In which:
(49) |
In order to generalize the proposed method in the previous sections to the methods capable of solving nonlinear equation sets, the equivalent form of , , and should be obtained in the multi variable case. Suppose as an initial guess. Two first relations in the matrix relations are clearly defined in books and articles already. is the equivalent of the determinant of the Jacobi matrix. for variable is the determinant of a matrix like Jacobi, that the j-th column is replaced with .
But for obtaining the equivalent for the second derivative with respect to , the determinant of the Jacobi matrix in the general case for the initial vector , which is the relation in terms of to variables, should be differentiated with respect to . In this case, based on the differentiation property of the matrix determinant, we have:
(50) |
In which, is the determinant of the matrix like Jacobi, that the j-th column is replaced with for . So, for each variable, should be computed separately, which leads to an excessive operations. However, the convergence would be modified significantly. By substituting these equivalents in the equation (5), the solution to the equation set (46) would be obtained. The fundamental problem is that none of these methods have an appropriate convergence in complicated equations. In the case of systems of equations, the main discussion is on the stability of the convergence, rather than its rate. The majority of these methods is sensitive to the initial guess. If for the initial guess , some values of become great, most of the methods become divergent. To resolve this problem, again we can refer to the idea offered in the section 2.2.7.2. It can be assumed that:
(51) |
In which, the are the constant values. Now, can be chosen as follows:
(52) |
In which are:
By extending the equation (25) to multi variable case, it can be written:
(53) |
In which, is the absolute of projection of the vector on vector at .
(54) |
Therefore, we cannot find unit value for , since said limit has no unit average and then we cannot calculate this unit, but we use some approximation for ourpurpose:
(55) |
Then, we can write:
(56) |
If , it can be neglected for simplicity. Whenever the absolute of is significantly large, even for one i, equation set (57) should be solved instead of equation set (47). Any available method, even the simplest one, Newton-Raphson, can be used for solving this equation set. The convergence is more reliable compared to equation (47). After solving, the obtained solutions are used as the next guess. So, for each iteration, if the modification of the result is required based on the stated explanations, a corrective step is needed, which consists of solving a new equation set.
The following example is selected in such a way that solving it with the available methods is not simple. For solving this example, combination of the Newton-Raphson method with the corrective method have been used.
In which:
The following table is set, for determining that how large was the absolute of and for the considered initial guess.
N2 parameter indicates the number of required operations for solving the equation set (56). In this example, as it can be observed from table 7, due to the large values of and . Because of the type of the selected equations, the value of the N2 is large, however, for simpler equations, N2 becomes smaller.
In this step, in order to evaluate the behavior of this method for different initial guess is defined as follows:
(57) |
Due to the difficulty of computing the number of iterations for this example exactly, this parameter is merely defined to compare the relative behavior of this method in different points of the complex plane. Selecting the coefficient 20 for N2, is that it is assumed 20 iterations is required on average at each time of using the corrective method for solving equation set (56). Finally, the overall number of iterations is divided by 10, so the final value for would be placed in an appropriate range. The following diagram indicates variation for different values of and .
5. Conclusion
In this article, in addition to presenting a unified relation for all of the famous methods comparing them with each other in a dimensionless space, a solution to one of the basic problems of the all methods has been offered. The dependency of the method to the initial guess is reduced significantly by using the presented algorithm. Then, all the methods, generalized for solving systems of nonlinear equations by defining equivalent terms. However, as it mentioned earlier, the equivalents for and was known before and only for , a new equivalent is presented. Finally, a method for solving nonlinear equation set is presented based on the corrective idea for one variable equation.
This method is able to solve the more complex equations in the case of divergence of other methods. As it can be observed from figure 4, in the case of one variable, no critical problem faced, but for systems of equations, the method can be optimized. In addition, the problem that remains open is:
Finding a method that would be able to detect all the roots of the equation, despite the discontinuity of the given equations in the specified range of the complex plane. None of the available methods have an exact control on the range of the obtained root specially for complex roots. Sometimes the obtained root is placed in the desired range, and sometimes is far from it.
References
[1] | M. Nikkhah-B, A. Najafi-A, A new method for computing a single root of any function, ICNAAM Greece, 2006. | ||
In article | |||
[2] | R. Oftadeh, M. Nikkhah-Bahrami, A. Najafi,A novel cubically convergent iterative method for computing complex roots of nonlinear equations,Appl. Math. Comput. 217 (2010) 2608-2618. | ||
In article | View Article | ||
[3] | Abramowitz, M. and Stegun, I. A. (Eds.). New York: Dover, p. 18, 1972. | ||
In article | |||
[4] | Abramowitz, M. and Stegun, I. A. (Eds.). New York: Dover, p. 18, 1972. | ||
In article | |||
[5] | Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. “Secant Method, False Position Method, and Ridders' Method.” §9.2 in Cambridge, England: Cambridge University Press, pp. 347-352, 1992. | ||
In article | |||
[6] | Brent, R. P. Ch. 3-4 in Englewood Cliffs, NJ: Prentice-Hall, 1973. | ||
In article | |||
[7] | Householder, A. S. New York: McGraw-Hill, 1970. | ||
In article | |||
[8] | Schröder, E. “Überunendlichviele Algorithmenzur Auflösung der Gleichungen.” Math.Ann. 2, 317-365, 1870. | ||
In article | View Article | ||
[9] | T.R. Scavo and J.B. Thoo, On the geometry of Halley’s method. American Mathematical Monthly, 102:5 (1995), pp. 417-426. | ||
In article | View Article | ||
[10] | Ortega, J. M. and Rheinboldt, W. C. Philadelphia, PA: SIAM, 2000. | ||
In article | View Article | ||
[11] | Ridders, C. F. J. “A New Algorithm for Computing a Single Root of a Real Continuous Function.” IEEE Trans. Circuits Systems 26, 979-980, 1979. | ||
In article | View Article | ||
[12] | Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. Cambridge, England: Cambridge University Press, p. 364, 1992. | ||
In article | |||
[13] | S. Goedecker, Remark on Algorithms to Find Roots of Polynomials, 15(5), 1059-1063 (September 1994). | ||
In article | View Article | ||
[14] | Ostrowski Alexander M., 1966, Solutions of Equations and Systems of Equations, 2nd ed. (New york Academic Press), Chapter 12. | ||
In article | |||
[15] | Ostrowski Alexander M., 1973, Solution of Equations in Euclidean and Bonach Spaces, Academic Press, New York, 3rd ed. | ||
In article | |||
[16] | M. Grau, J. L. D´ıaz-Barrero, An improvement to Ostrowski root-finding method, Appl. Math. Comput. 173 (2006) 450-456. | ||
In article | View Article | ||
[17] | J.R. Sharma, R.K. Guha, A family of modified Ostrowski methods with accelerated sixth order convergence, Appl. Math. Comput. 190 (2007) 111-115. | ||
In article | View Article | ||
[18] | C. Chun, Y. Ham, Some sixth-order variants of Ostrowski root-findingmethods, Appl. Math. Comput. 193 (2007) 389-394. | ||
In article | View Article | ||
[19] | J. Kou, Y. Li, X. Wang, Some variants of Ostrowski’s method with seventhorder convergence, J. Comput. Appl. Math. 209 (2007) 153-159. | ||
In article | View Article | ||
[20] | NenadUjevic, An iterative method for solving nonlinear equations, J. Comput. Appl. Math. 201 (2007) 208-216. | ||
In article | View Article | ||