
Imputation of Missing Values for Pure Bilinear Time Series Models with Normally Distributed Innovations

Poti Owili Abaja1, Dankit Nassiuma2, Luke Orawo3

1Mathematics and Computer Science Department, Laikipia University, Nyahururu, Kenya

2Mathematics Department, Africa International University, Nairobi

3Mathematics Department, Egerton University, Private Bag, Egerton-Njoro, Nakuru, Kenya

Abstract

In this study, estimates of missing values for bilinear time series models with normally distributed innovations were derived by minimizing the h-steps-ahead dispersion error. For comparison purposes, missing value estimates based on artificial neural network (ANN) and exponential smoothing (EXP) techniques were also obtained. Simulated data was used in the study: 100 samples of size 500 each were generated for different pure bilinear time series models using the R statistical software. In each sample, artificial missing observations were created at data positions 48, 293 and 496 and estimated using these methods. The performance criteria used to ascertain the efficiency of these estimates were the mean absolute deviation (MAD) and mean squared error (MSE). The study found that the optimal linear estimates (OLE) were the most efficient for estimating missing values. Further, the optimal linear estimates were equivalent to the one-step-ahead forecast of the missing value. The study recommends OLE estimates for estimating missing values for pure bilinear time series data with normally distributed innovations.

Cite this article:

Abaja, P. O., Nassiuma, D., & Orawo, L. (2015). Imputation of Missing Values for Pure Bilinear Time Series Models with Normally Distributed Innovations. American Journal of Applied Mathematics and Statistics, 3(5), 199-205. http://pubs.sciepub.com/ajams/3/5/4


1. Introduction

A time series is data recorded sequentially over a specified time period. There are cases where some observations that were supposed to be collected are not obtained, and this results in missing values. Failure to account for missing observations may result in a severe misrepresentation of the phenomenon under study. Further, it can cause havoc in the estimation and forecasting of linear and nonlinear time series as in [3]. This problem can be solved through missing value imputation.

Imputation of missing values has been done for several linear time series models. For non-Gaussian time series models, imputation has been done for ARMA models with stable errors as in [24]. For other nonlinear models, such as bilinear time series models, there is no evidence to show that imputation of missing values has been explicitly done. Therefore, this study derived estimates of missing values for bilinear time series models with normally distributed innovations. The missing values were derived using optimal linear interpolation techniques based on minimizing the h-steps-ahead dispersion error. Other techniques for estimating missing values that were used included the nonparametric method of artificial neural networks as in [4] and [31], as well as exponential smoothing.

Interest in this study was also on the quality of the imputed values at the level of the individual, an issue that has received relatively little attention as in [5]. The basic idea of an imputation approach, in general, is to substitute a plausible value for a missing observation and to carry out the desired analysis on the completed data as in [22]. Here, imputation can be considered to be an estimation or interpolation technique.

The imputation of the missing value technique developed may be adopted by data analysts to improve on time series modeling.

2. Literature Review

Most real-life time series encountered in practice are neither Gaussian nor linear in nature and are more adequately described by nonlinear models. One of the most important nonlinear models used in practice is the bilinear time series model. The nonlinearity of bilinear models can be approached in two ways. The first approach is to create a model that consists of a blend of non-Gaussianity and nonlinearity, which has been widely discussed as in [31], where the existence of bilinear models with infinite variance innovations is considered. The other approach is to introduce nonlinearity in the model but assume that the distribution of the innovation sequence is Gaussian as in [36]. Properties of these models have been extensively studied in the literature.

2.1. Bilinear Models

A discrete time series process {X_t} is said to be a bilinear time series model of order BL(p, q, P, Q) if it satisfies the difference equation

X_t = \sum_{j=1}^{p} \theta_j X_{t-j} + \sum_{j=1}^{q} \phi_j e_{t-j} + \sum_{i=1}^{P} \sum_{j=1}^{Q} b_{ij} X_{t-i} e_{t-j} + e_t

where \theta_j, \phi_j and b_{ij} are constants while {e_t} is a purely random process which is normally distributed with E(e_t) = 0 and \sigma_e^2 = 1. For a pure bilinear time series model (p = q = 0), we have

X_t = \sum_{i=1}^{P} \sum_{j=1}^{Q} b_{ij} X_{t-i} e_{t-j} + e_t.

Bilinear time series models may have sudden bursts of large negative and positive values that vary in form and amplitude depending on the model parameters, and thus they may be plausible for modeling nonlinear processes as in [25]. A bilinear model is a member of the general class of nonlinear time series models called 'state dependent models', formed by adding the bilinear term to the conventional ARMA model as in [30].

It is a parsimonious and powerful nonlinear time series model, and researchers have achieved forecast improvements with simple nonlinear time series models. Reference [21] used a bilinear time series model to forecast Spanish monetary data and reported a nearly 10% improvement in one-step-ahead mean square forecast error over several ARMA alternatives. Reference [9] also reported a forecast improvement with bilinear models in forecasting stock prices. The statistical properties of such models have been analyzed in detail as in [10, 11, 25] and [17], while an economic application is presented in [14].


2.1.1. Identification of Bilinear Time Series Models

The first step in the identification of a bilinear time series model is to determine whether the given data are nonlinear or not. Once the data are found to be nonlinear, it is important to fit an appropriate time series model to the data. Reference [39] pointed out that the second order properties of nonlinear BL(p, p, 0, 1) models are similar to those of linear ARMA(p, 1) ones and hence it is necessary to study higher order cumulants to distinguish them. The technique of identification of a given nonlinear model can be extended to more general bilinear models provided there are difference equations for higher order moments and cumulants as in [24].

For some superdiagonal and diagonal bilinear time series, the third order moments do not vanish, and the pattern of nonzero moments can be used to discriminate between white noise and bilinear models, and also between different bilinear models. Looking at the table of third order moments, one can easily distinguish bilinear models from pure ARMA or mixed ARMA models.

Third order moments may also be useful in detecting non-normality in the distribution of the innovation sequence. References [10] and [37] have shown that in most cases the second order autocorrelations will be zero for these models, which makes it difficult to distinguish them from complete white noise.
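As an illustration of what inspecting third order moments involves in practice, the short R sketch below computes a sample third-order moment of a series; it is an illustrative addition (not part of the original study) and assumes the series is mean-corrected before the products are formed.

```r
# Illustrative sample third-order moment C(s, r) = E[X_t * X_{t+s} * X_{t+r}],
# estimated from a single mean-corrected series. Nonzero values at some lags
# (s, r) can help distinguish bilinear series from white noise or pure ARMA.
third_moment <- function(x, s, r) {
  x <- x - mean(x)                      # centre the series first
  n <- length(x)
  k <- max(s, r)
  mean(x[1:(n - k)] * x[(1 + s):(n - k + s)] * x[(1 + r):(n - k + r)])
}
```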

Reference [24] showed that with a large bilinear coefficient, a bilinear model can have sudden large-amplitude bursts and is suitable for some kinds of seismological data, such as earthquake and underground nuclear explosion records. The variance of the bilinear process is time dependent, a feature that also enables the bilinear process to be used for financial data as in [21]. Empirical studies have been done on estimating missing values for different time series data. Reference [26] used interpolation and mean imputation techniques to replace simulated missing values in annual hourly air pollution monitoring data.

Reference [29] developed alternative techniques suitable for a limited set of ARMA(p, q) models with stable innovations for the case with index α ∈ (1, 2]. This was later extended to ARMA stable processes with index α ∈ (0, 2] as in [24], where an algorithm applicable to general linear and nonlinear processes was developed using the state space formulation and applied to the estimation of missing values.

2.2. Missing Value Imputation for Nonlinear Time Series Models

Reference [35] derived a recursive estimation procedure based on optimal estimating functions and applied it to estimate model parameters in the presence of missing observations, as well as to handle time-varying parameters for a given nonlinear multi-parameter model. More specifically, to estimate missing observations, [3] formulated a nonlinear time series model which encompasses several standard nonlinear time series models as special cases. That work also offers two methods for estimating missing observations, based on a prediction algorithm and a fixed point smoothing algorithm, as well as estimating function equations. Recursive estimation of missing observations in an autoregressive conditional heteroscedasticity (ARCH) model and the estimation of missing observations in a linear time series model are shown as special cases. However, little was done on bilinear time series models.

Reference [28] investigated the influence of missing values on the prediction of a stationary time series process by applying the Kalman filter fixed point smoothing algorithm, and developed simple bounds for the prediction error variance and its asymptotic behavior for short and long memory processes.

The work on estimation of missing values has also been extended to vector time series. A classic example is the study in [20] on the estimation of missing values in possibly partially nonstationary vector time series, which extended the method in [19] for estimating missing values and evaluating the corresponding function in scalar time series. The series is assumed to be generated by a possibly partially nonstationary and noninvertible vector autoregressive moving average process. No pattern of missing data is assumed, and future and past values are special cases of missing data that can be estimated in the same way. The method does not use the Kalman filter iterations and hence avoids initialization problems. The estimates of missing values are provided by the normal equations of an appropriate regression problem.

2.3. Nonparametric Methods for Estimating Missing Values

Nonparametric methods have also been proposed for handling missing data. Reference [26] considered kernel estimation of a multivariate density for data with incomplete observations. When the parameter of interest is the mean of a response variable which is subject to missingness, a kernel conditional mean estimator has been proposed to impute the missing values, as in [5]. Reference [13] studied the estimation of average treatment effects using nonparametrically estimated propensity scores. Time series smoothers estimate the level of a time series at time t as its conditional expectation given present, past and future observations, with the smoothed value depending on the estimated time series model as in [16].

Reference [25] derived a recursive form of the exponentially smoothed estimator for a nonlinear model with irregularly observed data, discussed its asymptotic properties, and made numerical comparisons between the resulting estimates and other smoothed estimates. Reference [1] used neural networks and genetic algorithms to approximate missing data in a database, where a genetic algorithm estimates the missing value by optimizing an objective function. Many approaches have been developed to recover missing values, such as k-nearest neighbour imputation as in [38], Bayesian PCA (BPCA) as in [27], least squares imputation as in [12], local least squares imputation (LLSimpute) as in [15] and least absolute deviation imputation (LADimpute) as in [5].

It can be seen from the literature that there are several methods used for estimating missing values for different time series data. What is lacking in the literature, however, is an explicit method for estimating missing values for bilinear time series models. The study therefore sought to estimate missing values for bilinear time series which have different probability distributions.

2.4. Estimation of Missing Values Using Linear Interpolation Method

Suppose one value, say X_m, is missing out of a set of an arbitrarily large number n of possible observations generated from a time series process {X_t}. Let the subspace H be the allowable space of estimators of X_m based on the observed values, i.e., H = sp{X_t, t ≠ m}, where n, the sample size, is assumed large. The projection of X_m onto H (denoted X̂_m) such that the dispersion error of the estimate (written disp(X_m − X̂_m)) is a minimum would simply be a minimum dispersion linear interpolator. Direct computation of the projection onto H is complicated since the subspaces spanned by the observations before and after the missing value are not independent of each other. We thus consider evaluating the projection onto two disjoint subspaces of H. To achieve this, we express H as a direct sum of a subspace H_1 and another subspace, say H_2, such that H = H_1 ⊕ H_2, where H_2 is based on the remaining observed values. The existence of the subspaces H_1 and H_2 is shown in the following lemma.


2.4.1. Lemma

Suppose {X_t} is a nondeterministic stationary process defined on a probability space (Ω, F, P). Then the subspaces H_1 and H_2, defined in the L2 norm, are such that H = H_1 ⊕ H_2.

Proof:

Suppose X ∈ H; then X can be represented as the sum of a component in H_1 and a component in H_2. Clearly the two components on the right hand side of the equality are disjoint and independent, and hence the result follows. The best linear estimator of the missing value X_m can be evaluated as the projection onto the subspaces H_1 and H_2 such that disp(X_m − X̂_m) is minimized.

The coefficients of this projection are estimated such that the dispersion error is minimized. The resulting error of the estimate is evaluated, and on squaring both sides and taking expectations we obtain the dispersion error as

(1)

By minimizing the dispersion with respect to the coefficients, the optimal linear estimate is

(2)
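To make the structure of this minimization concrete, a generic sketch of the set-up is given below; the symbols X_m (the missing value) and the weights α_j, β_j are illustrative notation introduced here and are not necessarily those of the original derivation.

```latex
% Generic minimum-dispersion linear interpolation set-up (illustrative notation)
\[
  \hat{X}_m = \sum_{j \ge 1} \alpha_j X_{m-j} + \sum_{j \ge 1} \beta_j X_{m+j},
  \qquad
  \operatorname{disp}\bigl(X_m - \hat{X}_m\bigr) = E\bigl(X_m - \hat{X}_m\bigr)^{2},
\]
% and the optimal weights are obtained by setting the partial derivatives to zero:
\[
  \frac{\partial}{\partial \alpha_j} E\bigl(X_m - \hat{X}_m\bigr)^{2} = 0,
  \qquad
  \frac{\partial}{\partial \beta_j} E\bigl(X_m - \hat{X}_m\bigr)^{2} = 0 .
\]
```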

3. Methodology

Three methodologies were used in this study, each corresponding to one of the estimation methods: optimal linear interpolation, artificial neural networks and exponential smoothing. The time series data used and the performance measures applied were the same for all the methods.

3.1. Methodology for Optimal Linear Interpolation Method

In this study, the estimates of the missing values for bilinear time series models with normal innovations were derived using the optimal linear interpolation method by minimizing the dispersion error. Estimates of the missing values using the nonparametric methods of ANN and exponential smoothing were also obtained.
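The exact smoothing configuration is not specified in this section, so the following is only a minimal R sketch of how a simple exponential smoothing (EXP) estimate of a missing point might be formed from the observations preceding the gap; the smoothing constant alpha = 0.2 and the use of only past observations are assumptions made for illustration.

```r
# Illustrative exponential smoothing imputation (assumed configuration):
# smooth the observations before the gap and use the final level as the estimate.
exp_impute <- function(x, m, alpha = 0.2) {
  s <- x[1]
  for (t in 2:(m - 1)) {
    s <- alpha * x[t] + (1 - alpha) * s   # update the smoothed level
  }
  s                                       # estimate of the missing value x[m]
}
```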


3.1.1. Data Collection

Data was obtained through simulation using computer code written in R; the code is shown in the Appendix. The time series data were simulated from different simple bilinear models with normal innovations. The seed in the R program code was changed to obtain a new sample for the bilinear models BL(0,0,1,1) and BL(0,0,2,1). For each program code, a set of 100 samples of size 500 each was generated.
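The appendix code itself is not reproduced here; the following is a minimal R sketch of the kind of simulation described above for the BL(0,0,1,1) case. The coefficient value b = 0.4, the burn-in length of 100 observations and the seed handling are assumptions made for illustration.

```r
# Illustrative simulation of a pure bilinear BL(0,0,1,1) series
# X_t = b * X_{t-1} * e_{t-1} + e_t with standard normal innovations.
simulate_bl11 <- function(n = 500, b = 0.4, seed = 1) {
  set.seed(seed)                  # change the seed to obtain a new sample
  m <- n + 100                    # extra observations used as burn-in
  e <- rnorm(m)                   # N(0, 1) innovations
  x <- numeric(m)
  x[1] <- e[1]
  for (t in 2:m) {
    x[t] <- b * x[t - 1] * e[t - 1] + e[t]
  }
  x[(m - n + 1):m]                # discard the burn-in
}

# 100 samples of size 500, one seed per sample
samples <- lapply(1:100, function(s) simulate_bl11(n = 500, b = 0.4, seed = s))
```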


3.1.2. Missing Data Points and Data Analysis

Three data points, 48, 293 and 496, were selected at random from each sample of 500 observations. The observations at these points were removed to create 'missing values' to be estimated. Data analysis was done using statistical and computing software, which included Excel, TSM, R and Matlab 7.
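Continuing the illustrative sketch above, the creation of the artificial gaps can be expressed in R as follows; the object names are hypothetical.

```r
# Remove the observations at the chosen positions to create artificial gaps
miss_pos  <- c(48, 293, 496)
x         <- samples[[1]]        # one simulated series of length 500
true_vals <- x[miss_pos]         # kept aside for computing MAD and MSE later
x[miss_pos] <- NA                # values to be estimated by OLE, ANN and EXP
```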


3.1.3. Performance Measures

The MAD (mean absolute deviation) and MSE (mean squared error) were used. For N imputed values \hat{X}_i with true values X_i, these were obtained as follows:

\mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \left| X_i - \hat{X}_i \right| (3)

and

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( X_i - \hat{X}_i \right)^2 (4)
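Small R helpers corresponding to equations (3) and (4) might look as follows; the function names are illustrative.

```r
# Efficiency measures comparing the true and imputed values at the missing positions
mad_err <- function(actual, imputed) mean(abs(actual - imputed))   # equation (3)
mse_err <- function(actual, imputed) mean((actual - imputed)^2)    # equation (4)
```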

4. Results

4.1. Derivation of the Optimal Linear Estimates of Missing Values

Estimates of missing values for pure bilinear time series models with Gaussian innovations were derived by minimizing the h-steps-ahead dispersion error. Two assumptions were made. The first was that the series are stationary and thus their roots lie within the unit circle. The second was that higher powers of the coefficients (powers of order greater than two, or products of coefficients of order greater than two) are approximately negligible.

4.2. Pure Bilinear Time Series with Normally Distributed Innovations
4.2.1. Simple Pure Bilinear Time Series Models with Normal Distribution

The missing value estimate is based on Theorem 4.1 below.

Theorem 4.1

The optimal linear estimate of a missing value for the pure bilinear model BL(0,0,1,1) with normally distributed innovations is given by

Proof

The simplest pure bilinear time series model of order one, BL(0, 0, 1, 1), is of the form

X_t = b X_{t-1} e_{t-1} + e_t (5)

Through recursive substitution in equation (5), the stationary representation of the BL(0, 0, 1, 1) process is obtained as

The h-steps ahead forecast is

Therefore the forecast error is

This can be expressed as

(6)

Substituting equation (6) in equation (1) we have

(7)

Simplifying each term of equation (7) separately, we obtain

Hence equation (7) can be simplified as

(8)

Now differentiating equation (8) with respect to the coefficients and equating to zero, we obtain

Substituting the values of the coefficients in equation (1), we obtain the best estimator of the missing value as

This result shows that the estimate of the missing value is a one-step-ahead prediction based on the past observations collected before the missing value. This is similar to the findings of Nassiuma (1994) and Abraham (1981), which reported comparable estimates of missing values for stable processes in some cases.
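As a concrete illustration of this one-step-ahead interpretation, a short R sketch is given below. It assumes the estimate takes the form b·X_{m-1}·e_{m-1} implied by the equivalence just stated, that the coefficient b is known (or has been estimated separately), and that the innovations before the gap can be reconstructed recursively from the model; these implementation details are assumptions, not part of the original derivation.

```r
# Illustrative OLE imputation for BL(0,0,1,1), assuming the one-step-ahead
# forecast form x_hat[m] = b * x[m-1] * e[m-1] discussed above.
ole_bl11 <- function(x, m, b) {
  e <- numeric(m - 1)
  e[1] <- x[1]                               # start-up: approximate e_1 by x_1
  for (t in 2:(m - 1)) {
    e[t] <- x[t] - b * x[t - 1] * e[t - 1]   # invert the model to recover innovations
  }
  b * x[m - 1] * e[m - 1]                    # one-step-ahead forecast of the gap
}
```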


4.2.2. Estimating Missing Values for the Pure Diagonal Bilinear Time Series Model BL (0, 0, 2, 1)

The pure diagonal bilinear time series model with normal innovations of order p is given by

(9)

The missing value estimate is based on Theorem 4.2 below.

Theorem 4.2

The optimal linear estimate of a missing value for the bilinear time series model BL(0, 0, 2, 1) is given by

Proof

The stationary BL(0,0,2,1) is given by

The h-steps-ahead forecast is given by

Therefore the forecast error is

or it can also be represented as

(10)

Substituting equation (10) in equation (1), we have

(11)

Simplifying each term of equation (11) separately, we have

Hence equation (11) can be simplified as

(12)

Now differentiating equation (12) with respect to the coefficients and equating to zero, we obtain the optimal linear estimate stated in Theorem 4.2.

4.3. Simulation Results

In this section, the results of the estimates obtained from the optimal linear estimation, artificial neural network and exponential smoothing methods are presented. The graphs of the time series data are shown in Figures 1 and 2 below. They are characterized by sharp outbursts, as is clearly evident for BL(0,0,1,1); sharp outbursts are one of the characteristics of nonlinearity in bilinear models.

Figure 1. BL(0,0,1,1) with Normally Distributed Innovations
Figure 2. BL(0,0,2,1) with Normally Distributed Innovations

Simulation results are given in Table 1 and Table 2.

Table 1. Efficiency measures for normal BL(0,0,1,1)

From Table 1, it is evident that the OLE gave the most efficient estimates (mean MAD = 0.867245) of the missing values for the different missing data point positions, followed by the EXP smoothing estimates (mean MAD = 0.8742). Estimates based on ANN were the least efficient (mean MAD = 0.89154).

Table 2. Efficiency measures for normal BL(0,0,2,1)

From Table 2, it is clear that the OLE estimates of the missing values were the most efficient (mean MAD = 0.785298) for the different missing data point positions, followed by the EXP smoothing estimates (mean MAD = 0.908818). It is evident that for bilinear time series data with normal errors, the OLE gave the most efficient estimates of the missing values.

5. Conclusion

In this study we derived estimates of missing values for pure bilinear time series models whose innovations are normally distributed by minimizing the dispersion error. The study found that the optimal estimate of the missing value is equivalent to the one-step-ahead forecast. Further, the study found that the optimal linear estimates were the most efficient for normally distributed data. The study recommends that for bilinear time series data with normal innovations, OLE estimates be used for estimating missing values.

5.1. Recommendation for Further Research

More elaborate research should be done to compare the efficiency of several imputation methods, such as k-NN, the Kalman filter, estimating functions and genetic algorithms, besides the three used in this study.

References

[1] Abdalla, M. and Marwala, T. (2005). The use of genetic algorithms and neural networks to approximate missing data. Computing and Informatics, 24, 571-589.
[2] Abraham, B. (1981). Missing observations in time series. Communications in Statistics - Theory and Methods.
[3] Abraham, B. and Thavaneswaran, A. (1991). A nonlinear time series and estimation of missing observations. Annals of the Institute of Statistical Mathematics, 43, 493-504.
[4] Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Oxford University Press.
[5] Cao, Y., Poh, K. L. and Cui, W. J. (2008). A non-parametric regression approach for missing value imputation in microarray. Intelligent Information Systems, 25-34.
[6] Cheng, B. and Titterington, D. M. (1994). Neural networks: a review from a statistical perspective.
[7] Cheng, P. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, 81-87.
[8] Cortiñas, J. A., Sotto, C., Molenberghs, G., Vromman, G. and Bierinckx, B. (2011). A comparison of various software tools for dealing with missing data via imputation. pp. 1653-1675.
[9] De Gooijer, J. C. (1989). Testing nonlinearities in world stock market prices. Economics Letters, 31, 31-35.
[10] Granger, C. W. and Anderson, A. P. (1978). An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck and Ruprecht.
[11] Hannan, E. J. (1982). A note on bilinear time series models. Stochastic Processes and their Applications, 12, 221-224.
[12] Hellem Bø, T. (2004). LSimpute: accurate estimation of missing values in microarray data with the least squares method. Nucleic Acids Research, 32, e34.
[13] Hirano, K., Imbens, G. W. and Ridder, G. (2002). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71, 1161-1189.
[14] Howitt, P. (1988). Business cycles with costly search and recruiting. Quarterly Journal of Economics, 103(1), 147-165.
[15] Kim, J. K. and Fuller, W. (2004). Fractional hot deck imputation. Biometrika, 91, 559-578.
[16] Ledolter, J. (2008). Smoothing time series with local polynomial regression on time. Communications in Statistics - Theory and Methods, 37, 959-971.
[17] Liu, J. and Brockwell, P. J. (1988). On the general bilinear time series model. Journal of Applied Probability, 25, 553-564.
[18] Liu, J. (1989). A simple condition for the existence of some stationary bilinear time series.
[19] Ljung, G. M. (1989). A note on the estimation of missing values in time series. Communications in Statistics - Simulation and Computation, 18(2), 459-465.
[20] Luceño, A. (1997). Estimation of missing values in possibly partially nonstationary vector time series. Biometrika, 84(2), 495-499.
[21] Maravall, A. (1983). An application of nonlinear time series forecasting. Journal of Business & Economic Statistics, 1, 66-74.
[22] McKnight, P. E., McKnight, K. M., Sidani, S. and Figueredo, A. J. (2007). Missing Data. New York: Guilford Press.
[23] Nassiuma, D. K. (2001). A note on interpolation of stable processes. Journal of Agriculture, Science and Technology, 3(1), 81-88.
[24] Nassiuma, D. K. (1994). Symmetric stable sequences with missing observations. Journal of Time Series Analysis, 15, 317.
[25] Nassiuma, D. K. and Thavaneswaran, A. (1992). Smoothed estimates for nonlinear time series models with irregular data. Communications in Statistics - Theory and Methods, 21(8), 2247-2259.
[26] Norazian, M. N., Shukri, Y. A., Azam, R. N. and Al Bakri, A. M. M. (2008). Estimation of missing values in air pollution observations.
[27] Oba, S., Sato, M. A., Takemasa, I., et al. (2003). A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16), 2088-2096.
[28] Pascal, B. (2005). Influence of missing values on the prediction of a stationary time series. Journal of Time Series Analysis, 26(4), 519-525.
[29] Peña, D. and Tiao, G. C. (1991). A note on likelihood estimation of missing values in time series. The American Statistician, 45(3), 212-213.
[30] Pourahmadi, M. (1989). Estimation and interpolation of missing values of a stationary time series. Journal of Time Series Analysis, 10(2), 149-169.
[31] Priestley, M. B. (1980). State dependent models: a general approach to time series analysis.
[32] Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
[33] Smith, K. W. and Aretxabaleta, A. L. (2007). Expectation-maximization analysis of spatial time series. Nonlinear Processes in Geophysics, 14(1), 73-77.
[34] Subba Rao, T. and Gabr, M. M. (1980). A test for non-linearity of stationary time series. Journal of Time Series Analysis, 1, 145-158.
[35] Subba Rao, T. and Gabr, M. M. (1984). An Introduction to Bispectral Analysis and Bilinear Time Series Models. Lecture Notes in Statistics, 24. New York: Springer.
[36] Thavaneswaran, A. and Abraham, B. (1987). Recursive estimation of nonlinear time series models. Institute of Statistics Mimeo Series No. 1835.
[37] Tong, H. (1983). Threshold Models in Non-Linear Time Series Analysis. Berlin: Springer-Verlag.
[38] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.
[39] Sesay, S. A. and Subba Rao, T. (1988). Yule-Walker type difference equations for higher order moments and cumulants for bilinear time series models. Journal of Time Series Analysis, 9, 385-401.
 