Missing Values Estimation for a Stable Bivariate Autoregressive Process

I.A. Iwok

American Journal of Applied Mathematics and Statistics

Department of Mathematics/Statistics, University of Port-Harcourt, P.M.B. 5323, Port-Harcourt, Rivers State, Nigeria

Abstract

This work proposes a method for estimating missing values in a stable bivariate autoregressive time series process. Missing observations were created at different positions in a stable bivariate series and the method was applied. Despite its ease of implementation, the results suggest that the method performs well. The estimates were compared with those of existing methods, and the comparison shows that the proposed method provides better estimates.

Cite this article:

  • I.A. Iwok. Missing Values Estimation for a Stable Bivariate Autoregressive Process. American Journal of Applied Mathematics and Statistics. Vol. 4, No. 3, 2016, pp 67-73. http://pubs.sciepub.com/ajams/4/3/2


1. Introduction

The problem of missing observations in time series data cannot be overemphasised. It is commonly encountered during data collection. An observation may be unavailable when needed because of faulty measuring equipment, lost records of events, mistakes, uncooperative responses during data collection, etc. In practice, statistical analysis is usually carried out on a data set with complete observations. Where an observation is missing, it has to be estimated and placed in the missing position before conclusions and inferences are drawn from the data set.

Estimation of missing values has gained much ground in other areas of Statistics such as experimental design and multivariate analysis. However, less has been achieved in the area of time series, perhaps because of its complicated structure.

A time series is an ordered sequence of observations. The ordering is usually through time, although in some cases it may involve other dimensions such as space. A discrete univariate time series variable is denoted by $y_t$, where $t$ belongs to an index set $T$. A particular time series is believed to have been generated by an underlying mechanism called a stochastic process. Thus, a stochastic process is a family of time-indexed random variables $y(\omega, t)$, where $\omega$ belongs to a sample space.

Just as in other areas of Statistics, the multivariate time series model is an extension of the univariate case. The simplest of its kind is the bivariate case. A bivariate process consists of two component univariate time series. However, the methods of handling such a vector are rather complicated and quite different from the bivariate cases in other areas of Statistics. This is because the index set $T$ and the correlation and cross-correlation structures at different lags are always taken into consideration. Unfortunately, most of these methods were designed merely for predictive purposes and depend on the completeness of the sample in space or time. However, because missing observations are inevitable, these techniques need to incorporate a method that can accommodate such problems.

2. Review of Literature

In statistical analysis, the model estimation problem in the presence of missing data is usually solved by a maximum likelihood approach or by imputation methods [1]. [1] considered the case of multidimensional time series in which part of the observations are completely or partially missing. They proposed a model estimation procedure based on composite likelihood combined with a model-based imputation method. The proposed method was validated by simulation studies, and the results highlighted the effect of imputation strategies on model estimation. The method gave better results, at least for the variance-covariance parameters.

[9] worked on the estimation of parameters in dynamic factor analysis using the EM algorithm. The result showed that dynamic factor analysis can analyse short, non-stationary time series containing missing values.

[3] considered the problems of predicting missing observations and forecasting future values in incomplete time series data. In their work, three forecasting models (a dynamic multivariate autoregressive model, a multivariate local trend model and a Gaussian process model) were studied. Each of these models was analysed using air temperature data collected by a network of weather sensors. Comparisons of these models were made and it was discovered that the dynamic linear model coped easily with incomplete or missing observations.

[2] presented a unified approach to the analysis of messy data. The paper examined irregularities such as missing values, outliers, structural breaks and irregular spacing. Here, a missing observation was handled by introducing a dummy variable into the measurement equation. By introducing a state space framework with explanatory variables, [2] found that the dummy variable method has exactly the same effect as skipping the update step of the Kalman filter.

[8] proposed a method for modelling and fitting multivariate spatial time series data based on current spatial methodology coupled with the parameterization of the ARIMAX model. Because of the physical constraints imposed on multivariate data collection in both space and time, the estimation and identification procedures tolerated general patterns of missing or incomplete data.

[5] considered the problem of estimating parametric multivariate density models when unequal amounts of data are available on each variable. It was discovered that there exists significant evidence of time variation in the conditional copula of the exchange rates (Yen-US dollars and Euro-US dollars), and evidence of greater dependence during extreme events than under the normal distribution.

[7] proposed an imputation method to be used with singular spectrum techniques, based on a weighted combination of the forecasts and hindcasts yielded by the recurrent forecast method. Despite its ease of implementation, the results suggested an overall good fit of their method, because it yielded similar adjustment ability to the alternative method according to some measures of predictive performance.

According to [6], the commonest way of dealing with missing observations is to replace them with the mean of the data. This is because every observation is expected to be distributed around the mean under normal conditions. According to them, any observation that deviates much from the mean has to be adjusted to reflect its membership before being used for analysis.

There is not much direct literature on missing values of a stable vector autoregressive (VAR) process. Some of the works cited above are only indirectly or loosely linked with the subject matter. Hence, our comparative study shall be based on a few related existing methods that are computationally less cumbersome.

3. Methodology

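Let $y_{1t}$ and $y_{2t}$ be two univariate time series under consideration. Then, $y_t = (y_{1t}, y_{2t})'$ is said to be a bivariate time series. According to [4], the general vector autoregressive model of order $p$ for $y_t$ can be expressed as:

$$y_t = \nu + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + u_t \qquad (1)$$

where

$y_t = (y_{1t}, y_{2t})'$ is a $(2 \times 1)$ vector of time series variables,

$A_i$ $(i = 1, \dots, p)$ are fixed $(2 \times 2)$ coefficient matrices,

$\nu = (\nu_1, \nu_2)'$ is a fixed $(2 \times 1)$ vector of intercept terms allowing for the possibility of a non-zero mean $E(y_t)$,

$u_t = (u_{1t}, u_{2t})'$ is a white noise process or innovation process. That is,

$$E(u_t) = 0, \quad E(u_t u_t') = \Sigma_u, \quad E(u_t u_s') = 0 \ \text{ for } s \neq t,$$

where $\Sigma_u$ is the covariance matrix, which is assumed to be non-singular unless otherwise stated.

The model can be written in matrix form as

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix} + \sum_{i=1}^{p} \begin{pmatrix} a_{11,i} & a_{12,i} \\ a_{21,i} & a_{22,i} \end{pmatrix} \begin{pmatrix} y_{1,t-i} \\ y_{2,t-i} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$

where $a_{jk,i}$ denotes the $(j,k)$ element of $A_i$.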
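To make the recursion in model (1) concrete, the following minimal Python sketch evaluates one step of a bivariate VAR of order p from its most recent lagged vectors. The function name `var_step`, its argument names and the numbers in the usage note are illustrative assumptions and are not taken from this paper.

```python
def var_step(nu, coef_mats, lags, shock=(0.0, 0.0)):
    """One step of a bivariate VAR(p): intercept + sum of A_i * lag_i + shock.

    nu        : intercept vector (length 2)
    coef_mats : list of p coefficient matrices, each 2x2 (nested lists)
    lags      : list of p lagged vectors, most recent first
    shock     : innovation vector (zero by default, giving the conditional mean)
    """
    if len(coef_mats) != len(lags):
        raise ValueError("need one lagged vector per coefficient matrix")
    y = [nu[0] + shock[0], nu[1] + shock[1]]
    for A, y_lag in zip(coef_mats, lags):
        for i in range(2):
            # Row i of A_i times the lagged vector
            y[i] += A[i][0] * y_lag[0] + A[i][1] * y_lag[1]
    return y
```

For example, with the made-up values `var_step([1.0, 2.0], [[[0.5, 0.1], [0.2, 0.3]]], [[10.0, 20.0]])`, the first component is 1 + 0.5(10) + 0.1(20) = 8 and the second is 2 + 0.2(10) + 0.3(20) = 10.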

3.1. VAR Order Selection

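For any $n$-dimensional time series vector $y_t$ assumed to be generated by a VAR($p$) process (where $p$ is the order of the model), three model order selection criteria are usually considered:

(i) Akaike Information Criterion (AIC)

This is given by

$$\mathrm{AIC}(p) = \ln\left|\hat{\Sigma}_u(p)\right| + \frac{2 p n^2}{T}$$

The estimate $\hat{p}$ of $p$ is chosen so that this criterion is minimized. Here the intercept terms in the VAR model may be left out of the count of freely estimated parameters, because counting them would just add a constant to the criterion, which does not change the minimizing order.

(ii) Hannan-Quinn Criterion (HQC)

This is given as

$$\mathrm{HQC}(p) = \ln\left|\hat{\Sigma}_u(p)\right| + \frac{2 \ln \ln T}{T}\, p n^2$$

The estimate $\hat{p}$ is the order that minimizes HQC($p$) for $p = 0, 1, \dots, M$, where $M$ is the largest order considered.

(iii) Schwarz Criterion (BIC)

This is given by

$$\mathrm{BIC}(p) = \ln\left|\hat{\Sigma}_u(p)\right| + \frac{\ln T}{T}\, p n^2$$

The estimate $\hat{p}$ is chosen so as to minimize the value of the criterion.

In these expressions,

$p$ is the VAR order,

$\hat{\Sigma}_u(p)$ is the estimate of the white noise covariance matrix,

$n$ is the number of time series components of the vector time series,

$T$ is the sample size.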
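The three criteria can be computed side by side once a residual covariance estimate is available for each candidate order. The sketch below assumes the log-determinant forms of the criteria given in [4] and is restricted to the bivariate case so that the determinant can be written out by hand. The function names (`det2`, `var_order_criteria`, `select_order`) are hypothetical.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def var_order_criteria(sigma_hat, p, n_series, sample_size):
    """AIC, HQC and BIC for a fitted VAR(p), in log-determinant form.

    sigma_hat   : 2x2 estimated white noise covariance matrix for order p
    p           : VAR order being assessed
    n_series    : number of component series (2 in the bivariate case)
    sample_size : number of observations used in estimation (must exceed 1)
    """
    log_det = math.log(det2(sigma_hat))
    n_params = p * n_series ** 2  # slope coefficients only; intercepts not counted
    T = sample_size
    return {
        "AIC": log_det + 2.0 * n_params / T,
        "HQC": log_det + 2.0 * math.log(math.log(T)) * n_params / T,
        "BIC": log_det + math.log(T) * n_params / T,
    }

def select_order(sigma_by_order, n_series, sample_size, criterion="AIC"):
    """Return the candidate order with the smallest criterion value.

    sigma_by_order : dict mapping each candidate order to its covariance estimate
    """
    scores = {
        p: var_order_criteria(s, p, n_series, sample_size)[criterion]
        for p, s in sigma_by_order.items()
    }
    return min(scores, key=scores.get)
```

Because all three criteria share the same log-determinant term and differ only in the penalty, they can disagree on the selected order; BIC penalizes extra lags most heavily for samples of moderate size.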

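3.2. Stable VAR($p$) Processes

Any VAR($p$) process with $p > 1$ can be written in VAR(1) form [4]. More precisely, if $y_t$ is an $n$-dimensional VAR($p$) as in (1), with intercept $\nu$, coefficient matrices $A_1, \dots, A_p$ and white noise $u_t$, a corresponding $np$-dimensional VAR(1) is given by

$$Y_t = \boldsymbol{\nu} + \mathbf{A} Y_{t-1} + U_t$$

where

$$Y_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \quad \boldsymbol{\nu} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{A} = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_n & 0 & \cdots & 0 & 0 \\ 0 & I_n & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I_n & 0 \end{pmatrix}, \quad U_t = \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

The process $Y_t$ is said to be stable if

$$\det(I_{np} - \mathbf{A} z) \neq 0 \quad \text{for } |z| \le 1$$

Specifically, the process (1) is stable if the reverse characteristic polynomial of the VAR($p$) has no roots in or on the complex unit circle. That is, $y_t$ is stable if

$$\det(I_n - A_1 z - \dots - A_p z^p) \neq 0 \quad \text{for } |z| \le 1$$

This condition is called the stability condition. A stable VAR($p$) process is also stationary.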
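For a bivariate VAR(1), the stability condition can be checked directly, since the reverse characteristic polynomial of a 2x2 coefficient matrix is a quadratic whose roots are the reciprocals of the non-zero eigenvalues of that matrix. The following sketch assumes the VAR(1) case only; the function names are hypothetical and the matrix in the usage note is made up for illustration.

```python
import cmath

def var1_eigenvalues(A):
    """Eigenvalues of a 2x2 coefficient matrix A (nested lists)."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return [(trace + disc) / 2.0, (trace - disc) / 2.0]

def is_stable_var1(A, tol=1e-12):
    """True if every eigenvalue of A lies strictly inside the unit circle.

    This is equivalent to det(I - A z) having no roots with |z| <= 1,
    because each non-zero eigenvalue lam corresponds to a root z = 1 / lam.
    """
    return all(abs(lam) < 1.0 - tol for lam in var1_eigenvalues(A))

def reverse_char_roots(A):
    """Roots of det(I - A z) = 1 - tr(A) z + det(A) z^2.

    A zero eigenvalue lowers the degree of the polynomial and gives no root.
    """
    return [1.0 / lam for lam in var1_eigenvalues(A) if abs(lam) > 0.0]
```

With the illustrative matrix `[[0.5, 0.1], [0.4, 0.5]]`, the eigenvalues are 0.7 and 0.3 (up to rounding), so both roots of the reverse characteristic polynomial lie outside the unit circle and `is_stable_var1` returns `True`.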

4. The Missing Value Approach

Now, suppose we have observations for both series; and the missing observations occur at the th and th position of and respectively. We estimate these missing observations by the following approach:

Let and be the two missing observations. If , the first step involves obtaining VAR lag order of the two series and . However, if , we obtain the VAR lag order of the series and . Of course this particular step also involves fitting the model to the above series and obtaining the estimate of the parameters as shown:

The above fitted model is then used to obtain the first estimate of the missing observations and .

The second step involves substituting these estimates and in their missing positions in the data and the bivariate analysis is repeated on the complete data to obtain the final model as shown:

This model is then used to obtain the final estimates of the missing observations and .

The superscripts in braces in the above expressions only indicate that we are obtaining the first (1) or second (2) estimate of the missing observations.
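The two-step idea described above (fit, fill in first estimates, refit, re-estimate) can be sketched in Python as follows. This is only a minimal sketch under several assumptions that are not taken from this paper: the order is fixed at VAR(1) rather than selected by the criteria, the model is fitted by ordinary least squares, step one uses only the observations before the first missing position, positions are 0-based, and all function names are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_var1(y1, y2):
    """OLS fit of a bivariate VAR(1) with intercept; returns (nu, A)."""
    # Regressors for time t are (1, y1[t-1], y2[t-1])
    rows = [[1.0, y1[t - 1], y2[t - 1]] for t in range(1, len(y1))]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    nu, A = [], []
    for series in (y1, y2):
        Xty = [sum(r[i] * series[t + 1] for t, r in enumerate(rows)) for i in range(3)]
        beta = solve_linear(XtX, Xty)
        nu.append(beta[0])
        A.append(beta[1:])
    return nu, A

def forecast(nu, A, last, steps):
    """Iterate the fitted VAR(1) `steps` periods ahead from the vector `last`."""
    path, cur = [], list(last)
    for _ in range(steps):
        cur = [nu[i] + A[i][0] * cur[0] + A[i][1] * cur[1] for i in range(2)]
        path.append(cur)
    return path

def two_step_estimates(y1, y2, k, l):
    """Two-step estimates of y1[k] and y2[l], whose entries are None.

    Step 1 (assumed truncation rule): fit on the observations before the
    first gap and forecast forward to obtain first estimates.
    Step 2: refit on the completed series and predict each gap from the
    observation just before it.
    """
    first = min(k, l)
    nu, A = fit_var1(y1[:first], y2[:first])
    path = forecast(nu, A, [y1[first - 1], y2[first - 1]], max(k, l) - first + 1)
    z1, z2 = list(y1), list(y2)
    z1[k], z2[l] = path[k - first][0], path[l - first][1]
    nu, A = fit_var1(z1, z2)
    est1 = forecast(nu, A, [z1[k - 1], z2[k - 1]], 1)[0][0]
    est2 = forecast(nu, A, [z1[l - 1], z2[l - 1]], 1)[0][1]
    return est1, est2
```

The sketch needs several complete observations before the first gap for the least-squares system to be solvable, and it ignores values observed between the two gaps when producing the first estimates; a fuller implementation would select the lag order with the criteria of Section 3.1 at each step.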

5. Illustration and Result

We illustrate the proposed estimation approach using 50 monthly cases of hypertension and diabetes data (see Appendix 5). First, we removed the and from the data. Next, we conducted VAR analysis on the series and using the gretl software. The selected VAR order (in this case 1, as indicated by the minimum values of the three criteria at lag 1) and the results of the analysis at this first step are displayed in Appendices 1 and 2. Thus, the resulting model is:

(2)

The above model (2) is then used for the estimation of the first estimates of the missing values which gives: and .

In step two, these estimates ( and ) are placed in their missing positions and the analysis is repeated for the entire series. That is, for the series and . The results of the analysis are displayed in Appendices 3 and 4; and we have the final model:

(3)

The above model (3) is then used to compute the final estimates of the missing observations. Thus we have as our final estimates:

The absolute deviations (denoted AD) of these estimates from their respective actual values in the data can be computed as errors of the estimates. Thus, we have

We see here that the errors incurred by these estimates are negligible. A common practice in dealing with missing observations is to replace them with the mean. By comparison, the error produced by our method of estimation is far less than that produced by using the means ( and ).

5.1. Stability of the VAR (1) Process

For this process, the reverse characteristic polynomial is

which gives the roots: and . These roots are outside the unit circle; and the process is therefore stable.

Table 1 below shows the estimates obtained by creating four additional missing observations at various (th and th) positions in the and series. These estimates are compared between our proposed new method (PNM) and two other methods, the mean (M) and the Stoffer method (SM), on the basis of the absolute deviations (AD). The absolute deviations from the actual values are placed in brackets under their respective estimates.

Table 1. Estimates and Errors of the Different Methods

As seen in the above table, comparison based on AD (errors) shows that the proposed new method provides better estimates than the mean and the Stoffer method.
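The comparison by absolute deviation can be expressed in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and any numbers passed to them in an example would be made up rather than taken from Table 1.

```python
def absolute_deviation(actual, estimate):
    """Absolute deviation (AD) of an estimate from the actual value."""
    return abs(actual - estimate)

def mean_imputation(series):
    """Mean of the observed (non-None) values, used as the fill-in estimate."""
    observed = [v for v in series if v is not None]
    return sum(observed) / len(observed)

def rank_methods(actual, estimates):
    """Return (method, AD) pairs sorted from smallest to largest AD.

    estimates : dict mapping a method label (e.g. "PNM", "M", "SM")
                to that method's estimate of the missing value
    """
    ads = {name: absolute_deviation(actual, est) for name, est in estimates.items()}
    return sorted(ads.items(), key=lambda item: item[1])
```

The first entry returned by `rank_methods` is the method with the smallest error for that missing position, which is the sense in which one method is said to provide a better estimate than another.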

6. Summary and Conclusion

This work has provided a method of estimating missing values in a stable bivariate VAR process. The approach was illustrated using real-life data. On the basis of absolute deviations, the method was found to outperform the other methods of estimation considered. The proposed method therefore offers a practical framework for dealing with missing observations in a stable VAR process.

References

[1]  Giovanni, F. and Grassetti (2011). Multidimensional time series model estimation in the presence of partially missing observations. Workshop SIG-annual conference paper, Vol. 3, No. 3, pp. 28-35.

[2]  Harvey, A. and Koopman, S.J. (2015). Messy time series: A unified approach. Advances in Econometrics, Vol. 13, pp. 103-143.

[3]  Lee, S.M. and Roberts, S. (2008). Multivariate time series forecasting in incomplete environments. Technical report PARG 08-03. Robotics Research Group, Department of Engineering Science, University of Oxford.

[4]  Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer, Berlin Heidelberg New York.

[5]  Patton, A.J. (2006). Estimation of multivariate models for time series of possibly different length. Journal of Applied Econometrics, 21: 147-173.

[6]  Peterson, H. and Pederson, B. (2010). Handling missing values and outliers in a purely random data. Journal of Mathematical Sciences, Vol. 3, No. 7, pp. 37-41.

[7]  Rodrigues, P.C. and Carvalho, M. (2014). Spectral modelling of time series with missing data. Centre for Mathematics and Applications, Faculty of Science and Technology, Nova University of Lisbon; 2829-516 Caparica, Portugal.

[8]  Stoffer, D.S. (1986). Estimation and identification of space-time ARMAX models in the presence of missing data. Journal of the American Statistical Association, Vol. 81, No. 395, Theory and Methods.

[9]  Zuur, A.F. et al (2015). Estimating common trends in multivariate time series using dynamic factor analysis. J. ana. In Stat. Vol. 2, No. 5, pp. 19-24.

Appendix

Appendix 1. VAR Order Selection with Missing Observations

Appendix 2. Analysis Result with Missing Observations

Appendix 3. VAR Order Selection with Replaced Estimates

Appendix 4. Analysis Result with Replaced Estimates

Appendix 5. [Monthly Cases of Hypertension and Diabetes (2009 – 2013)]
