Comparison of the Performance of a Low-Cost Statistical ARIMA Model with the GFDL CM2.1 and CGM3 Atmosphere-Ocean General Circulation Models in Assessing the Effects of Climate Change on Temperature and Precipitation in the Taleghan Basin

Given the importance of climate change, the need to develop a fast and accurate assessment tool is undeniable. Although it is uncommon to compare a statistical model with specialized models designed to account for the non-linear complexities of a phenomenon, in this study the statistical ARIMA model was analyzed and evaluated against the GFDL CM2.1 and CGM3 Atmosphere-Ocean General Circulation Models (AOGCMs) in order to investigate the effects of climate change on temperature and precipitation in the Taleghan basin. The results showed that although the GFDL CM2.1 model performed better on the MAE and R² validation criteria, and its predicted temperature followed a trend similar to the observational data, the difference between the model results and the observations is significant. The CGM3 model performed better on R² for precipitation and temperature and on MAE for the long-term average of precipitation, and its trend was similar to that of the observed data. However, for the long-term averages of both temperature and precipitation, the general predicted trend lay considerably far from the observed values. In contrast, although the predictions of the statistical ARIMA model showed some fluctuations, they conformed better to the general trend of the observations. These results show that, contrary to popular belief, in some cases such as the one investigated here, even inexpensive statistical models can provide acceptable results.


Introduction
The Earth's climate system consists of four components: the atmosphere, cryosphere, hydrosphere and biosphere. Climatology investigates the weather of a particular region over time intervals that usually span decades. Studies have shown that internal factors resulting from interactions between the climate components, together with external factors such as solar radiation, volcanic activity and the excessive increase in greenhouse gases, cause an imbalance between these components. Among the external factors, only the increase in greenhouse gases can affect the Earth's climate system abnormally. Climate change refers to changes in climate that persist for a long period, such as several decades or more. These changes can be due to natural climate variability or to human activities [1]. The climate may become warmer or colder, and the average value of each climatic factor may increase or decrease over time. Climate change is a complex, long-term atmosphere-ocean phenomenon that can be influenced by natural factors, such as volcanoes, solar activity and the interacting ocean and atmosphere, or can result from human activities [2]. The growth of industries and factories since the beginning of the industrial revolution, the resulting consumption of fossil fuels, the destruction of forests and grasslands, and changes in the use of agricultural land are all results of human activities. They have increased the concentration of greenhouse gases, particularly CO2, in recent decades: the concentration of this gas rose from 280 ppm in 1750 to 379 ppm in 2005. Research shows that if the current trend in fossil fuel use continues, the concentration could exceed 600 ppm by the end of the twenty-first century [1]. Many researchers consider the gradual warming of the globe and the oceans caused by the increase in greenhouse gases to be the most important factor in climate change.
Global warming has caused two important phenomena in the past century: an increase in the average global temperature and, consequently, a rise in sea level. Because even small changes in hydrological variables can lead to considerable changes in water resources, climate change has a considerable influence on precipitation, evaporation and surface runoff at regional and local scales [3]. Because of the negative effects of changing climatic variables on the Earth's climate and on various systems, this phenomenon has been ranked as the most dangerous of the ten issues threatening humanity in the 21st century [4]. It is worth noting that in this classification the threat of weapons of mass destruction stands in third place. As climatic variables change, other systems affected by them, such as water resources, agriculture, the environment, health and the economy, will change as well [4].
Given the undeniable importance of climate change, there is a strong need for a tool that can assess the effects of this phenomenon with suitable speed and accuracy. Although it is uncommon to compare a statistical tool with specialized models originally designed to account for the non-linear complexity of a phenomenon, in this research the statistical ARIMA model was analyzed and evaluated against two Atmosphere-Ocean General Circulation Models, GFDL CM2.1 and CGM3, in order to investigate the effects of climate change on temperature and precipitation in the Taleghan basin.
With the development of models and numerical tools, statistical modeling has advanced through the use of sophisticated statistical methods based on long-term climatic elements, phenomena and major climate drivers. The investigations conducted by Katsoulis in 1987 [5], Karl in 1988 [6], Galbraith and Green in 1992 [7], Graf et al. in 1995 [8], Brunetti et al. in 2000 [9], Yuu and Hoshino in 2003 [10], Li et al. in 2004 [11] and Eyni in 2014 [12] are some of the related studies in this area. The IPCC provided its first series of emission scenarios in 1992 under the name IS92 (IS92a to IS92f). In this scenario the amount of greenhouse gases increases at a fixed rate until 2100. In 1996, in order to update and replace the IS92 scenarios, a set of emission scenarios known as the Special Report on Emissions Scenarios (SRES) was introduced for the study of climate change. Emission scenarios are designed to explore future developments in the global environment, with special reference to emissions of greenhouse gases and of particles suspended in the atmosphere. The IPCC has presented four major assessments of climate change (FAR, 1990; SAR, 1995; TAR, 2001; and AR4, 2007). Since 2007, the use of the GCMs presented in AR4 in climate change studies has grown considerably compared with the models presented in previous reports. The output of these models is accessible from the Data Distribution Center (DDC), which was created in 1998 on the recommendation of the Working Group on Climate Impact Assessment.

ARIMA Model
An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model which is very common in statistics and econometrics, and in particular in time series analysis. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). ARIMA models are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the "integrated" part of the model) can be applied one or more times to eliminate the non-stationarity [13].
The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior) values. The MA part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The I (for "integrated") indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible. Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q), where the parameters p, d and q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p,d,q)(P,D,Q)m, where m refers to the number of periods in each season, and the uppercase P, D and Q refer to the autoregressive, differencing and moving-average terms of the seasonal part of the ARIMA model [14,15]. When two of the three terms are zero, the model may be referred to by the non-zero parameter, dropping "AR", "I" or "MA" from the acronym describing the model. For example, ARIMA(1,0,0) is AR(1), ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1).
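Given a time series of data $X_t$, where $t$ is an integer index and the $X_t$ are real numbers, an ARMA($p'$,$q$) model is given by [16]:

$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

where $L$ is the lag operator, the $\alpha_i$ are the parameters of the autoregressive part of the model, the $\theta_i$ are the parameters of the moving average part and the $\varepsilon_t$ are error terms. The error terms $\varepsilon_t$ are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean. If the polynomial $\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)$ has a unit root (a factor $(1 - L)$) of multiplicity $d$, then it can be rewritten as:

$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right) (1 - L)^d$$

An ARIMA($p$,$d$,$q$) process expresses this polynomial factorisation property with $p = p' - d$, and is given by:

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) (1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

It can thus be thought of as a particular case of an ARMA($p+d$,$q$) process whose autoregressive polynomial has $d$ unit roots. (For this reason, no ARIMA model with $d > 0$ is wide-sense stationary.) The above can be generalized as follows:

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) (1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

This defines an ARIMA($p$,$d$,$q$) process with drift $\delta / \left(1 - \sum \varphi_i\right)$.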
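The roles of the differencing operator and the autoregressive part can be illustrated with a small, dependency-free sketch. It is not the fitting procedure used in this study (which the text does not describe): the function names `difference`, `fit_ar1` and `forecast_arima_110` are hypothetical, the model is restricted to ARIMA(1,1,0) without drift, and the coefficient is estimated by simple least squares rather than maximum likelihood.

```python
def difference(series, d=1):
    """Apply the (1 - L)^d operator, i.e. difference the series d times."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + eps_t.

    Assumes a zero-mean series with no constant term (an illustrative
    simplification).
    """
    numerator = sum(prev * cur for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


def forecast_arima_110(series, steps):
    """Forecast with ARIMA(1,1,0): AR(1) on the first differences.

    Returns the estimated phi and the forecast levels obtained by
    accumulating the forecast differences onto the last observed level.
    """
    diffs = difference(series, 1)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # AR(1) step on the differenced series
        level = level + last_diff     # undo the differencing ("integrate")
        forecasts.append(level)
    return phi, forecasts


# Hypothetical toy series whose increments halve at every step.
toy_series = [0.0, 8.0, 12.0, 14.0, 15.0]
phi_hat, toy_forecast = forecast_arima_110(toy_series, steps=2)
```

In practice a dedicated time-series library would also select the orders p, d and q and estimate moving-average terms, which this sketch deliberately leaves out.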

Atmosphere General Circulation Models and Downscaling
General Circulation Models (GCMs) are developed to simulate the current climate of the Earth and are able to predict future climate change [17]. These models were first introduced and used in the 1960s, based on Phillips's own investigations. Atmosphere General Circulation Models solve the continuity equations of fluid dynamics on discrete spatial and temporal scales, and their structure is the same as that of numerical weather prediction models. The main difference is that numerical weather prediction models forecast over a short period (a few days) from precisely defined initial conditions, and their accuracy is limited to regions with dimensions of less than 150 kilometers. The grid defined for a GCM, by contrast, may span several degrees of latitude and longitude in order to simulate long-term weather [3].
In early GCMs, the physical characteristics of the atmosphere at the Earth's surface were used as boundary conditions. More recent models use atmosphere-ocean boundary conditions for ocean modeling, and surface temperature and soil moisture for the land surface. Major weaknesses of these models are their inability to model the effects of clouds on the atmosphere and their inadequate accuracy in representing hydrological factors such as land use. In general, GCMs perform better in simulating and predicting large-scale climate events, such as very large storms, than in representing local and regional climate processes such as the rainfall-runoff process [3].
Due to computational limitations, general circulation predictions are carried out by a limited number of centers equipped with supercomputers dedicated to these calculations. Currently, more than 40 organizations around the world have developed different general circulation models for the Earth. One of the major limitations in using the output of GCMs is the large spatial and temporal scale of their computational cells, which does not match the scale required by hydrological models. Different methods, known as downscaling methods, exist to produce regional climate scenarios from the climate scenarios of GCMs. In the proportional method, monthly ratios are usually derived for the historical series. For this purpose, climate change scenarios for temperature and precipitation must be produced first. To calculate the climate change scenario in each model, the difference in temperature (equation (5)) and the ratio of rainfall (equation (6)) are calculated between the long-term monthly averages of the future period and of the baseline period simulated by the model, for each cell of the computational grid [18]:

$$\Delta T_i = \bar{T}_{GCM,fut,i} - \bar{T}_{GCM,base,i} \qquad (5)$$

$$\Delta P_i = \frac{\bar{P}_{GCM,fut,i}}{\bar{P}_{GCM,base,i}} \qquad (6)$$

In the above equations, $\Delta T_i$ and $\Delta P_i$ are the climate change scenarios of temperature and precipitation for month $i$, $\bar{T}_{GCM,fut,i}$ is the long-term average temperature simulated by the GCM in the future period for month $i$, and $\bar{T}_{GCM,base,i}$ is the long-term average temperature simulated by the GCM in the period that coincides with the observed period for month $i$; the precipitation terms are defined in the same way [19]. Because of the large computational cells in GCMs, the simulation of climatic fluctuations is affected by noise. To eliminate this noise, the long-term periodic average of the data is usually used in climate change calculations instead of the raw GCM data, and the Change Factor method is then used to downscale the data. In the Change Factor method, to obtain the time series of the future climate scenario, the climate change scenarios are added to the observed values (for temperature) or multiplied by them (for precipitation):

$$T = T_{obs} + \Delta T$$

$$P = P_{obs} \times \Delta P$$

In the above equations, $T_{obs}$ and $P_{obs}$ are the observed time series of temperature and precipitation in the base period, respectively; $T$ and $P$ are the time series of the climatic scenarios of temperature and precipitation in the future period; and $\Delta T$ and $\Delta P$ are the downscaled climate change scenarios of temperature and precipitation.
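The Change Factor procedure described above (a monthly difference for temperature, a monthly ratio for precipitation, then application to the observed series) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `change_factors` and `apply_change_factors` and all numerical values are hypothetical, and the example uses only two "months" to keep it short.

```python
def change_factors(gcm_base_temp, gcm_future_temp, gcm_base_precip, gcm_future_precip):
    """Monthly climate change scenarios from long-term GCM monthly means.

    Temperature uses a difference (future minus baseline) and
    precipitation uses a ratio (future divided by baseline).
    """
    delta_t = [fut - base for fut, base in zip(gcm_future_temp, gcm_base_temp)]
    delta_p = [fut / base for fut, base in zip(gcm_future_precip, gcm_base_precip)]
    return delta_t, delta_p


def apply_change_factors(observed, deltas, additive):
    """Perturb an observed series with the monthly scenarios.

    `observed` is a list of (month_index, value) pairs, where month_index
    selects the entry of `deltas` to use. Temperature is perturbed
    additively, precipitation multiplicatively.
    """
    if additive:
        return [value + deltas[month] for month, value in observed]
    return [value * deltas[month] for month, value in observed]


# Hypothetical long-term monthly means for two months (not data from this study).
d_temp, d_precip = change_factors(
    gcm_base_temp=[0.0, 10.0], gcm_future_temp=[1.5, 12.0],
    gcm_base_precip=[50.0, 20.0], gcm_future_precip=[40.0, 25.0],
)

# Hypothetical observed series, tagged with the month index of each value.
future_temp = apply_change_factors([(0, 2.0), (1, 11.0), (0, -1.0)], d_temp, additive=True)
future_precip = apply_change_factors([(0, 100.0), (1, 40.0)], d_precip, additive=False)
```

Using a ratio for precipitation keeps the perturbed values non-negative, which an additive change would not guarantee for dry months.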

Introduction of the Case Study
According to the water master plan of Iran, the large SefidRood River basin is divided into 17 sub-basins. The upper Taleghan basin, with an area of 960 km², is located in the east of the SefidRood basin and covers 2.5% of its surface [20].
The upstream part of the Taleghan catchment is located in the central Alborz Mountains, and its most important features are its high altitude and steep slopes. The average elevation of the basin is 2665 meters above sea level, and its maximum and minimum elevations are 4300 and 1390 meters, respectively. In addition, 50 percent of the Taleghan catchment has a slope of more than 40%, and its general orientation is east-west. Precipitation varies between 250 and 1000 mm per year across the catchment, and the annual average rainfall over the entire basin is 600 mm. The Taleghan River, which flows through this catchment, is 85 km long and reaches its maximum discharge in spring [21].

Methodology
Because the rainfall, temperature and runoff data available at the selected stations overlap only for a limited time, and because the rainfall-runoff model requires a long record, the period 1968-2008 was chosen. The monthly rainfall, temperature and runoff data of the selected stations in the basin were corrected and completed. To obtain the average rainfall of the basin, a weighted average of the selected stations was used, in which the weight of each station was based on its elevation and area relative to the average elevation and total area of the basin.
According to Chart 1, the average annual rainfall in the basin is 600 mm, and the rainiest months are April and May. At all stations, the winter months receive more than 45 millimeters of rainfall. The Joostan, Gateh deh and Galidar precipitation stations have a strong influence on the average rainfall, and consequently on the catchment runoff, because of their precipitation rates and elevations. According to the data from Zidasht station, the average temperature in this area is 7.8°C. The absolute maximum temperature is 37°C, in July, and the minimum temperature is -18°C, measured in March [22].
As Chart 2 shows, for the temperature variable the monthly data of Zidasht station, at an elevation of 2000 meters, were taken as the basis because this station has the smallest elevation difference from the average elevation of the basin (372'2 m). Using a temperature-elevation gradient and the elevation difference between Zidasht station and the average elevation of the upstream Taleghan basin, the average temperature of the basin was calculated [23].
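The elevation adjustment described above can be sketched as a linear lapse-rate correction. This is only an illustration and not the authors' procedure: the function name `adjust_to_basin_elevation`, the monthly temperatures and the gradient of 6.5 °C per kilometer are assumptions (the text does not state the gradient that was used). Only the 2000 m station elevation and the 2665 m mean basin elevation are taken from the text.

```python
def adjust_to_basin_elevation(station_temps_c, station_elev_m, basin_elev_m,
                              lapse_rate_c_per_m=0.0065):
    """Shift station temperatures to the mean basin elevation.

    Assumes temperature decreases linearly with height at the given
    lapse rate (assumed value, not taken from the study).
    """
    elevation_gain_m = basin_elev_m - station_elev_m
    return [temp - lapse_rate_c_per_m * elevation_gain_m for temp in station_temps_c]


# Hypothetical monthly mean temperatures at the reference station, in °C.
station_monthly_temps = [-2.0, 0.5, 5.0]
basin_monthly_temps = adjust_to_basin_elevation(
    station_monthly_temps, station_elev_m=2000.0, basin_elev_m=2665.0
)
```

A gradient fitted to the local stations would normally replace the assumed constant.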
Because of limitations in the rainfall and temperature data available for the study area, the period 1968-2008, which was common to all stations, was chosen as the base period. To evaluate the performance of the selected GCMs in simulating regional climate variables, the A2 emission scenario was used, because it represents more severe greenhouse gas emission conditions in future periods than the other emission scenarios. Monthly precipitation and temperature data from the selected GCMs, which contain time series of these variables for computational cells covering the globe, were obtained from CCCSN. The monthly precipitation and temperature data for the base period were then extracted for the computational cells containing the selected stations of the basin (the original cells), and the 40-year averages of monthly precipitation and temperature for those cells were determined. Finally, these values were compared with the 40-year averages of the observed monthly precipitation and temperature of the basin in the base period.
In modeling temperature and precipitation with ARIMA, a previous study showed that the forecast time series (Forecast) produced by the ARIMA model used here gives significantly more favorable results than the Lower and Upper time series, which overestimate. Therefore, only the Forecast time series was examined in this study. In the next step, the averages of the time series predicted by the ARIMA model and of the AOGCM results examined in this study were calculated and validated against the average of the observed data in the basin.

Methods and Criteria for Validation
In statistics, correlation refers to a statistical relationship between two variables. The Pearson correlation coefficient, developed by Karl Pearson from an original idea of Francis Galton, measures the linear relationship between two random variables. The correlation coefficient takes values between -1 and +1: the closer it is to +1, the stronger and more direct the correlation; the closer it is to -1, the stronger but inverse the correlation; and zero means a lack of correlation. In this study, the Pearson correlation coefficient was used to compare the generated data with the observed data. By definition, for a statistical sample of n pairs $(O_i, P_i)$:

$$r = \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2} \, \sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}}$$

The mean absolute error (MAE) is a criterion that measures how close the predicted results are to the desired ones. It is calculated by the following equation:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| P_i - O_i \right|$$

Root mean square deviation (RMSD), or root mean square error (RMSE), is a common criterion calculated from the differences between the values predicted by a model or estimator and the observed data. In effect, RMSD represents the sample standard deviation of the differences between the predicted values and the observed data. These differences are called residuals when the calculations are performed over the sample used for estimation, and forecast errors when they are computed out of sample. RMSE is an accepted measure for comparing the prediction errors of a particular variable and is calculated by the following equation:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2}$$

In all equations, $O_i$ is the observed data, $P_i$ is the estimated value, $\bar{O}$ is the average of the observed data, $\bar{P}$ is the average of the estimated data and n is the number of data points.
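The three validation criteria can be computed directly from paired observed and predicted values, as in the following sketch. The function names and the sample values are hypothetical and are not data from this study; the coefficient of determination is taken here as the square of the Pearson coefficient, which is an assumption about how R² was obtained.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    o_bar, p_bar = mean(observed), mean(predicted)
    covariance = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    spread_o = sum((o - o_bar) ** 2 for o in observed)
    spread_p = sum((p - p_bar) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_o * spread_p)


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical long-term monthly means (observed versus modeled).
observed_means = [1.0, 2.0, 3.0, 4.0]
predicted_means = [2.0, 4.0, 6.0, 8.0]

r_squared = pearson_r(observed_means, predicted_means) ** 2
mean_abs_error = mae(observed_means, predicted_means)
root_mean_sq_error = rmse(observed_means, predicted_means)
```

The toy values show why R² alone can mislead: the predicted series is perfectly correlated with the observed one while being twice as large, so the correlation is maximal although MAE and RMSE are far from zero.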

Analysis and Assessment of Results
Chart 3 and Chart 4 compare the observed monthly average precipitation and temperature with those of the models examined in this study for the base period. To evaluate the performance of the models in simulating the regional climatic variables, the coefficient of determination (R²), root mean square error (RMSE) and mean absolute error (MAE) were used as performance criteria. Table 2 shows the performance criteria of the AOGCM models in simulating the precipitation and temperature data of the region relative to the average observed data of the basin.

Conclusion
According to Table 2, a comparison of the forecasts using the R², MAE and RMSE validation criteria for temperature and precipitation shows that, while both the ARIMA model and the GFDL CM2.1 model give favorable results on the R² criterion, the ARIMA model is not able to predict precipitation precisely according to this criterion.
In terms of the RMSE validation criterion, the ARIMA model provides appropriate results for temperature and precipitation; for average rainfall, however, the MAE of GFDL CM2.1 is more acceptable. Similarly, a comparison of the predictions using the R², MAE and RMSE validation criteria for temperature and precipitation shows that, while both the ARIMA model and CGM3 give favorable results on the R² criterion, the ARIMA model is not able to predict precipitation precisely according to this criterion. In terms of RMSE, the ARIMA model shows acceptable results, although both models perform equally for precipitation on this criterion. In terms of MAE, there is no significant difference for the rainfall data, but the ARIMA model performs better for temperature. According to Chart 3, the average rainfall predicted by the ARIMA model in November, February, April and July is considerably higher than the long-term average of the basin. In contrast, the GFDL CM2.1 model predicts low rainfall in most months in the Taleghan basin, so its values lie considerably far from the long-term average of the basin. A comparison of the overall trends shows that the average rainfall predicted by the ARIMA model fits the mean of the basin observations slightly better than that of the GFDL CM2.1 model. Despite the relative similarity of the general trend predicted by the CGM3 model, the average rainfall estimated by this model also differs significantly from the observed data: except for June, it is much lower than the basin average.
A comparison of the general trends indicates that the average rainfall predicted by the ARIMA model agrees better with the observed data in the basin than that of the CGM3 model. According to Chart 4, the average temperature predicted by the ARIMA model is far lower than the long-term average of the basin in July. Otherwise, the mean temperature predicted by the ARIMA model conforms satisfactorily to the average of the observed data, except in October and June, when it is higher than the average temperature of the catchment. In contrast, the GFDL CM2.1 model forecasts temperatures that are markedly higher than the basin average in most months, except in December, when the predicted temperature is much lower than the basin average. Examining the general trend of the curves, it can be concluded that the ARIMA model predicted the changes better and deviated less from the basin average than the GFDL CM2.1 model. Furthermore, the CGM3 model estimates temperatures significantly higher than the catchment average, except in October, November, March and September, when its predictions are lower. By the same examination of the general trend of the curves, the ARIMA model also predicted the changes better and deviated less from the basin average than the CGM3 model.
In summary, the investigation of the effects of climate change on the studied basin suggests that, although the GFDL CM2.1 model performs better in terms of the MAE and R² validation criteria and, for temperature, shows a trend similar to the overall trend of the observed data, the curve of the average temperatures predicted by this model differs considerably from the curve of the average observed data. In contrast, the values predicted by the ARIMA model agree acceptably with the observed statistics.
Likewise, although the CGM3 model performs well on the R² validation criterion for both temperature and precipitation and on the MAE validation criterion for precipitation, and shows a trend similar to the overall trend of the observed data, the general trend predicted by this model for the averages of both parameters differs significantly from the curve of the mean observed data, most noticeably for precipitation. For temperature, it can therefore be concluded that, although there is a marginal difference between the ARIMA and AOGCM models, all three models give excellent results on the R² criterion, given its high value. For the RMSE and MAE error criteria, however, the error of the ARIMA model is half of that of the AOGCM models, which indicates the better performance of the ARIMA model. For the rainfall data, the R² values of the CGM3 and GFDL CM2.1 models are almost twice that of the ARIMA model. However, the MAE criterion ranks the CGM3 model first, the ARIMA model second and the GFDL CM2.1 model third. The RMSE criterion also indicates slightly better performance of the ARIMA model than CGM3, with a relatively small difference, whereas the GFDL CM2.1 model differs significantly from the other two models.