Source Traffic Modeling Using Pareto Traffic Generator

MOHSEN HOSAMO

Journal of Computer Networks

Source Traffic Modeling Using Pareto Traffic Generator

MOHSEN HOSAMO

Abstract

This paper describes realistic modeling of source burst traffic using Pareto distribution. The source traffic generators, which was investigated depending on the Pareto distribution, is ParetoON/ParetoOFF for generating the burst length and gap time. Network parameters such as allowed cell rate (ACR), switch input / output rate, memory access time, queue length and cell transfer delay (CTD) have been estimated considering six data sources. Mathematical model of the source data traffic has been developed and results for ParetoON/ParetoOFF traffic generators are presented considering up to 1000 count values of the uniformly distributed random number S. The effect of shape parameter “α” of ParetoON/ParetoOFF, is reported.

Cite this article:

  • MOHSEN HOSAMO. Source Traffic Modeling Using Pareto Traffic Generator. Journal of Computer Networks. Vol. 4, No. 1, 2017, pp 11-19. http://pubs.sciepub.com/jcn/4/1/2
  • HOSAMO, MOHSEN. "Source Traffic Modeling Using Pareto Traffic Generator." Journal of Computer Networks 4.1 (2017): 11-19.
  • HOSAMO, M. (2017). Source Traffic Modeling Using Pareto Traffic Generator. Journal of Computer Networks, 4(1), 11-19.
  • HOSAMO, MOHSEN. "Source Traffic Modeling Using Pareto Traffic Generator." Journal of Computer Networks 4, no. 1 (2017): 11-19.

Import into BibTeX Import into EndNote Import into RefMan Import into RefWorks

At a glance: Figures

1. Introduction

A good simulation for the network traffic may lead to a better understanding of the characteristics of the network traffic itself, which in turns can help in designing better network devices. The first traffic model, based on Poisson traffic generator [1] was born in the context of telephony, where the call arrivals could be considered independent and identically distributed. It was proved that Poisson traffic generator not suitable to simulate the bursty traffic, where we need more appropriate traffic generators to simulate the bursty traffics. As the QoS performance [2] of the network is greatly influenced by the traffic behavior, it is essential to consider the appropriate model for simulation studies. For example, Asynchronous transfer mode (ATM), Wide area network (WAN) and Local Area Network (LAN) traffic are statistically self similar [3] which causes a highly variable or bursty traffic over a wide range of time scales and the bursts do not average out even over long time scales irrespective of the scale size. The self-similarity is usually attributed with heavy-tailed distributions of objects and therefore a closer model of bursty traffic independent of time scale can be achieved by considering heavy-tailed distributions. The self-similar heavy-tailed traffic can be generated with ON/OFF sources; where ON represents sending cells and OFF being idle state of the source. The superposition of a large number of ON/OFF sources results in self-similar [4, 5] traffic if ON and OFF durations are described by heavy tailed probability distributions [6, 7]. These heavy tailed distributions have high or even infinite variance and therefore show extreme variability on all time scales that matches to the realistic traffic bursts.

The goal of this paper is to point out the source traffic modelling of Pareto distribution (bursty traffic generator) along with the analytical and simulation studies.

The paper is organized as the following: Traffic burst modeling using Pareto distribution is described in Section 2. Analytical results introduced in section 3, and simulation results in section 4.

2. Traffic Burst Modeling Using Pareto Distribution

As the QoS performance of the network is greatly influenced by the traffic behavior it is essential to consider the appropriate model for simulation studies. For example, ATM and Ethernet LAN traffic are statistically self similar [7, 8, 9] which causes a highly variable or bursty traffic over a wide range of time scales [8] and the bursts do not average out even over long time scales irrespective of the scale size. The self-similarity is usually attributed with heavy-tailed distributions of objects and therefore a closer model of bursty network traffic independent of time scale can be achieved by considering heavy-tailed distributions [6, 7]. The self-similar heavy-tailed traffic can be generated with ON/OFF sources [8]; where ON represents sending cells and OFF being idle state of the source. The superposition of a large number of ON/OFF sources results in self-similar traffic if ON and OFF durations are described by heavy tailed probability distributions [10]. These heavy tailed distributions have high or even infinite variance and therefore show extreme variability on all time scales that matches to the realistic network traffic bursts. In the present work, we have considered Pareto power-tailed distribution to achieve close resemblance with the network traffic bursts.

A. Pareto Power-tailed Distribution

Pareto distribution [11, 12] is a power-tailed distribution for which there exists positive constants and such that for distribution function

(1)

Referring equation (1) it is evident that power tail function decays geometrically in contrast to the exponential decay of the exponential and gamma functions. Power tail distributions are a subset of a broader class of distributions whose tail decays slower than exponential (i.e.) and it is called heavy / fat / long tailed distribution. It includes lognormal as well as Weibull (with shape parameter <1) distributions. The tails of lognormal and Weibull distributions [16] decay slower than any exponential but not slower than Pareto distribution, which has the following probability density function ():

(2)

where is shape parameter (or tail index) that determines how fast the probability density function decays with, and M is minimum value of . The cumulative distribution function (CDF) is given by

(3)

Mean value (first moment) of a Pareto distribution is defined as:

(4)

Where is not permissible and the variance of the distribution is expressed as

(5)

From equation (5) it is clear that the variance of the distribution becomes infinite for whereas mean attains infinite value for . Therefore for self-similar traffic [14] and lower value of gives higher probability of assuming extremely large value.

The probability density functions for Pareto and Poisson distributions for different values of and (average cell arrival rate) respectively are shown in Figure 1 from which it is clear that Pareto distribution decays more slowly than exponential. The two parameters andof the Pareto distribution are related as

(6)

where is the ratio of (), the minimum time between arrival of consecutive cells to its mean value and the distribution becomes more bursty for value of closer to 1 [13].

Pareto distribution modeling involves a transformation function for converting a random variable of uniform distribution into Pareto distribution. Considering the fundamental transformation law of probabilities for two probability density functions and ,

(7)

Where is the probability density function of random variable and is another probability density function of random variable. Since is a random variable of a uniform distribution in therefore is a constant and hence

(8)

Equation (8) can be used to find source random variables by inverse transformation of . For Pareto distribution, the inverse can be easily found from equation (8) solving integral with corresponding to Pareto distribution which gives

(9)

Therefore the required transformation is

(10)

where is uniformly distributed between 0 and 1 () and defining

(11)

Considering truncated Pareto distribution [14] and taking to be the smallest non-zero value produced by a uniform random number generator, the resulting truncated Pareto-distribution values will not exceed , the maximum (or cutoff) value given as

(12)
Figure 1. Pareto and Poisson Probability Density Functions

The uniformly distributed values of can be generated through Mersenne Twister (MT) [15] for generating uniform random numbers between the smallest value () and 1.0 [,1.0] so that the range of Pareto-distributed values will be within the range and .

The mean value of truncated Pareto distribution can be written as:

(13)
(14)

or

(15)

is a coefficient used to find minimum inter-burst interval such that aggregated traffic from all sources would produce the desired link load. Defining and as the shape parameters for the and periods respectively the mean values of the burst length can be written as

(16)

and

(17)

Therefore the minimum value of the length between consecutive bursts (period) can be found from equation (17) using known values of load of the ith source and the cell size. The load variation of the traffic can be realized by synthesizing predefined load such that the resulting load. Therefore the aggregate traffic from N sources will generate the load ‘’ on a link with rate Mbps giving average throughput of Mbps which necessitates a good estimation of the load generated by an individual source that can be expressed as

(18)

where , , , and () are the mean burst length, cell size, mean length, and minimum inter-cell gap length respectively in bytes. Assuming that and are respectively the minimum and lengths and is the cell size then the load can be found from equation (18). is a secondary parameter dependent on load calculated by the source using Pareto distribution automatically for a desired load.

B. Inter-bursts and Next Burst Length Computations

Using equation (18) the mean inter-burst length () can be expressed as

(19)

Substituting the values of “” and “” from equations (16) and (17) respectively in equation (19), the minimum length can be written as

(20)

Considering the link rate and using relation , the minimum length in terms of time can be computed as

(21)

Now the value of =1/ACR (where “ACR” is the available cell rate) is readily available depending upon the selected value(s) of “ACR” that can be separately taken as variable and thus equation (21) can be re-written as

(22)

Therefore equation (22) can be used for computing the value of that would result in link load closer to using the selected values of K, , , and . However the traffic generated using equation (22) gives the mean values of and periods that slightly deviates from their theoretically computed values. This deviation in the and values is due to generation of discrete values of Pareto-distribution with uniform probability using equation (11) whereas Pareto distribution assumes continuous sample space. However, Pareto-like distribution [15] can be achieved keeping higher sample density towards lower end of the scale. As shown in Figure 2 which indicates probability distribution function for the pseudo-Pareto distribution where there are no values between 12 and 16, or 16 and 26.

Generating distribution function by aggregating samples over some window size gives a plot closer to the one shown in Figure 1 where distance between two neighboring points at the tail end will exceed the window size irrespective of the size of selected window. As a result some windows will contain zero samples even if number of samples approach infinity and this becomes the source of error in the mean values of () and (). This error is minimized by multiplying each of the calculated values of and by coefficient “C” [14], where and thus

(23)
Figure 2. Probability density function for the pseudo-Pareto distribution

Traffic generated using equation (22) shall have limited utility and therefore it is needed to consider the greater probability of having large period than occurrence of period to generate realistic traffic conditions. This is achieved by selecting the shape parameter larger than along with use of aforesaid heuristic coefficient “C” to generate traffic load very close to the specified conditions for all combinations of , and . The of the traffic is used to determine theoretical by selecting appropriate value of such that it matches to the observed value of .

The Pareto traffic generator computes the next idle time using the relation

(24)

The next burst length is computed by the Pareto traffic generator using the relation

(25)

3. Analytical Results

The minimum next length () has been taken as one cell length and , and as traffic parameters. When the source places the data cells and resource management cells into an input of the switch, the RM cells are inserted after number of data cells or whenever there is any BRM cell in the output of the Multiplexer. The ACR is adjusted according to the and values in cells using RRM algorithm. Selecting S (smallest non zero value Mersenne Twister (MT) [15]) =, = 0.3, =149.76 Mbps, = 53*8 bits, =1.05 and =1.15 and =1, the corresponding values of and “Coef” would be=1.069337, =1.044063, and = 13.6969, =7.2419 respectively and thus = using equation (23). Figure 3 shows and periods and their corresponding values for Pareto and uniform distribution random variables as function of 1000 count values.

The analytical results of and periods of Pareto traffic bursts for 1000 count values of shown in Figure 3 and the corresponding computed values of mean, variance, maximum and minimum values for and periods are given in Table 1. It is seen from Table 1 that the minimum values of and periods are “3.4199” and “1.0016” respectively which are greater than their corresponding values of and respectively.

Variation of and periods as functions of and for 100 count values of S are shown in Figure 4 and Figure 5 respectively. The increment steps for (1.05 - 1.99) and (1.15 - 1.99) are 0.094 and 0.084 for between 1 and 2. Referring Figure 4 and Figure 5 it can be concluded that the shape parameter should be selected between 1 and 1.5 for simulation of real bursty traffic because it offers higher peak values of and periods. This is further supported by the observation that for in the range 1.5 to 2.0, the peak values of and periods have relatively lower variation indicating smoother traffic.

Figure 3. ParetoON/ParetoOFF traffic generator for 1000 count values of S

4. Simulation Results

Bursty traffic network simulation considering Pareto traffic distribution (S = , = 0.3, = 53*8 bits, =149.76 Mbps, =1.05 and =1.15, =1.069337, =1.044063, =1, =7.2419, = 13.6969 and =) was carried out considering six data sources (i =1, 2, --6) sending data at the rate (i =1, 2, --6) between and . The performance of the switch was evaluated with respect to the ACR, cell transfer delay (CTD), switch input / output rate, memory access time, queue length and multiplexer output. The initial value of for sources was taken as 10 whereas the final value was kept between 200 to 700 in incremental steps of 100 for i =1, 2, --6 and taking buffer size=1000 cells, = 200 cells, = 100 cells, and assuming that each source has to send a total of 1000 cells. The variations in , switch input / output rate, memory access time, queue length and CTD as a function of the time are given respectively in Figure 5 - Figure 9. Simulation results for mean, variance, maximum and minimum values of, switch input/output rate, memory access time, queue length and are given in Table 2.

Referring Figure 6 and Table 2, the average for the sources is largely dependent on their respective values and it is more than corresponding values of the average of their and . This is advantageous because the sources achieve higher than the average value of their cell rate. The relative change between the input and output rate in Figure 7 is responsible for the variation in the queue length shown in Figure 8 and thus also the memory access time shown in Figure 9. Initially, when the source data rate reaches a threshold value from zero, the buffer starts filling if the input rate becomes greater than output rate and when buffer is full to a specified level the switch signals the source to start reducing its data rate. Consequently, source data rate reduces and its effect appears at the queue that causes reduction in rate of increase in the queue size. For an input rate smaller than output rate, the buffer starts becoming empty and when queue length reaches its minimum value, the source is signaled to start increasing its data rate. There is a time lag between the switch experiencing a traffic load variation, effect of switch feedback control, and the occurrence of the new load due to the feedback.

Referring the Figure 10 and Table 2 it is noticed that time delay between any source and its corresponding destination changes from very small value (close to zero) to a maximum of 0.87 sec. The average value of is 0.375 sec. Further, it can be observed from Figure 8 that minimum, maximum and average values of memory access time are 0.09, 092 and 0.2 sec. respectively and their corresponding values in terms of number of cells are “11”, “562” and “119.03” cells as shown in Figure 9. The simulation results indicating the performance of the switch under Pareto burst distribution of input data traffic are given in Figure 6-Figure 10.

Table 2. MEAN, VARIANCE, MAXIMUM AND MINIMUM VALUES OF ACR, SWITCH INPUT/OUTPUT RATE, MEMORY ACCESS TIME, QUEUE LENGTH and CTD

5. Conclusion

This paper described realistic modeling of source data traffic bursts using power tail Pareto distribution for network traffic. The mathematical model of the traffic burst has been developed and the values of and burst periods have been computed. The effect of shape parameter “” of the Pareto distribution on burst and periods for 1000 count values of S is investigated. The network performance of the switch has been evaluated through simulation studies and simulation results for ACR, switch input / output rate, memory access time, memory size and CTD are presented.

It shown that the lower value of “” in the rage 1.0 to 1.5 should be selected for simulation of real traffic bursts as it gives higher variation in peak values of and periods which is needed for exhibiting bursty nature of the traffic. This is further supported by relatively lower variation of and periods indicating smoother traffic for in the range 1.5 to 2.0.

References

[1]  Shui Feng (auth.), the Poisson-Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviors, Springer-Verlag Berlin Heidelberg, 2010.
In article      View Article
 
[2]  Ivan Marsic, Computer Networks - Performance and QOS, Rutgers, 2010.
In article      
 
[3]  Ciprian Tudor, Analysis of Variations for Self-similar Processes: A Stochastic Calculus Approach, Springer International Publishing, 2013.
In article      View Article
 
[4]  Ciprian Tudor, Analysis of Variations for Self-similar Processes: A Stochastic Calculus Approach, Springer International Publishing, 2013.
In article      View Article
 
[5]  Bahman Zohuri, Dimensional Analysis and Self-Similarity Methods for Engineers and Scientists, Springer International Publishing, 2015.
In article      View Article
 
[6]  Sergey Foss, Dmitry Korshunov, Stan Zachary, An introduction to heavy-tailed and subexponential distributions, Springer New York, 2011.
In article      View Article
 
[7]  Sergey Foss, Dmitry Korshunov, Stan Zachary, An Introduction to Heavy-Tailed and Subexponential Distributions [2 ed.], Springer-Verlag New York, 2013.
In article      View Article
 
[8]  Floyd and V. Paxon. Wide-Area Traffic: The failure of Poisson Modeling. in Proc. ACM Sigcomm; London, UK, 1994, 257-268.
In article      
 
[9]  E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson. On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE/ACM Trans. Networking; vol. 2, no. 1, 1-15, 1994.
In article      View Article
 
[10]  Walter Willinger, Murad S. Taqqu, Robert Sherman, and Daniel V. Wilson. Self-similarity Through High-variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level. IEEE/ACM Trans. Networking; 1997, 71-86.
In article      View Article
 
[11]  N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous univariate distributions. 2nd edition, Wiley: New York. 1994.
In article      
 
[12]  Barry C Arnold, Pareto Distributions Second Edition, Taylor and Francis, 2015.
In article      
 
[13]  R. Jain, “The Art of Computer Systems Performance Analysis,” John Wiley & Sons, 1991 - page 443, figure 26.2)
In article      
 
[14]  G. Kramer. On Generating Self-Similar Traffic using Pseudo-Pareto Distribution. A Short Tutorial-Like, Network Research Lab; Department of Computer Science – University of California 2000.
In article      
 
[15]  M. Matsumoto and T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Trans. Modeling and Computer Simulation; Vol. 8, No. 1, January, 1998, 3-30l.
In article      View Article
 
[16]  John I. McCool, Using the Weibull Distribution: Reliability, Modeling, and Inference, Wiley, 2012.
In article      
 
  • CiteULikeCiteULike
  • MendeleyMendeley
  • StumbleUponStumbleUpon
  • Add to DeliciousDelicious
  • FacebookFacebook
  • TwitterTwitter
  • LinkedInLinkedIn