American Journal of Applied Mathematics and Statistics

Coincidences, Goodness of Fit Test and Confidence Interval for Poisson Distribution Parameter via Coincidence

Victor Nijimbere

School of Mathematics and Statistics, Carleton University, Ottawa, Canada

Abstract

The probability of the coincidence of discrete random variables having Poisson distributions with parameters λ1, λ2, …, λn, and the associated moments, are expressed in terms of the hypergeometric function 1Fn, or of the modified Bessel function of the first kind if n=2. Considering the null hypothesis H0: λ1 = λ2 = … = λn = θ, where θ is some positive constant, asymptotic approximations of the probability and moments are derived for large θ using the asymptotic expansion of the hypergeometric function 1Fn, and that of the modified Bessel function of the first kind if n=2. Further, we show that if the sample mean is a minimum variance unbiased estimator (MVUE) for the parameter λi, then the probability that H0 is true can be approximated by that of a coincidence. In that case, a chi-square (χ2) goodness of fit test can be established and a 100(1-α)% confidence interval (CI) for θ can be constructed using the variance of the coincidence (or via coincidence) and the Central Limit Theorem (CLT).

Cite this article:

  • Victor Nijimbere. Coincidences, Goodness of Fit Test and Confidence Interval for Poisson Distribution Parameter via Coincidence. American Journal of Applied Mathematics and Statistics. Vol. 4, No. 6, 2016, pp 185-193. http://pubs.sciepub.com/ajams/4/6/4


1. Introduction

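A discrete random variable $X$ having a Poisson distribution with parameter $\lambda$ is denoted as $X \sim \mathrm{Poisson}(\lambda)$, and its probability mass function (p.m.f.) [9] is

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots \tag{1.1}$$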

The Poisson distribution is often used for modeling real-life discrete random phenomena and has many applications in economics, business, health care, science, and engineering [3, 5, 7, 13]. For instance, the number of hungry persons entering a McDonald's restaurant, the number of births, deaths and marriages, the number of patients arriving at an emergency room, the number of customers who call a call center by the end of the month to complain about a service problem, and many more follow a Poisson distribution model [3, 5, 7, 13]. The mathematical and statistical analysis throughout this paper may apply to most of the above examples; however, we mainly focus on scenarios of telephone calls in call centers so that the analysis can be followed without much difficulty.

Now, let us consider that are Poisson distributed random variables, independent and identically distributed , and Then the joint p.m.f of is obviously given by

(1.2)

In the context of applications, we may think about call centers each receiving an average of telephone calls in an hour. Let the random variable and represents the number of calls received by the center at the time. A coincidence will occur if all the centers receive exactly the same amount of calls

(1.3)

at the time (or in hours) for all Thus the coincidence the event is defined by

(1.4)

And its probability of occurrence is

(1.5)
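As a numerical illustration of the probability of a coincidence, the following minimal Python sketch sums the joint p.m.f. over the common value k. It assumes independent Poisson counts with strictly positive means; the function names `poisson_pmf` and `coincidence_probability` and the truncation rule for the infinite sum are illustrative choices, not part of the paper.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), lam > 0, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def coincidence_probability(lams):
    """P(X_1 = X_2 = ... = X_n) for independent X_i ~ Poisson(lams[i]).

    The infinite sum over the common value k is truncated far beyond the
    largest mean, where the remaining terms are negligible (an assumption
    of this sketch, not a bound taken from the paper).
    """
    lam_max = max(lams)
    k_max = int(lam_max + 20 * math.sqrt(lam_max) + 50)
    total = 0.0
    for k in range(k_max + 1):
        term = 1.0
        for lam in lams:
            term *= poisson_pmf(k, lam)
        total += term
    return total
```

For two centers with equal means θ, the value returned should agree with the known closed form exp(-2θ) I_0(2θ), where I_0 is the modified Bessel function of the first kind of order zero.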

More coincidences can be defined. For instance, let , such that for all ; and let denotes the set . In addition, let

where , and consider the event

(1.6)

where and the inner union is over all possible combinations such that and for all . Considering once again the call-center scenario, a near coincidence will occur if one center receives exactly one more call than other centers at the time, while the others receive exactly the same number of calls at that time. Setting so that becomes

where . Then a near coincidence is the event corresponding to the event with and and is defined by

(1.7)

Further coincidences can be defined using (1.6). Consider that 3 centers receive exactly 2 more calls than other centers in a given hour, while the others receive exactly the same number of calls. In this case, so that becomes

. Then, the coincidence is the event corresponding to the event with and and is

(1.8)
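The events described above (some centers receiving exactly a fixed number of calls more than all the others, which in turn tie) can be evaluated numerically. The sketch below is a hedged illustration: it assumes independent Poisson counts, and it assumes that the two indices stand for the surplus of calls (`l`) and the number of centers that are ahead (`m`). That reading of the subscripts, and the function name `shifted_coincidence_probability`, are assumptions made for this example.

```python
import math
from itertools import combinations


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), lam > 0, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def shifted_coincidence_probability(lams, l, m):
    """Probability that exactly m centers receive k + l calls while the
    remaining n - m centers all receive k calls, for some k >= 0.

    Requires l >= 1 and 1 <= m <= n - 1 so that the events belonging to
    different subsets of 'ahead' centers are disjoint.
    """
    n = len(lams)
    if l < 1 or not 1 <= m <= n - 1:
        raise ValueError("need l >= 1 and 1 <= m <= n - 1")
    lam_max = max(lams)
    k_max = int(lam_max + 20 * math.sqrt(lam_max) + 50)
    total = 0.0
    for ahead in combinations(range(n), m):  # which centers are ahead
        ahead = set(ahead)
        for k in range(k_max + 1):
            term = 1.0
            for i, lam in enumerate(lams):
                term *= poisson_pmf(k + l if i in ahead else k, lam)
            total += term
    return total
```

With `l = 1` and `m = 1` this corresponds to the near coincidence of the call-center example, and with `l = 2` and `m = 3` to the case of three centers receiving exactly two more calls than the others.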

We may define as many coincidences as we wish using (1.6). Is this important? It certainly is. Understanding (or interpreting) such events (coincidences) and using them for statistical inference and other purposes has attracted researchers in many fields, such as brain and cognitive sciences, psychology, law, discrete mathematics, physics and, of course, probability and statistics [7, 8, 10, 12]. In this paper, we show (later in section 6) that a coincidence such as the event $C$ given by (1.4) may lead to interesting statistical inference results, as it can be interpreted as the null hypothesis of a statistical hypothesis test.

Next, let us consider the null hypothesis

(1.9)

against the alternative hypothesis

(1.10)

We observe that is a linear function of the time . This makes the test hypothesis a linear function of time. Moreover, if hours for all, see (1.3), then , where . Alternatively, we may consider another statistics test on slopes which is steady in time. In that case, we should consider the null hypothesis

(1.11)

against the alternative hypothesis

(1.12)

Returning to the call-center scenario, if is true almost surely (with probability 1), then the centers become undistinguishable. This means that knowing the average number of calls in one center in a hour, for example , implies we have enough information for all centers. In this case, is a sufficient statistics for [4], and if for instance is large, then a confidence interval (CI) for can readily be obtained using the Central Limit Theorem (CLT) [4]. However taking into account the alternative hypothesis , one has to be careful when constructing a CI for since may be a biased estimator for .

Before we proceed to the main objectives of the paper, we shall first give the definition of the generalized hypergeometric function as it is an important mathematical tool that we are going to use in this paper.

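Definition 1. The generalized hypergeometric function is a special function denoted as ${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x)$ [1, 11], where $a_1, \ldots, a_p$ and $b_1, \ldots, b_q$ are arbitrary constants. It is given by the power series

$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k} \frac{x^{k}}{k!} \tag{1.13}$$

where $(a)_k = \Gamma(a+k)/\Gamma(a)$ for any complex $a$ and integer $k \ge 0$, with $(a)_0 = 1$, and $\Gamma$ is the gamma function.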
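The power series of Definition 1 can be evaluated term by term. The following Python sketch is a minimal illustration that assumes the series converges for the chosen arguments (it always does when p ≤ q) and that a fixed number of terms is enough; the function name `hyp_pfq` and the default `terms=200` are choices made for this example.

```python
import math


def hyp_pfq(a_params, b_params, x, terms=200):
    """Partial sum of the generalized hypergeometric series pFq(a; b; x).

    Consecutive terms are related by
        t_{k+1} = t_k * prod(a_i + k) / prod(b_j + k) * x / (k + 1),
    which follows from (a)_{k+1} = (a)_k (a + k) and avoids computing
    gamma functions or factorials explicitly.
    """
    total = 0.0
    term = 1.0  # the k = 0 term
    for k in range(terms):
        total += term
        ratio = x / (k + 1)
        for a in a_params:
            ratio *= a + k
        for b in b_params:
            ratio /= b + k
        term *= ratio
    return total
```

For example, `hyp_pfq([], [], x)` reproduces the exponential series, and `hyp_pfq([1], [1] * n, x)` equals the sum of x^k / (k!)^n over k ≥ 0, since (1)_k = k!.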

This paper has two main goals. As a first goal, the probability and moments of $C$, see (1.4), are expressed in terms of the hypergeometric function ${}_1F_n$. Since the properties of this function are well known [1, 11], its asymptotic expansion is used to derive the asymptotic approximations for the probability and moments of the event $C$ for large $\theta$, under the null hypothesis $H_0$, see (1.9). The probability $P(C_{l,m})$, where $C_{l,m}$ is given by (1.6), is also expressed in terms of ${}_1F_n$, and its asymptotic expression under the null hypothesis $H_0$ is also derived.

The second goal is to establish a chi-square goodness of fit test to examine $H_0$, see (1.9), via the variance of the coincidence, and to propose a new method to construct a CI for the parameter $\theta$ using the variance of the coincidence.

The paper is organized as follows. In section 2, formulas for the probabilities of the events $C$ and $C_{l,m}$, $P(C)$ and $P(C_{l,m})$, are obtained in terms of the hypergeometric function ${}_1F_n$ and under the null hypothesis $H_0$ given in (1.9). The moments of $C$ and its variance are expressed in terms of ${}_1F_n$ in section 3. Since the asymptotic expansion of ${}_1F_n$ is well known [11], it is used to obtain the asymptotic expressions of the probabilities $P(C)$ and $P(C_{l,m})$ in section 4 and that of the variance of $C$ in section 5. The asymptotic expansions in sections 4 and 5 are valid for $n \ge 3$. In the case with $n = 2$, the moments are expressed in terms of the modified Bessel functions of the first kind in Appendix A. In section 6, a chi-square test to examine the null hypothesis $H_0$ in (1.9) is established and a confidence interval for $\theta$ is constructed using the variance of the coincidence obtained in section 5. A discussion and conclusions are given in section 7.

2. The Probabilities P(C) and P(Cl,m) in Terms of the Hypergeometric Function 1Fn

In this section, the probabilities of the event $C$ and of the event $C_{l,m}$, $P(C)$ and $P(C_{l,m})$ respectively, are written in terms of the hypergeometric function ${}_1F_n$. The results are summarized in the form of Theorems 1 and 2.

Theorem 1. The probability of the coincidence $C$, in (1.4), is given by

(2.14)

Then, under the null hypothesis $H_0$, see (1.9),

(2.15)

Proof. From (1.5), we have

(2.16)

Hence, under $H_0$,

(2.17)
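For orientation, the computation behind Theorem 1 can be sketched as follows. This is a hedged reconstruction, not a quotation of (2.14)-(2.17): it assumes independent $X_i \sim \mathrm{Poisson}(\lambda_i)$, a coincidence meaning $X_1 = X_2 = \cdots = X_n$, and the identity $(1)_k = k!$ for the Pochhammer symbol. The way the arguments of ${}_1F_n$ are listed is likewise an assumption.

```latex
\begin{align*}
P(C) &= \sum_{k=0}^{\infty} \prod_{i=1}^{n} \frac{e^{-\lambda_i}\lambda_i^{k}}{k!}
      = e^{-(\lambda_1+\cdots+\lambda_n)} \sum_{k=0}^{\infty}
        \frac{(\lambda_1\lambda_2\cdots\lambda_n)^{k}}{(k!)^{n}} \\
     &= e^{-(\lambda_1+\cdots+\lambda_n)} \sum_{k=0}^{\infty}
        \frac{(1)_k}{\bigl[(1)_k\bigr]^{n}}\,
        \frac{(\lambda_1\lambda_2\cdots\lambda_n)^{k}}{k!}
      = e^{-(\lambda_1+\cdots+\lambda_n)}\,
        {}_1F_n\bigl(1;\,\underbrace{1,\ldots,1}_{n};\,\lambda_1\lambda_2\cdots\lambda_n\bigr), \\
\intertext{and, setting $\lambda_1 = \cdots = \lambda_n = \theta$,}
P(C) &= e^{-n\theta}\, {}_1F_n\bigl(1;\,1,\ldots,1;\,\theta^{n}\bigr).
\end{align*}
```

The middle step only rewrites $1/(k!)^{n}$ as $(1)_k / \bigl([(1)_k]^{n}\, k!\bigr)$, which is what brings the series into the form (1.13) with one upper parameter and $n$ lower parameters.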

It is known that, for $n = 2$, $P(C)$ can be written in terms of the modified Bessel function of the first kind [7, 14]. Theorem 1 generalizes the results in [7] to any integer $n$. We would also like to point out that the series that Griffiths [7] calls remodified Bessel functions are in fact hypergeometric functions [1, 11].

Theorem 2. The probability of any event $C_{l,m}$ defined using (1.6) is given by

(2.18)

Then, under the null hypothesis $H_0$,

(2.19)

Proof.

(2.20)

where

(2.21)

Hence, under $H_0$,

(2.22)

3. Moments and the Variance Associated with the Event C in Terms of the Hypergeometric Function 1Fn

Here, the moments and the variance associated with the coincidence $C$ are written in terms of the hypergeometric function ${}_1F_n$. We mainly focus on the coincidence $C$ rather than $C_{l,m}$; the reason for doing so will become clear later in section 6. Results are summarized in Lemma 1 and Theorem 3.

Lemma 1. The moment of the coincidence is given by

(3.23)

where .

Proof. The moment associated with the coincidence is given by

(3.24)

Theorem 3. Under $H_0$, the first order moment (mean) associated with the coincidence $C$ is

(3.25)

while the second order moment is

(3.26)

Thus the variance associated with the coincidence $C$ is

(3.27)

Proof. Setting in (3.24) in Lemma 1, and using the fact that under , , gives

(3.28)

while setting and taking into account the null hypothesis gives

(3.29)

Hence, using the fact that the variance gives (3.27).
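The moments and the variance discussed in this section can be explored numerically. The sketch below rests on a loudly stated assumption: it takes the r-th moment associated with the coincidence to be the sum over k of k^r P(X_1 = ... = X_n = k) under the null hypothesis, without dividing by P(C). If the intended definition is normalized differently, the numbers change accordingly. The function names are hypothetical.

```python
import math


def coincidence_moment(theta, n, r):
    """Sum over k of k**r * P(X_1 = ... = X_n = k) for n independent
    Poisson(theta) counts (ASSUMED reading of 'moment of the coincidence';
    no normalization by P(C) is applied)."""
    k_max = int(theta + 20 * math.sqrt(theta) + 50)
    total = 0.0
    for k in range(k_max + 1):
        log_pmf = k * math.log(theta) - theta - math.lgamma(k + 1)
        total += (k ** r) * math.exp(n * log_pmf)
    return total


def coincidence_variance(theta, n):
    """Second moment minus the square of the first moment, using the
    assumed definition in coincidence_moment."""
    m1 = coincidence_moment(theta, n, 1)
    m2 = coincidence_moment(theta, n, 2)
    return m2 - m1 ** 2
```

Under this reading and for n = 2, the first and second moments reduce to θ exp(-2θ) I_1(2θ) and θ² exp(-2θ) I_0(2θ), which gives a convenient cross-check against the modified Bessel functions.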

4. Asymptotic Evaluation of P(C) and P(Cl,m) under H0, and for Large θ

In this section, we consider the case with $n \ge 3$, and derive the asymptotic approximations of the probabilities of $C$ and $C_{l,m}$ when $\theta$ is large. The case with $n = 2$ is considered later in Appendix A. The main reason is that the formulas used here do not work for $n = 2$. The results are summarized in Theorems 4 and 5.

Theorem 4. Under $H_0$, see (1.9), if $n \ge 3$, then for large $\theta$,

(4.30)

Proof. To prove (4.30), we use formulas (16.11.1), (16.11.3), (16.11.4) and (16.11.9) in [11]. Setting and in (16.11.3) yields and. Substituting and, and in (16.11.1) gives

(4.31)

where is given by formula (16.11.4) in [11]. We now let , use formula (16.11.9) in [11] and obtain

(4.32)

Hence, dropping the terms corresponding to while keeping the leading term corresponding to yields

(4.33)
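A large-θ approximation of this kind can be checked numerically. The sketch below compares the exact series for P(C) under equal means with the leading-order value 1 / (sqrt(n) (2πθ)^((n-1)/2)). That expression is the standard normal-approximation (Laplace) estimate of the sum of the n-th powers of Poisson probabilities; it is used here as an assumption and may be written differently from (4.30) and (4.33). Function names are illustrative.

```python
import math


def coincidence_probability_h0(theta, n):
    """Exact P(X_1 = ... = X_n) for n independent Poisson(theta) counts,
    summed in log space so that large theta does not overflow."""
    k_max = int(theta + 20 * math.sqrt(theta) + 50)
    total = 0.0
    for k in range(k_max + 1):
        log_pmf = k * math.log(theta) - theta - math.lgamma(k + 1)
        total += math.exp(n * log_pmf)
    return total


def coincidence_probability_leading_order(theta, n):
    """ASSUMED leading-order large-theta value:
    1 / (sqrt(n) * (2 * pi * theta) ** ((n - 1) / 2))."""
    return 1.0 / (math.sqrt(n) * (2.0 * math.pi * theta) ** ((n - 1) / 2.0))


if __name__ == "__main__":
    for theta in (10.0, 50.0, 200.0):
        exact = coincidence_probability_h0(theta, 3)
        approx = coincidence_probability_leading_order(theta, 3)
        print(theta, exact, approx, exact / approx)
```

The ratio of the exact value to the approximation should approach 1 as θ grows, which is the behaviour an asymptotic statement for large θ describes.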

Theorem 5. Under $H_0$, see (1.9), if $n \ge 3$, then for large $\theta$,

(4.34)

Although the proof of Theorem 5 is similar to that of Theorem 4, we prefer to show it here. One might, for instance, doubt (4.34), since a subscript appearing in (2.19) does not appear in its asymptotic approximation (4.34).

Proof. To prove (4.34), we use formulas (16.11.1), (16.11.3), (16.11.4) and (16.11.9) in [11] as before. Setting and in (16.11.3) yields , and

Substituting and , and in (16.11.1) gives

(4.35)

where is given by formula (16.11.4) in [11].

We now let , use formula (16.11.9) in [11] and obtain

(4.36)

Hence, dropping the terms corresponding to while keeping the leading term corresponding to yields

(4.37)

5. Asymptotic Evaluation of Moments of the Coincidence for Large θ, and under H0

For the same reason as in section 4, we also consider that $n \ge 3$, and as before, we use formulas (16.11.1), (16.11.3), (16.11.4) and (16.11.9) in [11] to obtain asymptotic expressions for the moments and the variance of the coincidence $C$, which are valid when $\theta$ is large. Results are given in Theorem 6.

Theorem 6. Under $H_0$, see (1.9), if $n \ge 3$, then for large $\theta$,

(5.38)

And the variance is asymptotically given by

(5.39)

Proof. To derive an asymptotic expression for valid for large , we set and in (16.11.3) in [11] and obtain, and

Substituting and, and in (16.11.1) gives

(5.40)

where is given by formula (16.11.4) in [11].

Setting in (5.40), taking into account the null hypothesis , and using formula (16.11.9) in [11] as before yields

(5.41)

Hence, dropping the terms corresponding to while keeping the leading term corresponding to yields

(5.42)

And hence,

(5.43)

6. A Chi-square Goodness of Fit Test and Confidence Interval for θ via Coincidence

In this section, we use the fact that, under certain conditions that we are going to mention shortly, the coincidence $C$ can be interpreted as the null hypothesis, see (1.9) or (1.11), and thus establish a chi-square goodness of fit test to examine the null hypothesis $H_0$ via the variance of the coincidence $C$, and consequently construct a CI for the parameter $\theta$ if there is no significant evidence to reject $H_0$. We also consider that $\theta$ is large so that we can apply the Central Limit Theorem (CLT). This justifies the derivation of the asymptotic approximation of the variance of the coincidence for large $\theta$ in section 5.

For centers and each receiving a total of calls at the time, we define the overall standard deviation as

(6.44)

where the generalized mean is given by

(6.45)

If , then approximately follows a normal distribution with mean and variance by the CLT [4]. Thus,

(6.46)

After a certain amount of time, hours, and under the null hypothesis ,

(6.47)

Theorem 7. If the sample mean is a minimum variance unbiased estimator (MVUE) [4] for $\lambda_i$ for all $i$, and $H_0$ cannot be rejected, then the probability that $H_0$ is true is approximately given by

(6.48)

where is given by (2.15) or (4.33) when the parameter

Proof. The proof is straightforward. If the sample mean is a minimum variance unbiased estimator (MVUE) [4] for $\lambda_i$ for all $i$, and the null hypothesis $H_0$ cannot be rejected, then

(6.49)

Moreover, if there is no evidence to reject, we shall expect a coincidence to occur after hours, and hence,

(6.50)

where is the variance of and is approximated by (5.39) if is large. In that case,

(6.51)

where

(6.52)

Using (5.39) gives

(6.53)

Thus a chi-square goodness of fit test can now be carried out as follows. The null hypothesis $H_0$ will be rejected if

(6.54)

or (and)

(6.55)

where $\alpha$ is the significance level chosen by the decision maker.
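The decision rule can be illustrated with the textbook two-sided chi-square test for a variance. The sketch below is an assumption-laden stand-in, not the paper's exact statistic: it uses (n - 1) S² / σ0² with n - 1 degrees of freedom, where `sigma0_sq` plays the role of the variance of the coincidence under the null hypothesis, and it takes the two chi-square critical values as inputs (for example from a table) rather than computing them. All names are hypothetical.

```python
def chi_square_variance_test(sample_var, n, sigma0_sq, chi2_lower, chi2_upper):
    """Two-sided chi-square test of H0: variance == sigma0_sq.

    chi2_lower and chi2_upper are the alpha/2 and 1 - alpha/2 quantiles
    (lower-tail probabilities) of the chi-square distribution with n - 1
    degrees of freedom, supplied by the caller.
    Returns the test statistic and whether H0 is rejected.
    """
    statistic = (n - 1) * sample_var / sigma0_sq
    reject = statistic < chi2_lower or statistic > chi2_upper
    return statistic, reject


def variance_confidence_interval(sample_var, n, chi2_lower, chi2_upper):
    """100(1 - alpha)% confidence interval for the variance, obtained by
    inverting the same chi-square statistic."""
    lower = (n - 1) * sample_var / chi2_upper
    upper = (n - 1) * sample_var / chi2_lower
    return lower, upper


if __name__ == "__main__":
    # Tabulated 0.025 and 0.975 quantiles of chi-square with 9 degrees
    # of freedom (alpha = 0.05, n = 10); the data values are made up.
    lo_q, hi_q = 2.700, 19.023
    print(chi_square_variance_test(4.0, 10, 4.0, lo_q, hi_q))
    print(variance_confidence_interval(4.0, 10, lo_q, hi_q))
```

Turning an interval for the variance into an interval for θ additionally requires inverting the relation between the variance of the coincidence and θ, which this sketch does not attempt.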

If that is not the case, there is no significant evidence to reject , a confidence interval for the variance of , can eventually be obtained,

(6.56)

and leads to a confidence interval for ,

(6.57)

where and respectively satisfy

(6.58)

and

(6.59)

For and , (6.58) and (6.59) can readily be solve, while for and , (6.58) and (6.59) can readily be solved using basic numerical methods.

For ,

(6.60)

and

(6.61)

For ,

(6.62)
(6.63)

7. Discussion and Conclusion

We expressed the probability of the coincidence $C$ given by (1.4), that of $C_{l,m}$ given by (1.6), and the moments of $C$ in terms of the special function ${}_1F_n$ (Theorem 1, Theorem 2 and Theorem 3) on one hand, and in terms of the modified Bessel function of the first kind (Corollary 1) if $n = 2$ on the other. We also derived their corresponding asymptotic approximations (Theorem 4, Theorem 5 and Theorem 6).

Griffiths [7] defined some coincidences, for example the near coincidence (see (1.7)), and expressed their probabilities in terms of some power series that he named remodified Bessel functions. Here we have shown in Theorem 1 that these series are indeed hypergeometric functions for any event that can be defined using (1.6).

We also found that if is a minimum variance unbiased estimator (MVUE), for [4] and for all and there is no evidence to reject the null hypothesis, where is some positive number, then the probability that is true can be approximated by that of the coincidence, (Theorem 7). One may understand this fact this way. If it happens that is true almost surely then . This scenario is no longer a coincidence because the centers become undistinguishable. But since always, then we shall be able to distinguish one center to another.

Moreover, if the probability that is true almost equals the probability of the coincidence , and the null hypothesis can not be rejected, then the variance of is that of the coincidence . In that case, with a test as established in section 6, if there is no evidence to reject , one can readily construct a confidence interval (CI) for the parameter (section 6 and appendix A) using the variance of the coincidence (or via coincidence).

The results obtained in this paper can be used, for example, to achieve better results in telephone traffic measurements [2].

A. A Chi-square Goodness of Fit Test and 100(1-α)%CI for θ via Coincidence for n=2

In sections 4 and 5, we considered that the number of call centers $n$ is greater than or equal to 3. As mentioned before, the main reason is that the asymptotic expansion of the hypergeometric function ${}_1F_n$ used in sections 4 and 5 is valid when $n \ge 3$. Here, we consider that $n = 2$, write the first and second moments in terms of the modified Bessel function of the first kind, and use the asymptotic expansion of the modified Bessel function [1] to derive their asymptotic approximations, valid for large $\theta$. A goodness of fit test is established and hence a CI for the parameter $\theta$ is obtained if the null hypothesis $H_0$ is not rejected.

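Definition 2. The modified Bessel function of the first kind of order $\nu$, $I_\nu(x)$, is the series (formula 9.6.10 in [1])

$$I_\nu(x) = \left(\frac{x}{2}\right)^{\nu} \sum_{k=0}^{\infty} \frac{(x^{2}/4)^{k}}{k!\,\Gamma(\nu+k+1)} \tag{A.64}$$

where $\nu$ is some number.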
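The series of Definition 2 is straightforward to evaluate for moderate arguments. The following Python sketch is a minimal illustration, assuming a real order `nu` that is not a negative integer and an argument small enough that a fixed number of terms suffices; the function name `bessel_i` and the default `terms=100` are choices made for this example.

```python
import math


def bessel_i(nu, x, terms=100):
    """Partial sum of the series for the modified Bessel function of the
    first kind,
        I_nu(x) = (x/2)**nu * sum_k (x**2/4)**k / (k! * Gamma(nu + k + 1)).

    Consecutive terms satisfy
        t_{k+1} = t_k * (x**2 / 4) / ((k + 1) * (nu + k + 1)),
    so no large factorials are formed.
    """
    term = 1.0 / math.gamma(nu + 1.0)  # the k = 0 term
    total = 0.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) * (nu + k + 1))
    return (x / 2.0) ** nu * total
```

For two centers with equal means θ, the probability of a coincidence is exp(-2θ) I_0(2θ), so `math.exp(-2 * theta) * bessel_i(0, 2 * theta)` gives a quick cross-check of the n = 2 case.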

Corollary 1. For $n = 2$, the first and second order moments of the coincidence $C$, see Theorem 3, are

(A.65)

and

(A.66)

respectively.

Thus the variance of the coincidence $C$ is

(A.67)

Proof. Setting in Theorem 3 yields

(A.68)

Rearranging terms, setting and in (A.64) gives

(A.69)

On the other hand,

(A.70)

Rearranging terms, setting and in (A.64) gives

(A.71)

Hence, the variance of the coincidence is

(A.72)

Theorem 8. If , then for , the variance of the coincidence , , is asymptotically given by

(A.73)

Proof. The asymptotic approximation of $I_\nu(x)$ for large $x$ (formula 9.7.1 in [1]) is

(A.74)

for any . Then,

(A.75)

and

(A.76)

Hence,

(A.77)
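The large-argument behaviour used in this proof can be checked numerically. The sketch below evaluates the exponentially scaled function exp(-x) I_nu(x) from its series in log space and compares it with the leading term 1 / sqrt(2πx) of the asymptotic expansion in [1]. Only the leading term is used, which is an assumption of this sketch; higher-order corrections are ignored, and the function name is illustrative.

```python
import math


def scaled_bessel_i(nu, x):
    """exp(-x) * I_nu(x) for nu >= 0 and x > 0, summing the defining
    series term by term in log space so that large x does not overflow."""
    k_max = int(x + 20 * math.sqrt(x) + 50)  # terms peak near k = x / 2
    total = 0.0
    for k in range(k_max + 1):
        log_term = ((2 * k + nu) * math.log(x / 2.0)
                    - math.lgamma(k + 1)
                    - math.lgamma(nu + k + 1)
                    - x)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for x in (10.0, 100.0, 400.0):
        leading = 1.0 / math.sqrt(2.0 * math.pi * x)
        print(x, scaled_bessel_i(0, x) / leading, scaled_bessel_i(1, x) / leading)
```

Both ratios should tend to 1 as x grows, with the order of the Bessel function affecting only the correction terms.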

Following section 6, a chi-square goodness of fit test can be conducted as follows.

The null hypothesis $H_0$ will be rejected if

(A.78)

or (and)

(A.79)

where, as before, $\alpha$ is the significance level chosen by the decision maker.

If there is no evidence to reject the null hypothesis, a CI for $\theta$ is thus given by

(A.80)

where and respectively satisfy

(A.81)

and

(A.82)

References

[1]  Abramowitz, M., Stegun, I.A., Handbook of mathematical functions with formulas, graphs and mathematical tables, Nat. Bur. Stand., 1964.

[2]  Andersen, B., Hansen, N.H., Iversen, V.B., Use of minicomputer for telephone traffic measurements, Teleteknik (Engl. Ed.), 15, 33-46, 1971.

[3]  Bear, D., Principles of telecommunication traffic engineering, 3rd Edition, Peter Peregrinus Ltd, 1988.

[4]  Casella, G., Berger, R.L., Statistical inference, 2nd Edition, Duxbury, 2001.

[5]  Doane, D., Seward, L., Applied statistics in business and economics, 3rd Edition, McGraw-Hill, 2010.

[6]  Donnelly Jr., R.A., Business statistics, Pearson Education Inc., Upper Saddle River, New Jersey, 2012.

[7]  Griffiths, M., Remodified Bessel functions via coincidences and near coincidences, J. Integer Seq., 14, 1-10, 2011.

[8]  Griffiths, T.L., Tenenbaum, J.B., From mere coincidences to meaningful discoveries, Cognition, 103(2), 180-226, May 2007.

[9]  Grimmett, G., Stirzaker, D., Probability and random processes, 3rd Edition, Oxford University Press, 2001.

[10]  Nemenman, I., Coincidences and estimation of entropies of random variables with large cardinalities, Entropy, 13(12), 2013-2023, Dec. 2011.

[11]  NIST Digital Library of Mathematical Functions. [Online]. Available: http://dlmf.nist.gov/16.11. [July 15, 2016].

[12]  Sullivan, S.P., Probative inference from phenomenal coincidence: demystifying the doctrine of chances, Law, Probability and Risk, 14(1), 27-50, 2015.

[13]  Weiers, R.M., Introduction to business statistics, Cengage Learning, Mason, Ohio, 1988.

[14]  Skellam distribution, Wikipedia, The free encyclopedia, 2016. [Online]. Available: http://en.wikipedia.org/wiki/Skellam_distribution. [June 15, 2016].
 