Research Article
Open Access | Peer-reviewed

Parallelism of Test Items: Estimating the Means (µ), Variances (σ2) and Covariances (Cσ2) of Alternate Test Forms

Simon Ntumi, Sheilla Agbenyo, Tapela Bulala
International Journal of Data Envelopment Analysis and Operations Research. 2022, 3(1), 1-7. DOI: 10.12691/ijdeaor-3-1-1
Received July 24, 2021; Revised November 11, 2021; Accepted February 23, 2022

Abstract

Background: Within the space of classical test theory (CTT), alternate test forms are needed so that they can be applied to different groups or on different testing occasions. This CTT theoretical assumption urged the researchers to construct alternate test forms and estimate their parameters (µ, σ2 and Cσ2). Methods: To obtain the parameter estimates (µ, σ2 and Cσ2), three (3) alternate test forms (X1, X2 and X3) were carefully constructed and administered to fifty-eight (58) business students at University Practice Senior High School in the Cape Coast metropolis, Ghana. One psychological test scale (DASS21) was also adopted as Form Y. The tests were administered to the students under suitable and conducive examination conditions, which ensured the validity and reliability of the scores. Findings: After the statistical estimations, the study found that the mean parameters of the four forms (X1, X2, X3 and Form Y) were unequal (µX1 ≠ µX2 ≠ µX3 ≠ µY). That is, X1 (µ=7.23, n=58), X2 (µ=7.14, n=58), X3 (µ=8.01, n=58) and Form Y-DASS21 (µ=7.92, n=58), with F = 0.306, p = 0.736 > 0.05 (CI95%). On the variance parameter, similar results were obtained, as the test forms were not equal in their variances (σ2X1 ≠ σ2X2 ≠ σ2X3 ≠ σ2Y). This was reported as X1 (σ2=6.120, n=58), X2 (σ2=9.007, n=58), X3 (σ2=8.040, n=58), while Form "Y" DASS21 recorded a variance of (σ2=8.034, n=58), with p = 0.121 > 0.05. Finally, on the covariance parameter, we found that the test forms were not equal (Cσ2X1Y ≠ Cσ2X2Y ≠ Cσ2X3Y). The result is reported as (X1: Cσ2=5.338, n=58, r=0.846), (X2: Cσ2=6.023, n=58, r=0.831), (X3: Cσ2=7.898, n=58, r=0.783). Conclusions: The study concluded that the constructed alternate test forms met the congeneric parallelism conditions. The test forms were similar in content, and the µ, σ2 and Cσ2, although not identical, were close across all the test forms (X1, X2, X3 and Form Y).

1. Introduction

Within the trajectory of classroom assessment, testing is known to be a multi-faceted and intricate field in which sound decision-making is very complicated [1, 2]. Clearly, it is believed that for any evaluation to be reliable and valid, a number of factors should be taken into consideration [3, 4, 5]. In fact, classroom assessment and evaluation usually lead to decisions about individuals and situations; therefore, several consequences follow from those decisions. Some of these consequences are social or psychological, affecting an individual's motivation, goals, and even social status [5, 6]. Precisely because of the importance given to test scores in our society, any mistake that emerges from a test can have serious consequences for educational decision making. Classical true score theory is a simple model that describes how errors of measurement can influence observed scores [7, 8, 9].

In the work of [10], parallel forms are seen as tests that are different subsets of the same universe of items, which capture the same attribute with the same accuracy. As a measurement model for the scores of parallel forms, parallel measures are assumed, so that the correlation between the scores, i.e., the parallel-forms reliability or parallel-test reliability, matches the reliability of both forms. Parallel test construction has long been, and remains, of interest to test developers [11, 12].

Within the assumptions of CTT, test parameters such as reliability, means, variances and covariances, which are central issues in testing, can only be estimated from parallel tests [13]. Any time test developers speak of alternate tests, what comes to mind is parallelism. If parallel tests that are statistically equivalent to the original test can be developed, validities established for the original measures should also be applicable to the parallel measures for purposes of establishing the job-relevance and legal defensibility of the tests [14, 15].

Reading the works of [16, 17], it is asserted that the construction of a parallel test is not assured at the beginning of the test construction. This means that at the beginning of test construction, the test developer can only consider the forms as alternate forms. Similarly, [18] stated that a careful consideration of item-by-item parallelism during development results in alternate forms that are parallel at the item level. Creating alternate forms that are parallel is what is termed parallelism by [19]. Also, [10] pointed out that alternate test forms should consist of the same general and group factors in order to be considered measures of the same construct(s), and that these tests should have equal true score means, standard deviations, and item intercorrelations.

In furtherance of the above, [11] asserted that two or more alternate forms are said to be parallel depending on the closeness of their means, variances, covariances, content, true-score consistency and the covariance of one form with any other test. This means that parallelism is a matter of degree; some sets of alternate forms may be more parallel than others [12].

1.1. Classical Parallel Forms or Parts

The parallel model is the most restrictive measurement model for use in defining the composite true score. In addition to requiring that all test items measure a single latent variable (unidimensionality), the parallel model assumes that all test items are exactly equivalent to one another. All items must measure the same latent variable, on the same scale, with the same degree of precision, and with the same amount of error [13, 14]. All item true scores are assumed to be equal, and all error scores are likewise equal across items. For this form of parallelism, there is content similarity and true-score consistency, and the means, variances and covariances of the forms, as well as the covariance between any one test form and any other test, are all equal. This is what [15] classified as a true classical parallel test.
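
As a compact illustration (the symbols Xg, T and Eg are introduced here for exposition and are not taken from the study), the parallel model can be written in standard CTT notation as

Xg = T + Eg, with Var(E1) = Var(E2) = … = Var(Ek) for all forms g,

which implies equal means, equal variances, and equal covariances between the forms and with any other test.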

1.2. Essentially Classical Parallel Forms or Parts

This form of parallelism deviates slightly from the classical parallel form in that the means are not equal and the true scores are not consistent. There is some margin of error in the true score, either negative or positive, but all other features are the same as in the classical parallel form. That is, there is content similarity, and the variances and covariances of the forms, as well as the covariance between any one test form and any other test, are all equal.
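
One common way to express this model (an illustrative formulation in the same sketch notation, not the authors' formal specification) is to add a form-specific additive constant ag to the parallel model:

Xg = ag + T + Eg, with Var(E1) = Var(E2) = … = Var(Ek),

so that the means may differ by the constants ag while the variances and covariances remain equal.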

1.3. Tau-Equivalents Forms or Parts

The tau-equivalent model is identical to the more restrictive parallel model, save that individual item error variances are freed to differ from one another. This implies that individual items measure the same latent variable on the same scale with the same degree of precision, but with possibly different amounts of error [11, 12]. All variance unique to a specific item is therefore assumed to be the result of error. The tau-equivalent model implies that although all item true scores are equal, each item has a unique error term. This form also deviates from the classical parallel form to some degree. Here, with the exception of the variances, which are not equal, all other properties are the same as in the parallel form.
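
In the same sketch notation, the tau-equivalent model keeps the common true score but frees the error variances:

Xg = T + Eg, with Var(Eg) allowed to differ across forms,

so the true-score means and the covariances (which depend only on the shared true score) remain equal, while the observed-score variances may differ.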

1.4. Essentially Tau-Equivalent Forms or Parts

The essentially tau-equivalent model is, as its name implies, essentially the same as the tau-equivalent model. Essential tau-equivalence assumes that each item measures the same latent variable, on the same scale, but with possibly different degrees of precision [12, 20]. Again, as with the tau-equivalent model, the essentially tau-equivalent model allows for possibly different error variances. The difference between item precision and scale is an important distinction to make. Whereas tau-equivalence assumes that item true scores are equal across items, the essentially tau-equivalent model allows each item true score to differ by an additive constant unique to each pair of variables [10, 13, 15]. This form of parallelism deviates to a large degree from the classical parallel form, which is considered a true parallel form. For this form, the variances and means are not equal and the true score is not consistent. However, there is content similarity, the covariances of the forms are equal, and the covariance between any one test form and any other test is equal.
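
Illustratively, the essentially tau-equivalent model combines the two relaxations above:

Xg = ag + T + Eg, with ag a form-specific additive constant and Var(Eg) free to differ,

so the means and variances may differ across forms, while the covariances, which still depend only on the shared true score T, remain equal.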

1.5. Congeneric Forms or Parts

The congeneric model is the least restrictive, most general model of use for reliability estimation. The congeneric model assumes that each individual item measures the same latent variable, with possibly different scales, with possibly different degrees of precision, and with possibly different amounts of error [9, 11, 17]. Whereas the essentially tau-equivalent model allows item true scores to differ by only an additive constant, the congeneric model assumes a linear relationship between item true scores, allowing for both an additive and a multiplicative constant between each pair of item true scores [9, 10].
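
In the same sketch notation, the congeneric model adds a form-specific loading bg, giving the least restrictive specification:

Xg = ag + bg·T + Eg, with ag, bg and Var(Eg) all free to vary by form,

so the means, variances and covariances may all differ, and only the underlying latent variable (and hence the content measured) is common to the forms.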

Essentially, examinees who take credentialing tests and other types of high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on initial attempts. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign an alternate form to repeat examinees. This practice appears to be missing in most Ghanaian classrooms, where classroom teachers are conversant with constructing only one test form.

Clearly, one of the key aims of classical test theory is to expose students, researchers, test developers and others to how they can effectively apply classroom assessment principles. This is to help test users apply statistical techniques to improve classroom assessment practices. Against this backdrop, we undertook this study to practically handle different forms of tests by estimating their means (µ), variances (σ2) and covariances (Cσ2). The core purpose of this study was to estimate the means (µ), variances (σ2) and covariances (Cσ2) of alternate forms. The rationale was to find out the conditions under which the test forms could meet degrees of parallelism.

2. Methods

2.1. Tests Construction and Administration Process

In our quest to obtain data for the study, we carefully and extensively constructed three (3) alternate test forms (X1, X2 and X3) and adapted one psychological test (Form Y). We administered the test forms (X1, X2, X3 and Form Y) at one of the secondary schools in the Cape Coast Metropolis (University Practice Senior High School). The three (3) alternate forms of the core mathematics test and the one psychological test scale (DASS 21) were administered to the students. To respond to the test items, fifty-eight (58) business students were selected for the study. The tests were administered under suitable and conducive examination conditions. These conditions were put in place to improve the validity and reliability of the test scores.

2.2. Tests Specification Process

To ensure content similarity of the test forms, the Head of the Mathematics Department of the school was contacted to ascertain the topics/contents that the students had covered. Also, some of the teachers of the form classes were contacted to confirm the topics covered and the favourable time to conduct the test. With the topics, a table of specification was prepared to ensure that the items covered all the topics and at the appropriate levels of thinking. This was done to serve as a guide in the construction of the three alternate forms of the test. Again, a test specification, which is made up of item specifications, was prepared for consistency and similarity of the test items across the three forms (X1, X2, and X3). These processes guaranteed some level of content similarity among the alternate forms.

2.3. Composition of the Test Items on the Specification

The items were constructed using test specifications that covered seven (7) general topics in core mathematics (namely: sets and operations on sets, the real number system, mapping, relations and functions, linear equations and inequalities, algebraic expressions, number bases and plane geometry). In constructing the items, most of the items (n=11) measured the knowledge level of the students, followed by those that measured comprehension (n=6). Those that measured application were the fewest (n=3). The items did not extend to the analysis, synthesis and evaluation levels of the taxonomy. This was based on the assertion of Scully [13] that most multiple-choice items are suitable for measuring the lower-order thinking of learners.

We further described the item specifications in terms of content, objectives and a description of the test items. Here, we defined the content of each item, that is, where the item was found in the syllabus. To inform our test takers, we also spelt out clearly the objective for constructing each test item and explained the rationale behind it. Finally, we provided a vivid description of the items by setting out our expectations. Examples of how the items were constructed are specified in Table 1, Table 2 and Table 3. Before the students responded to the questions, we provided instructions. The instructions stated that there were 20 objective items with four options (A, B, C and D) for each item, and that students were required to respond to all the questions by circling the correct response.

2.4. Data Analysis

The analysis focused on the parameters, that is, the means, variances and covariances of the obtained alternate forms. In our quest to estimate these parameters of parallelism, standard deviations, F-values, correlations (relationships among the test forms) and p-values were also reported. These values were accompanied by interpretations and discussions. To estimate the key parameters, the administered and scored tests were analysed using descriptive and inferential statistics with the help of the Statistical Package for Social Sciences (SPSS) v.25 software.
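
The analysis itself was run in SPSS v.25; purely as an illustration, an equivalent descriptive computation could be scripted as below. The file name and column names (X1, X2, X3, Y) are hypothetical and not part of the original study materials.

```python
import pandas as pd

# Hypothetical wide-format file: one row per student, one column of total scores per form.
scores = pd.read_csv("alternate_forms_scores.csv")   # columns: X1, X2, X3, Y

# Descriptive parameter estimates: mean, standard deviation and variance per form.
descriptives = scores[["X1", "X2", "X3", "Y"]].agg(["mean", "std", "var"]).T
descriptives.columns = ["mean (µ)", "sd", "variance (σ2)"]
print(descriptives.round(3))
```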

3. Findings

This section reports the findings that emerged from the test administration and scoring. The findings are organised around the estimated parameters.

Estimating the Mean Parameter of the Test Forms (X1, X2, X3 and Form Y)

The mean is the average or most typical value in a collection of numbers. In statistics, it is a measure of central tendency of a probability distribution, along with the median and mode; it is also referred to as an expected value. Our first task was to find out whether the mean scores of the test forms were equal. To determine whether the means of the alternate forms were equal, we performed a descriptive analysis of the mean scores on each form. Where item 1 of a form was found to be faulty and therefore had to be treated as a bonus item, the scores on item 1 of all the forms were excluded from the analysis. This is because bonus questions inflate the scores of that form more than the others. The result of the means is presented in Table 4.
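
The equality of the form means was examined with an F-test; a minimal sketch of such a check is shown below, using randomly generated stand-in scores (the values reported in Table 4 came from the students' actual data analysed in SPSS). A one-way ANOVA treating each form as a group is assumed here; a repeated-measures design would be an alternative, since the same 58 students took all four forms.

```python
import numpy as np
from scipy import stats

# Stand-in score vectors for the 58 students on each form (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(7.23, 2.64, 58)
x2 = rng.normal(7.14, 3.02, 58)
x3 = rng.normal(8.01, 3.20, 58)
y = rng.normal(7.92, 3.23, 58)

# One-way ANOVA: H0 is that all four form means are equal.
f_stat, p_value = stats.f_oneway(x1, x2, x3, y)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")   # p > 0.05 => no significant mean difference
```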

The result in Table 4 shows the mean and standard deviation of X1 (µ=7.23, SD=2.643), X2 (µ=7.14, SD=3.023), X3 (µ=8.01, SD=3.195) and Form Y-DASS21 (µ=7.92, SD=3.232). As depicted, the results show that the means are very close and the standard deviations are approximately 3 for all the test forms. This shows the closeness or similarity in the responses of the students on the four (4) test forms. The closeness of the means is confirmed by the F-value of 0.306 and the significance value of p = 0.736 (> 0.05 at CI95%), which implies that the differences among the means are not statistically significant. In essence, the results from Table 4 showed that the means of the three test forms and Form "Y" (DASS21) were not identical (µX1 ≠ µX2 ≠ µX3 ≠ µY), although the differences were not statistically significant.

Estimating the Variance Parameter of the Test Forms (X1, X2, X3 and Form Y)

The term variance refers to a statistical measure of the spread between numbers in a data set. More specifically, variance measures how far each number in the set is from the mean and thus from every other number in the set. Variance is often denoted by the symbol σ2. In this paper, one determinant of parallelism is the variance parameter. Our task was to test for the equality of the variances among all the obtained test forms. The result is presented in Table 5.

The variances of the scores on the alternate forms are presented in Table 5. The results are reported as X1 (σ2=6.120, n=58), X2 (σ2=9.007, n=58), X3 (σ2=8.040, n=58), while Form "Y" DASS21 recorded a variance of (σ2=8.034, n=58). The results suggest that even though the variances of the forms are not the same, the differences are not very large. The Levene statistic for the test of homogeneity of variance shows that the variances can be assumed statistically equal across the forms (p = 0.121 > 0.05, CI95%). In essence, it is therefore evident that the variance parameters of the test forms were not identical (σ2X1 ≠ σ2X2 ≠ σ2X3 ≠ σ2Y), although the differences were not statistically significant.
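
The homogeneity-of-variance check reported above used Levene's statistic; a minimal sketch of the same test, again with stand-in data rather than the students' actual scores, is given below.

```python
import numpy as np
from scipy import stats

# Stand-in score vectors for the four forms (illustrative only).
rng = np.random.default_rng(1)
x1, x2, x3, y = (rng.normal(m, s, 58) for m, s in
                 [(7.23, 2.47), (7.14, 3.00), (8.01, 2.84), (7.92, 2.83)])

# Levene's test of homogeneity of variance: H0 is that all form variances are equal.
w_stat, p_value = stats.levene(x1, x2, x3, y, center="median")
print(f"Levene W = {w_stat:.3f}, p = {p_value:.3f}")   # p > 0.05 => equal variances assumed
```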

Estimating the Covariance Parameter of the Test Forms (X1, X2, X3 and Form Y)

Covariance measures the joint variation of two random variables from their expected values. Using covariance, one can only gauge the direction of the relationship (whether the variables tend to move in tandem or show an inverse relationship). In this paper, one of the core objectives was to estimate the covariance parameter of the test forms. That is, we wanted to find out whether the covariances were equal among the test forms. To achieve this, the scripts were coded so that the scores on all four (4) forms could be entered for a particular student. The scores were entered without treating the test form as a factor. This made it possible to estimate the covariance and correlation between each pair of forms. The obtained results are presented in Table 6.
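
As a sketch of this step, the pairwise covariances and correlations can be obtained directly from a wide-format score table, as below; the data frame here is simulated with a shared component so the columns correlate, and is not the students' data.

```python
import numpy as np
import pandas as pd

# Simulated wide-format scores: one row per student, one column per form.
rng = np.random.default_rng(2)
ability = rng.normal(7.5, 2.5, 58)                   # shared component across the forms
scores = pd.DataFrame({
    "X1": ability + rng.normal(0, 1.2, 58),
    "X2": ability + rng.normal(0, 1.4, 58),
    "X3": ability + rng.normal(0, 1.3, 58),
    "Y":  ability + rng.normal(0, 1.5, 58),
})

# Pairwise covariances (Cσ2) and Pearson correlations between the forms.
print(scores.cov().round(3))
print(scores.corr().round(3))
```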

The result in Table 6 shows that the covariance of X1 and X2 was (Cσ2=5.338, n=58), with a correlation of 0.846. For X1 and X3, the covariance was recorded as (Cσ2=6.023, n=58), with a correlation of 0.831, and that of X1 and Form Y was recorded as (Cσ2=7.898, n=58), with a correlation of 0.783, all being significant at p < 0.01 (CI95%). The results suggest that the covariances are not the same for all the test forms. In other words, there are differences in the covariances of the test forms (Cσ2X1X2 ≠ Cσ2X1X3 ≠ Cσ2X1Y). The overall results suggest that we could not obtain equal parameters for the test forms. Therefore, our parallelism of test items could only meet the congeneric model condition, where the test forms had similar content and were only alternatives to each other.

4. Discussion

The findings of the study gave ample evidence that the multidimensionality of items, as well as of the test form itself, makes it difficult to obtain equal parameters, that is, the same means (µ), variances (σ2) and covariances (Cσ2), across alternate forms. The results suggest that, in practice, it is nearly impossible to generate a pool of items that adequately represents the content of the test and has the same difficulty level. The results generated in this study suggest that the parallelism achieved corresponds to the congeneric model. This model assumes that each individual item measures the same latent variable, with possibly different scales, with possibly different degrees of precision, and with possibly different amounts of error [3, 8, 16, 18].

The findings give reason to agree with the assertions of classical scholars such as [4, 5], who asserted that it may be impossible to achieve such a goal according to the classical test theory (CTT)-based definition of parallel tests, in which true scores and variances of observed test scores across forms must be identical for any possible subpopulation of examinees. When CTT conditions for parallelism are not strictly met, post-administration equating and passing-score determination are adopted to adjust for differences among test forms so that it makes no difference which form an examinee takes.

Relatedly, previous authors have asserted that using an item bank to construct parallel forms of multidimensional measures also creates problems because the constructs making up the content of these measures are often not well understood; thus, it is difficult to replicate the original test, or even to attempt to create separate item banks for different content areas, because the content domains are difficult to identify. If a single item bank is used, the number of items needed to create a pool representing the content of the original measure would be so large that the resulting alternate forms would tend to have an unstable content structure [2, 13].

The essence of the study lends strong support to claims that alternate test forms are designed to avoid or reduce content- or item-specific practice effects that are associated with repeated administrations of the same neuropsychological test(s) [16, 17]. Relatedly, examination manuals for many intellectual and neuropsychological tests illustrate that practice effects are common, especially over brief retest intervals (e.g., days or weeks), and this could lead to unequal parameters of the test items. Our results lend ample support to the work of [19], which indicated that in any test construction, alternate test forms should include the same number of items, and the items should be equivalent and have the same content, though it may be difficult to obtain the same parameters (equal means, variances and covariances).

From the results accumulated in the study, we could infer that parallel test development procedures are complicated by the characteristics of the original test; however, test developers, classroom teachers and others could easily construct similar test items to measure similar constructs or traits of the test takers [2, 11]. We must reiterate, though, that if the test or items are multidimensional, it is difficult to construct an alternate form that is parallel in content using more traditional development procedures (sampling similar test items from a general content item pool). In response to these issues, scholars have proposed that test developers must be guided by subject content to produce alternate test forms that are parallel in terms of means, standard deviations, and factor structures [20].

5. Concluding Remarks

For classroom and standardized testing purposes, this paper exposes students, classroom teachers and researchers to how the parameters of alternate test forms, that is, the means (µ), variances (σ2) and covariances (Cσ2), can be estimated. It is observed that, out of the conditions necessary for parallelism, the constructed forms met only the congeneric model condition, where the alternate forms had similar content but different parameter estimates (X1 ≠ X2 ≠ X3 ≠ Y). From the study, it is instructive for classroom teachers and assessment practitioners to note that constructing test items with similar content is very beneficial. This is because it helps in measuring the construct of interest and assists those who are attempting to preserve the validities (and the legal defensibility) of the original form of the test. Therefore, classroom teachers and standardized testing companies should take a keen interest in constructing alternate test forms.

Abbreviations

CV: Covariances; CTT: Classical Test Theory; CI: Confidence Interval; DASS 21: Depression, Anxiety and Stress Scale - 21 Items; SPSS: Statistical Package for Social Sciences

Acknowledgements

The authors would like to thank their anonymous reviewers for their academic stimulation and constructive criticism throughout the development of the paper. We are again grateful to our colleague senior lecturers in educational measurement and evaluation for their input in the paper.

Conflicts of Interest

No conflict of interest exists. We wish to confirm that there are no known conflicts of interest associated with this publication, and there has been no significant financial support for this work that could have influenced its outcome.

Consent for Publication

Not applicable.

Ethics Approval and Consent to Participate

Not applicable.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Authors’ Contributions

SN conceived the study, drafted the methodology, performed all the analyses and drew the conclusions. SA drafted the introduction of the study. TB discussed the paper. All the authors (SN, SA, TB) reviewed multiple drafts and proposed additions and modifications. SN had the final responsibility to submit the paper. All the authors read and approved the final manuscript.

Funding

No funding was received for this study.

References

[1] Clause, C. A., Mullins, M. E., Nee, M. T., Pulakos, E., & Schmitt, N. (2016). Parallel test form development: A procedure for alternate predictors and an example. Personnel Psychology, 6(51), 1-287.

[2] Cronbach, L. J. (1947). Test "reliability": Its meaning and determination. Psychometrika, 12(1), 1-16.

[3] Drasgow, F. (2016). Technology and testing: Improving educational and psychological measurement. New York: Routledge.

[4] Gierl, M., Daniels, L., & Zhang, X. (2017). Creating parallel forms to support on-demand testing for undergraduate students in psychology. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 288-302.

[5] Hilger, N., & Beauducel, A. (2017). Parallel-forms reliability. In Encyclopedia of Personality and Individual Differences (pp. 1-3). Springer, Cham.

[6] Kowalski, I. M., Protasiewicz-Fałdowska, H., Dwornik, M., Pierożyński, B., Raistenskis, J., & Kiebzak, W. (2014). Objective parallel-forms reliability assessment of 3-dimension real time body posture screening tests. BMC Pediatrics, 14(1), 1-8.

[7] Lord, F. M., & Novick, R. M. (2000). Statistical theories of mental test scores. Educational Testing Services: New York University.

[8] Lovibond, S. H., & Lovibond, P. F. (2014). Manual for the depression anxiety & stress scales (2nd ed.). Sydney: Psychology Foundation.

[9] Luecht, R. M. (2016). Computer-based test delivery models, data, and operational implementation issues. In F. Drasgow (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 179-205). New York: Routledge.

[10] Miller, J., & Ulrich, R. (2003). Simple reaction time and statistical facilitation: A parallel grains model. Cognitive Psychology, 46(2), 101-151.

[11] Raykov, T. (2015). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21(2), 173-184.

[12] Raykov, T., Patelis, T., & Marcoulides, G. A. (2011). Examining parallelism of sets of psychometric measures using latent variable modeling. Educational and Psychological Measurement, 71(6), 1047-1064.

[13] Scully, D. (2017). Constructing multiple-choice items to measure higher-order thinking. Practical Assessment, Research & Evaluation, 22(4), 4-13.

[14] Sharma, P., Dunn, R. L., Wei, J. T., Montie, J. E., & Gilbert, S. M. (2016). Evaluation of point-of-care PRO assessment in clinic settings: Integration, parallel-forms reliability, and patient acceptability of electronic QOL measures during clinic visits. Quality of Life Research, 25(3), 575-583.

[15] Singhal, S. P., & Sridevi, M. (2019). Comparative study of performance of parallel Alpha Beta Pruning for different architectures. In 2019 IEEE 9th International Conference on Advanced Computing (IACC) (pp. 115-119). IEEE.

[16] Sireci, S., & Zenisky, A. (2016). Computerized innovative item formats: Achievement and credentialing. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of test development (2nd ed., pp. 313-334). New York: Routledge.

[17] Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60(2), 174-195.

[18] Wolfinger, R. D. (2014). Heterogeneous variance: Covariance structures for repeated measures. Journal of Agricultural, Biological, and Environmental Statistics, 8(7), 205-230.

[19] Wu, S. L., Tio, Y. P., & Ortega, L. (2021). Elicited imitation as a measure of L2 proficiency: New insights from a comparison of two L2 English parallel forms. Studies in Second Language Acquisition, 8(7), 1-30.

[20] Yarnold, P. R. (2014). How to assess the inter-method (parallel-forms) reliability of ratings made on ordinal scales: Emergency Severity Index (Version 3) and Canadian Triage Acuity Scale. Optimal Data Analysis, 3(4), 50-54.
 

Published with license by Science and Education Publishing, Copyright © 2022 Simon Ntumi, Sheilla Agbenyo and Tapela Bulala

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
