ISSN(Print): 2327-6126
ISSN(Online): 2327-6150


Research Article

Open Access Peer-reviewed

Oku Kingsley, Iweka Fidelis

Published online: March 19, 2018

This study centered on the development, standardization and application of a Chemistry Achievement Test using the One-Parameter Logistic Model of Item Response Theory. Eight research questions and one hypothesis guided the study; the hypothesis was tested at the 0.05 level of significance. The research design was instrumentation research. The researcher developed an instrument titled “OKUKINS Chemistry Achievement Test” (OKUKINS CAT), consisting of 80 items. A proportional stratified random sampling technique was used to obtain a sample of 200 students from a population of 19,082 Senior Secondary School Two (SS II) students in Delta Central Senatorial District of Delta State. The sample was further divided into two equal parts (set A and set B), designated samples A and B respectively, while the original sample was designated sample O. The OKUKINS CAT yielded favourable statistics under the One-Parameter Logistic Model (1-PLM) with regard to the difficulty (b) parameters and ability estimates. Analyses done with the Xcalibre 4.2 software confirmed suitable b-parameter indices as prescribed by the Xcalibre manual. Using sample O, the b parameters ranged from -2.417 to +2.834; using sample A, from -2.115 to +2.293; and using sample B, from -2.138 to +2.960. The validity of the instrument was estimated using fit statistics, and the results indicated a good fit to the One-Parameter Logistic Model (1-PLM): using sample O, 74 of the 80 items of the OKUKINS CAT fitted the 1-PLM; using sample A, 79 of the 80 items fitted the 1-PLM; and using sample B, all 80 items fitted the 1-PLM. The study also revealed that gender has no significant influence on academic achievement in chemistry.
Based on the findings, it was recommended, among others, that all examination bodies such as JAMB, WAEC, NECO and NABTEB should revisit and review their test items using IRT models, since this has been recommended by the International Testing Commission.

Learning is a psychological construct, and it is usually very difficult to tell or explain how it takes place. It can only be inferred from performance, yet performance is not learning. This means that an examinee can perform very well in an examination without having learnt the material, especially if he or she engaged in examination malpractice. In another dimension, a test, when administered, may give inconsistent results if it contains misleading, ambiguous or otherwise defective items. In yet another dimension, an examinee can fail an examination even if he or she has mastered the various concepts that form the basis of the test; this may be because the examinee was not psychologically ready prior to writing the test. Thus, there is a negative interaction between the affective and cognitive traits ^{ 1}.

It becomes evident that estimating examinees' ability with a test is a complicated task. Because of this, test constructors usually undergo rigorous processes to be certain that each item is qualified to be incorporated in a test. This is achieved via item analysis, a statistical technique employed by test constructors to improve the quality of their tests. According to Payne as cited in ^{ 2}, item analysis is conducted for four general purposes:

1. To select the best available item for the final version of a test.

2. To detect learning difficulties of the class as a whole; identifying general content area or skills that need to be scrutinized by the instructor.

3. To identify structural or content defects in the items; and

4. To identify, for individual students, areas of weakness in need of remediation.

For instance, if an instrument is an achievement test, it is expected that, when administered, it should produce consistent scores. These scores depict the performance of the examinees. Academic achievement, according to ^{ 3}, is the performance outcome that depicts the extent to which a person has achieved definite goals that were the focal point of activities in instructional environments, particularly in school, college and university. If an instrument is poorly constructed, it will, when administered, produce defective results. Thus, test developers usually undertake test validation to ensure that the results obtained from the administration of a measuring instrument can be used to draw sound inferences, interpretations and predictions of future performance.

According to ^{ 4}, “psychometricians are mainly concerned about the worth of test items as well as how examinees respond to them when constructing tests”. Measurement experts generally use psychometric methods to determine the validity and reliability of test items. Psychometric theory offers two measurement frameworks for analyzing test data: Classical Test Theory (CTT) and Item Response Theory (IRT). Both theories guide test developers to forecast the results of psychological tests by recognizing parameters of item difficulty and examinee ability.

Classical Test Theory (CTT) has been the basis of psychometric theory for decades. The conceptual basics, assumptions, and expansion of the basic premises of CTT have paved the way for the construction of some outstanding psychometrically sound scales in the measurement practices of educational bodies in Africa; “This is owing to the simplicity of interpretation which can usually be applied to testees' achievement and aptitude test performance” ^{ 5}. Despite the popularity of classical item statistics as a fundamental part of standardized testing and measurement, the framework is fraught with many shortcomings. Among these is that the values of standard item parameters are not invariant across groups of test takers that vary in ability. The invariance property of item parameters demands that the estimate of an item's parameter across two or more groups that vary in ability be the same.

IRT is a model which expresses the probability of the relationship between an individual's response to an item and the underlying latent trait, often called ability, measured by the instrument. According to ^{ 6}, in IRT the true component is defined on the latent variable of interest rather than on the test, as is the case in CTT. IRT relates item performance to the ability measured by the whole test. The ability is denoted by θ, and IRT uses the ability scale to ascertain the amount of latent ability an individual possesses. The ability scale is an interval scale with a midpoint of zero and a standard deviation of one. It ranges from minus infinity to plus infinity, but for practical purposes it usually runs from -3 to +3, in some cases from -4 to +4. The relationship between the probability of answering an item correctly and the ability scale is called the Item Response Function ^{ 7}. The higher the ability, the higher the probability of a right response, or of responding in a higher category. A trace line called the Item Characteristic Curve (ICC), or the Category Response Curve (CRC), is used to illustrate the Item Response Function. The Item Characteristic Curve is the fundamental building block of Item Response Theory; all the other constructs of the theory rely on this curve ^{ 8}. The Item Characteristic Curve is described by two technical properties. The first is item difficulty. Under Item Response Theory, item difficulty describes where the item functions along the ability scale: it is the point on the ability scale at which the probability of a correct response to the item is 0.5, and it is designated b. An easy item functions among the low-ability testees and a difficult item functions among the high-ability testees; thus, difficulty is a location index.
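The location property of the ICC can be sketched numerically. Under the common logistic form (the 1-PLM discussed below), the probability of a correct response depends only on the distance between θ and b; the snippet below is an illustrative sketch, not the study's actual computation:

```python
import math

def icc(theta, b):
    """Item Characteristic Curve under the logistic 1-PL form:
    probability of a correct response at ability theta for an item
    of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# b is a location index: when theta equals b, the probability is exactly 0.5.
print(icc(0.0, 0.0))    # 0.5
# An easy item (b = -2) still gives a low-ability testee a good chance:
print(icc(-1.0, -2.0))  # ≈ 0.73
```
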

The second technical property is discrimination, which depicts how well an item can distinguish between examinees having abilities below and above the item location. This property essentially reflects the steepness of the ICC in its middle section: the item discriminates better as the curve becomes steeper, and less as it becomes flatter, because a flat curve means the probability of answering the item correctly at low ability levels is almost the same as at high ability levels. Using these two descriptors, one can illustrate the general form of the ICC; they are also used to discuss the technical attributes of an item.

The Item Response Theory framework covers a group of models, and the applicability of each model relies on the nature of the test items. Models may be unidimensional, measuring one construct, or multidimensional, assessing more than one construct. Unidimensional models are either dichotomously scored (wrong/right, yes/no) or polytomously scored (e.g. Likert scale). The dichotomous models are used for cognitive tests or measures where there are right and wrong answers, and they are named by the number of parameters they employ. The One-Parameter Logistic Model (1-PLM), or Rasch Model, which employs only the b (difficulty) parameter, assumes that items discriminate equally and that there is no guessing; items vary only in difficulty. The Two-Parameter Logistic Model (2-PLM), which employs the b (difficulty) and a (discrimination) parameters, assumes that items differ in difficulty and discrimination and that there is no guessing. The Three-Parameter Logistic Model (3-PLM) takes the guessing parameter, c, into consideration: it assumes that a low-ability examinee may answer a difficult item correctly by guessing. The other two models place the b value at the point where the probability of responding to an item correctly is 0.5, but with the Three-Parameter Model the lower limit of the Item Characteristic Curve is the value c rather than zero. Hence, the b parameter is at the point on the ability scale where the probability of a correct response to the item is (1 + c)/2, while the slope of the Item Characteristic Curve at θ = b is a(1 – c)/4.
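The lower asymptote c and the shifted location can be checked numerically. A minimal sketch of the 3-PLM follows, with illustrative parameter values (not taken from the study):

```python
import math

def icc_3pl(theta, a, b, c):
    """3-PLM item response function: discrimination a, difficulty b,
    pseudo-guessing c. The curve's lower asymptote is c rather than zero."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

a, b, c = 1.5, 0.5, 0.2  # illustrative values only

# At theta = b the probability is (1 + c)/2, not 0.5:
print(icc_3pl(b, a, b, c))  # 0.6, i.e. (1 + 0.2)/2

# The slope at theta = b is a(1 - c)/4, checked by finite differences:
h = 1e-6
slope = (icc_3pl(b + h, a, b, c) - icc_3pl(b - h, a, b, c)) / (2 * h)
print(round(slope, 4))      # 0.3, i.e. 1.5 * (1 - 0.2) / 4
```

Setting c = 0 and a = 1 recovers the 1-PLM, which is why the 1-PLM's b sits exactly at the 0.5 probability point.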

In recent times, attention has been drawn to the tests used in measuring the presence or absence of an identified trait of interest, considering the precision of the instrument constructed. Several researchers have questioned the effectiveness of the Classical Test Theory technique, anchored on the following limitations, among others:

a) CTT estimates of item difficulty are group dependent.

b) The p and r coefficients also depend on the sample from which they are taken.

c) The ability scores of the examinees are completely test dependent.

These limitations, among others, are now being complemented by the IRT technique. IRT provides more accurate estimates of ability, several improvements in scaling persons and items, and greater flexibility in situations where different groups or test forms are used. Despite these merits, IRT has not been given sufficient coverage in graduate education. Worse still, most psychometricians and educators are not familiar with either the theoretical or the practical aspects of IRT, as very few empirical works have been done on it. In fact, IRT has not been given the same attention as CTT in the development of measuring instruments such as chemistry achievement tests.

Chemistry is one of the science subjects taught in the Junior Secondary Schools, together with Physics and Biology, as Basic Science; it is taught as a single subject in the Senior Secondary Schools. Kirti (n.d.) conceptualized Chemistry as the branch of science that deals with identifying the substances of which matter is composed, examining their attributes and the ways in which they interact, combine and change, and using these processes to form new substances. The importance of Chemistry to humanity cannot be over-emphasized. According to Smita as cited in ^{ 9}, all three of our basic needs, i.e. food, shelter and clothing, are made by chemical processes using chemicals and fibres; that is, Chemistry is always present around us, so Chemistry is important. It was on this premise that ^{ 10} concluded that “Chemistry is involved in the manufacture of all man-made objects and things. So, in your lifetime, everything you touch is touched by chemistry”. Despite the importance of Chemistry in our everyday life, available records have shown poor performance among students in the subject in external examinations. ^{ 11} researched students' performance in external examinations in Boarding and Day Secondary Schools in Kano Metropolis, Nigeria, using the examinations conducted from 2005 to 2011 for selected subjects: English, Mathematics, Sciences (including Chemistry) and Home Economics. The results indicated poor performance among students in the selected subjects, including Chemistry; according to the researcher, there was no year from 2005 to 2011 in which the performance level of Day Senior Secondary Schools in Kano Metropolis reached 50% in any subject. Most of the measuring instruments used over the years were anchored on CTT, and the credibility of these instruments is yet to be extensively reviewed in the measurement community.

Considering the importance of chemistry to humanity, the need to develop an instrument which will reliably and validly measure students' ability in chemistry cannot be over-emphasized. This study adopted the One-Parameter Logistic Model (1-PLM) of Item Response Theory in the development and standardization of a Chemistry Achievement Test. The choice of the One-Parameter Logistic Model (1-PLM) for this study was based on the fact that Chemistry content, as indicated in the scheme of work, is graded in difficulty, i.e. from simple to complex. So, if item discrimination is kept constant, an instrument should be able to measure the learning progress of students in chemistry, taking cognizance of the content areas.

The following research questions were answered in this study:

1) What are the item difficulties of the items of the OKUKINS Chemistry Achievement Test and their associated Standard Errors of Measurement?

2) What are the ability estimates of the examinees and their associated Standard Errors of Measurement using the OKUKINS Chemistry Achievement Test?

3) To what extent do the items of the OKUKINS Chemistry Achievement Test fit the One-Parameter Logistic Model (1-PLM)?

4) What are the correlation coefficients of the Standard Errors of Measurement across samples associated with difficulty parameters as an evidence of item-by-item reliability of the OKUKINS Chemistry Achievement Test?

5) What is the correlation coefficient of the Standard Errors of Measurement across samples associated with ability estimates as an evidence of person reliability of the OKUKINS Chemistry Achievement Test?

6) What are the Item Characteristic Curves (ICC) of the items of the OKUKINS Chemistry Achievement Test?

7) How much information does the OKUKINS Chemistry Achievement Test provide over the ability trait range?

8) What difference exists in the academic achievement of male and female students in the OKUKINS Chemistry Achievement Test?

The following hypothesis was tested at the 0.05 level of significance:

1) There is no significant difference in the academic achievement of male and female students in the OKUKINS Chemistry Achievement Test.

This study adopted the instrumentation research design, since it deals with the development and standardization of a measuring instrument. The population of the study comprised all 19,082 SS II students in the 185 secondary schools in the eight Local Government Areas of Delta Central Senatorial District of Delta State. Research evidence has shown that, in IRT, sample size is directly related to the precision with which item parameters can be estimated. According to ^{ 12}, an inadequate sample size can lead to increased estimation error, with unfavourable implications for the analysis of item and test data. Suen as cited in ^{ 2} recommended a sample size of 200 examinees for the One-Parameter Logistic Model (1-PLM). The present researcher used a sample of 200 SS II students selected from 4 sampled Local Government Areas in Delta Central Senatorial District of Delta State using a proportional stratified random sampling technique. The sample was further divided into two equal parts (set A and set B), each consisting of 100 students. The original sample was designated sample O, and sets A and B were designated samples A and B respectively. Equal numbers of male and female students were selected for the study.
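Proportional allocation under stratified sampling can be sketched as follows. The stratum sizes below are hypothetical, since the per-LGA enrolment figures are not given in the text:

```python
# Hypothetical SS II enrolments for the 4 sampled LGAs (illustrative only;
# not the study's actual figures).
strata = {"LGA_1": 5200, "LGA_2": 4100, "LGA_3": 3000, "LGA_4": 2700}
total_sample = 200

population = sum(strata.values())
# Each LGA contributes to the sample in proportion to its population share.
allocation = {lga: round(total_sample * n / population)
              for lga, n in strata.items()}
print(allocation)                # {'LGA_1': 69, 'LGA_2': 55, 'LGA_3': 40, 'LGA_4': 36}
print(sum(allocation.values()))  # 200
```

With real figures, plain rounding can leave the total a seat or two off the target, which is usually corrected by a largest-remainder adjustment; students within each stratum are then drawn by simple random sampling.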

The instrument for this study was titled “OKUKINS Chemistry Achievement Test” (OKUKINS CAT). The researcher used a table of specification to generate 100 items across the six cognitive domains: knowledge, comprehension, application, analysis, synthesis and evaluation. The content areas considered were:

• Introduction to Chemistry

• Particulate Nature of Matter

• Symbols, Formulae and Equations

• Chemical Combination

• Carbon and its Compounds

• Gas Laws

• Water

• Acids, Bases and Salts

• Standard Separation Techniques

The items generated were administered to 100 students who were not part of the sample used for the investigation. After administration, item analysis was performed on each item using the Marginal Maximum Likelihood Estimation method via Xcalibre 4.2 software, anchored on the One-Parameter Logistic Model (1-PLM). The result of the item analysis showed that 99 items of the OKUKINS CAT fitted the One-Parameter Logistic Model (1-PLM). One item had an F flag with its z Residual statistically significant, i.e. it did not fit the model. Of the 99 items that fitted the 1-PLM, 19 items had flags, as outlined below:

a) One item had an Lb flag, which indicates that its b parameter was lower than the minimum acceptable value.

b) Eighteen (18) items had K flags, which indicate that the items' keyed alternatives did not have the highest correlation with the total score. The remaining 80 items, which had no F, K, Lb or Hb flags, were selected to form the OKUKINS Chemistry Achievement Test (OKUKINS CAT).

The validity of the instrument was established using face validity, content validity and fit statistics. In IRT, the internal validity of a test is assessed in terms of the statistical fit of each item to the model; according to ^{ 13}, if the fit statistic of an item is acceptable, the item is valid. During the item analysis, 99 items fitted the One-Parameter Logistic Model, and of these, 80 items were selected to form the OKUKINS CAT. Data analysis using three different samples showed that, with sample O, 74 of the 80 items of the OKUKINS CAT fitted the 1-PLM; with sample A, 79 of the 80 items fitted the 1-PLM; and with sample B, all 80 items fitted the One-Parameter Logistic Model. Thus, the OKUKINS CAT is internally valid. The reliability of the instrument was estimated using correlation evidence from the standard errors of measurement associated with the difficulty parameters and the ability estimates. Item-by-item reliability was established through correlation evidence using the Standard Errors of Measurement associated with the difficulty parameters, yielding coefficients of 0.946, 0.721 and 0.604. In the same dimension, person reliability was estimated through correlation evidence using the Standard Errors of Measurement associated with the ability estimates, yielding a coefficient of 0.595.

The validated OKUKINS Chemistry Achievement Test was administered to the subjects in the locale under investigation by the researcher and eight research assistants. The duration of the test was 1 hour 30 minutes, after which the answer scripts were retrieved from the examinees. The data obtained were analyzed using descriptive statistics (mean and standard deviation), the independent-samples t-test, the Pearson product-moment correlation, factor analysis and the Marginal Maximum Likelihood Estimation technique via Xcalibre 4.2 software, anchored on the One-Parameter Logistic Model of IRT.

After data analysis, the results obtained for research questions 1-8 and hypothesis 1 were summarized and presented in the tables and figures below.

Table 1 shows the difficulty parameters of the 80 items which constitute the OKUKINS Chemistry Achievement Test and their associated Standard Errors of Measurement (SEM) using sample O. The b parameters ranged from -2.417 to +2.834. Item 54 has the lowest difficulty parameter of -2.417 (a very easy item), while item 74 has the highest difficulty parameter of +2.834 (a very difficult item). Table 1 also shows the Standard Errors of Measurement (SEM) associated with the b parameters: the SEMs of the items of the OKUKINS CAT ranged from 0.148 to 0.344, with a mean of 0.167 and a standard deviation of 0.030.

Table 2 indicates the b values of the 80 items which form the OKUKINS Chemistry Achievement Test and their associated Standard Errors of Measurement using sample A and sample B. As shown in Table 2, in sample A the difficulty parameters of the items of the OKUKINS Chemistry Achievement Test ranged from -2.115 to +2.293, and the Standard Errors of Measurement associated with the b parameters (SEM_A) ranged from 0.210 to 0.382, with a mean of 0.235 and a standard deviation of 0.037. Also presented in Table 2, for sample B, the difficulty parameters of the items of the OKUKINS CAT ranged from -2.138 to +2.960, and the Standard Errors of Measurement associated with the b parameters (SEM_B) ranged from 0.208 to 0.558, with a mean of 0.240 and a standard deviation of 0.049.

Table 3 shows the ability estimate of each examinee and its associated Standard Error of Measurement using sample O. The abilities of the examinees ranged from -2.276 to +2.163, and the Standard Errors of Measurement associated with the ability estimates ranged from 0.2457 to 0.3128, with a mean of 0.258 and a standard deviation of 0.016. Classification of the examinees' scores using sample O revealed that 39 students (19.5%) had high scores, that is, scores above the item location, while 161 students (80.5%) had low scores, that is, scores below the item location. This implies that the pass rate of the OKUKINS Chemistry Achievement Test is 19.5% using sample O, anchored on a cut score of 0.500.

Table 4 depicts the ability estimate of each examinee and its associated Standard Error of Measurement using the OKUKINS CAT for sample A and sample B. For sample A, the ability estimates ranged from -1.743 to +1.652, and the Standard Errors of Measurement associated with the ability estimates (SEM_Atheta) ranged from 0.246 to 0.3047, with a mean of 0.257 and a standard deviation of 0.014. For sample B, the ability estimates ranged from -2.278 to +2.290, and the Standard Errors of Measurement associated with the ability estimates (SEM_Btheta) ranged from 0.247 to 0.3582, with a mean of 0.260 and a standard deviation of 0.018. Classification of the examinees' scores using sample A showed that 25 students (25%) had high scores, that is, scores above the item location, while 75 students (75%) had low scores, that is, scores below the item location; the pass rate of the OKUKINS Chemistry Achievement Test is therefore 25.0% using sample A, anchored on a cut score of 0.500. In the same dimension, classification of the examinees' scores using sample B revealed that 10 students (10%) had high scores while 90 students (90%) had low scores, implying a pass rate of 10.0% using sample B, anchored on a cut score of 0.500.

Table 5 shows the item parameters for all calibrated items, with the fit statistic of each item given by the Z Residual. Four (4) items (items 6, 23, 29 and 65) had K flags, which indicate that the items' keyed alternatives did not have the highest correlation with the total score. One item (item 11) had an F flag, which shows that its fit statistic was significant; thus, item 11 did not fit the One-Parameter Logistic Model (Z Resid = 2.119, P = 0.034). Five (5) items (items 17, 53, 61, 74 and 79) had both K and F flags: their keyed alternatives did not have the highest correlation with the total score, and their fit statistics were statistically significant, so they did not fit the One-Parameter Logistic Model (item 17: Z Resid = 2.410, P = 0.016; item 53: Z Resid = 2.056, P = 0.040; item 61: Z Resid = 2.340, P = 0.019; item 74: Z Resid = 2.484, P = 0.013; item 79: Z Resid = 2.585, P = 0.010). On the whole, using sample O, 74 items of the OKUKINS Chemistry Achievement Test (OKUKINS CAT) fitted the One-Parameter Logistic Model.

Table 6 depicts the item parameters for all calibrated items, with the fit statistic of each item given by the Z Residual. Fifteen (15) items (items 4, 6, 17, 29, 30, 33, 36, 38, 51, 53, 61, 62, 65, 72 and 79) had K flags, which indicate that the items' keyed alternatives did not have the highest correlation with the total score. One (1) item (item 74) had both K and F flags: its keyed alternative did not have the highest correlation with the total score, and its fit statistic is statistically significant (Z Resid = 2.481, P = 0.000); thus, item 74 did not fit the One-Parameter Logistic Model using sample A. The result shows that 79 of the 80 items of the OKUKINS CAT fitted the One-Parameter Logistic Model using sample A.

Table 7 shows the item parameters for all calibrated items, with the fit statistic of each item given by the Z Residual. Fifteen (15) items (items 6, 11, 17, 23, 33, 35, 36, 38, 51, 53, 60, 61, 62, 69 and 79) had K flags, which indicate that the items' keyed alternatives did not have the highest correlation with the total score. None of the items was flagged for F, Lb or Hb; thus, all 80 items of the OKUKINS Chemistry Achievement Test fitted the One-Parameter Logistic Model using sample B.

A cursory examination of the results in Table 5, Table 6 and Table 7 reveals that the items of the OKUKINS CAT fitted the 1-PLM better with a sample size of 100 examinees than with a sample size of 200 examinees.

Table 8 shows the correlation coefficients of the Standard Errors of Measurement across samples associated with the difficulty parameters. Relating the Standard Errors of Measurement associated with the b parameters in sample O (SEM_O) with those of sample A (SEM_A) gave a coefficient of 0.946; relating SEM_O with those of sample B (SEM_B) gave a coefficient of 0.604; and relating SEM_A with SEM_B gave a coefficient of 0.721. A P value of 0.000 was obtained for each of the three correlations (SEM_O versus SEM_A, SEM_O versus SEM_B and SEM_A versus SEM_B). A P value of 0.000 is statistically significant at the 0.05 alpha level; thus, the item-by-item reliability of the OKUKINS Chemistry Achievement Test is verified through correlation evidence using the Standard Errors of Measurement across samples associated with the difficulty parameters. According to ^{ 14}, item reliability, as used in IRT, is the degree to which item difficulties are differentiated. In this case, the difficulty parameters of the items of the OKUKINS Chemistry Achievement Test are appropriately differentiated.

Figure 1 presents the Test Response Function (TRF) for all calibrated items. The Test Response Function predicts the number of items an examinee will answer correctly as a function of theta. In this case, the TRF predicts 84.6% of the score of each examinee on the OKUKINS Chemistry Achievement Test.
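Under the 1-PLM, the Test Response Function is simply the sum of the item response functions, so it maps any theta to an expected number-correct score. A sketch with illustrative difficulties (not the actual OKUKINS CAT parameters):

```python
import math

def icc(theta, b):
    """1-PL item response function."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def trf(theta, b_params):
    """Test Response Function: expected number-correct score at ability
    theta, i.e. the sum of the item response probabilities over all items."""
    return sum(icc(theta, b) for b in b_params)

# Illustrative item difficulties, symmetric about zero.
b_params = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
print(trf(0.0, b_params))  # 3.5: an average examinee is expected to answer half of these items correctly
```
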

Table 9 indicates the correlation coefficient of the Standard Errors of Measurement across samples associated with the ability estimates. Relating the Standard Errors of Measurement associated with the ability estimates in sample A (SEM_A) with those of sample B (SEM_B) gave a coefficient of 0.595, with a probability of 0.000. The P value of 0.000 is statistically significant at the 0.05 alpha level; thus, the Standard Errors of Measurement across samples associated with the ability estimates are consistent. This is evidence of the person reliability of the OKUKINS Chemistry Achievement Test. Person reliability is the degree to which a measuring instrument differentiates persons in the test outcome; in this case, the OKUKINS CAT differentiates ability appropriately.

Figure 2 to Figure 4 present the Item Characteristic Curves for three items of the OKUKINS CAT. Also presented alongside each ICC are four tables.

1) **Item Information Table:** This table records the information supplied by the control file (classic Data Header) for the item.

2) **Classical Statistics Table:** This table presents the classical statistics for the item.

3) **IRT Parameter:** This table records item parameter estimates for the item.

4) **Option Statistics:** This table provides detailed statistics for each response option of the item, which helps to identify issues in items with poor statistics.

In the classical statistics table, the P value and the point-biserial correlations are presented in the first three columns. The P value is the proportion of testees that answered the item in the keyed direction; it ranges from 0 to 1. The S-Rpbis and T-Rpbis are the point-biserial correlations of the item with the total score and with theta respectively. The last column in the classical statistics table is Alpha w/o, the Cronbach's alpha computed with the current item excluded.

The IRT parameters table presents the item parameter estimates and fit statistics. In this case, the discrimination parameter is constant for all items, i.e. a = 1.0, while the b parameter varies with the difficulty of the item. Also presented in the IRT parameters table is the Standard Error (SE) for each item; a large SE for an item parameter (compared to the other items) shows that the parameter was poorly estimated. The z Residual, also presented in the IRT parameters table, is used to determine the fit of a dichotomous item. For dichotomous items, the P value for rejecting the item as a poor fit is computed from the z residual, with the standard normal distribution as its sampling distribution.

In the option statistics table, the responses of examinees across the options for each item are clearly indicated. ^{ 8} used five verbal terms to describe the difficulty of an item: very easy, easy, medium, hard and very hard.

According to ^{ 14}, “Higher b-parameters (> 1.0) indicate that the item is more difficult, a value below -1.0 indicates that the item is very easy”. Following the interpretation of Kpolovie and Emekene, it can be inferred from the results that item 32 is an easy item, item 13 is an item of medium difficulty and item 28 is a difficult item.

Figure 5 displays the graph of the information function for all calibrated items of the OKUKINS CAT. The Test Information Function (TIF) is a graphical representation of how much information the test provides at each level of theta. In this case, the maximum information was 16.561 at theta = -0.050; thus, the TIF provides satisfactory information over the ability trait range, since it takes the shape of a normal distribution curve.

Figure 6 presents the graph of the Conditional Standard Error of Measurement (CSEM) Function. The CSEM is inversely related to the TIF (it is the reciprocal of the square root of the test information) and estimates the amount of error in theta estimation at each level of theta. In this case, the minimum CSEM was 0.246 at theta = -0.050.
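Since CSEM(theta) = 1 / sqrt(TIF(theta)), the reported maximum information of 16.561 implies a minimum CSEM of about 0.246, matching the reported value. A one-line consistency check:

```python
import math

def csem(information):
    """Conditional SEM: reciprocal square root of test information."""
    return 1.0 / math.sqrt(information)

# Maximum information of 16.561 (sample O) implies minimum CSEM of 0.246:
print(round(csem(16.561), 3))   # 0.246
```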

Table 13 shows the mean scores of male and female students' academic achievement in the OKUKINS Chemistry Achievement Test as 34.91 and 32.48 respectively, with respective standard deviations of 11.83 and 9.56. This suggests that there is a difference between male and female students' performance in chemistry in the locale under investigation. When these values were subjected to an independent-samples t-test, the calculated t of 1.598 was not statistically significant at the 0.05 alpha level. Since the probability (p-value) of 0.112 is greater than the significance level of 0.05, the null hypothesis is accepted. That is, there is no significant difference in the academic achievement of male and female students in the OKUKINS Chemistry Achievement Test.
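The independent-samples t statistic can be recovered from the summary statistics alone. Assuming two equal groups of 100 (the paper reports the overall N = 200 but not the male/female split, so the group sizes here are an assumption), the pooled-variance formula reproduces the reported t of 1.598:

```python
import math

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Independent-samples t statistic (pooled variance) computed from
    group means, standard deviations and sizes."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Assumed group sizes of 100 each reproduce the reported statistic:
print(round(t_from_summary(34.91, 11.83, 100, 32.48, 9.56, 100), 3))   # 1.598
```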

The results of this study indicated that the items of the OKUKINS CAT fitted the 1-PLM better when a sample size of 100 examinees was used for item calibration than when a sample size of 200 examinees was used. Using a sample size of 200 examinees, 74 of the 80 items which form the OKUKINS CAT fitted the One-Parameter Logistic Model (1-PLM). Using a sample size of 100 examinees (set A), 79 items fitted the 1-PLM. Using a sample size of 100 examinees (set B), all 80 items of the OKUKINS CAT fitted the One-Parameter Logistic Model. These results are in consonance with those of ^{15}, who reported that the 15 items of the Advanced Progressive Matrices Scale - Smart Version (APM-SV) fitted the One-Parameter Logistic Model (1-PLM), Two-Parameter Logistic Model (2-PLM) and Three-Parameter Logistic Model (3-PLM) perfectly. The findings diverge from those of ^{15}, who reported that of the 40 items of the 2010 Botswana Junior Certificate (JC) Mathematics Paper 1, only one item fitted the 1-PLM, eleven (11) items fitted the 2-PLM and 23 items fitted the 3-PLM; on that basis, the 3-PLM was used to estimate the item parameters.

One of the results of the study showed that the difficulty parameters of the items of the OKUKINS CAT were satisfactory. The item parameters for all calibrated items showed that, using sample O, the b parameters ranged from -2.417 to +2.834. Using sample A, the difficulty parameters ranged from -2.115 to +2.293. Using sample B, the b parameters ranged from -2.138 to +2.960. According to the Xcalibre manual, the difficulty index ranges in theory from negative infinity to positive infinity, but practical considerations usually restrict it to -3.0 (very easy) to +3.0 (very difficult). In this case, the b parameters across samples graduated from very easy to very difficult. A similar result was found in a related study by ^{16}, who used the 3-PLM to calibrate all 36 items of the Advanced Progressive Matrices (APM) Scale; the calibration yielded b parameters ranging from -2.595 to +2.133.

The results of this study also revealed that the Test Information Function (TIF) of the OKUKINS CAT provided satisfactory information over the ability trait range, since it took the shape of a normal distribution curve. Using sample O, the maximum information was 16.561 at theta = -0.050. Using sample A, the maximum information was 16.494 at theta = -0.050. Using sample B, the maximum information was 16.395 at theta = -0.050. In a similar study, ^{16} reported a maximum information of 8.258 at theta = 0.000 for the Advanced Progressive Matrices (APM) Scale. Also, ^{14} reported a maximum information of 8.090 at theta = -0.200 for the APM-SV. The TIF in both investigations also took the shape of a normal distribution curve.

It was revealed in this study that gender has no significant influence on academic achievement in chemistry. The mean scores of the male and female students were 34.91 and 32.48 respectively, with respective standard deviations of 11.83 and 9.56. When these observations were subjected to an independent t-test, no significant difference was found between the academic achievement of the male and female students in the OKUKINS Chemistry Achievement Test.

The overall result showed that gender has no significant influence on academic achievement. Similar results were found in related studies by Nwachukwu and Inomiesa, as cited in ^{17}: gender does not really count in academic achievement in the sciences. That is, there is no significant difference between the performance of male and female students in the sciences. However, ^{18} and ^{19} reported contrary results, finding a significant difference between the performance of male and female students in favour of the male students. In a different dimension, Weerakkody and Ediriweera reported a significant difference between the performance of male and female students in favour of the female students.

Over the years, research has indicated that the African woman was never traditionally perceived to be on an equal footing with her male counterpart. The attendant deprivation and marginalization of the African woman has been substantiated in the investigation of Otite and Ogionwo, as cited in ^{18}. The pervasive system of patriarchy placed the man on an enviable and revered pedestal and gave him ample latitude and social leverage to lord it over the African woman, thereby relegating her to a position of subservience. According to Jike and Buadi, as cited in ^{17}, "these patriarchally-induced gender dichotomies have been transposed to the education system".

Nosike, as cited in ^{18}, observed that social customs often account for the popular belief that girls do not need education, since they will marry and raise children rather than work at a job outside the home where educational qualifications are required. In line with the above, most girls opt for expressive jobs like cooking, cleaning and child nurturing, in tandem with the social perception of the woman's role. The results of this study clearly show that the subservient position of the African woman noted by Jike and Buadi, and the social customs described by Nosike, have been overtaken by events. Owing to sensitization and greater enlightenment, female students are now on an equal footing to compete with their male counterparts in academia.

Based on the findings, the following are contributions to existing knowledge:

1. Through the study, an instrument to measure students' true ability has been developed.

2. An instrument that will aid tailored testing has been brought into the open.

3. Through the study, the workability of IRT in developing a test has been demonstrated.

Based on the findings, the following conclusions are made:

1) The OKUKINS CAT fitted the One-Parameter Logistic Model of IRT.

2) The difficulty parameters of the items of the OKUKINS CAT ranged from -2.417 (very easy) to +2.834 (very difficult) using sample O. Using sample A, the difficulty parameters ranged from -2.115 (very easy) to +2.293 (very difficult). Using sample B, the difficulty parameters ranged from -2.138 (very easy) to +2.960 (very difficult). Thus, the range of difficulty parameters of the items of the OKUKINS CAT falls within the range prescribed by the Xcalibre manual, i.e. -3.0 (very easy) to +3.0 (very difficult).

3) The ability estimate of the OKUKINS CAT ranged from -2.276 to +2.163 using sample O; from -1.743 to + 1.652 using sample A; from -2.278 to +2.290 using sample B, all yielding favourable statistics.

4) The standard errors of measurement associated with difficulty parameters ranged from 0.148 to 0.344 using sample O, from 0.210 to 0.382 using sample A; from 0.208 to 0.558 using sample B, all producing favourable indices.

5) The standard errors of measurement associated with ability estimates ranged from 0.2457 to 0.3128 using sample O; from 0.246 to 0.3047 using sample A; and from 0.247 to 0.3582 using sample B, all yielding favourable statistics.

Based on the findings of this study the following recommendations are made:

1. IRT software such as Xcalibre 4.2 should be made available in every institution of higher learning, considering its uniqueness in data analysis.

2. All examination bodies such as JAMB, WAEC, NECO and NABTEB should revisit and review their test items using IRT models, since this has been recommended by the International Test Commission.

3. Effort should be made by psychometricians in Nigeria to promote the use of IRT models during test construction.

4. Female students should be encouraged to opt for science courses in institutions of higher learning, since the results of this study have disclosed that gender does not influence performance in the sciences.

5. Non-Governmental Organizations (NGOs) interested in measuring the basic ability of students in Chemistry for the sake of scholarship and concerned parents and guardians who want to assess the learning progress of their children and wards in Chemistry should opt for the OKUKINS Chemistry Achievement Test.

[1] Oku, K. (2015). Measures for achieving academic excellence. Paper presented at the lesser chapter, All Saints' Cathedral (Anglican Communion), Ughelli, Delta State, during the AYF youth week celebration.

[2] Iweka, F. (2014). Comprehensive guide to test construction and administration. Omoku: Chifas Nigeria.

[3] Steinmayr, R., Meißner, A., Weidinger, A. F., & Wirthwein, L. (2015). Academic achievement. Retrieved from https://www.oxfordbibliographies.com/view/document/obo-9780199756810/obo-9780199756810-0108.xml.

[4] Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1), 1-11.

[5] Ojerinde, D. (2013). Classical Test Theory vs Item Response Theory: An evaluation of the comparability of item analysis results. Retrieved from https://ui.edu.ng/sites/default/files/PROF%20OJERINDE%27S%20LECTURE%20(Autosaved).pdf.

[6] Kpolovie, P. J. (2014). Test, measurement and evaluation in education. Owerri: Springfield.

[7] Bomo, C. A. (2015). Application of Item Response Theory in the development of students' attitude towards Mathematics tests. Unpublished dissertation, University of Port Harcourt, 1-6.

[8] Baker, F. B. (2001). The Basics of Item Response Theory. USA: ERIC Clearinghouse on Assessment and Evaluation.

[9] Anne, M. H. (2015). Why is chemistry important? Retrieved from https://www.thoughtco.com/why-is-chemistry-important-604144.

[10] Jason, S. (2013). The importance of chemistry in everyday life. Retrieved from https://sciencezoneja.wordpress.com/2013/12/24/the-importance-of-chemistry-in-everyday-life/.

[11] Ogunbanwo, R. A. (2014). Analysis of students' performance in West African Senior Certificate Examination in boarding and day secondary schools in Kano metropolis, Nigeria. Master's thesis, Ahmadu Bello University, Zaria.

[12] Wheadon, C. (2014). Classification: Accuracy and consistency under item response theory models using the package classify. Journal of Statistical Software, 56(10).

[13] Dayalata, A. & Obinne, E. (2013). Test item validity: Item Response Theory (IRT) perspective Nigeria. Research Journal in Organizational Psychology and Education Studies, 2(1).

[14] Kpolovie, P. J. & Emekene, C. O. (2016). Psychometric advent of Advanced Progressive Matrices - Smart Version (APM-SV) for use in Nigeria. European Journal of Statistics and Probability, 4(3), 20-30.

[15] Omobola, O. A. & Adedoyin, J. A. (2015). Assessing the comparability between Classical Test Theory (CTT) and Item Response Theory (IRT) models in meters. Herald Journal of Education and General Studies, 2(3), 107-114.

[16] Emekene, C. O. (2017). Psychometric analysis of the Advanced Progressive Matrices Scale for use in Nigeria. Ph.D. thesis, University of Port Harcourt.

[17] Oku, K. (2008). Comparison of academic achievement of sandwich and regular postgraduate students of the University of Port Harcourt. Unpublished master's thesis, University of Port Harcourt.

[18] Ileh, L. N. (2017). Development and validation of Mathematics achievement test for continuous assessment in junior secondary schools in Delta State. Unpublished thesis, Delta State University, Abraka.

[19] Akpochafo, W. P. (2003). Gender related differences and academic achievement in junior secondary school social studies. Journal of Educational Research and Development, 2(1), 35-42.

Published with license by Science and Education Publishing, Copyright © 2018 Oku Kingsley and Iweka Fidelis

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/

Oku Kingsley, Iweka Fidelis. Development, Standardization and Application of Chemistry Achievement Test Using the One-Parameter Logistic Model (1-Plm) of Item Response Theory (Irt). *American Journal of Educational Research*. Vol. 6, No. 3, 2018, pp 238-257. https://pubs.sciepub.com/education/6/3/11


