The Classical Test or Item Response Measurement Theory: The Status of the Framework at the Examination Council of Lesotho
Abstract
While the Examination Council of Lesotho (ECOL) carries a heavy workload of assessment tasks, its procedures for developing tests, analysing items, and compiling scores rely largely on the classical test theory (CTT) measurement framework. CTT has been criticised for its flaws, including being test-oriented, sample-dependent, and assuming a linear relationship between latent variables and observed scores. This article presents an overview of CTT and item response theory (IRT) and of how the two frameworks are applied to standard assessment questions at the ECOL. These theories address measurement issues associated with commonly used assessments, such as multiple-choice, short-response, and constructed-response tests. Based on three search facets (item response theory, classical test theory, and Examination Council of Lesotho), a comprehensive search was conducted across multiple databases, including Google Scholar, Scopus, Web of Science, and PubMed. The paper was developed theoretically from these electronic databases, the keywords, and the references identified in the retrieved articles, and the authors ensured that the keywords were used to locate relevant documents across a wide variety of sources. General remarks are offered on the effective application of each model in practice with respect to test development and psychometric activities. In conclusion, the study recommends that ECOL switch from CTT to modern test theory for test development and item analysis, a change that offers multiple benefits.
https://doi.org/10.26803/ijlter.21.8.22
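To make the CTT/IRT contrast in the abstract concrete, the sketch below is a minimal, hypothetical illustration, not ECOL's actual procedure. On a small synthetic 0/1 response matrix it computes the two CTT item statistics most often used in item analysis, the difficulty index (proportion correct) and the point-biserial discrimination index, and then evaluates a two-parameter logistic (2PL) IRT item characteristic curve for the same items. All data, parameter values, and names are invented for illustration.

```python
# Minimal sketch (hypothetical data): CTT item statistics vs a 2PL IRT
# item characteristic curve. Not ECOL's actual analysis pipeline.
import numpy as np

rng = np.random.default_rng(seed=1)
n_persons, n_items = 200, 5

# Hypothetical latent abilities and 2PL item parameters
# (a = discrimination, b = difficulty), used only to simulate responses.
theta = rng.normal(0.0, 1.0, size=n_persons)
a = np.array([0.8, 1.0, 1.2, 1.5, 0.6])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) = 1 / (1 + exp(-a(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Simulate a persons-by-items matrix of dichotomous (0/1) responses.
p_correct = icc_2pl(theta[:, None], a[None, :], b[None, :])
responses = (rng.random((n_persons, n_items)) < p_correct).astype(int)

# --- CTT item statistics (sample dependent) ---
difficulty = responses.mean(axis=0)  # p-value: proportion answering correctly
total = responses.sum(axis=1)
# Point-biserial discrimination: correlation of each item with the
# rest-score (total minus the item itself, to avoid inflation).
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(n_items)
])

print("CTT difficulty (p):       ", np.round(difficulty, 2))
print("CTT discrimination (r_pb):", np.round(discrimination, 2))

# --- IRT view: the same ICC evaluated at chosen ability levels ---
for th in (-1.0, 0.0, 1.0):
    print(f"P(correct | theta={th:+.1f}):", np.round(icc_2pl(th, a, b), 2))
```

Rerunning the sketch with a different synthetic ability distribution would shift the CTT difficulty and discrimination values, while the 2PL parameters a and b continue to describe the items themselves; this is the sample-invariance argument the abstract summarises.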