Item Analysis of a Reading Test in Sri Lankan Context Using Classical Test Theory
Abstract
This paper reports on a study of a reading test that evaluates the cognitive processes described by Khalifa and Weir (2009). The 25-item test was designed from a test specification targeted at the B2 level of the Common European Framework of Reference for Languages (CEFR). The responses of 50 students were used to check the validity and reliability of the test. Validity was ascertained through item analysis involving item difficulty indices, item discrimination indices, and distractor analysis; each item was studied to provide detailed information for improving test construction. Reliability was estimated with the Kuder-Richardson Formula 20 (KR-20). All computations were carried out in Microsoft Excel. Findings revealed that the test met the standards for content validity and showed acceptable item difficulty indices, with 17 items falling in the moderate range of 0.30 to 0.79. Except for three items, all items discriminated well between high- and low-ability students, and only five items had malfunctioning distractors. The reliability of the test scores was 0.82, which is deemed a good value and indicates consistent results. Overall, 88% of the test items functioned well, and the test proved to be valid and reliable. The present research can give students, teachers, and test-makers an insightful understanding of item analysis and test development.
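The Classical Test Theory statistics reported above follow standard formulas: item difficulty is the proportion of correct responses, discrimination compares the top and bottom 27% of scorers (Kelley, 1939), and KR-20 estimates reliability for dichotomous items. The study computed these in Microsoft Excel; the sketch below reproduces the same formulas in Python on a hypothetical 0/1 response matrix (the data shown are simulated, not the study's actual responses).

```python
import numpy as np

# Hypothetical dichotomous response matrix: 50 students x 25 items,
# 1 = correct, 0 = incorrect (simulated stand-in for the study's data).
rng = np.random.default_rng(0)
X = (rng.random((50, 25)) < 0.6).astype(int)

# Item difficulty index: proportion of students answering each item correctly.
p = X.mean(axis=0)

# Item discrimination index (upper-lower method, Kelley's 27% groups):
# difficulty in the top 27% of total scorers minus the bottom 27%.
total = X.sum(axis=1)
k = max(1, round(0.27 * len(total)))
order = np.argsort(total)
D = X[order[-k:]].mean(axis=0) - X[order[:k]].mean(axis=0)

# KR-20 reliability for dichotomous items:
#   KR20 = K/(K-1) * (1 - sum(p*q) / variance of total scores)
K = X.shape[1]
q = 1 - p
kr20 = K / (K - 1) * (1 - (p * q).sum() / total.var(ddof=0))
```

Under common rules of thumb (e.g., Ebel & Frisbie, 1991), items with `p` between 0.30 and 0.79 count as moderate difficulty, `D` of at least 0.20 as acceptable discrimination, and KR-20 above 0.80 as good reliability, which is how the abstract's thresholds are interpreted.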
https://doi.org/10.26803/ijlter.21.3.3
References
Alderson, J. C. (2000). Assessing reading. Cambridge University Press.
Bax, S., & Chan, S. H. C. (2016). Researching the cognitive validity of GEPT high intermediate and advanced reading: An eye-tracking and stimulated recall study. LTTC-GEPT Research Reports, 7, 1-47. www.lttc.ntu.edu.tw/lttc-gept-grants/RReport/RG07.pdf
Bichi, A. A., & Embong, R. (2018). Evaluating the quality of Islamic civilization and Asian civilizations examination questions. Asian People Journal (APJ), 1(1), 93-109. www.uniszajournals.com/apj
Brown, H. D., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices (Vol. 10). Pearson Education.
Carlsen, C. H. (2018). The adequacy of the B2 level as university entrance requirement. Language Assessment Quarterly, 15(1), 75-89. https://doi.org/10.1080/15434303.2017.1405962
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. https://rm.coe.int/1680459f97
Creswell, J. W. (2012). Educational research: Planning, conducting and evaluating quantitative and qualitative research (4th ed.). Pearson.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston.
Deygers, B., Zeidler, B., Vilcu, D., & Carlsen, C. H. (2018). One framework to unite them all? Use of the CEFR in European university entrance policies. Language Assessment Quarterly, 15(1), 3-15. https://eric.ed.gov/?id=EJ1171980
Dundar, H., Millot, B., Riboud, M., Shojo, M., Goyal, S., & Raju, D. (2017). Sri Lanka education sector assessment: Achievements, challenges, and policy options. World Bank Group. https://doi.org/10.1596/978-1-4648-1052-7
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.
Eleje, L. I., Onah, F. E., & Abanobi, C. C. (2018). Comparative study of classical test theory and item response theory using diagnostic quantitative economics skill test item analysis results. European Journal of Educational and Social Sciences, 3(1), 57-75. https://www.researchgate.net/publication/343557487
Fleckenstein, J., Leucht, M., & Köller, O. (2018). Teachers’ judgement accuracy concerning CEFR levels of prospective university students. Language Assessment Quarterly, 15(1), 90-101. https://doi.org/10.1080/15434303.2017.1421956
Fulcher, G., & Davidson, F. (2007). Language testing and assessment. Routledge.
Halek, M., Holle, D., & Bartholomeyczik, S. (2017). Development and evaluation of the content validity, practicability and feasibility of the Innovative Dementia-Oriented Assessment System for Challenging Behaviour in Residents with Dementia. BMC Health Services Research, 17(1), 554. https://doi.org/10.1186/s12913-017-2469-8
Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47. https://doi.org/10.1111/j.1745-3992.1993.tb00543.x
Kastner, M., & Stangl, B. (2011). Multiple choice and constructed response tests: Do test format and scoring matter? Procedia – Social and Behavioral Sciences, 12, 263-273. https://doi.org/10.1016/j.sbspro.2011.02.035
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17-24. https://doi.org/10.1037/h0057123
Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing second language reading. Cambridge University Press.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151-160. https://doi.org/10.1007/BF02288391
Linguapress. (2020). A comparison of different readability scales. https://linguapress.com/teachers/flesch-kincaid.htm
Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1), 1-11. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1426043
Manalu, D. (2019). An analysis of students reading final examination by using item analysis program on eleventh grade of SMA Negeri 8 Medan. Journal of English Teaching & Applied Linguistics, 1(1), 13-19. http://repository.uhn.ac.id/handle/123456789/2796
McNamara, T. F. (1996). Measuring second language performance. Longman Publishing Group.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.; pp. 13-104). Macmillan.
Natova, I. (2019). Estimating CEFR reading comprehension text complexity. The Language Learning Journal, 49(6), 699-710. https://doi.org/10.1080/09571736.2019.1665088
Powell, J. L., & Gillespie, C. (1990). Assessment: All tests are not created equally. https://files.eric.ed.gov/fulltext/ED328908.pdf
Pratiwi, R., Antini, S., & Walid, A. (2021). Analysis of item difficulty index for midterm examinations in junior high schools 5 Bengkulu City. Asian Journal of Science Education, 3(1), 12-18. http://www.jurnal.unsyiah.ac.id/AJSE/article/view/18895
Samad, A. (2004). Essentials of language testing for Malaysian teachers. UPM Press.
Shanmugam, S. K. S., Wong, V., & Rajoo, M. (2020). Examining the quality of English test items using psychometric and linguistic characteristics among grade six pupils. Malaysian Journal of Learning and Instruction, 17(2), 63-101. https://files.eric.ed.gov/fulltext/EJ1272266.pdf
Tamil, A. M. (2015). Calculating difficulty, discrimination and reliability index/standard error of measurement. PPUKM. https://ppukmdotorg.wordpress.com/2015/04/02/calculating-omr-indexes/
Turner, R. C., & Carlson, L. (2003). Indexes of item-objective congruence for multidimensional items. International Journal of Testing, 3(2), 163-171. https://doi.org/10.1207/s15327574ijt0302_5
Urquhart, A. H., & Weir, C. J. (1998). Reading in a second language: Process, product and practice. Longman.
Waluyo, B. (2019). Thai first-year university students’ English proficiency on CEFR levels: A case study of Walailak University, Thailand. The New English Teacher, 13(2), 51-71. http://www.assumptionjournal.au.edu/index.php/newEnglishTeacher/article/view/3651
Wright, B. D., & Stone, M. H. (1979). Best test design. Mesa Press.
Yusup, R. B. (2012). Item evaluation of the reading test of the Malaysian University English Test (MUET) (Master’s thesis). The University of Melbourne. http://hdl.handle.net/11343/37608
Zimmerman, D. W. (1972). Test reliability and the Kuder-Richardson formulas: Derivation from probability theory. Educational and Psychological Measurement, 32(4), 939-954. https://doi.org/10.1177/001316447203200408
Zubairi, A. M., & Kassim, N. L. A. (2006). Classical and Rasch analyses of dichotomously scored reading comprehension test items. Malaysian Journal of ELT Research, 2(1), 1-20. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.535.2955&rep=rep1&type=pdf
e-ISSN: 1694-2116
p-ISSN: 1694-2493