A Comparative Analysis of the Rating of College Students’ Essays by ChatGPT versus Human Raters

Potchong M. Jackaria, Bonjovi H. Hajan, Al-Rashiff H. Mastul

Abstract


The use of generative artificial intelligence (AI) in education has engendered mixed reactions due to its ability to generate human-like responses to questions. For education to benefit from this technology, there is a need to determine how such capability can be used to improve teaching and learning. Hence, using a comparative-descriptive research design, this study compared Chat Generative Pre-Trained Transformer (ChatGPT) version 3.5 and human raters in scoring students’ essays. The study used twenty essays written by college students in a professional education course at the Mindanao State University – Tawi-Tawi College of Technology and Oceanography, a public university in the southern Philippines. The essays were rated independently by three human raters using a scoring rubric from Carrol and West (1989) as adapted by Tuyen et al. (2019). For the AI ratings, the essays were encoded and entered into ChatGPT 3.5 together with prompts and the rubric. The responses were then screenshotted and recorded along with the human ratings for statistical analysis. Based on the intraclass correlation coefficient (ICC), consistency among the human raters was good, indicating the reliability of the rubric, while consistency among the ChatGPT 3.5 ratings was moderate. Comparison of the human and ChatGPT 3.5 ratings showed poor consistency, implying that the ratings of human raters and ChatGPT 3.5 were not linearly related. The findings imply that teachers should be cautious when using ChatGPT to rate students’ written work, and that ChatGPT 3.5, in its current version, still needs human assistance to ensure the accuracy of its generated information. Future research may investigate the rating of other types of student work using ChatGPT 3.5 or other generative AI tools.
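The abstract does not state which ICC model was used, so the following is only a minimal sketch, assuming a two-way, consistency-type, single-rater coefficient (often written ICC(3,1)) computed from an essays-by-raters table. The function names `icc_consistency` and `interpret` and the example scores are hypothetical and are not the study's data; the cut-offs follow Koo and Li (2016), which appears in the reference list.

```python
from statistics import mean


def icc_consistency(ratings):
    """ICC(3,1): two-way model, consistency, single rater (assumed form).

    `ratings` is a list of rows: one row per essay, one column per rater.
    """
    n = len(ratings)       # number of essays
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def interpret(icc):
    """Label an ICC value using the Koo and Li (2016) cut-offs."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical scores for five essays from three raters (illustration only).
example_scores = [
    [12, 13, 11],
    [15, 16, 15],
    [9, 10, 8],
    [18, 17, 18],
    [14, 15, 13],
]
value = icc_consistency(example_scores)
print(f"ICC = {value:.3f} ({interpret(value)})")
```

Because this is a consistency-type coefficient, a rater who scores every essay higher by a fixed amount does not lower the value; an absolute-agreement form would penalize that offset.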

https://doi.org/10.26803/ijlter.23.2.23


Keywords


ChatGPT; essay writing; generative AI; human raters; inter-rater


References


Adiguzel, T., Kaya, M. H., & Cansu, F. K. (2023). Revolutionizing education with AI: Exploring the transformative potential of ChatGPT. Contemporary Educational Technology, 15(3), Article 429. https://doi.org/10.30935/cedtech/13152

Alrishan, A. M. (2023). Determinants of intention to use ChatGPT for professional development among Omani EFL pre-service teachers. International Journal of Learning, Teaching and Educational Research, 22(12), 187–209. https://doi.org/10.26803/ijlter.22.12.10

Biswas, S. (2023, June 4). Role of ChatGPT in education. SSRN. https://ssrn.com/abstract=4369981

Bitzenbauer, P. (2023). ChatGPT in physics education: A pilot study on easy-to-implement activities. Contemporary Educational Technology, 15(3), ep430. https://doi.org/10.30935/cedtech/13176

Buchholz, K. (2023, July 7). Threads shoots past one million user mark at lightning speed. Statista. https://www.statista.com/chart/29174/time-to-one-million-users/

Dergaa, I., Chamari, K., Zmijewski, P., & Saad, H. B. (2023). From human writing to artificial intelligence generated text: Examining the prospects and potential threats of ChatGPT in academic writing. Biology of Sport, 40(2), 615–622. https://doi.org/10.5114/biolsport.2023.125623

Dikli, S., & Bleyle, S. (2014). Automated essay scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22(1), 1–17. https://doi.org/10.1016/j.asw.2014.03.006

Ferrouhi, E. M. (2023). Evaluating the accuracy of ChatGPT in scientific writing. Research Square [preprint]. https://doi.org/10.21203/rs.3.rs-2899056/v1

Firat, M. (2023, January 12). How ChatGPT can transform autodidactic experiences and open education? https://doi.org/10.31219/osf.io/9ge8m

Fuchs, K. (2023). Exploring the opportunities and challenges of NLP models in higher education: Is Chat GPT a blessing or a curse? Frontiers in Education, 8, Article 1166682. https://doi.org/10.3389/feduc.2023.1166682

Gill, S. S., Xu, M., Patros, P., Wu, H., Kaur, R., Kaur, K., Fuller, S., Singh, M., Arora, P., Parlikad, A. K., Stankovski, V., Abraham, A., Ghosh, S. K., Lutfiyya, H., Kanhere, S. S., Bahsoon, R., Rana, O., Dustdar, S., Sakellariou, R., Uhlig, S., & Buyya, R. (2024). Transformative effects of ChatGPT on modern education: Emerging era of AI chatbots. Internet of Things and Cyber–Physical Systems, 4, 19–23. https://doi.org/10.1016/j.iotcps.2023.06.002

Harunasari, S. Y. (2023). Examining the effectiveness of AI-integrated approach in EFL writing: A case of ChatGPT. International Journal of Progressive Sciences and Technology (IJPSAT), 39(2), 357–368. https://ijpsat.org/index.php/ijpsat/article/download/5516/3447

Imran, M., & Almusharraf, N. (2023). Analyzing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature. Contemporary Educational Technology, 15(4), ep464. https://doi.org/10.30935/cedtech/13605

Iqbal, N., Ahmad, H., & Azhar, K. (2023). Exploring teachers’ attitudes towards using ChatGPT. Global Journal for Management and Administrative Sciences, 3(4), 97–111. https://doi.org/10.46568/gjmas.v3i4.163

Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., ... Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, Article 102274. https://doi.org/10.1016/j.lindif.2023.102274

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

Kopp, W., & Thomsen, B. (2023, May 1). How AI can accelerate student’s holistic development and make teaching more fulfilling. World Economic Forum. https://www.weforum.org/agenda/2023/05/ai-accelerate-students-holistic-development-teaching-fulfilling/

Laschinger, H. K. (1992). Intraclass correlations as estimates of interrater reliability in nursing research. Western Journal of Nursing Research, 14(2), 246–251. https://journals.sagepub.com/doi/pdf/10.1177/019394599201400213

Latif, E., & Zhai, X. (2023). Fine-tuning ChatGPT for automatic scoring. Computers and Education: Artificial Intelligence, 6, Article 100210. https://doi.org/10.1016/j.caeai.2024.100210

Li, L., Ma, Z., Fan, L., Lee, S., Yu, H., & Hemphill, L. (2023). ChatGPT in education: A discourse analysis of worries and concerns on social media. Cornell University. https://doi.org/10.48550/arXiv.2305.02201

Liu, M., Xu, W., Ran, Q., & Li, Y. (2015). Using natural language processing technology to analyze teachers’ written feedback on Chinese students’ English essays. International Journal of Learning, Teaching and Educational Research, 11(1), 1–11. https://ijlter.net/index.php/ijlter/article/view/1087

Lingard, L. (2023). Writing with ChatGPT: An illustration of its capacity, limitations & implications for academic writers. Perspectives on Medical Education, 12(1), 261–270. https://doi.org/10.5334/pe.1072

Livberber, T. (2023). Toward non-human-centered design: Designing an academic article with ChatGPT. Profesional de la Información, 32(5), 1–19. https://doi.org/10.3145/epi.2023.sep.12

Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), Article 410. https://doi.org/10.3390/educsci1304041

Lund, B. D., Wang, T., Mannuru, N. R., Nie, B., Shimray, S., & Wang, Z. (2023). ChatGPT and a new academic reality: Artificial intelligence-written research papers and the ethics of the large language models in scholarly publishing. Journal of the Association for Information Science and Technology, 74(5), 570–581. https://doi.org/10.1002/asi.24750

McNamara, D., Crossley, S., Roscoe, R., Allen, L., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35–39. https://researchlanglit.gsu.edu/files/2016/08/272.pdf

Mondal, H., & Mondal, S. (2023). ChatGPT in academic writing: Maximizing its benefits and minimizing the risks. Indian Journal of Ophthalmology, 71(12), 3600–3606. https://doi.org/10.4103/IJO.IJO_718_23

Mondal, H., Marndi, G., Behera, J. K., & Mondal, S. (2023). ChatGPT for teachers: Practical examples for utilizing artificial intelligence for educational purposes. Indian Journal of Vascular and Endovascular Surgery, 10, 200–205. https://doi.org/10.4103/ijves.ijves_37_23

Morozov, E. (2023, July 5). The true threat of artificial intelligence. International New York Times. https://link.gale.com/apps/doc/A755774581/AONE?u=anon~c95b5477&sid=googleScholar&xid=52964d0e

Parker, J., Becker, K., & Corroca, C. (2023). ChatGPT for automated writing evaluation in scholarly writing instruction. Journal of Nursing Education, 62(12), 721–727. https://doi.org/10.3928/01484834-20231006-02

Paz, M. A., Turner, K., & Racila, E. (2023). Evaluating the performance of ChatGPT in writing autopsy clinicopathological correlations. American Journal of Clinical Pathology, 160(1), S125. https://doi.org/10.1093/ajcp/aqad150.272

Rahman, M., & Watanobe, Y. (2023). ChatGPT for education and research: Opportunities, threats, and strategies. Applied Sciences, 13(9), Article 5783. https://doi.org/10.3390/app13095783

Sabzalieva, E., & Valentini, A. (2023). ChatGPT and artificial intelligence in higher education: Quick start guide. UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000385146

Sharma, S., & Yadav, R. (2023). Chat GPT: A technological remedy or challenge for education system. Global Journal of Enterprise Information System, 14(4), 46–51. https://www.gjeis.com/index.php/GJEIS/article/view/698

Sharples, M. (2022). Automated essay writing: An AIED opinion. International Journal of Artificial Intelligence in Education, 32, 1119–1126. https://doi.org/10.1007/s40593-022-00300-7

Siedlecki, S. (2020). Understanding descriptive research designs and methods. Clinical Nurse Specialist, 34(1), 8–12. https://doi.org/10.1097/NUR.0000000000000493

Trust, T., Whalen, J., & Mouza, C. (2023). Editorial: ChatGPT: Challenges, opportunities, and implications for teacher education. Contemporary Issues in Technology and Teacher Education, 23(1), 1–23. https://www.learntechlib.org/primary/p/222408/

Tuyen, T., Osman, S. B., Ahmad, N. S. B., & Dan, T. C. (2019). Developing and validating scoring rubrics for the assessment of research papers writing ability of EFL/ESL undergraduate students: The effects of research papers writing intervention program using process genre model of research paper writing. https://www.semanticscholar.org/paper/Developing-and-Validating-Scoring-Rubrics-for-the-Tuyen-Osman/e86657da7ec761a7d403b4f8d85cc8bf19f99923

Vargas-Murillo, A. R., Pari-Bedoya, I., & Guevara-Soto, F. (2023). Challenges and opportunities of AI-assisted learning: A systematic literature review on the impact of ChatGPT usage in higher education. International Journal of Learning, Teaching and Educational Research, 22(7), 122–135. https://doi.org/10.26803/ijlter.22.7.7

Waltzer, T., Cox, R., & Heyman, G. (2023). Testing the ability of teachers and students to differentiate between essays generated by ChatGPT and high school students. Human Behavior and Emerging Technologies, Article 1923981. https://doi.org/10.1155/2023/1923981




e-ISSN: 1694-2116

p-ISSN: 1694-2493