Currently Available GenAI-Powered Large Language Models and Low-Resource Languages: Any Offerings? Wait Until You See

Chaka Chaka

Abstract


Considerable hype has accompanied the growing number of generative artificial intelligence-powered large language models (LLMs). Similarly, much has been written about what currently available LLMs can and cannot do, including their benefits and risks, especially in higher education. However, few studies have investigated the performance and generative capabilities of LLMs in low-resource languages. With this in mind, one purpose of the current study was to explore how well the free-to-use versions of seven currently available LLMs (ChatGPT, Claude, Copilot, Gemini, GroqChat, Perplexity, and YouChat) perform in five low-resource languages (isiZulu, Sesotho, Yoruba, Māori, and Mi’kmaq) in terms of their generative multilingual capabilities. The study collected its datasets by entering a common input prompt into each of the seven LLMs; the only change to the prompt in each case was the name of the given low-resource language, paired with English. Three of the findings are noteworthy. First, the seven LLMs displayed a significant lack of generative multilingual capabilities in the five low-resource languages. Second, they hallucinated and produced nonsensical, meaningless, and irrelevant responses in their low-resource language outputs. Third, their English responses were far better in quality, relevance, depth, detail, and nuance than their low-resource language only and English responses for the five low-resource languages. The paper closes with the implications and conclusions of the study regarding LLMs’ generative capabilities in low-resource languages.

https://doi.org/10.26803/ijlter.23.12.9


Keywords


generative multilingual capabilities; hallucinations; large language models; low-resource languages; nonsensical, meaningless, and irrelevant responses


e-ISSN: 1694-2116

p-ISSN: 1694-2493