Endangered languages AI tools developed by UH researchers

Endangered languages AI tools developed by UH researchers


Reading time: < 1 minute

hand typing at laptop and icons across photo

University of Hawaiʻi at Mānoa researchers have made a significant advance in studying how artificial intelligence (AI) understands endangered languages. This research could help communities document and maintain their languages, support language learning and make technology more accessible to speakers of minority languages.

The paper by Kaiying Lin, a PhD graduate in linguistics from UH Mānoa, and Department of Information and Computer Sciences Assistant Professor Haopeng Zhang, introduces the first benchmark for evaluating large language models (AI systems that process and generate text) on low-resource Austronesian languages. The study focuses on three Formosan (Indigenous peoples and languages of Taiwan) languages spoken in Taiwan—Atayal, Amis and Paiwan—that are at risk of disappearing.

Using a new benchmark called FORMOSANBENCH, Lin and Zhang tested AI systems on tasks such as machine translation, automatic speech recognition and text summarization. The findings revealed a large gap between AI performance in widely spoken languages such as English, and these smaller, endangered languages. Even when AI models were given examples or fine-tuned with extra data, they struggled to perform well.

“These results show that current AI systems are not yet capable of supporting low-resource languages,” Lin said.

Zhang added, “By highlighting these gaps, we hope to guide future development toward more inclusive technology that can help preserve endangered languages.”

The research team has made all datasets and code publicly available to encourage further work in this area. The preprint of the study is available online, and the study has been accepted into the 2025 Conference on Empirical Methods in Natural Language Processing in Suzhou, China, an internationally recognized premier AI conference.

The Department of Information and Computer Sciences is housed in UH Mānoa’s College of Natural Sciences, and the Department of Linguistics is housed in UH Mānoa’s College of Arts, Languages & Letters.



Source link