AI Is Powering the Preservation of African Languages


Article by: bird story agency


Artificial Intelligence is beginning to have a profound impact on the documentation and analysis of indigenous languages, with Google Translate adding a wide range of African languages and dialects.

AI is powering the preservation of African languages [Image Source: Hope Mukami]

Conrad Onyango, bird story agency

More African languages are finding their way onto Google's online translation service, as the search giant uses Artificial Intelligence to learn closely related languages.

In 2024, Google made its largest foray yet into translating African languages, adding more new languages to the service than in any previous year.

“We’re using AI to expand the variety of languages we support. Thanks to our PaLM 2 large language model, we’re rolling out 110 new languages to Google Translate, our largest expansion ever,” said Google Translate Senior Software Engineer Isaac Caswell.

This development marks a pivotal moment: it could not only popularise indigenous languages but also help build comprehensive local linguistic resources.

Nearly a quarter of all the recently added languages on the platform are African, and Africa now has more than 50 languages on the translation service.

The new additions include Dholuo, the language of the Luo, Kenya’s fourth-largest ethnic group. It has more than 4.2 million speakers across several Nilotic ethnic groups found in Egypt, Sudan, South Sudan, Ethiopia, northern Uganda, eastern DRC and part of Tanzania.

Another is Afar, a tonal language spoken by 2.3 million people in Djibouti, Eritrea, and Ethiopia. Google noted that of all the languages in this launch, Afar had the most volunteer community contributions.

Another addition is N'Ko, a standardised form of the West African Manding languages that unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops resources and technology for it today.

Tamazight (Amazigh), a Berber language spoken across North Africa, is another important new addition. Although it has many dialects, the written form is generally mutually intelligible. It is written in both Latin and Tifinagh scripts, and Google Translate supports both.

“Google Translate breaks down language barriers to help people connect and better understand the world around them. We’re always applying the latest technologies so more people can access this tool,” Caswell explained.

Other African languages added this year include Fon, Kikongo, Ga, Swati, Venda and Wolof.

In 2022, Google added 24 new languages across the world using zero-shot machine translation, in which a machine learning model learns to translate into a language without ever seeing an example translation into it.

Google noted that languages vary enormously, with regional varieties, dialects and different spelling standards making it almost impossible to pick a “right” variety. Its approach was to prioritise the most commonly used varieties of each language.

“PaLM 2 was a key piece to the puzzle, helping Translate more efficiently learn languages that are closely related to each other. As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time,” explained Caswell.

According to Google, these new languages represent more than 614 million speakers, opening up translations for around 8% of the world’s population. Some of these languages are major world languages with over 100 million speakers, while others are spoken by small Indigenous communities. A few of the languages have almost no native speakers but are undergoing active revitalization efforts.

Swahili is the most widely spoken African language, with the United Nations putting the number of speakers at more than 200 million. In 2021, the UN designated July 7 as World Kiswahili Language Day.

This year’s event is hosted by Kenya under the theme "Kiswahili, Multilingual Education and the Enhancement of Peace".

The event’s organisers, the East African Community and the Kenyan government, said the annual event offers a platform for Kiswahili stakeholders to share knowledge, research-based evidence, best practices, experiences and worldviews on the role of Kiswahili education in promoting a culture of peace.

Andrea Aguer Ariik, the East African Community’s Deputy Secretary General in charge of Infrastructure, Productive, Social and Political Sectors, emphasised the significance of language diversity and unity in the EAC.

“Kiswahili, as a widely spoken language in East Africa, not only bridges communication gaps but also represents a common identity among the member states of the EAC,” said Ariik in a statement.

Nor is Google alone in this field. Young African scholars studying abroad are also rising to the challenge with similar initiatives that leverage the power of AI.

Ife Adebara, a programmer and scholar at the University of British Columbia's linguistics department, is among those leading initiatives to deploy AI in preserving local languages, with a focus on African languages.

Her project, Afrocentric Natural Language Processing, aims to raise awareness and develop tools and programs that are accessible to speakers of African languages such as Swahili and Zulu.

The project has already produced two tools available online: SERENGETI, a set of massively multilingual language models for Africa, and AfroLID, a neural language identification toolkit that covers 517 African languages and varieties, built on a multi-domain web dataset manually curated from 14 language families and five orthographic systems.

There are more than 2,000 living languages in Africa. Nigeria is home to the most, with 522 languages, according to the research firm Statista.

Statista places Cameroon (with 275 languages) and the Democratic Republic of Congo (with 217) second and third on the continent for the number of languages spoken.
