Although Africa is home to a huge proportion of the world's languages (well over a quarter, according to some estimates), many are missing when it comes to the development of artificial intelligence (AI).

This is due both to a lack of investment and to a shortage of readily available data.

Most AI tools in use today, such as ChatGPT, are trained on English as well as other European and Chinese languages.

These have vast quantities of online text to draw from.

But as many African languages are mostly spoken rather than written down, there is too little text to train AI that would be useful to speakers of those languages.

For millions across the continent, this means being left out.

Researchers who have been trying to address this issue have recently released what is thought to be the largest dataset of African languages to date.

"We think in our own languages, dream in them and interpret the world through them. If technology doesn't reflect that, a whole group risks being left behind," said the University of Pretoria's Prof Vukosi Marivate.

The African Next Voices project brought together linguists and computer scientists to create AI-ready datasets in 18 African languages.

Over two years, the team recorded 9,000 hours of speech across Kenya, Nigeria and South Africa, capturing everyday scenarios in farming, health and education.

Farmer Kelebogile Mosime uses an AI app called AI-Farmer, which recognizes several South African languages including Setswana, to help her solve a range of farming problems.

Lelapa AI, a young South African company, is developing AI tools in African languages for banks and telecoms firms, arguing that language should not be a barrier to essential services.

"Language is access to imagination. It's not just words: it's history, culture, knowledge. If indigenous languages aren't included, we lose more than data; we lose ways of seeing and understanding the world," said Prof Marivate.