Cape Town’s top CIOs find out that African languages still aren’t represented online, but will be soon!
In 200 years, archaeologists will be excavating digital data to learn what happened today. But Africa won’t be there to find. During the first-ever CIO Cape Town Summit, the Mother City’s top tech leaders learned that most of Africa doesn’t currently exist online.
“Basic tools like spell check, grammar check, autocomplete and speech recognition are not available for African languages,” Farayi Kambarami, head of central planning and data at Woolworths Food, told attendees.
As a Shona man, Farayi was annoyed about the lack of accessibility for people on the continent. In 2017, he decided to learn more about computational linguistics to help develop technology for African languages.
At the same time, Jade Abbott co-founded research organisation Masakhane with the same idea in mind. “After attending an AI Summit about strengthening African AI, I was inspired to learn new techniques about machine translation to see if I could build a tool for our South African languages,” she explained.
There have been many roadblocks in their way, however.
No data – no problem
Farayi explained that the problem they’re currently trying to solve is how to use the very little data that’s available in these languages, with as little computing power as possible, to try to generate the same amount of information that is available in international languages like English.
“One of the issues we faced was that the data model couldn’t handle the tokenisation that occurs in some languages,” Jade added.
Farayi explained that, in isiZulu, the verb “to love” is “ukuthanda”, and you can make inflections on it to change the context of the word. “In English, there is only ‘love’, but if you translate it to isiZulu, you would have six different words describing the same thing. The words multiply because every inflection of that word becomes a different word.”
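To make that point concrete, here is a minimal Python sketch of why word-level tokenisation struggles with inflected forms. The six isiZulu forms are illustrative examples built on the verb Farayi mentioned; they are assumptions chosen for this sketch, not data from Masakhane, and the character n-gram comparison is a simplified stand-in for subword methods, not the team’s actual approach.

```python
# Illustrative inflected forms of "ukuthanda" (to love).
# Assumed examples for this sketch, not taken from Masakhane's data.
forms = [
    "ukuthanda",      # to love
    "ngiyathanda",    # I love
    "uyathanda",      # you love
    "siyathanda",     # we love
    "bayathanda",     # they love
    "ngiyakuthanda",  # I love you
]


def char_ngrams(word, n):
    """Return every run of n consecutive characters in a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def shared_ngrams(words, n):
    """Return the character n-grams that appear in every word."""
    sets = [set(char_ngrams(w, n)) for w in words]
    return set.intersection(*sets)


# A word-level vocabulary treats each inflection as an unrelated entry.
word_vocab = set(forms)

# Character 5-grams expose the material the forms have in common.
shared = shared_ngrams(forms, 5)

print(len(word_vocab), "separate word-level entries")
print("shared 5-grams:", sorted(shared))
```

A model that only sees whole words has to learn each of the six forms from scratch, which is costly when data is scarce. Splitting words into smaller pieces lets the forms share what they have in common, although real subword tokenisers are considerably more sophisticated than this comparison.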
Realising that they couldn’t do it alone, Masakhane approached more than 1,000 people from different African countries to help bridge the gap in information that’s available in these languages.
“A lot of the data that is open source at the moment is parliament speeches, but they don’t actually translate well, so it can be difficult. We’ve also considered SABC, but it’s not open source, so we can’t use the data. So it’s a long-term process of figuring out what will work, what is commercially viable and what is available,” Jade explained.
“My aspiration is that if we build these technologies, we can enable people to use our languages on the internet, and make a digital footprint for African cultures as well,” said Farayi, who is also a member of Masakhane.
A better business case
With the research they are doing, Masakhane will soon be able to build products for these languages.
Farayi added that it will also give a lot of organisations and businesses access to African markets that they might not previously have been able to tap into, because of language barriers.
The attendees, most of whom grew up in Africa and don’t speak English as their first language, nodded with understanding. “Everyone is trying to understand how to break through into the mass market, but no one understands much about our cultures and the way we do things in an African context,” one attendee said.
“Organisations are employing a lot of European ways of doing things, which is making them irrelevant for a large part of the market,” they explained.