Imagine that you are sitting in your car and trying to enunciate a typical Finnish address, such as “Hiihtomäentie 37, Helsinki”. But no matter how hard you try, your navigator keeps cheerily offering you a completely different address in another part of the country.
This frustrating scenario remains all too familiar to many Finnish motorists, and it is a clear indication that most navigators do not understand Finnish well enough.
Mikko Kurimo, Professor of Speech and Language Processing at Aalto University, agrees that while speech recognition technology can make life easier, its adoption in Finland lags behind much of the world.
“In major language regions, most Google searches are made using speech, since it’s the fastest way to search. And while all these different speech assistants, such as Apple’s Siri, Amazon’s Alexa, or Google Assistant, aren’t in every home, there are still a lot of them. They can perform simple tasks that users can conveniently speak out loud.”
For example, English and German speakers can use speech to control the lights and locks in their smart homes. Amazon is also rapidly developing its voice-controlled shopping experience, and people in Dubai can speak to Kone elevators and call them to their floor in advance.
Kurimo’s research group at Aalto focuses on automatic speech recognition. Previously, Kurimo worked at a Swiss AI research institute and as a visiting researcher at SRI International (originally the Stanford Research Institute) and at the International Computer Science Institute (ICSI) in Berkeley.
Why, then, does the development of Finnish speech recognition applications lag behind the rest of the world? According to Kurimo, the reasons lie in the small size of the Finnish market and the unusual structure of the language.
“The main reason is economic: we don’t have an ecosystem for this industry, there isn’t enough demand, and global tech giants aren’t investing here.”
Splitting millions of words into smaller units
Not only is Finnish spoken by a relatively small group of people, but its structure also differs greatly from English, which serves as the technical basis for most speech recognition solutions.
“French and Swedish, for example, are quite similar in structure to English. It’s easy to generate a list of all the words in these languages, and artificial intelligence can usually make do with around 60,000 to 100,000 words. Finnish, on the other hand, features a great number of inflections and compound words, with millions of different word forms.”
Aalto researchers first attempted to solve this problem by splitting Finnish words into smaller units and using these as the basis for machine learning models.
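To give a rough idea of what splitting words into smaller units means, the sketch below breaks Finnish words into pieces using a tiny, hand-picked inventory of units and greedy longest-match segmentation. The inventory `SUBWORDS` and the function `segment` are hypothetical and for illustration only; the article does not describe the exact method the Aalto group used, and real systems learn their units statistically from large text corpora rather than from a hand-written list.

```python
# Hypothetical inventory of subword units, hand-picked for illustration.
# Real systems learn such units automatically from large text corpora.
SUBWORDS = {"hiihto", "mäen", "tie", "talo", "i", "ssa", "kin"}


def segment(word: str, units: set[str]) -> list[str]:
    """Split a word into known units by greedy longest match, left to right.

    Falls back to single characters when no known unit matches,
    so every word can still be represented.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible unit starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in units:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known unit starts here: emit a single character.
            pieces.append(word[i])
            i += 1
    return pieces


print(segment("hiihtomäentie", SUBWORDS))  # ['hiihto', 'mäen', 'tie']
print(segment("taloissakin", SUBWORDS))    # ['talo', 'i', 'ssa', 'kin']
```

The point of the example is that a handful of reusable pieces can cover words that never appear whole in the training data: "taloissakin" ("in the houses, too") is built from a stem, a plural marker, a case ending, and a clitic, so the model only needs to learn those units instead of every possible combination.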
However, they soon faced another problem: because of the complex structure of Finnish, developing speech recognition required far more speech data than it does for English. And, due to an acute lack of resources, such data was scarce.
The researchers came up with a creative solution. In collaboration with Yle, Finland’s national public broadcasting company, they launched the Lahjoita Puhetta campaign to encourage Finns to donate samples of ordinary speech.
“Our dream had always been to get more samples of everyday, casual speech. We thought about how we could get people excited and involved. We emphasized that with their help, we would be able to develop better Finnish-language AI applications.”
Voice commands can significantly ease the lives of people who find it difficult to use technology through text-based means. These include older people, young children who cannot yet read, and immigrants who do not yet have a good grasp of Finnish.
With Yle’s support, the campaign succeeded in collecting a great deal of material, with tens of thousands of speakers donating thousands of hours of speech. The second phase of the project, which focuses on the collection of Swedish-language speech from Swedish-speaking Finns, is currently underway.
A treasure trove of donated speech
“We may now have more speakers in relation to the size of a language than anywhere else in the world. For example, a similar number of speakers provided the English-language materials in use today, but English has a whole lot more speakers than Finnish!”
Kurimo’s group hopes that their success will spark the curiosity of the international scientific community, as this trove of Finnish-language data could be used to test other methods as well.
“I think that people would be interested in replicating this type of campaign internationally with other small language groups, meaning those with fewer than 10 million speakers. But before that can be done, we need to know how much speech and transcribed material is truly needed.”
Currently, the research group is comparing its manually transcribed materials with automatic transcriptions generated by speech recognition software. Manually transcribing 1,600 hours of speech was no small feat.
“We also conducted a test where we outsourced a small part of the transcription process to four different companies. We then counted the number of errors in their transcriptions and compared them to our automatic transcription. It turned out that people don’t always agree with each other and can come up with a lot of different solutions. Our automated solution was fairly close in terms of its error rate.”
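The article does not say which error measure was used, but a common way to compare transcriptions is the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn one transcript into another, divided by the number of words in the reference. The following is a minimal sketch of that standard metric under this assumption; the function name `word_error_rate` and the example sentences are illustrative, not taken from the Aalto study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Two of the three words differ, so the error rate is 2/3.
print(word_error_rate("hiihtomäentie 37 helsinki", "hiihtomäki 37 espoo"))
```

The same function can compare two human transcribers with each other, which is one way to quantify the observation that people "don’t always agree": if human transcripts differ noticeably among themselves, an automatic transcript with a similar error rate is performing at roughly human level.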
Providing machine learning models for wider use
The purpose of the donated speech is to develop new AI applications and improve those that are currently used by Finns. That is why the gathered materials are freely available, subject to certain terms and conditions. In addition to the raw data, Aalto offers the machine learning models it has developed on the basis of the donated speech.
“This is the way of the future. It wouldn’t make sense to teach different AIs the same things over and over again. This will also help make speech recognition more profitable for small domestic companies.”
Many speech recognition applications do not focus on speech commands but on spoken materials, such as the solutions employed by video services like YouTube or Yle’s Elävä Arkisto. Browsing their archives can be difficult if the speech in their videos is not available in a searchable text format.
Kurimo expects that speech recognition will not replace reading or writing but will instead add flexibility to everyday life.
“If you don’t have time to finish your morning newspaper, you can listen to the rest as a podcast on the bus. And it’s easier to listen to content or dictate messages when you’re driving. And we’re constantly seeing more and more new applications crop up.”
Speech recognition is also set to become increasingly prevalent in business life. For example, phone centers can use speech recognition AIs to answer questions or forward callers, and video content can be made searchable when its speech is transcribed.
According to Jonni Junkkari, who is responsible for Aalto University Executive Education’s (Aalto EE) digital training programs, automatic speech recognition offers a lot of business opportunities for Finnish companies. However, this change in how customers operate and purchases are made will require new expertise.
“It’s important for companies to understand new technologies such as voice recognition and how they can be used, but they must also be able to innovate and understand their customers better. Only then can they create a permanent competitive advantage and customer value,” Junkkari emphasizes.
Aalto EE offers a range of programs on data, analytics, and AI that focus on various technical topics, including speech recognition. In these programs, participants also focus on identifying business opportunities and innovating customer value-generating products and services that are enabled by these technologies. The new Aalto PRO Data, Analytics, and AI for Professionals program focuses on understanding the business opportunities and challenges resulting from AI and on how to utilize these services in technical implementations.