The Learning Curve, Part 2: How to Build an AI for Diverse Dialects


(MENAFN- BCW Global) Galaxy AI now supports 16 languages, helping more people break down language barriers with real-time, on-device translation. Samsung has opened the door to a new era of mobile AI, so we are visiting Samsung Research centers around the world to learn how Galaxy AI came to life and what it took to overcome the challenges of AI development. While part one of the series examined the task of determining what data is needed, this installment looks at the complex task of accounting for dialects.

Teaching a language to an AI model is complex enough, but what if it isn't a single language but a collection of diverse dialects? That was the challenge faced by the team at Samsung R&D Institute Jordan (SRJO). When Arabic was added as a language option for Galaxy AI features such as Live Translate, the team had to account for the many Arabic dialects spoken across the Middle East and North Africa, each varying in pronunciation, vocabulary and grammar.

Arabic is one of the top six most widely spoken languages around the world, used daily by more than 400 million people. The language is categorized into two forms: Fus'ha (Modern Standard Arabic) and Ammiya (the dialects of Arabic). Fus'ha is typically used in public and official events, as well as in news broadcasts, while Ammiya is more commonly used for day-to-day conversations. Over 20 countries use Arabic, and there are currently around 30 dialects in the region.

Unwritten Rules
Recognizing the variation presented by these dialects, the team at SRJO employed a range of techniques to discern and process the unique linguistic features inherent in each. This approach was crucial in ensuring that Galaxy AI could understand and respond in a way that accurately reflects the regional nuances.

“Unlike other languages, the pronunciation of the object in Arabic varies depending on the subject and verb in the sentence,” says Mohammad Hamdan, project leader of the Arabic language development team. “Our goal is to develop a model that understands all these dialects and can answer in standard Arabic.”

Galaxy AI's Live Translate feature lets users converse with speakers of other languages by converting spoken words into written text, translating that text, and then vocally reproducing the translation. Text-to-speech (TTS) is the final step in that chain, and the TTS team faced a unique challenge posed by a quirk of written Arabic.
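The three-stage pipeline described above can be sketched as follows. All function names and the dummy return values are illustrative placeholders of ours, not Galaxy AI's actual API:

```python
# Sketch of a Live Translate-style pipeline: speech -> text -> translated
# text -> speech. Every stage here is a stand-in for a real on-device model.

def asr(audio: bytes, lang: str) -> str:
    # Placeholder: a real system runs automatic speech recognition here
    return "marhaba"  # dummy transcript

def mt(text: str, src: str, dst: str) -> str:
    # Placeholder: a real system runs neural machine translation here
    return {"marhaba": "hello"}.get(text, text)

def tts(text: str, lang: str) -> bytes:
    # Placeholder: a real system synthesizes speech audio from text here
    return text.encode("utf-8")  # dummy "audio"

def live_translate(audio: bytes, src: str, dst: str) -> bytes:
    text = asr(audio, src)           # 1. spoken words -> written text
    translated = mt(text, src, dst)  # 2. translate the text
    return tts(translated, dst)      # 3. vocally reproduce the translation
```

The point of the sketch is the ordering: TTS sits at the end, so any error introduced earlier, or any ambiguity in the written text it receives, surfaces directly in the spoken output.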

Arabic uses diacritics: marks that guide the pronunciation of words and appear mainly in contexts such as religious texts, poetry and books for language learners. Diacritics are widely understood by native speakers but absent from everyday writing. This makes it difficult for a machine to convert raw text into phonemes, the basic units of sound that are the building blocks of speech.
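The ambiguity is easy to demonstrate: strip the diacritics, as everyday writing does, and words with different pronunciations collapse into the same letter string. A minimal stdlib-only sketch (the example words are ours, not from the SRJO dataset):

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # Drop Unicode combining marks, which include the Arabic harakat;
    # this mimics how diacritics are omitted in everyday writing
    return "".join(ch for ch in text if not unicodedata.combining(ch))

kataba = "كَتَبَ"  # "he wrote", pronounced kataba
kutub = "كُتُب"    # "books", pronounced kutub

# Different words with different phoneme sequences become identical once
# the diacritics are removed, so raw text alone cannot determine the sound
assert strip_diacritics(kataba) == strip_diacritics(kutub) == "كتب"
```

This is exactly why a TTS front end cannot map undiacritized Arabic text to phonemes without first restoring the missing marks from context.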

“There is a shortage of high-quality and reliable datasets that accurately represent how diacritics are correctly used,” explains Haweeleh. “We had to design a neural model that can predict and restore those missing diacritics with high accuracy.”

Neural models learn patterns from examples, loosely inspired by how the human brain processes information. To predict diacritics, a model must study large amounts of Arabic text, learn the language's rules and understand how words are used in different contexts. For instance, the pronunciation of a word can vary greatly depending on the action or gender it describes. Extensive training by the team was the key to enhancing the Arabic TTS model's accuracy.

Enhancing Understanding
The SRJO team also had to collect diverse audio recordings of the dialects from various sources, which then had to be transcribed with a focus on unique sounds, words and phrases. “We assembled a team of native speakers in the dialects who were well-versed in the nuances and variations,” says Ayah Hasan, whose team was responsible for database creation. “They listened to the recordings and manually converted the spoken words into text.”

This work was crucial for enhancing the Automatic Speech Recognition (ASR) process so that Galaxy AI could handle the rich tapestry of Arabic dialects. ASR is pivotal in enabling Galaxy AI’s real-time understanding and response capabilities.

“Building an ASR system that supports multiple dialects in a single model is a complex undertaking,” says Mohammad Hamdan, ASR lead for the project. “It demands a thorough understanding of the language’s intricacies, careful data selection and advanced modeling techniques.”
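One widely used technique for supporting multiple dialects in a single ASR model, sketched here as an illustration rather than as SRJO's confirmed approach, is to condition the model on a dialect tag prepended to each training transcript. One network then learns the shared acoustics of Arabic while still keeping dialect-specific vocabulary and patterns separable:

```python
# Hypothetical dialect-tag inventory; real systems define their own tokens
DIALECT_TAGS = {
    "msa": "<msa>",        # Modern Standard Arabic (Fus'ha)
    "egyptian": "<egy>",
    "levantine": "<lev>",
    "gulf": "<glf>",
}

def tag_transcript(transcript: str, dialect: str) -> str:
    # Prepend the dialect token so a decoder trained on this target text
    # is conditioned on the dialect as well as the audio
    if dialect not in DIALECT_TAGS:
        raise ValueError(f"unknown dialect: {dialect}")
    return f"{DIALECT_TAGS[dialect]} {transcript}"
```

At inference time the same tags can be predicted or supplied, letting one model serve all dialects instead of maintaining a separate model per dialect.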

