NOOR, the new largest NLP model for the Arabic language

The Technology Innovation Institute (TII), an Abu Dhabi-based research center, recently announced the development of NOOR, the largest Arabic-language natural language processing (NLP) model to date.

The TII’s researchers believe that NOOR, with its 10 billion parameters, will become the “go-to exploration model in Arabic.” In developing NOOR, the TII’s Artificial Intelligence (AI) Cross-Center Unit collaborated with LightOn, a Paris-based technology company specializing in extreme-scale foundation models.

“Large language models have taken the world of natural language processing by storm,” said Ebtesam Almazrouei, the director of the TII’s AI Cross-Center Unit. “The uniquely large Arabic dataset collected to train the model is the result of months of work that included curating, scraping, and filtering of varied sources.”

According to The National News, a United Arab Emirates (UAE)-based news agency, this model is significantly larger than AraGPT, which was previously considered to be the largest Arabic-language NLP model. AraGPT had about 1.5 billion parameters; NOOR’s larger size allows it to complete more complex tasks. To develop the model, the AI researchers at the TII trained it on a wide variety of texts, including technical texts, poetry, and newspapers.

The model, which takes its name from the Arabic word for “light,” is somewhat similar to the GPT-3 model and can complete various tasks, from text summarization to chatbot development to language assessment. The TII will continue honing its AI efforts, with Almazrouei noting that NOOR is the institute’s first step toward contributing to the UAE Strategy for Artificial Intelligence, an initiative to boost the country’s worldwide tech profile.

“With this development, we are on track to boost our research capabilities and credentials in AI, as well as elevating the status of Abu Dhabi and the UAE as a serious research ecosystem,” said Ray Johnson, CEO of the TII. “Our expert teams have demonstrated yet again that this region can achieve breakthrough research and development outcomes that impact the world.”

Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.

MultiLingual Media LLC