Managing Spanish language variations

With over 400 million native speakers across more than 20 countries, Spanish is the world’s second most spoken native language. It varies considerably from region to region, most visibly between European Spanish (also called Peninsular or “Castilian” Spanish) and Latin American Spanish, which this article represents with Mexican Spanish.

While there is very little grammatical distinction between the different varieties, Spanish has many dialects. Merriam-Webster defines a dialect as “a regional variety of language distinguished by features of vocabulary, grammar, and pronunciation from other regional varieties and constituting together with them a single language.”

Linguists are used to dealing with regional language variation, but it is a serious challenge for any global company. This is especially true for web-based companies that cater to a global customer base: providing a tailored experience for all users, no matter what dialect they speak, is essential to maintaining brand loyalty. Often, this means incorporating a tool that helps customers or clients access products.

For global companies that deal with multiple languages and variations, one of the key entry points to the customer journey is the website’s search engine. From product inquiries to customer support questions, search engines facilitate most interaction between client and company:

On ecommerce websites, 71% of respondents said they used search engines to find products.

In customer service, 90% of consumers who have made an online purchase said they used site search to access self-service content.

But with over ten dialects in Mexico alone, and four in Spain, how does a search engine deal with different dialects?


Keywords vs. natural language search engines

The problem is that most commonly used search engines are keyword based: queries and the documents they retrieve are matched on keywords, usually after stemming. To complicate matters further, many text indexing systems process every word in a search string except commonly occurring stop words, without regard to how those words relate to one another.
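The stemming and stop-word behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the stop-word list, the one-rule stemmer, and the sample documents are all assumptions made for the example.

```python
# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"el", "la", "los", "las", "un", "una", "de", "en", "y"}


def stem(word: str) -> str:
    """Strip a trailing plural "s"; a crude stand-in for a real stemmer."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def keywords(text: str) -> set[str]:
    """Lowercase, split on whitespace, drop stop words, stem the rest."""
    return {stem(w) for w in text.lower().split() if w not in STOP_WORDS}


def keyword_search(query: str, documents: list[str]) -> list[str]:
    """Return documents sharing at least one stemmed keyword with the query."""
    wanted = keywords(query)
    return [doc for doc in documents if wanted & keywords(doc)]


# Hypothetical document collection.
DOCUMENTS = ["conducir el coche", "alquiler de coches", "seguro de hogar"]

# Stemming lets the plural query match both forms of the word.
print(keyword_search("coches", DOCUMENTS))
# -> ['conducir el coche', 'alquiler de coches']

# A query in another dialect shares no keywords with the documents.
print(keyword_search("manejar el carro", DOCUMENTS))
# -> []
```

The second query shows the limitation: the documents contain an answer, but because no keyword matches, the search returns nothing.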

Keyword searches also have a tough time distinguishing between homographs, words that are spelled the same way but mean something different (such as “lead,” to direct someone or something, vs. “lead,” the element). This often ends in frustration because a keyword search engine cannot discern true “meaning,” so it often delivers zero results or results that are completely irrelevant to the user’s query.

Semantic search engines using natural language processing (NLP) solve the keyword dilemma. Unlike keyword search systems, NLP search systems focus on meaning, or the way we communicate on a daily basis. NLP is concept based, which means it returns search hits on documents that are about the subject or theme you’re exploring based on semantic relations. In other words, if you are trying to find an article that has to do with a discount for a company’s seasonal sale and are using the keyword “coupon” in your search, a semantic search engine will not only provide results having “coupon” in the title and possibly contents, but also provide results that may include “discount,” “promotion” and “offer,” for example.

Results closer to what the user actually meant are returned even if the words in the document are not a direct match for the words entered in the query.
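The “coupon” example can be sketched as a simple query expansion over a table of related terms. This is only an approximation of concept-based search: the concept table, the function names, and the sample documents are hypothetical, and a real semantic engine would rely on far richer language resources than a hand-built dictionary.

```python
# Hypothetical concept table: each concept groups terms treated as related.
CONCEPTS = {
    "price_reduction": {"coupon", "discount", "promotion", "offer"},
}


def expand(term: str) -> set[str]:
    """Return the term plus every term that shares a concept with it."""
    related = {term}
    for members in CONCEPTS.values():
        if term in members:
            related |= members
    return related


def concept_search(query: str, documents: list[str]) -> list[str]:
    """Return documents containing any term related to the query words."""
    terms: set[str] = set()
    for word in query.lower().split():
        terms |= expand(word)
    return [doc for doc in documents if terms & set(doc.lower().split())]


# Hypothetical document titles.
DOCUMENTS = [
    "Seasonal discount on winter coats",
    "Coupon terms and conditions",
    "Store opening hours",
]

print(concept_search("coupon", DOCUMENTS))
# -> ['Seasonal discount on winter coats', 'Coupon terms and conditions']
```

The first result never mentions “coupon”; it is returned because “discount” belongs to the same concept.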

Although NLP is the closest a search engine can get to “reading the minds” of internet users, there is room for improvement via meaning-text theory.


Application to dialect differences

Although computer and human languages are obviously different on the surface, they share similar properties and structures when conveying some types of meaning. Unfortunately, meaning is not static and can vary depending on the recipient. For this reason, the linguistic framework known as meaning-text theory (proposed by Aleksandr Žolkovskij and Igor Mel’čuk in 1967) is an ideal complement to the semantic search engine. Deeply rooted in semantics, this linguistic theory provides a method for organizing human languages and their subsequent construction.

The semantic search engine with meaning-text theory is therefore elevated to another level and is capable of finding incredibly accurate results no matter how users word their queries because “meaning” is most important.

At the base of many semantic search engines is a frame of reference such as a dictionary. The meaning-text theory framework allows lexical functions to be added in modules to address the nuances found across dialects. Words and phrases are converted into “units” of semantic meaning, which the engine uses to interpret the intended meaning of a query. These modules allow concepts to be “layered” on multiple levels in a way that is not just linear but multidirectional.

In the case of a dictionary that has Peninsular Spanish as its base, words and phrases that are particular to Mexican Spanish are added as lexical functions in different modules. Relations between concepts, covering ambiguous vocabulary, synonyms, spelling differences, idiomatic expressions, and negations, for example, must be established in order to produce a good result that reflects the way people speak from region to region.

A semantic search engine built on meaning-text theory would work as follows. The word for car in much of Mexico is “carro,” while “coche” is used in Spain; the verb for drive is “manejar” in Mexico and “conducir” in Spain. Lexical functions in the meaning-text theory framework allow the search engine to understand that “manejar el carro” and “conducir el coche” mean the same thing, and that both are equivalent to “operate a vehicle” or “drive a car,” because relations have been established on multiple semantic levels. Therefore, someone searching for “manejar el carro” will receive a correct answer even if the answer in the database is worded as “conducir el coche.”
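The car example can be sketched by mapping each surface word to a shared semantic unit and comparing the units instead of the words. This is a deliberately simplified lookup, not a formal implementation of meaning-text theory or its lexical functions; the module names, the unit labels such as VEHICLE and DRIVE, and the function-word list are assumptions made for illustration.

```python
# Hypothetical base dictionary (Peninsular Spanish) and add-on modules.
PENINSULAR_BASE = {"coche": "VEHICLE", "conducir": "DRIVE"}
MEXICAN_MODULE = {"carro": "VEHICLE", "manejar": "DRIVE"}
ENGLISH_MODULE = {
    "car": "VEHICLE",
    "vehicle": "VEHICLE",
    "drive": "DRIVE",
    "operate": "DRIVE",
}

# Modules are layered on top of the base dictionary.
LEXICON = {**PENINSULAR_BASE, **MEXICAN_MODULE, **ENGLISH_MODULE}

# Articles carry no meaning of their own in this sketch, so they are skipped.
FUNCTION_WORDS = {"el", "la", "un", "una", "a", "the"}


def semantic_units(text: str) -> frozenset[str]:
    """Map each content word to its semantic unit (or itself if unknown)."""
    return frozenset(
        LEXICON.get(word, word)
        for word in text.lower().split()
        if word not in FUNCTION_WORDS
    )


def same_meaning(first: str, second: str) -> bool:
    """Two phrases match when they reduce to the same set of units."""
    return semantic_units(first) == semantic_units(second)


print(same_meaning("manejar el carro", "conducir el coche"))  # -> True
print(same_meaning("manejar el carro", "drive a car"))        # -> True
print(same_meaning("manejar el carro", "lavar el coche"))     # -> False
```

Keeping the Mexican vocabulary in its own module means the base dictionary stays untouched when another dialect is added.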

In addition, the framework creates relations between concepts of the same kind, such as synonyms, and can also relate words from different parts of speech that are typically associated with one another, such as the nouns “carro” and “coche” and the verbs “manejar” and “conducir.” Most importantly, a semantic search engine with meaning-text theory recognizes the degree, or distance, of each relation between concepts, which helps the system derive meaning.
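One way to picture the “distance” of a relation is as the length of the shortest path between two words in a weighted graph. The sketch below uses Dijkstra’s algorithm from the Python standard library; the graph, its weights, and the extra word “garaje” are hypothetical values chosen for the example, and nothing here is meant to describe how a meaning-text theory engine actually scores relations.

```python
import heapq

# Hypothetical relation graph; a lower weight means a closer relation.
RELATIONS = {
    ("carro", "coche"): 1.0,       # dialect synonyms
    ("manejar", "conducir"): 1.0,  # dialect synonyms
    ("coche", "conducir"): 2.0,    # noun and verb that typically co-occur
    ("coche", "garaje"): 4.0,      # looser topical association
}


def build_graph(relations: dict) -> dict:
    """Turn the pair table into an undirected adjacency map."""
    graph: dict = {}
    for (first, second), weight in relations.items():
        graph.setdefault(first, {})[second] = weight
        graph.setdefault(second, {})[first] = weight
    return graph


def relation_distance(graph: dict, start: str, goal: str) -> float:
    """Shortest-path distance between two words; inf if unrelated."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")


GRAPH = build_graph(RELATIONS)

print(relation_distance(GRAPH, "carro", "coche"))    # -> 1.0
print(relation_distance(GRAPH, "carro", "manejar"))  # -> 4.0
print(relation_distance(GRAPH, "carro", "garaje"))   # -> 5.0
```

A system could use such distances to rank candidate answers, preferring documents whose terms sit closest to the terms in the query.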

There are multiple ways to derive meaning and get to the core of a statement. Whichever is used, the needs of the user should come first, and companies should look for ways to “personalize” the customer journey. Applied to the customer experience, semantic search engines with meaning-text theory promise to accurately understand questions that users ask in their “natural” language, no matter the context, so that customers get the best results for their queries.