Messenger bots and the demands of localization

By Pavel Doronin December 9, 2016

Wearables, big data, machine learning, neural networks and the Internet of Things are all changing our notions about the future, our everyday lives and our approach to professional activities. The translation industry is experiencing many positive effects and benefits from new technologies. A new era of adaptive machine translation and post-editing, cloud technologies, voice recognition and speech-to-speech translations — all these things are changing our work right now, at this very moment. While I was writing this article, Google presented neural machine translation, a new step in the evolution of machine translation. I am sure that blockchain technologies will surprise us in language-related fields soon too.

All these innovations influence our perception of content, as well as its creation and the various ways of interacting with it. Messenger bots are one such new way of engaging with content. The messenger trend has led us to the bot trend. Many messengers are launching their own bot platforms, and Microsoft recently presented its Bot Framework, which allows for the creation of bots across many platforms: Skype, Slack, Telegram, and even SMS and email.

In this context, bots are not robots resembling androids; they are virtual messenger contacts that allow users or even user groups to interact with services and programs via a messenger interface. In other words, bots are a kind of alternative content delivery channel. A wide range of bots is available, covering everything from news and weather forecasts to flight reservations and even money transfers. Some are even talking of a paradigm shift from “There is an app for that” to “There is a bot for that.”

If we consider bots in even more general terms, we can look at them as a driver for the development of conversational user interfaces. Accordingly, where there is conversation, there are speech, text and language-related objects (emoji, gestures and so on). In my professional practice, I have faced questions of the creation and localization of texts for bots.

During my time at Doctor Web, our team worked on a research project: an antivirus bot for the Telegram messenger service. We designed and developed a bot that can check files and links on the fly, from within the messenger, regardless of device or platform. The bot can be used as a personal assistant in private chats or even as a guard in group chats. Nobody on our team had created such texts before; until then, we had worked on interface texts and documentation for enterprise products. Our task in this project was to convert conventional antivirus language patterns into standard messenger phrases. Such a conversion presents several challenges for technical writers, localization specialists and translators.

First of all, bots represent a special interface type: a mixture of graphical user interface, text-based user interface and voice user interface. Consider a classic command-line interface that you can interact with using buttons, text or voice. When using a messenger service, text plays the most important role, acting as the chief means of control: users interact with the bot through special commands, natural language or buttons with text. Bots’ output can be any kind of multimedia, maps or actions, but for the most part the output is text. This is a challenge not only for content creators but also for translators. Any error or inaccuracy could restrict the functionality or even damage the bot’s usability. For translators this seems to be more of an issue — they have to take into consideration not only the source language, but also the functionality of a bot, and the way users deal with it in the target environment.

A further challenge is the bot voice. Writing bot texts is not the same as writing interface texts. If bots are virtual contacts in a user’s messenger, how should they talk? In the first person? Should bots continue the app interface style of speech, communicating in abstract phrases? Would the imperative mood (“Send your current location,” “It’s going to be rainy today, take your umbrella”) be enough? At the same time, if a bot is part of a product line, an extension of a product or a companion product, its tone must correlate with the product voice or brand, taking into account the specific character of the messenger conversation. For example, casual-sounding first-person speech assigned to an antivirus bot could lower the degree of trust the user places in it, and could pose a risk when discussing information security. We decided to divide our bot’s messages into two types: messages directly related to security, and all other messages, which were chiefly the bot’s configuration texts, error messages and greeting phrases. We differentiated the bot’s voice on this basis: in security messages, we used serious and abstract phrases reflecting the tone of antivirus software, while in all other messages our bot uses the first person and even makes jokes. Whatever bot voice is defined, it makes sense to commit it to a style guide to make the source language voice more consistent and to ensure the same tone is mirrored in the other languages.

Mobile interfaces impel us to shorten interface texts, tailoring them to small-sized menu items and short notifications. Bots force us to shrink texts for other reasons. One of these is the fundamental value of instant messengers — instant content delivery from user to user and from service to user. Since bots exist and operate within messengers, they should share in this value.

Imagine you are walking with a friend and receive a message. No matter how long the message is, you will most likely read either the first sentence or a message preview on a lock screen, possibly making a mental note to return to the message later. It is unlikely that you will do this, however, since we are talking about messenger and not email behavior patterns. Our team tried to create catchy bot texts that can be read and understood instantly. Our approach was first to write texts as if they were for a desktop antivirus interface. We then went through many iterations, reducing the text while keeping its original meaning. Afterwards, we simulated spoken dialogs and voiced these phrases. If they fit into the dialog smoothly, they were accepted. Here is an example of how one button text evolved in a notification settings menu:

• Send notifications when links or files contain threats (the initial text, standard for desktop interface users)

• Send notifications about harmful links and files (a shortened version of the text, adapted for mobile interface text)

• Report on dangerous links and files (the final bot text, shortened and tested as spoken dialog)

There is one more thing that can help to boost content delivery: emoji. Emoji are not just funny yellow faces, but a rich system of characters of any kind. We must learn how to work with them from a localization perspective too. Emoji pose many challenges for content creators and localization specialists.

First of all, there are technical issues: operating systems and working tools (text editors, CAT tools, QA tools) must support emoji; otherwise they will be displayed as unrecognized symbols. I have a favorite joke about this issue: “I ? Unicode.” In addition, the same emoji can look completely different on different platforms. Furthermore, different platforms have different numbers of available emoji. Let’s say you have chosen a well-designed clock emoji for your bot from the Google emoji set. You cannot be sure that this clock will look the same on an iPhone, or that a Firefox user will see it at all. A free and open emoji encyclopedia, emojipedia.org, has helped us to navigate all this chaos.
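One way to keep track of which emoji a bot text actually contains is to list them with their code points and Unicode names, so that each one can be looked up in a reference such as emojipedia.org and checked per platform. The following is a minimal Python sketch, not the tooling our team used. The function name list_symbols is hypothetical, and the check on the Unicode category "So" (Symbol, other) is only a rough heuristic that catches most pictographic emoji, not a complete emoji detector.

```python
import unicodedata


def list_symbols(text):
    """Return (character, code point, Unicode name) for symbol-like characters.

    Assumption: the category 'So' (Symbol, other) is used as a rough stand-in
    for "emoji". It misses some emoji and includes some non-emoji symbols.
    """
    found = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            code_point = f"U+{ord(ch):04X}"
            # Fall back to a placeholder if this Python build's Unicode
            # database does not know the character yet.
            name = unicodedata.name(ch, "UNKNOWN")
            found.append((ch, code_point, name))
    return found


if __name__ == "__main__":
    for ch, code_point, name in list_symbols("Scan in progress \u23F0"):
        print(ch, code_point, name)
```

A list like this can be attached to a translation job so that translators and reviewers know exactly which characters to verify, even when their own tools cannot display them.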

There are also tricky aspects to handling emoji meanings. First, emoji must be an extension of a message and enhance its meaning. They should allow a text message to be understood more quickly or without any text. Second, emoji should convey the same, commonly understood meaning in the target culture, should be culturally acceptable and should not be offensive. This is why translators must be able to see the emoji being used in their tools and, if necessary, replace them with more suitable ones in the translation. There is at least one indisputable example of emoji replacement: arrow emoji should be replaced by their mirrored counterparts in right-to-left (RTL) languages such as Arabic.
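The arrow replacement for RTL languages is mechanical enough to be sketched in code. The Python example below is an illustration under stated assumptions rather than a description of any real localization tool: the mapping MIRRORED_ARROWS, the set RTL_LANGUAGES and the function mirror_arrows are all hypothetical names, and the lists are deliberately short.

```python
# Hypothetical mapping of directional arrow characters to their mirror images.
MIRRORED_ARROWS = {
    "\u27A1": "\u2B05",  # right arrow      -> left arrow
    "\u2B05": "\u27A1",  # left arrow       -> right arrow
    "\u2197": "\u2196",  # north-east arrow -> north-west arrow
    "\u2196": "\u2197",  # north-west arrow -> north-east arrow
    "\u2198": "\u2199",  # south-east arrow -> south-west arrow
    "\u2199": "\u2198",  # south-west arrow -> south-east arrow
}

# Assumed, incomplete list of right-to-left language codes.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def mirror_arrows(text, lang):
    """Swap horizontal arrow characters when the target language is RTL."""
    primary = lang.split("-")[0].lower()  # "he-IL" -> "he"
    if primary not in RTL_LANGUAGES:
        return text
    # Other characters, including the emoji variation selector U+FE0F,
    # are copied through unchanged.
    return "".join(MIRRORED_ARROWS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(mirror_arrows("Next \u27A1\uFE0F", "ar"))
```

A script like this can only propose replacements. Whether an arrow really indicates reading direction, rather than, say, a compass direction on a map, is still a decision for the translator.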

There is one more important reason for text shortening: unlike apps, bots are mostly cross-platform and cross-device. You cannot be sure what operating system and what device users will run the bot on. It could be a smartphone, a laptop, a smartwatch or even a Linux-powered refrigerator. These devices can have either a large Retina screen or a small, one-row display, and the bot text should fit everywhere. Text shortening, however, is just the first layer of the onion. The unpredictability of a platform or device can strongly limit your choice of vocabulary and can force you to use rather abstract phrases. The most obvious example is that the verbs click, tap and press should be generalized to select or choose, because you never know exactly how a user will interact with the system and device: the cross-platform Telegram messenger can run on almost every smart device.
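A vocabulary rule like this one can be checked automatically before texts go to translation. The following Python sketch flags device-specific verbs in a bot string so that a writer can replace them with select or choose. It is only an illustration: the function find_device_verbs and the verb list DEVICE_VERBS are hypothetical, and a real style-guide check would cover more words and more languages.

```python
import re

# Assumed list of verbs that imply a particular input device.
DEVICE_VERBS = ("click", "tap", "press")

# Match the bare verb plus a few common English inflections
# (clicks, presses, tapped, tapping, clicking, pressed).
_PATTERN = re.compile(
    r"\b(" + "|".join(DEVICE_VERBS) + r")(s|es|ed|ped|ping|ing)?\b",
    re.IGNORECASE,
)


def find_device_verbs(text):
    """Return every device-specific verb form found in the text, in order."""
    return [match.group(0) for match in _PATTERN.finditer(text)]


if __name__ == "__main__":
    for message in ("Tap the button or press Enter", "Select a file to scan"):
        print(message, "->", find_device_verbs(message))
```

The check only reports matches and leaves the rewording to a person, since the best neutral verb depends on the sentence.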

In addition, there are still open questions about how to localize a bot’s user-generated content: dynamic content or content from external sources; how far we can rely on machine translation to resolve these issues; what our approach to localizing voice-controlled bots should look like; and how we should localize bots for virtual and augmented reality.