July 12, 2019 in Artificial Intelligence

The landscape of conversational AI platforms

How to choose the right platform for your business.

Abe Raher


https://doi.org/10.1287/LYTX.2019.05.01

Vijay Ramakrishnan is the technical development leader of MindMeld, a conversational AI platform recently open-sourced by Cisco. Ramakrishnan develops the machine learning (ML) and information retrieval (IR) models in MindMeld and architects conversational applications on top of the platform. Before joining Cisco, Ramakrishnan built conversational assistants for Fortune 500 companies such as Starbucks and Fast Retailing.

Georgian Partners defines conversational AI as “the use of messaging apps, speech-based assistants and chatbots to automate communication and create personalized customer experiences at scale.” Artificial Solutions describes it as “a form of artificial intelligence that allows people to communicate with applications, websites and devices in everyday, humanlike natural language via voice, text, touch or gesture input.”

Abe Raher, who writes technical documentation at AppDynamics, recently interviewed Ramakrishnan on the topic of conversational AI platforms and how to choose the right platform for particular business needs. Following are excerpts from the interview:

Describe the conversational AI landscape.

The market is crowded. Players ranging from startups to blue-chip companies now offer conversational AI platforms. I think of the major differences between platforms in terms of three trade-offs: usability versus configurability, cloud versus on-premises and closed source versus open source.

“Black box” AI platforms abstract away all of the ML internals, which makes them easy to use and speeds up deployment of a prototype application. By contrast, highly configurable platforms expose the ML internals to the developer, which makes it easier to fix problems once an application is in production. For example, an application might misclassify a particular subset or category of queries. That is easier to address on a highly configurable platform, where the developer owns the entire ML stack, than on a “black box” platform.
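To make the point about owning the ML stack concrete, here is a minimal sketch of per-intent error analysis, the kind of inspection that is straightforward when the model is yours. The keyword classifier, intent names and queries are all hypothetical stand-ins, not MindMeld code; in practice `classify` would be whatever trained model the platform exposes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy intent classifier: it picks the intent whose keyword set
# overlaps most with the query. It stands in for whatever model you own.
KEYWORDS: Dict[str, set] = {
    "check_balance": {"balance", "much", "account"},
    "transfer_money": {"transfer", "send", "move"},
}

def classify(query: str) -> str:
    tokens = set(query.lower().split())
    scores = {intent: len(tokens & words) for intent, words in KEYWORDS.items()}
    # Ties (including all-zero scores) go to the first intent listed.
    return max(scores, key=scores.get)

def error_report(labeled: List[Tuple[str, str]]):
    """Return per-intent accuracy and the misclassified queries for each gold intent."""
    totals, correct = Counter(), Counter()
    errors = defaultdict(list)
    for query, gold in labeled:
        pred = classify(query)
        totals[gold] += 1
        if pred == gold:
            correct[gold] += 1
        else:
            errors[gold].append((query, pred))
    accuracy = {intent: correct[intent] / totals[intent] for intent in totals}
    return accuracy, dict(errors)

labeled = [
    ("what is my account balance", "check_balance"),
    ("how much money do I have", "check_balance"),
    ("send 20 dollars to Sam", "transfer_money"),
    ("pay my rent", "transfer_money"),
]
accuracy, errors = error_report(labeled)
print(accuracy)  # {'check_balance': 1.0, 'transfer_money': 0.5}
print(errors)    # {'transfer_money': [('pay my rent', 'check_balance')]}
```

The report isolates the failing category (rent payments routed to the wrong intent). With access to the model, a developer can add training data or features for that category and re-run the report; a black-box platform may not expose enough of the model to diagnose the problem this directly.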

Cloud-based AI platforms require training data to be uploaded to the cloud. For organizations whose customer data cannot be released to third-party services, that can be a deal-breaker. Those organizations need a platform that can be deployed on-premises, with all training data stored locally.

Platforms that are tightly coupled with consumer products tend to be closed source. The benefits of deep integration must be weighed against the drawbacks of closed source. For example, it might be easier to develop on the AI platform recommended for a particular device, but support might be limited, and because the codebase is closed, developers cannot inspect the code to diagnose and fix an issue.

Open-source platforms, meanwhile, are transparent because their code is open. Many issues with the platform have likely already been discussed publicly, the open-source community often makes support faster, and any further investigation of an issue can be done by inspecting the code. When evaluating open-source platforms, consider whether they integrate with the consumer or enterprise devices you care about, and how vibrant their developer communities are.

Before investing in an AI platform, an organization should decide how best to address these trade-offs. Beyond that, nuances matter. For example, natural language processing (NLP) and ML are active research fields, so better models for understanding human language are constantly being developed. Platforms that can incorporate these developments fastest have an advantage here, and these tend to be open platforms, because they attract more interest from developers and researchers.

Once an enterprise decides it needs a conversational AI solution, what is the primary challenge it must confront?

Choosing the best channels for the conversational product. If the goal is for a brand to appeal to millennial consumers through a simple chit-chat application, the answer might be to build an Alexa skill or a Facebook Messenger chatbot. If the goal is to create an enterprise, e-commerce or voice application that needs to access a company-specific knowledge base, it is better to build a more sophisticated conversational application.

Figure 1: General architecture of enterprise conversational AI systems. Source: Vijay Ramakrishnan

Depending on the audience the enterprise sells to, getting tightly coupled to a single platform can be a risk. If your strategy calls for limiting the application to a single channel, such as an Alexa skill, tight coupling is fine. But if management later decides to expand to another digital channel like Facebook Messenger or Slack, that coupling can become a liability. The enterprise should determine its longer-term needs before choosing an AI platform.
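One common way to limit that risk is to keep the conversational logic channel-agnostic and confine channel-specific code to thin adapters. The sketch below assumes simplified payload shapes for Slack and Messenger (real payloads carry many more fields), and every function name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Message:
    """Channel-neutral representation of an incoming user message."""
    user_id: str
    text: str

def handle(message: Message) -> str:
    """Core business logic: knows nothing about which channel the message came from."""
    if "hours" in message.text.lower():
        return "We are open 9am to 5pm."
    return "Sorry, I didn't understand that."

# Hypothetical adapters; the payload layouts are simplified assumptions.
def from_slack(payload: dict) -> Message:
    return Message(user_id=payload["user"], text=payload["text"])

def from_messenger(payload: dict) -> Message:
    return Message(user_id=payload["sender"]["id"], text=payload["message"]["text"])

ADAPTERS: Dict[str, Callable[[dict], Message]] = {
    "slack": from_slack,
    "messenger": from_messenger,
}

def respond(channel: str, payload: dict) -> str:
    # Adding a channel means adding one adapter; handle() stays unchanged.
    return handle(ADAPTERS[channel](payload))

print(respond("slack", {"user": "U1", "text": "What are your hours?"}))
```

With this split, expanding from one channel to another touches only the adapter layer, which is the flexibility the answer above argues for.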

How does ease of development vary from platform to platform?

For getting a starter application up and running, Amazon Lex and Dialogflow are the leaders. Their cloud user interfaces are intuitive, and both support the popular JavaScript language.

However, we found that multi-turn dialogue flows, meaning a series of back-and-forth exchanges with the AI agent to complete a given task, were hard to develop on these platforms. Such dialogues are critical for certain enterprise tasks, like filling out a form.
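As an illustration of what a multi-turn, form-filling flow has to track, here is a minimal slot-filling sketch. The slot names and prompts are hypothetical, and it makes the simplifying assumption that each reply answers the slot most recently asked about; a real system would run entity extraction on each turn and handle corrections and context switches like the one shown in Figure 2.

```python
from typing import Dict, List, Optional

# Hypothetical slots and prompts for a flight-booking form.
REQUIRED_SLOTS: List[str] = ["origin", "destination", "date"]
PROMPTS: Dict[str, str] = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to go?",
    "date": "What date do you want to travel?",
}

class FormDialogue:
    """Tracks which slots are filled across turns and asks for the next missing one."""

    def __init__(self) -> None:
        self.slots: Dict[str, Optional[str]] = {name: None for name in REQUIRED_SLOTS}

    def next_missing(self) -> Optional[str]:
        for name in REQUIRED_SLOTS:
            if self.slots[name] is None:
                return name
        return None

    def start(self) -> str:
        """Opening prompt for a new dialogue."""
        return PROMPTS[REQUIRED_SLOTS[0]]

    def turn(self, user_text: str) -> str:
        # Simplifying assumption: the reply fills the slot we last asked about.
        missing = self.next_missing()
        if missing is not None:
            self.slots[missing] = user_text.strip()
        missing = self.next_missing()
        if missing is not None:
            return PROMPTS[missing]
        s = self.slots
        return f"Booking a flight from {s['origin']} to {s['destination']} on {s['date']}."

d = FormDialogue()
print(d.start())           # Where are you flying from?
print(d.turn("Toronto"))   # Where would you like to go?
print(d.turn("Tokyo"))     # What date do you want to travel?
print(d.turn("May 3"))     # Booking a flight from Toronto to Tokyo on May 3.
```

The state that persists between turns (`self.slots`) is what makes the flow multi-turn; much of the difficulty in real systems lies in updating that state correctly when users answer out of order or change the subject.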

Figure 2: Example of a multi-turn dialogue with contextual switching. Source: Vijay Ramakrishnan

How does developing a domain-specific conversational assistant differ from developing a generic chat assistant?

A domain-specific assistant must have a knowledge base to store domain knowledge. This enables the AI agent to retrieve information to fulfill a certain customer task. For example, an airline booking AI agent would access an internal knowledge base of airline operations to help a customer book a flight. A generic chat assistant does not need such a knowledge base because its main objective is to entertain the user.
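A minimal sketch of the airline example: once the agent has extracted entities such as origin, destination and date from the conversation, it queries the knowledge base for matching business objects. The in-memory list, field names and flight numbers are all hypothetical; a production system would query a search index or database instead.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Flight:
    number: str
    origin: str       # IATA airport code
    destination: str  # IATA airport code
    date: str         # ISO date, e.g. "2019-07-12"

# Hypothetical in-memory knowledge base of airline operations.
KNOWLEDGE_BASE: List[Flight] = [
    Flight("XY100", "YYZ", "NRT", "2019-07-12"),
    Flight("XY200", "YVR", "NRT", "2019-07-12"),
    Flight("XY300", "YYZ", "NRT", "2019-07-13"),
]

def find_flights(origin: str, destination: str, date: str) -> List[Flight]:
    """Return flights matching the entities extracted from the user's request."""
    return [
        f for f in KNOWLEDGE_BASE
        if f.origin == origin and f.destination == destination and f.date == date
    ]

print(find_flights("YYZ", "NRT", "2019-07-12"))
```

A generic chit-chat assistant has no equivalent of `find_flights`, which is the difference the answer above describes.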

When you develop AI agents, what difference does a chat interface make compared with a voice interface?

Text and voice input have different sources of error. Errors in chat-based applications are mainly typos from fat-finger typing or abbreviated texting jargon. In voice-based applications, the automatic speech recognition (ASR) technology that converts voice to text can introduce errors in several ways. It might stumble on vocabulary from domains for which it lacks training data, or it might be unable to separate the speaker’s voice from background noise in some environments. The challenge for conversational applications is to retrieve the correct business object in spite of such errors.
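One simple way to recover the intended business object from a misspelled or misrecognized span is fuzzy string matching against a catalog. The sketch below uses Python's standard `difflib.get_close_matches`; the catalog entries and the 0.6 cutoff are illustrative assumptions, and production systems typically combine this with phonetic matching or learned entity resolvers rather than relying on character similarity alone.

```python
import difflib
from typing import Optional

# Hypothetical catalog of business objects (e.g. menu items).
CATALOG = ["caramel macchiato", "cappuccino", "cold brew", "chai latte"]

def resolve(entity_text: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a possibly misspelled or misrecognized span to the closest catalog entry."""
    matches = difflib.get_close_matches(entity_text.lower(), CATALOG, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(resolve("capucino"))   # typo from chat -> cappuccino
print(resolve("Coal Brew"))  # ASR-style mishearing -> cold brew
print(resolve("pizza"))      # nothing close enough -> None
```

Returning `None` below the cutoff lets the application ask a clarifying question instead of guessing.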

What recent developments in conversational AI interest your team the most?

Lately we are seeing large-scale language models applied successfully to a variety of NLP tasks like entity extraction and topic modeling. These models are trained on massive, generic Internet corpora using unsupervised ML techniques and then applied to multiple, unrelated language problems. For example, you could train a model on Wikipedia data and then use it to make predictions on company-specific data.

OpenAI’s GPT-2 model (Radford et al., 2019), for instance, was trained on 40GB of Internet text and achieved state-of-the-art accuracy on an entity recognition problem without being trained on that task. This trend can significantly reduce the cost of sourcing the training data needed to build a production-quality AI model, a cost that can be prohibitive for certain use cases.

Another thing we’re watching is innovation in question-answering systems. Our knowledge-based information retrieval models primarily use term frequency-inverse document frequency (TF-IDF) features to retrieve search results. However, recent applications of universal sentence encoders, which vectorize input queries and find the most similar entries in the knowledge base, have shown promise in recalling high-quality results. This innovation will help in “frequently asked questions”-type conversational applications, where ingesting question-and-answer pairs may be enough to answer many variations of new queries.
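The interview does not show MindMeld’s retrieval code, so the following is a generic, standard-library sketch of TF-IDF retrieval over question-and-answer pairs, not the platform’s implementation. The FAQ entries and the `TfidfIndex` class are hypothetical; the smoothed IDF, ln((1 + n) / (1 + df)) + 1, is one common variant (it matches scikit-learn’s default).

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class TfidfIndex:
    """Minimal TF-IDF retrieval over stored question/answer pairs."""

    def __init__(self, faq: List[Tuple[str, str]]) -> None:
        self.faq = faq
        docs = [Counter(tokenize(q)) for q, _ in faq]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF: terms found in every question still get a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts: Counter) -> Dict[str, float]:
        # Terms never seen at index time are dropped: they cannot match anything.
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def answer(self, query: str) -> str:
        q = self._weigh(Counter(tokenize(query)))
        # Cosine similarity (vectors are unit-normalized, so a dot product suffices).
        scores = [sum(w * doc.get(t, 0.0) for t, w in q.items()) for doc in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.faq[best][1]

faq = [
    ("How do I reset my password?", "Use the Forgot password link on the sign-in page."),
    ("What are your opening hours?", "We are open 9am to 5pm."),
    ("How do I cancel my order?", "Go to Orders and select Cancel."),
]
index = TfidfIndex(faq)
print(index.answer("I forgot my password, how can I reset it?"))
```

Because matching is purely lexical, a query using "open" gets no credit for the stored word "opening". An embedding-based variant would replace the sparse TF-IDF vectors with dense vectors from a sentence encoder while keeping the same nearest-neighbour ranking, which is how it can catch paraphrases that share few exact terms.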

What benefits can we expect now that Cisco has open-sourced the MindMeld platform?

With MindMeld open source, we are better positioned to keep up with rapid innovation in the conversational AI space. We can keep our focus on those developments while giving outside partners and customers access to the MindMeld source code so they can configure applications for their particular use cases. Being open source lets us keep growing the platform while delivering more value to our customers.

Abe Raher

Abe Raher is a senior technical writer at Cisco (AppDynamics).
