How Complex Should Your Chatbot Be?

Background

The past decade has seen a rapid increase in the adoption of chatbots across many industries. With the shift to a more digital world, and with companies examining ways to reduce expenses in the current economic environment, chatbots will likely continue to replace humans in many areas. From customer service to scheduling, answering FAQs, and billing, the use of chatbots for these services will become more and more commonplace.
     Although the first chatbot was developed at MIT in the 1960s, chatbots only came into mainstream use in the last decade. Chatbot complexity varies widely, from simple algorithms to advanced AI. As computing power has increased in recent years and deep learning has become more accessible, chatbots have advanced greatly in their capabilities, even incorporating additional inputs such as image and voice recognition data.
     In this article I will cover a few different methodologies of chatbot creation and highlight the advantages and disadvantages of each. The first two are retrieval models, where the responses are pre-defined, and the last is a deep learning model, where the output is constructed uniquely for each input.
     Each of these is created in Python using a small sample dataset from an auto dealership. We’ll take a look at the methodologies, focus on the pros and cons of each method, and determine which approach is best for this use case. Although these examples only scratch the surface of what is possible in chatbot creation, I hope they will give you an understanding of the potential each approach has.

Method 1: Rule Based

In this simple method, we create a list of categories (“car service”, “oil change”, “sales”, “repair”, etc.), each with an associated response (“Let’s connect you to one of our sales representatives.”) and a list of keywords. The user input is parsed, and the response for the category whose keyword it contains is returned.
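The approach can be sketched in a few lines of Python. The categories, keywords, and responses below are hypothetical stand-ins for the dealership data, not the article’s actual dataset:

```python
# Minimal sketch of a rule-based chatbot. The categories, keywords,
# and responses here are made up for illustration.
CATEGORIES = {
    "sales": {
        "keywords": {"buy", "purchase", "price", "sales"},
        "response": "Let's connect you to one of our sales representatives.",
    },
    "oil change": {
        "keywords": {"oil", "lube"},
        "response": "You can schedule an oil change online or by phone.",
    },
    "repair": {
        "keywords": {"repair", "broken", "fix"},
        "response": "Our service department can help with repairs.",
    },
}

FALLBACK = "Sorry, I didn't understand. Could you rephrase that?"

def respond(user_input: str) -> str:
    """Return the response of the first category whose keyword appears."""
    words = set(user_input.lower().split())
    for category in CATEGORIES.values():
        if words & category["keywords"]:
            return category["response"]
    return FALLBACK
```

A query like “I would like to buy a car today” matches the “sales” keywords and returns that category’s canned response; anything with no keyword hit falls through to the fallback message.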

It is as simple as that, but the approach has many drawbacks: 1) it doesn’t take context into account, 2) if the categories share many of the same keywords, accuracy will be very poor, and 3) the method does not learn from interactions.

Method 2: Naïve Bayes

Naïve Bayes is a simple algorithm and quick to run. It is more complex than our previous example because it takes all the words in the sentence into account. Unlike the Sequence-to-Sequence model we’ll discuss next, however, it does not consider word order and assumes the words are independent. The algorithm behind it is very straightforward: it calculates the probability of each category and outputs the category with the highest score. As in the previous example, this is supervised learning, where we create different categories for each chat. For example, if the user input is “I would like to buy a car today”, we would categorize this as “Sales”, with the corresponding response being “Let’s connect you to one of our sales representatives.”
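To make the probability calculation concrete, here is a from-scratch Naïve Bayes classifier with add-one (Laplace) smoothing. The tiny training sentences and labels are invented for illustration, not drawn from the article’s dealership data:

```python
# A minimal Naive Bayes text classifier written from scratch.
# The training sentences below are hypothetical examples.
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            for word in sentence.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log P(label) plus the sum of log P(word | label),
            # with add-one smoothing over the whole vocabulary
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in sentence.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.fit(
    ["I want to buy a new car", "what is the price of this model",
     "my engine needs a repair", "the brakes are broken"],
    ["Sales", "Sales", "Repair", "Repair"],
)
```

Once a category is predicted, the bot simply returns that category’s pre-defined response, just as in the rule-based method.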

As can be seen in the video above, this appeared to work fairly well even though I only used 40 observations for the training data, an average of only 8 per category. With more observations it would likely be even more accurate and able to answer more complex questions.

Method 3: Sequence to Sequence

The last methodology is the Sequence-to-Sequence (Seq2Seq) model. This is a true deep learning model that uses Recurrent Neural Networks (RNNs) to process large amounts of data. Unlike the previous models, which are retrieval-based and output a pre-determined response, Seq2Seq models generate a unique output. One of the big differences between Seq2Seq models and those we previously looked at is that they take the sequence of words into account. Unlike bag-of-words models, which treat each word independently, these models consider the order of the inputs and outputs, or questions and answers.
     A good example of how this methodology is used is translation. In the instance below, you can see how the positions of certain words differ between English and German. The literal word-for-word rendering of the German for “I’m doing well today” is “Today goes it to me good”. Because the program takes word sequence into account, the sentence is translated properly, with the German words in their correct order, rather than matched word for word into an incorrect translation.
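A small sketch may help show how question/answer pairs are framed as sequences before training. In a typical Seq2Seq setup the decoder is trained with teacher forcing: its input is the answer shifted right behind a start token, and its target is the answer followed by an end token. The pair and token names below are hypothetical:

```python
# Sketch of Seq2Seq training-data preparation (teacher forcing).
# The question/answer pair and the <start>/<end> tokens are
# illustrative assumptions, not the article's actual pipeline.
pairs = [
    ("I would like to buy a car today",
     "Let's connect you to one of our sales representatives."),
]

def prepare(question, answer):
    encoder_input = question.lower().split()
    answer_tokens = answer.lower().split()
    decoder_input = ["<start>"] + answer_tokens   # shifted right
    decoder_target = answer_tokens + ["<end>"]    # shifted left
    return encoder_input, decoder_input, decoder_target
```

At each step the decoder sees the previous true word and learns to predict the next one, which is how the model learns output order rather than treating words independently.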

One drawback to Seq2Seq is that a robust model requires large datasets with substantial variation in dialogue. Our dataset was only 40 observations. To make it large enough to train a deep learning algorithm, these observations were copied and pasted until we had thousands of observations to work with. But as can be seen in the video below, because there was not much variation in the questions and responses, the results were inaccurate.

Recap

Below are the results from our models:

Method 1- Rule Based
  • Lines of code: 80
  • Run time: 2 seconds
  • Accuracy: High

Method 2- Naïve Bayes
  • Lines of code: 70
  • Run time: 2 seconds
  • Accuracy: High

Method 3- Sequence to Sequence
  • Lines of code: 338
  • Run time: 5 minutes
  • Accuracy: Low

For our scenario of creating a simple chatbot to direct customers with basic questions, I would suggest that the best algorithm to use is Naïve Bayes. It is simple to create and implement, and it doesn’t need much data to be accurate. Had we had massive amounts of data available across numerous topics, with the goal of truly mimicking a human-to-human conversation, the deep learning Sequence-to-Sequence model might have been a better fit.

Final Thoughts

I don’t believe chatbots will completely replace humans in the near future. At some point this may happen, but for now there are still nuances of language that a computer cannot quite pick up on. Chatbots, however, are invaluable for supplementing certain tasks and roles, especially simple ones like those in the auto dealership example. A chatbot can provide immediate answers to basic questions or steer the customer to the right place to get more detail on a topic.
     Deep learning based chatbots like the Seq2Seq model are essential when a chatbot truly needs to impersonate a human. As discussed, one of the biggest drawbacks to this model is the large amount of data needed to train it. If you are looking to create a chatbot in-house and do not have large datasets at your disposal, you would need to purchase that data from a vendor, which could end up being fairly expensive. As a company, you need to decide what makes the most sense before investing time and money into purchasing a chatbot or creating one in-house.

Before starting the process, answer the following questions:
1) What is the purpose of the chatbot? Is it meant to replace a human altogether, or just to help field simple inquiries?
2) How many topics do you need it to cover, and how many responses would you need?
3) How much data do you have to train the model?
4) What is the operational complexity involved in implementing it on your site?
5) Ultimately, will the benefit outweigh the cost?

Hopefully this article gave you insight into how simple, and how complex, chatbots can be. Before committing to an expensive outsourced solution or pricey piece of software, think about the different algorithms discussed, and give careful thought to the question: how complex should your chatbot be?