The very first time I interacted with a computer was through this screen:
As a 5-year-old, I learned how to use the command line by watching grown-ups and soon was able to open my favorite game, Hocus Pocus, by typing words that I couldn’t even read or understand.
MS-DOS was literally a black hole for usability. There was no way for you to know what was inside your computer and what you could do with it. Command Line Interfaces (CLI) require users to have knowledge about the system and its language prior to interaction. But talking to a machine using abstract codes and commands soon reached its limits, and the GUI revolution replaced this intimidating level of abstraction on a black screen with colorful icons, menus, lists and windows in order to make information accessible, understandable and usable. Computers would never have been democratized without GUIs.
Many people have compared Conversational User Interfaces (CUI) to CLI, in that communication between human and machine is done through imperative requests from the user. It is a fundamental mode of human-machine interaction. While CLI requires users to have a clear goal and an understanding of how to express their requests in the machine’s language, the idea behind CUI is to mimic a dialog and use NLP to decipher human language in order to understand the user’s intent. Understanding human language remains one of AI’s hardest problems.
My criticism is not directed at AI; what I criticize is the translation of complex behaviors and processes to a conversation format and the negative consequences of this design choice for the user.
While writing this article, I found that most of my criticism has been the subject of Nielsen Norman Group’s recent usability research on Intelligent Assistants using CUI. This report mentions that “observing users struggle with the AI interfaces felt like a return to the dark ages of the 1970s: the need to memorize cryptic commands, oppressive modes, confusing content, inflexible interactions — basically an unpleasant user experience.”
Why marketing loves CUI
As with apps, CUI monopolizes the user’s attention and increases measurable variables such as user session duration on Google Analytics, sometimes at the expense of usability (needless to say, spending more time on an app or feature does not necessarily mean it is efficient). Before social media, the efficiency of technology was measured by the time it saved you in doing something. Ironically, CUI could make the simplest tasks more frustrating, as I will explain later.
Let’s look at (and debunk) some of the hypothetical benefits of CUI for user experience as described in an article entitled “UI of the Future: Conversational Interfaces”:
- “Feels more natural than apps” and “More natural way of interaction”: I would argue that a “natural” way of communication is not necessarily an efficient way to communicate with a machine to do a task as “unnatural” as, say, making a bank transfer. Most of our interactions with machines happen to be pretty “unnatural”, and effective solutions have been designed and adopted for them. If we’re talking about feelings, research shows that users still feel weird and uncomfortable about talking to computers.
- “Don’t need to learn new skills to interact with an interface”: this should be the ultimate goal of any consumer app, no matter what shape its interface assumes.
- “Allows personalization”: in reality, literally anything can be personalized.
- “No need to download a separate mobile app”: I will deal with this point later in this article. I agree that it is a benefit but as we will see, it is not specific to chatbots and it certainly does not compensate for usability issues.
- “No judgment”: This is a point I actually agree with. However, this is a characteristic of machines in general and not specific to CUI.
- “Fast communication: Humans deliver and process audio far more quickly than they do all other media.”: I am not sure which other media types the author is referring to, but as far as I know, humans are fundamentally visual animals (half of the human brain is devoted to vision).
- “Easier way of communication: Voice is not just faster; it’s also easier than typing a message.”: this doesn’t mean that the user always knows what to say.
- “More convenient way of interaction: Interacting with a VUI simply requires a user to speak to the device. A VUI doesn’t force users to learn and recall specific commands or methods of interaction, therefore creating less friction.”: this is theoretically true in a world where AI possesses all knowledge. In the real world, this is not the case (at least for now) and all CUIs operate within a limited functional scope, meaning that users do indeed have to be aware of the system’s constraints or have a high tolerance for failure.
Millions of words have been typed to promote and defend CUI in the same way, mostly by bot development companies.
So why do I believe CUI is the worst thing that has happened to usability since MS-DOS?
1. Very low discoverability and understanding of system scope
As explained thoroughly in Don Norman’s “The Design of Everyday Things”, the two basic characteristics of good design are discoverability and understanding. GUI uses visual cues, hints, affordances and spatial organization of information to ensure discoverability, understanding and usability of complex systems. If your app has 4 buttons, it probably means that it can do 4 things, and you can guess what those 4 things are. If this is not the case, the app has been poorly designed.
Chatbots lack both of those characteristics: the burden of discovering an app’s capabilities is placed upon the user. Whenever you make a request to a chatbot, a small voice in your head asks “what if it can’t do this?” Sadly, you will know the answer only when the system fails.
In the absence of affordances, users will always have unrealistic expectations about a system’s capabilities. According to the Uncanny Valley theory, the more the conversation sounds human, the more unrealistic user expectations might get. Some chatbots try to overcome this by literally telling users what their capabilities are… and that is exactly my point.
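The contrast between a visible scope and a hidden one can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real chatbot works: the four banking actions, the function names and the keyword matching that stands in for NLP are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical capabilities shared by both interfaces.
ACTIONS: Dict[str, Callable[[], str]] = {
    "check balance": lambda: "Your balance is shown.",
    "transfer money": lambda: "Transfer form opened.",
    "block card": lambda: "Card blocked.",
    "find atm": lambda: "Nearest ATM shown.",
}


def gui_visible_actions() -> List[str]:
    """A GUI exposes its scope up front: one button per capability."""
    return list(ACTIONS)


def gui_click(label: str) -> str:
    """Clicking a button can only ever trigger something that exists."""
    return ACTIONS[label]()


def chatbot_reply(utterance: str) -> Optional[str]:
    """Naive keyword matching (an assumption standing in for real NLP).

    Returns None when no capability matches, which is the only moment
    the user finds out that the request was out of scope.
    """
    text = utterance.lower()
    for label, handler in ACTIONS.items():
        if all(word in text for word in label.split()):
            return handler()
    return None
```

Calling `gui_visible_actions()` lists all four capabilities before the user does anything, whereas `chatbot_reply("Order a new checkbook")` returns `None`: the user learns the limit only by running into it.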
Although my criticism is mostly directed at chatbots, voice assistants also have to deal with this problem. Improvements in NLP and AI technology have made Siri and Alexa highly intelligent and capable of dealing with “most” situations and requests. Amazon’s Alexa introduced “Skills”. Users have to invoke these Skills by saying their name before making their request. This behavior basically replaces navigating from one app page to another. Again, Nielsen Norman’s study shows that, even in this case, users face serious usability issues.
2. Conversations are inefficient for complex processes
Nielsen Norman Group’s research finds that “both voice-only and screen-based intelligent assistants work well only for very limited, simple queries that have fairly simple, short answers. Users have difficulty with anything else.” In other words, CUIs are efficient when the user knows EXACTLY what they want and knows how to ask for it. However, that is not how we always interact with websites and applications.
For example, booking a hotel for a trip requires a lot of exploring, thinking, browsing, comparing, sharing information and multitasking. Booking.com’s chatbot is pretty intelligent, but only if you know exactly what you want and when you want it. It might be useful if you have already done your research or if you are a frequent traveler. But for the average user who needs to explore many choices from different sources before making a transaction, a chatbot is not useful, simply because the conversation format does not enable the complex interactions required for the task.
Not surprisingly, the same research also concludes that Intelligent Assistants offer “Poor Support for Comparison and Shopping”. Take the following example: in e-commerce websites and apps, the easiest and most common behavior for viewing the color variation of a product is to simply choose a color in the color palette. This is what the same request would look like in a conversation:
Imagine all the time and effort users have to invest in typing this question for every product that they are interested in while browsing.
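A rough back-of-the-envelope count makes the gap visible. The sketch below is only an illustration: the question text is a hypothetical stand-in rather than a quote from any real chatbot, and the model ignores autocomplete, voice input and the time spent reading the reply.

```python
# Hypothetical question; the wording is an assumption for illustration only.
QUESTION = "Do you have this in blue?"


def palette_actions(products: int) -> int:
    """Color palette: one tap on a swatch per product."""
    return products


def chat_actions(products: int, question: str = QUESTION) -> int:
    """Chat: one keystroke per character plus one tap on send, per product."""
    return products * (len(question) + 1)


if __name__ == "__main__":
    for n in (1, 10):
        print(n, palette_actions(n), chat_actions(n))
```

Under these assumptions, checking a color on ten products costs 10 taps with a palette and 260 keystrokes and taps in a conversation.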
In some cases, usability issues arise even for the most trivial actions. In an era where deleting something on your smartphone is as easy as swiping, the following instructions in Blink (a reminder chatbot in Facebook Messenger) should keep every UX Designer awake at night:
Complex processes cannot be reduced to speech bubbles. Designing processes in a conversation format could make them extremely inefficient and frustrating, especially when they try to replace common usage patterns with new ones. Most commercial chatbots today have introduced non-textual elements in order to solve this problem. Buttons, menus and cards have already appeared in chatbots, effectively making them fully fledged apps. The word “chatbot” is actually reductive for describing these apps, since the chat format is merely a visual effect for a couple of commands inside blue bubbles that trigger more complex functions. It seems obvious that chatbots could not replace an app or a website, but chatbots such as booking.com’s show a clear attempt at doing so.
3. Articulating commands is actually not easy
As I explained in point 1, quite often, the interface reminds users what the system is capable of doing and allows them to evaluate different choices before taking action. In contrast, CUI requires users to articulate their needs at every step, which adds unnecessary and potentially frustrating cognitive load.
Standard UI elements (pages, navigation tabs, buttons, cards, lists) help users orient themselves without having to formulate their needs, because designers have already done that for them. In most use cases, clicking on an icon to trigger a process is way more efficient than having to say or type your intent. Meaningful abstractions make humans more efficient.