Command-and-control vs Conversational

When designing a VUI, it is important to decide whether it is a command-and-control system, where the user has to explicitly indicate that they are going to speak, or a conversational system, where the two sides can take turns without an explicit signal each time. Can the user command the system at any time, as with Google Assistant, Siri, Alexa or Cortana? Or is the user involved in a closed conversation with an explicit beginning, middle and end, as in a game or a chatbot?

Command-and-control

Most popular VUIs, such as Siri, Cortana, Alexa and Google Assistant, are of this type. The user needs to explicitly indicate to the system that they are going to speak. Siri, for example, requires the user to press the home button or say “Hey Siri” before asking it to perform an action.
“Hey Siri”, “Ok Google” and “Hey Cortana” are phrases used to invoke the system. These are often called wake words or hot words (more on this later). In most cases, when the assistant is invoked, the system responds with an earcon and/or visual feedback such as Alexa’s light ring or a wavy line. When the system decides that the user has finished speaking, it often signals this with another earcon, like the “up” sound from Siri. Identifying when the user has stopped speaking is called endpoint detection. As a rule of thumb, the system should listen for a window of about 10 seconds.
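To make endpoint detection more concrete, here is a minimal sketch in Python. It assumes the audio has already been reduced to one energy value per frame; the function name, the 20 ms frame size, the energy threshold and the 800 ms of trailing silence are all illustrative assumptions. Only the 10-second listening window comes from the rule of thumb above. Real systems typically use far more robust voice activity detection than a fixed energy threshold.

```python
from typing import Iterable, Optional


def detect_endpoint(
    frame_energies: Iterable[float],
    frame_ms: int = 20,               # assumed frame length
    energy_threshold: float = 0.1,    # assumed "this frame contains speech" level
    trailing_silence_ms: int = 800,   # assumed pause that ends the user's turn
    max_listen_ms: int = 10_000,      # rule-of-thumb listening window
) -> Optional[int]:
    """Return the time (in ms) at which listening should stop.

    Listening stops when the user has spoken and then stayed silent for
    `trailing_silence_ms`, or when the listening window runs out.
    Returns None if the frames end before either condition is met.
    """
    heard_speech = False
    silence_ms = 0
    elapsed_ms = 0
    for energy in frame_energies:
        elapsed_ms += frame_ms
        if energy >= energy_threshold:
            heard_speech = True
            silence_ms = 0            # speech resets the silence counter
        elif heard_speech:
            silence_ms += frame_ms    # only count silence after speech
        if heard_speech and silence_ms >= trailing_silence_ms:
            return elapsed_ms         # endpoint: user finished speaking
        if elapsed_ms >= max_listen_ms:
            return elapsed_ms         # timeout: stop listening anyway
    return None
```

At the returned time the system would stop listening and play its "finished" earcon. Counting silence only after speech has been heard keeps the system from cutting off a user who takes a moment to start talking.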

Conversational

Forcing the user to press a button every time they are going to talk can be cumbersome and unnatural. When talking to a real person, you don’t have to indicate every time that you want to speak. You can use natural turn-taking techniques like asking a question, pausing, waiting for a response and, in some cases, giving explicit direction. The easiest technique is to ask a question, since users will naturally respond.
However, human conversational turn-taking is not always seamless. In many cases, people use paraverbals like “hmm” or “aha” in the middle of a conversation. These do not mean that the person wants to talk; they are an indication of presence: I’m here, I’m listening. Computers aren’t yet able to understand this subtle form of turn-taking.
Another turn-taking violation that systems need to avoid is putting an instruction after the question, as in “Would you like to save this? Say yes, no or repeat.” In this scenario, users might want to start speaking as soon as the question ends and might be frustrated because they can’t interrupt the system. A better design is to remove the instruction altogether and handle responses through natural language processing, to provide a visual indication where possible, or to put the instruction first.

It sometimes makes sense to switch between command-and-control and conversational modes. In these cases, it is important to let the user know that the system is listening and that the mode has changed. A good example is setting reminders on Cortana. Users can say “Hey Cortana! Set a reminder” and Cortana begins the reminder flow. There is no need for the user to say “Hey Cortana” again, because the reminder flow is a known conversational structure.
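The switch between the two modes can be sketched as a small state machine. This is only an illustration in Python, not how Cortana or any other assistant is implemented: the `HybridAssistant` class, the wake phrase “hey assistant” and the prompt wording are all hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    COMMAND = "command"            # wake phrase required
    CONVERSATION = "conversation"  # inside a flow, no wake phrase needed


WAKE_PHRASE = "hey assistant"      # hypothetical wake phrase


class HybridAssistant:
    def __init__(self) -> None:
        self.mode = Mode.COMMAND
        self.pending: List[str] = []   # questions still to ask in this flow
        self.answers: List[str] = []

    def hear(self, utterance: str) -> Optional[str]:
        """Return the spoken response, or None if the utterance is ignored."""
        text = utterance.strip().lower()
        if self.mode is Mode.COMMAND:
            if not text.startswith(WAKE_PHRASE):
                return None            # not addressed to the assistant
            command = text[len(WAKE_PHRASE):].strip(" ,!.")
            if command == "set a reminder":
                # Enter the reminder flow; the question itself tells the
                # user that the system is listening for an answer.
                self.mode = Mode.CONVERSATION
                self.pending = ["When should I remind you?"]
                self.answers = []
                return "Sure. What should I remind you about?"
            return "Sorry, I can't help with that yet."
        # Conversation mode: every utterance is an answer to the last question.
        self.answers.append(utterance.strip())
        if self.pending:
            return self.pending.pop(0)
        self.mode = Mode.COMMAND       # flow finished, wake phrase needed again
        what, when = self.answers
        return f"Okay, I'll remind you to {what} {when}."
```

After “Hey Assistant! Set a reminder”, the answers “call mom” and “at 5 pm” are accepted without the wake phrase. Once the flow ends, the assistant drops back to command-and-control and ignores anything that does not start with the wake phrase.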

Barge-in: Interrupt the conversation at any time
In a hybrid model that mixes command-and-control and conversational modes, the system could allow the user to interrupt the conversation at any time. This is called barge-in. Enabling barge-in is critical if the user needs to be able to get out of the conversational flow at any time. For example, in the middle of a game of Jeopardy, the user can barge in and say “Alexa, stop the game.”
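One simple way to picture barge-in is a playback loop that checks the microphone before each chunk of speech. The Python sketch below is a simulation under stated assumptions: `speak_with_barge_in` and `poll_microphone` are hypothetical names, and a real system would monitor audio continuously while speaking rather than between chunks.

```python
from typing import Callable, List, Optional, Tuple


def speak_with_barge_in(
    prompt_chunks: List[str],
    poll_microphone: Callable[[], Optional[str]],
) -> Tuple[List[str], Optional[str]]:
    """Play a prompt chunk by chunk, stopping as soon as the user speaks.

    `poll_microphone` is a hypothetical callback that returns what the user
    said since the last check, or None if they stayed silent.
    Returns the chunks that were actually played and the interrupting
    utterance (None if the prompt finished without interruption).
    """
    played: List[str] = []
    for chunk in prompt_chunks:
        utterance = poll_microphone()
        if utterance is not None:
            return played, utterance   # user barged in: stop talking
        played.append(chunk)           # stand-in for real audio playback
    return played, None


# Simulated microphone: silent twice, then the user interrupts.
events = iter([None, None, "Alexa, stop the game"])
played, interruption = speak_with_barge_in(
    ["The category is", "world capitals.", "For 200 points:", "name this city."],
    lambda: next(events, None),
)
```

In this run the assistant plays only the first two chunks before handing control to whatever handles “Alexa, stop the game”.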

Conversational markers

Using conversational markers is another way to make your VUI sound more human-like. They are an important way of letting users know where they are in the conversation and that they are understood.

Here’s an example of a system without conversational markers:

Assistant
Did you get a good night’s sleep?
User
Yes
Assistant
Did you read before sleeping?
User
Yes
Assistant
What did you read?
User
The Rise of the Robots
Assistant
Ok

It sounds, as you might’ve guessed, kinda robotic.

Here’s an example with conversational markers:

Assistant
Hi there! I’m going to ask you a few questions about your sleep. Did you get a good night’s sleep?
User
Yes
Assistant
That’s awesome. Did you read before sleeping?
User
Yes
Assistant
Good job. What did you read?
User
The Rise of the Robots
Assistant
That’s a nice book. Have a great day!

Because the system is using basic manners, it feels more human-like.

Conversational markers include:

  • Timelines (“First”, “At last”)
  • Acknowledgements (“Thanks”, “Got it”)
  • Positive feedback (“Good job”, “Awesome”)
  • Assurance and generic replies (“Sorry to hear that”, “Oh”)

Conversational markers need to be designed to fit your system’s persona. Even if it is a formal system and your users know that they are talking to a bot, having conversational markers is just basic hygiene.
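As a rough illustration of tying markers to a persona, the Python sketch below keeps a separate set of markers per persona and prepends the right one to each prompt. The persona names, the `MARKERS` table and the `with_marker` helper are hypothetical, and the formal phrasings are assumptions chosen for the example rather than recommended wording.

```python
# Marker phrases per persona. "casual" reuses markers from the examples
# above; the "formal" phrasings are illustrative assumptions.
MARKERS = {
    "casual": {
        "timeline": "First,",
        "acknowledgement": "Got it.",
        "positive_feedback": "Awesome!",
        "assurance": "Sorry to hear that.",
    },
    "formal": {
        "timeline": "To begin,",
        "acknowledgement": "Thank you.",
        "positive_feedback": "Very good.",
        "assurance": "I am sorry to hear that.",
    },
}


def with_marker(persona: str, marker_type: str, prompt: str) -> str:
    """Prefix a prompt with the persona's marker of the given type."""
    marker = MARKERS[persona][marker_type]
    return f"{marker} {prompt}"


casual = with_marker("casual", "positive_feedback", "Did you read before sleeping?")
formal = with_marker("formal", "acknowledgement", "Did you read before sleeping?")
```

Keeping the markers in one table makes it easier to review the persona’s voice in a single place and to vary phrasing later without touching the dialogue logic.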


