The technology of speech recognition appears magical in its ability to transcribe and understand us, but the real changes are still to come.
We will no longer interact with the internet solely through keyboard or thumb. We’ll be talking and listening, and this new user experience will transform everything.
To see where we are headed, we can start by looking at the basic components of voice search technology. Voice recognition technology delivers its magic in a few ways.
Firstly, the technology records the analog sound of speech and transcribes it into written text and digital information. This is a phenomenal achievement, as it must account for accents and variations in voice, as well as the very human inflections that we all interpret subconsciously.
Secondly, speech recognition interprets this digitised information to determine the meaning and intention behind the sounds. Sophisticated AI has been developed to make this seem simple.
Thirdly, voice technology needs to decide how to respond. If we’re using voice recognition purely as a way of gathering information, then it doesn’t need to do much. But if the speaker is expecting a response, things get complicated: not so much from a technical and data perspective, but very much from a UX perspective.
Standard search results have been modelled on the Google pattern: feed in a search query, get some answers, rank them according to your algorithm and send them back to be viewed as a list. This is great if you’re reading, but not so good if you’re listening. The way we use search will become less textual and more aural.
Our experience and expectations will change so that today’s internet will feel a little quaint.
This isn’t the first time the internet has had a major pivot in how it delivers user experience.
Signposts on the road we’ve travelled
To really understand the impending impact of voice search on the internet, we need to look back at several key crossroads in its development. These changes had technological underpinnings, but the biggest shift was in user behaviour and the experience that users expected. In some ways the UX changes were more fundamental than the underlying tech.
The age of the portal
The first of these UX crossroads happened with the birth of the portal.
The internet meant we all had access to masses of information. We needed a way to provide access to this information that was more than people passing web URLs to each other. The closest thing we knew to this mass of information was the telephone directory, and so web portals were created.
These were essentially categorised directories of sites, signposted as the entry points into the online world. From these central points of reference, people navigated out to what they wanted. The UX of the internet developed to resemble a categorised telephone directory or classified ads. You might remember the likes of AOL, Yahoo and Excite, who either collaborated or competed with ISPs to capture dial-up internet users at the point at which they connected to the internet.
All seemed good. Portals were doing a good job and people seemed happy with this telephone directory approach.
But the mass of information was just too much to handle. The portals were bloated and confused, or just couldn’t deal with the scale.
The age of search
And so the next major internet UX disruption arrived with the idea of ‘search’.
Google simplified everything: a single point of access to the wealth of information, just by asking a question. People started using the search box for everything. It wasn’t perfect, but that was fine because it gave you page after page of results you could look through. As search became more intelligent, the results became better and you could generally find what you needed in the top few results.
The user’s experience and expectation of the internet evolved, and search results listings became the order of the day. The directory approach was rejected. The dotcom boom of the portals was over before it had much of a chance to find its feet.
An industry was created around optimising for this way of navigating. SEO is as much about getting under the skin of how people use search engines as it is about understanding the search engine algorithms. Getting into the top ten has been the goal, as this is where people look. The higher your ranking the better, and position 1 is what we’re after. If you drop down the rankings it’s bad for business, but not necessarily a disaster.
This is our experience of the internet today. Even on mobile devices, a standard search returns a list of results, with which we do two things. First we review the options, then we select the most appropriate (or reject them all and ask for more).
The age of voice
Voice search will change all this.
Current search is defined by a list of results. With voice search there is no list. There is only one search result: it is singular, and there are no longer multiple choices.
Success in this architecture depends on being at position zero, the single answer that gets read out. It is the only result that matters.
Voice search is a bit like Google’s ‘I’m Feeling Lucky’ button. (Does anyone still use that?) Instead of giving you a list of results, it takes you straight to the result it thinks is most appropriate. There is a good reason it was largely sidelined… until now. Essentially, that’s what voice search is doing. It’s giving you the ‘I’m Feeling Lucky’ option, and we know that’s a terrible user experience.
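To make the contrast concrete, here is a toy Python sketch of the two delivery models. It assumes a hypothetical `Result` type with a made-up relevance `score` standing in for a real ranking algorithm (this is not how Google or any voice assistant actually ranks). The same ranked candidates are either returned as a page of up to ten results to scan, or cut down to the single top result that a voice assistant would read aloud.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    url: str
    score: float  # hypothetical relevance score; real engines use far richer signals


def rank(results: List[Result]) -> List[Result]:
    # Order candidates from most to least relevant, as a results listing would.
    return sorted(results, key=lambda r: r.score, reverse=True)


def text_search(results: List[Result], page_size: int = 10) -> List[Result]:
    # Screen-based search: return a page of ranked results for the user to review.
    return rank(results)[:page_size]


def voice_search(results: List[Result]) -> Optional[Result]:
    # Voice search: only the top-ranked result ("position zero") is returned.
    ranked = rank(results)
    return ranked[0] if ranked else None


candidates = [
    Result("https://example.com/a", 0.42),
    Result("https://example.com/b", 0.91),
    Result("https://example.com/c", 0.77),
]

print([r.url for r in text_search(candidates)])
print(voice_search(candidates).url)
```

With a list, a result in second or third place still gets seen and can still be chosen. With `voice_search`, everything below the first result is simply discarded, which is why position zero becomes the only position that counts.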
The voice search algorithm has required a bit more intelligence to decipher the vagaries of our speech, but the real difference is in how we interact with voice search and most significantly how we need to respond to the results.
This latest shift in internet UX will radically alter how we access and interpret information. Technology now understands our speech, but it has yet to develop the best way of feeding information back to people.
The internet of yesterday and today has been driven by a textual interaction, so if you can read and write then you can navigate the masses of information. The internet of tomorrow looks to be driven by oral and aural interaction where the spoken word replaces the written word.
How we deal with this, and how the technology and semantic data structures will evolve, remains to be seen. But suddenly the difficulties faced by blind people, who rely on sound to make sense of the internet, are difficulties that the internet industry will need to solve as a commercial imperative. It is unlikely that current screen reader technology will provide the answers, but it will probably point the way to a better aural internet.
In order to adapt to this new way of interacting with the internet, we will need to develop entirely different expectations of our own experiences. We will also need to understand how our new reliance on voice interaction will affect the internet’s semantic, technical and commercial structure.