As customer success managers in B2B, we often interview our business partners. To save the time we spent taking notes from these interviews, we decided to employ a third party. That’s where our consensus ended and the enthusiasm for innovation kicked in. The process below helped us explore our options and discuss them objectively.


The word long-list might scare you. However, long-listing does not have to mean a lot of work, nor a list of every solution on the market. Instead, run a quick search to identify the main approaches to your problem, pick your favourite, and only then do a second round of vendor search to broaden your pool of choices. Remember the double diamond? Long-listing is the first step in that process.

Let me walk you through the process through the lens of our own goals. To ease note-taking we had three basic options: to transcribe the recordings manually with a tool that makes it easier, to outsource the work to an agency, or to do something for humanity and employ AI to do this part of our job. As you can see below, each of these approaches was represented in our long-list by a few interesting vendors.

The long-list of speech-to-text tools (mid-2018)

Now, let’s have a look at our long-list parameters. The one required parameter was the usage scenario. By that I mean information such as “Tool A did the recording for me. Tool B forced me to extract the audio track from my original video recording first and then upload it.” Such usage scenarios help us understand how well the tool fits (or does not fit) our current workflow.

In my opinion, a usage scenario is a must-have for any long-list of tools. Whether to include more parameters depends on their importance. Platform support, price level, and free trial availability were non-negotiable for us and thus made it onto the long-list. Anything else we left for later.
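A long-list like this fits comfortably in a spreadsheet, but as a minimal sketch, here is one way the same parameters could be held in Python. The `LongListEntry` class, its field names, and the `by_approach` helper are all hypothetical, and the rows are placeholders rather than real vendors or prices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LongListEntry:
    """One row of a long-list. All field names are illustrative assumptions."""
    name: str
    approach: str        # e.g. "manual", "agency", "ai"
    usage_scenario: str  # how the tool fits into the current workflow
    platforms: tuple     # platform support
    price_level: str     # a rough level is enough at this stage
    free_trial: bool


def by_approach(entries):
    """Group tool names by approach so a gap in coverage is easy to spot."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.approach].append(entry.name)
    return dict(groups)


# Placeholder rows, not real vendors or real prices.
long_list = [
    LongListEntry("Tool A", "ai", "records the call for me",
                  ("web",), "low", True),
    LongListEntry("Tool B", "ai", "extract audio from video, then upload",
                  ("web",), "low", True),
    LongListEntry("Agency C", "agency", "send the recording, get a transcript",
                  ("email",), "high", False),
]

print(by_approach(long_list))
```

Grouping by approach is a quick check that every approach you identified is represented by at least one candidate before you share the list.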


I shared my long-list with my colleagues. Immediately, they picked their favourites, sometimes completely different from the ones they had favoured initially. The second step toward an informed decision about our new tool was to dig deeper.

To do so, it is good practice to narrow your options based on the information from the long-list and to define new criteria for comparing the candidates that made it to the second round. Some criteria are straightforward to check, like GDPR compliance or the price tag. Others, like the quality of the product or the accompanying customer service, need to be analyzed in detail.

For us, the main criteria were quality (including GDPR compliance) and price. From the previous phase, I knew that none of the candidates had a poor usage scenario or user interface. Because of that, I focused on comparing the quality of the transcription itself. The question was how to define quality in this case. Moreover, what do we and our partners consider to be a good transcription tool?

In our case, quality means two things. First, we can speak the way we’re used to: playing with words, using jargon, speaking in abbreviations, speaking with an accent, and uttering a lot of ers and uhms to gain time to form our ideas. Second, GDPR compliance. Why GDPR? Because feeling secure is part of quality too. We don’t discuss confidential matters on the calls, yet personal information also deserves special care. Partners expect these conversations to take place in closed settings, not coffee shops. So if they happen to mention where they went on vacation, they don’t expect a third party to know. Needless to say, they don’t expect a third party to use it for business purposes either.

How did we assess transcription quality? We listed the speech characteristics and then searched for real-life recordings that featured all of them. We ended up with several two-minute samples of dialogues with native speakers from the UK, the US, Czechia, and Slovakia, each discussing how exactly something worked. To double-check, a colleague of mine ran the same comparison on a different recording at the same time. This allowed us to assess which tool performs better in our context.
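One common way to put a number on such a comparison is word error rate (WER): the word-level edit distance between a hand-made reference transcript and the tool’s output, divided by the number of words in the reference. The sketch below is an assumption about how a comparison like this could be scored, not a description of the method used here. The function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization is deliberately naive: lowercase and split on whitespace.
    Punctuation is not stripped, so clean both texts first if that matters.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sample: the tool dropped the filler word and misheard one word.
reference = "uhm we ship on friday"
tool_output = "we ship on monday"
print(word_error_rate(reference, tool_output))
```

Lower is better, and a score is only comparable between tools when they are run on the same recording against the same reference. Whether dropped fillers such as "uhm" should count as errors is a choice to make before scoring.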

The short-list of speech-to-text tools (mid-2018)

Lessons learned

Having conducted this research, I had a chance not only to play with various tools and see how AI performs in speech-to-text conversion, but also to see how a slightly more formal process can affect a team’s discussion about its tools. Another takeaway for me was how fast this particular market evolves: in the span of three months, one vendor disappeared while two notable new solutions emerged. The last lesson was how long it actually takes to sort out the GDPR red tape. Getting a transcript from any of the services was a matter of minutes; rounding up the necessary paperwork took weeks.
