Microsoft’s AI-powered voice transcription in Word Online converted an 11-minute interview into 1,935 words in 10 minutes

    Over the years I’ve done a number of interviews on techAU and one of the most painful processes is transcribing the interview from audio into text.

    Obviously, when you’re quoting CEOs, you need to be really accurate and that typically requires listening, then re-listening and listening again.

    This means a half-hour interview can take more than an hour to transcribe. So painful was this process that many journalists turned to paid services, which outsourced the job to someone trading their time for dollars.

    Thankfully in 2020, we have some new technology to help us with the challenge of voice to text transcription.

    Microsoft have added a great new feature to Word Online: the ability to transcribe audio using the Azure Cognitive Services AI platform.

    This works by either recording audio directly into Word Online, or uploading an existing audio file (e.g. from your phone) that is then processed by Microsoft cloud services.

    Azure Cognitive Services spans an array of disciplines: Decisions, Language, Speech, Vision and Web search. Microsoft sell these services to developers, who typically integrate this magic into their applications.

    Word’s integration of Speech to Text is available for free in Word Online, which provides a great window into what’s possible with the rest of the platform.

    Microsoft trains speech models to recognise words, phrases and sentences, and the models are also able to understand organization- and industry-specific terminology.

    Obviously not every recording is done in studio quality. More commonly they’re done in incredibly noisy environments, so the AI has to overcome barriers such as background noise, accents, or unique vocabulary. Microsoft says its transcriptions are state-of-the-art, high-quality and accurate, and the great thing is, we get to test it out.

    Hands-on with Word Online Voice Transcription

    Back in 2016, I had the opportunity to interview Toto Wolff from the successful Mercedes-Benz Formula 1 team, at the Melbourne Grand Prix. The audio was recorded on my phone, in the pit lane paddock, with loads of ambient noise. I sat across the table from Toto, and the audio is probably a great example of the worst-case scenario.

    The 10MB, 11-minute MP3 file took around 10 minutes to upload, process and return as a transcription. What came back was a list of timecoded paragraphs (questions and answers), each tagged with an identified speaker.

    What I really love is the ability to rename each speaker and simply tick a box to rename every other segment identified as being by that speaker. This dramatically speeds up the rate at which you can extract questions and answers, clicking plus to add each segment to the Word document.

    In the event you have a multi-party interview, you could easily extract just your questions and the subject’s answers. You may also use this to transcribe a podcast recording where you want all text added to the document. Microsoft has made that easy with a simple button at the bottom, ‘Add all to document’.
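    To make the speaker workflow concrete, here is a minimal Python sketch of the same idea. It is not how Word implements it; the Segment type, the "Speaker 1" / "Speaker 2" labels, the timecodes and the placeholder text are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """One timecoded paragraph of a transcript (hypothetical structure)."""
    start: str    # timecode such as "00:00:09"
    speaker: str  # label assigned by the transcription service
    text: str


def rename_speaker(segments, old, new):
    """Rename every segment attributed to `old`, like ticking 'change all'."""
    return [replace(s, speaker=new) if s.speaker == old else s for s in segments]


def by_speakers(segments, keep):
    """Keep only the segments spoken by the names in `keep`."""
    return [s for s in segments if s.speaker in keep]


def to_document(segments):
    """Join segments into plain text, similar in spirit to 'Add all to document'."""
    return "\n".join(f"[{s.start}] {s.speaker}: {s.text}" for s in segments)


# Placeholder data, not taken from the real interview.
segments = [
    Segment("00:00:02", "Speaker 1", "Placeholder question one."),
    Segment("00:00:09", "Speaker 2", "Placeholder answer one."),
    Segment("00:00:31", "Speaker 1", "Placeholder question two."),
]

renamed = rename_speaker(segments, "Speaker 1", "Jason")
renamed = rename_speaker(renamed, "Speaker 2", "Toto")

print(to_document(renamed))                         # whole transcript
print(to_document(by_speakers(renamed, {"Toto"})))  # answers only
```

    Renaming once and filtering by name is what makes pulling out just the questions, or just the answers, so quick.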

    Something else I really appreciate is the ability to change the playback speed between 0.5x and 2x, enabling you to slow down people who speak too fast, or speed through those who speak too slowly. This can also help speed up checking the transcription.

    One area where Microsoft could improve this new Transcribe feature is by adding the ability to bulk-apply a style to a speaker’s name once it has been added to the document.

    For the most part, the transcription was excellent in its accuracy, with the biggest miss being the name Suzie being transcribed as CZ, which is kind of understandable. Even having to make a couple of minor corrections, you’re way ahead on time when you consider this just turned an 11:25 interview into 1,935 words in around 10 minutes.
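    For anyone who wants to check the numbers, here is a quick back-of-the-envelope calculation in Python. It only uses figures from this article; the manual figure is the rough "half-hour interview takes more than an hour" rule of thumb from earlier, treated as a lower bound, and the variable names are my own.

```python
def minutes(timestamp: str) -> float:
    """Convert an 'MM:SS' timestamp into minutes."""
    m, s = timestamp.split(":")
    return int(m) + int(s) / 60


audio_min = minutes("11:25")  # length of the interview recording
words = 1935                  # words in the returned transcript
processing_min = 10           # approximate upload and processing time

# Speaking rate captured in the transcript.
words_per_min = words / audio_min

# Minutes of work per minute of audio: automated vs. manual.
auto_ratio = processing_min / audio_min
manual_ratio = 60 / 30  # at least an hour for a half-hour interview

print(f"Speech rate: {words_per_min:.0f} words per minute")
print(f"Automated: {auto_ratio:.2f} min per minute of audio")
print(f"Manual: at least {manual_ratio:.2f} min per minute of audio")
```

    In other words, the service worked slightly faster than real time, while manual transcription takes at least twice the length of the recording, and that is before any re-listening.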

    This is a dramatic demonstration of just how powerful Microsoft’s cloud services are when integrated into an application, and showing it off in their own product is a great move by Microsoft.

    Having now used this and seen how well it works, I really want this in WordPress. It would dramatically change the ability for writers to extract audio content, and accelerate workflows in a way that saves time and money.

    More information at Microsoft 365 blog.

    Jason Cartwright
    Creator of techAU, Jason has spent a dozen+ years covering technology in Australia and around the world. Bringing a background in multimedia and passion for technology to the job, Cartwright delivers detailed product reviews, event coverage and industry news on a daily basis. Disclaimer: Tesla Shareholder from 20/01/2021
