Microsoft has spent the past two years adding flashy new productivity features to Teams, and now the company is overhauling how the fundamentals work thanks to AI. We’ve all been on a call where someone has poor room acoustics making it hard to hear them, or seen two people try to talk at the same time creating an awkward “no, you go ahead” moment. Microsoft’s new AI-powered voice quality improvements should improve or even eliminate these day-to-day annoyances.
Microsoft is now using machine learning models to improve room acoustics so you’ll no longer sound like you’re hiding in a cave. “While we have been trying our best with digital signal processing to do a really good job in Teams, we have now started using machine learning for the first time to build echo cancellation where you can truly reduce echo from all the different devices,” explains Robert Aichner, a principal program manager for intelligent conversation and cloud communications at Microsoft, in an interview with The Verge.
Microsoft has been testing this for months, measuring its models in the real world to ensure Teams users are noticing the echo reduction and improvements in call quality. The software maker used 30,000 hours of speech to help train its models, and captured audio from thousands of devices through crowdsourcing, in which Teams users are paid to record their voice and play back audio from their device.
“We also simulate about 100,000 different rooms… the room acoustics play a big role in echo cancellation,” says Aichner. The result is big improvements in call audio quality, and an elimination of echo that also allows multiple people to speak at the same time. You can see all of the improvements in action in the video above.
If Teams detects that sound is bouncing or reverberating in a room, resulting in shallow audio, the model will also process the captured audio to make it sound like Teams participants are speaking into a close-range microphone instead of sounding like an echoey mess.
The most impressive part is that people can now interrupt each other on Teams calls without the awkward overlap where you can’t hear the other person due to the echo. Microsoft is now shipping all this work in Teams, alongside its earlier improvements to AI-based noise suppression. All of the processing is done locally on client devices, instead of in the cloud.
“We said we want to do it on the client, because the cloud is still expensive if you want to do every call processed in the cloud… and obviously we’d have to pass that cost onto the customer,” explains Aichner. Cloud processing would potentially mean restricting these important Teams improvements to paying customers, whereas the on-device route means features like noise suppression are available on 90 percent of devices using Teams.
All of these new Microsoft Teams improvements are now live, alongside some real-time screen optimizations for text in videos and AI-based improvements to how Teams handles bandwidth constraints during video or screen-sharing calls.