2. Audio and Video Drive the User Experience
Taking this down a level or two, they made the fundamental assertion that all the AI in the world won’t make a difference unless you have a great user experience. When it comes to communications and collaboration – either in the workplace or the contact center – that means having great audio and video capabilities. These fundamentals are easy to take for granted, and I really liked how they parsed out what they’re doing, not just for UX, but for how Cisco is trying to differentiate.
The main update is AI Codec (an ultra-low-bitrate, resilient codec), which uses generative AI, among other things, to ensure high-quality audio across all network conditions. When bandwidth is variable or spotty, packets will drop, and that degrades audio quality. I’m not an engineer, but they explained how these packets carry multiple copies of the audio, so if one packet drops, the copies carried in the others will still get through – that’s redundancy to cover packet loss. A key part of the AI piece is how the codec removes extraneous elements like background noise so that only the voice signals are heard. Got it.
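To make the redundancy idea concrete, here is a minimal Python sketch of the general principle. It is not Cisco’s actual design, which wasn’t described in that level of detail; the function names, the `copies` parameter and the frame labels are all hypothetical, chosen only for illustration. Each packet carries the current audio frame plus copies of the few frames before it, so the receiver can fill in frames from whichever packets do arrive.

```python
from typing import Dict, List, Optional


def packetize(frames: List[str], copies: int = 3) -> List[Dict[int, str]]:
    """Build one packet per frame; each packet also carries up to
    `copies - 1` earlier frames as redundant copies (illustrative scheme)."""
    packets = []
    for i in range(len(frames)):
        start = max(0, i - copies + 1)
        packets.append({j: frames[j] for j in range(start, i + 1)})
    return packets


def receive(packets: List[Optional[Dict[int, str]]], n_frames: int) -> List[Optional[str]]:
    """Rebuild the frame sequence from whichever packets arrived.
    A dropped packet is represented as None."""
    recovered: List[Optional[str]] = [None] * n_frames
    for packet in packets:
        if packet is None:  # this packet was lost on the network
            continue
        for index, frame in packet.items():
            recovered[index] = frame
    return recovered


frames = ["f0", "f1", "f2", "f3", "f4"]  # hypothetical audio frames
packets = packetize(frames, copies=3)
packets[1] = None  # simulate two consecutive dropped packets
packets[2] = None
print(receive(packets, len(frames)))
```

In this toy setup, the two dropped packets do no damage because the packets that follow them carry copies of the missing frames. Lose more consecutive packets than there are redundant copies and a gap does appear, which is the trade-off between extra bandwidth and resilience.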
My photos below aren’t great, but the first one shows how AI Codec maintains top-quality performance across very low bandwidth levels – between 1 and 6 kbps. Compare that to the right side of that chart, which shows the industry-standard Opus codec, and how it only maintains that level of performance at much higher bandwidth levels – 16 kbps. So, when it comes to supporting the varying bandwidth scenarios for hybrid work, Cisco maintains that its new codec is better aligned.
The photo on the right is clearer, and shows another data set to support their audio quality story. In the speech recognition world, Word Error Rate (WER) is a benchmark for accuracy, where the lower the metric, the more accurate the speech engine. Cisco’s capability here comes largely from its Voicea acquisition, and this chart shows their market standing in two ways.
In absolute terms, the current version of Voicea leads the pack with a WER of 11.5% (meaning an 88.5% level of accuracy), well ahead of the leading brands. Then, in relative terms, the chart shows four data points for Voicea, and how its WER has steadily improved from 14.6% to 11.5%. This is where Machine Learning comes into play with continuous improvement, adding another layer to Cisco’s AI story.
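For readers who want to see what sits behind a WER figure, here is a minimal Python sketch of the standard calculation: the number of word substitutions, deletions and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. I don’t know how Cisco computed its numbers, so treat this as the textbook definition only; the example sentences are made up, and the function name is my own. The last few lines simply work through the arithmetic on the two figures quoted from the chart.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance over words (Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("the" -> "a") and one deletion ("now").
wer = word_error_rate("please join the webex meeting now",
                      "please join a webex meeting")
print(f"WER: {wer:.1%}")

# Arithmetic on the two WER figures quoted from the chart.
earliest, latest = 14.6, 11.5
absolute_drop = earliest - latest          # percentage points
relative_drop = absolute_drop / earliest   # share of errors eliminated
print(f"Absolute drop: {absolute_drop:.1f} points")
print(f"Relative drop: {relative_drop:.1%}")
```

On those quoted figures, the improvement works out to about 3.1 percentage points, or roughly a fifth fewer errors than the earliest version. One caveat: reading accuracy as 100% minus WER is a convenient shorthand, but because inserted words count as errors too, WER can in principle exceed 100%.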
Disclaimer – I’m a market researcher by trade, and I don’t know the source of this data. Every speech rec player seems to find a data set that shows them to be the best, and I cannot vouch for how authoritative Cisco’s claims here are. Note to self to follow up on this.
Before this post becomes too long, I’ll just note that there are other pieces that help make for better audio and video experiences, such as their newly-touted Real Time Media Model (RMM), which they view as a complement to Large Language Models (LLMs), something that all the vendors are behind as part of their AI stories. I’ll move on now, but I hope you get the main idea for how Cisco sees audio and video as core to the Webex value proposition.