ElevenLabs Introduces AA-WER v2.0 Speech to Text Accuracy Benchmark

ElevenLabs has released AA-WER v2.0, a significant update to its speech-to-text accuracy benchmark with a focus on voice agents. This version introduces AA-AgentTalk, a proprietary dataset designed specifically to measure how well models transcribe real-world voice agent interactions.

Key Updates in AA-WER v2.0:

New Dataset - AA-AgentTalk:
- Contains 469 samples (approximately 250 minutes) of speech directed at voice agents.
- Covers industry jargon, call center interactions, AI agent conversations, and more across 17 accent groups and 8 speaking styles.
Cleaned Public Datasets:
- Manual corrections made to existing public datasets, VoxPopuli and Earnings22, addressing inaccuracies in ground truth transcriptions.
- New versions are available as VoxPopuli-Cleaned-AA and Earnings22-Cleaned-AA.
Removal of AMI-SDM:
- Dataset removed due to extensive and unresolvable transcription errors.
Improved Text Normalization:
- A custom text normalizer reduces formatting discrepancies between reference and model transcripts, so that differences in formatting are not counted as transcription errors.
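
To illustrate why normalization matters for a word error rate (WER) benchmark, here is a minimal sketch in Python. The announcement does not describe the rules of the actual AA-WER normalizer, so the `normalize` function below is a hypothetical stand-in (lowercasing, punctuation stripping, whitespace collapsing), and `word_error_rate` is the standard word-level edit distance divided by the reference length, not the benchmark's own implementation.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalizer (an assumption, not the AA-WER rules):
    lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    # Keep word characters, whitespace, and apostrophes (for contractions).
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed on normalized text with a word-level Levenshtein distance."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "Hello, I'd like to check my order status."
    hypothesis = "hello i'd like to check my order status"
    print(word_error_rate(reference, hypothesis))
```

Without the normalization step, the capitalized "Hello," and the trailing "status." would each count as a substitution even though the words were transcribed correctly, which is the kind of formatting discrepancy the update is meant to remove from the score.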

Why It Matters:

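For builders and developers working on AI and voice technology, the benchmark provides a more accurate basis for evaluating speech-to-text models: corrected ground truth transcriptions and consistent text normalization mean that measured error rates better reflect actual transcription quality. Better model evaluation in turn supports better user experiences in voice interaction applications. With these updates, ElevenLabs aims to set a new standard for measuring speech recognition accuracy for voice agents.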
