ElevenLabs

ElevenLabs Introduces AA-WER v2.0 Speech-to-Text Accuracy Benchmark

ElevenLabs has released AA-WER v2.0, a significant upgrade to the benchmark that measures speech-to-text accuracy for voice agents. This version introduces AA-AgentTalk, a proprietary dataset designed specifically to evaluate how models perform in real-world voice agent interactions.

Key Updates in AA-WER v2.0:

  • New Dataset - AA-AgentTalk:
    • Contains 469 samples (approximately 250 minutes) of speech directed at voice agents.
    • Covers industry jargon, call center interactions, AI agent conversations, and more across 17 accent groups and 8 speaking styles.
  • Cleaned Public Datasets:
    • Two existing public datasets, VoxPopuli and Earnings22, were manually corrected to fix inaccuracies in their ground truth transcriptions.
    • New versions are available as VoxPopuli-Cleaned-AA and Earnings22-Cleaned-AA.
  • Removal of AMI-SDM:
    • The dataset was removed because of extensive and unresolvable transcription errors.
  • Improved Text Normalization:
    • A custom text normalizer reduces formatting discrepancies between reference and model transcripts, so that scores reflect transcription errors rather than formatting differences.
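
To illustrate why normalization affects the score, here is a minimal sketch of a word error rate (WER) calculation with a simple normalizer. It is not the AA-WER normalizer, whose rules are not described here; the `normalize` and `word_error_rate` functions and the specific rules (lowercasing, stripping punctuation) are assumptions chosen for illustration only.

```python
import re
import string


def normalize(text: str) -> list[str]:
    """Hypothetical normalizer: lowercase, drop punctuation, split on whitespace."""
    text = text.lower()
    # Keep apostrophes so contractions such as "don't" stay one word.
    punctuation = string.punctuation.replace("'", "")
    text = re.sub(f"[{re.escape(punctuation)}]", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Formatting-only differences should not count as errors.
    print(word_error_rate("Hello, world!", "hello world"))
    # One deleted word and one substituted word out of five reference words.
    print(word_error_rate("Please transfer me to billing.",
                          "please transfer to building"))
```

Without the normalization step, the first pair would be scored as two substitutions even though the words match, which is the kind of formatting noise a normalizer is meant to remove.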

Why It Matters:

This benchmark matters to developers building AI and voice products because it gives them a more accurate basis for evaluating speech-to-text models, which in turn supports better user experiences in voice interaction applications. With these updates, ElevenLabs aims to set a new standard for measuring speech recognition accuracy in voice agents.
