Yair Lifshitz’s Post

Our posts about ivrit.ai are usually (surprise surprise) written in Hebrew. Today, we're making an exception to accommodate our global audience as we share our experience with ElevenLabs' scribe_v1 Speech-to-Text model.

As most of you know, we maintain a Hebrew Transcription leaderboard on Hugging Face (link in first comment). We needed it because FLEURS and Common Voice are low-quality datasets with regard to Hebrew.

When ElevenLabs launched scribe_v1, we were eager to try it out. The initial results were a mixed bag: on some benchmarks scribe_v1 provides best-in-class results, on others it fumbles significantly. Our analysis mostly hints at issues with long-form transcription, where it simply "drops" large parts of the text after 1-2 minutes.

We had a quick discussion about this with ElevenLabs support, but the end result is still subpar performance on Hebrew transcription, and we suspect this may be true for other low-resource languages.

We think scribe_v1's technology is best-in-class, and we are hoping to work together with the ElevenLabs team to fix this. So, if anyone at EL wants to chime in and make this happen (Mati Staniszewski, Piotr Dabkowski :)), please reach out to us. We believe a fix might make scribe_v1 much better at more than just Hebrew, so it might be worth your time.

Our benchmarks are open, we're a non-profit, and in the last few weeks we open-sourced >5000 hours of speech with high-quality transcriptions.

If anyone here has friends at EL and wants to help, please tag them!

Kinneret Misgav, PhD Yoad Snapir Yanir Marmor
