Why we measure success by our worst user experience

We measure success by our worst user experience, not our best.

Terrible internet connection. Background noise. Speaking quickly because they're stressed. That user needs perfect voice transcription more than anyone.

Most companies measure p50 latency: how fast their product works for the median user. That's like a restaurant saying "half our meals are edible." We measure p99, meaning that 99 out of 100 times, even in the worst conditions, Wispr Flow delivers perfect transcription in under 700 milliseconds.

Why obsess over edge cases? Because voice isn't a nice-to-have feature. When someone switches from typing to talking, they're trusting us with their thoughts. One failure breaks that trust forever.

To hit p99 at 700ms, we process 100+ tokens in 250 milliseconds. For context, that's faster than you can read this sentence. The technical challenge nearly killed us: fine-tuning Llama models, rebuilding our inference pipeline, partnering with Baseten for infrastructure that could handle our impossible latency requirements.

Most said it couldn't be done. "Just optimize for p50 like everyone else." But p50 means failing half your users. We'd rather be slower to market than wrong when it matters.

The user having their worst day deserves our best performance. That's the only metric that counts.
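For readers who want the p50-vs-p99 distinction made concrete, here is a minimal sketch of percentile-based latency tracking in Python. The 700ms budget comes from the post above; the distribution, seed, and every other number are hypothetical illustrations, not Wispr Flow telemetry.

```python
import random

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank p-th percentile (p in 0..100) of samples."""
    ordered = sorted(samples)
    return ordered[round(p / 100 * (len(ordered) - 1))]

# Hypothetical per-request latencies in milliseconds: a lognormal body
# with a long tail (bad connections, noisy audio, retries).
random.seed(0)
latencies_ms = [random.lognormvariate(5.8, 0.3) for _ in range(10_000)]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50 = {p50:.0f} ms  # the median user")
print(f"p99 = {p99:.0f} ms  # the worst 1 in 100 requests")
print("meets the 700 ms p99 budget:", p99 <= 700)
```

The gap between those two numbers is the whole argument: a p50 that looks great can coexist with a p99 that quietly fails your most stressed users.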

Latency is a vanity metric. You and I know users will gladly accept a few hundred extra milliseconds if it dramatically improves the quality of the transcript. Moreover, most people use Bluetooth headsets that add over 300 ms, which is beyond our control. With AI, quality is more important than speed.

Impressive commitment to p99 performance. Building for edge cases at sub-second latency is no small feat — it takes both engineering precision and product conviction. Kudos to the team for holding such a high bar for reliability.

Many say latency is not that important, or that it's a baseline metric. For voice, latency is a core differentiator. If you want to actually have a conversation, rather than issuing walkie-talkie commands, it has to 'feel' real time. Being able to personalize as well, at p99 of 700ms, is incredible. I've used it and felt the difference. Still not brave enough to whisper in open offices, though. Works beautifully at home. Incredible work by the entire Wispr team, Tanay Kothari.

Edge cases are the product. Reminded me of the Growth days for the M365 mobile app, where every additional 50 millisecond delay in opening a PDF led to a 1.3% drop in D1 retention. It didn't show up in internal testing or at P50/P75, and we didn't 'feel' it because we focused on premium devices while 60% of our audience was in developing nations on <$200 phones. It did show up at P90/P99, though. It quickly becomes a strategy/business question, as chasing each decimal becomes prohibitively expensive.
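(A toy illustration of that last point, assuming nothing about the real M365 data: when extra delay lands only on the slowest device segment, p50 and p75 barely move while p90 and p99 jump.)

```python
import random
from statistics import quantiles

random.seed(1)
fast = [random.gauss(300, 40) for _ in range(8_500)]   # premium devices
slow = [random.gauss(900, 200) for _ in range(1_500)]  # low-end devices

baseline = fast + slow
regressed = fast + [x + 400 for x in slow]  # delay hits only the tail

for label, data in (("baseline ", baseline), ("regressed", regressed)):
    q = quantiles(data, n=100)  # q[i] is the (i+1)th percentile cut point
    print(label, " ".join(f"p{p}={q[p - 1]:.0f}ms" for p in (50, 75, 90, 99)))
```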

You guys have built a great product. But really? When did P50 become the industry standard?

that's a solid way to look at it. nobody remembers the average experience, just the one time things break when they actually need it. most teams don't have the guts to build for edge cases because it's way harder. p99 > p50 all day. Tanay Kothari

Most “AI products” hide behind pretty UX while their tail latencies are on fire. 

I miss being able to use Wispr Flow on low-connectivity highways and wish there were a lower-accuracy on-device fallback model!

This is very commendable and impressive... you went the extra, EXTRA mile and took care to understand how human emotions impact speed, voice quality, etc. I imagine this being a valuable feature if somebody is making a distress call in this form... you cannot afford to fail there.

I have been using your tool for 2 months now and it is one of the best AI tools I have used so far; it has increased my productivity enormously. The accuracy level is about 95%, though I have seen issues like the application not kicking in when I hit the shortcut, and sometimes it didn't transcribe the text as I wanted.

p99 at 700ms for voice transcription is solid. rebuilt pipelines for tail latency before, and baseten helps a lot
