I don't normally write predictive articles. However, I spent most of the last 3 years telling people that taste was all that would matter in the future thanks to new AI advances (and, in mid-2022, got weird looks for it). This is now standard discourse on social media.

I want to put down a similar prediction for posterity.

Current LLM chat UI/UX is byzantine. Additionally, nobody has nailed a voice mode yet. Typing out content to the model feels like you're doing homework, and listening to the model's speech response feels like sitting in timeout.

If you've ever used visual voicemail on your phone, you'll know how nice it is to read a message instead of listening to it. In fact, I'd be willing to bet that you seldom actually play the audio if the transcription seems accurate.

Humans generally speak faster than we type and read faster than we listen. I predict that the optimal future UX for AI models (LLM or otherwise) will be audio in, text out. For those who really want them, interfaces should offer a toggle to turn audio (speech) responses back on.

This format solves the major pain point of interacting with AI models: the feeling that you have to slow down in order to do your best work.