The Lesser-Known OpenAI Gem

Oct 4, 2023

It’s called Whisper, a speech-to-text tool.

On desktop, the only way I know of to try it out is through OpenAI’s playground, in “complete” mode (which as of now is marked “legacy”). Once there, you will see a green microphone icon in the top right corner that lets you test it.

Three months ago, I stumbled upon this speech-to-text tool, and I’m still surprised by the quality of the results. It picks up what I say even if I switch languages in the middle of a sentence.

The killer component: this can be used in combination with ChatGPT to fill in the required context in an extremely quick way.

My algorithm goes like this:

1. I press a key combination in my system that starts recording audio.
2. I talk about the problem at hand (sometimes dedicating a few minutes to set the context).
3. I finish the audio recording; it gets sent to OpenAI’s servers, and I get the text back.
4. The generated text is copied to my clipboard.
5. I paste it into ChatGPT’s input box, along with any code, external context, or now even images, to generate a more accurate response.
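
For anyone who wants to wire up something similar, here is a minimal Python sketch of the middle of that flow: take an audio file that has already been recorded, send it to OpenAI’s hosted Whisper model, and put the text on the clipboard. It is not my exact setup. It assumes the `openai` Python package (v1 or later) with an `OPENAI_API_KEY` in the environment, and the third-party `pyperclip` package for clipboard access. The function names are made up for illustration, and the hotkey and recording steps are left to whatever tool your system provides.

```python
from pathlib import Path
from typing import Callable, Union


def transcribe_with_openai(audio_path: Path) -> str:
    """Send an audio file to OpenAI's hosted Whisper model and return the text."""
    # Imported lazily so the rest of the sketch loads without the package.
    # Assumption: `pip install openai` (v1+) and OPENAI_API_KEY is set.
    from openai import OpenAI

    client = OpenAI()
    with audio_path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return result.text


def copy_with_pyperclip(text: str) -> None:
    """Put text on the system clipboard."""
    # Assumption: `pip install pyperclip`; any clipboard tool would do.
    import pyperclip

    pyperclip.copy(text)


def dictate_to_clipboard(
    audio_path: Union[str, Path],
    transcribe: Callable[[Path], str] = transcribe_with_openai,
    copy: Callable[[str], None] = copy_with_pyperclip,
) -> str:
    """Transcribe a finished recording and copy the result to the clipboard."""
    text = transcribe(Path(audio_path)).strip()
    copy(text)
    return text
```

Calling `dictate_to_clipboard("recording.wav")` after the recording stops (the file name is just a placeholder) leaves the transcript on the clipboard, ready to paste into ChatGPT. The transcriber and clipboard functions are passed in as parameters so either one can be swapped out, for example for a locally run Whisper model.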

The results so far have been amazing, in particular because ChatGPT is so good at picking up the nuance of the situation. Even if the transcription misses a few words, the text is still good enough for ChatGPT to pick up the context.

The OpenAI app already has this built-in, but in general I’m extremely surprised to see the low adoption of this incredible tool.

In particular, the built-in speech-to-text on both Android and iPhone is absurdly bad compared to Whisper, and mobile is where I would expect this to be used the most. Someone even called out that text editing on mobile isn’t OK.