What is OpenAI Whisper?
Whisper is an automatic speech recognition (ASR) system that is making waves in speech processing.
Trained on 680,000 hours of multilingual, multitask data collected from the web, it is remarkably robust to accents, background noise, and technical jargon.
Beyond transcribing speech in many languages, it can also translate speech from a wide range of languages into English. OpenAI has released the models and inference code to serve as a foundation for practical applications and to fuel further research on speech processing.
How does OpenAI Whisper work?
Painting the Whisper Masterpiece
Whisper’s architecture is uncomplicated yet effective, embracing the familiar encoder-decoder Transformer design. It all starts with an audio clip, divided into 30-second chunks, each of which is converted into a log-Mel spectrogram. The encoder processes these chunks while the decoder is trained to predict the corresponding text caption. The model is guided by special tokens that direct it to perform tasks such as identifying the language, producing phrase-level timestamps, transcribing multilingual speech, and translating speech into English.
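The log-Mel front end described above can be sketched in plain NumPy. The parameters below (16 kHz audio, 25 ms windows, 10 ms hop, 80 Mel bins) are the ones reported for Whisper; the hand-rolled filterbank is illustrative, not the exact implementation from the release.

```python
# Minimal sketch of a log-Mel spectrogram front end, using Whisper's reported
# parameters: 16 kHz audio, 400-sample (25 ms) windows, 160-sample (10 ms) hop,
# 80 Mel bins. Illustrative only, not the official implementation.
import numpy as np

SR, N_FFT, HOP, N_MELS = 16_000, 400, 160, 80

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    # Triangular filters spaced evenly on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        if center > left:
            fb[i, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fb[i, center:right] = (right - np.arange(center, right)) / (right - center)
    return fb

def log_mel_spectrogram(audio):
    # Short-time Fourier transform with a Hann window, then Mel projection.
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP : i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank().T
    return np.log10(np.maximum(mel, 1e-10))

# A 30-second chunk at 16 kHz yields roughly 3,000 frames of 80 Mel bins each.
clip = np.random.default_rng(0).standard_normal(30 * SR)
spec = log_mel_spectrogram(clip)
```

Each row of `spec` is one 10 ms frame; the encoder consumes this 2-D representation rather than the raw waveform.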
The Training Method Behind the Maestro
Contrary to many existing approaches, Whisper is not trained on smaller, closely paired audio-text datasets, nor does it follow the trend of unsupervised audio pretraining. Because it was trained on a large and diverse dataset without fine-tuning to any specific one, Whisper doesn’t top the charts on LibriSpeech, a renowned speech recognition benchmark. Nevertheless, its adaptability shines when it’s thrown into the wild: measured across many diverse datasets, it makes roughly half as many errors as conventional models.
Whisper’s Melting Pot of Languages
About one-third of Whisper’s training diet consists of non-English audio, providing it with a rich multilingual banquet. It is alternately tasked with transcribing in the original language or translating into English. This approach has proven especially potent for learning speech-to-text translation, outstripping the supervised state of the art on CoVoST2 to-English translation, and managing to achieve this without any task-specific training.
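The "transcribe vs. translate" switch described above is expressed through the decoder’s special-token prompt. The token names below match those in the open-source release (`<|startoftranscript|>`, a language tag, a task tag, and `<|notimestamps|>`); treat this as a sketch of the prompt layout, not the actual tokenizer.

```python
# Sketch of the special-token prompt that steers Whisper's decoder.
# Token names follow the open-source release; this is illustrative only.
def build_prompt(language: str = "fr", task: str = "transcribe",
                 timestamps: bool = False) -> list:
    """Assemble the conditioning tokens for one 30-second chunk."""
    assert task in ("transcribe", "translate")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens

# The same French audio, transcribed in French vs. translated into English:
same_language = build_prompt("fr", "transcribe")
to_english = build_prompt("fr", "translate")
```

Swapping a single task token is all that separates multilingual transcription from to-English translation; the model weights are shared across both tasks.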
How do I Use OpenAI’s Whisper?
Currently, you can use Whisper through OpenAI’s API, or run the open-source model yourself.
See OpenAI’s Whisper documentation to learn how to use the API.
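As a quick sketch, calling the hosted Whisper model from Python looks like the following. This assumes the `openai` Python package (v1+) and an `OPENAI_API_KEY` environment variable; `"meeting.mp3"` is a placeholder filename, and `whisper-1` is the API’s Whisper model identifier.

```python
# Hedged sketch: transcribing a local audio file via OpenAI's hosted Whisper API.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set;
# "meeting.mp3" is a placeholder filename.
import os

def transcribe(path: str) -> str:
    # Imported lazily so the sketch loads even if `openai` isn't installed.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text

# Only attempt a real call when a key and an audio file are actually present.
if os.environ.get("OPENAI_API_KEY") and os.path.exists("meeting.mp3"):
    print(transcribe("meeting.mp3"))
```

The API accepts common audio formats (mp3, wav, and others) and returns the transcript as plain text on the response object.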
How Much is Whisper OpenAI?
Whisper’s API costs $0.006 per minute of audio transcribed, i.e. 6 cents per 10 minutes.
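For back-of-the-envelope budgeting, the rate above works out as follows (the helper name is ours, not part of any API):

```python
# Quick cost arithmetic: 6 cents per 10 minutes is $0.006 per minute of audio.
def whisper_cost_usd(minutes: float, rate_per_minute: float = 0.006) -> float:
    """Estimated API cost in USD for a given duration of audio."""
    return minutes * rate_per_minute

hourly = round(whisper_cost_usd(60), 2)   # one hour of audio -> $0.36
```

So an hour of audio costs about 36 cents at the stated rate.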
Is Whisper OpenAI Free?
The hosted API is not free; it costs 6 cents per 10 minutes transcribed. However, since Whisper is open source, you can run the model yourself at no charge beyond your own compute.
Is OpenAI Whisper Open Source?
Yes, Whisper is open source!
Did Whisper Get Shut Down?
No, it is still available via OpenAI’s API.
Can Whisper AI do Text to Speech?
No. Whisper only converts speech to text; it cannot generate speech from text.