Embracing Voice as the New Keyboard

Have you ever considered using your voice to “write” blog posts instead of typing them out? Recently, I had a thought—an epiphany, if you will. What if we recorded our spoken thoughts and then transcribed them into text? It could potentially mirror our thought process more accurately.

Think about this: thoughts are constantly popping into our heads, and our smartphones are always within reach. The moment an idea strikes, we could just pull out our phone and record it. This is much simpler than typing it out. By the time you finish typing, a quick notion can balloon into an arduous task of translating thoughts into words. Not to mention, you often lose momentum, telling yourself, “I’ll jot it down later,” or “I’ll write this in my notes.” Yet between capturing that thought and turning it into a blog post, time passes. The idea gets stale, and, surprisingly, the notes you took initially aren’t as comprehensive as the original spark. Details fade. Worse, you lose that stream-of-consciousness vibe that makes thoughts so vivid.

Thus, you’re left with a dilemma: would shifting to a voice-first approach actually make us better writers? With text, you have a structured process but risk drifting away from the immediacy and clarity of your initial idea. Voice capture could let you record those inspirations instantly, retaining that raw, unfiltered creativity.

It’s a compelling concept to toy with, especially for those who appreciate optimizing processes.

Real person notes

The text above was produced exactly as described, using the voice-recording method. I have three main observations.

It’s incredibly easy

I’m really surprised by how well OpenAI Whisper transcribes, even when I don’t speak clearly and there are random pauses, half-sentences, etc. I did only one recording for the text above, and that was it. I didn’t have to actively try to be clear or anything; I just hit record and spoke naturally.

I can’t get the writing to feel natural

I think we’re all used to “ChatGPT writing” by now. No matter how many prompts I tried, I couldn’t get it to match my tone of voice. The example above is my favorite so far, but it still feels very robotic. So I probably wouldn’t use something like this for my personal writing, but maybe I could use it for something more commercial?

It’s super cheap

If you work with the OpenAI API, you already know this, but I think this post, with one Whisper call plus 6-7 ChatGPT calls, ended up costing about $0.05. You could also easily run Whisper plus Llama or Gemma locally to bring that down to $0.00.
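To see how a few cents adds up, here’s a rough estimator. It’s a minimal sketch under assumptions: the prices are placeholders (Whisper at $0.006 per audio minute, GPT-4o at $2.50 per million input tokens and $10 per million output tokens) that you should check against OpenAI’s current pricing page. The token counts are guesses. For real numbers, read `completion.usage.prompt_tokens` and `completion.usage.completion_tokens` from each response.

```python
# Placeholder prices in USD (ASSUMPTIONS: check OpenAI's current pricing page).
WHISPER_USD_PER_MINUTE = 0.006
GPT4O_USD_PER_M_INPUT = 2.50
GPT4O_USD_PER_M_OUTPUT = 10.00


def estimate_cost(
    audio_minutes: float,
    chat_calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    whisper_per_minute: float = WHISPER_USD_PER_MINUTE,
    input_per_million: float = GPT4O_USD_PER_M_INPUT,
    output_per_million: float = GPT4O_USD_PER_M_OUTPUT,
) -> float:
    """Rough cost of one transcription plus several chat completions."""
    # Whisper is billed per minute of audio.
    transcription = audio_minutes * whisper_per_minute
    # Chat models are billed per token, with prices quoted per million tokens.
    per_call = (
        input_tokens_per_call * input_per_million
        + output_tokens_per_call * output_per_million
    ) / 1_000_000
    return transcription + chat_calls * per_call


# Hypothetical run: a 2-minute recording and 6 prompt attempts, each sending
# ~500 tokens (system prompt + transcript) and getting ~400 tokens back.
print(f"${estimate_cost(2, 6, 500, 400):.4f}")
```

With these guesses it comes out to about $0.0435. That is the same few-cents range as the figure above, and most of it is the repeated chat calls rather than the single transcription.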

It’s super simple, but if you’re interested, here’s the Python code using the OpenAI API.

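from openai import OpenAI

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Step 1: transcribe the voice recording with Whisper.
# The "with" block makes sure the audio file is closed afterwards.
with open("test.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
    )

speech = transcript.text

# Step 2: ask GPT-4o to turn the raw transcript into a blog post.
SYSTEM_PROMPT = (
    "You are a helpful assistant who writes blog posts based on given transcripts. "
    "I will give you my voice recording transcript, I want you to read it and turn it "
    "into a blog post in English. Don't use a magazine-like language, use a blog post "
    "language. Use markdown. You can add implied ideas into the text by expanding on "
    "them, but don't add completely new stuff."
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": speech},
    ],
)

# The generated post (in Markdown) is in the first choice.
print(completion.choices[0].message.content)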
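For the free, local route mentioned earlier (Whisper plus Llama or Gemma), here’s a minimal sketch of the same two-step pipeline. It assumes the open-source `openai-whisper` package (which needs ffmpeg) and the `ollama` Python client talking to a locally running Ollama server with a model already pulled. The model names `base` and `llama3` are example choices, not a recommendation, and output quality will depend on the model you pick.

```python
# Offline sketch of the same pipeline. ASSUMPTIONS: `pip install openai-whisper
# ollama`, ffmpeg on PATH, a local Ollama server running, and a model pulled
# with e.g. `ollama pull llama3` (a Gemma model works the same way).

# Same instructions as the OpenAI version, reused verbatim.
SYSTEM_PROMPT = (
    "You are a helpful assistant who writes blog posts based on given transcripts. "
    "I will give you my voice recording transcript, I want you to read it and turn it "
    "into a blog post in English. Don't use a magazine-like language, use a blog post "
    "language. Use markdown. You can add implied ideas into the text by expanding on "
    "them, but don't add completely new stuff."
)


def build_messages(transcript: str) -> list[dict[str, str]]:
    """Pair the system prompt with the transcript (same shape as the OpenAI call)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": transcript},
    ]


def transcribe_locally(path: str, model_size: str = "base") -> str:
    """Transcribe an audio file on this machine with open-source Whisper."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_size)
    return model.transcribe(path)["text"]


def draft_post_locally(transcript: str, model: str = "llama3") -> str:
    """Turn a transcript into a blog post with a model served by local Ollama."""
    import ollama  # imported lazily; needs the Ollama server to be running

    response = ollama.chat(model=model, messages=build_messages(transcript))
    return response["message"]["content"]


# Usage (same example file name as the OpenAI version):
# print(draft_post_locally(transcribe_locally("test.mp3")))
```

The lazy imports keep the two heavy dependencies optional. To try Gemma instead of Llama, you would only change the `model` argument to a Gemma tag you have pulled.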

Published: 2024-09-15