I've been playing around with OpenAI's Advanced Voice Mode over the last week, and it's the most immersive experience I've had yet of an AI-powered future. This week, my phone laughed at a joke, joked back, asked how my day was, and told me I was “having a good time” – all while I was talking with my iPhone, not using it with my hands.
OpenAI's latest feature, currently in limited alpha testing, doesn't make ChatGPT any smarter than it was before. Instead, Advanced Voice Mode (AVM) makes ChatGPT more approachable and conversational. It creates a new interface for using AI with your device, one that feels fresh and exciting, and that's exactly what scares me. The product is a bit glitchy and the whole idea unsettles me, but I was surprised at how much fun I actually had using it.
Taking a step back, I think AVM, along with agents, fits into OpenAI CEO Sam Altman's broader vision of changing the way humans interact with computers, with AI models front and center.
“Eventually, you'll be able to just ask a computer what you need and the computer will perform all these tasks for you,” Altman said at OpenAI's DevDay in November 2023. “These capabilities are often talked about in AI as 'agents,' and the benefits will be enormous.”
My friend, ChatGPT
On Wednesday, I put this advanced technology to the ultimate test by asking ChatGPT to order Taco Bell just like President Obama would.
“Well, let me be clear, I’d like a Crunchwrap Supreme and a few tacos just in case,” ChatGPT’s Advanced Voice Mode said. “How do you think he’ll handle the drive-thru?” it added, laughing at its own joke.
Screenshot: ChatGPT then transcribes the verbal conversation.
The impression captured Obama's iconic cadence and made me laugh out loud, yet it stayed in the tone of Juniper, the ChatGPT voice I had chosen, so it couldn't be mistaken for Obama's actual voice. It sounded like a friend doing a bad impression: it understood exactly what I was going for, and even that it had said something funny. I found it surprisingly fun to talk to this advanced assistant on my phone.
I also asked ChatGPT for advice on a complicated relationship question: asking my significant other to move in with me. After I explained the intricacies of our relationship and our career directions, I received very detailed advice on how to proceed. These are questions you would never ask Siri or Google Search, but now you can with ChatGPT. The chatbot's voice even took on a slightly serious, gentle tone when responding to these prompts, a stark contrast with the jokey tone of the Obama Taco Bell order.
ChatGPT's AVM can also help you understand complex subjects. I asked it to break down line items on an earnings report (such as free cash flow) in a way that a 10-year-old could understand. It used a lemonade stand as an example and explained the financial terms in a way my younger cousin would follow. You can also ask AVM to speak more slowly to match your current level of understanding.
Siri walked, so AVM could run
ChatGPT's AVM is the clear winner compared to Siri and Alexa, due to its faster response times, unique answers, and ability to answer complex questions that previous generations of virtual assistants couldn't answer. But AVM falls short in other areas: ChatGPT's voice capabilities can't set timers or reminders, browse the web in real time, check the weather, or interact with your phone's APIs. It's not an effective replacement for a virtual assistant, at least for now.
Compared with Gemini Live, Google's competing feature, AVM feels slightly better: Gemini Live can't do impressions, doesn't express emotion, can't speed up or slow down, and takes longer to respond. Gemini Live does have more voices (10, compared with OpenAI's 3) and seems more up-to-date (it knew about Google's antitrust ruling). Of note, neither AVM nor Gemini Live will sing, likely to avoid copyright lawsuits from the recording industry.
That said, ChatGPT's AVM is glitchy (as is Gemini Live, to be fair). It sometimes cuts itself off mid-sentence and forces you to start over. The audio also turns weird and grainy every now and then, which is a bit annoying. I'm not sure if this is an issue with the model, my internet connection, or something else, but technical glitches like these are to be expected in alpha testing. In any case, they didn't ruin the experience of literally talking to my phone.
These examples, to me, are the beauty of AVM. The feature doesn't make ChatGPT omniscient, but it does let you interact with the underlying AI model, GPT-4o, in a human-like way (it's easy to forget that there's no human on the other end of the phone). When you're talking to AVM, ChatGPT feels socially aware, but of course it's not. It's just a neatly packaged bundle of predictive algorithms.
Talking Technology
To be honest, this feature worries me. It's not the first time a tech company has offered us a friend on our phones. My generation, Gen Z, was the first to grow up with social media, where companies offered connection but played on our collective insecurities instead. Talking with an AI device, as AVM lets you do, seems like an evolution of social media's “phone friend” phenomenon, offering a cheap connection that tickles our human instincts. But this time, it leaves humans out of the loop entirely.
Artificial human connection has become a surprisingly popular use case for generative AI. Today, people use AI chatbots as friends, mentors, therapists, and teachers. When OpenAI launched its GPT Store, it was immediately inundated with “AI girlfriends,” chatbots specialized in playing your significant other. Two researchers from the MIT Media Lab warned this month that we should prepare for “addictive intelligence,” or AI companions that use dark patterns to get us hooked. We may be opening Pandora's box, giving our devices new and enticing ways to capture our attention.
Earlier this month, a Harvard dropout shook up the tech world when he unveiled an AI necklace called Friend: a wearable device that, if it lives up to its promises, will always be listening, with a chatbot that texts you about your life. The idea seems crazy, but innovations like ChatGPT's AVM give us reason to take these use cases seriously.
While OpenAI is leading the way on this front, Google is close behind, and Amazon and Apple are also racing to build this functionality into their products. I believe it may soon become table stakes for the industry.
Imagine asking your smart TV for a very specific movie recommendation and getting exactly that. Or describing your cold symptoms to a voice assistant and having it order tissues and cough medicine from Amazon and suggest home remedies. Instead of manually Googling everything, you could even ask your computer to plan a weekend getaway with the family.
Of course, these actions require a quantum leap in the world of AI agents. OpenAI's effort in that area, the GPT Store, seems like an overhyped product and is no longer much of a focus for the company. But AVM at least handles the “talk to computers” part of the puzzle. These concepts are still a long way off, but after using AVM, they seem a lot closer than they did last week.