OpenAI now has an AI model with vision, and everyone else should be scared
What you need to know
- One day before Google I/O 2024, OpenAI debuted a new AI model known as GPT-4o.
- The “o” in GPT-4o stands for “omni,” referencing the model’s multimodal interaction capabilities.
- GPT-4o appears to bring the multimodal, vision-based functionality touted by companies like Humane and Rabbit to virtually any device.
- OpenAI’s latest model has the potential to displace a handful of products and services, from the Humane AI Pin to the Google Assistant to Duolingo.
This is a big week for artificial intelligence: OpenAI held an event on Monday, May 13, and Google I/O 2024 takes place on May 14 and 15. Although reports that OpenAI might be prepping a search competitor didn’t pan out, OpenAI did launch GPT-4o on Monday. The latest AI model from OpenAI is multimodal and can process combinations of vision, text, and voice input. Though it’s still early, quick tests and demos of the GPT-4o model have left both users and AI researchers impressed.
Certain characteristics of GPT-4o make it more likely to displace existing products and services than any other form of AI we’ve seen to date. The support for combinations of vision, text, and voice input takes the novelty factor away from hardware devices like the Humane AI Pin and the Rabbit R1. Voice response times that OpenAI claims are as quick as a human’s have the potential to make Google Assistant look outdated. Finally, rich translation and learning features could make apps like Duolingo redundant.
We fully expect Google to counter OpenAI’s GPT-4o at I/O 2024, and who knows, Google might debut an offering that’s as good as or better than GPT-4o. Regardless, it’s time for the rest of the tech industry to start worrying about OpenAI. Up until now, there were a handful of glaring flaws with ChatGPT and GPT-4 that made them fairly easy to dismiss. Some of these still exist, but OpenAI is crossing many off the list.
Mobile AI is coming fast, and soon, features that were once exclusive to specialized hardware and software will be available right on your smartphone. That has massive implications for the tech industry, and it could displace more than just the Humane AI Pin and the Rabbit R1.
GPT-4o could rival Google Assistant, but Google has something up its sleeve
Google has been slowly trying to move away from the Google Assistant on its own terms, preferring instead to use an AI-based voice assistant like Gemini. However, with GPT-4o, Google might not have that luxury. By all accounts, GPT-4o is the most powerful AI interface we’ve seen at the consumer level to date. It can answer questions about your surroundings with vision, talk to you with audio, and respond to text.
The slow response times of AI-based voice assistants appear to be a thing of the past with the release of GPT-4o. OpenAI says that the model can deliver responses to questions asked with voice in as little as 232 milliseconds. The average response time is 320 milliseconds, according to OpenAI. The kicker? That’s about the same amount of time it would take a human to respond in a real-world conversation.
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
May 13, 2024
To be fair, we won’t get to see GPT-4o using voice and video in ChatGPT for real-world tests for a few weeks. However, OpenAI’s demos for the feature were impressive. If real-world performance is even close to what OpenAI has demonstrated, it could be far better than Google Assistant today.
At the time of writing, Google has shown off a similar multimodal AI tool in a teaser for I/O 2024, but that demo was pre-recorded. We won’t know for sure what Google might have up its sleeve until the keynote happens.
One more day until #GoogleIO! We’re feeling 🤩. See you tomorrow for the latest news about AI, Search and more. pic.twitter.com/QiS1G8GBf9
May 13, 2024
Still, it feels like OpenAI and ChatGPT are once again a few steps ahead of Google. The company scheduled its event for GPT-4o one day before Google I/O 2024, and that wasn’t by accident. Now, the pressure is on Google to give us a reason why we shouldn’t use ChatGPT instead of Gemini and Google Assistant.
OpenAI exposes dedicated AI hardware yet again
Of course, GPT-4o is also bad news for the creators of dedicated AI hardware devices. All the features that were deemed exclusive to new hardware, like the Rabbit R1 and the Humane AI Pin, will eventually be available straight on your phone through ChatGPT. The exception to this is Rabbit’s Large Action Model (LAM), but until the Rabbit R1 can reliably and quickly perform actions using the LAM, it’s hardly relevant. Instead of paying $200 for the Rabbit R1 or $700 for the Humane AI Pin (plus $24/month), you can get the same functionality for free on your phone.
all ai pins internally + a small number externally are now running gpt-4o! still early, but so far lots of great improvements:
- 14% decrease in latency
- 28% shorter answers
- 33% fewer bad answers
beyond the numbers, everything just feels smarter and more accurate (as expected) pic.twitter.com/H3Y6MGsOc0
May 14, 2024
A product design lead for Humane said on Monday that all internal AI pins were already running GPT-4o. However, that brings a whole new suite of issues. If the entire AI Pin software framework could be shifted to a new AI model in just a few hours, that would suggest it is heavily reliant on OpenAI software. In other words, today’s developments give serious weight to the claims that the Humane AI Pin is essentially using a “wrapper” for ChatGPT.
So, if you can get the functionality of a new AI hardware product on your phone, why wouldn’t you? And if these companies are just retooling OpenAI’s models, why not go straight to the source? The release of GPT-4o puts AI hardware companies, which were already in danger, in even more peril.
No one is safe as OpenAI plows through AI development
Really, no one is safe from being displaced by OpenAI’s dominance. The perfect example of this is Duolingo, which saw its stock drop by over seven percent almost immediately following OpenAI’s event Monday. GPT-4o has impressive translation and learning capabilities, which appear to have spooked investors. When you think of OpenAI’s competitors, Duolingo probably isn’t the first to come to mind, and yet it was seriously affected by the GPT-4o announcement.
It goes to show that OpenAI can displace anyone, and no hardware or software product is safe. Who knows, the Google Assistant could be next. It’s now up to Google to show us why it’s still a force to be reckoned with in AI at I/O 2024.