When AI went mainstream, it was mainly because of ChatGPT. A chatbot created to help with everyday tasks, ChatGPT went viral in late 2022 as millions of users embraced the groundbreaking technology. Almost two years on, it has become both a celebrated and a criticised AI chatbot.
What makes it stand out? Three things. One, ChatGPT is easy to use. Anyone who knows how to use a computer or a phone can head over to OpenAI’s website or download the app, type in a prompt, and, just like that, generate ideas or text with the chatbot.
The second thing is that when OpenAI, the company behind ChatGPT, released the chatbot to the public, it made it free for everyone. That meant users could prompt the AI at no cost and accomplish tasks that would otherwise have demanded a great deal of time and effort.
The third thing that made ChatGPT stand out is how closely its output replicates human writing. The results that came from user prompts were almost indistinguishable from what a human could produce, a thought as marvellous as it was uncanny.
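For anyone who would rather script this than use the website or app, the same text generation is exposed through OpenAI’s API. A minimal sketch, assuming the official openai Python package is installed and an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Send a single prompt, much like typing into the ChatGPT web interface.
response = client.chat.completions.create(
    model="gpt-4o",  # or "gpt-3.5-turbo", the model family behind the 2022 launch
    messages=[{"role": "user", "content": "Give me three ideas for a blog post about AI."}],
)

print(response.choices[0].message.content)
```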
Keeping in mind that the model powering the November 2022 release was only GPT-3.5, many wondered what more the AI could do as development continued. We now have the answer, thanks to the release of GPT-4o on May 13th, 2024.
ChatGPT-4o is unlike anything we have witnessed before in an AI chatbot. An all-in-one large language model, GPT-4o cuts across audio, image, text and video, allowing users to interact with the chatbot in real time, much as they would with a human.
But before we delve into the mind-blowing new features of ChatGPT-4o, let us take a look at the previous versions of ChatGPT.
Versions of ChatGPT
There have been several versions of the Generative Pre-trained Transformer (GPT), namely:
- GPT-1: Released in 2018, GPT-1 was the first version of the model. It was trained on a large corpus of books using unsupervised learning and had about 117 million parameters.
- GPT-2: Released in 2019, GPT-2 was a major improvement on its predecessor. It had about 1.5 billion parameters, which made its generated text more coherent and its generation faster.
- GPT-3: Released in 2020 and built with 175 billion parameters, GPT-3 had far more advanced text-generation capabilities. It was with this version that users began to embrace the technology in different areas of their lives, including writing articles and drafting emails.
- GPT-3.5: A refined version of GPT-3 released in 2022, GPT-3.5 was the model behind the ChatGPT launch that attracted millions of users worldwide. It was fine-tuned with both supervised and reinforcement learning techniques and handles everyday tasks, chief among them text generation.
- GPT-4: Released in 2023, GPT-4 is a multimodal large language model that accepts both text and image inputs and produces text output. It has broader knowledge and more advanced reasoning capabilities than previous versions, making it possible to solve more complex problems. It is available in the OpenAI API to paying customers.
- GPT-4 Turbo: Announced in late 2023, with a vision-capable version made generally available in April 2024, GPT-4 Turbo is a faster and cheaper variant of GPT-4 that can analyse images as well as text.
- GPT-4o: Released in May 2024, GPT-4o is the latest version. The “o” in GPT-4o stands for “omni”, meaning “all”. This version is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image and video and generates any combination of text, audio and image outputs (a minimal API sketch follows this list).
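As promised above, here is a minimal sketch of GPT-4o’s mixed text-and-image input through the chat completions API, assuming the openai Python package and an API key; the image URL is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# GPT-4o accepts mixed content: here, a text question plus an image URL.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this picture."},
            # Hypothetical placeholder URL; any publicly reachable image works.
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)

print(response.choices[0].message.content)
```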
New features of ChatGPT-4o
With the capability to take multimodal input and generate multimodal output, ChatGPT-4o pushes the limits of what an AI chatbot can do. GPT-4o also has reduced latency, that is, the delay between a user’s prompt and the response they get.
For GPT-4o’s voice mode, this latency averages about 0.32 seconds, close to the average human response time in conversation of roughly 0.21 seconds. Since GPT-4o includes audio input and output, the reduced latency makes it remarkably easy to hold a one-on-one conversation with the AI.
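The 0.32-second figure refers to GPT-4o’s native voice mode, which was not exposed through the public API at launch. As a rough illustration of latency measurement, here is a sketch that times the first streamed token of a text completion, assuming the openai Python package; actual numbers will vary with network and load:

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    stream=True,  # tokens arrive incrementally instead of all at once
)

first_token = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token is None:
            first_token = time.perf_counter()  # moment the first visible token lands
        print(chunk.choices[0].delta.content, end="", flush=True)

print(f"\nTime to first token: {first_token - start:.2f}s")
```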
That’s not all. GPT-4o also captures one aspect of conversation that makes a human human: tone of voice. With the new version of the chatbot, users can receive an empathetic, sarcastic or jovial response from the AI, depending on their preference, making the conversation feel as human-like as possible. This extends as far as the chatbot soulfully singing or dramatically reading a story as prompted.
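GPT-4o’s expressive voice lives in the ChatGPT app and was not available as a single API call at launch. As an approximation, one could pair a chat completion with OpenAI’s separate text-to-speech endpoint; a sketch, assuming the openai Python package (note that the “dramatic” instruction steers the wording only, not the synthesised voice itself):

```python
from openai import OpenAI

client = OpenAI()

# Ask the model for a reply written in a particular tone.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer in a warm, dramatic storyteller's voice."},
        {"role": "user", "content": "Tell me a two-sentence bedtime story about a robot."},
    ],
).choices[0].message.content

# Synthesise the reply with the standalone text-to-speech model.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
with open("story.mp3", "wb") as f:
    f.write(speech.content)
```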
The ability to capture and interpret video also means the chatbot can describe an environment as it sees it in real time. In an OpenAI demonstration of the feature, a user places two chatbots in the same room and asks one, which has been permitted to view the room, to describe what it can see. The chatbot describes the room accurately and even notes an interruption when another person enters and does things in the background.
These are just some of the mind-blowing features of the new ChatGPT-4o which, at launch, rolled out first to paying users. However, OpenAI has promised to make them available to free users over time, which raises the question: with such features at our fingertips, what will we be capable of achieving?
Applications of ChatGPT-4o
In the YouTube video introducing the new version to the world, OpenAI walks through several life-changing applications of ChatGPT-4o, including:
1. Real-time translation
GPT-4o’s reduced latency means that, because responses arrive so quickly, a user can ask for a live translation of a conversation as it happens. In OpenAI’s introductory video, two people, one speaking Italian and the other English, ask the AI to translate their conversation in real time.
The chatbot does exactly that. When one person speaks Italian, it relays what they said in English, matching their tone of voice and without missing a beat, and it translates the English replies back into Italian in the same way.
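The ChatGPT app does this with live audio; through the API, the same interpreter behaviour can be approximated for text with a system prompt. A minimal sketch, assuming the openai Python package (the prompt wording is my own, not OpenAI’s):

```python
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a live interpreter between English and Italian. "
    "When you receive English, reply only with the Italian translation; "
    "when you receive Italian, reply only with the English translation."
)

def interpret(utterance: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

print(interpret("Hello, how was your trip?"))  # expect an Italian rendering
print(interpret("Benissimo, grazie!"))         # expect an English rendering
```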
2. Real-time tutoring
Because of the video and audio integration, the chatbot can access what the user sees in real time. For students struggling with their assignments, this means the AI can teach them the material much as a real-life teacher would.
In demonstration videos shared online, users showed the AI tutoring them through complex mathematical problems. In the videos, the AI is patient and understanding with the student, employing an empathetic voice that mirrors a kind teacher’s. By the end of each demonstration, the student seems to understand the concept noticeably better.
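The demos use voice and video, which the API did not expose at launch; the tutoring behaviour itself, though, is largely prompt-driven and can be sketched in text. An illustrative example, assuming the openai Python package (the system prompt is my own wording):

```python
from openai import OpenAI

client = OpenAI()

# Keeping the conversation history lets the tutor build on earlier answers.
history = [{
    "role": "system",
    "content": (
        "You are a patient, encouraging math tutor. Work one small step at a "
        "time, ask the student a question before moving on, and never give "
        "the final answer outright."
    ),
}]

def tutor(student_message: str) -> str:
    history.append({"role": "user", "content": student_message})
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(tutor("I don't understand how to solve 3x + 5 = 20."))
```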
3. Vision assistance for visually impaired persons
With access to a camera feed and its audio features, ChatGPT-4o can act as artificial eyes for a visually impaired person. All the person needs to do is ask the chatbot what it sees in real time, and the chatbot, taking in the live camera feed, relates what it sees back to them.
This is demonstrated further in a video shared by OpenAI.
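Through the API, a single camera frame can be sent as a base64-encoded image for description. A sketch, assuming the openai Python package and a frame already saved at the hypothetical path frame.jpg:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical path: a single frame grabbed from the device camera.
with open("frame.jpg", "rb") as f:
    b64_frame = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Briefly describe what is in front of me."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64_frame}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```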
4. Real-time coding assistant
ChatGPT-4o can also interpret what a piece of code does when prompted with a camera feed or a screenshot of it. This makes a programmer’s work easier, as the AI can point out errors they might have overlooked.
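In the demos the model reads code off the screen; via the API, the same kind of review can be done by pasting the code into the prompt. A minimal sketch, assuming the openai Python package (the buggy snippet is a made-up example):

```python
from openai import OpenAI

client = OpenAI()

# A deliberately buggy example: crashes when the list is empty.
snippet = """
def average(numbers):
    return sum(numbers) / len(numbers)
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Point out any bugs or edge cases in this function:\n" + snippet,
    }],
)

print(response.choices[0].message.content)
```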
What does this mean?
If you’re like me, you are probably wondering what this means for the future of humanity. If AI can teach a student better than a human teacher, what does this mean for the teacher? If AI can translate conversations in real time, what does this mean for people who have built careers as translators? If AI can do what we can do, what does this mean for us?
With ChatGPT-4o, OpenAI has brought the fear of the future closer to reality. And keeping in mind that this is just the fourth generation, and that another version is probably being developed as we speak, one can only wonder what more to expect.