AI race: Google Hits Hard With Its Gemini Model!

After months of waiting, Google announced the launch of Gemini, its new artificial intelligence (AI) model, on December 6, 2023. It is a milestone for the American firm, which now offers a powerful multimodal AI that could well change the way we use and think about this technology.

Intended to compete with the famous ChatGPT, this large language model (LLM) is much more advanced than Bard, Google’s previous AI. It was presented to the press by Eli Collins (vice-president of Google DeepMind) as being their “most consistent, most talented and also most general AI model.” In short, it is a big promise, but we will have to wait a little to verify it.

It is not yet available in France. Offered only in English for the moment, it has been deployed in 170 countries, but those of the European Union are excluded. No specific availability date has been announced. We only know that it should launch in the European Union in 2024 and that, in the meantime, Bard is due to gain additional modalities at the beginning of next year.

That should not stop us from taking an interest in this AI and presenting it to you in detail.

Gemini, What Is It Actually?

Gemini is Google’s more sophisticated answer to ChatGPT: a new high-performance, multimodal language model. This means that it has not been trained solely on text and can process and respond to different types of information, such as:

Audio
Code
Video
Text
Images

It can respond in writing and orally and, according to Google’s press release, has “sophisticated reasoning capabilities on all types of inputs.” From the data it receives, it can therefore understand and analyze:

Context
Emotions
Intentions
Relationships
Concepts

To achieve such a result, Google designed Gemini to be “natively multimodal.” The company first “pre-trained it to process varied modalities,” and then “its effectiveness was enhanced with additional multimodal data.”

According to the first tests carried out, this AI has proved effective in various fields, including mathematics, physics, law, medicine, and history. In other words, it is a language model with unique capabilities and remarkable performance. It is therefore much more advanced than Bard, which Google launched at the beginning of 2023, and it is shaping up to be revolutionary. For example, it can:

Generate images, videos, sounds, or graphics from a few-word description;
Write song lyrics, stories, or jokes;
Provide precise, clear, and relevant answers to questions on various subjects;
Translate words and sentences into many languages;
Identify objects and animals in drawings;
Play music;
Reason about complex issues.

On paper, its capabilities therefore currently exceed those of ChatGPT.

The Three Versions Offered By Google

This multimodal language model comes in three versions, each optimized for specific tasks, which is further proof of Gemini’s power and performance. These three distinct models, designed to meet different needs and uses, are Ultra, Pro, and Nano.

Gemini Ultra

The Ultra version is the most powerful and most complete. It can handle highly complex tasks and has an in-depth ability to understand different kinds of data. This version is primarily intended for uses requiring advanced reasoning and analysis (scientific research, for example). It is expected to be integrated into Google Bard Advanced in early 2024, after testing with customers, developers, partners, and cybersecurity experts.

Gemini Pro

The Pro version of Gemini is characterized by its versatility and strong performance on a wide range of tasks. It is well suited to general applications (data analysis or software development, for example) and is already integrated into Bard (except in Europe).

Gemini Nano

The Nano version is the lightest. It was developed for tasks on mobile devices and other resource-constrained equipment. It is suitable for applications operating in real time, such as smart replies and the AI features found in connected devices and smartphones. For example, it is already present on Google’s Pixel 8 Pro, which was released in October 2023. It can, in particular, be used to generate automatic replies (in English) in the WhatsApp application.

With these three models, Google highlights the flexibility of its AI.


Multimodality At The Heart Of This Technology

With Gemini, the Californian company wants to take its revenge on OpenAI and its famous ChatGPT. It must be said that the launch and success of ChatGPT took Google by surprise. Its arrival did not go unnoticed anywhere in the world and was a real revolution.

Google reacted as quickly as possible by announcing the launch of Bard in early 2023 in certain countries (in July 2023 in France). But we must recognize that Bard’s capabilities are limited, and coming from Google, we expected better. With Gemini, that expectation is met. Its performance is on par with previous innovations from the American firm, if not superior.

A Breakthrough In AI

Indeed, Google has equipped this AI with very advanced and innovative multimodal capabilities. This is one of its notable differences from ChatGPT, and it is what makes Gemini particularly effective. Moreover, according to Google, during training this language model displayed a performance that “far exceeds that of existing models.”

The American giant also claims that Gemini outperforms other LLMs (including GPT-4, OpenAI’s latest model) in 30 of the 32 academic benchmarks used in the research and development of these technologies.

Thanks to this multimodality, Gemini can:

Be used in different fields such as science, finance, law, and medicine;
Locate and analyze information in large amounts of data;
Reason about complex subjects.

More than a chatbot (or conversational agent), this language model has comprehension and deduction capabilities that come closer to those of humans. This can be scary, but it also gives a glimpse of the many possibilities offered by AI.

How Will Gemini Be Integrated Into Google Services?

As indicated above, three versions of Gemini have been developed, two of which are already offered to users of Bard (outside Europe) and of the Pixel 8 Pro. But Google has other uses planned. From 2024, this AI should appear in:

The American firm’s search engine;
The online advertisements shown to Internet users based on their searches.

Concretely, integrating Gemini into the Google search engine should make it possible to obtain answers that are:

More accurate;
Faster;
Of higher quality, thanks to a better understanding of the search context;
More complete, with different types of content (text, videos, photos, sounds, for example).

This will, therefore, have a positive impact on the user experience.

For advertisements, Gemini should better personalize the promotions offered to Internet users so that they are more relevant.

This improvement is made possible by the fact that this AI better understands the context of Internet users’ requests and their intentions. This will allow Google to target the advertisements it displays more effectively, which is good news for Internet users and advertisers alike. Furthermore, the first positive effects of Gemini have already been demonstrated with the Pro version, which has markedly improved Bard’s comprehension, reasoning, and planning abilities.

How To Access Gemini In France?

To date, this new AI is available free of charge in 170 countries, but not in France or the rest of the European Union. The reason is the GDPR (General Data Protection Regulation) in force in the European Union, legislation that had already delayed the launch of Bard in European countries.

To benefit from it, you must wait until 2024… in theory. In practice, it is possible to access it from France using a VPN (Virtual Private Network). By creating a virtual tunnel between a computer and a server based in another country, this tool makes Gemini accessible to French and European Internet users. Here are the steps to follow to discover the potential of this AI:

Have a Google account;
Install VPN software;
Select a server located in a country with access to Gemini (the United States, for example);
When the connection is established, go to the Bard web interface and test the capabilities of this new AI, which, as a reminder, is only offered in English at the moment.

An Impressive Promotional Video… But A Slightly Misleading One

To present Gemini, Google had everything planned (or almost), with a press presentation and a demonstration video. And the firm did not skimp on resources. Its video presentation of the capabilities of its new multimodal AI model lasts 6 minutes and 23 seconds and features stunning demos. In this video, a person challenges Gemini by showing it images and asking it to reason about what it sees.

Various Challenges Taken Up By Gemini

Here are some examples of the challenges seen in this video demonstration:

The AI recognizes a drawing of a duck and provides information about this animal. It then identifies the color of the duck, blue in the test, and specifies that it is not “a common color among ducks” although “there are a few breeds of blue ducks”;
It also identifies the material of an object, a duck in the test that “appears to be made of rubber or plastic,” and translates the word “duck” into several languages, including Mandarin;
It comes up with a game idea from a world map, which it calls “Guess the Country,” and gives the person taking the test clues to find countries;
It recognizes the game “Rock, Paper, Scissors” by looking at the gestures of two hands;
It identifies the shape of objects and says whether they are edible;
It recognizes instruments drawn on a sheet (a guitar, an amp, a drum kit) and plays sounds and music associated with these instruments.

These challenges all illustrate Gemini’s capabilities very well. And we must admit that it is pretty impressive. A little too impressive.

A Video Not Shot In Real Conditions

Indeed, in a blog article, Google admitted that the responses provided by the AI in the video were, in reality, more fragmented (and therefore less fluid). Likewise, the instructions given to Gemini during testing were more precise than those heard in the video’s voiceover. In short, a little arrangement with reality.

However, the AI’s reasoning, understanding, and deduction capabilities are not in question. The answers given during the tests were not made up, but they were not obtained as quickly as one might believe from watching the video. To justify this staging, Google pointed out that it had indicated, in the description of the video, that “for this demo, the latency has been reduced, and the messages from Gemini have been shortened for the sake of brevity.”

This demonstration video is therefore neither wholly true nor completely false. Moreover, Google has still not specified which version of Gemini was used in the demo. According to Bloomberg, it was the Ultra version (the most powerful), which happens to be the only one not yet available.

Still, this video achieved its goal of demonstrating the feasibility and viability of Google’s multimodal language model.

Is Gemini Really A Match For ChatGPT?

Since the presentation of this AI, one question has come up repeatedly: is Gemini more powerful and efficient than ChatGPT? If Google is to be believed, the answer is yes. This is logical, since Gemini is intended to compete with ChatGPT. With this technology, the Californian firm wants to catch up and strike hard.

Today, as this AI is not accessible in France and Europe (other than through a VPN), we cannot comment first-hand on Gemini’s actual capabilities. But we can assess them based on the following:

Tests performed;
The results obtained in the 32 academic benchmarks used in the research and development of these language models;
User experience in countries with access to Gemini.

On that basis, the writing capabilities of GPT-4 appear similar to those of Gemini. But when it comes to multimedia, Google’s AI is much more efficient. This lends weight to Google’s claim that its teams have designed the first “true multimodal model.”

For the rest, we must wait for Gemini to become available in France and the rest of Europe to decide, especially since Google has already planned new features for its AI in 2024, such as advanced help in solving math problems. In addition, OpenAI will likely respond to this Google technology by further improving the functionality of ChatGPT (which, for example, has integrated speech and vision since September 2023).
