Google's new AI model handles nearly 8x more context than OpenAI's flagship



Google is not playing around when it comes to AI competition. Just a week after launching its most powerful model yet, Gemini 1.0 Ultra, the tech giant has unveiled its successor: Gemini 1.5. This new generation of AI models is faster, smarter, and more versatile than ever before.

Gemini 1.5

Gemini 1.5 is the result of Google’s continued work in natural language processing (NLP), the branch of AI that deals with understanding and generating human language. Google claims that Gemini 1.5 can handle up to 1 million tokens of input, equivalent to roughly 4 million characters or, by Google’s own estimate, more than 700,000 words. That’s nearly eight times the 128,000 tokens that OpenAI’s GPT-4 Turbo can process.
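
The figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the token counts come from the article, while the 4-characters-per-token conversion is only a common rule of thumb for English text (an assumption, not an exact tokenizer measurement).

```python
# Context-window sizes quoted in the article, in tokens.
GEMINI_15_TOKENS = 1_000_000
GPT4_TOKENS = 128_000

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def context_ratio(larger: int, smaller: int) -> float:
    """Return how many times larger one context window is than another."""
    return larger / smaller


ratio = context_ratio(GEMINI_15_TOKENS, GPT4_TOKENS)
approx_chars = GEMINI_15_TOKENS * CHARS_PER_TOKEN

print(f"Context ratio: {ratio:.1f}x")           # Context ratio: 7.8x
print(f"Approx. characters: {approx_chars:,}")  # Approx. characters: 4,000,000
```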

This means that Gemini 1.5 can work with longer and more complex texts, such as novels, essays, or speeches, in a single prompt while maintaining coherence and quality. Gemini 1.5 is also multimodal: it accepts text, images, audio, and video as input. For example, Gemini 1.5 can answer questions based on a video clip.

Mixture-of-Experts

Gemini 1.5 is not just one model but a family of models that cater to different needs and applications. The first to be released, the general-purpose Gemini 1.5 Pro, is comparable in performance to Gemini 1.0 Ultra but uses much less computing power, according to Google. This makes it more efficient and scalable for real-world use cases. Gemini 1.5 Pro is built on a Mixture-of-Experts (MoE) architecture, which is new to the Gemini line although the technique itself is well established in machine learning research. Instead of running the whole network for every query, an MoE model is divided into smaller “expert” sub-networks and dynamically activates only the ones most relevant to each input.
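
To make the routing idea concrete, here is a toy sketch of generic top-k gating, a common way MoE layers are built. Google has not published the routing details of Gemini 1.5 Pro, so nothing here should be read as its actual implementation; the function names, the toy experts, and the gate vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, gate_vectors, k=2):
    """Run only the k selected experts and mix their outputs by gate weight."""
    # One gate score per expert: dot product of the input with its gate vector.
    gate_scores = [sum(g * xi for g, xi in zip(gv, x)) for gv in gate_vectors]
    routes = top_k_route(gate_scores, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # experts that were not selected are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]


def make_scaling_expert(scale):
    """Hypothetical toy expert: multiplies every input value by a constant."""
    return lambda v: [scale * vi for vi in v]


experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

x = [2.0, 1.0]
output, used = moe_forward(x, experts, gate_vectors, k=2)
print("experts used:", used)  # experts used: [0, 1]
```

With four experts and k=2, only half of the experts run for this input, which is where the compute savings of an MoE design come from.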

Demis Hassabis, CEO of Google DeepMind, the research arm behind Gemini, explained that MoE also enables Gemini 1.5 Pro to integrate different types of data from the start rather than combining them later. “This way, Gemini 1.5 Pro can learn from text, images, and audio simultaneously and leverage the synergies between them,” he said.

Gemini 1.5 Pro also exhibits remarkable “in-context learning” abilities, which means it can acquire a new skill from information supplied in a lengthy prompt, without any additional fine-tuning.

To test this ability, Google employed the Machine Translation from One Book (MTOB) benchmark, which assesses how well the model can learn from unfamiliar data. When presented with a grammar manual for Kalamang, a language spoken by fewer than 200 individuals worldwide, the model learned to translate English into Kalamang at a level comparable to a person studying the same material.
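
In practice, in-context learning means the reference material travels inside the prompt itself and the model's weights are never updated. The sketch below only assembles such a prompt and checks it against a context limit; it does not call any model. The function names, the prompt wording, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
import math

CONTEXT_LIMIT_TOKENS = 1_000_000  # context size quoted in the article
CHARS_PER_TOKEN = 4               # assumption: rough rule of thumb for English


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def build_translation_prompt(reference_text: str, sentence: str,
                             limit: int = CONTEXT_LIMIT_TOKENS) -> str:
    """Place the whole reference text and the task into a single prompt."""
    prompt = (
        "Reference material:\n" + reference_text + "\n\n"
        "Using only the material above, translate this English sentence "
        "into Kalamang:\n" + sentence
    )
    needed = estimate_tokens(prompt)
    if needed > limit:
        raise ValueError(f"prompt needs ~{needed} tokens, limit is {limit}")
    return prompt


# Placeholder text stands in for the grammar manual.
prompt = build_translation_prompt("(grammar manual text goes here)", "Hello")
```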

With Gemini 1.5, Google has again raised the bar for AI research and development. The company says that Gemini 1.5 will power many products and services, such as Google Assistant, Google Translate, and Google Photos. It will also make Gemini Advanced, its conversational AI platform, more capable and competitive. Gemini Advanced is already a strong rival to OpenAI’s ChatGPT Plus, the leading chatbot in the market. Gemini Advanced can handle multimodal inputs and offers a range of additional features.

Google is not the only player in the AI race, though. Other companies, such as Anthropic, Meta, and Microsoft, are also working on their own AI models, which may soon challenge Google’s position. The AI war is heating up, and Gemini 1.5 is Google’s latest weapon.


