Google has introduced Gemini, its new generative AI platform. While Gemini shows promise in some areas, it falls short in others. This guide covers what Gemini is, how it can be used, and how it compares to competitors.
Gemini is Google’s next-generation generative AI model family, developed by DeepMind and Google Research. It consists of three models: Gemini Ultra (the flagship model), Gemini Pro (a “lite” version), and Gemini Nano (a smaller model for mobile devices). These models are “natively multimodal,” meaning they can work with text, audio, images, and video. Unlike text-only models such as LaMDA, Gemini models can handle more than text.
Gemini Ultra: The best of the Gemini family
Gemini Ultra is the flagship model of the Gemini family. Currently, access to Gemini Ultra is limited to a “select set” of customers across specific Google apps and services. It is expected to launch more broadly later this year and is designed as the foundation model on which the others are built.
Information about Gemini Ultra has come primarily from Google-led product demos, so claims about its capabilities, though promising, should be treated with caution. According to Google, Gemini Ultra can help with tasks like physics homework, solving problems step by step on a worksheet, and pointing out possible mistakes in filled-in answers. It can also be applied to tasks such as identifying scientific papers relevant to a specific problem, extracting information from them, and updating charts with generated formulas.
Gemini Ultra technically supports image generation, but that capability won’t be available in the productized version at launch. Google claims that Gemini Ultra exceeds state-of-the-art results on academic benchmarks, which sets high expectations; its true capabilities can only be assessed after the broader release.
Difference between Bard and Gemini
Google’s branding can be confusing, but Gemini is distinct from Bard. Bard is an interface for accessing certain Gemini models, while Gemini is the family of models itself. In terms of OpenAI’s products, Bard corresponds to ChatGPT, and Gemini corresponds to the underlying language model, such as GPT-3.5 or GPT-4.
Because Gemini models are multimodal, they can in theory perform a range of tasks, from transcribing speech to generating artwork. Some of these capabilities are still in development, and Google promises a broader range of functions in the future. Each tier of Gemini models (Ultra, Pro, and Nano) offers its own set of features.
Gemini Pro and Gemini Nano
Gemini Pro is publicly available, but its capabilities depend on where it is used. In Bard, it improves on LaMDA in reasoning and understanding. However, users have reported problems, including errors on complex math problems. Gemini Pro is also accessible via API in Vertex AI, where developers can customize it for specific contexts and use cases. Google plans to integrate Gemini Pro into conversational voice agents and search features in Vertex AI.
Gemini Nano is a smaller version suitable for running directly on some phones. It powers features like Summarize in Recorder and Smart Reply in Gboard on the Pixel 8 Pro. Smart Reply suggests responses in messaging apps, initially supporting WhatsApp and expanding to more apps in 2024.
Gemini costs
Currently, Gemini Pro is free to use in Bard, AI Studio, and Vertex AI Preview. Once Gemini Pro exits the preview phase in Vertex, however, it will be billed. Input will cost $0.0025 per character and output will cost $0.00005 per character. Vertex customers are billed per 1,000 characters and, for models like Gemini Pro Vision, per image.
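To get a rough sense of how these rates add up, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the rates are the per-character figures quoted above, the function name `estimate_cost` is hypothetical, and per-image charges are not modeled. Check Google’s current pricing before relying on any estimate.

```python
from decimal import Decimal

# Per-character rates as quoted in this article (assumed, not verified
# against Google's pricing page).
INPUT_RATE_PER_CHAR = Decimal("0.0025")
OUTPUT_RATE_PER_CHAR = Decimal("0.00005")


def estimate_cost(input_chars: int, output_chars: int) -> Decimal:
    """Return an estimated charge in dollars for one request.

    Hypothetical helper: multiplies character counts by the quoted rates.
    Image inputs (e.g. for Gemini Pro Vision) are not covered.
    """
    if input_chars < 0 or output_chars < 0:
        raise ValueError("character counts must be non-negative")
    return input_chars * INPUT_RATE_PER_CHAR + output_chars * OUTPUT_RATE_PER_CHAR


if __name__ == "__main__":
    # Example: a 1,000-character prompt with a 1,000-character response.
    print(estimate_cost(1000, 1000))
```

Under the quoted rates, a 1,000-character prompt with a 1,000-character response comes to $2.50 for input plus $0.05 for output, or $2.55 in total. Decimal is used instead of float so that small per-character amounts do not pick up rounding error.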
Gemini vs. GPT-4: A comparative outlook
The true measure of Gemini’s capabilities against GPT-4 will become apparent only with the release of Gemini Ultra. Google asserts that Gemini is superior on benchmarks, positioning it as a frontrunner in various tasks. Early assessments of Gemini Pro, however, point to problems, including reported inaccuracies and suboptimal coding suggestions. How Gemini ultimately compares to GPT-4 will depend on the continued development and refinement of the models.
Where to try Gemini
Gemini Pro can be tried today in Bard, Vertex AI Preview, and AI Studio. Gemini is also slated to come to Duet AI for Developers and other Google development tools in early 2024. Gemini Nano, designed to run efficiently on mobile devices, is available first on the Pixel 8 Pro and is expected to reach more devices in the future.