AIMachine LearningTechnology

Google AI Introduces Gemini Embedding: A Novel Embedding Model Initialized from the Powerful Gemini Large Language Model

By capernaum
Last updated: 2025-03-13 18:47

Recent advancements in embedding models have focused on building general-purpose text representations for diverse applications such as semantic similarity, clustering, and classification. Traditional embedding models, such as Universal Sentence Encoder and Sentence-T5, aimed to provide generic text representations, but recent research highlights their limited generalisation. Consequently, integrating LLMs has revolutionised embedding model development through two primary approaches: improving training datasets via synthetic data generation and hard negative mining (see the sketch below), and leveraging pre-trained LLM parameters for initialisation. These methods significantly enhance embedding quality and downstream task performance, but at increased computational cost.
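To make the hard-negative-mining step concrete, here is a minimal sketch of how such a pipeline is commonly assembled. The `embed()` function is a hypothetical stand-in for any existing encoder (it returns random unit vectors here), and the selection rule, keeping the top-ranked candidates that are not the labelled positive, reflects the general technique rather than Gemini Embedding's exact recipe.

```python
import numpy as np

def embed(texts):
    # Hypothetical stand-in for an existing embedding model; returns
    # one L2-normalised vector per text, shape (len(texts), d).
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def mine_hard_negatives(query, positive, candidates, k=2):
    # Score every candidate against the query and keep the top-k that
    # are not the positive: close enough to confuse the model, which is
    # what makes them useful "hard" negatives for contrastive training.
    q = embed([query])[0]
    scores = embed(candidates) @ q  # cosine similarity (unit vectors)
    ranked = sorted(zip(candidates, scores), key=lambda cs: -cs[1])
    return [c for c, _ in ranked if c != positive][:k]

docs = ["doc about cats", "doc about dogs", "doc about cars", "relevant doc"]
print(mine_hard_negatives("pet care tips", "relevant doc", docs))
```

In practice, negatives mined this way are fed back into the contrastive training objective described later in the article.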

Recent studies have also explored adapting pre-trained LLMs for embedding tasks. Sentence-BERT, DPR, and Contriever demonstrated the benefits of contrastive learning and language-agnostic training for improving embedding quality. More recently, models such as E5-Mistral, initialised from strong LLM backbones like Mistral, have outperformed traditional BERT- and T5-based embeddings. Despite their success, these models often require large in-domain datasets, which makes them prone to overfitting. Efforts like MTEB aim to benchmark embedding models across diverse tasks and domains, fostering more robust generalisation in future research.

The Gemini Embedding team at Google introduces Gemini Embedding, a state-of-the-art model that produces highly generalisable text representations. Built on Google’s powerful Gemini large language model, it leverages Gemini’s multilingual and code-comprehension capabilities to enhance embedding quality across diverse tasks such as retrieval and semantic similarity. The model is trained on a high-quality, heterogeneous dataset curated using Gemini itself for filtering, selecting positive and negative passages, and generating synthetic data. Through contrastive learning and fine-tuning, Gemini Embedding achieves state-of-the-art performance on the Massive Multilingual Text Embedding Benchmark (MMTEB), surpassing previous models on multilingual, English, and code benchmarks.
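As a sketch of how such general-purpose embeddings are consumed downstream, the snippet below scores semantic similarity between two texts by the cosine similarity of their vectors. The `embed()` call is again a hypothetical placeholder for whichever client exposes the model, not Gemini's actual API.

```python
import numpy as np

def embed(texts):
    # Hypothetical placeholder for an embedding-model client; returns
    # one vector per input text, shape (len(texts), d).
    rng = np.random.default_rng(1)
    return rng.normal(size=(len(texts), 16))

def semantic_similarity(text_a, text_b):
    # Cosine similarity of the two embeddings: close to 1.0 for
    # paraphrases, near 0.0 for unrelated text.
    va, vb = embed([text_a, text_b])
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(semantic_similarity("How do I reset my password?",
                          "Steps to recover a forgotten password"))
```

The same cosine scoring underpins retrieval (rank passages by similarity to a query) and clustering (group texts whose vectors lie close together).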

The Gemini Embedding model builds on Gemini’s extensive knowledge to generate representations for tasks like retrieval, classification, and ranking. It refines Gemini’s initialised parameters and applies a pooling strategy to produce compact embeddings. Training uses a noise-contrastive estimation (NCE) loss with in-batch negatives, while a multi-loss approach applies the objective across sub-dimensions so that truncated embeddings remain useful. The training process follows a two-stage pipeline: pre-finetuning on large datasets, then fine-tuning on diverse tasks; model ensembling further enhances generalisation. Gemini also aids synthetic data generation, filtering, and hard negative mining to refine the model’s performance across multilingual and retrieval tasks.
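A minimal, assumed reconstruction of those two training ingredients in PyTorch is shown below: an in-batch contrastive (NCE-style) loss in which every other example in the batch serves as a negative, plus a multi-loss term that applies the same objective to truncated prefixes of the embedding. The sub-dimension sizes, temperature, and batch shape are invented for illustration; this is not Gemini Embedding's actual training code.

```python
import torch
import torch.nn.functional as F

def info_nce(q, p, temperature=0.05):
    # In-batch contrastive loss: row i of q should match row i of p;
    # all other rows in the batch act as negatives.
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature      # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

def multi_dim_loss(q, p, sub_dims=(64, 128, 256)):
    # Multi-loss over nested sub-dimensions (illustrative sizes): each
    # prefix of the embedding is trained to work as a smaller embedding.
    losses = [info_nce(q, p)]
    for d in sub_dims:
        losses.append(info_nce(q[:, :d], p[:, :d]))
    return torch.stack(losses).mean()

# Toy batch: 32 query/passage embedding pairs of width 512.
q = torch.randn(32, 512, requires_grad=True)
p = torch.randn(32, 512)
loss = multi_dim_loss(q, p)
loss.backward()
```

Averaging the per-dimension losses keeps one set of weights that serves every target embedding size, rather than training a separate model per size.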

The Gemini Embedding model was evaluated across multiple benchmarks, including multilingual, English, and code-based tasks, covering over 250 languages. It demonstrated superior classification, clustering, and retrieval performance, consistently surpassing other leading models. The model achieved the highest ranking based on Borda scores and excelled in cross-lingual retrieval tasks. Additionally, it outperformed competitors in code-related evaluations, even when certain tasks were excluded. These results highlight Gemini Embedding as a highly effective multilingual embedding model, capable of delivering state-of-the-art performance across diverse linguistic and technical challenges.
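Because the headline ranking is based on Borda scores, a short sketch of Borda-style aggregation may help: on each task a model earns points according to its rank, and points are summed across tasks. The model names, tasks, and scores below are invented for illustration.

```python
def borda_ranking(scores_by_task):
    # On each task a model earns (n_models - 1 - rank) points, where
    # rank 0 is the best score on that task; points sum across tasks.
    models = list(next(iter(scores_by_task.values())))
    points = {m: 0 for m in models}
    for task_scores in scores_by_task.values():
        ranked = sorted(models, key=lambda m: task_scores[m], reverse=True)
        for rank, m in enumerate(ranked):
            points[m] += len(models) - 1 - rank
    return sorted(points.items(), key=lambda kv: kv[1], reverse=True)

# Invented per-task scores for three hypothetical models.
scores = {
    "retrieval":      {"model_a": 61.2, "model_b": 58.4, "model_c": 55.0},
    "classification": {"model_a": 74.1, "model_b": 75.3, "model_c": 70.8},
    "clustering":     {"model_a": 49.7, "model_b": 47.2, "model_c": 48.1},
}
print(borda_ranking(scores))  # highest total points ranks first
```

A Borda aggregation rewards consistently strong performance across tasks rather than dominance on any single benchmark.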

In conclusion, the Gemini Embedding model is a robust, multilingual embedding solution that excels across various tasks, including classification, retrieval, clustering, and ranking. It demonstrates strong generalisation even when trained on English-only data, outperforming other models on multilingual benchmarks. To enhance quality, the model benefits from synthetic data generation, dataset filtering, and hard negative mining. Future work aims to extend its capabilities to multimodal embeddings, integrating text, image, video, and audio. Evaluations on large-scale multilingual benchmarks confirm its superiority, making it a powerful tool for researchers and developers seeking efficient, high-performance embeddings for diverse applications.


Check out the Paper. All credit for this research goes to the researchers of this project.


The post Google AI Introduces Gemini Embedding: A Novel Embedding Model Initialized from the Powerful Gemini Large Language Model appeared first on MarkTechPost.
