Gemini Embedding 2

Gemini Embedding 2 is a multimodal embedding model developed by Google, capable of converting text, images, video, audio, and documents into a single vector space.
Where conventional embedding models handle only text, the defining feature of this model is that it maps five media types into one semantic space. For example, an audio clip of an abnormal factory sound and a text document describing the troubleshooting procedure for that equipment can land close together in vector space, so a single model can serve cross-modal search. In RAG pipelines where non-text knowledge must be searchable, this removes much of the overhead of running a separate model for each modality.
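As a rough illustration of what "close proximity in vector space" means for search, the sketch below ranks a small mixed-media index against an audio query using cosine similarity. The four-dimensional vectors and the document IDs are invented placeholders, not real model output; in practice each vector would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, index):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings: text and image items share one index
# because they are assumed to live in the same vector space.
index = {
    "text:pump_bearing_troubleshooting": [0.9, 0.1, 0.0, 0.2],
    "text:cafeteria_menu": [0.0, 0.8, 0.6, 0.0],
    "image:pump_wiring_diagram": [0.7, 0.0, 0.1, 0.6],
}

# Hypothetical embedding of an audio clip of an abnormal pump sound.
audio_query = [0.8, 0.2, 0.1, 0.3]

ranking = rank_by_similarity(audio_query, index)
for doc_id, score in ranking:
    print(f"{score:.3f}  {doc_id}")
```

Because every modality shares one space, the query needs no modality-specific routing: the same similarity function scores text and image entries alike.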
The input window is 8,192 tokens, which allows larger chunk sizes. Output vectors have up to 3,072 dimensions, but because the model is trained with Matryoshka representation learning, they can be truncated to 1,536 (balanced) or 768 (optimized for low-latency search). Task-type parameters are also available, so that embeddings can be optimized for use cases such as retrieval and classification.
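Matryoshka-style embeddings concentrate the most important information in the leading dimensions, so a shorter vector can be obtained by keeping a prefix of the full one. The sketch below shows that operation on a placeholder vector. The helper name `truncate_embedding` is hypothetical, and the re-normalization step is an assumption: check the official documentation for whether reduced-dimension vectors are returned already normalized, and whether the API can return the smaller size directly.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


# Placeholder standing in for a 3,072-dimensional model output.
full = [math.sin(i + 1) for i in range(3072)]

balanced = truncate_embedding(full, 1536)  # the "balanced" size
compact = truncate_embedding(full, 768)    # the low-latency size

print(len(balanced), len(compact))
```

Vectors of different lengths are not comparable with each other, so an index should store one dimension size consistently and queries should be truncated the same way.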
With native support for over 100 languages, the model is well-suited for multilingual RAG and cross-lingual search. Official integrations with LangChain, LlamaIndex, Weaviate, Qdrant, and ChromaDB are provided, enabling seamless incorporation into existing vector database infrastructure.
Pricing is $0.25 per 1 million tokens, with a free tier available. Migrating from the earlier text-embedding-004 is straightforward at the code level, since it amounts to swapping the model ID, but because the vector spaces differ, existing indexes must be rebuilt. Making full use of multimodal input takes careful design, including decisions about the granularity at which images and audio are indexed and how to balance search accuracy against storage costs.
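Since migration means re-embedding the whole corpus, a back-of-the-envelope estimate of embedding cost and index size can help with planning. The sketch below uses the $0.25 per 1 million tokens figure quoted above; the corpus size (200,000 chunks of 500 tokens) and the 4 bytes per value (float32 storage) are illustrative assumptions, and the token-based price may not carry over directly to image, audio, or video input.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.25  # figure quoted in this entry


def embedding_cost_usd(total_tokens):
    """Cost of embedding `total_tokens` tokens at the quoted price."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage, ignoring index overhead and metadata."""
    return num_vectors * dim * bytes_per_value


# Hypothetical corpus: 200,000 chunks averaging 500 tokens each.
num_chunks = 200_000
corpus_tokens = num_chunks * 500

cost = embedding_cost_usd(corpus_tokens)
sizes = {dim: index_size_bytes(num_chunks, dim) for dim in (3072, 1536, 768)}

print(f"re-embedding cost: ${cost:.2f}")
for dim, size in sizes.items():
    print(f"{dim} dims: {size / 1e9:.2f} GB")
```

Halving the dimension halves raw storage, which is the trade-off against search accuracy that the paragraph above describes.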