Embedding
An embedding converts words, sentences, or documents into a list of numbers that captures their meaning. Similar concepts end up with similar numbers. “dog” and “puppy” would have nearly identical embeddings, while “dog” and “spreadsheet” would be very different. This lets computers compare the meaning of text mathematically: search for similar documents, cluster related topics, or find the most relevant passage to answer a question.
An embedding is a dense vector representation of data (text, images, audio) in a continuous vector space where geometric proximity corresponds to semantic similarity. A text embedding model maps a string to a fixed-length array of floating-point numbers (typically 256-3072 dimensions).
Properties:
- Semantic similarity: texts with similar meaning have high cosine similarity between their embeddings
- Dense: every dimension carries information (vs. sparse representations like bag-of-words where most values are zero; contrasted in the sketch after this list)
- Fixed-size: regardless of input length, the output vector has the same dimensionality
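A quick illustration of the dense and fixed-size points, using made-up values (the 10,000-word vocabulary is invented, and the 384-dimensional random vector stands in for a real model output such as all-MiniLM-L6-v2's):

```python
import numpy as np

# Sparse bag-of-words over a hypothetical 10,000-word vocabulary:
# a short sentence touches only a handful of entries.
vocab_size = 10_000
bow = np.zeros(vocab_size)
bow[[17, 342, 5820]] = 1           # only the words that actually occur
print(np.count_nonzero(bow))       # 3 non-zero entries out of 10,000

# Dense embedding of the same sentence (illustrative random values):
# every dimension holds a non-zero value, and the length is fixed
# no matter how long the input text is.
embedding = np.random.default_rng(0).normal(size=384)
print(np.count_nonzero(embedding)) # 384 of 384 dimensions carry information
```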
Embedding models:
| Model | Dimensions | Provider |
|---|---|---|
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
| embed-v4 | 1024 | Cohere |
| BGE-large-en-v1.5 | 1024 | BAAI (open-source) |
| all-MiniLM-L6-v2 | 384 | Sentence Transformers (open-source) |
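The open-source models in the table run locally. A minimal sketch with the sentence-transformers library (assumes `pip install sentence-transformers`; the model is downloaded on first use):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode(
    ["How do I configure a VLAN?", "Setting up virtual LANs on a switch"],
    normalize_embeddings=True,  # unit-length vectors: dot product == cosine
)
print(embeddings.shape)               # (2, 384)
print(embeddings[0] @ embeddings[1])  # cosine similarity of the two texts
```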
Similarity metrics (compared in the sketch after this list):
- Cosine similarity: the cosine of the angle between two vectors. Range: -1 to 1 (1 = same direction, i.e. near-identical meaning).
- Dot product: cosine similarity scaled by the two vectors' magnitudes; identical to cosine similarity when the vectors are unit-normalized.
- Euclidean distance: straight-line distance between the vector endpoints (smaller = more similar).
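A minimal NumPy sketch of the three metrics (vector values invented for illustration):

```python
import numpy as np

a = np.array([0.3, 0.8, 0.5])
b = np.array([0.25, 0.9, 0.4])

print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine
print(np.dot(a, b))                                            # dot product
print(np.linalg.norm(a - b))                                   # Euclidean

# After unit-normalization, dot product equals cosine similarity, and
# Euclidean distance is a monotone function of it (||a-b||^2 = 2 - 2*cos),
# so all three metrics rank nearest neighbors identically.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(np.dot(a_n, b_n))
```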
Applications:
- Semantic search: find documents by meaning, not just keyword match
- RAG: retrieve relevant context for LLM generation
- Clustering: group similar documents, tickets, or customer feedback (see the sketch after this list)
- Recommendation: “users who liked X also liked Y”
- Anomaly detection: flag inputs that are far from any known cluster
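A sketch of the clustering use case: embed the texts, then run any standard clustering algorithm on the vectors. The ticket texts are invented, and scikit-learn's KMeans stands in for whatever clusterer you prefer:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

tickets = [
    "VPN drops every few minutes",
    "Cannot connect to VPN from home",
    "Printer on floor 3 is jammed",
    "Printer queue stuck, jobs not printing",
]

# Tickets about the same underlying issue land in the same cluster.
vectors = SentenceTransformer("all-MiniLM-L6-v2").encode(tickets)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # e.g. [0 0 1 1]: VPN tickets vs. printer tickets
```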
Creating and comparing embeddings
```python
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare semantic similarity
e1 = embed("How do I configure a VLAN?")
e2 = embed("Setting up virtual LANs on a switch")
e3 = embed("Best chocolate cake recipe")

print(cosine_similarity(e1, e2))  # ~0.89 (very similar)
print(cosine_similarity(e1, e3))  # ~0.12 (unrelated)
```

Embeddings are the foundation of modern search and recommendation systems. Every RAG pipeline starts by embedding documents into a vector database. Google Search uses embeddings to understand query intent beyond keyword matching. GitHub Copilot uses code embeddings to find relevant code snippets. In IT operations, embeddings power intelligent log search (“find errors similar to this one”), documentation search, and ticket routing. The choice of embedding model significantly impacts quality: domain-specific models (trained on code, medical text, or legal documents) outperform general-purpose models for specialized tasks. Running embedding models locally (Sentence Transformers, Ollama) is practical on consumer hardware, making privacy-preserving semantic search accessible.
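A minimal retrieval sketch in that RAG spirit, reusing the `embed` and `cosine_similarity` helpers above. The documents are invented, and a real pipeline would store the vectors in a vector database rather than a Python list:

```python
docs = [
    "VLAN configuration guide for managed switches",
    "Resetting a forgotten admin password",
    "Troubleshooting DHCP lease exhaustion",
]
doc_vectors = [embed(d) for d in docs]  # in practice: precomputed and stored

def search(query: str, top_k: int = 2) -> list[tuple[float, str]]:
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    scored = sorted(
        ((cosine_similarity(q, v), d) for v, d in zip(doc_vectors, docs)),
        reverse=True,
    )
    return scored[:top_k]

for score, doc in search("How do I set up a VLAN?"):
    print(f"{score:.2f}  {doc}")
```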