Clivern
10 September 2025
An embedding is just a way to turn text into a list of numbers. Think of the numbers as coordinates on a map: sentences that mean similar things end up close together, even when they use different words.
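To make the map idea concrete, here is a tiny sketch with hand-picked 2-D vectors. The numbers are made up for illustration (a real model produces hundreds of dimensions), and the helper name cosine is just a placeholder:

```python
import numpy as np

# Hand-picked 2-D "embeddings". These values are invented for illustration;
# a real embedding model would produce them.
toy_vectors = {
    "How do I store vectors?": np.array([0.9, 0.1]),
    "Where can I save embeddings?": np.array([0.8, 0.2]),
    "What is the weather today?": np.array([0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the two vectors point the same way, 0.0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

anchor = toy_vectors["How do I store vectors?"]
for text, vec in toy_vectors.items():
    print(f"{cosine(anchor, vec):.3f}  {text}")
```

The two storage questions share no important words, yet their vectors sit close together and score high. The weather question points elsewhere and scores low.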
In a RAG setup you do this in two passes. First, you split your docs into chunks, turn each chunk into numbers, and save those numbers in a database (Qdrant, Chroma, whatever).
Later, when someone asks a question, you turn the question into numbers too, find the chunks that are closest, and send only those chunks to the LLM. The LLM does not read your whole library; something else finds the good bits first. Embeddings are what let you search by meaning, not by exact words.
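The two passes can be sketched without any model or database. Everything here is a stand-in: toy_embed only counts shared words (so it does not capture meaning the way a real embedding model does), store is a plain list instead of Qdrant or Chroma, and split_into_chunks and retrieve are hypothetical helper names.

```python
import re
import numpy as np

docs = [
    "Qdrant and Chroma are vector databases. They store embeddings for search.",
    "Bread needs flour, water, salt and yeast. Bake it in a hot oven.",
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

# Fixed vocabulary built from the docs; words outside it are ignored.
VOCAB = sorted({word for doc in docs for word in tokenize(doc)})
INDEX = {word: i for i, word in enumerate(VOCAB)}

def toy_embed(text: str) -> np.ndarray:
    """Stand-in for a real model: unit-length word-count vector."""
    vec = np.zeros(len(VOCAB))
    for word in tokenize(text):
        if word in INDEX:
            vec[INDEX[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def split_into_chunks(doc: str, max_words: int = 8) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Pass 1: split each doc into chunks, embed each chunk, save the pair.
store = [(chunk, toy_embed(chunk)) for doc in docs for chunk in split_into_chunks(doc)]

# Pass 2: embed the question, keep only the k closest chunks.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = toy_embed(question)
    scored = sorted(store, key=lambda item: float(item[1] @ q), reverse=True)
    return [chunk for chunk, _ in scored[:k]]

context = retrieve("Which databases store embeddings?")
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)
```

Only the chunks in context go to the LLM. Swapping toy_embed for a real model and store for a vector database keeps the same shape.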
Install what we need with uv:
$ uv init
$ uv add numpy sentence-transformers
We use four short docs and one question. The code turns each line of text into a list of numbers using the BAAI/bge-m3 model from Hugging Face:
from sentence_transformers import SentenceTransformer

DOCUMENTS = [
    "Python is a high-level, general-purpose programming language.",
    "The PyTorch library is widely used for deep learning and neural networks.",
    "ChromaDB is an open-source vector database specialized for AI embeddings.",
    "OpenAI relies on cloud-hosted API architecture for model evaluation.",
]
QUERY = "What tool should I use to store vector data locally?"

model = SentenceTransformer("BAAI/bge-m3")
document_vectors = model.encode(DOCUMENTS, normalize_embeddings=True)
query_vector = model.encode(QUERY, normalize_embeddings=True)
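normalize_embeddings=True scales each vector to length 1 (unit L2 norm). The count of numbers in each vector does not change, only the magnitude. With unit-length vectors, a plain dot product gives the cosine similarity.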
We use the same model for the docs and the question, so everything ends up in the same kind of space and the scores mean something.
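A quick check of what the normalization step buys you, using small made-up vectors instead of real embeddings (unit and cosine are placeholder helper names):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])  # same direction as a, twice as long
c = np.array([4.0, 3.0])

def unit(v: np.ndarray) -> np.ndarray:
    # Scale a vector so its L2 norm is 1.
    return v / np.linalg.norm(v)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# After scaling, a and b become the same vector: only direction is kept.
print(unit(a), unit(b))

# For unit vectors, the dot product and the cosine similarity agree.
print(np.dot(unit(a), unit(c)), cosine(a, c))
```

This is why the ranking code can get away with a single matrix product once the vectors are normalized.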
Define two ways to score a doc against the question: cosine similarity (higher is closer) and L2 distance (lower is closer):
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product divided by both lengths; 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance; 0.0 means identical vectors.
    return float(np.linalg.norm(a - b))
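For unit-length vectors the two scores are tied together: squared L2 distance equals 2 minus 2 times the cosine similarity, so both give the same ranking. A small sketch that checks this on random vectors (the data is random, not real embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

# Random stand-in vectors, scaled to unit length like normalized embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 8))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

query, others = vectors[0], vectors[1:]
cos = np.array([cosine_similarity(query, v) for v in others])
l2 = np.array([l2_distance(query, v) for v in others])

# Identity for unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.allclose(l2 ** 2, 2 - 2 * cos))

# Highest cosine first and lowest distance first give the same order.
print(np.argsort(-cos), np.argsort(l2))
```

So with normalized embeddings, picking one metric over the other is mostly a matter of what your vector store supports.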
In the following code we score every doc against the question at once, sort by cosine similarity, and pick the best match from DOCUMENTS.
def rank_matches(query_vector, document_vectors):
    # Vectors are unit length, so this dot product is the cosine similarity.
    cosine_scores = document_vectors @ query_vector
    l2_distances = np.linalg.norm(document_vectors - query_vector, axis=1)
    # Highest cosine score first.
    ranked = sorted(
        enumerate(cosine_scores),
        key=lambda item: item[1],
        reverse=True,
    )
    return [
        (idx, float(cosine_scores[idx]), float(l2_distances[idx]))
        for idx, _ in ranked
    ]

results = rank_matches(query_vector, document_vectors)
best_idx = results[0][0]
best_doc = DOCUMENTS[best_idx]

print(f"Query: {QUERY}")
print(f"\nBest match: {best_doc}")
Run it and the script prints your question plus the best-matching doc:
$ uv run python main.py
Query: What tool should I use to store vector data locally?
Best match: ChromaDB is an open-source vector database specialized for AI embeddings.
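The script keeps only the single best doc, but a RAG flow sends the closest few chunks. A minimal sketch of that variant follows. The name top_k is hypothetical, and the vectors are small hand-made unit vectors standing in for real embeddings:

```python
import numpy as np

def top_k(query_vector: np.ndarray, document_vectors: np.ndarray, k: int = 2):
    # Assumes unit-length vectors, so the dot product is the cosine similarity.
    scores = document_vectors @ query_vector
    order = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]

# Hand-made unit vectors, not real model output.
document_vectors = np.array([
    [1.0, 0.0],
    [0.6, 0.8],
    [0.0, 1.0],
])
query_vector = np.array([0.8, 0.6])

for idx, score in top_k(query_vector, document_vectors, k=2):
    print(idx, round(score, 2))
```

With the real vectors from the script, the returned indices point back into DOCUMENTS the same way best_idx does.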
You can swap in your own documents, embed a whole folder, or plug this into a RAG flow. The model and the compare step stay the same.
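For the folder case, one possible sketch is to read every text file into a list and pass that list to model.encode in place of DOCUMENTS. The helper name load_documents, the "docs" folder, and the *.txt pattern are assumptions for illustration:

```python
from pathlib import Path

def load_documents(folder: str, pattern: str = "*.txt") -> list[tuple[str, str]]:
    """Return (filename, text) pairs for every matching file, sorted by name."""
    docs = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files
            docs.append((path.name, text))
    return docs

# Usage with the model from the script (not run here; "docs" is a placeholder):
# names, texts = zip(*load_documents("docs"))
# document_vectors = model.encode(list(texts), normalize_embeddings=True)
```

Keeping the filename next to each text lets you report where a match came from. Long files would still need to be split into chunks before embedding.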