A guide to the concepts behind Retrieval-Augmented Generation, embeddings, and what Revelio is actually showing you.
An embedding is a way of representing text as a list of numbers - a vector. The key property is that meaning is encoded geometrically: text that means similar things produces vectors that are close together in space, while unrelated text produces vectors that are far apart.
For example, the words king and queen will have embeddings that are much closer to each other than either is to bicycle.
"The cat sat on the mat."
↓ sentence-transformers model
[-0.032, 0.118, 0.047, -0.091, ...] ← 768 numbersRevelio uses BAAI/bge-base-en-v1.5 across the retrieval pipeline for built-in corpora, browser-side query embedding, the word explorer, and custom indexing. It produces 768-dimensional vectors and keeps every part of retrieval in the same semantic space.
Traditional keyword search matches exact words. Semantic search matches meaning.
To find the most relevant chunks for a query, we embed both the query and every chunk, then rank by cosine similarity - the angle between two vectors. A similarity of 1.0 means identical direction (very similar), 0 means orthogonal (unrelated).
similarity(query, chunk) = (query · chunk) / (|query| × |chunk|) top_k = sorted(chunks, by=similarity, descending=True)[:k]
In Revelio this happens entirely in the browser. When you select a query, the pre-computed embeddings are loaded from JSON and cosine similarity is computed client-side in real time - no server round-trip needed for retrieval.
Once embeddings are computed, there are different strategies for picking which chunks to return. Revelio supports two, selectable from the settings menu.

Cosine similarity
Ranks every chunk by its cosine similarity to the query and returns the top K. Fast and predictable. Can return redundant chunks if the corpus has repeated content - you might get five chunks all saying the same thing.
scores = [cosine(query, chunk) for chunk in corpus] top_k = sorted(scores, descending=True)[:k]
MMR - Maximal Marginal Relevance
MMR picks chunks that are both relevant to the query and different from each other. After selecting the first chunk (highest similarity), each subsequent pick is penalised if it is too similar to an already-selected chunk. This trades a little relevance for more coverage.
selected = []
while len(selected) < k:
best = argmax over candidates of:
λ · sim(query, c) − (1−λ) · max sim(c, s) for s in selected
selected.append(best)Revelio uses λ=0.5, giving equal weight to relevance and diversity. Try MMR when your cosine results all highlight the same passage - it tends to surface a broader range of evidence for the LLM to work with.
Both modes apply a similarity threshold of 0.3 - chunks below this score are excluded regardless of K.
RAG is a pattern for making LLMs answer questions about specific documents without re-training or fine-tuning them. The idea is simple:
This works because LLMs are good at reading comprehension: they can synthesise an answer from the provided text even if they have never seen that text before.
Why RAG instead of just asking the LLM directly? LLMs have knowledge cutoffs, they can hallucinate facts, and they can't access your private documents. RAG grounds the answer in a specific, verifiable source.
System: You are a helpful assistant. Answer using only the provided context. User: Context: [1] Alice was beginning to get very tired of sitting by her sister... [2] There was nothing so very remarkable in that... Question: Why was Alice bored?
The Prompt Builder panel in the demo shows you exactly this constructed prompt, and the Answer Panel streams the LLM response.
Embedding models have a maximum input length (typically 256–512 tokens). Long documents must be split into smaller pieces called chunks before embedding.
The chunking strategy matters: chunks that are too small lose context, chunks that are too large dilute the signal. Revelio uses a sliding-window approach with overlap so that sentences near a boundary appear in two adjacent chunks, reducing the chance of a relevant sentence being cut off.
raw text ↓ split into ~500-token windows, 50-token overlap [chunk 0] [chunk 1] [chunk 2] ... ↓ embed each chunk independently [vec 0] [vec 1] [vec 2] ...
Each dot you see in the 3D viewer is one chunk. When a query is selected, the dots that light up are the chunks with the highest cosine similarity to that query.
Embedding vectors are 768 dimensions - impossible to visualise directly. To display them in 3D, we use Uniform Manifold Approximation and Projection (UMAP).
UMAP is a non-linear dimensionality reduction algorithm that tries to preserve the local neighbourhood structure of the high-dimensional data. In practice this means: chunks that were close in 768D tend to stay close in 3D. You can see natural semantic clusters - passages about the same topic clump together.
chunk embeddings: shape (N, 768)
↓ UMAP(n_components=3)
3D coords: shape (N, 3)
↓ normalize to [-1, 1]³
scatter plot pointsThe 3D coordinates are only used for visualisation. All retrieval still uses the original high-dimensional embeddings - the projected positions are not used for cosine similarity.
Revelio is split into two parts: a Python CLI that pre-computes corpus data, and a Next.js UI that loads and explores it.
Python CLI (cli/demo)
data/raw/alice.txt
↓ chunk_text() # sliding window
["Alice was…", "in that…", …]
↓ embed() # sentence-transformers
[[0.03, -0.09, …], …] # shape (N, 768)
↓ project() + normalize() # UMAP → 3D
[[0.12, -0.44, 0.31], …]
↓ write JSON
ui/public/data/alice.jsonPre-built query embeddings are included in the same JSON file so the browser never needs to run a model.
Next.js UI
load /data/alice.json # fetch on corpus select
↓
user selects a query
↓ retrieve() - cosine similarity in the browser
top-K chunks highlighted in 3D viewer
↓
POST /api/chat { messages: [system, user+context] }
↓ LLM API (OpenRouter / any OpenAI-compatible)
streamed answer → Answer PanelThe LLM backend is configured via environment variables (LLM_BASE_URL, LLM_MODEL, LLM_API_KEY) and defaults to OpenRouter with a free Mistral model so you can run it without any setup.
Yes. Revelio supports custom corpora in addition to the built-in datasets. The UI looks for a manifest at /data/custom/manifest.json, then loads each selected project from /data/custom/<id>.json.
Generate a custom corpus with the CLI
cd cli python -m venv .venv source .venv/bin/activate pip install -r requirements.txt python revelio.py index ./path/to/your/docs --name "My Project"
Command variants
python revelio.py index <folder> --name "<project name>" [--output <dir>]
| Variant | When to use it |
|---|---|
| python revelio.py index ./docs --name "My Project" | Standard local indexing flow |
| python revelio.py index ~/Documents/notes --name "Personal Notes" | Index a folder outside the repo using an absolute or home-relative path |
| python revelio.py index ./docs --name "My Project" --output ../ui/public/data/custom | Override where the generated corpus and manifest are written |
| Flag | Required | Purpose |
|---|---|---|
| folder | Yes | Root folder scanned recursively for supported files |
| --name NAME | Yes | Human-readable project label shown in the UI |
| --output DIR | No | Override the default output directory |
The indexer walks the folder recursively, extracts text, chunks it, embeds each chunk with BAAI/bge-base-en-v1.5, projects it to 3D with UMAP, writes the corpus JSON, and updates the manifest automatically.
Supported inputs are .txt, .md, .pdf, .jpg, .jpeg, .png, .gif, and .webp. PDF parsing needs pypdf; image OCR needs pytesseract, Pillow, and the system tesseract binary.
What gets written
ui/public/data/custom/ ├── manifest.json └── my-project.json
{
"projects": [
{
"id": "my-project",
"label": "My Project",
"file": "my-project.json"
}
]
}{
"corpus": "my-project",
"label": "My Project",
"model": "BAAI/bge-base-en-v1.5",
"chunks": [
{
"id": "my-project-chunk-0000",
"text": "Chunk text...",
"source": "notes.pdf",
"embedding": [0.12, -0.03, ...],
"x": 0.41,
"y": -0.22,
"z": 0.67
}
],
"queries": []
}The source field is optional in the shared corpus type, but custom corpora generated by the CLI include it so the UI can show which file a chunk came from.
How it shows up in the app
After indexing, restart npm run dev. Your dataset appears in the settings menu under Your Projects. Selecting it uses the same client-side retrieval flow as the built-in corpora.
Built-in corpus generation variants
python -m demo --all [--model <embedding-model>] python -m demo --corpus <alice|fastapi|space|words> [--model <embedding-model>]
| Variant | When to use it |
|---|---|
| python -m demo --all | Regenerate every built-in corpus with the default model mapping |
| python -m demo --corpus alice | Refresh one built-in text corpus |
| python -m demo --corpus words | Rebuild the word explorer only |
| python -m demo --all --model BAAI/bge-base-en-v1.5 | Force one embedding model across the whole generated dataset |
| python -m demo --corpus fastapi --model BAAI/bge-base-en-v1.5 | Override the model for a single corpus run |
| Flag | Required | Purpose |
|---|---|---|
| --all | One of `--all` or `--corpus` | Generate every built-in corpus in one run |
| --corpus CORPUS | One of `--all` or `--corpus` | Generate exactly one corpus: `alice`, `fastapi`, `space`, or `words` |
| --model MODEL | No | Override the embedding model for the selected run |
Revelio works with any OpenAI-compatible API. Smaller instruction-following models tend to work better for RAG than large RLHF-trained ones - they follow the system prompt more faithfully and avoid over-hedging when context is provided.
| Model | Via | Notes |
|---|---|---|
| mistralai/mistral-small-3.1-24b-instruct:free | OpenRouter (free) | Best free option - fast, follows instructions well |
| meta-llama/llama-3.1-8b-instruct | OpenRouter | Solid, widely supported |
| mistral-small3.1 | Ollama (local) | Runs fully offline |
To use OpenRouter, set the base URL to https://openrouter.ai/api/v1 and supply your API key. For Ollama, set it to http://localhost:11434/v1 with no key required. Both can be configured at runtime from the settings menu without restarting.
Getting an OpenRouter API key
:free) work with no credit balance.mistralai/mistral-small-3.1-24b-instruct:free. You can browse all available models at openrouter.ai/models.Using any other OpenAI-compatible provider
Any provider that implements the OpenAI chat completions API works - OpenAI itself, Together AI, Groq, LM Studio, vLLM, etc. The settings menu has three fields:
| Field | Example |
|---|---|
| Base URL | https://api.groq.com/openai/v1 |
| Model | llama3-8b-8192 |
| API Key | gsk_... |
The key is never sent to Revelio's server - it travels directly from your browser to the LLM provider in the Authorization header.