How Revelio Works

A guide to the concepts behind Retrieval-Augmented Generation, embeddings, and what Revelio is actually showing you.

1. What are embeddings?

An embedding is a way of representing text as a list of numbers - a vector. The key property is that meaning is encoded geometrically: text that means similar things produces vectors that are close together in space, while unrelated text produces vectors that are far apart.

For example, the words king and queen will have embeddings that are much closer to each other than either is to bicycle.

"The cat sat on the mat."
          ↓  sentence-transformers model
[-0.032, 0.118, 0.047, -0.091, ...]   ← 768 numbers

Revelio uses BAAI/bge-base-en-v1.5 across the retrieval pipeline for built-in corpora, browser-side query embedding, the word explorer, and custom indexing. It produces 768-dimensional vectors and keeps every part of retrieval in the same semantic space.

2. Semantic search

Traditional keyword search matches exact words. Semantic search matches meaning.

To find the most relevant chunks for a query, we embed both the query and every chunk, then rank by cosine similarity - the angle between two vectors. A similarity of 1.0 means identical direction (very similar), 0 means orthogonal (unrelated).

similarity(query, chunk) = (query · chunk) / (|query| × |chunk|)

top_k = sorted(chunks, by=similarity, descending=True)[:k]

In Revelio this happens entirely in the browser. When you select a query, the pre-computed embeddings are loaded from JSON and cosine similarity is computed client-side in real time - no server round-trip needed for retrieval.

3. Retrieval modes

Once embeddings are computed, there are different strategies for picking which chunks to return. Revelio supports two, selectable from the settings menu.

Settings pane showing LLM provider options, BYOK note, corpus selector, retrieval mode, and accent colours

Cosine similarity

Ranks every chunk by its cosine similarity to the query and returns the top K. Fast and predictable. Can return redundant chunks if the corpus has repeated content - you might get five chunks all saying the same thing.

scores = [cosine(query, chunk) for chunk in corpus]
top_k  = sorted(scores, descending=True)[:k]

MMR - Maximal Marginal Relevance

MMR picks chunks that are both relevant to the query and different from each other. After selecting the first chunk (highest similarity), each subsequent pick is penalised if it is too similar to an already-selected chunk. This trades a little relevance for more coverage.

selected = []
while len(selected) < k:
    best = argmax over candidates of:
        λ · sim(query, c) − (1−λ) · max sim(c, s) for s in selected
    selected.append(best)

Revelio uses λ=0.5, giving equal weight to relevance and diversity. Try MMR when your cosine results all highlight the same passage - it tends to surface a broader range of evidence for the LLM to work with.

Both modes apply a similarity threshold of 0.3 - chunks below this score are excluded regardless of K.

4. Retrieval-Augmented Generation (RAG)

RAG is a pattern for making LLMs answer questions about specific documents without re-training or fine-tuning them. The idea is simple:

Embed the user's question.
Retrieve the top-K most semantically similar chunks from your corpus.
Build a prompt that pastes those chunks in as context: “Given this context, answer the question.”
Send the prompt to an LLM and stream the answer back.

This works because LLMs are good at reading comprehension: they can synthesise an answer from the provided text even if they have never seen that text before.

Why RAG instead of just asking the LLM directly? LLMs have knowledge cutoffs, they can hallucinate facts, and they can't access your private documents. RAG grounds the answer in a specific, verifiable source.

System: You are a helpful assistant. Answer using only the provided context.

User:
Context:
[1] Alice was beginning to get very tired of sitting by her sister...
[2] There was nothing so very remarkable in that...

Question: Why was Alice bored?

The Prompt Builder panel in the demo shows you exactly this constructed prompt, and the Answer Panel streams the LLM response.

5. Chunking

Embedding models have a maximum input length (typically 256–512 tokens). Long documents must be split into smaller pieces called chunks before embedding.

The chunking strategy matters: chunks that are too small lose context, chunks that are too large dilute the signal. Revelio uses a sliding-window approach with overlap so that sentences near a boundary appear in two adjacent chunks, reducing the chance of a relevant sentence being cut off.

raw text
   ↓  split into ~500-token windows, 50-token overlap
[chunk 0] [chunk 1] [chunk 2] ...
   ↓  embed each chunk independently
[vec 0]   [vec 1]   [vec 2]  ...

Each dot you see in the 3D viewer is one chunk. When a query is selected, the dots that light up are the chunks with the highest cosine similarity to that query.

6. Dimensionality reduction & UMAP

Embedding vectors are 768 dimensions - impossible to visualise directly. To display them in 3D, we use Uniform Manifold Approximation and Projection (UMAP).

UMAP is a non-linear dimensionality reduction algorithm that tries to preserve the local neighbourhood structure of the high-dimensional data. In practice this means: chunks that were close in 768D tend to stay close in 3D. You can see natural semantic clusters - passages about the same topic clump together.

chunk embeddings: shape (N, 768)
         ↓  UMAP(n_components=3)
3D coords:      shape (N, 3)
         ↓  normalize to [-1, 1]³
scatter plot points

The 3D coordinates are only used for visualisation. All retrieval still uses the original high-dimensional embeddings - the projected positions are not used for cosine similarity.

7. How the Revelio pipeline works

Revelio is split into two parts: a Python CLI that pre-computes corpus data, and a Next.js UI that loads and explores it.

Python CLI (cli/demo)

data/raw/alice.txt
      ↓  chunk_text()          # sliding window
["Alice was…", "in that…", …]
      ↓  embed()               # sentence-transformers
[[0.03, -0.09, …], …]         # shape (N, 768)
      ↓  project() + normalize() # UMAP → 3D
[[0.12, -0.44, 0.31], …]
      ↓  write JSON
ui/public/data/alice.json

Pre-built query embeddings are included in the same JSON file so the browser never needs to run a model.

Next.js UI

load /data/alice.json          # fetch on corpus select
      ↓
user selects a query
      ↓  retrieve() - cosine similarity in the browser
top-K chunks highlighted in 3D viewer
      ↓
POST /api/chat  { messages: [system, user+context] }
      ↓  LLM API (OpenRouter / any OpenAI-compatible)
streamed answer → Answer Panel

The LLM backend is configured via environment variables (LLM_BASE_URL, LLM_MODEL, LLM_API_KEY) and defaults to OpenRouter with a free Mistral model so you can run it without any setup.

8. Custom data sources

Yes. Revelio supports custom corpora in addition to the built-in datasets. The UI looks for a manifest at /data/custom/manifest.json, then loads each selected project from /data/custom/<id>.json.

Generate a custom corpus with the CLI

cd cli
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python revelio.py index ./path/to/your/docs --name "My Project"

Command variants

python revelio.py index <folder> --name "<project name>" [--output <dir>]

Variant	When to use it
python revelio.py index ./docs --name "My Project"	Standard local indexing flow
python revelio.py index ~/Documents/notes --name "Personal Notes"	Index a folder outside the repo using an absolute or home-relative path
python revelio.py index ./docs --name "My Project" --output ../ui/public/data/custom	Override where the generated corpus and manifest are written

Flag	Required	Purpose
folder	Yes	Root folder scanned recursively for supported files
--name NAME	Yes	Human-readable project label shown in the UI
--output DIR	No	Override the default output directory

The indexer walks the folder recursively, extracts text, chunks it, embeds each chunk with BAAI/bge-base-en-v1.5, projects it to 3D with UMAP, writes the corpus JSON, and updates the manifest automatically.

Supported inputs are .txt, .md, .pdf, .jpg, .jpeg, .png, .gif, and .webp. PDF parsing needs pypdf; image OCR needs pytesseract, Pillow, and the system tesseract binary.

What gets written

ui/public/data/custom/
├── manifest.json
└── my-project.json

{
  "projects": [
    {
      "id": "my-project",
      "label": "My Project",
      "file": "my-project.json"
    }
  ]
}

{
  "corpus": "my-project",
  "label": "My Project",
  "model": "BAAI/bge-base-en-v1.5",
  "chunks": [
    {
      "id": "my-project-chunk-0000",
      "text": "Chunk text...",
      "source": "notes.pdf",
      "embedding": [0.12, -0.03, ...],
      "x": 0.41,
      "y": -0.22,
      "z": 0.67
    }
  ],
  "queries": []
}

The source field is optional in the shared corpus type, but custom corpora generated by the CLI include it so the UI can show which file a chunk came from.

How it shows up in the app

After indexing, restart npm run dev. Your dataset appears in the settings menu under Your Projects. Selecting it uses the same client-side retrieval flow as the built-in corpora.

Built-in corpus generation variants

python -m demo --all [--model <embedding-model>]
python -m demo --corpus <alice|fastapi|space|words> [--model <embedding-model>]

Variant	When to use it
python -m demo --all	Regenerate every built-in corpus with the default model mapping
python -m demo --corpus alice	Refresh one built-in text corpus
python -m demo --corpus words	Rebuild the word explorer only
python -m demo --all --model BAAI/bge-base-en-v1.5	Force one embedding model across the whole generated dataset
python -m demo --corpus fastapi --model BAAI/bge-base-en-v1.5	Override the model for a single corpus run

Flag	Required	Purpose
--all	One of `--all` or `--corpus`	Generate every built-in corpus in one run
--corpus CORPUS	One of `--all` or `--corpus`	Generate exactly one corpus: `alice`, `fastapi`, `space`, or `words`
--model MODEL	No	Override the embedding model for the selected run

9. Recommended models

Revelio works with any OpenAI-compatible API. Smaller instruction-following models tend to work better for RAG than large RLHF-trained ones - they follow the system prompt more faithfully and avoid over-hedging when context is provided.

Model	Via	Notes
mistralai/mistral-small-3.1-24b-instruct:free	OpenRouter (free)	Best free option - fast, follows instructions well
meta-llama/llama-3.1-8b-instruct	OpenRouter	Solid, widely supported
mistral-small3.1	Ollama (local)	Runs fully offline

To use OpenRouter, set the base URL to https://openrouter.ai/api/v1 and supply your API key. For Ollama, set it to http://localhost:11434/v1 with no key required. Both can be configured at runtime from the settings menu without restarting.

Getting an OpenRouter API key

Go to openrouter.ai and sign in.
Open Keys in the sidebar and create a new key. Free-tier models (marked :free) work with no credit balance.
Copy the key and paste it into the API Key field in Revelio's settings menu.
Set the model ID - e.g. mistralai/mistral-small-3.1-24b-instruct:free. You can browse all available models at openrouter.ai/models.

Using any other OpenAI-compatible provider

Any provider that implements the OpenAI chat completions API works - OpenAI itself, Together AI, Groq, LM Studio, vLLM, etc. The settings menu has three fields:

Field	Example
Base URL	https://api.groq.com/openai/v1
Model	llama3-8b-8192
API Key	gsk_...

The key is never sent to Revelio's server - it travels directly from your browser to the LLM provider in the Authorization header.