Vector Search
Nucleus includes built-in vector search with HNSW and IVFFlat indexes. Store embeddings alongside your relational data — no separate vector database needed.
The Vector Type
Declare a fixed-dimension vector column:
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
embedding Vector(384)
);
Inserting Vectors
Use the VECTOR() function to create vector values:
INSERT INTO documents (title, content, embedding)
VALUES (
'Getting Started',
'Learn how to use Neutron...',
VECTOR('[0.1, 0.5, 0.3, ...]')
);
Important: Always wrap the literal in VECTOR('...'). A bare string such as '[1,0,0]' is stored as text, not as a vector.
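Embeddings usually come from application code as a list of floats, so they have to be rendered into the `VECTOR('[...]')` form shown above. The following is a minimal sketch of that step in Python. The helper name `to_vector_literal` and its `dim` argument are hypothetical, and the sketch only builds the expression text. How your client library sends it to Nucleus, and whether it supports binding vectors as parameters, is not covered on this page.

```python
import math


def to_vector_literal(values, dim=None):
    """Render a sequence of numbers as a VECTOR('[...]') SQL expression.

    Hypothetical helper: it only formats text and never talks to a database.
    """
    floats = [float(v) for v in values]
    # Catch dimension mismatches early, e.g. dim=384 for a Vector(384) column.
    if dim is not None and len(floats) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(floats)}")
    # NaN and infinity are not meaningful embedding components.
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("embedding contains NaN or infinity")
    body = ", ".join(repr(v) for v in floats)
    return f"VECTOR('[{body}]')"


print(to_vector_literal([0.1, 0.5, 0.3], dim=3))  # VECTOR('[0.1, 0.5, 0.3]')
```

Because every element is converted with `float()` before formatting, the output can contain only numeric text, which keeps arbitrary strings out of the generated SQL.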
Similarity Search
Use VECTOR_DISTANCE to find similar vectors:
SELECT title, VECTOR_DISTANCE(embedding, VECTOR('[0.1, 0.4, 0.2, ...]'), 'l2') AS distance
FROM documents
ORDER BY distance
LIMIT 10;
Distance Metrics
| Metric | Measures | Best For |
|--------|----------|----------|
| l2 | Euclidean distance | General purpose |
| cosine | Cosine similarity | Text embeddings |
| dot | Inner product | Normalized vectors |
-- Cosine similarity
SELECT * FROM documents
ORDER BY VECTOR_DISTANCE(embedding, VECTOR('[...]'), 'cosine')
LIMIT 5;
-- Inner product
SELECT * FROM documents
ORDER BY VECTOR_DISTANCE(embedding, VECTOR('[...]'), 'dot')
LIMIT 5;
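To make the three metrics concrete, here is a small Python sketch of the underlying math. It illustrates the standard definitions only. This page does not say how Nucleus turns cosine similarity or inner product into the value returned by `VECTOR_DISTANCE` (for example as `1 - similarity` or as a negated product), so check the sort order on your own data before relying on it.

```python
import math


def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def dot_product(a, b):
    """Inner product: sum of element-wise products."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: straight-line distance between two points."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

print(l2_distance(a, b))        # 1.0
print(cosine_similarity(a, b))  # 1.0
print(dot_product(a, b))        # 2.0
print(cosine_similarity(a, c))  # 0.0
```

Here `a` and `b` point in the same direction, so their cosine similarity is 1.0 even though their Euclidean distance is not zero. For unit-length vectors the inner product equals the cosine similarity, which is why `dot` is listed as best for normalized vectors.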
Indexes
HNSW (Hierarchical Navigable Small World)
Best for high recall with moderate memory usage. Recommended for most use cases.
CREATE INDEX idx_docs_hnsw ON documents USING hnsw (embedding);
IVFFlat (Inverted File with Flat)
Best for large datasets where you can trade some recall for speed.
CREATE INDEX idx_docs_ivf ON documents USING ivfflat (embedding);
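Both descriptions above talk about recall: the fraction of the true nearest neighbors that an approximate index actually returns. A rough way to measure it is to compare the ids from an indexed query with the ids from an exact search for the same query vector. The sketch below assumes you have already collected both id lists. The function name `recall_at_k` is illustrative and not part of Nucleus.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true top-k neighbors present in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one id")
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


exact = [1, 2, 3, 4, 5]   # ids from an exact search
approx = [1, 2, 3, 9, 5]  # ids from an indexed search (id 4 was missed)
print(recall_at_k(exact, approx, k=5))  # 0.8
```

Averaging this value over a sample of representative queries gives a simple basis for comparing HNSW and IVFFlat on your own data.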
Filtering with Vectors
Combine vector search with SQL filters:
-- Find similar documents in a specific category
-- (assumes documents also has a category column)
SELECT title, VECTOR_DISTANCE(embedding, VECTOR('[...]'), 'cosine') AS distance
FROM documents
WHERE category = 'tutorials'
ORDER BY distance
LIMIT 10;
-- Join with other tables
-- (assumes a users table and a documents.author_id column)
SELECT d.title, u.name AS author, VECTOR_DISTANCE(d.embedding, VECTOR('[...]'), 'l2') AS dist
FROM documents d
JOIN users u ON d.author_id = u.id
ORDER BY dist
LIMIT 5;
RAG Pattern
A common pattern for Retrieval-Augmented Generation:
-- 1. Store document chunks with embeddings
CREATE TABLE chunks (
id SERIAL PRIMARY KEY,
doc_id INTEGER REFERENCES documents(id),
text TEXT,
embedding Vector(1536)
);
-- 2. Create HNSW index
CREATE INDEX idx_chunks_hnsw ON chunks USING hnsw (embedding);
-- 3. Retrieve relevant chunks for a query
SELECT text
FROM chunks
ORDER BY VECTOR_DISTANCE(embedding, VECTOR('[query_embedding...]'), 'cosine')
LIMIT 5;
-- 4. Pass retrieved chunks to your LLM as context
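Step 4 happens in application code rather than in SQL. As a sketch of what it can look like, the Python function below joins the retrieved chunk texts into a single prompt. The function name `build_prompt`, the prompt wording, and the `max_chars` budget are all illustrative assumptions. Running the query and calling the LLM are left out because they depend on your client library and model provider.

```python
def build_prompt(question, chunks, max_chars=4000):
    """Combine retrieved chunks (most relevant first) into one LLM prompt.

    chunks is the list of text values returned by the retrieval query.
    """
    parts = []
    used = 0
    for i, text in enumerate(chunks, start=1):
        entry = f"[{i}] {text.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before the context budget is exceeded
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


retrieved = ["HNSW is a graph index.", "IVFFlat uses lists."]
print(build_prompt("What is HNSW?", retrieved))
```

Numbering the chunks makes it easier to ask the model to cite which passage it used. The character budget is a crude stand-in for a token limit.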
Performance
- HNSW: ~1ms queries on 1M vectors (384 dimensions)
- IVFFlat: ~0.5ms queries with nprobe=10, at lower recall than HNSW
- Brute force: exact results at O(n) cost per query; use for small datasets (under 10K vectors)
SIMD-accelerated distance calculations are used automatically on supported hardware.
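For intuition about the brute-force option, the sketch below shows an exact nearest-neighbor scan in plain Python. It illustrates the technique only and is not how Nucleus implements it. The function name `brute_force_knn` and the `(id, vector)` row format are assumptions made for the example.

```python
import heapq
import math


def brute_force_knn(query, rows, k):
    """Exact k-nearest-neighbor search by scanning every row.

    rows is an iterable of (id, vector) pairs. Returns a list of
    (distance, id) tuples sorted from nearest to farthest.
    """
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # One distance computation per stored vector: this is the O(n) part.
    scored = ((l2(query, vec), row_id) for row_id, vec in rows)
    # nsmallest keeps only the k best candidates while scanning.
    return heapq.nsmallest(k, scored)


rows = [
    (1, [0.0, 0.0]),
    (2, [1.0, 0.0]),
    (3, [5.0, 5.0]),
    (4, [0.0, 2.0]),
]
print(brute_force_knn([0.0, 0.0], rows, k=2))  # [(0.0, 1), (1.0, 2)]
```

Every query touches every vector, so latency grows linearly with table size. HNSW and IVFFlat avoid that full scan by examining only a subset of candidates, which is where the recall trade-off comes from.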