How to Set Up pgvector for Vector Search
Why pgvector
pgvector eliminates the need for a separate vector database in your infrastructure. Your application data, metadata, and vectors all live in one database, which means you can join vector search results with relational data in a single query. There is no synchronization between systems, no additional authentication layer, and no separate backup strategy. For applications under 10 million vectors, pgvector handles the workload on a single PostgreSQL instance with HNSW indexing.
The trade-off is that pgvector is a single-node solution. It scales vertically (bigger machine, more RAM) but not horizontally (multiple machines). Dedicated vector databases like Pinecone, Qdrant, and Weaviate offer distributed architectures that scale to billions of vectors. If your application needs that scale, pgvector is not the right choice. For the majority of applications, which operate in the hundreds of thousands to low millions of vectors, pgvector is simpler, cheaper, and fast enough.
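As a concrete illustration of the single-query advantage, a sketch that joins vector matches against relational data; the users table, its columns, and the owner_id foreign key are hypothetical:

```sql
-- One query: vector search restricted to documents owned by active pro users
SELECT d.id, d.content, u.email
FROM documents d
JOIN users u ON u.id = d.owner_id
WHERE u.plan = 'pro' AND u.active
ORDER BY d.embedding <=> $1::vector
LIMIT 10;
```

With a separate vector database, the same result would require a vector query, an ID lookup in PostgreSQL, and application-side merging.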
Step-by-Step Setup
pgvector is available as a package for most PostgreSQL distributions. On managed PostgreSQL services (AWS RDS, Google Cloud SQL, Azure Database, Supabase, Neon), pgvector is pre-installed and you just need to enable it. On self-managed PostgreSQL, install the package and then enable the extension.
```shell
# Ubuntu/Debian (PostgreSQL 16)
sudo apt install postgresql-16-pgvector

# Amazon Linux / RHEL
sudo yum install pgvector_16

# macOS with Homebrew
brew install pgvector

# Or compile from source
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```

```sql
-- Enable in your database
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT extversion FROM pg_extension WHERE extname = 'vector';
```

Add a vector column to your documents table. The dimension must match your embedding model's output size. For OpenAI text-embedding-3-small, use 1536. For text-embedding-3-large, use 3072 (or a truncated size like 1536 if using Matryoshka embeddings). For Cohere embed-v4, use 1024.
```sql
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT now()
);

-- If adding to an existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);
```

Generate embeddings from your text using your embedding model's API, then insert them as vector literals. pgvector accepts vectors as bracket-enclosed, comma-separated values.
```python
import json

import psycopg2
from psycopg2.extras import execute_values

def get_embedding(text: str) -> list[float]:
    # Replace with your embedding model's API call; must return a list of
    # floats whose length matches the declared vector dimension (1536 here)
    raise NotImplementedError

conn = psycopg2.connect("dbname=myapp")
cur = conn.cursor()

# Insert a document with its embedding. json.dumps renders the list as
# "[0.1, 0.2, ...]", which pgvector parses as a vector literal.
text = "PostgreSQL connection pooling with PgBouncer..."
embedding = get_embedding(text)
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    (text, json.dumps(embedding)),
)
conn.commit()

# Batch insert for efficiency (documents is an iterable of dicts
# with a "content" key)
data = [
    (doc["content"], json.dumps(get_embedding(doc["content"])))
    for doc in documents
]
execute_values(
    cur,
    "INSERT INTO documents (content, embedding) VALUES %s",
    data,
    template="(%s, %s::vector)",
)
conn.commit()
```

Without an index, pgvector performs exact nearest-neighbor search by scanning every vector. This is accurate but slow for large tables. An HNSW (Hierarchical Navigable Small World) index enables fast approximate nearest-neighbor search with 95 to 99% recall at sub-millisecond latency.
```sql
-- Create ONE HNSW index, using the operator class that matches
-- the distance operator your queries will use

-- Cosine distance
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops);

-- Dot product (inner product) distance
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_ip_ops);

-- L2 (Euclidean) distance
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_l2_ops);

-- With tuning parameters
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 128);
```

Index builds on large tables can take a long time; query pg_stat_progress_create_index during construction to track progress.
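While the build runs, the progress view reports the current phase and how many blocks have been processed; a minimal sketch of checking it from a second session:

```sql
-- Run in a second session while CREATE INDEX is executing
SELECT phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index;
```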
pgvector provides distance operators for each metric. The <=> operator computes cosine distance (1 minus cosine similarity), <#> computes negative inner product, and <-> computes L2 distance. Lower values mean more similar for all three operators.
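To make the operator semantics concrete, here is a small pure-Python sketch of what each operator computes. This is just the math, not pgvector's implementation:

```python
import math

def l2_distance(a, b):
    """What <-> computes: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """What <#> computes: negated dot product, so lower means more similar."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """What <=> computes: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Orthogonal vectors have no similarity: cosine distance 1.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
# Identical vectors have distance 0 under L2
print(l2_distance([1.0, 2.0], [1.0, 2.0]))      # 0.0
```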
```sql
-- Find the 10 most similar documents (cosine distance)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- With metadata filtering (filter BEFORE vector search for efficiency)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'technical'
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- Hybrid: combine with full-text search
WITH vector_matches AS (
    SELECT id, 1 - (embedding <=> $1::vector) AS vscore
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT 20
),
text_matches AS (
    SELECT id, ts_rank(to_tsvector('english', content),
                       plainto_tsquery('english', $2)) AS tscore
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
    ORDER BY tscore DESC
    LIMIT 20
)
SELECT COALESCE(v.id, t.id) AS id,
       COALESCE(v.vscore, 0) * 0.7 + COALESCE(t.tscore, 0) * 0.3 AS score
FROM vector_matches v
FULL OUTER JOIN text_matches t ON v.id = t.id
ORDER BY score DESC
LIMIT 10;
```

Three HNSW parameters control the speed/accuracy trade-off. At query time, ef_search controls how many candidates the index evaluates: higher values give better recall but slower queries. At build time, m and ef_construction control index quality: higher values give better recall at the cost of slower index builds and more memory.
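Because hnsw.ef_search is an ordinary run-time setting, you can also raise it for a single query without affecting the rest of the session by using SET LOCAL inside a transaction; a sketch:

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 200;  -- applies only until COMMIT/ROLLBACK
SELECT id FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
COMMIT;
```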
```sql
-- Increase ef_search for better recall (default is 40)
SET hnsw.ef_search = 100;

-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_documents_embedding'));

-- Monitor query performance
EXPLAIN ANALYZE
SELECT id FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Ensure enough memory for the index
-- In postgresql.conf:
--   shared_buffers = 4GB          (or 25% of total RAM)
--   effective_cache_size = 12GB   (or 75% of total RAM)
--   maintenance_work_mem = 2GB    (for index building)
```

Scaling Considerations
pgvector handles up to roughly 5 to 10 million 1536-dimensional vectors on a single machine with 32 GB or more of RAM. Beyond that, consider partitioning the table by a natural partition key (tenant, date, category) so each partition's index fits comfortably in memory. If you need to scale beyond what a single machine can handle, consider migrating to a distributed vector database or using pgvector with Citus for distributed PostgreSQL.
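A sketch of the partitioning approach, assuming tenant_id is the partition key and using a hypothetical documents_partitioned table (adjust names to your schema). Each partition gets its own HNSW index, so a tenant-filtered query touches only one small index:

```sql
CREATE TABLE documents_partitioned (
    id BIGSERIAL,
    tenant_id INT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),
    PRIMARY KEY (tenant_id, id)  -- partition key must be part of the PK
) PARTITION BY LIST (tenant_id);

CREATE TABLE documents_tenant_1 PARTITION OF documents_partitioned
    FOR VALUES IN (1);

-- An index created on the parent is built on every partition
CREATE INDEX ON documents_partitioned
    USING hnsw (embedding vector_cosine_ops);
```

Queries that include `WHERE tenant_id = ...` are pruned to a single partition, whose index is far more likely to fit in memory than one global index.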
For applications using Adaptive Recall, vector storage and indexing are managed as part of the memory infrastructure. You do not need to set up, tune, or maintain pgvector or any other vector database yourself. The system handles embedding, indexing, and querying alongside the cognitive scoring and knowledge graph traversal that complement vector similarity.
Skip the database setup. Adaptive Recall provides managed vector search combined with cognitive scoring and graph traversal, with no infrastructure to maintain.
Try It Free