
Can Vector Search Work Without an External Database?

Yes. For small datasets (under 100K vectors), you can run vector search entirely in memory using NumPy or FAISS without any database. Load your vectors into an array at startup, compute cosine similarity against the query vector, and return the top results. For persistence without a database server, FAISS supports file-based indexes and SQLite has vector extensions. These options work well for prototypes, local applications, CLI tools, and serverless functions where adding a database is unnecessary overhead.

In-Memory Search with NumPy

The simplest vector search implementation is a single NumPy matrix-vector product. Store all vectors in a 2D NumPy array, and for each query, compute the dot product (or cosine similarity) between the query vector and all stored vectors. This is exact nearest neighbor search with perfect recall, and NumPy's optimized BLAS operations make it fast enough for tens of thousands of vectors.

```python
import numpy as np

class SimpleVectorSearch:
    def __init__(self):
        self.vectors = []
        self.documents = []
        self.matrix = None  # cached, L2-normalized matrix

    def add(self, text: str, embedding: list):
        self.documents.append(text)
        self.vectors.append(embedding)
        self.matrix = None  # invalidate cache

    def search(self, query_embedding: list, top_k: int = 5):
        if self.matrix is None:
            # Rebuild and normalize the matrix so the dot product
            # below is cosine similarity.
            self.matrix = np.array(self.vectors)
            norms = np.linalg.norm(self.matrix, axis=1, keepdims=True)
            self.matrix = self.matrix / norms
        query = np.array(query_embedding)
        query = query / np.linalg.norm(query)
        similarities = self.matrix @ query
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [
            {"text": self.documents[i], "score": float(similarities[i])}
            for i in top_indices
        ]

    def save(self, path: str):
        # np.savez appends ".npz" to the path
        np.savez(path, vectors=np.array(self.vectors),
                 documents=np.array(self.documents))

    def load(self, path: str):
        data = np.load(path + ".npz", allow_pickle=True)
        self.vectors = data["vectors"].tolist()
        self.documents = data["documents"].tolist()
        self.matrix = None
```

This approach handles up to roughly 50K to 100K vectors with sub-100ms query times. Beyond that, exact search becomes slow because every query computes similarity against every stored vector. The O(n) query complexity means doubling the vectors doubles the query time.
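The linear scaling is easy to observe directly. A minimal timing sketch (the corpus sizes, dimension, and iteration count are arbitrary choices for illustration):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
dim = 1536  # e.g. OpenAI text-embedding-3-small dimension

# Time one exact query at two corpus sizes to show the O(n) growth:
# doubling the vectors roughly doubles the per-query time.
for n in (10_000, 20_000):
    matrix = rng.standard_normal((n, dim)).astype(np.float32)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    query = rng.standard_normal(dim).astype(np.float32)
    query /= np.linalg.norm(query)

    start = time.perf_counter()
    for _ in range(10):
        sims = matrix @ query                  # similarity vs. every vector
        top = np.argsort(sims)[-5:][::-1]      # top-5 indices
    elapsed = (time.perf_counter() - start) / 10
    print(f"{n} vectors: {elapsed * 1000:.2f} ms per query")
```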

FAISS: Fast In-Memory and File-Based Search

FAISS (Facebook AI Similarity Search) is a library that provides both exact and approximate nearest neighbor search with optimized C++ implementations. It runs in-process (no server), supports file-based persistence, and handles millions of vectors efficiently with indexing.

```python
import faiss
import numpy as np

# Exact search (good for under 100K vectors)
dimension = 1536
index = faiss.IndexFlatIP(dimension)  # inner product (cosine for normalized vectors)

# Add vectors (must be float32)
vectors = np.array(embeddings, dtype=np.float32)
faiss.normalize_L2(vectors)  # normalizes in place
index.add(vectors)

# Search
query = np.array([query_embedding], dtype=np.float32)
faiss.normalize_L2(query)
distances, indices = index.search(query, k=10)

# Save to disk
faiss.write_index(index, "my_index.faiss")

# Load from disk
index = faiss.read_index("my_index.faiss")

# For larger datasets, use an HNSW index
index_hnsw = faiss.IndexHNSWFlat(dimension, 32)  # 32 neighbors per node
index_hnsw.hnsw.efSearch = 64
index_hnsw.add(vectors)  # vectors already normalized above
# Now handles millions of vectors with sub-ms queries
```

FAISS is the right choice when you need fast vector search without a database server. It is commonly used in serverless functions (load the index from S3 at cold start), CLI tools, Jupyter notebooks, and batch processing pipelines. The trade-off is that FAISS does not support concurrent writes from multiple processes, so it is best for read-heavy or single-writer workloads.
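The serverless cold-start pattern mentioned above, load the index once per process and reuse it across invocations, can be sketched without any cloud SDK. Here `download_index` is a hypothetical stand-in for fetching the serialized index from S3 and calling `faiss.read_index` on the local file:

```python
import functools

def download_index(path: str) -> dict:
    # Hypothetical placeholder: in a real function this would download
    # the file from S3 and return faiss.read_index(local_path).
    return {"path": path, "loaded": True}

@functools.lru_cache(maxsize=1)
def get_index(path: str = "/tmp/my_index.faiss"):
    # Runs once per warm container; later invocations reuse the cached index.
    return download_index(path)

def handler(event: dict) -> dict:
    index = get_index()  # cached after the first (cold-start) call
    return {"index_loaded": index["loaded"]}
```

Because the cache lives at module scope, only the first invocation after a cold start pays the download cost; warm invocations get the in-memory index for free.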

ChromaDB: Embedded Database

ChromaDB runs as an embedded database (in-process, no server) with automatic embedding and persistence. It wraps HNSW indexing with a document-oriented API that handles embedding, storage, and querying in a few lines of code. For prototyping and small-scale applications, it is the fastest path to working vector search.

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("docs")

# Add documents (ChromaDB can embed automatically)
collection.add(
    documents=["Database connection pooling guide...",
               "Authentication troubleshooting..."],
    ids=["doc1", "doc2"],
)

# Search
results = collection.query(
    query_texts=["how to configure connection pools"],
    n_results=5,
)
```

When You Need a Real Database

In-memory and embedded approaches stop working well when you need concurrent access from multiple application instances, real-time updates from multiple writers, vectors that exceed available RAM, or enterprise features like replication, backups, and monitoring. At that point, pgvector (if you have PostgreSQL) or a dedicated vector database (Qdrant, Pinecone, Weaviate) is the right step up.

The good news is that starting with a simple approach and migrating later is straightforward. The vector search interface (embed query, find top-k similar, return results) is the same regardless of the backend. Switching from NumPy to FAISS to pgvector to Qdrant changes the implementation but not the API contract.
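That stable contract can be made explicit in code. A minimal sketch (the names `VectorSearchBackend` and `NumpyBackend` are illustrative, not from any library) defines the interface once, so a FAISS-, pgvector-, or Qdrant-backed class can replace the NumPy version without touching callers:

```python
from typing import Protocol

import numpy as np

class VectorSearchBackend(Protocol):
    """The contract every backend satisfies: add vectors, return top-k matches."""
    def add(self, text: str, embedding: list) -> None: ...
    def search(self, query_embedding: list, top_k: int = 5) -> list: ...

class NumpyBackend:
    """Exact in-memory search; swap in a FAISS- or pgvector-backed class later."""
    def __init__(self):
        self.documents = []
        self.vectors = []

    def add(self, text: str, embedding: list) -> None:
        self.documents.append(text)
        self.vectors.append(embedding)

    def search(self, query_embedding: list, top_k: int = 5) -> list:
        # Cosine similarity via normalized dot products, as in the
        # NumPy example earlier.
        matrix = np.array(self.vectors, dtype=np.float32)
        matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
        query = np.array(query_embedding, dtype=np.float32)
        query /= np.linalg.norm(query)
        similarities = matrix @ query
        order = np.argsort(similarities)[-top_k:][::-1]
        return [{"text": self.documents[i], "score": float(similarities[i])}
                for i in order]
```

Application code depends only on `VectorSearchBackend`, so migrating to a database-backed implementation is a constructor change, not a rewrite.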

Adaptive Recall handles vector storage, search, and scaling as a managed service. Start simple, scale without migration.

Try It Free