Vector Database · Apache-2.0

ChromaDB

Lightweight, local-first open-source vector database. The default embedding store for RAG applications with simple Python/JS APIs and zero-config setup.

Platforms: cross-platform, Docker

ChromaDB is a lightweight, open-source vector database designed to be the easiest way to store and retrieve embeddings for AI applications. It works out of the box with zero configuration, runs in-process or as a standalone server, and provides simple Python and JavaScript APIs. For developers building RAG applications who want a local-first vector store that just works without infrastructure complexity, ChromaDB is the default choice used by LangChain, LlamaIndex, and most tutorial content.

Key Features

Zero-configuration start. ChromaDB runs in-process with a single pip install and a few lines of Python. No separate server to start, no database to configure, no schema to define. This makes it the fastest path from zero to working vector search, ideal for prototyping and small-to-medium applications.

Local-first architecture. Data is stored locally by default using SQLite and an embedded HNSW index. No cloud services, no network calls, no data leaving your machine. For server deployments, ChromaDB can run as a standalone service with client-server architecture.

Simple API. The API follows a collection-based model: create a collection, add documents with metadata, and query by similarity. ChromaDB handles embedding generation automatically when configured with an embedding function, or accepts pre-computed embeddings.

Built-in embedding functions. ChromaDB includes integrations with OpenAI, Hugging Face, Cohere, and sentence-transformers for automatic embedding generation. Point it at a local sentence-transformers model for fully offline operation.

Metadata filtering. Queries can combine vector similarity search with metadata filters using `where` clauses. Filter on any metadata field attached to documents, enabling hybrid retrieval that combines semantic and structured search.

Multi-modal support. ChromaDB supports storing and querying embeddings from any modality — text, images, audio — as long as they share the same embedding space. This enables multi-modal retrieval applications.

When to Use ChromaDB

Choose ChromaDB for prototyping RAG applications, building local AI tools that need vector search, and small-to-medium production deployments. It is the right choice when simplicity and fast iteration are priorities, when you want local-first operation, and when your dataset fits comfortably on a single machine.

Ecosystem Role

ChromaDB is the default vector database for the local AI ecosystem. LangChain, LlamaIndex, AnythingLLM, and most RAG tutorials use it as their primary example. For production deployments requiring distributed scaling, horizontal replication, or advanced features, Qdrant, Weaviate, or pgvector may be more appropriate. ChromaDB’s strength is getting started quickly without infrastructure overhead.