
[DOC] Add cloud docs, architecture docs, data model doc #4859


Merged · 8 commits · Jun 25, 2025
20 changes: 19 additions & 1 deletion docs/docs.trychroma.com/markdoc/content/cloud/getting-started.md
@@ -2,4 +2,22 @@

Our fully managed hosted service, **Chroma Cloud** is here. You can now [sign up](https://trychroma.com/signup) for early access.

More documentation for Chroma Cloud users coming soon!
**Chroma Cloud** is a managed offering of [Distributed Chroma](../docs/overview/architecture), operated by the same database and search engineers who designed the system. Under the hood, it's the exact same Apache 2.0–licensed Chroma: no forks, no divergence, just the open-source engine running at scale. Chroma Cloud is serverless, so you don't have to provision servers or think about operations, and it is billed [based on usage](./pricing).

### Easy to use and operate

Chroma Cloud is designed to require minimal configuration while still delivering top-tier performance, scale, and reliability. You can get started in under 30 seconds, and as your workload grows, Chroma Cloud handles scaling automatically—no tuning, provisioning, or operations required. Its architecture is built around a custom Rust-based execution engine and high-performance vector and full-text indexes, enabling fast query performance even under heavy loads.

### Reliability

Reliability and accuracy are core to the design. Chroma Cloud is thoroughly tested; production systems achieve over 90% recall and are continuously monitored for correctness. Thanks to its object storage–based persistence layer, Chroma Cloud is often an order of magnitude more cost-effective than alternatives, without compromising on performance or durability.

### Security and Deployment

Chroma Cloud is SOC 2 Type I certified (Type II in progress) and offers deployment flexibility to match your needs. You can sign up for our fully managed multi-tenant cluster, currently running in AWS us-east-1, or contact us for a single-tenant deployment managed by Chroma or hosted in your own VPC (BYOC). If you ever want to self-host open-source Chroma, we will help you transition your data from Cloud to your self-managed deployment.

### Dashboard

Our web dashboard lets your team work together to view your data and ensure data quality in your collections. It also serves as a touchpoint for viewing billing data and usage telemetry.

Chroma Cloud is open-source at its core, built on the exact same Apache 2.0 codebase available to everyone. Whether you’re building a prototype or running a mission-critical production workload, Chroma Cloud is the fastest path to reliable, scalable, and accurate retrieval.
80 changes: 80 additions & 0 deletions docs/docs.trychroma.com/markdoc/content/cloud/pricing.md
@@ -0,0 +1,80 @@
# Pricing

Chroma Cloud uses a simple, transparent, usage-based pricing model. You pay for what you use across **writes**, **reads**, and **storage**—with no hidden fees or tiered feature gating.

Need an estimate? Try our [pricing calculator](https://trychroma.com/pricing).

## Writes

Chroma Cloud charges **$2.50 per logical GiB** written via an add, update, or upsert.

- A *logical GiB* is the raw, uncompressed size of the data you send to Chroma—regardless of how it's stored or indexed internally.
- You are only billed once per write, not for background compactions or reindexing.
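As a back-of-the-envelope check, the write charge can be sketched in a few lines of Python (the helper name and constant are illustrative, not part of any Chroma SDK):

```python
# The write-pricing rule above, as a hypothetical helper; the rate and the
# notion of "logical" bytes follow the description in this section.
WRITE_RATE_PER_GIB = 2.50  # USD per logical GiB written

def write_cost(logical_bytes: int) -> float:
    """Cost of an add/update/upsert, billed once on the raw payload size."""
    return logical_bytes / (1024 ** 3) * WRITE_RATE_PER_GIB

# Writing 4 GiB of raw data costs 4 x $2.50 = $10.00, regardless of any
# background compaction or reindexing that happens later.
assert write_cost(4 * 1024 ** 3) == 10.0
```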

## Reads

Read costs are based on both the amount of data scanned and the volume of data returned:

- **$0.0075 per TiB scanned, per query unit**
- **$0.09 per GiB returned**

**How queries are counted:**

- A single vector similarity query counts as one query.
- Each metadata or full-text predicate in a query counts as an additional query.
- Full-text and regex filters are billed as *(N – 2)* queries, where *N* is the number of characters in the search string.

**Example:**

{% TabbedCodeBlock %}

{% Tab label="python" %}
```python
collection.query(
query_embeddings=[[1.0, 2.3, 1.1, ...]],
where_document={"$contains": "hello world"}
)
```
{% /Tab %}

{% Tab label="typescript" %}
```typescript
await collection.query({
    queryEmbeddings: [[1.0, 2.3, 1.1, ...]],
    whereDocument: { "$contains": "hello world" },
});
```
{% /Tab %}

{% /TabbedCodeBlock %}

For the query above (a single vector search plus an 11-character full-text search, i.e. 1 + 9 = 10 query units per query), running 10,000 such queries against 10 GiB of data incurs:

- 10,000 queries × 10 units (1 vector + 9 full-text) = 100,000 query units
- 10 GiB ≈ 0.01 TiB scanned → 100,000 × 0.01 TiB × $0.0075 = **$7.50**
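The arithmetic above can be sketched as follows (helper names are illustrative and not part of the Chroma API; the query-unit rule is the one described in this section):

```python
# Rates and rules as described in this section; these helpers are
# illustrative, not part of any Chroma SDK.
SCAN_RATE_PER_TIB = 0.0075  # USD per TiB scanned, per query unit

def query_units(n_vector_queries: int, fulltext_strings: list[str]) -> int:
    """One unit per vector query, plus (N - 2) units per N-character full-text filter."""
    return n_vector_queries + sum(len(s) - 2 for s in fulltext_strings)

def scan_cost(units_total: int, tib_scanned: float) -> float:
    """Scan charge for a total number of query units over the data scanned."""
    return units_total * tib_scanned * SCAN_RATE_PER_TIB

units = query_units(1, ["hello world"])  # 1 vector + (11 - 2) full-text = 10 units
total = scan_cost(10_000 * units, 0.01)  # 100,000 units x 0.01 TiB x $0.0075
assert abs(total - 7.5) < 1e-6           # matches the $7.50 figure above
```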

## Storage

Storage is billed at **$0.33 per GiB per month**, prorated by the hour:

- Storage usage is measured in **GiB-hours** to account for fluctuations over time.
- Storage is billed based on the logical amount of data written.
- Caching, including the SSD caches Chroma uses internally, is not billed to you.
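A sketch of how GiB-hour proration works out, assuming a 730-hour billing month (that assumption, and the helper itself, are illustrative only):

```python
# Prorating the $0.33/GiB-month rate into GiB-hours; the 730-hour month
# (8,760 hours / 12) and the helper itself are assumptions for this sketch.
STORAGE_RATE_PER_GIB_MONTH = 0.33
HOURS_PER_MONTH = 730

def storage_cost(gib_hours: float) -> float:
    """Cost for a measured number of GiB-hours of logical data."""
    return gib_hours * STORAGE_RATE_PER_GIB_MONTH / HOURS_PER_MONTH

# 100 GiB held for half a month (365 hours) costs about half the monthly rate:
assert abs(storage_cost(100 * 365) - 16.5) < 1e-6
```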

## Frequently Asked Questions

**Is there a free tier?**

We offer $5 in credits to new users.

**How is multi-tenancy handled for billing?**

Billing is account-based. All data across your collections and tenants within a Chroma Cloud account is aggregated for pricing.

**Can I deploy Chroma in my own VPC?**

Yes. We offer a BYOC (bring your own cloud) option for single-tenant deployments. [Contact us](mailto:[email protected]) for more details.

**Do I get charged for background indexing?**

No. You’re only billed for the logical data you write and the storage you consume. Background jobs like compaction or reindexing do not generate additional write or read charges.
25 changes: 25 additions & 0 deletions docs/docs.trychroma.com/markdoc/content/cloud/quotas-limits.md
@@ -0,0 +1,25 @@
# Quotas & Limits

To ensure stability and fairness in a multi-tenant environment, Chroma Cloud enforces input and query quotas across all user-facing operations. These limits are designed to strike a balance between performance, reliability, and ease of use for the majority of workloads.

Most quotas can be increased upon request, once a clear need has been demonstrated. If your application requires higher limits, please [contact us](mailto:[email protected]). We are happy to help.

| **Quota** | **Value** |
| --- | --- |
| Maximum embedding dimensions | 3072 |
| Maximum document bytes | 16,384 |
| Maximum URI bytes | 128 |
| Maximum ID size bytes | 128 |
| Maximum metadata value size bytes | 256 |
| Maximum metadata key size bytes | 36 |
| Maximum number of metadata keys | 16 |
| Maximum number of where predicates | 8 |
| Maximum full-text or regex search string length | 256 |
| Maximum number of results returned | 100 |
| Maximum number of concurrent reads per collection | 5 |
| Maximum number of concurrent writes per collection | 5 |
| Maximum number of collections | 1,000,000 |

These limits apply per request or per collection as appropriate. For example, concurrent read/write limits are tracked independently per collection, and full-text query limits apply to the length of the input string, not the number of documents searched.
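If you want to catch quota violations before sending a write, a client-side pre-check along these lines is one option (the quota names and the helper are hypothetical, not a Chroma API; only a few of the limits above are shown):

```python
# Hypothetical client-side pre-check against a few of the per-record quotas
# in the table above; names and structure are illustrative only.
QUOTAS = {
    "max_embedding_dimensions": 3072,
    "max_document_bytes": 16_384,
    "max_id_bytes": 128,
    "max_metadata_keys": 16,
}

def validate_record(id_: str, embedding: list[float], document: str, metadata: dict) -> list[str]:
    """Return a list of quota violations for one record (empty means OK)."""
    errors = []
    if len(embedding) > QUOTAS["max_embedding_dimensions"]:
        errors.append("embedding has too many dimensions")
    if len(document.encode("utf-8")) > QUOTAS["max_document_bytes"]:
        errors.append("document too large")
    if len(id_.encode("utf-8")) > QUOTAS["max_id_bytes"]:
        errors.append("id too long")
    if len(metadata) > QUOTAS["max_metadata_keys"]:
        errors.append("too many metadata keys")
    return errors

assert validate_record("doc-1", [0.1] * 3072, "hello", {"source": "web"}) == []
```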

If you expect to approach these limits, we recommend reaching out early so we can ensure your account is configured accordingly.
@@ -0,0 +1,98 @@
# Architecture

Chroma is designed with a modular architecture that prioritizes performance and ease of use. It scales seamlessly from local development to large-scale production, while exposing a consistent API across all deployment modes.

Chroma delegates problems of data durability, as much as possible, to trusted subsystems such as SQLite and cloud object storage, focusing the system design on the core problems of data management and information retrieval.

## Deployment Modes

Chroma runs wherever you need it to, supporting everything from local experimentation to large-scale production workloads.

- **Local**: as an embedded library - great for prototyping and experimentation.
- **Single Node**: as a single-node server - great for small to medium scale workloads of < 10M records in a handful of collections.
- **Distributed**: as a scalable distributed system - great for large scale production workloads, supporting millions of collections.

You can use [Chroma Cloud](https://www.trychroma.com/signup), which is a managed offering of distributed Chroma.

## Core Components

Regardless of deployment mode, Chroma is composed of five core components. Each plays a distinct role in the system and operates over the shared [Chroma data model](../overview/data-model).

![architecture](/architecture.png)

### The Gateway

The entrypoint for all client traffic.

- Exposes a consistent API across all modes.
- Handles authentication, rate-limiting, quota management, and request validation.
- Routes requests to downstream services.

### The Log

Chroma’s write-ahead log.

- All writes are recorded here before acknowledgment to clients.
- Ensures atomicity across multi-record writes.
- Provides durability and replay in distributed deployments.


### The Query Executor

Responsible for **all read operations.**

- Vector similarity, full-text and metadata search.
- Maintains a combination of in-memory and on-disk indexes, and coordinates with the Log to serve consistent results.

### The Compactor

A service that periodically builds and maintains indexes.

- Reads from the Log and builds updated vector / full-text / metadata indexes.
- Writes materialized index data to shared storage.
- Updates the System Database with metadata about new index versions.

### The System Database

Chroma’s internal catalog.

- Tracks tenants, collections, and their metadata.
- In distributed mode, also manages cluster state (e.g., query/compactor node membership).
- Backed by a SQL database.

## Storage & Runtime

These components operate differently depending on the deployment mode, particularly in how they use storage and the runtime they operate in.

- In Local and Single Node mode, all components share a process and use the local filesystem for durability.
- In **Distributed** mode, components are deployed as independent services.
- The log and built indexes are stored in cloud object storage.
- The system catalog is backed by a SQL database.
- All services use local SSDs as caches to reduce object storage latency and cost.

## Request Sequences

### Read Path

![read_path](/read_path.png)

1. Request arrives at the gateway, where it is authenticated, checked against quota limits, rate limited and transformed into a logical plan.
2. This logical plan is routed to the relevant query executor. In distributed Chroma, a rendezvous hash on the collection id is used to route the query to the correct nodes and provide cache coherence.
3. The query executor transforms the logical plan into a physical plan for execution, reads from its storage layer, and performs the query. The query executor pulls data from the log to ensure a consistent read.
4. The request is returned to the gateway and subsequently to the client.
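The rendezvous hashing used in step 2 can be sketched as follows (a generic highest-random-weight implementation, not Chroma's actual code):

```python
import hashlib

def route(collection_id: str, nodes: list[str]) -> str:
    """Route a collection to the node with the highest hash score for it."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{collection_id}".encode()).digest()
        return int.from_bytes(digest, "big")
    return max(nodes, key=score)

nodes = ["query-node-0", "query-node-1", "query-node-2"]
# Routing is deterministic, so repeated queries for the same collection hit
# the same node (and therefore the same warm cache).
assert route("my-collection", nodes) == route("my-collection", nodes)
```

A property worth noting about this scheme: when a node is added or removed, only the collections whose highest-scoring node changed are re-routed, which keeps cache churn low.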

### Write Path

![write_path](/write_path.png)

1. Request arrives at the gateway, where it is authenticated, checked against quota limits, rate limited and then transformed into a log of operations.
2. The log of operations is forwarded to the write-ahead-log for persistence.
3. After being persisted by the write-ahead-log, the gateway acknowledges the write.
4. The compactor periodically pulls from the write-ahead-log and builds new index versions from the accumulated writes. These indexes are optimized for read performance and include vector, full-text, and metadata indexes.
5. Once new index versions are built, they are written to storage and registered in the system database.

## Tradeoffs

Distributed Chroma is built on object storage to ensure the durability of your data and to deliver low costs. Object storage has extremely high throughput, easily capable of saturating a single node's network bandwidth, but this comes at the cost of a relatively high latency floor of ~10-20ms.

To reduce the overhead of this latency floor, Distributed Chroma aggressively leverages SSD caching. When you first query a collection, a subset of the data needed to answer the query is read selectively from object storage, incurring a cold-start latency penalty. In the background, the SSD cache is loaded with the collection's data. Once the collection is fully warm, queries are served entirely from SSD.
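The cold-start / warm-read behavior can be modeled with a toy two-tier cache (latency numbers are illustrative, taken loosely from the figures above; this is not Chroma's implementation):

```python
# Toy model of the cold-start / warm-read behavior described above: the first
# access to a collection misses the SSD cache and pays object-storage latency;
# later accesses are served from SSD. Latencies are illustrative only.
OBJECT_STORE_LATENCY_MS = 15.0  # within the ~10-20ms floor described above
SSD_LATENCY_MS = 0.1            # assumed SSD read latency

class SsdCache:
    def __init__(self) -> None:
        self._cached: set[str] = set()

    def read(self, collection_id: str) -> float:
        """Return the latency paid for this read, warming the cache on a miss."""
        if collection_id in self._cached:
            return SSD_LATENCY_MS
        self._cached.add(collection_id)  # warm-up, modeled here as immediate
        return OBJECT_STORE_LATENCY_MS

cache = SsdCache()
assert cache.read("c1") == OBJECT_STORE_LATENCY_MS  # cold start
assert cache.read("c1") == SSD_LATENCY_MS           # warm
```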
@@ -0,0 +1,24 @@
# Chroma Data Model

Chroma’s data model is designed to balance simplicity, flexibility, and scalability. It introduces a few core abstractions—**Tenants**, **Databases**, and **Collections**—that allow you to organize, retrieve, and manage data efficiently across environments and use cases.

### Collections

A **collection** is the fundamental unit of storage and querying in Chroma. Each collection contains a set of items, where each item consists of:

- An ID uniquely identifying the item
- An **embedding vector**
- Optional **metadata** (key-value pairs)
- Optionally, the **document** the embedding was derived from

Collections are independently indexed and are optimized for fast retrieval using **vector similarity**, **full-text search**, and **metadata filtering**. In distributed deployments, collections can be sharded or migrated across nodes as needed; the system transparently manages paging them in and out of memory based on access patterns.
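The item shape described above can be sketched as a dataclass (field names mirror the prose; this is not the Chroma client API):

```python
# A minimal sketch of the item shape described above; illustrative only,
# not the Chroma client API.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Item:
    id: str                         # unique identifier within the collection
    embedding: list[float]          # the embedding vector
    metadata: dict[str, str] = field(default_factory=dict)  # optional key-value pairs
    document: Optional[str] = None  # optional source document for the embedding

item = Item(id="doc-1", embedding=[0.1, 0.2], metadata={"lang": "en"}, document="hello")
```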

### Databases

Collections are grouped into **databases**, which serve as a logical namespace. This is useful for organizing collections by purpose—for example, separating environments like "staging" and "production", or grouping applications under a common schema.

Each database contains multiple collections, and each collection name must be unique within a database.

### Tenants

At the top level of the model is the **tenant**, which represents a single user, team, or account. Tenants provide complete isolation: no data or metadata is shared across tenants. All access control, quota enforcement, and billing are scoped to the tenant level.
10 changes: 9 additions & 1 deletion docs/docs.trychroma.com/markdoc/content/sidebar-config.ts
@@ -28,6 +28,14 @@ const sidebarConfig: AppSection[] = [
id: "getting-started",
name: "Getting Started",
},
{
id: "architecture",
name: "Architecture",
},
{
id: "data-model",
name: "Data Model",
},
{
id: "roadmap",
name: "Roadmap",
@@ -100,7 +108,7 @@ const sidebarConfig: AppSection[] = [
name: "Chroma Cloud",
icon: CloudIcon,
tag: "",
pages: [{ id: "getting-started", name: "Getting Started" }],
pages: [{ id: "getting-started", name: "Getting Started" }, { id: "pricing", name: "Pricing" }, { id: "quotas-limits", name: "Quotas & Limits" }],
},
{
id: "production",
Binary file added docs/docs.trychroma.com/public/architecture.png
Binary file added docs/docs.trychroma.com/public/read_path.png
Binary file added docs/docs.trychroma.com/public/write_path.png