[DOC] Add cloud docs, architecture docs, data model doc #4859
Merged (+255 −2)

**8 commits:**

- `c64aab9` [DOCS] Add cloud docs, architecture docs, data model doc (HammadB)
- `081eaaf` docs (HammadB)
- `00c02a0` images (HammadB)
- `f087db6` fix bugs (HammadB)
- `4c8ea46` image bg (HammadB)
- `7e1af5e` revert (HammadB)
- `641d339` Minor edits, fix links, and remove image backgrounds (itaismith)
- `c3fc6f2` small copy editS (jeffchuber)
# Pricing

Chroma Cloud uses a simple, transparent, usage-based pricing model. You pay for what you use across **writes**, **reads**, and **storage**, with no hidden fees or tiered feature gating.

Need an estimate? Try our [pricing calculator](https://trychroma.com/pricing).

## Writes

Chroma Cloud charges **$2.50 per logical GiB** written via an add, update, or upsert.

- A *logical GiB* is the raw, uncompressed size of the data you send to Chroma, regardless of how it is stored or indexed internally.
- You are only billed once per write, not for background compactions or reindexing.

## Reads

Read costs are based on both the amount of data scanned and the volume of data returned:

- **$0.0075 per TiB scanned**
- **$0.09 per GiB returned**

**How queries are counted:**

- A single vector similarity query counts as one query.
- Each metadata or full-text predicate in a query counts as an additional query.
- Full-text and regex filters are billed as *(N - 2)* queries, where *N* is the number of characters in the search string.

**Example:**

{% TabbedCodeBlock %}

{% Tab label="python" %}
```python
collection.query(
    query_embeddings=[[1.0, 2.3, 1.1, ...]],
    where_document={"$contains": "hello world"}
)
```
{% /Tab %}

{% Tab label="typescript" %}
```typescript
await collection.query({
    queryEmbeddings: [[1.0, 2.3, 1.1, ...]],
    whereDocument: { "$contains": "hello world" },
});
```
{% /Tab %}

{% /TabbedCodeBlock %}

For the query above (one vector search plus an 11-character full-text search on "hello world", billed as 1 + 9 = 10 query units), running 10,000 such queries against 10 GiB of data incurs:

- 10,000 queries × 10 units (1 vector + 9 full-text) = 100,000 query units
- 10 GiB ≈ 0.01 TiB scanned → 100,000 × 0.01 TiB × $0.0075 = **$7.50**
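
As a sanity check, the example's arithmetic can be reproduced directly; all prices and counts below are taken from this page:

```python
# Reproduce the read-cost example above using this page's prices.
PRICE_PER_TIB_SCANNED = 0.0075  # dollars

n_chars = len("hello world")          # N = 11 characters
fulltext_units = n_chars - 2          # (N - 2) rule -> 9 query units
units_per_query = 1 + fulltext_units  # 1 vector query + 9 full-text -> 10
num_queries = 10_000
data_scanned_tib = 0.01               # 10 GiB, rounded as in the example

cost = num_queries * units_per_query * data_scanned_tib * PRICE_PER_TIB_SCANNED
print(f"${cost:.2f}")  # $7.50
```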

## Storage

Storage is billed at **$0.33 per GiB per month**, prorated by the hour:

- Storage usage is measured in **GiB-hours** to account for fluctuations over time.
- Storage is billed based on the logical amount of data written.
- Caching, including the SSD caches Chroma uses internally, is not billed to you.
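
For intuition, here is a sketch of how GiB-hour proration works. The 730-hour month is an assumption for illustration; this page does not specify the exact hour count used for billing:

```python
# Illustrative GiB-hour proration; 730 hours/month is an assumed average.
RATE_PER_GIB_MONTH = 0.33  # dollars, from this page
HOURS_PER_MONTH = 730      # assumption, not a documented Chroma constant

def storage_cost(gib: float, hours_stored: float) -> float:
    """Cost of holding `gib` logical GiB for `hours_stored` hours."""
    gib_hours = gib * hours_stored
    return gib_hours / HOURS_PER_MONTH * RATE_PER_GIB_MONTH

# 100 GiB held for half the month costs half the monthly rate:
print(round(storage_cost(100, 365), 2))  # 16.5
```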

## Frequently Asked Questions

**Is there a free tier?**

We offer $5 in credits to new users.

**How is multi-tenancy handled for billing?**

Billing is account-based. All data across your collections and tenants within a Chroma Cloud account is aggregated for pricing.

**Can I deploy Chroma in my own VPC?**

Yes. We offer a BYOC (bring your own cloud) option for single-tenant deployments. [Contact us](mailto:[email protected]) for more details.

**Do I get charged for background indexing?**

No. You’re only billed for the logical data you write and the storage you consume. Background jobs like compaction or reindexing do not generate additional write or read charges.

**File:** `docs/docs.trychroma.com/markdoc/content/cloud/quotas-limits.md` (25 additions, 0 deletions)

# Quotas & Limits

To ensure stability and fairness in a multi-tenant environment, Chroma Cloud enforces input and query quotas across all user-facing operations. These limits are designed to strike a balance between performance, reliability, and ease of use for the majority of workloads.

Most quotas can be increased upon request once a clear need has been demonstrated. If your application requires higher limits, please [contact us](mailto:[email protected]). We are happy to help.

| **Quota** | **Value** |
| --- | --- |
| Maximum embedding dimensions | 3072 |
| Maximum document bytes | 16,384 |
| Maximum URI bytes | 128 |
| Maximum ID size bytes | 128 |
| Maximum metadata value size bytes | 256 |
| Maximum metadata key size bytes | 36 |
| Maximum number of metadata keys | 16 |
| Maximum number of where predicates | 8 |
| Maximum size of full-text or regex search | 256 |
| Maximum number of results returned | 100 |
| Maximum number of concurrent reads per collection | 5 |
| Maximum number of concurrent writes per collection | 5 |
| Maximum number of collections | 1,000,000 |

These limits apply per request or per collection as appropriate. For example, concurrent read/write limits are tracked independently per collection, and full-text query limits apply to the length of the input string, not the number of documents searched.

If you expect to approach these limits, we recommend reaching out early so we can ensure your account is configured accordingly.
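
A client-side pre-check against a few of the per-record quotas above might look like the following sketch. The `check_record` helper and `QUOTAS` dict are illustrative, not part of Chroma's API:

```python
# Hypothetical pre-validation against the quota table above; values from the
# table, helper names invented for this example.
QUOTAS = {
    "max_embedding_dimensions": 3072,
    "max_document_bytes": 16_384,
    "max_id_bytes": 128,
    "max_metadata_keys": 16,
}

def check_record(record: dict) -> list[str]:
    """Return a list of quota violations for a single record."""
    errors = []
    if len(record["embedding"]) > QUOTAS["max_embedding_dimensions"]:
        errors.append("embedding has too many dimensions")
    if len(record["document"].encode("utf-8")) > QUOTAS["max_document_bytes"]:
        errors.append("document too large")
    if len(record["id"].encode("utf-8")) > QUOTAS["max_id_bytes"]:
        errors.append("ID too long")
    if len(record.get("metadata", {})) > QUOTAS["max_metadata_keys"]:
        errors.append("too many metadata keys")
    return errors

record = {
    "id": "doc-1",
    "embedding": [0.0] * 768,
    "document": "hello world",
    "metadata": {"source": "web"},
}
violations = check_record(record)  # empty list -> safe to write
```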

**File:** `docs/docs.trychroma.com/markdoc/content/docs/overview/architecture.md` (98 additions, 0 deletions)

# Architecture

Chroma is designed with a modular architecture that prioritizes performance and ease of use. It scales seamlessly from local development to large-scale production, while exposing a consistent API across all deployment modes.

Wherever possible, Chroma delegates data durability to trusted sub-systems such as SQLite and cloud object storage, focusing the system design on the core problems of data management and information retrieval.

## Deployment Modes

Chroma runs wherever you need it to, supporting everything from local experimentation to large-scale production workloads.

- **Local**: as an embedded library - great for prototyping and experimentation.
- **Single Node**: as a single-node server - great for small to medium workloads of < 10M records in a handful of collections.
- **Distributed**: as a scalable distributed system - great for large-scale production workloads, supporting millions of collections.

You can also use [Chroma Cloud](https://www.trychroma.com/signup), a managed offering of distributed Chroma.

## Core Components

Regardless of deployment mode, Chroma is composed of five core components. Each plays a distinct role in the system and operates over the shared [Chroma data model](../overview/data-model).



### The Gateway

The entrypoint for all client traffic.

- Exposes a consistent API across all modes.
- Handles authentication, rate-limiting, quota management, and request validation.
- Routes requests to downstream services.

### The Log

Chroma’s write-ahead log.

- All writes are recorded here before being acknowledged to clients.
- Ensures atomicity across multi-record writes.
- Provides durability and replay in distributed deployments.

### The Query Executor

Responsible for **all read operations**.

- Vector similarity, full-text, and metadata search.
- Maintains a combination of in-memory and on-disk indexes, and coordinates with the Log to serve consistent results.

### The Compactor

A service that periodically builds and maintains indexes.

- Reads from the Log and builds updated vector, full-text, and metadata indexes.
- Writes materialized index data to shared storage.
- Updates the System Database with metadata about new index versions.

### The System Database

Chroma’s internal catalog.

- Tracks tenants, collections, and their metadata.
- In distributed mode, also manages cluster state (e.g., query/compactor node membership).
- Backed by a SQL database.

## Storage & Runtime

These components behave differently depending on the deployment mode, particularly in how they use storage and the runtime they operate in.

- In **Local** and **Single Node** modes, all components share a process and use the local filesystem for durability.
- In **Distributed** mode, components are deployed as independent services:
    - The log and built indexes are stored in cloud object storage.
    - The system catalog is backed by a SQL database.
    - All services use local SSDs as caches to reduce object storage latency and cost.

## Request Sequences

### Read Path



1. A request arrives at the gateway, where it is authenticated, checked against quota limits, rate-limited, and transformed into a logical plan.
2. The logical plan is routed to the relevant query executor. In distributed Chroma, a rendezvous hash on the collection ID is used to route the query to the correct nodes and provide cache coherence.
3. The query executor transforms the logical plan into a physical plan for execution, reads from its storage layer, and performs the query, pulling data from the log to ensure a consistent read.
4. The result is returned to the gateway and subsequently to the client.
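
The rendezvous hashing in step 2 can be sketched as follows. This is an illustrative implementation, not Chroma's actual routing code: every node is scored against the collection ID, and the highest score wins, so a given collection consistently lands on the same node.

```python
import hashlib

def route(collection_id: str, nodes: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: score every node
    against the key and pick the highest-scoring node."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{collection_id}:{node}".encode()).hexdigest()
        return int(digest, 16)
    return max(nodes, key=score)

nodes = ["query-0", "query-1", "query-2"]
chosen = route("collection-123", nodes)
# The same collection ID always routes to the same node, which is what
# lets each query node keep a coherent cache for its collections.
```

A useful property of rendezvous hashing: the winner over all nodes also wins over any subset containing it, so removing an unrelated node never reshuffles this collection.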

### Write Path



1. A request arrives at the gateway, where it is authenticated, checked against quota limits, rate-limited, and transformed into a log of operations.
2. The log of operations is forwarded to the write-ahead log for persistence.
3. Once the write-ahead log has persisted the operations, the gateway acknowledges the write.
4. The compactor periodically pulls from the write-ahead log and builds new index versions from the accumulated writes. These indexes are optimized for read performance and include vector, full-text, and metadata indexes.
5. Once new index versions are built, they are written to storage and registered in the system database.
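
The persist-then-acknowledge ordering in steps 2-3 can be pictured with a toy log. The `WriteAheadLog` class and its methods are invented for this sketch and are not Chroma's implementation:

```python
# Toy write-ahead log illustrating write-then-ack ordering; not Chroma's code.
class WriteAheadLog:
    def __init__(self) -> None:
        self._entries: list[list[tuple[str, str]]] = []

    def append(self, batch: list[tuple[str, str]]) -> int:
        """Persist a multi-record write as one atomic batch; return its offset."""
        self._entries.append(batch)
        return len(self._entries) - 1

    def pull_since(self, offset: int) -> list[list[tuple[str, str]]]:
        """What a compactor reads when building new index versions."""
        return self._entries[offset:]

wal = WriteAheadLog()
offset = wal.append([("add", "id-1"), ("add", "id-2")])  # persisted first...
acknowledged = True                                      # ...then the gateway acks
pending = wal.pull_since(0)                              # compactor sees the batch
```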

## Tradeoffs

Distributed Chroma is built on object storage to ensure the durability of your data and to keep costs low. Object storage has extremely high throughput, easily capable of saturating a single node's network bandwidth, but this comes at the cost of a relatively high latency floor of ~10-20 ms.

To reduce the overhead of this latency floor, distributed Chroma aggressively leverages SSD caching. When you first query a collection, a subset of the data needed to answer the query is read selectively from object storage, incurring a cold-start latency penalty. In the background, the SSD cache is loaded with the collection's data. Once the collection is fully warm, queries are served entirely from SSD.

**File:** `docs/docs.trychroma.com/markdoc/content/docs/overview/data-model.md` (24 additions, 0 deletions)

# Chroma Data Model

Chroma’s data model is designed to balance simplicity, flexibility, and scalability. It introduces a few core abstractions: **Tenants**, **Databases**, and **Collections**. These allow you to organize, retrieve, and manage data efficiently across environments and use cases.

### Collections

A **collection** is the fundamental unit of storage and querying in Chroma. Each collection contains a set of items, where each item consists of:

- An **ID** uniquely identifying the item
- An **embedding vector**
- Optional **metadata** (key-value pairs)
- A **document** associated with the embedding

Collections are independently indexed and optimized for fast retrieval using **vector similarity**, **full-text search**, and **metadata filtering**. In distributed deployments, collections can be sharded or migrated across nodes as needed; the system transparently manages paging them in and out of memory based on access patterns.

### Databases

Collections are grouped into **databases**, which serve as logical namespaces. This is useful for organizing collections by purpose: for example, separating environments like "staging" and "production", or grouping applications under a common schema.

Each database contains multiple collections, and each collection name must be unique within a database.

### Tenants

At the top level of the model is the **tenant**, which represents a single user, team, or account. Tenants provide complete isolation: no data or metadata is shared across tenants. All access control, quota enforcement, and billing are scoped to the tenant level.
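
The hierarchy described above (tenant → database → collection → item) can be pictured with plain data structures. These types are illustrative only, not Chroma's internal representation:

```python
# Illustrative sketch of the tenant -> database -> collection -> item hierarchy.
from dataclasses import dataclass, field

@dataclass
class Collection:
    name: str
    # id -> (embedding, metadata, document), matching the item fields above
    items: dict = field(default_factory=dict)

@dataclass
class Database:
    name: str
    collections: dict = field(default_factory=dict)  # names unique per database

@dataclass
class Tenant:
    name: str
    databases: dict = field(default_factory=dict)  # isolated from other tenants

tenant = Tenant("acme")
tenant.databases["production"] = Database("production")
docs = Collection("docs")
docs.items["id-1"] = ([0.1, 0.2, 0.3], {"source": "web"}, "hello world")
tenant.databases["production"].collections["docs"] = docs
```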