Chroma
The Chroma DB and Local DB components read and write to Chroma vector stores using an instance of the Chroma vector store. They support remote or in-memory instances, with or without persistence.
For more information, see the following:
1. Hidden parameters
2. Search results output
3. Vector store instances
4. Chroma documentation
Chroma DB
You can use the Chroma DB component to read and write to a Chroma database in local storage or a remote Chroma server with options for persistence and caching. When writing, the component can create a new database or collection at the specified location.
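As a rough illustration of the three targets described above (remote server, persistent local directory, or ephemeral in-memory database), the following sketch picks a client with the chromadb Python package. The helper names `choose_client_kind` and `make_client` are hypothetical, and the rule that a server host takes precedence over a persist directory is an assumption for this sketch, not documented component behavior.

```python
def choose_client_kind(persist_directory="", server_host=""):
    """Return which kind of Chroma client the settings point to.

    Assumption for this sketch: a server host wins over a persist directory.
    """
    if server_host:
        return "http"
    if persist_directory:
        return "persistent"
    return "ephemeral"


def make_client(persist_directory="", server_host="", server_port=8000, ssl=False):
    """Create a chromadb client that matches the chosen kind."""
    import chromadb  # third-party package: pip install chromadb

    kind = choose_client_kind(persist_directory, server_host)
    if kind == "http":
        # Remote Chroma server
        return chromadb.HttpClient(host=server_host, port=server_port, ssl=ssl)
    if kind == "persistent":
        # Local database persisted under the given directory
        return chromadb.PersistentClient(path=persist_directory)
    # Nothing configured: in-memory database that is not persisted
    return chromadb.EphemeralClient()
```

For example, `make_client(persist_directory="./chroma_data")` would open or create a local database in a hypothetical `./chroma_data` directory.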
The following example flow uses one Chroma DB component for both reads and writes:
1. When writing, it splits Data from a URL component into chunks, computes embeddings with the attached Embedding Model component, and then loads the chunks and embeddings into the Chroma vector store. To trigger writes, click Run component on the Chroma DB component.
2. When reading, it uses chat input to perform a similarity search on the vector store, and then prints the search results to the chat. To trigger reads, open the Playground and enter a chat message.
After running the flow once, you can click Inspect Output on each component to understand how the data transformed as it passed from component to component.
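Outside the visual editor, the write and read steps of this flow can be approximated in a few lines of Python. This is a minimal sketch, not the component's implementation: `split_into_chunks`, `toy_embedding`, `write_chunks`, and `search` are hypothetical helpers, the fixed-size splitter stands in for a real text splitter, and the hash-based embedding is a deterministic placeholder with no semantic meaning. The `collection` argument is assumed to be a chromadb collection.

```python
import hashlib


def split_into_chunks(text, chunk_size=40):
    """Split text into fixed-size chunks (a stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_embedding(text, dims=8):
    """Deterministic placeholder for an embedding model; carries no meaning."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def write_chunks(collection, text):
    """Write path: split, embed, and load chunks into the collection."""
    chunks = split_into_chunks(text)
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[toy_embedding(chunk) for chunk in chunks],
    )
    return len(chunks)


def search(collection, query, number_of_results=10):
    """Read path: embed the query and return the closest stored chunks."""
    n_results = min(number_of_results, collection.count())
    results = collection.query(
        query_embeddings=[toy_embedding(query)],
        n_results=n_results,
    )
    return results["documents"][0]


# Usage, assuming the chromadb package is installed:
#   import chromadb
#   client = chromadb.EphemeralClient()
#   collection = client.get_or_create_collection(name="example_collection")
#   write_chunks(collection, "Some long text fetched from a URL ...")
#   print(search(collection, "What is the text about?", number_of_results=2))
```

The collection name `example_collection` is illustrative only.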
Chroma DB parameters
Name | Type | Description |
---|---|---|
Collection Name (collection_name) | String | Input parameter. The name of your Chroma vector store collection. Default: Robility flow. |
Persist Directory (persist_directory) | String | Input parameter. To persist the Chroma database, enter a relative or absolute path to a directory to store the chroma.sqlite3 file. Leave empty for an ephemeral database. When reading or writing to an existing persistent database, specify the path to the persistent directory. |
Ingest Data (ingest_data) | Data or DataFrame | Input parameter. Data or DataFrame input containing the records to write to the vector store. Only relevant for writes. |
Search Query (search_query) | String | Input parameter. The query to use for vector search. Only relevant for reads. |
Cache Vector Store (cache_vector_store) | Boolean | Input parameter. If true, the component caches the vector store in memory for faster reads. Default: Enabled (true). |
Embedding (embedding) | Embeddings | Input parameter. The embedding function to use for the vector store. By default, Chroma DB uses its built-in embeddings model, or you can attach an Embedding Model component to use a different provider or model. |
CORS Allow Origins (chroma_server_cors_allow_origins) | String | Input parameter. The CORS allow origins for the Chroma server. |
Chroma Server Host (chroma_server_host) | String | Input parameter. The host for the Chroma server. |
Chroma Server HTTP Port (chroma_server_http_port) | Integer | Input parameter. The HTTP port for the Chroma server. |
Chroma Server gRPC Port (chroma_server_grpc_port) | Integer | Input parameter. The gRPC port for the Chroma server. |
Chroma Server SSL Enabled (chroma_server_ssl_enabled) | Boolean | Input parameter. Enable SSL for the Chroma server. |
Allow Duplicates (allow_duplicates) | Boolean | Input parameter. If true (default), writes don't check for existing duplicates in the collection, allowing you to store multiple copies of the same content. If false, writes skip documents that match documents already present in the collection. In that case, the component either compares against the entire collection to strictly enforce deduplication, or compares against only the number of records specified in Limit. Only relevant for writes. |
Search Type (search_type) | String | Input parameter. The type of search to perform, either Similarity or MMR. Only relevant for reads. |
Number of Results (number_of_results) | Integer | Input parameter. The number of search results to return. Default: 10. Only relevant for reads. |
Limit (limit) | Integer | Input parameter. Limit the number of records to compare when Allow Duplicates is false. This can help improve performance when writing to large collections, but it can result in some duplicate records. Only relevant for writes. |
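The interaction between Allow Duplicates and Limit can be sketched as a simple filter over the incoming records. This is only an illustration of the trade-off described in the table: the function name `select_documents_to_write` is hypothetical, records are reduced to plain strings, and comparing against the first `limit` existing records is an assumption about which records get compared.

```python
def select_documents_to_write(incoming, existing, allow_duplicates=True, limit=None):
    """Return the incoming documents that would be written to the collection.

    incoming: documents about to be written
    existing: documents already stored in the collection
    limit:    how many existing documents to compare against (None = all)
    """
    if allow_duplicates:
        # No duplicate check at all: everything is written.
        return list(incoming)

    # Assumption: a limit means only the first `limit` stored records are compared.
    compared = existing if limit is None else existing[:limit]
    already_stored = set(compared)
    return [doc for doc in incoming if doc not in already_stored]


existing = ["alpha", "beta", "gamma"]
incoming = ["gamma", "delta"]

# Default: duplicates are allowed, so both documents are written.
print(select_documents_to_write(incoming, existing))
# Strict deduplication: "gamma" is skipped because it is already stored.
print(select_documents_to_write(incoming, existing, allow_duplicates=False))
# With limit=2, "gamma" falls outside the compared records and is written again.
print(select_documents_to_write(incoming, existing, allow_duplicates=False, limit=2))
```

The last call shows why a small limit can speed up writes to large collections while still letting some duplicates through.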
Local DB
The Local DB component reads and writes to a persistent, in-memory Chroma DB instance intended for use with Robility flow. It has separate modes for reads and writes, automatic collection management, and default persistence in your Robility flow cache directory.
Set the Mode parameter to reflect the operation you want the component to perform, and then configure the other parameters accordingly. Some parameters are only available for one mode.
Ingest
To create or write to your local Chroma vector store, use Ingest mode.
The following parameters are available in Ingest mode:
Name | Type | Description |
---|---|---|
Name Your Collection (collection_name) | String | Input parameter. The name for your Chroma vector store collection. Default: Robility flow. Only available in Ingest mode. |
Persist Directory (persist_directory) | String | Input parameter. The base directory where you want to create and persist the vector store. If you use the Local DB component in multiple flows or to create multiple collections, collections are stored at $PERSISTENT_DIRECTORY/vector_stores/$COLLECTION_NAME. If not specified, the default location is your Robility flow cache directory (ROBILITY FLOW_CONFIG_DIR). For more information, see Memory management options. |
Embedding (embedding) | Embeddings | Input parameter. The embedding function to use for the vector store. |
Allow Duplicates (allow_duplicates) | Boolean | Input parameter. If true (default), writes don't check for existing duplicates in the collection, allowing you to store multiple copies of the same content. If false, writes skip documents that match documents already present in the collection. In that case, the component either compares against the entire collection to strictly enforce deduplication, or compares against only the number of records specified in Limit. Only available in Ingest mode. |
Ingest Data (ingest_data) | Data or DataFrame | Input parameter. The records to write to the collection. Records are embedded and indexed for semantic search. Only available in Ingest mode. |
Limit (limit) | Integer | Input parameter. Limit the number of records to compare when Allow Duplicates is false. This can help improve performance when writing to large collections, but it can result in some duplicate records. Only available in Ingest mode. |
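The storage layout described for Persist Directory can be expressed with `pathlib`. In this sketch, `collection_directory` and `list_existing_collections` are hypothetical helpers, and treating each subdirectory of `vector_stores` as one collection is an assumption based on the path pattern in the table, not a documented guarantee.

```python
from pathlib import Path


def collection_directory(persist_directory, collection_name):
    """Build <persist_directory>/vector_stores/<collection_name>."""
    return Path(persist_directory) / "vector_stores" / collection_name


def list_existing_collections(persist_directory):
    """List collection names found under <persist_directory>/vector_stores.

    Assumption: each collection lives in its own subdirectory.
    """
    root = Path(persist_directory) / "vector_stores"
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```

For example, `collection_directory("/data", "docs")` yields the path `/data/vector_stores/docs` on POSIX systems, where `/data` and `docs` are placeholder values.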
Retrieve
To read from your local Chroma vector store, use Retrieve mode. The following parameters are available in Retrieve mode:
Name | Type | Description |
---|---|---|
Persist Directory (persist_directory) | String | Input parameter. The base directory where you want to create and persist the vector store. If you use the Local DB component in multiple flows or to create multiple collections, collections are stored at $PERSISTENT_DIRECTORY/vector_stores/$COLLECTION_NAME. For more information, see Memory management options. |
Existing Collections (existing_collections) | String | Input parameter. Select a previously-created collection to search. Only available in Retrieve mode. |
Embedding (embedding) | Embeddings | Input parameter. The embedding function to use for the vector store. |
Search Type (search_type) | String | Input parameter. The type of search to perform, either Similarity or MMR. Only available in Retrieve mode. |
Search Query (search_query) | String | Input parameter. Enter a query for similarity search. Only available in Retrieve mode. |
Number of Results (number_of_results) | Integer | Input parameter. Number of search results to return. Default: 10. Only available in Retrieve mode. |
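To make the difference between the two Search Type options concrete, the following sketch compares plain similarity ranking with maximal marginal relevance (MMR) on toy vectors. It is a generic textbook formulation in pure Python, not the code the component or Chroma runs. The names `similarity_search`, `mmr_search`, and `lambda_mult` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query, vectors, k):
    """Return the indexes of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def mmr_search(query, vectors, k, lambda_mult=0.5):
    """Greedy MMR: balance relevance to the query against redundancy
    with results that were already selected."""
    selected = []
    remaining = list(range(len(vectors)))

    def score(i):
        relevance = cosine_similarity(query, vectors[i])
        redundancy = max(
            (cosine_similarity(vectors[i], vectors[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
vectors = [
    [1.0, 0.10],   # index 0: very close to the query
    [1.0, 0.12],   # index 1: nearly identical to index 0
    [1.0, -0.50],  # index 2: less similar, but different from index 0
]

print(similarity_search(query, vectors, k=2))  # [0, 1]
print(mmr_search(query, vectors, k=2))         # [0, 2]
```

Similarity returns the two near-identical vectors, while MMR swaps the second one for a less redundant result. Setting `lambda_mult=1.0` makes the MMR ranking match plain similarity.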