Robility KB Ingestion

The Robility KB Ingestion component creates or updates a centralized knowledge base for your structured data and documents. The knowledge base acts as a repository that workflows can query, and the component handles text chunking and vector embedding automatically.

As a result, ingested data such as product documentation, FAQs, and policy files is indexed for semantic lookup, so both traditional automations and AI agents can ground their responses in accurate content.

Important: This component cannot ingest files directly. It only accepts a Data or DataFrame output from an upstream component. To load a file into the knowledge base, you must first pass it through a Read File component, then connect its output to the Input Data field of this component.

Prerequisites

Before using this component, ensure the following are in place:

LLM and Embedding Configuration (Mandatory)

To enable KB ingestion and retrieval, you must configure the required Azure LLM and embedding credentials. This step is mandatory: without a valid LLM and embedding configuration, the KB ingestion and retrieval components will not function.

Steps:

The following steps are performed only by Tenant Admins.

1. Log in to the Robility Manager.
2. Navigate to the Settings menu.
3. Open the LLM Configuration section.
4. Provide the following details:
   a. Azure OpenAI / LLM credentials
   b. Embedding model configuration
5. Save the configuration.

Recommended Workflow

Step 1 — Read File component

Add a Read File component to your workflow and configure it to point at the source file you want to ingest (for example, a PDF, DOCX, TXT, or CSV). The Read File component reads the raw content and outputs it as a structured Data or DataFrame object.

Step 2 — KB Ingestion component

Connect the Data or DataFrame output port of the Read File component to the Input Data field of the KB Ingestion component. Then configure the remaining parameters: set Action to Create (for a new knowledge base) or Update (for an existing one), enter a Knowledge Name, select the Embedding Model, and adjust chunking settings if needed.

When the workflow runs, KB Ingestion receives the structured records, splits them into chunks, generates vector embeddings, and stores the result in the knowledge base ready for downstream retrieval.
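Because each record passed to Input Data must carry a text field and a file_name field, it can help to check records before they reach this component. The following Python snippet is a minimal sketch of such a check, not part of the product: the helper name find_invalid_records and the sample records are hypothetical, and it assumes records can be represented as plain dictionaries.

```python
from typing import Dict, List

# Field names taken from the Input Data parameter description.
REQUIRED_FIELDS = ("text", "file_name")


def find_invalid_records(records: List[Dict[str, object]]) -> List[int]:
    """Return the indexes of records that lack a non-empty 'text' or 'file_name'."""
    invalid = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # A field counts as missing if it is absent, not a string, or blank.
            if not isinstance(value, str) or not value.strip():
                invalid.append(index)
                break
    return invalid


# Hypothetical sample records, shaped like rows an upstream component might emit.
records = [
    {"text": "Refunds are processed within 5 business days.", "file_name": "refund_policy.txt"},
    {"text": "", "file_name": "empty_page.txt"},            # empty text
    {"text": "Reset your password from the login page."},   # missing file_name
]

print(find_invalid_records(records))
```

Here the second and third records would be flagged, so they could be fixed or filtered out upstream before ingestion.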

Parameters

Knowledge Base Parameters

Parameter	Description
Action	Determines whether to create a new knowledge base or update an existing one. Create: Provisions a new knowledge base. Fails if a knowledge base with the same name already exists. Update: Appends or modifies records in an existing knowledge base. The named knowledge base must already exist.
Knowledge Name	A unique identifier for the knowledge base. Used to reference it in downstream components. Must start with a letter. Allowed: letters, numbers, hyphens (-), underscores (_). Length: 3–60 characters. Accepts a literal string or a global variable.
Knowledge Description	A brief description of what the knowledge base contains. Helps agents and collaborators understand its purpose. Length: 10–500 characters.
Embedding Model	The model used to convert your data into vector embeddings for semantic search. Currently supports Azure OpenAI. Your Azure OpenAI resource and embedding deployment must be configured in the environment before running the workflow.
Input Data	The data to ingest. Connect the output of an upstream Data or DataFrame component. Each record must include two fields: text (the content to embed) and file_name (a source identifier). Supports single-file and multi-file inputs.
Chunk Size	Controls how input text is split into segments before embedding. Smaller values improve precision for targeted lookups. Larger values preserve more context per chunk. Default: 1000.
Chunk Overlap	The number of characters shared between consecutive chunks. Prevents context from being lost at chunk boundaries. Higher overlap reduces information loss at boundaries but increases the total number of chunks stored. Default: 200.
Timeout (Seconds)	Maximum time the component waits for the ingestion process to complete before raising a timeout error. Increase this value when ingesting large files or batches. Default: 30 seconds.
Retry Count	Number of times the component automatically retries if ingestion fails. Default: 2.
Delay Between Retries	Time in milliseconds to wait between retry attempts. Provides a back-off window before reattempting.
Delay Before Execution	Time in milliseconds to pause before the component begins processing. Useful for rate-limiting or sequencing within a workflow.
Delay After Execution	Time in milliseconds to pause after the component finishes. Allows downstream components time to prepare.
Continue on Error	Defines the workflow behaviour when this component encounters an unrecoverable error. Stop Workflow: Halts execution immediately. No subsequent steps run. Continue: Skips this component and resumes with the next step. The error is not captured downstream. Continue Using Error Output: Proceeds with execution and routes error details (message, code, context) through the error output handle for conditional handling or logging.
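The naming and length rules for Knowledge Name and Knowledge Description can be checked before a workflow runs. The snippet below is a minimal Python sketch of those rules as stated in the table. The function names are hypothetical, and it assumes that "letters" means ASCII letters, which the table does not confirm.

```python
import re

# Starts with a letter, then letters, digits, hyphens, or underscores;
# 3 to 60 characters in total. ASCII-only letters are an assumption.
KNOWLEDGE_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{2,59}")


def is_valid_knowledge_name(name: str) -> bool:
    """Check a candidate Knowledge Name against the documented rules."""
    return KNOWLEDGE_NAME_PATTERN.fullmatch(name) is not None


def is_valid_knowledge_description(description: str) -> bool:
    """Check that a Knowledge Description is 10 to 500 characters long."""
    return 10 <= len(description) <= 500


for candidate in ("product-faq_2024", "1faq", "kb", "my kb"):
    print(candidate, is_valid_knowledge_name(candidate))
```

Only the first candidate passes: the others start with a digit, are too short, or contain a space.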

Tip: When ingesting a large knowledge base, start with the default chunk size of 1000 and raise the timeout from its 30-second default to 60–120 seconds. Tune chunk size down if retrieval results are too broad, or up if responses lack sufficient context.
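To see how Chunk Size and Chunk Overlap interact, the following Python sketch splits text with a simple sliding character window. It is an illustration only: the component's actual splitting strategy is not documented here, the function split_into_chunks is hypothetical, and it assumes Chunk Size is measured in characters, as Chunk Overlap is.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# A 2600-character sample text made of repeating digits.
sample = "".join(str(i % 10) for i in range(2600))

default_chunks = split_into_chunks(sample)            # 1000 / 200 defaults
smaller_chunks = split_into_chunks(sample, 500, 100)  # smaller, more targeted chunks

print(len(default_chunks), len(smaller_chunks))
```

In this sketch, the 2600-character sample yields 3 chunks with the defaults and 7 chunks with a chunk size of 500 and an overlap of 100, which illustrates why smaller chunks or higher overlap increase the number of chunks stored.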

Output

Knowledge Base Output Fields

Field	Description
knowledge_base	Name and identifier of the created or updated knowledge base.
records_processed	Total number of records successfully embedded and stored.
embedding_status	Indicates whether all records were embedded without error.
storage_status	Confirms whether the embedded vectors were persisted to the knowledge store.
