Structured Output


The Structured Output component uses an LLM to transform any input into structured data (Data or DataFrame) using natural language formatting instructions and an output schema definition. For example, you can extract specific details from documents, like email messages or scientific papers.

Use the Structured Output component in a flow

To use the Structured Output component in a flow, do the following:

1. Provide an Input Message, which is the source material from which you want to extract structured data. This can come from practically any component, but it is typically a Chat Input, File, or other component that provides unstructured or semi-structured input.

2. Define Format Instructions and an Output Schema to specify the data to extract from the source material and how to structure it in the final Data or DataFrame output.

The instructions are a prompt that tells the LLM what data to extract, how to format it, how to handle exceptions, and any other instructions relevant to preparing the structured data.

The schema is a table that defines the fields (keys) and data types used to organize the data extracted by the LLM into a structured Data or DataFrame object. For more information, see Output Schema options.

3. Attach a Language Model component that is set to emit LanguageModel output.

The Language Model component uses the Input Message and Format Instructions from the Structured Output component to extract specific pieces of data from the input text. The output schema is applied to the model’s response to produce the final Data or DataFrame structured object.

4. Optional: Pass the structured output to downstream components that use the extracted data for other processes, such as other Processing components like the Parser or Data Operations components.
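
The steps above can be sketched in plain Python. This is a conceptual illustration only, not the component's implementation: the function names, the prompt layout, and the stand-in `fake_model` are all assumptions made for the example, and a real flow would call an actual LLM through the attached Language Model component.

```python
import json
from typing import Callable


def build_prompt(format_instructions: str, input_message: str) -> str:
    """Combine the instructions and the source material into one prompt (hypothetical layout)."""
    return f"{format_instructions}\n\nSource material:\n{input_message}"


def extract_structured(
    model: Callable[[str], str],
    input_message: str,
    format_instructions: str,
    schema_keys: list[str],
) -> dict:
    """Ask the model for JSON, then keep only the keys named in the schema."""
    raw = model(build_prompt(format_instructions, input_message))
    parsed = json.loads(raw)
    return {key: parsed.get(key) for key in schema_keys}


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same JSON string."""
    return json.dumps(
        {"sender": "ana@example.com", "subject": "Q3 report", "mood": "calm"}
    )


result = extract_structured(
    fake_model,
    input_message="From: ana@example.com\nSubject: Q3 report\n...",
    format_instructions="Extract the sender and subject as JSON.",
    schema_keys=["sender", "subject"],
)
print(result)
```

In this sketch, the extra `mood` key returned by the model is dropped because it is not part of the schema, which mirrors how the output schema determines the content of the final structured object.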

Structured Output parameters

Some Structured Output component input parameters are hidden by default in the visual editor. You can toggle parameters through the Controls in the component’s header menu.

Name | Type | Description
Language Model (llm) | LanguageModel | Input parameter. The LanguageModel output from a Language Model component that defines the LLM to use to analyze, extract, and prepare the structured output.
Input Message (input_value) | String | Input parameter. The input message containing source material for extraction.
Format Instructions (system_prompt) | String | Input parameter. The instructions to the language model for extracting and formatting the output.
Schema Name (schema_name) | String | Input parameter. An optional title for the Output Schema.
Output Schema (output_schema) | Table | Input parameter. A table describing the schema of the desired structured output, ultimately determining the content of the Data or DataFrame output. See Output Schema options.
Structured Output (structured_output) | Data or DataFrame | Output parameter. The final structured output produced by the component. Near the component's output port, you can select the output data type as either Structured Output Data or Structured Output DataFrame. The specific content and structure of the output depends on the input parameters.

Output Schema options

After the LLM extracts the relevant data from the Input Message according to the Format Instructions, the data is organized according to the Output Schema.

The schema is a table that defines the fields (keys) and data types for the final Data or DataFrame output from the Structured Output component.

The default schema contains a single field of type str.

To add a key to the schema, click Add a new row, and then edit each column to define the schema:

1. Name: The name of the output field. Typically, a specific key for which you want to extract a value.

You can reference these keys as variables in downstream components, such as a Parser component’s template. For example, the schema key NET_INCOME could be referenced by the variable {NET_INCOME}.

2. Description: An optional metadata description of the field’s contents and purpose.
3. Type: The data type of value stored in the field. Supported types are str (default), int, float, bool, and dict.
4. As List: Enable this setting if you want the field to contain a list of values rather than a single value.
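
To make the four columns concrete, the following sketch models each schema row as a plain Python dict and applies the rows to a raw extraction result. The row layout, the `apply_schema` and `to_bool` helpers, and the sample values are hypothetical and only illustrate the idea of typed fields and the As List setting. The final lines use Python's `str.format` as an analogy for referencing schema keys as template variables; this is not the Parser component's own implementation.

```python
# Hypothetical in-memory form of the schema table: one dict per row.
SCHEMA = [
    {"name": "COMPANY", "description": "Company name", "type": "str", "as_list": False},
    {"name": "NET_INCOME", "description": "Net income in USD", "type": "float", "as_list": False},
    {"name": "AUDITED", "description": "Whether the report was audited", "type": "bool", "as_list": False},
    {"name": "RISKS", "description": "Risk factors mentioned", "type": "str", "as_list": True},
]


def to_bool(value) -> bool:
    """Parse booleans explicitly, because bool("false") is True in Python."""
    if isinstance(value, str):
        return value.strip().lower() in {"true", "yes", "1"}
    return bool(value)


# One cast per type listed in the Type column.
CASTS = {"str": str, "int": int, "float": float, "bool": to_bool, "dict": dict}


def apply_schema(raw: dict, schema: list[dict]) -> dict:
    """Cast each raw value to its declared type, wrapping it in a list when As List is set."""
    out = {}
    for row in schema:
        cast = CASTS[row["type"]]
        value = raw.get(row["name"])
        if value is None:
            out[row["name"]] = [] if row["as_list"] else None
        elif row["as_list"]:
            items = value if isinstance(value, list) else [value]
            out[row["name"]] = [cast(item) for item in items]
        else:
            out[row["name"]] = cast(value)
    return out


raw = {
    "COMPANY": "Acme",
    "NET_INCOME": "1250000.50",
    "AUDITED": "false",
    "RISKS": ["currency", "supply chain"],
}
record = apply_schema(raw, SCHEMA)

# Schema keys act as template variables, similar to {NET_INCOME} in a Parser template.
template = "{COMPANY} reported net income of {NET_INCOME}."
print(template.format(**record))
```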

For simple schemas, you might only extract a few str or int fields. For more complex schemas with lists and dictionaries, it might help to refer to the Data and DataFrame structures and attributes, as described in Robility flow data types. You can also emit a rough Data or DataFrame, and then use downstream components for further refinement, such as a Data Operations component.
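
As a rough illustration of downstream refinement, the sketch below takes a nested record and flattens a list field into one row per item. Plain dicts stand in for a Data object and a list of dicts stands in for DataFrame rows; the `explode` helper and the invoice fields are hypothetical and do not represent the Data Operations component's API.

```python
# A rough extraction result with a list of dictionaries (hypothetical sample data).
rough = {
    "invoice_id": "INV-001",
    "line_items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def explode(record: dict, list_key: str) -> list[dict]:
    """Return one flat row per item in record[list_key], copying the other keys into each row."""
    shared = {key: value for key, value in record.items() if key != list_key}
    return [{**shared, **item} for item in record[list_key]]


rows = explode(rough, "line_items")
for row in rows:
    print(row)
```

Keeping the extraction schema loose and reshaping the result afterward can be easier than asking the LLM to produce the final tabular layout directly.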
