Ollama
The Ollama component allows integration with an Ollama server to run language models within Robility Flow automation workflows. It supports Chat and Embeddings modes and dynamically loads available models from the configured Ollama endpoint.
The component accepts an API base URL or IP address, either directly or through a global variable. If authentication is required, the platform authenticates using the provided API key. Whenever the mode is changed, the available model list is refreshed automatically.
Prerequisites
Ollama supports the following models for chat and embeddings.
1. Chat Models (Text Generation)
These models are used to generate conversational responses and natural language text.
gpt-oss:latest — An open-source, GPT-style model for general conversational tasks
llama3.1:latest — Meta’s latest Llama model, optimized for instruction following and chat use cases
llama3:latest— An earlier version of Meta’s Llama chat model, suitable for standard conversations
2. Embedding Models (Vector Search / RAG)
These models convert text into numerical vectors for search, similarity matching, and retrieval‑augmented generation (RAG).
mxbai-embed-large:latest — A high-quality embedding model designed for accurate search and retrieval
nomic-embed-text:latest — A fast and efficient embedding model for large‑scale text indexing
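A sketch of requesting a vector from one of these embedding models via Ollama's `/api/embeddings` endpoint, which takes the model name and the text under a `prompt` field. The model name and text here are illustrative examples.

```python
def build_embeddings_payload(model: str, text: str) -> dict:
    """Assemble an Ollama /api/embeddings request body."""
    return {"model": model, "prompt": text}

payload = build_embeddings_payload("nomic-embed-text:latest", "invoice overdue notice")
# POSTing `payload` as JSON to {API URL}/api/embeddings returns a body of the
# form {"embedding": [ ...floats... ]} for use in vector search or RAG.
print(payload)
```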
Key Capabilities
1. Configure the Ollama endpoint using an API URL or IP address.
2. Support for optional API key–based authentication.
3. Validate connectivity with the Ollama server.
4. Dynamically load the list of available model names.
5. Support for Chat and Embeddings operations.
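In Chat mode, the operation the component performs corresponds to Ollama's `/api/chat` call. A minimal sketch of assembling that request body for a single user message — the model name and prompt are illustrative assumptions:

```python
def build_chat_payload(model: str, input_text: str, stream: bool = False) -> dict:
    """Assemble an Ollama /api/chat request body for a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": input_text}],
        "stream": stream,  # False asks for one complete response object
    }

payload = build_chat_payload("llama3:latest", "Summarize this ticket in one line.")
# POSTing `payload` as JSON to {API URL}/api/chat returns the reply text
# under response["message"]["content"].
print(payload["messages"])
```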
Parameters
| Parameter | Description |
|---|---|
| Mode* | Defines the operation type for the component.<br>• Chat: Generates human-readable text responses based on the provided input.<br>• Embeddings: Converts text into numerical vector representations that capture semantic meaning. |
| API URL or IP* | Specifies the Ollama server endpoint. Can be a direct URL/IP address or a reference to a global variable. |
| API Key | API key used to authenticate with the Ollama server, if the server requires authentication. |
| Model Name* | Language model selected for execution. The list of available models is populated dynamically and updates automatically based on the selected Mode. |
| Input Text* | Text sent to the model for processing.<br>• In Chat mode, this is the prompt or message provided to the model.<br>• In Embeddings mode, this is the text converted into a vector. |
| Temperature | Controls the randomness and creativity of the model’s output.<br>• Lower values: more focused, deterministic responses.<br>• Higher values (0.7–1.0): more creative and varied responses. |
| Max Tokens | Maximum number of tokens the model can generate in a response; controls the length of the output. |
| Top P | Considers the most probable next tokens until their cumulative probability reaches a threshold.<br>• Lower values: more focused and predictable output.<br>• Higher values: more diverse and flexible responses. |
| Top K | Limits the number of candidate tokens considered at each generation step.<br>• Lower values: more precise and stable output.<br>• Higher values: more diverse and creative output. |
| Keep Alive | Determines whether the connection to the server remains persistent.<br>• Enabled: maintains the connection, reducing overhead across multiple requests.<br>• Disabled: opens a new connection for each request. |
*Required Fields
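The tuning parameters above map onto Ollama's per-request `options` object, plus the top-level `keep_alive` field. A hedged sketch of that mapping, following Ollama's REST API naming, where Max Tokens corresponds to `num_predict` and `keep_alive` governs how long the model stays loaded; the exact values the component sends are assumptions:

```python
def build_options(temperature: float, max_tokens: int, top_p: float,
                  top_k: int, keep_alive: bool) -> dict:
    """Translate the component's tuning parameters into an Ollama request fragment."""
    return {
        "options": {
            "temperature": temperature,  # randomness/creativity
            "num_predict": max_tokens,   # Ollama's name for the max-token cap
            "top_p": top_p,              # nucleus-sampling probability threshold
            "top_k": top_k,              # candidate-token cutoff per step
        },
        # "5m" keeps the model loaded between requests; "0" releases it immediately.
        "keep_alive": "5m" if keep_alive else "0",
    }

print(build_options(0.7, 256, 0.9, 40, keep_alive=True))
```

This fragment is merged into the `/api/chat` or `/api/embeddings` request body alongside the model and input fields.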
Output
| Mode | Output | Description |
|---|---|---|
| Chat | Model Response | The response generated for the message or prompt specified in Input Text. |
| Chat | Language Model | The model that produced the response, returned in JSON format. |
| Embeddings | Embedding | Vector representation of the input text, used when integrating with other embedding-based components. |
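As an illustration of consuming the Embedding output, the sketch below computes cosine similarity between two vectors, which is how embedding-based components typically rank matches in search and RAG pipelines. The 3-dimensional vectors are toy examples; real embedding models return vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for real Embedding outputs:
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
print(round(cosine_similarity(v1, v2), 3))
```

Scores near 1.0 indicate semantically similar texts; scores near 0.0 indicate unrelated ones.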