LiteLLM Proxy
The LiteLLM Proxy acts as a central gateway that connects your AI workflows to multiple large language model (LLM) providers through a single, unified API.
Instead of integrating separately with providers such as OpenAI, Anthropic, Google Gemini, Azure OpenAI, or Ollama, the proxy routes all requests to the appropriate model behind the scenes. This simplifies integration and allows you to work with multiple providers without changing your workflow logic.
In Robility Flow, the LiteLLM component communicates with models through this proxy endpoint. This enables centralized management of models while handling key operations such as request routing, authentication, retries, load balancing, and provider switching.
Key Benefits
1. Unified Access: Use a single API to interact with multiple LLM providers
2. Simplified Integration: Avoid managing separate integrations for each provider
3. Flexible Provider Switching: Change models or providers without modifying workflows
4. Centralized Control: Manage authentication, routing, and execution behavior in one place
5. Improved Reliability: Supports retries, load balancing, and failover for stable performance.
Parameter
| Parameter | Description |
|---|---|
| Input |
Defines the endpoint used by the LiteLLM component to communicate with the LiteLLM Proxy server. This acts as a unified API gateway for routing requests to configured LLM providers.
Default: http://localhost:4000/v1
|
| System Message | Defines the base instruction or behavior for the language model before processing user input. This helps control response style, tone, formatting, and domain-specific behavior. |
| Stream | Enables real-time streaming of responses instead of waiting for the full output. Improves responsiveness in chat-based interactions. Supported in Chat mode only. |
| LiteLLM Proxy URL | Specifies the base URL of the LiteLLM Proxy server. All requests are routed through this endpoint, which handles provider abstraction, authentication, retries, and load balancing. |
| Virtual Key | Authentication credential used to securely access the LiteLLM Proxy. Helps centralize access control without exposing individual provider API keys. |
| Model Name | Specifies the target model used to process the request through the LiteLLM Proxy (e.g., GPT models or other supported LLMs). |
| Temperature | Controls the randomness and creativity of generated responses. Lower values produce predictable outputs, while higher values increase variation and creativity. |
| Max Tokens |
Defines the maximum number of tokens the model can generate in response. Helps control output length, latency, and API usage cost.
Set to 0 for no explicit limit. |
| Timeout (Seconds) |
Defines the maximum time the component waits for a response before terminating the request. Prevents workflow delays due to stalled responses.
Default: 60 |
| Max Retries |
Specifies the number of retry attempts if a request fails due to temporary issues such as network errors, timeouts, or rate limits.
Default: 2 |
Temperature Behavior
| Value Range | Behavior |
|---|---|
| 0.0 – 0.3 (Low) | Produces deterministic, focused, and predictable responses with minimal randomness. Ideal for factual or consistent outputs. |
| 0.4 – 0.7 (Medium) | Balances creativity and consistency. Responses remain coherent while allowing moderate variation. |
| 0.8 – 1.0 (High) | Produces more creative, diverse, and less predictable outputs. Suitable for brainstorming or open-ended tasks. |
Output
| Output | Description |
|---|---|
| Model Response | The response generated by the selected language model through the LiteLLM Proxy after processing the input. This represents the final AI-generated output returned to the workflow. |
| Language Model | The configured language model instance accessed through the LiteLLM Proxy. This output can be used for further integration or chaining within the workflow. |