Parameters
Sampling parameters shape the token generation process of the model. You may send any parameters from the following list, as well as others, to OpenRoute AI.
OpenRoute AI will default to the values listed below if certain parameters are absent from your request (for example, temperature to 1.0). We will also transmit some provider-specific parameters, such as safe_prompt for Mistral or raw_mode for Hyperbolic, directly to the respective providers when specified.
Please refer to the model's provider section to confirm which parameters are supported. For detailed guidance on managing provider-specific parameters, click here.
API Parameters Reference
This comprehensive guide covers all available parameters for OpenRoute AI API requests, organized by functional categories to help developers quickly find and configure the parameters they need.
Core Parameters
Essential parameters required for all API requests.
Required Parameters
Model
- Key: model
- Required, string
- ID of the model to use. Refer to the model endpoint compatibility table for details on which models work with the Chat API.
Messages
- Key: messages
- Required, array
- A list of messages comprising the conversation so far.
Properties of messages
Each message in the array contains the following properties (a minimal request using them is sketched below):
- role: string - The role of the message's author. Roles can be: system, user, assistant, function, or tool.
- content: string, list[dict], or null - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.
- name: string (optional) - The name of the author of the message. It is required if the role is "function", in which case it should match the name of the function represented in the content. It may contain letters (a-z, A-Z), digits (0-9), and underscores, with a maximum length of 64 characters.
- function_call: object (optional) - The name and arguments of a function that should be called, as generated by the model.
- tool_call_id: string (optional) - The tool call that this message is responding to.
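As a sketch, here is what a minimal chat completions request using these message fields might look like in Python. The endpoint URL, API key, model ID, and user name below are placeholders, not values confirmed by this documentation:

```python
import requests

# Hypothetical endpoint, key, and model ID; substitute your own values.
API_URL = "https://api.openrouteai.example/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        # "name" identifies the author; letters, digits, and underscores only.
        {"role": "user", "name": "alice_01", "content": "What is a token?"},
    ],
}

resp = requests.post(API_URL, headers=HEADERS, json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```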
Generation Control Parameters
Parameters that control how the model generates responses, including sampling methods and output constraints.
Sampling Parameters
Temperature
- Key: temperature
- Optional, float, 0.0 to 2.0
- Default: 1.0
This setting influences the variety in the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.
Top P
- Key: top_p
- Optional, float, 0.0 to 1.0
- Default: 1.0
This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.
Top K
- Key: top_k
- Optional, integer, 0 or above
- Default: 0
This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, allowing the model to consider all choices.
Min P
- Key: min_p
- Optional, float, 0.0 to 1.0
- Default: 0.0
Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.
Top A
- Key: top_a
- Optional, float, 0.0 to 1.0
- Default: 0.0
Consider only the top tokens with "sufficiently high" probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.
Seed
- Key: seed
- Optional, integer
If specified, inference will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.
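A request combining these sampling parameters might look like the following sketch, assuming an OpenAI-compatible endpoint. The URL, key, and model ID are placeholders:

```python
import requests

# Hypothetical endpoint, key, and model ID; substitute your own values.
API_URL = "https://api.openrouteai.example/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Name three sea creatures."}],
    "temperature": 0.7,  # moderate variety in responses
    "top_p": 0.9,        # sample only from the top 90% probability mass
    "top_k": 40,         # consider at most 40 candidate tokens per step
    "min_p": 0.05,       # drop tokens under 5% of the top token's probability
    "seed": 42,          # best-effort reproducibility across requests
}

print(requests.post(API_URL, headers=HEADERS, json=payload).json())
```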
Penalty Parameters
Parameters that control repetition and token frequency in responses.
Frequency Penalty
- Key: frequency_penalty
- Optional, float, -2.0 to 2.0
- Default: 0.0
This setting aims to control the repetition of tokens based on how often they appear in the input. Tokens that already occur frequently are penalized in proportion to their frequency, making them less likely to be used again; the penalty scales with the number of occurrences. Negative values will encourage token reuse.
Presence Penalty
- Key: presence_penalty
- Optional, float, -2.0 to 2.0
- Default: 0.0
Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values do the opposite. Token penalty does not scale with the number of occurrences. Negative values will encourage token reuse.
Repetition Penalty
- Key: repetition_penalty
- Optional, float, 0.0 to 2.0
- Default: 1.0
Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). The penalty scales with the original token's probability.
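To illustrate the difference between the three penalties, here is a hedged sketch of a request that sets all of them. The endpoint, key, and model ID are placeholders:

```python
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [{"role": "user", "content": "Write a product description."}],
    "frequency_penalty": 0.5,   # penalty grows with each repeated occurrence
    "presence_penalty": 0.3,    # flat penalty once a token has appeared at all
    "repetition_penalty": 1.1,  # mild; values far above 1.0 can hurt coherence
}

print(requests.post(API_URL, headers=HEADERS, json=payload).json())
```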
Output Control Parameters
Parameters that control the length, format, and structure of generated responses.
Max Tokens
- Key: max_tokens
- Optional, integer, 1 or above
This sets the upper limit for the number of tokens the model can generate in response. It won't produce more than this limit. The maximum value is the context length minus the prompt length.
Max Completion Tokens
- Key: max_completion_tokens
- Optional, integer
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
Stop
- Key: stop
- Optional, array
Stops generation immediately if the model encounters any sequence specified in the stop array.
Number of Choices
- Key: n
- Optional, integer
The number of chat completion choices to generate for each input message.
Verbosity
- Key: verbosity
- Optional, enum (low, medium, high)
- Default: medium
Controls the verbosity and length of the model response. Lower values produce more concise responses, while higher values produce more detailed and comprehensive responses.
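A sketch combining the most common output controls (max_tokens, stop, and n) follows; the endpoint, key, and model ID are placeholders:

```python
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [{"role": "user", "content": "Summarize the plot of Hamlet."}],
    "max_tokens": 200,  # hard cap on generated tokens
    "stop": ["\n\n"],   # cut generation at the first blank line
    "n": 2,             # return two alternative completions
}

resp = requests.post(API_URL, headers=HEADERS, json=payload).json()
for choice in resp["choices"]:
    print(choice["message"]["content"])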
Response Format Parameters
Parameters that control the format and structure of the response output.
Response Format
- Key: response_format
- Optional, map
Forces the model to produce a specific output format. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.
Note: when using JSON mode, you should also instruct the model to produce JSON yourself via a system or user message.
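As a sketch of JSON mode, including the system-message instruction the note above calls for (the endpoint, key, and model ID are placeholders):

```python
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [
        # Per the note above, also instruct the model to emit JSON.
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
    "response_format": {"type": "json_object"},
}

resp = requests.post(API_URL, headers=HEADERS, json=payload).json()
print(resp["choices"][0]["message"]["content"])  # a valid JSON string
```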
Structured Outputs
- Key: structured_outputs
- Optional, boolean
Whether the model can return structured outputs using the response_format json_schema type.
Tool and Function Calling Parameters
Parameters that enable the model to call external tools and functions.
Tool Configuration
Tools
- Key: tools
- Optional, array
Tool calling parameter, following OpenAI's tool calling request shape. For non-OpenAI providers, it will be transformed accordingly. Click here to learn more about tool calling.
Tool Choice
- Key: tool_choice
- Optional, string or object
Controls which (if any) tool is called by the model. 'none' means the model will not call any tool and instead generates a message. 'auto' means the model can pick between generating a message or calling one or more tools. 'required' means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
Parallel Tool Calls
- Key: parallel_tool_calls
- Optional, boolean
- Default: true
Whether to enable parallel function calling during tool use. If true, the model can call multiple functions simultaneously. If false, functions will be called sequentially. Only applies when tools are provided.
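Putting the three tool parameters together, a request might look like the following sketch. The endpoint, key, model ID, and the get_weather function are all placeholders invented for illustration:

```python
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",        # let the model decide whether to call it
    "parallel_tool_calls": True,  # allow multiple calls in one turn
}

resp = requests.post(API_URL, headers=HEADERS, json=payload).json()
print(resp["choices"][0]["message"].get("tool_calls"))
```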
Debugging and Logging Parameters
Parameters that provide detailed information about the generation process for debugging and analysis.
Logging Parameters
Logit Bias
- Key: logit_bias
- Optional, map
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
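For example, a sketch that bans one token and boosts another. Token IDs are tokenizer-specific, so the IDs below are placeholders, as are the endpoint, key, and model ID:

```python
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [{"role": "user", "content": "Pick a color."}],
    # Token IDs depend on the model's tokenizer; 1234 and 5678 are placeholders.
    "logit_bias": {
        "1234": -100,  # effectively ban this token
        "5678": 5,     # mildly increase this token's likelihood
    },
}

print(requests.post(API_URL, headers=HEADERS, json=payload).json())
```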
Logprobs
- Key: logprobs
- Optional, boolean
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
Top Logprobs
- Key: top_logprobs
- Optional, integer
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
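A sketch requesting log probabilities follows; it assumes the response follows the OpenAI logprobs schema, and the endpoint, key, and model ID are placeholders:

```python
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [{"role": "user", "content": "Say hello."}],
    "logprobs": True,
    "top_logprobs": 5,  # requires logprobs: true
}

resp = requests.post(API_URL, headers=HEADERS, json=payload).json()
# Inspect per-token log probabilities (assuming the OpenAI response shape).
for tok in resp["choices"][0]["logprobs"]["content"]:
    print(tok["token"], tok["logprob"])
```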
Streaming Parameters
Parameters that control real-time streaming of responses.
Streaming Configuration
Stream
- Key: stream
- Optional, boolean
If set to true, it sends partial message deltas. Tokens will be sent as they become available, with the stream terminated by a [DONE] message.
Stream Options
- Key: stream_options
- Optional, dict
Options for the streaming response. Only set this when you set stream: true.
- include_usage: boolean (optional) - If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
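A sketch of consuming the stream, including the final usage-only chunk, assuming standard server-sent events framing (data: lines ending with [DONE]); the endpoint, key, and model ID are placeholders:

```python
import json
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",  # hypothetical model ID
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"]:
            delta = chunk["choices"][0]["delta"].get("content") or ""
            print(delta, end="", flush=True)
        elif chunk.get("usage"):
            print("\nUsage:", chunk["usage"])  # final usage-only chunk
```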
User and Metadata Parameters
Parameters for user identification, request metadata, and system configuration.
User Identification
User
- Key: user
- Optional, string
- A unique identifier representing your end-user. This can help OpenRoute AI monitor and detect abuse.
Request Configuration
Timeout
- Key: timeout
- Optional, int
- Timeout in seconds for completion requests (defaults to 600 seconds).
Headers
- Key: headers
- Optional, dict
- A dictionary of headers to be sent with the request.
Extra Headers
- Key: extra_headers
- Optional, dict
- Alternative to headers, used to send extra headers in the LLM API request.
Advanced Configuration Parameters
Parameters for advanced system configuration, fallbacks, and custom behavior.
API Configuration
API Base
- Key: api_base
- Optional, string
- The API endpoint you want to call the model with.
API Version
- Key: api_version
- Optional, string
- The API version for the call (Azure-specific).
API Key
- Key: api_key
- Optional, string
- The API key for the request.
Model List
- Key: model_list
- Optional, list
- Pass in a list of api_base, keys, etc.
Fallback and Retry Configuration
Number of Retries
- Key: num_retries
- Optional, int
- The number of times to retry the API call if an APIError, TimeoutError, or ServiceUnavailableError occurs.
Context Window Fallback Dict
- Key: context_window_fallback_dict
- Optional, dict
- A mapping of fallback models to use if the call fails due to a context window error.
Fallbacks
- Key: fallbacks
- Optional, list
- A list of model names and parameters to be used in case the initial call fails.
Metadata
- Key: metadata
- Optional, dict
- Any additional data you want to be logged when the call is made (sent to logging integrations, e.g. promptlayer, and accessible via a custom callback function).
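To make the retry and fallback semantics concrete, here is an illustrative client-side sketch of the behavior described above. This is not the platform's implementation; the endpoint, key, and helper function are invented for illustration, and fallbacks is simplified to a list of model names:

```python
import time
import requests

API_URL = "https://api.openrouteai.example/v1/chat/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def complete(payload, num_retries=2, fallbacks=None):
    """Illustrative retry/fallback loop mirroring the num_retries and
    fallbacks semantics described above (simplified to model names)."""
    models = [payload["model"]] + (fallbacks or [])
    for model in models:
        body = {**payload, "model": model}
        for attempt in range(num_retries + 1):
            try:
                resp = requests.post(API_URL, headers=HEADERS, json=body,
                                     timeout=600)  # matches the default timeout
                if resp.status_code == 200:
                    return resp.json()
            except requests.Timeout:
                pass  # retry on timeout, per the num_retries description
            time.sleep(2 ** attempt)  # simple backoff between retries
    raise RuntimeError("All models and retries exhausted")
```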
Custom Cost Parameters
Parameters for custom cost tracking and billing configuration.
Cost Configuration
Input Cost Per Token
- Key: input_cost_per_token
- Optional, float
- The cost per input token for the completion call.
Output Cost Per Token
- Key: output_cost_per_token
- Optional, float
- The cost per output token for the completion call.
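As a quick worked example of how these two rates apply, using made-up rates and token counts (real values come from your pricing and the response's usage field):

```python
# Illustrative cost calculation; rates and token counts are example values.
input_cost_per_token = 0.000003   # $ per prompt token (example rate)
output_cost_per_token = 0.000015  # $ per completion token (example rate)

usage = {"prompt_tokens": 420, "completion_tokens": 130}  # from response["usage"]
cost = (usage["prompt_tokens"] * input_cost_per_token
        + usage["completion_tokens"] * output_cost_per_token)
print(f"Estimated request cost: ${cost:.6f}")  # -> $0.003210
```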