curl -X POST "https://api.openroute.cn/v1/chat/completions" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "user", "content": "Hello!" } ], "temperature": 1, "max_tokens": 150, "stream": false, "response_format": { "type": "json_object" }, "tools": [ { "type": "function", "function": { "name": "get_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" } }, "required": [ "location" ] } } } ], "tool_choice": "auto" }'
`messages`: A list of messages comprising the conversation so far.
`timeout`: Timeout in seconds for completion requests (defaults to 600 seconds).
`temperature`: The sampling temperature to use, between 0 and 2. Higher values like 0.8 produce more random output, while lower values like 0.2 make output more focused and deterministic.
`top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top `top_p` probability mass.
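To illustrate the two sampling parameters, here are three hypothetical request bodies; the usual guidance is to adjust one of the two, not both.

```python
messages = [{"role": "user", "content": "Name three ocean animals."}]

focused = {"model": "gpt-3.5-turbo", "messages": messages, "temperature": 0.2}
varied = {"model": "gpt-3.5-turbo", "messages": messages, "temperature": 0.8}
# Nucleus sampling instead: consider only the top 10% probability mass.
nucleus = {"model": "gpt-3.5-turbo", "messages": messages, "top_p": 0.1}
```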
`n`: The number of chat completion choices to generate for each input message.
`stream`: If set to true, partial message deltas are sent. Tokens are sent as they become available, with the stream terminated by a `[DONE]` message.
`stream_options`: Options for the streaming response. Only set this when `stream: true` is set.
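A sketch of consuming a streamed response over the raw endpoint, assuming the server-sent-events framing described above (`data:` lines terminated by the `[DONE]` sentinel):

```python
import json
import requests

URL = "https://api.openroute.cn/v1/chat/completions"
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(URL, json=payload, stream=True, timeout=600) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip blank keep-alive lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":  # terminal sentinel, as described above
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content") or ""
        print(delta, end="", flush=True)
```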
`stop`: Up to 4 sequences where the API will stop generating further tokens.
`max_completion_tokens`: An upper bound on the number of tokens that can be generated for a completion, including both visible output tokens and reasoning tokens.
`max_tokens`: The maximum number of tokens to generate in the chat completion.
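A hypothetical body combining a token cap with stop sequences:

```python
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "List three fruits, one per line."}],
    "max_tokens": 150,        # hard cap on generated tokens
    "stop": ["\n\n", "END"],  # up to 4 sequences; generation halts before emitting one
}
```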
`presence_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
`frequency_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.
`logit_bias`: Modifies the probability of specific tokens appearing in the completion.
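A sketch combining the two penalties with `logit_bias`. Note that `logit_bias` keys are tokenizer-specific token IDs rather than strings, and the ID shown is illustrative only:

```python
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Write a short story."}],
    "presence_penalty": 0.6,   # discourage tokens that have appeared at all
    "frequency_penalty": 0.4,  # scale the penalty with how often they appeared
    # Maps token IDs to a bias, typically in the range -100 to 100.
    # The ID below is illustrative; real IDs depend on the model's tokenizer.
    "logit_bias": {50256: -100},
}
```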
`user`: A unique identifier representing your end user. This can help OpenAI monitor and detect abuse.
`response_format`: An object specifying the format that the model must output. Setting this to `{"type": "json_object"}` enables JSON mode.
`seed`: This feature is in beta. If specified, the system makes a best effort to sample deterministically, so that repeated requests with the same `seed` and parameters should return the same result.
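A sketch of JSON mode plus a fixed seed. Note that JSON mode typically also requires the word "JSON" to appear somewhere in the messages, or the request may be rejected:

```python
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user",
                  "content": "Return a JSON object with a single key 'greeting'."}],
    "response_format": {"type": "json_object"},  # constrain output to valid JSON
    "seed": 42,  # best-effort determinism across repeated requests
}
```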
`tools`: A list of tools the model may call. Currently, only functions are supported as a tool.
`tool_choice`: Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message; `auto` means the model can pick between generating a message and calling a function.
`parallel_tool_calls`: Whether to enable parallel function calling during tool use. The OpenAI default is true.
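Putting the three together, a sketch that sends the `get_weather` tool from the curl example above and reads any `tool_calls` back out of the response:

```python
import requests

URL = "https://api.openroute.cn/v1/chat/completions"
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                }
            },
            "required": ["location"],
        },
    },
}]

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "What's the weather in Boston?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide; "none" forbids function calls
}

message = requests.post(URL, json=payload, timeout=600).json()["choices"][0]["message"]
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```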
`logprobs`: Whether to return log probabilities of the output tokens. If true, returns the log probability of each output token in the `content` of `message`.
`top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used.
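A sketch requesting per-token log probabilities and walking the `logprobs.content` array of the response:

```python
import requests

URL = "https://api.openroute.cn/v1/chat/completions"
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "logprobs": True,   # must be true for top_logprobs to be accepted
    "top_logprobs": 3,
}

choice = requests.post(URL, json=payload, timeout=600).json()["choices"][0]
for entry in choice["logprobs"]["content"]:
    alternatives = [(t["token"], t["logprob"]) for t in entry["top_logprobs"]]
    print(entry["token"], entry["logprob"], alternatives)
```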
`deployment_id`: Azure-specific deployment ID for the model.
`functions`: A list of functions for which the model may generate JSON inputs. (Deprecated in favor of `tools`.)
`function_call`: Controls how the model responds to function calls. (Deprecated in favor of `tool_choice`.)
`api_base`: The API endpoint you want to call the model with.
`api_version`: (Azure-specific) The API version for the call.
`api_key`: The API key used to authenticate and authorize requests.
`model_list`: A list of API base URLs, keys, etc.
`num_retries`: The number of times to retry the API call if an `APIError`, `TimeoutError`, or `ServiceUnavailableError` occurs.
`context_window_fallback_dict`: A mapping of fallback models to use if the call fails due to a context-window error.
`fallbacks`: A list of model names + params to be used in case the initial call fails.
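These three parameters are client-side conveniences rather than fields of the HTTP request body. As a rough illustration only, equivalent retry-and-fallback logic over the raw endpoint might look like the following; the fallback model name and the choice of retryable status codes are assumptions, not part of the API:

```python
import time
import requests

URL = "https://api.openroute.cn/v1/chat/completions"

def complete_with_fallbacks(payload, fallback_models, num_retries=3):
    """Try the requested model first, then each fallback, retrying transient errors."""
    for model in [payload["model"], *fallback_models]:
        body = {**payload, "model": model}
        for attempt in range(num_retries):
            try:
                resp = requests.post(URL, json=body, timeout=600)
            except requests.exceptions.RequestException:
                time.sleep(2 ** attempt)  # back off, then retry
                continue
            if resp.status_code in (429, 500, 503):  # assumed transient errors
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return resp.json()
    raise RuntimeError("all models and retries exhausted")

# Hypothetical fallback model, for illustration only.
result = complete_with_fallbacks(
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]},
    fallback_models=["gpt-3.5-turbo-16k"],
)
```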
`metadata`: Any additional data you want logged when the call is made.
`input_cost_per_token`: The cost per input token for the completion call.
`output_cost_per_token`: The cost per output token for the completion call.
`initial_prompt_value`: Initial string applied at the start of the input messages.
`roles`: A dictionary specifying how to format the prompt based on the role + message passed in via `messages`.
`final_prompt_value`: Final string applied at the end of the input messages.
`bos_token`: Initial string applied at the start of a sequence.
`eos_token`: Final string applied at the end of a sequence.
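These five fields together describe a prompt template. A sketch of how such a template might be applied to flatten chat messages into a single prompt string; the `pre_message`/`post_message` structure of `roles` and the Llama-2-style token values are assumptions for illustration, not documented behavior:

```python
def render_prompt(messages, roles, initial_prompt_value="", final_prompt_value="",
                  bos_token="", eos_token=""):
    """Flatten chat messages into one prompt string using the template fields."""
    parts = [bos_token, initial_prompt_value]
    for msg in messages:
        fmt = roles.get(msg["role"], {})  # per-role wrapping, assumed structure
        parts.append(fmt.get("pre_message", "") + msg["content"] + fmt.get("post_message", ""))
    parts.extend([final_prompt_value, eos_token])
    return "".join(parts)

# Hypothetical Llama-2-style template values, purely for illustration.
roles = {
    "user": {"pre_message": "[INST] ", "post_message": " [/INST]"},
    "assistant": {"pre_message": " ", "post_message": " "},
}
print(render_prompt([{"role": "user", "content": "Hello!"}], roles,
                    bos_token="<s>", eos_token="</s>"))
```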
`hf_model_name`: (SageMaker only) The corresponding Hugging Face name of the model, used to pull the right chat template for the model.
`headers`: A dictionary of headers to be sent with the request.
`extra_headers`: An alternative to `headers`, used to send extra headers in the LLM API request.
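A sketch of attaching custom headers to a request; the header names below are hypothetical, so send whatever your provider actually expects:

```python
import requests

URL = "https://api.openroute.cn/v1/chat/completions"
payload = {"model": "gpt-3.5-turbo",
           "messages": [{"role": "user", "content": "Hello!"}]}

# Hypothetical header names, for illustration only.
headers = {
    "Content-Type": "application/json",
    "X-Request-Id": "demo-123",
}
resp = requests.post(URL, json=payload, headers=headers, timeout=600)
```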