Multimodal Capabilities

OpenRoute supports multiple input modalities beyond text, allowing you to send images, PDFs, and audio files to compatible models through our unified API. This enables rich multimodal interactions for a wide variety of use cases.

Supported Modalities

Images

Send images to vision-capable models for analysis, description, OCR, and more. OpenRoute supports multiple image formats and both URL-based and base64-encoded images.

Learn more about image inputs →

Image Generation

Generate images from text prompts using AI models with image output capabilities. OpenRoute supports various image generation models that can create high-quality images based on your descriptions.

Learn more about image generation →

PDFs

Process PDF documents with any model on OpenRoute. Our intelligent PDF parsing system extracts text and handles both text-based and scanned documents.

Learn more about PDF processing →

Audio

Send audio files to speech-capable models for transcription, analysis, and processing. OpenRoute supports common audio formats with automatic routing to compatible models.

Learn more about audio inputs →

Getting Started

All multimodal inputs use the same /api/v1/chat/completions endpoint with the messages parameter. Different content types are specified in the message content array:

Images: Use image_url content type
PDFs: Use file content type with PDF data
Audio: Use input_audio content type

You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.

Model Compatibility

Not all models support every modality. OpenRoute automatically filters available models based on your request content:

Vision models: Required for image processing
File-compatible models: Can process PDFs natively or through our parsing system
Audio-capable models: Required for audio input processing

Use our Models page to find models that support your desired input modalities.

Input Format Support

OpenRoute supports both direct URLs and base64-encoded data for multimodal inputs:

URLs (Recommended for public content)

Images: https://example.com/image.jpg
PDFs: https://example.com/document.pdf
Audio: Not supported via URL (base64 only)

Base64 Encoding (Required for local files)

Images: data:image/jpeg;base64,{base64_data}
PDFs: data:application/pdf;base64,{base64_data}
Audio: Raw base64 string with format specification

URLs are more efficient for large files as they don't require local encoding and reduce request payload size. Base64 encoding is required for local files or when the content is not publicly accessible.

Frequently Asked Questions

Last updated on

Supported Modalities

Images

Send images to vision-capable models for analysis, description, OCR, and more. OpenRoute supports multiple image formats and both URL-based and base64-encoded images.

Learn more about image inputs →

Image Generation

Generate images from text prompts using AI models with image output capabilities. OpenRoute supports various image generation models that can create high-quality images based on your descriptions.

Learn more about image generation →

PDFs

Process PDF documents with any model on OpenRoute. Our intelligent PDF parsing system extracts text and handles both text-based and scanned documents.

Learn more about PDF processing →

Audio

Send audio files to speech-capable models for transcription, analysis, and processing. OpenRoute supports common audio formats with automatic routing to compatible models.

Learn more about audio inputs →

Getting Started

All multimodal inputs use the same /api/v1/chat/completions endpoint with the messages parameter. Different content types are specified in the message content array:

Images: Use image_url content type
PDFs: Use file content type with PDF data
Audio: Use input_audio content type

You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.

Model Compatibility

Not all models support every modality. OpenRoute automatically filters available models based on your request content:

Vision models: Required for image processing
File-compatible models: Can process PDFs natively or through our parsing system
Audio-capable models: Required for audio input processing

Use our Models page to find models that support your desired input modalities.

Input Format Support

OpenRoute supports both direct URLs and base64-encoded data for multimodal inputs:

Multimodal Capabilities

Supported Modalities

Images

Image Generation

PDFs

Audio

Getting Started

Model Compatibility

Input Format Support

URLs (Recommended for public content)

Base64 Encoding (Required for local files)

Frequently Asked Questions

On this page

Multimodal Capabilities

Supported Modalities

Images

Image Generation

PDFs

Audio

Getting Started

Model Compatibility

Input Format Support

URLs (Recommended for public content)

Base64 Encoding (Required for local files)

Frequently Asked Questions

On this page

Multimodal Capabilities

Can I mix different modalities in one request?

How is multimodal content priced?

What about video support?

On this page

Multimodal Capabilities

Can I mix different modalities in one request?

How is multimodal content priced?

What about video support?

On this page