Models
110 models

doubao-seedream-5-0-260128
Doubao · PER REQ $0.0400
A flagship image generation model from ByteDance, featuring exceptional semantic understanding and visual synthesis capabilities. It supports multimodal inputs (text and image) and leverages web search to enhance contextual knowledge. The model excels in fine-grained detail, style transfer, and subject consistency, providing a professional-grade, high-definition solution for e-commerce, film post-production, and creative design.
Web Search
gpt-4o-transcribe
OpenAI · INPUT $2.2750/M · OUTPUT $9.1000/M
gpt-4o-transcribe is OpenAI’s speech-to-text model for automatic transcription and voice understanding tasks, built within the GPT-4o family. It is designed to convert audio into text with strong transcription quality, including support for multilingual speech and robust handling of natural spoken language across varied audio conditions. Typical use cases include meeting notes, subtitle generation, voice assistants, customer service analysis, and speech content indexing or retrieval.
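A speech-to-text model like this is typically called by uploading an audio file plus a model name. The sketch below only builds (does not send) a request shaped like OpenAI's `/v1/audio/transcriptions` multipart form; the file path and helper name are illustrative, and the auth header is a placeholder.

```python
# Hedged sketch: assembling (not sending) a transcription request for
# gpt-4o-transcribe. The `model` and `file` fields mirror OpenAI's
# audio-transcription endpoint; the path and helper name are made up.

def build_transcription_request(audio_path: str) -> dict:
    """Return the pieces a client would POST as multipart form-data."""
    return {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "form": {
            "model": "gpt-4o-transcribe",
            "file": audio_path,  # sent as a multipart file part in practice
        },
        "headers": {"Authorization": "Bearer $OPENAI_API_KEY"},  # placeholder
    }

req = build_transcription_request("meeting.wav")
```

In a real client the dict above would be serialized as multipart form-data and the response's `text` field would hold the transcript.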
text-embedding-3-small
OpenAI · CONTEXT 8K · INPUT $0.0200/M · OUTPUT $0.0200/M
This efficient and cost-effective embedding model is designed for text processing, transforming text into high-performance semantic vectors. Optimized for multilingual capabilities and semantic retrieval, it is ideal for building semantic search, recommendation systems, clustering, and RAG applications. With flexible dimension control, it offers an optimal balance between performance and cost, serving as an ideal foundation for enterprise-grade AI development.
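The semantic-search use case above boils down to: embed documents and a query, then rank by cosine similarity. A minimal sketch of the ranking step — the short vectors here are made-up stand-ins for what text-embedding-3-small would return (real vectors are 1536-dim by default, shrinkable via the API's `dimensions` parameter):

```python
import math

# Hypothetical embeddings standing in for text-embedding-3-small output.
DOCS = {
    "refund policy": [0.9, 0.1, 0.2],
    "shipping times": [0.1, 0.8, 0.3],
}
QUERY = [0.85, 0.15, 0.25]  # embedding of "how do I get my money back?"

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Rank documents by similarity to the query embedding.
best = max(DOCS, key=lambda name: cosine(QUERY, DOCS[name]))
print(best)  # → "refund policy"
```

The same ranking loop underlies recommendation, clustering, and RAG retrieval; only the source of the vectors changes.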
glm-4.5-air
Zhipu AI · CONTEXT 128K · INPUT $0.1092/M · OUTPUT $0.7284/M
GLM-4.5-Air is a lightweight general-purpose model from Zhipu AI, positioned to balance capability, latency, and cost for high-throughput online applications. It supports multi-turn dialogue, instruction following, content generation, basic reasoning, and code-related tasks, with an emphasis on fast response and practical reliability. It is suitable for AI assistants, enterprise Q&A, content drafting, educational support, and lightweight application integration where efficiency and scalability matter.
Reasoning · Web Search · Tool Use · Function Calling · Long Context
kimi-k2.5-free
Moonshot AI · CONTEXT 256K · INPUT $0.0000/M · OUTPUT $0.0000/M
As Moonshot AI's flagship native multimodal model, it excels in visual understanding, complex logical reasoning, and code generation. Featuring a 1T-parameter MoE architecture and agent swarm technology, it supports a 256K context window and handles multi-step complex tasks with high efficiency, making it an ideal choice for building high-performance AI applications and automated workflows.
Reasoning · Web Search · Tool Use · Function Calling · Structured Output · Long Context · Code Execution
qwen-image-max
Qwen · PER REQ $0.0723
A flagship professional-grade image generation model, excelling in complex text rendering and layout design. It delivers photorealistic quality with precise material and texture reproduction, making it an ideal choice for posters, product imagery, and commercial creative workflows.
doubao-seedance-2.0-V2V
Doubao · INPUT $8.1000/M · OUTPUT $8.1000/M
Built on a unified multimodal audio-video generation architecture, this model supports mixed inputs of text, images, audio, and video to deliver high-fidelity, audiovisual-synchronized content. It features exceptional motion control and physical simulation capabilities, supporting multi-shot narrative and video editing to produce professional, fluid, and coherent video content suitable for film, marketing, and creative design scenarios.
Multimodal Output
doubao-seedance-2.0-fast-V2V
Doubao · INPUT $6.3700/M · OUTPUT $6.3700/M
A next-generation video generation model based on a dual-branch diffusion Transformer architecture, supporting joint input of text, images, audio, and video. It features excellent motion stability and physical fidelity, enables multi-shot narrative consistency and native audio-visual synchronization, and provides efficient, controllable video production solutions for industrial scenarios such as film, advertising, and creative design.
Multimodal Output
deepseek-v3.2-exp-thinking
DeepSeek · CONTEXT 128K · INPUT $0.2876/M · OUTPUT $0.4314/M
An experimental reasoning model built on the DeepSeek V3.2 architecture, featuring DeepSeek Sparse Attention (DSA) for enhanced long-context efficiency. It specializes in deep chain-of-thought analysis and complex problem decomposition, delivering robust performance in mathematical reasoning, coding, and logical deduction with superior long-range modeling capabilities.
Reasoning · Long Context
jina-clip-v2
Jina · INPUT $0.2458/M · OUTPUT $0.2458/M
A state-of-the-art multilingual, multimodal embedding model for both cross-modal (text-image) and unimodal (text-text) retrieval. Covering 89 languages and supporting image inputs up to 512×512 pixels, it utilizes Matryoshka Representation Learning for flexible embedding dimensions, balancing performance and efficiency for complex retrieval applications.
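Matryoshka Representation Learning means the leading dimensions of an embedding already carry most of the signal, so a vector can be truncated and re-normalized to trade accuracy for storage and speed. A hypothetical sketch of that trick (the 8-dim vector is made up; real jina-clip-v2 embeddings are much larger):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions and re-normalize to unit length,
    as Matryoshka-trained embeddings allow."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Made-up 8-dim embedding standing in for a full jina-clip-v2 vector.
full = [0.5, -0.3, 0.2, 0.1, 0.05, -0.02, 0.01, 0.01]
short = truncate_embedding(full, 4)
# `short` is 4-dim and unit-length: cheaper to store and compare,
# at some cost in retrieval accuracy.
```

Cosine similarity on the truncated vectors works unchanged, since they remain unit-length.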