Models
110 models

doubao-seedream-5-0-260128
Doubao · PER REQ $0.0400
A flagship image generation model from ByteDance, featuring exceptional semantic understanding and visual synthesis capabilities. It supports multimodal inputs (text and image) and leverages web search to enhance contextual knowledge. The model excels in fine-grained detail, style transfer, and subject consistency, providing a professional-grade, high-definition solution for e-commerce, film post-production, and creative design.
Web Search
gpt-4o-transcribe
OpenAI · INPUT $2.2750/M · OUTPUT $9.1000/M
gpt-4o-transcribe is OpenAI’s speech-to-text model for automatic transcription and voice understanding tasks, built within the GPT-4o family. It is designed to convert audio into text with strong transcription quality, including support for multilingual speech and robust handling of natural spoken language across varied audio conditions. Typical use cases include meeting notes, subtitle generation, voice assistants, customer service analysis, and speech content indexing or retrieval.
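A speech-to-text model like this is typically called by uploading an audio file plus a model name. The sketch below only builds (does not send) a request shaped like OpenAI's `/v1/audio/transcriptions` multipart form; the file path and helper name are illustrative, and the auth header is a placeholder.

```python
# Hedged sketch: assembling (not sending) a transcription request for
# gpt-4o-transcribe. The `model` and `file` fields mirror OpenAI's
# audio-transcription endpoint; the path and helper name are made up.

def build_transcription_request(audio_path: str) -> dict:
    """Return the pieces a client would POST as multipart form-data."""
    return {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "form": {
            "model": "gpt-4o-transcribe",
            "file": audio_path,  # sent as a multipart file part in practice
        },
        "headers": {"Authorization": "Bearer $OPENAI_API_KEY"},  # placeholder
    }

req = build_transcription_request("meeting.wav")
```

In a real client the dict above would be serialized as multipart form-data and the response's `text` field would hold the transcript.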
text-embedding-3-small
OpenAI · CONTEXT 8K · INPUT $0.0200/M · OUTPUT $0.0200/M
This efficient and cost-effective embedding model is designed for text processing, transforming text into high-performance semantic vectors. Optimized for multilingual capabilities and semantic retrieval, it is ideal for building semantic search, recommendation systems, clustering, and RAG applications. With flexible dimension control, it offers an optimal balance between performance and cost, serving as an ideal foundation for enterprise-grade AI development.
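The semantic-search use case above boils down to: embed documents and a query, then rank by cosine similarity. A minimal sketch of the ranking step — the short vectors here are made-up stand-ins for what text-embedding-3-small would return (real vectors are 1536-dim by default, shrinkable via the API's `dimensions` parameter):

```python
import math

# Hypothetical embeddings standing in for text-embedding-3-small output.
DOCS = {
    "refund policy": [0.9, 0.1, 0.2],
    "shipping times": [0.1, 0.8, 0.3],
}
QUERY = [0.85, 0.15, 0.25]  # embedding of "how do I get my money back?"

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Rank documents by similarity to the query embedding.
best = max(DOCS, key=lambda name: cosine(QUERY, DOCS[name]))
print(best)  # → "refund policy"
```

The same ranking loop underlies recommendation, clustering, and RAG retrieval; only the source of the vectors changes.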
glm-4.5-air
Zhipu AI · CONTEXT 128K · INPUT $0.1092/M · OUTPUT $0.7284/M
GLM-4.5-Air is a lightweight general-purpose model from Zhipu AI, positioned to balance capability, latency, and cost for high-throughput online applications. It supports multi-turn dialogue, instruction following, content generation, basic reasoning, and code-related tasks, with an emphasis on fast response and practical reliability. It is suitable for AI assistants, enterprise Q&A, content drafting, educational support, and lightweight application integration where efficiency and scalability matter.
Reasoning · Web Search · Tool Use · Function Calling · Long Context
kimi-k2.5-free
Moonshot AI · CONTEXT 256K · INPUT $0.0000/M · OUTPUT $0.0000/M
As Moonshot AI's flagship native multimodal model, it excels in visual understanding, complex logical reasoning, and code generation. Featuring a 1T-parameter MoE architecture and agent swarm technology, it supports a 256K context window and handles multi-step complex tasks with high efficiency, making it an ideal choice for building high-performance AI applications and automated workflows.
Reasoning · Web Search · Tool Use · Function Calling · Structured Output · Long Context · Code Execution
qwen-image-max
Qwen · PER REQ $0.0723
A flagship professional-grade image generation model, excelling in complex text rendering and layout design. It delivers photorealistic quality with precise material and texture reproduction, making it an ideal choice for posters, product imagery, and commercial creative workflows.
doubao-seedance-2.0-V2V
Doubao · INPUT $8.1000/M · OUTPUT $8.1000/M
Built on a unified multimodal audio-video generation architecture, this model supports mixed inputs of text, images, audio, and video to deliver high-fidelity, audiovisual-synchronized content. It features exceptional motion control and physical simulation capabilities, supporting multi-shot narrative and video editing to produce professional, fluid, and coherent video content suitable for film, marketing, and creative design scenarios.
Multimodal Output
doubao-seedance-2.0-fast-V2V
Doubao · INPUT $6.3700/M · OUTPUT $6.3700/M
A next-generation video generation model based on a dual-branch diffusion Transformer architecture, supporting joint input of text, images, audio, and video. It features excellent motion stability and physical fidelity, enables multi-shot narrative consistency and native audio-visual synchronization, and provides efficient, controllable video production solutions for industrial scenarios such as film, advertising, and creative design.
Multimodal Output
deepseek-v3.2-exp-thinking
DeepSeek · CONTEXT 128K · INPUT $0.2876/M · OUTPUT $0.4314/M
An experimental reasoning model built on the DeepSeek V3.2 architecture, featuring DeepSeek Sparse Attention (DSA) for enhanced long-context efficiency. It specializes in deep chain-of-thought analysis and complex problem decomposition, delivering robust performance in mathematical reasoning, coding, and logical deduction with superior long-range modeling capabilities.
Reasoning · Long Context
jina-clip-v2
Jina · INPUT $0.2458/M · OUTPUT $0.2458/M
A state-of-the-art multilingual, multimodal embedding model for both cross-modal (text-image) and unimodal (text-text) retrieval. Covering 89 languages and supporting image inputs up to 512×512 pixels, it utilizes Matryoshka Representation Learning for flexible embedding dimensions, balancing performance and efficiency for complex retrieval applications.
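Matryoshka Representation Learning means the leading dimensions of an embedding already carry most of the signal, so a vector can be truncated and re-normalized to trade accuracy for storage and speed. A hypothetical sketch of that trick (the 8-dim vector is made up; real jina-clip-v2 embeddings are much larger):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions and re-normalize to unit length,
    as Matryoshka-trained embeddings allow."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Made-up 8-dim embedding standing in for a full jina-clip-v2 vector.
full = [0.5, -0.3, 0.2, 0.1, 0.05, -0.02, 0.01, 0.01]
short = truncate_embedding(full, 4)
# `short` is 4-dim and unit-length: cheaper to store and compare,
# at some cost in retrieval accuracy.
```

Cosine similarity on the truncated vectors works unchanged, since they remain unit-length.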