Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It accepts text and image input and produces text output. Building on the series' text-processing capabilities, it substantially upgrades vision-language capabilities while retaining full-stack agent support for coding, tool use, and productivity workflows. Its notable feature is multimodal, interactive hybrid agent capability: it can perceive real scenes, read screens, interact with GUIs, generate code from visual references, and perform end-to-end navigation in mobile applications.
Deep thinking
Visual comprehension
Text generation
Claude Opus 4.8 is the most powerful general-purpose model in Anthropic's Opus series. It accepts text, image, and file input, produces text output, supports reasoning, and has a context window of 1 million tokens. It is suited to highly autonomous agents, long-running agent tasks, knowledge tasks, and memory-driven tasks that require high consistency across a session.
Coding
Reasoning
Agents
Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, excelling in particular at coding, office and productivity tasks, and long-horizon autonomous execution. Compared with previous Qwen models, it delivers significant improvements in coding and agent performance, and it supports explicit prompt caching for efficient context reuse.
Deep thinking
Visual comprehension
Text generation
OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and 1M+ token context.
Reasoning
Multimodal
Coding
Qwen3.6 Flash is a fast, efficient language model in Alibaba's Qwen3.6 series. It supports text, image, and video input and has a context window of 1 million tokens. Pricing is tiered beyond 256K tokens. It supports instant caching, and explicit caching is billed in two ways: cache reads and cache creation.
Deep thinking
Visual comprehension
Text generation
The next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Delivers stronger performance on complex, multi-step tasks and reliable agentic execution.
Coding
Agents
Reasoning
Qwen Image Plus is an enhanced multimodal model for image generation and precise editing. It supports single-image fine-tuning, multi-image fusion, and object transformation, preserving character identity and product features without drift and matching lighting and textures naturally. It supports resolutions up to 2048×2048, offers stronger Chinese text rendering, better instruction following, and better detail preservation, and is suitable for e-commerce product images, design drafts, and marketing material production.
Deep thinking
Visual comprehension
Text generation
The Plus model in the Qwen3.6 native vision-language series delivers performance comparable to current frontier models, with a significant improvement in quality over the 3.5 series. It is substantially stronger in coding capabilities such as agentic coding, front-end programming, and vibe coding, as well as in general-purpose multimodal recognition, OCR, and object localization.
Deep thinking
Visual comprehension
Text generation
GPT-5.4 is OpenAI's latest frontier model, integrating the Codex and GPT series into one system. It has a context window of over 1 million tokens (922K input, 128K output) and supports text and image input, enabling high-context reasoning, coding, and multimodal analysis in the same workflow.
Audio
Transcription
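The 922K input and 128K output figures quoted for GPT-5.4 above sum to a 1,050K-token window. The following is a minimal Python sketch of checking a request against those limits. It assumes that "K" means 1,000 tokens and that the input and output limits are enforced independently; neither assumption is confirmed by this listing. `fits_context` is a hypothetical helper, not part of any SDK.

```python
# Limits copied from the listing; "K" is ASSUMED to mean 1,000 tokens.
MAX_INPUT_TOKENS = 922_000
MAX_OUTPUT_TOKENS = 128_000
TOTAL_WINDOW = MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS  # 1,050,000 tokens


def fits_context(input_tokens: int, requested_output_tokens: int) -> bool:
    """Hypothetical helper: True if both counts are within the listed limits.

    ASSUMPTION: the input and output limits apply independently.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
    )
```

Token counts would have to come from the provider's own tokenizer; a character-based estimate is only a rough guide.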
The Plus model in the Qwen3.5 native vision-language series uses a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) model, achieving higher inference efficiency. Across multiple task evaluations, the 3.5 series delivers performance comparable to current frontier models. Compared with the 3 series, it is a major step up in both pure-text and multimodal quality.
Deep thinking
Visual comprehension
Text generation
The Flash model in the Qwen3.5 native vision-language series uses a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) model, which enables higher inference efficiency. It is a major step up from the 3 series in both pure-text and multimodal quality, responds quickly, and balances inference speed with performance.
Deep thinking
Visual comprehension
Text generation
DeepSeek is a large language model developed by DeepSeek. It excels in code generation, mathematical reasoning, and other fields.
Deep thinking
Text generation
Reasoning
Gemini 3.5 Flash is Google's efficient multimodal model that achieves near-professional-grade coding and reasoning capabilities at Flash-level cost and speed. It is highly optimized for coding efficiency and parallel agent execution loops, supporting text, image, video, audio, and PDF input.
Reasoning
Multimodal
Agentic
Anthropic’s strongest model for coding and long-running professional tasks, with deep contextual understanding and high reliability.
Coding
Reasoning
Agents
A high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. Delivers near Pro-level reasoning with lower latency.
Reasoning
Multimodal
Agentic
Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows.
Reasoning
Coding
Agentic
WanXiang 2.6 - Video Production - Flash generates faster and offers better cost-effectiveness. Intelligent shot scheduling supports multi-shot narratives, stable multi-person dialogue, and more natural, realistic sound. It supports videos up to 15 seconds long.
Video generation
WanXiang 2.6 - Image Generation from Text. Image texture, aesthetic expression, and instruction following have been upgraded. It excels at precise control of artistic style, realistic and emotionally resonant images, generating images from long texts, and covering a wide range of historical and cultural IP, producing high-quality, expressive visual content.
Image generation
WanXiang 2.6 - Reference-to-Video - Flash generates faster and offers better cost-effectiveness. It accepts a specific person or any object as a reference and precisely maintains consistency in appearance and voice. It also supports multiple reference characters appearing together in one video.
Video generation
WanXiang 2.6 - Image Generation is an all-in-one image generation model. It supports integrated text-image reasoning and generation, with multi-image creative fusion, commercial-grade consistency, transfer of aesthetic elements, and precise control of camera, light, and shadow, significantly improving the consistency, controllability, and expressiveness of image generation.
Image generation
WanXiang 2.6 - Text-to-Video. Intelligent shot scheduling supports multi-shot narratives, generating multi-shot videos with consistent main subjects, scenes, and atmosphere. It supports a maximum duration of 15 seconds, with higher-quality sound generation, better instruction following, and better visual quality.
Video generation
The Max model in the Qwen3 series has received targeted upgrades in agentic programming and tool calling compared with the preview version. The official release reaches a state-of-the-art level in its field and is better suited to more complex agent requirements.
Text generation
Deep thinking
DeepSeek R1 has been released, with performance comparable to OpenAI o1. It is open-source, and its reasoning tokens are fully exposed. It has 671 billion parameters in total, of which 37 billion are active during each inference.
Deep thinking
Text generation
Reasoning
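The parameter counts quoted for DeepSeek R1 above imply that only a small share of the model is active at a time. The following is a short Python sketch of that arithmetic, using only the two figures from the listing; `active_fraction` is a hypothetical helper introduced for illustration.

```python
# Figures from the listing, in billions of parameters.
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37


def active_fraction(active_b: float, total_b: float) -> float:
    """Hypothetical helper: share of parameters active during inference."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{share:.1%} of parameters active")  # prints "5.5% of parameters active"
```

This ratio describes how many parameters take part in computation; all 671 billion parameters still have to be stored to serve the model.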
Z-Image-Turbo is an efficient image generation model that topped the list of open-source image models in the Artificial Analysis evaluation. With only 6 billion parameters and 8 steps of inference, it can generate photo-realistic images comparable to large-scale commercial models. It also excels in Chinese-English text rendering, complex semantic understanding, and diverse topic generation.
Image generation
A model service specializing in image translation. It translates the text in images from 11 source languages, including Chinese, English, and Japanese, into the desired language while accurately reproducing the layout and content of the original image. It supports custom functions such as term definition, sensitive word filtering, and product subject detection, offering flexible, accurate, and efficient image localization.
Image generation
The Plus model in the Qwen image editing series further optimizes inference performance and system stability over the initial Edit model, significantly reducing response time for image generation and editing. It supports returning multiple images in a single request.
Image generation
The full version of Qwen-Image-2.0 integrates image generation and image editing. It offers more professional text rendering with support for 1k-token instructions, finer and more realistic textures, more detailed depiction of realistic scenes, and stronger semantic adherence. The full version has the strongest text rendering and realistic texture capabilities in the 2.0 series.
Image generation
The Max series of the Qwen image editing model offers more stable and comprehensive editing capabilities: stronger industrial design and geometric reasoning, improved character consistency, fewer offset issues, and integrated LoRA capabilities that enable more image editing functions. This version is a snapshot as of January 16, 2026.
Image generation
The accelerated models in the Qwen-Image-2.0 series integrate image generation and image editing. They offer more professional text rendering with support for 1k-token instructions, finer and more realistic textures, more detailed depiction of realistic scenes, and stronger semantic adherence. The accelerated version aims for the best balance between output quality and performance.
Image generation
A state-of-the-art agent model designed for Agent 2.0, extending coding into the real world, including the workspace, entertainment, and personal assistance. Model highlights: global SOTA open-source coding and agent model; scores higher than Opus 4.6 on SWE-bench Pro and SWE-bench Verified; global SOTA in Excel, search and research, and document summarization; well suited as the main model for future workspaces; fast, with optimized thinking efficiency and 100+ TPS, 3 times faster than Opus; highly cost-effective, supporting always-online agents.
Deep thinking
Text generation
The Max series of the Qwen image generation model performs exceptionally well across generation tasks. Compared with the Plus series, it significantly reduces the artificial look of generated images and makes them more realistic, with more lifelike human skin and features, finer natural textures, and more aesthetically pleasing text rendering.
Image generation
GLM-5.1 is a model designed by Zhipu AI for long-horizon tasks. It has 744B parameters in total, supports a long context of 200K tokens, and has a maximum output of 128K tokens. It offers strong logical reasoning, long-text understanding, and code generation, and balances performance with inference efficiency. It performs exceptionally well on multi-task benchmarks and is suitable for scenarios such as intelligent interaction, enterprise applications, and development assistance.
Text generation
Kimi-k2.5 is the most comprehensive model released by Moonshot AI to date. It features a native multimodal architecture and supports both visual and text input, thinking and non-thinking modes, and both dialogue and agent tasks.
Deep thinking
Visual comprehension
Text generation