Qwen3.5 2B

by Alibaba

Qwen3.5 2B is a lightweight, 2-billion-parameter dense model designed for efficient deployment while retaining multimodal vision-language capability. It is built on a hybrid architecture that pairs Gated DeltaNet with attention layers, supports a native context window of 262,144 tokens, and runs in non-thinking mode by default for faster responses. Reported results include MMLU-Pro 55.3 and MMMU 64.2, and the model handles text, image, and video inputs across more than 200 languages. Thinking mode can be enabled when a task calls for step-by-step reasoning. Its small footprint makes it well suited to prototyping, task-specific fine-tuning, and resource-constrained scenarios where a compact multimodal model is preferred.
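
As a rough illustration of what the 262,144-token window means for request sizing, the sketch below computes how many output tokens remain for a given prompt size. It assumes that the prompt and the completion share the same window, which this page does not state explicitly, and the helper name `remainingOutputBudget` is hypothetical.

```typescript
// Native context window stated above, in tokens.
const CONTEXT_WINDOW_TOKENS = 262144;

// Hypothetical helper: tokens left for the completion after the prompt.
// Assumption: prompt and completion tokens count against one shared window.
function remainingOutputBudget(promptTokens: number): number {
  if (promptTokens < 0 || Math.floor(promptTokens) !== promptTokens) {
    throw new RangeError("promptTokens must be a non-negative integer");
  }
  // Clamp at zero when the prompt alone already exceeds the window.
  return Math.max(0, CONTEXT_WINDOW_TOKENS - promptTokens);
}

// Example: a 200,000-token prompt.
console.log(remainingOutputBudget(200000)); // 62144
```

A prompt larger than the window yields a budget of 0, which a caller could treat as a signal to truncate or split the input before sending it.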

Key info

Input
Output
Features
Context window: 262K tokens
Max output: 262K tokens
Input price: $0.02 per 1M tokens
Output price: $0.10 per 1M tokens
  • US residency available
  • Zero data retention via Enterprise
  • No training by default
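
To make the listed prices concrete, the sketch below estimates the cost of a single request from its token counts. It uses only the input and output prices shown above; the helper name `estimateCostUsd` is hypothetical, and actual billing may differ (rounding, minimums, or other fees are not described on this page).

```typescript
// Listed prices from the Key info section, in USD per 1M tokens.
const INPUT_PRICE_PER_MILLION = 0.02;
const OUTPUT_PRICE_PER_MILLION = 0.10;

// Hypothetical helper: estimated cost of one request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = inputTokens * INPUT_PRICE_PER_MILLION;
  const outputCost = outputTokens * OUTPUT_PRICE_PER_MILLION;
  // Prices are quoted per million tokens, so scale the sum down.
  return (inputCost + outputCost) / 1e6;
}

// Example: 10,000 prompt tokens and 2,000 completion tokens.
console.log(estimateCostUsd(10000, 2000).toFixed(6)); // "0.000400"
```

At these rates, output tokens cost five times as much as input tokens, so long completions dominate the bill more quickly than long prompts.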

Available routes

Qwen3.5 2B runs on 1 route through the Opper gateway. Compare residency, zero data retention (ZDR), and training posture at a glance; full data-handling detail per route follows below.

Provider | Region | Zero data retention | Training | Input | Output
DeepInfra | US | Enterprise | No | $0.02 | $0.10

Training posture across routes: No training on prompts by default.

Data handling per route

Each route hosting Qwen3.5 2B has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

DeepInfra (United States)

Zero data retention is available via an Opper Enterprise contract. No training on customer data. Region: US; transfer mechanism: unknown.

Zero data retention: Available via Opper Enterprise contract.
Training: No training on customer data.
Logging: Limited debug logs
Third-party access: None disclosed
GDPR DPA: No DPA
Transfer mechanism: Unknown

Benchmarks

Independent benchmark scores: composite indices for reasoning, coding, and math, plus individual eval scores where available.

Global rank: #320 of 534 LLMs
Tier: Efficient
Output speed: 28 tok/s
First token: 0.44s
Intelligence Index: 10.2
Coding Index: 19.7
Reasoning & knowledge
  • GPQA Diamond: 46%
  • Humanity's Last Exam: 2%
  • Long-context reasoning: 24%
Coding
  • SciCode: 3%
Agentic & tool use
  • Terminal-Bench Hard: 4%
  • τ²-Bench Telecom: 69%
Math & instruction following
  • IFBench: 31%
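
To turn the reported speed figures into a rough response-time estimate, the sketch below adds the time to first token to the time needed to generate the output at the reported rate. This is a simplified model that assumes a constant generation speed; real latency varies with load and prompt length, and the helper name `estimateResponseSeconds` is hypothetical.

```typescript
// Reported figures from the Benchmarks section.
const FIRST_TOKEN_SECONDS = 0.44;
const OUTPUT_TOKENS_PER_SECOND = 28;

// Hypothetical helper: estimated wall-clock time for a full response.
// Assumption: tokens are generated at a constant rate after the first-token delay.
function estimateResponseSeconds(outputTokens: number): number {
  if (outputTokens < 0) {
    throw new RangeError("outputTokens must be non-negative");
  }
  return FIRST_TOKEN_SECONDS + outputTokens / OUTPUT_TOKENS_PER_SECOND;
}

// Example: a 280-token reply.
console.log(estimateResponseSeconds(280).toFixed(2)); // "10.44"
```

Under this model, generation time dominates for all but very short replies, so capping the output length is the most direct way to bound response time.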

Get started

Call Qwen3.5 2B through the Opper gateway with one API key. Let your coding agent set it up, or call it directly: Opper is drop-in compatible with the OpenAI, Anthropic, and Google AI SDKs.

Set it up with your agent

Copy this and paste it into your coding agent (Claude Code, Cursor, Codex, and more) and it'll wire up Opper for you.

Or call it directly

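import OpenAI from "openai";

// Point the OpenAI SDK at Opper's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.OPPER_API_KEY, // Opper API key, read from the environment
  baseURL: "https://api.opper.click/v3/compat",
});

// Request a chat completion from the DeepInfra-hosted route.
const completion = await client.chat.completions.create({
  model: "deepinfra/Qwen/Qwen3.5-2B",
  messages: [{ role: "user", content: "Hello" }],
});

// Print the assistant's reply.
console.log(completion.choices[0].message.content);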
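
Because the model also accepts image input, a request can carry an image alongside the text. The sketch below builds such a request body using the OpenAI Chat Completions content-part format (`type: "text"` and `type: "image_url"`). Whether this route accepts that format unchanged is an assumption not confirmed by this page; the helper name `buildImageRequest` and the example URL are hypothetical.

```typescript
// Minimal local types mirroring the OpenAI Chat Completions content-part shapes.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };
type ChatRequest = { model: string; messages: UserMessage[] };

// Hypothetical helper: build a request that pairs one image with a question.
function buildImageRequest(imageUrl: string, question: string): ChatRequest {
  return {
    model: "deepinfra/Qwen/Qwen3.5-2B", // model ID from the snippet above
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Placeholder URL, for illustration only.
const imageRequest = buildImageRequest(
  "https://example.com/cat.png",
  "What is in this image?",
);
console.log(JSON.stringify(imageRequest, null, 2));
```

The resulting object has the same shape as the argument to `client.chat.completions.create(...)` in the snippet above, so it could be passed to that call in place of the text-only request.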

