Kimi K2.5 Fast

by Moonshot

Kimi K2.5 Fast is a speed-oriented variant of the full K2.5 model, built to deliver quicker responses while keeping the base model's native multimodal and agentic capabilities. It is based on the same 1-trillion-parameter mixture-of-experts (MoE) architecture with 32 billion active parameters, and it retains the base model's vision-language integration and reasoning. The variant accepts text and image inputs, supports the agent swarm coordination paradigm, and lets developers trade response speed against reasoning depth. It offers a 256K-token context window and tool calling for autonomous agent workflows, which suits applications that need responsive visual reasoning, coding assistance, and tool-augmented tasks. Like the rest of the K2.5 line, it uses native INT4 quantization from quantization-aware training, which keeps quality close to full precision while lowering memory use.
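
To put the parameter counts and the INT4 claim in rough numbers, the short Python sketch below works through the arithmetic. The 1 trillion and 32 billion figures come from the description above. The 16-bit baseline used for comparison is an assumption, since the description does not say what "full precision" means here. The estimate covers raw weight storage only and ignores quantization scales, any layers kept at higher precision, activations, and the KV cache, so treat the results as order-of-magnitude guidance rather than deployment requirements.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the description.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters (from the description)
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active parameters (from the description)


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Raw weight storage in bytes, ignoring scales and other overhead."""
    return num_params * bits_per_param // 8


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


# Share of the total parameters that is active.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Weight storage at 4 bits per parameter (INT4).
int4_gb = to_gb(weight_bytes(TOTAL_PARAMS, 4))

# Assumed 16-bit baseline, used here only for comparison.
bf16_gb = to_gb(weight_bytes(TOTAL_PARAMS, 16))

print(f"Active fraction: {active_fraction:.1%}")
print(f"INT4 weights:    {int4_gb:.0f} GB")
print(f"16-bit weights:  {bf16_gb:.0f} GB")
```

Under these assumptions, about 3.2% of the parameters are active, and the INT4 weights take roughly 500 GB against roughly 2,000 GB at 16 bits, a fourfold reduction in weight storage.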