Moonshot AI Releases Kimi K2.7-Code: Open-Source Coding Model with Better Token Efficiency
Moonshot AI, the Chinese startup behind the Kimi assistant, has released Kimi K2.7-Code — an open-source coding-focused agentic model that improves on its predecessor with significantly better token efficiency and stronger performance on real-world software engineering tasks. The model is available on Hugging Face and supports both thinking mode and multi-step tool calling.
Built for Real-World Coding
Kimi K2.7-Code is a coding-specialized iteration built upon Kimi K2.6. It adopts the same native int4 quantization and is recommended for deployment on inference engines like vLLM and SGLang. The model forces thinking mode and preserves reasoning chains, making it well-suited for complex software engineering workflows that require multi-step planning and execution.
One of the headline improvements is a roughly 30% reduction in thinking-token usage compared to Kimi K2.6, achieved without sacrificing output quality. This means the model uses fewer computational resources per task while maintaining or improving accuracy — a meaningful step forward for cost-conscious AI deployments.
"Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency." — Moonshot AI, via Hugging Face model card
Benchmarking Against GPT-5.5 and Claude Opus 4.8
Moonshot tested K2.7-Code against GPT-5.5 (running in Codex with xhigh mode) and Anthropic's Claude Opus 4.8 (running in Claude Code with xhigh mode) across multiple benchmarks. The model uses a 262,144-token context length and was evaluated on Kimi Code Bench V2, a proprietary benchmark spanning 10+ programming languages and realistic production scenarios including backend services, infrastructure, systems programming, security, and ML engineering.
The model was also evaluated on third-party benchmarks like Binary Wolf (which tests code generation from compiled binaries), MLS-Bench-Lite (which measures ability to invent generalizable ML methods), and MCP-Atlas (which evaluates tool-use through Model Context Protocol). Moonshot's in-house Kimi Claw 24/7 Bench measures long-horizon agentic performance across 17 professional scenarios spanning software engineering, ML research, recruiting, trading, and marketing.
Interleaved Thinking and Tool Use
A standout feature of K2.7-Code is its support for interleaved thinking and multi-step tool calling. The model can reason through a problem, take action by calling external tools or writing code, observe results, and iteratively refine its approach — all within a single session. This makes it particularly well-suited for autonomous coding agents that need to navigate large codebases, run tests, and fix issues without human intervention.
The model is available under an open license and can be deployed via Docker Model Runner, vLLM, or SGLang, with an OpenAI-compatible API. This release continues a trend of increasingly capable open-weight coding models that challenge proprietary offerings from OpenAI and Anthropic, giving developers more choices for building AI-assisted development pipelines.
Source: Hugging Face — Moonshot AI / Kimi K2.7-Code model card (June 11, 2026) | Hacker News (292 points)