Three frontier-grade coding models. One fortnight. If you blinked in early June 2026, you missed an entire generation of open-weight AI shipping while you were refilling your coffee.
I build with these things every day — agents, CLI workflows, the unglamorous plumbing that turns a model into something that actually finishes work. So this isn’t a benchmark-scraping listicle. It’s the comparison I wish someone had handed me on June 14th, when all three of these models were suddenly real, downloadable, and fighting for a slot in my stack.
Here’s the short version for the impatient: MiniMax M3, GLM 5.2, and Kimi K2.7-Code are not the same tool with different logos. One is a multimodal long-context workhorse, one is a coding-plan bargain with a million-token window, and one is a tool-use specialist that quietly out-uses models costing five times as much. Pick wrong and you’ll either overpay or under-deliver. Let’s fix that.

Why June 2026 felt like the floor falling out
Step back for a second, because the timeline alone tells the story.
- June 1 — MiniMax ships M3, an open-weight multimodal model with a 1M-token context window and a brand-new sparse-attention architecture.
- June 12 — Moonshot AI drops Kimi K2.7-Code, a trillion-parameter coding specialist, on Hugging Face.
- June 12 — The US government orders Anthropic to suspend global access to its top-tier Fable 5 and Mythos 5 models, citing an export-control directive.
- June 13 — Zhipu AI (Z.ai) releases GLM 5.2 across every tier of its coding plan and promises MIT-licensed open weights within a week.
So in the space of about twelve days, two of the most capable open coding models on Earth launched, a third arrived, and the most powerful Western frontier models got pulled off the international table. The vacuum was filled before most teams noticed there was one.
That’s the real headline. Not “China caught up” — that framing is a year stale. The story now is that the open-weight frontier moves faster than any procurement cycle, any quarterly roadmap, any blog post. By the time a model has independent benchmarks, its successor is in training. If your tech strategy assumes you can standardise on one model for eighteen months, that strategy is already broken.
The flip side is genuinely good news for builders: you have leverage you didn’t have in 2024. Three vendors, all hungry, all undercutting each other on price, all shipping weights you can self-host. Competition like this is exactly what pushes quality up and cost down. The trick is knowing what you’re actually choosing between.
Meet the three contenders
Before the head-to-head, a quick, honest introduction to each — including the part most launch coverage skips: which numbers are independently verified and which are the vendor marking its own homework.
MiniMax M3 — the multimodal long-haul truck
MiniMax, the Shanghai lab, released M3 on June 1. Its signature feature isn’t a benchmark — it’s an architecture. M3 runs on MiniMax Sparse Attention (MSA), which swaps full attention for a system that only processes the relevant blocks of a long context. The practical payoff: roughly one-twentieth the per-token compute at 1M tokens compared to the previous generation, with prefill reported as 9× faster and decoding more than 15× faster.
That matters more than it sounds. Long-context models usually choke — slow, expensive, and increasingly forgetful the deeper you go. MSA is MiniMax’s bet that you can have a genuinely usable million-token window without your bill or your latency exploding.
M3 is also natively multimodal: text, image, and video in, text out. It’s the only model in this trio that reads a screenshot or a screen recording without a bolt-on. MiniMax reports a SWE-Bench Pro score of 59.0%, which it says edges past GPT-5.5 and Gemini 3.1 Pro while sitting below Claude Opus 4.8. On the OpenRouter listing, pricing came in around $0.30 per million input tokens and $1.20 per million output at launch (a promo rate; standard pricing is roughly double that), with cache reads near $0.06.
The asterisk: those benchmarks were run on MiniMax’s own infrastructure with its own agent scaffolding. Treat them as directional, not gospel, until neutral boards weigh in. And if you use the hosted API rather than self-hosting the weights, remember the provider operates under Chinese jurisdiction — a real consideration for anyone touching regulated or client data.
GLM 5.2 — the coding-plan bargain with a 1M window
Zhipu’s GLM 5.2 launched on June 13 and made two loud promises: a truly usable 1-million-token context window and MIT-licensed open weights landing within a week of release. Reports peg it as a large Mixture-of-Experts model — one widely cited figure is 744B total parameters with 40B active — built coding-first, with two thinking modes (High and Max, the latter recommended for the gnarly stuff).
What makes GLM 5.2 interesting for working developers isn’t a leaderboard. It’s the GLM Coding Plan. The entry tier sits around $18/month for roughly 400 prompts a week, scaling up through Pro (~2,000/week), Max (~8,000/week), and a seat-based Team tier — and GLM 5.2 is included on all of them at no premium over 5.1. That’s roughly a tenth of what the comparable premium Claude Code and Claude Max tiers cost. For a solo builder or a lean shop, that price-to-capability ratio is hard to argue with.
It also slots into your existing tools with almost no friction. Day-one support covers Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code. If your agent speaks an OpenAI- or Anthropic-shaped API and lets you set a custom endpoint, GLM 5.2 is a config swap — point the client at the Z.ai endpoint and set the model to glm-5.2.
The asterisk: at launch, Zhipu published no official benchmarks. No SWE-bench, no LiveCodeBench, nothing. A few early third-party readings have trickled out (one outlet clocked it at the top of a reasoning benchmark called BridgeBench and around 300 tokens/second), but verification is genuinely thin right now. If a model’s quality matters for production, you test it yourself before you trust the marketing.
Kimi K2.7-Code — the tool-use specialist that punches up
Moonshot AI’s K2.7-Code arrived June 12 and is the most narrowly focused of the three. It’s a 1-trillion-parameter Mixture-of-Experts model with 32B active per token (384 experts, 8 selected plus 1 shared, 61 layers, MLA attention), a 256K-token context window, and a 400M-parameter vision encoder for image and video. Weights are on Hugging Face under a Modified MIT license, shipped in native INT4.
The pitch is efficiency, not raw size. K2.7-Code uses about 30% fewer “thinking” tokens than its predecessor K2.6 while scoring higher on Moonshot’s coding benchmarks (+21.8% on Kimi Code Bench v2, +11.0% on Program Bench, +31.5% on a multi-language suite). Fewer reasoning tokens for better results is a direct line to a lower bill on token-metered workflows.
But the number that made me sit up is MCPMark Verified: K2.7-Code scores 81.1, beating Claude Opus 4.8’s 76.4 on real tool-use tasks across environments like GitHub, Postgres, Filesystem, and Playwright. For agentic work — where the model isn’t writing a function in isolation but orchestrating tools across many steps — that’s the metric that predicts whether your agent actually finishes the job. API pricing is around $0.95 input / $4.00 output per million tokens, with a Kimi Code CLI plan from $19/month.
The asterisks (two of them): First, every K2.7 benchmark published so far is one of Moonshot’s own proprietary suites — no independent SWE-bench Verified, LiveCodeBench, or Terminal-Bench numbers yet. Second, “thinking” is always on (with preserve_thinking across turns) and there’s no instant mode, so you pay the reasoning tax on every call whether you want it or not. And self-hosting is brutal: the comparable K2.6 quant weighs in around 340GB and wants 350GB+ of combined RAM and VRAM. For nearly everyone, that means renting the API, not owning the model.
The head-to-head, in one table
Here’s the comparison stripped to what actually drives a decision. Where a figure is vendor-reported, I’ve said so — because a comparison that hides its uncertainty is just marketing with a grid.
| GLM 5.2 | MiniMax M3 | Kimi K2.7-Code | |
|---|---|---|---|
| Lab | Zhipu AI (Z.ai), Beijing | MiniMax, Shanghai | Moonshot AI, Beijing |
| Released | June 13, 2026 | June 1, 2026 | June 12, 2026 |
| Context window | 1M tokens | 1M tokens | 256K tokens |
| Architecture | MoE (~744B/40B active, reported) | MoE + Sparse Attention (MSA) | MoE, 1T total / 32B active |
| Multimodal | No (coding-first) | Yes — text, image, video | Yes — image, video via vision encoder |
| License | MIT (open weights, ~1 wk after launch) | Open weights promised (~10 days) | Modified MIT (weights live) |
| Headline strength | 1M context + cheap coding plan | Long-context multimodal + speed | Tool use / MCP, token efficiency |
| Pricing signal | ~$18/mo plan (≈1/10 of Claude tiers) | ~$0.30/$1.20 per 1M (promo) | ~$0.95/$4.00 per 1M; $19/mo CLI |
| Benchmark status | None official at launch | Vendor-run (59% SWE-Bench Pro) | Vendor-run (81.1 MCPMark, beats Opus 4.8) |
| Biggest catch | Unproven, no independent scores | Hosted under Chinese jurisdiction | Thinking always on; 340GB+ to self-host |
One developer ranking that made the rounds after hands-on testing put the broader field roughly as Fable 5 ahead, then Kimi K2.7, then Opus 4.8 level with GLM 5.2, then GPT-5.5, then MiniMax M3 — but take that with a fistful of salt. It’s one tester’s order, on one set of tasks, in a week when half these models had no verified numbers at all. Your workload is the only leaderboard that counts.
So where does “vibe coding” fit in all this?
Worth pausing here, because the term gets thrown around loosely. Vibe coding is the workflow where you describe what you want in plain language and let the model write, run, and fix the code — you’re steering by intent and vibes rather than typing every line yourself. For a lot of people (myself included, on plenty of projects), it’s now the default way software gets built.
It’s not a fringe idea anymore. Zhipu literally titled the GLM-5 technical paper “from Vibe Coding to Agentic Engineering” — the labs themselves see this as the trajectory: loose, conversational prototyping maturing into structured, autonomous engineering that can run for hours unattended.
But here’s the thing nobody tells you: the best model for vibe coding depends entirely on what stage of the vibe you’re in.
- Early, exploratory, “just make me a thing” prototyping — where you’re iterating fast, throwing screenshots at it, changing your mind every two minutes — wants a cheap, fast, forgiving model with multimodal input. MiniMax M3 shines here. The low token cost means you can iterate guilt-free, and feeding it a screenshot of a design instead of describing it is a genuine workflow unlock.
- Big-codebase vibe coding — “read my entire repo and refactor the auth layer” — is where context window is king. GLM 5.2’s million-token window (and that bargain coding plan) lets you keep the whole project in view without constant re-explaining, which is the single biggest source of friction in agentic work.
- The serious end — long-horizon, tool-heavy agentic builds that touch your database, your file system, and your Git history across hundreds of steps — wants the model that won’t lose the plot mid-task. Kimi K2.7-Code’s tool-use scores are built for exactly this. When the job is less “write a function” and more “drive my whole toolchain to ship a feature,” reliable tool calls beat a prettier code sample every time.
The uncomfortable truth: vibe coding amplifies whatever model you give it. A great model with good context turns vague intent into working software. A weak one turns it into a confident pile of bugs you didn’t write and don’t understand. Choose the model for the stage, not the hype.

Which one should you actually use?
Skipping the diplomatic hedging. Here’s how I’d route the decision.
Choose MiniMax M3 if your work is long-context and multimodal — reviewing large codebases, reasoning across files, or any workflow where a screenshot, diagram, or video is part of the input. It’s also the one I’d reach for when cost-per-iteration is the binding constraint, because the sparse-attention architecture keeps long-context work fast and cheap. Just self-host the weights for anything sensitive.
Choose GLM 5.2 if you want the most capability per euro and you live in a big codebase. The coding plan at roughly a tenth of premium Claude pricing, plus a real 1M-token window and frictionless drop-in support for Claude Code and friends, makes it the obvious default for solo builders and lean teams. The catch is faith — you’re trusting it before the independent benchmarks land, so prototype on it before you bet a deadline on it.
Choose Kimi K2.7-Code if you’re building autonomous agents that orchestrate tools — MCP servers, databases, browsers, file systems — over long sessions. Its tool-use performance is the standout result in this entire comparison, and the token-efficiency gains directly cut your running costs. Pay the API rate rather than fighting the 340GB self-hosting requirement, and accept that thinking is always on.
Or — and this is what I actually do — use all three. Route by task. Cheap multimodal iteration to M3, big-context refactors to GLM 5.2, tool-heavy agent runs to Kimi. Tools like Kilo Code, OpenCode, and Claude Code make swapping providers a config change, not a migration. In a market moving this fast, portability beats loyalty. Build your workflow so you can switch models in an afternoon, and you turn the chaos of June 2026 from a threat into an advantage.
A note on trusting any of these numbers
Quick gut-check, because it’ll save you grief. Almost every benchmark in this article is vendor-reported. GLM 5.2 launched with none at all. Even the well-regarded public suites have contamination problems — SWE-Bench Pro exists partly because older benchmarks leaked into training data and inflated scores.
None of that makes these models bad. It means the only benchmark that matters is your own repo, your own tasks, your own definition of “done.” Set up a small, repeatable eval — three or four real tickets from your backlog — and run each model against it before you commit. An afternoon of testing will tell you more than every launch-day blog post combined, this one included.
Frequently asked questions
Is GLM 5.2 actually free? The weights are MIT-licensed and free to download and self-host once released. The hosted GLM Coding Plan is paid, starting around $18/month — but that’s roughly a tenth of comparable premium Claude tiers, and GLM 5.2 is included on every plan tier at no extra cost.
Which is best for vibe coding specifically? There’s no single winner. MiniMax M3 is best for fast, cheap, multimodal prototyping; GLM 5.2 for vibe coding inside large codebases thanks to its 1M-token window; Kimi K2.7-Code for serious, tool-heavy agentic builds. Match the model to the stage you’re in.
Can I use these with Claude Code? Yes. GLM 5.2 supports Claude Code on day one — point the client at the Z.ai endpoint and set the model to glm-5.2. Kimi and MiniMax both expose OpenAI-/Anthropic-compatible APIs, so most agentic coding tools accept them as a custom endpoint with a key swap.
Which has the biggest context window? GLM 5.2 and MiniMax M3 both offer 1 million tokens. Kimi K2.7-Code offers 256K — smaller, but still large enough for most single-repo work.
Are the benchmark scores reliable? Treat them as directional. As of mid-June 2026, most published scores are vendor-run on the labs’ own infrastructure, and GLM 5.2 launched with no official benchmarks. Run your own evaluation on real tasks before committing to production.
Can I run any of these on a normal machine? Not the full weights. Kimi K2.7-Code needs roughly 350GB+ of combined RAM and VRAM to self-host, and the others are large MoE models too. For most people the hosted API or a managed provider is the practical route; self-hosting is for teams with serious GPU budgets or strict data-residency requirements.
The bottom line
Two weeks in June 2026 gave developers three serious open-weight coding models, a frontier that’s now genuinely competitive on price, and a clear signal that the pace isn’t slowing down. MiniMax M3 is your multimodal, long-context, cost-efficient prototyping engine. GLM 5.2 is the best capability-per-euro bet for big-codebase work, if you can trust it before the benchmarks land. Kimi K2.7-Code is the tool-use specialist for real agentic builds.
The smartest move isn’t picking a favourite — it’s building a workflow flexible enough to use whichever one fits the task in front of you, and to swap in the next model when it inevitably ships next month. In a market this fast, adaptability is the strategy.
If you’re trying to figure out how this fits into your own stack — whether that’s wiring these models into a coding agent, building AI automation into your business, or just getting a no-nonsense second opinion before you commit — that’s exactly the kind of work I do at Graham Miranda. Get in touch and let’s build something that lasts longer than the news cycle.
Have a model you think belongs in this comparison, or a workload you can’t decide how to route? Drop it in the comments — I read them, and I update these guides as the field moves (which, lately, is constantly).









