MiniMax M2.5 makes frontier AI cheap enough to run agents nonstop — and the cost savings are staggering.
MiniMax’s M2.5 arrives as a reckoning for cloud AI costs, promising frontier performance at a fraction of incumbent prices. The Shanghai startup says its Mixture-of-Experts design lets the 230-billion-parameter model activate only about 10 billion parameters at runtime. That design, plus the Forge reinforcement-learning pipeline and the CISPO stability method, produces fast, agentic behavior for file creation, coding, and enterprise work. MiniMax reports that 30% of internal tasks and 80% of new code now come from M2.5. For deeper context on coding agents and the competition, see my earlier roundup: AI Coding Agent Showdown. This shift could change product design and budgets across startups and enterprises.
As someone balancing research and product in wireless and generative AI, I’ve sat through many demos promising miracles. At Ericsson I measure real costs — spectrum, compute, and people time. When MiniMax told me their M2.5 handles 30% of internal tasks and generates 80% of new code, I flashed back to a team meeting where a developer moaned about expensive API bills. I laughed, then recalculated our budget on the spot. If AI workers become cheap, my finance team will both cheer and start assigning overtime to the bots.
MiniMax M2.5
MiniMax M2.5 is a strategic pivot from expensive, always-on large models toward cheap, high-volume AI labor. The model uses a Mixture of Experts (MoE) architecture: 230 billion parameters on paper, but only about 10 billion activated per token. That sparse activation is the core efficiency story. MiniMax trained M2.5 with a proprietary Forge reinforcement-learning pipeline and stabilized it with a CISPO (Clipping Importance Sampling Policy Optimization) approach to avoid destructive policy updates. The company says training took two months and focused on thousands of simulated workspaces.
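MiniMax has not published M2.5’s routing internals, but the sparse-activation idea behind any top-k MoE can be shown in a toy sketch. Everything below — the gate weights, the expert functions, the choice of top-2 routing — is invented for illustration, not taken from M2.5:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, gate, top_k=2):
    """Route one token through only its top-k experts.

    The other experts stay idle, which is how a '230B-parameter'
    MoE can activate only a small slice of its weights per token.
    """
    # Gate scores: one dot product per expert.
    scores = [sum(t * w for t, w in zip(token, weights)) for weights in gate]
    probs = softmax(scores)
    # Keep the top-k experts by gate probability; renormalize among them.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(token)
    for i in chosen:
        expert_out = experts[i](token)
        out = [o + (probs[i] / norm) * e for o, e in zip(out, expert_out)]
    return out, chosen

# Toy setup: 8 experts, each a simple elementwise scaling; only 2 run per token.
experts = [lambda x, s=i + 1: [v * s for v in x] for i in range(8)]
gate = [[0.1 * (i + 1), -0.05 * i] for i in range(8)]  # hypothetical gate weights
out, chosen = moe_forward([1.0, 0.5], experts, gate)
```

The efficiency claim falls out directly: compute per token scales with the experts actually chosen (2 of 8 here), not with the total parameter count.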
Performance that punches above its size
Benchmarks show M2.5 punching near the top. MiniMax reports SWE-Bench Verified at 80.2%, BrowseComp at 76.3%, Multi-SWE-Bench at 51.3%, and BFCL (Tool Calling) at 76.8%. Those numbers place it close to Anthropic’s Claude Opus 4.6 in key agentic tasks. MiniMax also claims real-world productivity impacts: 30% of tasks at MiniMax HQ are completed by M2.5 and 80% of newly committed code is generated by it. VentureBeat covered the release and pricing details in depth at the source: https://venturebeat.com/technology/minimaxs-new-open-m2-5-and-m2-5-lightning-near-state-of-the-art-while.
Speed, cost, and agent economics
Two variants target production use. M2.5-Lightning runs at ~100 tokens/sec and costs $0.30 per 1M input tokens and $2.40 per 1M output tokens; standard M2.5 runs at ~50 tokens/sec at $0.15 per 1M input and $1.20 per 1M output. In practical terms, MiniMax says you can run four continuous agents for a year for roughly $10,000, about 1/10th to 1/20th the cost of models like Claude Opus 4.6. On the ThursdAI podcast, a host noted that M2.5’s speed and pricing push per-task costs down sharply, citing example task costs of ~$0.15 versus ~$3.00 for Claude Opus.
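The $10,000 figure survives a back-of-envelope check. The sketch below assumes an agent streams output nonstop at the standard model’s rate; the input-to-output token ratio is my assumption (agents re-read context heavily), not a MiniMax number:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def annual_agent_cost(tokens_per_sec, in_price, out_price,
                      in_out_ratio=3.0, agents=4):
    """Rough annual cost of `agents` agents generating output around the clock.

    in_price / out_price are dollars per 1M tokens.
    in_out_ratio: assumed input tokens consumed per output token.
    """
    out_tokens = tokens_per_sec * SECONDS_PER_YEAR  # per agent, per year
    in_tokens = out_tokens * in_out_ratio
    per_agent = (out_tokens / 1e6) * out_price + (in_tokens / 1e6) * in_price
    return per_agent * agents

# Standard M2.5: 50 tok/s, $0.15 per 1M input, $1.20 per 1M output.
cost = annual_agent_cost(50, 0.15, 1.20)
```

With these assumptions, four always-on agents land at roughly $10,400 a year, in line with MiniMax’s claim; a heavier context ratio would push it up, and real agents idle between tasks, which would pull it down.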
Implications for developers and enterprises
When intelligence drops from costly consultant rates to near-operational wages, software design changes. Teams will favor persistent agents that run autonomously for hours: coding, summarizing, researching, and generating formatted documents like Word, Excel, and PowerPoint. The MiniMax M2.5 story is not just model quality; it’s the unit economics of running agents at scale. If the weights and license terms truly become open, the ecosystem effects will accelerate even faster.
MiniMax M2.5 Business Idea
Product: An enterprise SaaS called AgentMill—an AI workforce orchestration platform that spins up, supervises, and bills task-specific agents powered by MiniMax M2.5. Each agent is templated for common enterprise workflows: legal brief drafting, financial report generation, automated QA and code refactoring, and formatted file creation (Word/Excel/PPT). The platform integrates with Git, Google Workspace, Microsoft 365, and common ticketing systems.
Target market: Mid-market to large enterprises in finance, legal, consulting, and product teams who need continuous, auditable automation without heavy compute spend.
Revenue model: Subscription plus usage fees. Tiered pricing: base orchestration fee ($5k–$25k/month) plus per-agent runtime credits priced competitively against MiniMax API costs. Add-ons: compliance auditing, custom agent training, and priority SLAs.
Why now: MiniMax M2.5 cuts agent runtime costs by up to 95% versus leading proprietary models, making continuous agents economically viable. Early adopters can automate high-value workflows while preserving audit trails and compliance—an attractive ROI for executives focused on productivity and cost reduction.
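As a sketch of how AgentMill’s “subscription plus usage” billing might work in code — every name, price, and markup below is hypothetical, and the token prices simply mirror MiniMax’s published rates:

```python
from dataclasses import dataclass

@dataclass
class AgentTemplate:
    name: str          # e.g. "legal-brief", "code-qa"
    model: str         # "m2.5" or "m2.5-lightning"
    out_price: float   # dollars per 1M output tokens (upstream API rate)

def runtime_credits(template: AgentTemplate, output_tokens: int,
                    markup: float = 1.5) -> float:
    """Bill the customer the raw upstream token cost plus an orchestration markup."""
    raw_cost = (output_tokens / 1e6) * template.out_price
    return raw_cost * markup

# Hypothetical template: a QA/refactoring agent on standard M2.5.
qa_agent = AgentTemplate("code-qa", "m2.5", 1.20)
bill = runtime_credits(qa_agent, 10_000_000)  # 10M output tokens this month
```

The design point is that margin lives in the markup and the base orchestration fee, so the platform stays profitable even as upstream token prices keep falling.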
The Cheap-Agent Tipping Point
MiniMax M2.5 signals a shift from AI as a costly occasional tool to AI as everyday labor. When agents become affordable, product roadmaps change: features become autonomous services, not user-triggered helpers. Teams will rewire workflows around persistent AI colleagues that draft, test, and file work continuously. How will your organization redesign jobs and teams when AI labor costs drop by an order of magnitude?
FAQ
What makes MiniMax M2.5 cheaper than other models?
MiniMax uses a Mixture of Experts (MoE) design: 230B parameters on paper but roughly 10B active per token. Combined with Forge RL training and CISPO stability, this yields high performance with lower compute and token costs.
How do M2.5 pricing and speed compare to Claude Opus 4.6?
MiniMax lists Standard M2.5 at $0.15 per 1M input / $1.20 per 1M output and Lightning at $0.30/$2.40, with 50–100 tokens/sec. In MiniMax’s examples, per-task costs with M2.5 come out to roughly 1/10th to 1/20th of Claude Opus 4.6’s (~$0.15 versus ~$3.00 per task).
Is M2.5 fully open source and safe for enterprises?
MiniMax says M2.5 is “open,” but weights, code, and license details are not fully published yet. Enterprises should verify license terms, audit logs, and compliance features before large-scale deployment.
