
The AI Agent War, Part I – Who’s Winning the Stack
Sep 16, 2025 · 4 min read
TL;DR
AI agents have shifted from prototypes to modular enterprise systems. Infrastructure has fractured into interoperable layers, while coding agents battle for market dominance. The winners will be defined not by scale alone, but by interoperability, IP strength, and enterprise readiness.
How to read the benchmarks
CB Insights uses two metrics to compare startups and incumbents in this space:
- Mosaic Score (0–1000): a composite index of traction, financial health, and market potential. Scores above 900 place a company in the global top 1%, signaling durability.
- Commercial Maturity (1–5): a stage rating of how far companies have progressed from research to market presence.
  - 1/5 = research/early pilots
  - 2/5 = limited adoption
  - 3/5 = expanding into multiple markets
  - 4/5 = strong adoption, scaling
  - 5/5 = entrenched market presence
These benchmarks give executives a common lens to compare early challengers with entrenched leaders — critical for vendor selection and M&A.
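For teams that track vendors in a spreadsheet or script, the two metrics can be encoded in a few lines. The following is a minimal sketch, not CB Insights' methodology: the `Vendor` class, the `describe` function, and the tier wording are hypothetical names introduced here for illustration. Only the 0–1000 range, the "above 900" cutoff, and the five stage labels come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Stage labels copied from the Commercial Maturity rubric in the article.
MATURITY_LABELS = {
    1: "research/early pilots",
    2: "limited adoption",
    3: "expanding into multiple markets",
    4: "strong adoption, scaling",
    5: "entrenched market presence",
}

# The article says scores above 900 fall in the global top 1%.
TOP_ONE_PERCENT_CUTOFF = 900


@dataclass
class Vendor:
    """Hypothetical record for one company; None means 'not reported/rated'."""
    name: str
    mosaic: Optional[int] = None    # Mosaic Score, 0-1000
    maturity: Optional[int] = None  # Commercial Maturity, 1-5


def describe(vendor: Vendor) -> str:
    """Return a one-line, human-readable summary of both metrics."""
    parts = [vendor.name]

    if vendor.mosaic is None:
        parts.append("Mosaic: not reported")
    else:
        if not 0 <= vendor.mosaic <= 1000:
            raise ValueError("Mosaic Score must be between 0 and 1000")
        tier = ("top 1%" if vendor.mosaic > TOP_ONE_PERCENT_CUTOFF
                else "below top-1% cutoff")
        parts.append(f"Mosaic {vendor.mosaic}/1000 ({tier})")

    if vendor.maturity is None:
        parts.append("Maturity: not rated")
    else:
        if vendor.maturity not in MATURITY_LABELS:
            raise ValueError("Commercial Maturity must be between 1 and 5")
        label = MATURITY_LABELS[vendor.maturity]
        parts.append(f"Maturity {vendor.maturity}/5 ({label})")

    return " | ".join(parts)


# Figures below are the ones quoted later in this article.
print(describe(Vendor("Poolside", mosaic=803, maturity=3)))
print(describe(Vendor("Mistral AI", mosaic=909)))
```

Keeping "not rated" distinct from a low rating matters here, because several companies in the reports have no maturity rating at all.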
From Monoliths to Modular Infrastructure
Five years ago, adopting AI meant picking one vendor and locking in. In 2025, that model has collapsed. The AI ecosystem has split into five interoperable layers, giving enterprises unprecedented flexibility — but also new integration and governance risks.
- Foundation models: Mistral AI ($6.2B valuation, Mosaic Score 909/1000, 99th percentile, Headcount +11 in the past year) has become Europe’s sovereignty bet against US incumbents. Its open-source model allows lower costs, faster adoption, and stronger privacy controls for enterprises wary of lock-in.
- Memory systems: Letta’s MemGPT (Headcount +150 in the past year) solves the long-term state problem, enabling agents to “remember” across sessions — critical for enterprise-scale reliability.
- Orchestration: CrewAI (60% Fortune 500 penetration) and n8n ($40M ARR, Headcount +297 in the past year, open-source scale) are the glue, making multi-agent workflows viable in production.
- Retrieval & data pipelines: LlamaIndex (Headcount +97 in the past year) integrates 300+ enterprise formats, pulling siloed data into agent workflows.
- Guardrails: Skyflow (Headcount +27 in the past year) secures personally identifiable information (PII) with polymorphic encryption and API-level access controls. It extends this protection into AI guardrails during data collection, model training, and agent execution — making it highly relevant for compliance-heavy enterprises. Vijil (Headcount +39 in the past year) delivers both proactive and reactive security: its Evaluate platform stress-tests agent safety pre-deployment, while Dome provides real-time defense once agents are in production.
Why it matters: The shift mirrors past transitions (mainframes → PCs, telcos → cloud). Enterprises can now build best-of-breed stacks, but each added module introduces new points of failure. Flexibility is up — so are integration, compliance, and vendor-selection risks.
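To make the "each added module introduces new points of failure" point concrete, here is a rough sketch that models a stack as the five layers above and counts vendor-to-vendor pairings. The layer keys, function names, and the choice of pairwise pairings as a proxy for integration surface are assumptions made for illustration; real stacks rarely connect every component to every other, so treat the count as an upper bound.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# The five layers named in the article (the key names are hypothetical).
LAYERS = ("foundation_model", "memory", "orchestration",
          "retrieval", "guardrails")


def missing_layers(stack: Dict[str, List[str]]) -> List[str]:
    """Return the layers that have no vendor assigned."""
    return [layer for layer in LAYERS if not stack.get(layer)]


def pairwise_integrations(stack: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return every vendor pair: an upper bound on integration points."""
    vendors = [v for layer in LAYERS for v in stack.get(layer, [])]
    return list(combinations(vendors, 2))


# Example stack built only from vendors mentioned in the article.
example_stack = {
    "foundation_model": ["Mistral AI"],
    "memory": ["Letta"],
    "orchestration": ["CrewAI"],
    "retrieval": ["LlamaIndex"],
    "guardrails": ["Skyflow", "Vijil"],
}

print(missing_layers(example_stack))              # layers still unfilled
print(len(pairwise_integrations(example_stack)))  # vendor pairs to review
```

With n vendors the pair count grows as n(n-1)/2, which is one way to see why adding a sixth or seventh module raises integration and governance effort faster than linearly.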
Innovation Signals Executives Should Watch
The infrastructure race is no longer about “bigger models.” Differentiation now lies in how companies tackle bottlenecks, costs, and regulatory hurdles:
- GPU scarcity hedges: Baseten aggregates GPUs across clouds, protecting enterprises from supply shocks.
- Agent OS vision: /dev/agents (Headcount +40 in the past year) is creating a platform layer, akin to “Android for AI agents,” with potential ecosystem lock-in.
- Autonomous resilience: H’s Runner H (Maturity not yet rated, Headcount +164 in the past year) is a self-healing web automation agent that adapts to UI changes automatically — reducing downtime and maintenance costs.
- Sector specialization: Arcee builds small models tailored for finance, healthcare, and law, unlocking compliance-heavy markets.
- Reliable reasoning: Imbue’s Sculptor emphasizes safe sandboxed testing and maintainable reasoning, appealing to enterprises prioritizing stability.
- Cost-efficient compute: StackBlitz’s WebContainers run Node.js natively in browsers, slashing cloud infrastructure costs.
Executive takeaway: Innovation is shifting from raw model power to resilience, cost-efficiency, and regulatory fit. These capabilities will decide who builds durable moats versus who gets commoditized.
Coding Agents: Leaders, Challengers, and Hype
If infrastructure is the battlefield, coding copilots are the frontline troops — the most visible agents in daily workflows. Leadership is emerging, but the market remains unsettled.
Leaders
- GitHub Copilot dominates with 35% share, 15+ million users, and $2B revenue. Its commercial maturity is listed as N/A — not because it is immature, but because the 1–5 scale is designed for startups, while Copilot sits inside Microsoft’s entrenched portfolio. In practice, this “N/A” signals dominance: it has already achieved the equivalent of level 5 market presence. Copilot’s edge lies in integration, becoming the “Microsoft Office of coding.”
- Anthropic Claude (Mosaic 955/1000, Headcount +16 in the past year) is projected to hit $35B revenue by 2027. Its defense contracts, including with the NNSA, signal credibility in mission-critical contexts.
Challengers
- Anysphere Cursor ($500M ARR in 10 months, Headcount +168 in the past year) owns the IDE experience. Its enterprise pivot suggests a direct challenge to Copilot.
- Factory Droids (Headcount +116 in the past year) automates entire dev cycles, not just snippets — a leap from assistance to autonomy.
- Poolside (Mosaic 803/1000, Commercial Maturity 3/5, Headcount +50 in the past year) jumped from $250K to $30M in revenue in one year (120x). Adoption is rapid, but the volatility is as large as the opportunity.
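The 120x figure for Poolside is simple to verify, and the same check is useful when comparing growth claims across vendors. This is a small sketch; the `growth_multiple` helper is a hypothetical name, and the inputs are the two revenue figures quoted above.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return end revenue as a multiple of start revenue."""
    if start <= 0:
        raise ValueError("start revenue must be positive")
    return end / start


start_revenue = 250_000      # $250K, as quoted in the article
end_revenue = 30_000_000     # $30M, as quoted in the article

multiple = growth_multiple(start_revenue, end_revenue)
percent_increase = (multiple - 1) * 100

print(f"{multiple:.0f}x growth, a {percent_increase:,.0f}% increase")
```

A multiple and a percentage increase describe the same change differently (120x is an 11,900% increase), so it helps to confirm which one a report is quoting.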
Speculative Bets
- Cognition Devin (Mosaic 939/1000, Headcount +51 in the past year) is marketed as an “AI engineer,” but with only 13.8% task success against a $9.8B valuation, it highlights the gap between hype and delivery.
- Emergent (Headcount –21 in the past year, maturity not yet rated) and Delty (Headcount –3 in the past year, maturity not yet rated) promote “AI staff engineers,” reframing agents as digital colleagues — early but notable in shaping adoption narratives.
Why Defensibility Matters
High adoption does not guarantee long-term survival in the coding agent market. Many tools grow quickly, but without unique technology, they become commoditized or acquired.
What creates durability is proprietary intellectual property (IP) — assets competitors cannot easily replicate:
- StackBlitz’s WebContainers run Node.js directly in the browser, cutting infrastructure costs and creating a platform others must build on.
- Supermaven’s Babble offers coding models specialized for developer workflows, going deeper than general-purpose LLMs.
- Mintlify’s llms-full.txt establishes a documentation format optimized for AI consumption, which could become an industry standard.
These are not just features — they are moats. They make the company harder to displace, more attractive as an acquisition target, and more likely to outlast fast-growing but copyable competitors.
Executive insight:
When evaluating coding agents, don’t be distracted by user counts or flashy demos. Ask: What IP moat protects this company? The market leaders will be those combining enterprise adoption with unique, defensible technology. Everyone else risks consolidation.
Bridge
But this battle is not confined to IDEs or developer workflows. Increasingly, governments are picking winners, turning AI agents into instruments of sovereignty. That changes the entire playing field.
Sources
This analysis draws on CB Insights’ AI Agent Tech Stack Report and Coding AI Agent & Copilot Report.
Data points — including Mosaic Scores, Commercial Maturity ratings, headcount growth, revenue, and adoption metrics — are taken directly from the company scouting reports (see pp. 3–15, 21, 30–31).




