Visual explainer

Intercom’s “2× in 12 months” goal didn’t just land. It overshot.

Darragh Curran’s post is not really a victory lap about one model. It’s a case study in what happens when a large R&D org treats AI as a factory redesign problem instead of a tool adoption problem.

Source: ideas.fin.ai / Darragh Curran
Company size referenced: 1,305 people
R&D scale referenced: ~500 builders/operators
Core claim: 3× over 16 months
1 — Thesis

The real story: Intercom picked a blunt metric, then used it to force organizational change.

Nine months ago, the team publicly committed to doubling productivity in 12 months. Instead of waiting for AI to “naturally” improve work, they deliberately redesigned incentives, workflows, review loops, and internal tooling around agentic coding.

The headline claim: Intercom says it already hit the 2× target early, and that over a 16‑month window the org has effectively 3×’d PR throughput per head.

The posture matters more than the metric: this wasn’t framed as “everyone should use AI more.” It was framed as “our executional factory must be rebuilt for a world where AI massively expands capacity.”

The implicit argument: large incumbents are not safe. If they do not aggressively modernize around AI leverage, startups with fewer people and higher velocity will eat them alive.

Why the post matters

Most AI productivity stories are vague. This one is unusually concrete: it links agent usage to defect backlog, shipping speed, downtime, PR review automation, plugin ecosystems, and company-wide adoption.

That makes it more useful — and also more debatable.

“AI unlocks an abundance of capacity, dramatically amplifying our ability to execute against our product vision.”

Core framing of the article
2 — Measurement

Why they measured merged PRs per total R&D headcount

Curran knows this metric invites criticism. He uses it anyway because it creates pressure on the entire delivery system, not just individual engineers.

Step 1

Pick a throughput proxy

Merged PRs are treated as the “units” coming off the production line.

Step 2

Divide by everyone

Not just engineers: product design, leadership, ICs — the whole R&D cost base.

Step 3

Expose bottlenecks

Review queues, workflow drag, org structure, and prioritization debt become impossible to hide.

Step 4

Check side effects

Velocity alone is not enough. They also track quality, downtime, costs, and customer-facing outcomes.
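
The arithmetic itself is trivial; the discipline is in what you divide by. A minimal Python sketch of the metric as described, where the function name and sample counts are invented for illustration (only the headcount of 473 is cited in the post):

```python
def merged_prs_per_head(merged_prs: int, total_rd_headcount: int) -> float:
    """Throughput proxy: merged PRs divided by the WHOLE R&D cost base
    (engineers, product, design, leadership), not just committers."""
    return merged_prs / total_rd_headcount

# Hypothetical monthly snapshots at the cited headcount of 473:
before = merged_prs_per_head(2_000, 473)  # ~4.2 PRs per head
after = merged_prs_per_head(6_000, 473)   # ~12.7 PRs per head
print(f"throughput multiple: {after / before:.1f}x")  # -> 3.0x
```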

Why this metric is useful

  • Simple enough to rally around.
  • Hard enough to force system-wide improvement.
  • Connects output to organizational cost, not just coding activity.
  • Pushes redesign of roles, review, and tooling — not just prompting habits.

Where this metric is weak

  • PRs are not all equal in scope or value.
  • Can be gamed if not paired with quality and stability checks.
  • Risks glorifying raw output volume if leadership stops asking what the merged PRs actually deliver.
  • Needs surrounding context to mean anything.
3 — The scoreboard

The article’s core numbers, in one place

This is the compact version of the claim set. The post’s punch comes from stacking velocity, quality, cost, and automation metrics together instead of showing only one sexy chart.

Long-run throughput
3×
Merged PRs per head over ~16 months.
Defect backlog
-54%
Critical and high defects closed, medium nearing zero.
Idea → shipped
-39%
Median time down by more than a third.
Downtime from code changes
-35%
Even while deployment throughput doubled.
Agent-driven PRs
93.6%
Claude Code is the primary coding system.
AI-approved PRs
19.2%
Near-term goal: over 50%.
Productivity (throughput proxy)
Claim: 2× goal hit early; 3× over 16 months
Interpretation: Intercom is saying the org-level factory got materially more productive, not just individuals feeling faster.
Signal: Strong

Quality (static-analysis trend)
Claim: Net-positive 5-week streak
Interpretation: They acknowledge an early quality dip from “slop cannon” behavior, then argue harnesses and models corrected it.
Signal: Promising

Stability (production reliability)
Claim: Downtime from breaking changes down 35%
Interpretation: Key counter to the obvious objection that more velocity must mean more outages.
Signal: Strong

Cost (unit economics)
Claim: AI spend up, cost per PR down by ~50%
Interpretation: The post reframes token cost as noise relative to payroll-dominated fully loaded delivery cost.
Signal: Strategic

Adoption (behavior change)
Claim: 93.6% agent-driven PRs
Interpretation: This is not “AI-assisted coding.” It is an explicit move to agent-first execution.
Signal: Very high

Review automation (approval bottleneck)
Claim: 19.2% AI-approved PRs; 497 fully autonomous PRs in 4 weeks
Interpretation: The most radical part of the post: not just AI writing code, but AI reviewing and shipping some of it.
Signal: High-risk / high-leverage
4 — The nine impacts

What changed inside the R&D “factory”

The post’s backbone is a nine-part impact list. Read together, it shows a company trying to turn AI from individual productivity gain into organizational throughput infrastructure.

Impact 1
Defect backlog got attacked like operational debt, not triage theater.
-54%

A core product defect backlog shrank by 54%. Critical and high-severity defects were closed; medium-severity defects were nearing closure too.

The deeper shift is the target state: almost no backlog, and eventually an SLO-like posture where reported issues get fixed inside 24 hours instead of ping-ponging around prioritization queues.

customer visible · ops simplification
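
The 24-hour posture is easy to state precisely. A minimal sketch, assuming a simple reported/fixed timestamp model; the schema is hypothetical, not Intercom's tracker, and only the 24-hour target comes from the post:

```python
from datetime import datetime, timedelta

FIX_SLO = timedelta(hours=24)  # target posture described in the post

def breaches_slo(reported_at: datetime, fixed_at: datetime | None,
                 now: datetime) -> bool:
    """True if a reported defect stayed (or has so far stayed) open
    longer than the 24-hour fix window."""
    return ((fixed_at or now) - reported_at) > FIX_SLO
```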
Impact 2
Intercom says it is shipping more changes and shipping them meaningfully faster.
2×+ / -39%

Product changes more than doubled in some monthly comparisons, and median time from idea to shipped fell by 39%.

The company’s pitch is simple: speed is not vanity here. More shipping velocity means more customer value, faster feedback, and more ambition in roadmap scope.

roadmap leverage · time-to-value
Impact 3
Code quality dipped first, then started recovering.
from decline → positive streak

This is one of the better parts of the article because it doesn’t pretend agentic coding was instantly clean. Their internal structural-quality metric worsened as AI wrote more code.

Curran’s claim is that this was temporary and fixable as models and harnesses improved. The recent positive streak is presented as evidence that average quality can trend upward again.

honest caveat · quality harnesses
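
A “net-positive streak” implies a weekly delta on some structural-quality score. The scoring itself would come from whatever static-analysis tooling they run; this hypothetical sketch only shows the streak bookkeeping:

```python
def trailing_positive_streak(weekly_quality_deltas: list[float]) -> int:
    """Length of the trailing run of weeks in which structural quality
    improved (delta > 0 means the codebase got net better that week)."""
    streak = 0
    for delta in reversed(weekly_quality_deltas):
        if delta <= 0:
            break
        streak += 1
    return streak

# Hypothetical trend: an early dip as agents ramp up, then recovery.
print(trailing_positive_streak([-0.8, -0.3, 0.1, 0.4, 0.2, 0.5, 0.3]))  # -> 5
```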
Impact 4
More throughput did not mean more fire drills.
-35% downtime

Downtime attributable to breaking code changes fell by 35%, even while deployments doubled.

The implied mechanism: if AI handles more of the mechanical work, humans spend more energy on architecture and system correctness instead of rote implementation.

reliability · counterintuitive win
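
The reliability claim is really a ratio: downtime caused by code changes, normalized by how much is shipping. A worked sketch with invented absolute numbers; only the -35% and 2× relationships come from the post:

```python
def downtime_per_deploy(downtime_minutes: float, deploys: int) -> float:
    return downtime_minutes / deploys

# Downtime falls 35% while deployments double, so downtime per deploy
# drops by roughly two thirds.
before = downtime_per_deploy(100.0, 1_000)  # 0.1000 min per deploy
after = downtime_per_deploy(65.0, 2_000)    # 0.0325 min per deploy
print(f"per-deploy improvement: {1 - after / before:.0%}")  # -> 68%
```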
Impact 5
AI costs exploded — and they think that’s exactly what should happen.
spend ↑ / cost per PR ↓

Intercom is explicit: they are not currently optimizing AI spend. They want leverage first.

The key accounting lens is unit cost per PR. Since payroll dominates fully loaded cost, higher token bills can still be massively positive if PRs per head climb fast enough.

ROI framing · anti-token-panic
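
The accounting lens is one line of arithmetic once payroll is included. A hypothetical worked example, where all dollar figures are invented and only the “cost per PR down ~50%” shape matches the post:

```python
def cost_per_pr(payroll: float, ai_spend: float, merged_prs: int) -> float:
    """Fully loaded delivery cost per merged PR. Payroll dominates, so
    token spend can rise 10x while unit cost still falls."""
    return (payroll + ai_spend) / merged_prs

# Invented monthly figures for illustration:
before = cost_per_pr(payroll=10_000_000, ai_spend=50_000, merged_prs=2_000)
after = cost_per_pr(payroll=10_000_000, ai_spend=500_000, merged_prs=4_200)
print(f"${before:,.0f} -> ${after:,.0f} per PR")  # $5,025 -> $2,500: down ~50%
```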
Impact 6
Claude Code became the default mode of technical work.
93.6%

Within weeks, the org pushed past 80% agent-driven PRs and then aimed for 95%. They’re hovering just below that.

The phrase to notice is “all technical work is becoming agent-first.” That’s not tooling preference. That’s doctrine.

behavior change · agent-first
Impact 7
They are now trying to automate the review bottleneck too.
19.2% AI-approved

As PR volume rose, human review became the obvious choke point. Intercom responded by auto-approving low-risk changes first — cleanups, small fixes, focused improvements.

Auto-approved PRs merged in a median of 14.6 minutes versus 75.8 minutes org-wide. That is where the post shifts from “AI helps engineers” to “AI redesigns organizational flow control.”

queue compression · highest controversy
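
The post names the eligible categories (cleanups, small fixes, focused improvements) but not the gate itself. A minimal sketch of what such a gate could look like; every criterion and threshold below is an assumption, not Intercom's actual policy:

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    touches_migrations: bool
    touches_auth_or_billing: bool
    checks_green: bool
    labels: set[str]

LOW_RISK_LABELS = {"cleanup", "small-fix", "focused-improvement"}

def eligible_for_ai_approval(pr: PullRequest) -> bool:
    """Hypothetical gate: only small, well-labeled, low-blast-radius
    changes with green CI skip the human review queue."""
    return (
        pr.checks_green
        and pr.lines_changed <= 50
        and not pr.touches_migrations
        and not pr.touches_auth_or_billing
        and bool(pr.labels & LOW_RISK_LABELS)
    )
```

Anything failing such a gate would presumably fall back to the normal human queue, which is how the 19.2% share can grow without abolishing review entirely.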
Impact 8
Shared skills became a company-scale force multiplier.
153 contributors / 267 skills

The “team-2x” plugin marketplace may be the quiet killer feature of this story. When one person builds the best skill for a repeated task, everyone inherits the upgrade.

That makes AI leverage cumulative instead of individual. The org improves not only because people use agents, but because their agent workflows are being productized internally.

shared platform · compounding knowledge
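
The post doesn't describe the marketplace's internals, so the following is only a conceptual sketch of the compounding mechanism: one person publishes an improved skill, and everyone else resolves to the latest version automatically. All names and structure are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    version: int
    author: str

@dataclass
class SkillMarketplace:
    """Toy model: publishing a better version upgrades everyone at once."""
    skills: dict[str, Skill] = field(default_factory=dict)

    def publish(self, skill: Skill) -> None:
        current = self.skills.get(skill.name)
        if current is None or skill.version > current.version:
            self.skills[skill.name] = skill

    def resolve(self, name: str) -> Skill:
        # Every subscriber pulls the same latest version on next use.
        return self.skills[name]
```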
Impact 9
The biggest constraint is no longer model access. It’s human adaptation.
top 5% produce 6× median

The strongest users are wildly ahead of the median. The post says elite users often spend more than $1k/month on tokens, but spending alone doesn’t guarantee results.

The bottleneck is uneven capability: prompt quality, workflow sophistication, intensity of use, and the willingness to actually change how work gets done.

talent variance · enablement problem
5 — Operating model

What Intercom seems to believe about winning with agents

Under the charts, there’s a management philosophy. The org is treating AI leverage as a systems design problem with four core moves.

1. Standardize around one primary agentic stack

  • They “went all-in” on Claude Code instead of spreading effort across many coding systems.
  • That reduces fragmentation and makes training, plugin distribution, and telemetry simpler.

2. Build org-level infrastructure on top of it

  • Private plugin marketplace.
  • Auto-updating skills.
  • Dedicated team-2x platform effort.

3. Push AI beyond implementation into review and release

  • Agent-written PRs.
  • AI approvals for low-risk changes.
  • Some fully autonomous ship-to-prod loops.

4. Instrument people, not just models

  • Track usage intensity, prompt quality, efficiency, and output.
  • Treat uneven adoption as an operational problem to solve.
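
That fourth move implies a per-person telemetry record alongside model telemetry. A hypothetical shape for such a record, built only from the dimensions the post names (usage intensity, prompt quality, efficiency, output):

```python
from dataclasses import dataclass

@dataclass
class OperatorTelemetry:
    """Hypothetical per-person, per-month adoption record."""
    person_id: str
    agent_sessions: int      # usage intensity
    prompt_quality: float    # however it is scored internally
    token_spend_usd: float
    merged_prs: int          # output

    @property
    def prs_per_dollar(self) -> float:
        # One crude efficiency lens; guard against zero spend.
        return self.merged_prs / self.token_spend_usd if self.token_spend_usd else 0.0
```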
6 — Uneven adoption

The post’s most believable point: the gains are real, but very unevenly distributed.

This is where the article becomes less marketing and more field report. Intercom is not claiming everyone became 3× better. It is claiming that a subset of people already operate at a radically different ceiling — and that the company now has to drag the rest of the org upward.

The top 5% of contributors produce more than 6× the output of the median engineer. That’s not a small spread. It suggests a new kind of organizational inequality: agent literacy inequality.

Curran notes that heavy spend correlates with higher throughput, but it does not guarantee it. That matters. The scarce resource is not just model access. It is workflow sophistication: knowing how to scope tasks, manage long-running agent loops, evaluate results, and codify repeatable skills.

In other words, “buy more tokens” is not the lesson. “Build better operators” is.
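
Measuring that spread is a routine percentile computation over per-person output. A sketch, with the data shape assumed:

```python
import statistics

def elite_vs_median(prs_per_person: list[int], top_fraction: float = 0.05) -> float:
    """Ratio of the mean output of the top `top_fraction` of contributors
    to the median contributor's output (the post reports more than 6x)."""
    ranked = sorted(prs_per_person, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return (sum(ranked[:k]) / k) / statistics.median(ranked)
```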

Important caveat

If the elite performers keep compounding while the median barely moves, the organization can still improve overall while leaving a large chunk of the team effectively behind.

That creates management pressure: training, telemetry, internal exemplars, and stronger defaults become mandatory.

7 — Beyond R&D

The diffusion story may be even bigger than the engineering story.

The article ends by zooming out: AI usage is no longer framed as a specialist behavior. It is becoming a company-wide way of working.

Peak Claude Code users
~1,100
Across a company of 1,305 people.
R&D headcount cited
473
Meaning hundreds outside R&D crossed the CLI/adoption barrier.
Interpretation
Company-wide
Finance, recruiting, legal, support, sales all cited.

What non-engineering teams are doing

  • Generating reports in minutes instead of hours.
  • Querying internal data with natural language.
  • Building mini apps and interactive reports.
  • Creating specialized workflows around company data.

Why that matters strategically

  • The “2×” program stops being an engineering transformation and becomes an operating model transformation.
  • The command line becomes less of a specialist boundary.
  • Internal AI products start behaving like real software platforms, not experiments.
8 — Bottom line

What this post is really trying to prove

Intercom wants readers to stop asking whether AI helps individuals and start asking how to redesign an organization around agents, shared skills, and automated flow.

The strongest takeaway: meaningful AI leverage at company scale does not come from vague encouragement. It comes from hard targets, centralized platform work, workflow standardization, telemetry, and a willingness to automate politically sensitive bottlenecks like code review.

The strongest skepticism: some of the bolder claims — especially around AI approval and fully autonomous shipping — depend heavily on the guardrails being much better than outsiders can currently verify.

The sharpest summary: Intercom is arguing that the future belongs to companies that treat agentic work as an industrial system. Not a feature. Not a hackathon trick. A system.

TL;DR

They didn’t just buy a smarter coding assistant.

They rebuilt the throughput factory around AI, then measured whether the machine actually moved faster, cheaper, and safer.