S19 A forensic teardown of the 82% viral claim

The rework number went viral.
The baseline didn't.

A forensic teardown of the viral claim that 82% of AI tokens go to rework. Eight independent datasets, one rebuilt dollar. The direction is right. The number is marketing. The baseline is from 1978.

Scroll through eight interactive charts built from Faros, CodeRabbit, DORA, METR, Stack Overflow, GitClear, SWE-Bench, and McKinsey data. Each chart stress-tests one piece of the Entelligence screenshot. Toggle between the vendor's framing and what the independent telemetry actually shows.

Filed: May 27, 2026

Engine: Vanilla JS · SVG · Static data from 8 independent sources

Coverage: 8 charts · 8 sources · 2020 to 2026

Verdict: Directionally right, numerically dubious

§ I The screenshot that launched a thousand takes

In late May 2026, a dashboard screenshot from a startup called Entelligence went viral. It claimed that across 2,444 companies, 82% of AI-generated tokens go to bugs, rework, and review, and that only four cents of every dollar reaches a shipped feature.

The number felt right to anyone who has watched an LLM confidently produce code that passes the linter and fails the user. Faros, CodeRabbit, DORA, GitClear, Stack Overflow, and Jellyfish had all, independently, published data pointing in the same direction. But "felt right" and "is right" are different claims with different evidence standards.

The chart is from Entelligence's own product dashboard. Entelligence sells AI code-review tools and engineering-intelligence dashboards. The 82% number is a sales asset for the product that fixes the problem the number describes. That does not make it wrong. It does mean the lab has to do what the screenshot did not: show the sources, name the assumptions, and rebuild the dollar from scratch.

The claim is directionally consistent with six independent datasets. It is numerically suspect in the specific 82% / $0.04 framing. And it is missing the one number that changes everything: the pre-AI baseline.

S19.1 The baseline nobody mentioned

Pre-AI software engineering already spent 75-80% of effort on maintenance, debugging, and review. The 82% number looks alarming until you remember the baseline.

The shocking 82% is roughly the same number we have been quoting since the Carter administration. Source: Lientz, Swanson & Tompkins, CACM 1978.
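
The comparison above comes down to one subtraction. The sketch below is a minimal illustration, not part of the page's own code: the function name is hypothetical, and it assumes a share of tokens (the Entelligence figure) can be set directly against a share of effort (the 1978 figure), which neither source establishes.

```javascript
// Figures quoted in the text above. The function name is hypothetical.
const CLAIMED_REWORK_PCT = 82;        // Entelligence: share of AI tokens
const BASELINE_RANGE_PCT = [75, 80];  // 1978 baseline: share of effort

// Percentage-point gap between the claim and each end of the baseline range.
// Assumption: token share and effort share are comparable units.
function pointsAboveBaseline(claimed, [low, high]) {
  return { min: claimed - high, max: claimed - low };
}

const gap = pointsAboveBaseline(CLAIMED_REWORK_PCT, BASELINE_RANGE_PCT);
console.log(`${gap.min} to ${gap.max} percentage points above the baseline`);
// prints "2 to 7 percentage points above the baseline"
```

Read this way, the headline number sits only a few points above the old baseline rather than 82 points above zero.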

S19.2 The perception gap

The most-cited evidence that AI makes developers slower had to walk itself back. METR's 2025 RCT found a 19% slowdown. Their 2026 replication found 4%, with wide confidence intervals.

Developers thought they were 20% faster. They were 19% slower. A year later, the effect shrank to 4% and stopped being significant. Source: METR 2025, 2026.

S19.3 The capability nobody charted

While the rework discourse raged, the underlying models got dramatically better at the exact task they were accused of failing.

From 4.8% to 82.6% in 30 months. By the time you read a benchmark, it is already wrong. Source: Princeton SWE-Bench, Vals AI leaderboard.
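
For a rough sense of the pace in that caption, the two endpoints can be turned into an average monthly gain and a fold increase. This is a sketch under a loud assumption: it treats progress as linear between the two quoted scores and treats them as directly comparable, which the leaderboard history may not support. The function names are hypothetical.

```javascript
// Endpoints quoted in the caption above.
const START_PCT = 4.8;
const END_PCT = 82.6;
const MONTHS = 30;

// Average gain in percentage points per month (assumes a straight line).
function averageMonthlyGain(start, end, months) {
  return (end - start) / months;
}

// How many times larger the final score is than the first one.
function foldIncrease(start, end) {
  return end / start;
}

console.log(averageMonthlyGain(START_PCT, END_PCT, MONTHS).toFixed(2)); // "2.59"
console.log(foldIncrease(START_PCT, END_PCT).toFixed(1));               // "17.2"
```

At that average pace, a score quoted six months ago would already be roughly 15 points stale.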

S19.4 Acceleration whiplash

Faros AI tracked 22,000 developers across 4,000+ teams for two years. The verdict: AI helps the top line and hurts the bottom at the same time.

AI doubled the speedometer and tripled the brake-wear bill. Source: Faros AI, April 2026, n=22,000 developers.

S19.5 The quality cost is real

CodeRabbit analyzed 470 open-source PRs. AI-generated code had 1.7x more issues across every category.

The rework cost is not a vendor invention. Source: CodeRabbit, Dec 2025, n=470 PRs.

S19.6 Trust is falling as usage rises

Developers are using AI more even as they trust it less. This is the human texture behind the rework claim.

84% of developers use AI tools. 29% trust them. Source: Stack Overflow 2023-2025, n=49,009.

S19.7 Copy, don't refactor

AI optimizes for producing more code, not leaving the codebase smaller. 2024 was the first year copy-pasted lines exceeded refactored lines.

The crossover happened in 2024. AI encourages duplication, not consolidation. Source: GitClear, 211M lines, 2020-2024.

S19.8 The rebuilt dollar

Same dollar, three very different stories. Here are Sankar's framing of the Entelligence claim, Jellyfish's data, and McKinsey's survey, side by side, with assumptions visible.

The $0.04 figure does not survive triangulation. The rework share does. Source: Entelligence, TechCrunch/Jellyfish Apr 2026, McKinsey 2025.
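
One quick sanity check on the viral framing is whether its two headline figures add up to a full dollar. The sketch below uses only the 82% and $0.04 figures quoted in this piece; the screenshot itself may label the remainder, so the "unaccounted" bucket is an assumption of this example, as are the variable names. It also follows the claim in treating token share as dollar share.

```javascript
// Headline figures from the Entelligence claim, expressed as cents per dollar.
const DOLLAR_CENTS = 100;
const REWORK_CENTS = 82;   // bugs, rework, and review
const FEATURE_CENTS = 4;   // reaches a shipped feature

// Whatever the two headline figures leave unexplained.
const unaccountedCents = DOLLAR_CENTS - REWORK_CENTS - FEATURE_CENTS;

console.log(`${unaccountedCents} cents of each dollar are not covered by the two headline figures`);
// prints "14 cents of each dollar are not covered by the two headline figures"
```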

§ X Receipts

Entelligence claim: 82% tokens on rework

Pre-AI baseline: 75-80% on maintenance (1978)

Faros code churn: +861% high-AI vs low-AI

Faros incidents/PR: +242.7% high-AI vs low-AI

CodeRabbit issues: 1.7x AI vs human PRs

SWE-Bench best: 82.6% raw model (May 2026)

Dev AI usage: 84% (Stack Overflow 2025)

Dev AI trust: 29% (Stack Overflow 2025)
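
The two Faros receipts are percentage increases, which are easy to misread as multipliers. A minimal conversion, assuming each figure means "high-AI teams relative to low-AI teams" as the receipts state (the function and array names are hypothetical):

```javascript
// A +P% increase means the new value is (1 + P/100) times the old one.
function multiplierFromPercentIncrease(pct) {
  return 1 + pct / 100;
}

// Percentage increases quoted in the receipts above.
const FAROS_RECEIPTS = [
  { metric: "Faros code churn", pctIncrease: 861 },
  { metric: "Faros incidents/PR", pctIncrease: 242.7 },
];

for (const r of FAROS_RECEIPTS) {
  const m = multiplierFromPercentIncrease(r.pctIncrease);
  console.log(`${r.metric}: ${m.toFixed(2)}x`);
}
// prints "Faros code churn: 9.61x"
// prints "Faros incidents/PR: 3.43x"
```

So +242.7% is a bit more than a tripling, and +861% is close to a tenfold increase.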

§ XI Methodology & Colophon

Engine

Every chart is hand-drawn in SVG by a small vanilla JavaScript module. No frameworks, no build step. Data is baked into the page as static arrays. Nothing is fetched from an API.
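
As an illustration of that approach, here is a minimal sketch of a static array rendered to SVG with no framework and no fetch. It is not the page's actual module: the names `DATA` and `renderBars` and the layout defaults are hypothetical, the two values are the Stack Overflow figures quoted in this piece, and the markup is built as a string so the example also runs outside a browser.

```javascript
// Static data baked into the script (values quoted elsewhere in this piece).
const DATA = [
  { label: "Use AI tools", value: 84 },
  { label: "Trust AI tools", value: 29 },
];

// Render a horizontal bar chart as an SVG markup string.
// Labels are assumed to be trusted static strings, so they are not escaped.
function renderBars(data, { width = 320, barHeight = 24, gap = 8, max = 100 } = {}) {
  const height = data.length * (barHeight + gap) - gap;
  const bars = data.map((d, i) => {
    const y = i * (barHeight + gap);
    const w = (d.value * width) / max; // scale the value to the chart width
    return `<rect x="0" y="${y}" width="${w}" height="${barHeight}">` +
           `<title>${d.label}: ${d.value}%</title></rect>`;
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}">` +
         bars.join("") + `</svg>`;
}

const svg = renderBars(DATA);
console.log(svg);
```

In a browser, the returned string could be assigned to a container's `innerHTML`; building nodes with `document.createElementNS` would be the stricter alternative.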

Data sources

Entelligence dashboard screenshot (viral, May 2026)
Faros AI Acceleration Whiplash report (Apr 2026, n=22,000 devs)
CodeRabbit State of AI vs Human Code Generation (Dec 2025, n=470 PRs)
METR RCT (Jul 2025, n=16; Feb 2026 replication, n=57)
DORA State of DevOps 2024 (n=3,000) and 2025 (n=4,879)
Stack Overflow Developer Survey 2025 (n=49,009)
GitClear AI Copilot Code Quality 2025 (211M changed lines)
SWE-Bench Verified leaderboard (Vals AI, May 2026)
Jellyfish Q1 2026 (n=7,548, via TechCrunch)
McKinsey State of AI 2025 (n=1,993)
Lientz, Swanson & Tompkins, CACM 1978

Reading list

Faros AI · Acceleration Whiplash
CodeRabbit · State of AI vs Human Code
METR · AI developer productivity study
DORA · State of AI-Assisted Software Development 2025
Stack Overflow · Developer Survey 2025
Lientz, Swanson & Tompkins · CACM 1978

Limitations

The Entelligence claim's methodology is opaque, self-reported, and commercially motivated. The "2,444 companies" figure almost certainly includes free-tier and trial accounts, not paying enterprise customers. The $0.04 net feature delivery number is inconsistent with every other published ROI estimate. SWE-Bench scores above 85% carry documented contamination concerns. The METR RCT's 2025 cohort was small (n=16) and the 2026 replication had severe selection bias. The pre-AI baseline of 75-80% is from 1978 and may not reflect contemporary practice. All forward-looking numbers are self-reported surveys, not controlled measurements.
