S19 A forensic teardown of the 82% viral claim
The rework number went viral.
The baseline didn't.
A forensic teardown of the viral claim that 82% of AI tokens go to rework. Eight independent datasets, one rebuilt dollar. The direction is right. The number is marketing. The baseline is from 1978.
Scroll through eight interactive charts built from Faros, CodeRabbit, DORA, METR, Stack Overflow, GitClear, SWE-Bench, and McKinsey data. Each chart stress-tests one piece of the Entelligence screenshot. Toggle between the vendor's framing and what the independent telemetry actually shows.
§ I The screenshot that launched a thousand takes
In late May 2026, a dashboard screenshot from a startup called Entelligence went viral. It claimed that across 2,444 companies, 82% of AI-generated tokens go to bugs, rework, and review, and that only four cents of every dollar reaches a shipped feature.
The number felt right to anyone who has watched an LLM confidently produce code that passes the linter and fails the user. Faros, CodeRabbit, DORA, GitClear, Stack Overflow, and Jellyfish had all, independently, published data pointing in the same direction. But "felt right" and "is right" are different claims with different evidence standards.
The chart is from Entelligence's own product dashboard. Entelligence sells AI code-review tools and engineering-intelligence dashboards. The 82% number is a sales asset for the product that fixes the problem the number describes. That does not make it wrong. It does mean the lab has to do what the screenshot did not: show the sources, name the assumptions, and rebuild the dollar from scratch.
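As a first step toward rebuilding the dollar, here is a minimal sketch of the arithmetic behind the two figures quoted from the screenshot. It assumes token share maps one-to-one onto dollar share, which the screenshot does not establish. The names `quoted` and `unaccountedCents` are hypothetical.

```javascript
// Cents per AI dollar, as quoted from the viral screenshot.
// Assumption: 82% of tokens is treated as 82 cents of each dollar.
const quoted = { rework: 82, shippedFeature: 4 };

// Hypothetical helper: cents of each dollar that the listed
// categories leave uncovered.
function unaccountedCents(split) {
  const covered = Object.values(split).reduce((sum, cents) => sum + cents, 0);
  return 100 - covered;
}

console.log(unaccountedCents(quoted));
```

The two quoted figures cover 86 cents of each dollar. The sentences quoted here do not say where the remainder goes.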
S19.1 The baseline nobody mentioned
Pre-AI software engineering already spent 75-80% of effort on maintenance, debugging, and review. The 82% number looks alarming until you remember the baseline.
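The baseline comparison is simple arithmetic, and a minimal sketch makes it explicit. It uses only the figures quoted above (82% claimed, 75-80% pre-AI). The function name `excessOverBaseline` is hypothetical, and treating the two figures as directly comparable is itself an assumption, since one counts tokens and the other counts effort.

```javascript
// Hypothetical helper: how many percentage points a claimed share
// sits above a baseline range. All inputs are whole percentages.
function excessOverBaseline(claimedPct, baselineLowPct, baselineHighPct) {
  return {
    // Smallest gap: compare against the top of the baseline range.
    minPoints: claimedPct - baselineHighPct,
    // Largest gap: compare against the bottom of the baseline range.
    maxPoints: claimedPct - baselineLowPct,
  };
}

// Figures quoted in this section: 82% claimed, 75-80% pre-AI baseline.
const gap = excessOverBaseline(82, 75, 80);
console.log(`${gap.minPoints} to ${gap.maxPoints} percentage points above baseline`);
```

Under those assumptions, the headline number sits 2 to 7 percentage points above the historical range.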
S19.2 The perception gap
The most-cited evidence that AI makes developers slower had to walk itself back. METR's 2025 RCT found a 19% slowdown. Their 2026 replication found 4%, with wide confidence intervals.
S19.3 The capability nobody charted
While the rework discourse raged, the underlying models got dramatically better at the exact task they were accused of failing.
S19.4 Acceleration whiplash
Faros AI tracked 22,000 developers across 4,000+ teams for two years. The verdict: AI helps the top line and hurts the bottom at the same time.
S19.5 The quality cost is real
CodeRabbit analyzed 470 open-source PRs. AI-generated code had 1.7x more issues across every category.
S19.6 Trust is falling as usage rises
Developers are using AI more even as they trust it less. This is the human texture behind the rework claim.
S19.7 Copy, don't refactor
AI optimizes for producing more code, not for leaving the codebase smaller. 2024 was the first year copy-pasted lines exceeded refactored lines.
S19.8 The rebuilt dollar
Same dollar, three very different stories. Here are Sankar's framing, Jellyfish's data, and McKinsey's survey, side by side, with assumptions visible.
§ X Receipts
§ XI Methodology & Colophon
Every chart is hand-drawn in SVG by a small vanilla JavaScript module. No frameworks, no build step. Data is baked into the page as static arrays. Nothing is fetched from an API.
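The approach described above can be sketched as a function that turns a static array into an SVG string. This illustrates the general technique, not the page's actual module. The names `DATA` and `renderBars`, the array shape, and the pixel dimensions are all hypothetical. The two data rows reuse figures quoted earlier in the article.

```javascript
// Static data baked into the page. The array shape is a hypothetical
// choice; the values are figures quoted in this article.
const DATA = [
  { label: "Claimed rework share", value: 82 },
  { label: "Pre-AI baseline (upper bound)", value: 80 },
];

// Build a horizontal bar chart as a plain SVG string.
// Values are percentages (0-100) mapped linearly onto `width` pixels.
function renderBars(data, width = 400, barHeight = 20, gap = 10) {
  const height = data.length * (barHeight + gap);
  const bars = data.map((d, i) => {
    const y = i * (barHeight + gap);
    // Multiply before dividing to keep whole-number widths exact.
    const w = (d.value * width) / 100;
    return `<rect x="0" y="${y}" width="${w}" height="${barHeight}"><title>${d.label}: ${d.value}%</title></rect>`;
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}">${bars.join("")}</svg>`;
}

const svg = renderBars(DATA);
console.log(svg);
```

In a browser, the returned string could be assigned to a container element's `innerHTML`. The sketch does not escape labels, so it is only safe with static labels that contain no markup characters.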
Entelligence dashboard screenshot (viral, May 2026).
Faros AI Acceleration Whiplash report (Apr 2026, n=22,000 devs).
CodeRabbit State of AI vs Human Code Generation (Dec 2025, n=470 PRs).
METR RCT (Jul 2025, n=16; Feb 2026 replication, n=57).
DORA State of DevOps 2024 (n=3,000) and 2025 (n=4,879).
Stack Overflow Developer Survey 2025 (n=49,009).
GitClear AI Copilot Code Quality 2025 (211M changed lines).
SWE-Bench Verified leaderboard (Vals AI, May 2026).
Jellyfish Q1 2026 (n=7,548, via TechCrunch).
McKinsey State of AI 2025 (n=1,993).
Lientz, Swanson & Tompkins, CACM 1978.
Faros AI · Acceleration Whiplash
CodeRabbit · State of AI vs Human Code
METR · AI developer productivity study
DORA · State of AI-Assisted Software Development 2025
Stack Overflow · Developer Survey 2025
Lientz, Swanson & Tompkins · CACM 1978
The Entelligence claim's methodology is opaque, self-reported, and commercially motivated. The "2,444 companies" figure almost certainly includes free-tier and trial accounts, not only paying enterprise customers. The $0.04 net feature delivery number is inconsistent with every other published ROI estimate. SWE-Bench scores above 85% carry documented contamination concerns. The METR RCT's 2025 cohort was small (n=16) and the 2026 replication had severe selection bias. The pre-AI baseline of 75-80% is from 1978 and may not reflect contemporary practice. All forward-looking numbers come from self-reported surveys, not controlled measurements.