Vol. XII · No. 05 · May 2026
Jake Cuth.

The rework number went viral.
The baseline didn't.

A forensic teardown of the viral claim that 82% of AI tokens go to rework. Eight independent datasets, one rebuilt dollar. The direction is right. The number is marketing. The baseline is from 1978.

Scroll through eight interactive charts built from Faros, CodeRabbit, DORA, METR, Stack Overflow, GitClear, SWE-Bench, and McKinsey data. Each chart stress-tests one piece of the Entelligence screenshot. Toggle between the vendor's framing and what the independent telemetry actually shows.


In late May 2026, a dashboard screenshot from a startup called Entelligence went viral. It claimed that across 2,444 companies, 82% of AI-generated tokens go to bugs, rework, and review. Four cents of every dollar reaches a shipped feature.

The number felt right to anyone who has watched an LLM confidently produce code that passes the linter and fails the user. Faros, CodeRabbit, DORA, GitClear, Stack Overflow, and Jellyfish had all, independently, published data pointing in the same direction. But "felt right" and "is right" are different claims with different evidence standards.

The chart is from Entelligence's own product dashboard. Entelligence sells AI code-review tools and engineering-intelligence dashboards. The 82% number is a sales asset for the product that fixes the problem the number describes. That does not make it wrong. It does mean the lab has to do what the screenshot did not: show the sources, name the assumptions, and rebuild the dollar from scratch.

The claim is directionally consistent with six independent datasets. It is numerically suspect in the specific 82% / $0.04 framing. And it is missing the one number that changes everything: the pre-AI baseline.

Pre-AI software engineering already spent 75-80% of effort on maintenance, debugging, and review. The 82% number looks alarming until you remember the baseline.

Chart: share of engineering effort spent on maintenance, debugging, and review (0-100% scale). Pre-AI baseline (1978): 77.5%. Entelligence claim (2026): 82%. Gap: +4.5 pp.
The shocking 82% is roughly the same number we have been quoting since the Carter administration. Source: Lientz, Swanson & Tompkins, CACM 1978.
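
The gap between the claim and the baseline is simple arithmetic, sketched here in JavaScript. The 77.5% figure is taken as the midpoint of the quoted 75-80% range, which is an assumption about how the chart derives it; the function names are illustrative.

```javascript
// Pre-AI range quoted from Lientz, Swanson & Tompkins (1978).
const baselineLow = 75;
const baselineHigh = 80;
// Assumption: the chart's 77.5% is the midpoint of that range.
const baselineMid = (baselineLow + baselineHigh) / 2;

// Entelligence's claimed share of tokens going to bugs, rework, and review.
const claim = 82;

// Absolute gap in percentage points.
function pointGap(claimPct, basePct) {
  return claimPct - basePct;
}

// The same gap expressed relative to the baseline, in percent.
function relativeIncrease(claimPct, basePct) {
  return ((claimPct - basePct) / basePct) * 100;
}

console.log(pointGap(claim, baselineMid).toFixed(1) + " pp");        // 4.5 pp
console.log(relativeIncrease(claim, baselineMid).toFixed(1) + "%");  // 5.8%
// Against the ends of the range the gap is 2 pp (vs 80%) to 7 pp (vs 75%).
console.log(pointGap(claim, baselineHigh), pointGap(claim, baselineLow));
```

Reporting the gap in percentage points rather than as a relative change keeps it comparable with the chart's +4.5 pp label.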

The most-cited evidence that AI makes developers slower had to walk itself back. METR's 2025 RCT found a 19% slowdown. Their 2026 replication found 4%, with wide confidence intervals.

Chart: change in developer speed with AI (negative = slower, positive = faster). Predicted: +24%. Self-reported: +20%. METR 2025: -19% (CI: -39 to +1). METR 2026: -4% (CI: -15 to +9).
Developers thought they were 20% faster. They were 19% slower. A year later, the effect shrank to 4% and stopped being significant. Source: METR 2025, 2026.

While the rework discourse raged, the underlying models got dramatically better at the exact task they were accused of failing.

Chart: best SWE-Bench score, Oct 2023 to May 2026. SWE-Agent: 4.8%. Devin: 14.7%. OpenHands: 35.3%. o3: 58.8%. Claude 4: 72.7%. Latest point (unlabeled): 82.6%.
From 4.8% to 82.6% in 30 months. By the time you read a benchmark, it is already wrong. Source: Princeton SWE-Bench, Vals AI leaderboard.

Faros AI tracked 22,000 developers across 4,000+ teams for two years. The verdict: AI helps the top line and hurts the bottom line at the same time.

Chart: change by metric. Epics completed: +66%. Tasks completed: +33.7%. PRs merged: +16.2%. No-review merges: +31.3%. Bugs per developer: +54%. Time to first review: +156.6%. Incidents per PR: +242.7%. Review time: +441.5%. Code churn: +861%.
AI doubled the speedometer and tripled the brake-wear bill. Source: Faros AI, April 2026, n=22,000 developers.

CodeRabbit analyzed 470 open-source PRs. AI-generated code had 1.7x more issues overall, and more issues in every category charted, from 1.5x for error handling to 1.9x for security.

Chart: issues in AI-generated PRs relative to the human baseline (1.0x). Maintainability: 1.8x. Correctness: 1.6x. Security: 1.9x. Error handling: 1.5x. Documentation: 1.7x. Overall: 1.7x.
The rework cost is not a vendor invention. Source: CodeRabbit, Dec 2025, n=470 PRs.

Developers are using AI more even as they trust it less. This is the human texture behind the rework claim.

Chart: developer AI usage vs trust, 2023 to 2025. Usage: 70% (2023), 76% (2024), 84% (2025). Trust: 40% (2023), ~35% (2024), 29% (2025).
84% of developers use AI tools. 29% trust them. Source: Stack Overflow 2023-2025, n=49,009.

AI optimizes for producing more code, not leaving the codebase smaller. 2024 was the first year copy-pasted lines exceeded refactored lines.

Chart: share of changed lines, 2020 to 2024. Refactored: 24.1% falling to 9.5%. Copy/pasted: 8.3% rising to 12.3%. Annotations: Copilot GA, crossover.
The crossover happened in 2024. AI encourages duplication, not consolidation. Source: GitClear, 211M lines, 2020-2024.

Same dollar, three very different stories. Here are Entelligence's framing (Sankar), Jellyfish's data, and McKinsey's survey, side by side, with assumptions visible.

Chart: three breakdowns of $1.00 of AI coding spend.
Entelligence (Sankar): $0.44 rework, $0.27 review, $0.25 other overhead, $0.04 shipped feature.
Jellyfish-implied: ~$0.50 overhead, ~$0.30 marginal rework, ~$0.20 delivered value.
McKinsey-implied: ~$0.40 overhead, ~$0.40 productivity, ~$0.10-0.20 net savings.
The $0.04 figure does not survive triangulation. The rework share does. Source: Entelligence, TechCrunch/Jellyfish Apr 2026, McKinsey 2025.
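
A quick way to sanity-check the three bars is to add up their segments. The sketch below uses the segment values as labeled in the chart; the data layout and the `totalCents` helper are illustrative, not taken from the page's source.

```javascript
// Each segment is [label, low, high] in dollars per $1.00 of AI coding spend.
// Ranges such as ~$0.10-0.20 keep both ends; point values repeat the number.
const breakdowns = {
  "Entelligence (Sankar)": [
    ["Rework", 0.44, 0.44],
    ["Review", 0.27, 0.27],
    ["Other overhead", 0.25, 0.25],
    ["Shipped feature", 0.04, 0.04],
  ],
  "Jellyfish-implied": [
    ["Overhead", 0.5, 0.5],
    ["Marginal rework", 0.3, 0.3],
    ["Delivered value", 0.2, 0.2],
  ],
  "McKinsey-implied": [
    ["Overhead", 0.4, 0.4],
    ["Productivity", 0.4, 0.4],
    ["Net savings", 0.1, 0.2],
  ],
};

// Sum in whole cents so floating-point drift cannot hide a mismatch.
function totalCents(segments) {
  let low = 0;
  let high = 0;
  for (const [, lo, hi] of segments) {
    low += Math.round(lo * 100);
    high += Math.round(hi * 100);
  }
  return { low, high };
}

for (const [name, segments] of Object.entries(breakdowns)) {
  const { low, high } = totalCents(segments);
  const label = low === high ? `${low} cents` : `${low}-${high} cents`;
  console.log(`${name}: ${label}`);
}
// Entelligence (Sankar): 100 cents
// Jellyfish-implied: 100 cents
// McKinsey-implied: 90-100 cents
```

The McKinsey-implied bar only reaches a full dollar at the top of its net-savings range, which is worth keeping in mind when comparing it with the other two.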

Entelligence claim: 82% of tokens on rework
Pre-AI baseline: 75-80% on maintenance (1978)
Faros code churn: +861%, high-AI vs low-AI
Faros incidents/PR: +242.7%, high-AI vs low-AI
CodeRabbit issues: 1.7x, AI vs human PRs
SWE-Bench best: 82.6%, raw model (May 2026)
Dev AI usage: 84% (Stack Overflow 2025)
Dev AI trust: 29% (Stack Overflow 2025)

Engine

Every chart is hand-drawn in SVG by a small vanilla JavaScript module. No frameworks, no build step. Data is baked into the page as static arrays. Nothing is fetched from an API.
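
As a rough illustration of that approach, here is a minimal sketch of a bar chart built from a static array with no dependencies. It is not the page's actual module: the `renderBars` function, its option names, and the `#chart` container id are all hypothetical, and the data is the CodeRabbit series shown earlier.

```javascript
// Static data baked into the script (CodeRabbit issue ratios, AI vs human PRs).
const issueRatios = [
  ["Maintainability", 1.8],
  ["Correctness", 1.6],
  ["Security", 1.9],
  ["Error handling", 1.5],
  ["Documentation", 1.7],
  ["Overall", 1.7],
];

// Build a horizontal bar chart as an SVG string; maxValue is the right edge of the axis.
function renderBars(rows, options = {}) {
  const { width = 480, barHeight = 18, gap = 8, labelWidth = 130, maxValue = 3 } = options;
  const plotWidth = width - labelWidth;
  const height = rows.length * (barHeight + gap) + gap;

  const bars = rows.map(([label, value], i) => {
    const y = gap + i * (barHeight + gap);
    const w = (value / maxValue) * plotWidth;
    return (
      `<text x="${labelWidth - 6}" y="${y + barHeight * 0.75}" text-anchor="end" font-size="12">${label}</text>` +
      `<rect x="${labelWidth}" y="${y}" width="${w.toFixed(1)}" height="${barHeight}"><title>${value}x</title></rect>`
    );
  });

  // Dashed reference line at the human baseline (1.0x).
  const baseX = (labelWidth + (1 / maxValue) * plotWidth).toFixed(1);
  const baseline = `<line x1="${baseX}" y1="0" x2="${baseX}" y2="${height}" stroke="currentColor" stroke-dasharray="4 3"/>`;

  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}">${bars.join("")}${baseline}</svg>`;
}

const svgMarkup = renderBars(issueRatios);
// In a browser, the markup could be dropped into a container, for example:
// document.querySelector("#chart").innerHTML = svgMarkup;
```

Returning a string keeps the sketch runnable outside a browser; a real module might create elements with `document.createElementNS` instead, and would need to escape labels if they ever came from untrusted input.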

Data sources

Entelligence dashboard screenshot (viral, May 2026)
Faros AI Acceleration Whiplash report (Apr 2026, n=22,000 devs)
CodeRabbit State of AI vs Human Code Generation (Dec 2025, n=470 PRs)
METR RCT (Jul 2025, n=16; Feb 2026 replication, n=57)
DORA State of DevOps 2024 (n=3,000) and 2025 (n=4,879)
Stack Overflow Developer Survey 2025 (n=49,009)
GitClear AI Copilot Code Quality 2025 (211M changed lines)
SWE-Bench Verified leaderboard (Vals AI, May 2026)
Jellyfish Q1 2026 (n=7,548, via TechCrunch)
McKinsey State of AI 2025 (n=1,993)
Lientz, Swanson & Tompkins, CACM 1978

Reading list

Faros AI · Acceleration Whiplash
CodeRabbit · State of AI vs Human Code
METR · AI developer productivity study
DORA · State of AI-Assisted Software Development 2025
Stack Overflow · Developer Survey 2025
Lientz, Swanson & Tompkins · CACM 1978

Limitations

The Entelligence claim's methodology is opaque, self-reported, and commercially motivated. The "2,444 companies" figure almost certainly includes free-tier and trial accounts, not just paying enterprise customers. The $0.04 net feature delivery number is inconsistent with every other published ROI estimate. SWE-Bench scores above 85% carry documented contamination concerns. The METR RCT's 2025 cohort was small (n=16) and the 2026 replication had severe selection bias. The pre-AI baseline of 75-80% is from 1978 and may not reflect contemporary practice. All forward-looking numbers come from self-reported surveys, not controlled measurements.