In 2024 we collected 200 pull requests from 31 teams; every one of those PRs shipped a real production incident. For every PR we walked back through the reviewer comments, the CI logs, and the postmortem, and, where we could, interviewed the reviewer.
Three things came out of that data. The first two were what we expected. The third surprised us.
What we expected
1. Reviewer attention drops fast within a session. The first PR a reviewer sees in a session catches 1.8× more issues than the third PR in the same session. By the fifth PR, the catch rate has plateaued. This is well documented; we just confirmed it.
2. Larger diffs attract less feedback per line. Past 200 changed lines, comment density collapses: reviewers skim, hand-wave, or sign off and move on. Also expected.
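Both expected findings reduce to simple group-by computations over review records. The first one (catch rate by a PR's position within a reviewer's session) can be sketched as below. The `Review` record layout and its field names are hypothetical, the demo numbers are made up purely to show what a "1.8×" ratio means, and this is not the pipeline used for the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record layout; field names are illustrative only.
    reviewer: str
    session_id: str
    started_at: float  # seconds since epoch
    issues_caught: int


def mean_catch_by_position(reviews: list[Review]) -> dict[int, float]:
    """Mean issues caught, keyed by the PR's 1-based position within its session."""
    # Group reviews into sessions, one list per (reviewer, session) pair.
    sessions: dict[tuple[str, str], list[Review]] = defaultdict(list)
    for r in reviews:
        sessions[(r.reviewer, r.session_id)].append(r)

    # position -> [sum of issues caught, number of PRs seen at that position]
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for session in sessions.values():
        # Order each session chronologically so position 1 is the first PR reviewed.
        session.sort(key=lambda r: r.started_at)
        for position, r in enumerate(session, start=1):
            totals[position][0] += r.issues_caught
            totals[position][1] += 1

    return {pos: s / n for pos, (s, n) in sorted(totals.items())}


# Made-up records: one reviewer, one session of three PRs.
demo = [
    Review("ana", "s1", 0.0, 9),
    Review("ana", "s1", 600.0, 7),
    Review("ana", "s1", 1200.0, 5),
]
means = mean_catch_by_position(demo)
print(means[1] / means[3])  # 1.8: first-PR catch rate relative to third-PR catch rate
```

Averaging per position across many sessions, rather than per reviewer, keeps one prolific reviewer from dominating a position; a real analysis would also need to decide what gap between reviews ends a session.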
What we didn't expect
3. Time of day is the strongest predictor of reviewer miss-rate. PRs reviewed between 11:00 and 14:00 local time have a 2.4× lower incident rate than PRs reviewed between 15:00 and 18:00, even after controlling for reviewer experience, diff size, and team. The afternoon is the bug-shaped hole in your review process.
This was surprising enough that we went back to the dataset to see what was different about post-lunch reviews. The diffs themselves weren't bigger. The reviewers weren't more junior. The PR descriptions weren't worse. What differed was comment specificity: reviewers in the 11:00–14:00 window asked questions about behavior; afternoon reviewers commented on style.
Why this matters for tooling
Most automated review tools optimise for "comprehensiveness": surface every issue, every time. The data suggests this is the wrong target. Reviewers don't have unlimited attention; surfacing 14 issues at 4 p.m. doesn't get you 14 fixed issues. It gets you 2 fixed issues and 12 ignored ones.
The lesson we took into codingassist.bot: a tool that produces three high-confidence, behavior-relevant signals is worth more than one that produces fourteen mixed-quality findings. Comment density isn't the metric; fix-rate is. Optimising for fix-rate means giving the reviewer a small number of findings they cannot easily ignore, especially after lunch.
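The design rule above (keep only behavior-relevant, high-confidence findings, cap how many reach the reviewer, and judge the result by fix-rate rather than comment count) can be sketched in a few lines. The `Finding` shape, the 0.8 confidence floor, and the cap of three are hypothetical choices for illustration; this is not codingassist.bot's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical finding shape; not any real tool's schema.
    message: str
    confidence: float  # 0.0 to 1.0
    behavioral: bool   # True if it concerns runtime behavior, False for style


def select_findings(findings: list[Finding], limit: int = 3,
                    min_confidence: float = 0.8) -> list[Finding]:
    """Keep high-confidence behavioral findings, most confident first, capped at `limit`."""
    kept = [f for f in findings if f.behavioral and f.confidence >= min_confidence]
    kept.sort(key=lambda f: f.confidence, reverse=True)  # stable: ties keep input order
    return kept[:limit]


def fix_rate(surfaced: int, fixed: int) -> float:
    """Fraction of surfaced findings that were actually fixed (0.0 if none surfaced)."""
    return fixed / surfaced if surfaced else 0.0


# The 4 p.m. example from the text: 14 findings surfaced, 2 fixed.
print(round(fix_rate(14, 2), 2))  # 0.14

candidates = [
    Finding("possible None dereference in retry path", 0.93, True),
    Finding("prefer f-string", 0.99, False),
    Finding("off-by-one in pagination cursor", 0.88, True),
    Finding("unused import", 0.97, False),
    Finding("timeout not propagated to client", 0.81, True),
    Finding("maybe rename variable", 0.55, True),
]
for f in select_findings(candidates):
    print(f.message)
```

Filtering before sorting means a very confident style nit can never displace a behavioral finding, which is the point of the rule: the cap only has value if what survives it is worth the reviewer's limited attention.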
Methodology, briefly
- 200 PRs, 31 teams, 9 industries, 14 months of postmortem data
- Every PR had a published incident retro within 30 days of merge
- Reviewer interviews ran 25–40 minutes; we recorded but did not transcribe
- The full dataset (anonymised) is available to academic partners on request