
Code Review Analytics: What PR Data Tells About Team Health

Vaibhav Verma
7 min read
code review, analytics, engineering metrics, team health, developer productivity

Last year, I worked with a team that was struggling with velocity. They had 8 engineers, a clean codebase, good test coverage, and a well-defined sprint process. On paper, everything looked right. But they were shipping 40% less than comparable teams.

I pulled their PR data. The answer was right there.

Average time from PR open to first review: 26 hours. Average time from PR open to merge: 4.1 days. Average review cycles (rounds of feedback before approval): 2.8. One developer was responsible for 52% of all reviews.

The bottleneck wasn't the code. It wasn't the process. It was the review pipeline. One senior engineer had become the de facto gatekeeper, and everyone else waited for their review before merging anything.

We fixed the review distribution, and cycle time dropped to 1.4 days within a month. Same team. Same code. Same process. Just better data about how reviews were actually flowing.

The Metrics That Matter

Most teams track zero metrics about their code review process, and those that do often track the wrong ones. "Number of PRs reviewed" is vanity. "Lines of code reviewed" is noise.

Here are the metrics I track and why:

1. Time to First Review (TTFR)

The time between a PR being opened and receiving its first review comment. This is the single most impactful metric for engineering velocity.

Why it matters: Developers context-switch when waiting for reviews. A PR that sits for 24 hours doesn't just delay that feature by 24 hours. It causes the developer to start something new, and then they have to context-switch back when the review comes in. Research from Microsoft puts the cost of each context switch at 15-20 minutes of productivity.

Target: Under 4 hours during business hours. If you can get to under 2 hours, you'll see a measurable jump in throughput.
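
To measure this yourself, here's a minimal sketch using the GitHub CLI (the same tooling the data-collection section below relies on). It runs inside the repository, treats the earliest submitted review on each merged PR as the first review, and reports wall-clock hours rather than business hours; the 20-PR sample size is arbitrary.

bash
# Approximate TTFR for the last 20 merged PRs (wall-clock hours)
gh pr list --state merged --limit 20 --json number,createdAt |
  jq -r '.[] | "\(.number) \(.createdAt)"' |
  while read -r pr created; do
    first_review=$(gh api "repos/{owner}/{repo}/pulls/${pr}/reviews" \
      --jq '[.[].submitted_at | select(. != null)] | sort | first // empty')
    [ -z "$first_review" ] && continue   # skip PRs that were merged without a review
    hours=$(jq -n --arg opened "$created" --arg reviewed "$first_review" \
      '((($reviewed | fromdateiso8601) - ($opened | fromdateiso8601)) / 3600) * 10 | floor / 10')
    echo "PR #${pr}: first review after ${hours}h"
  done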

2. PR Cycle Time

The time from PR open to PR merge. This encompasses review time, rework time, and any CI/CD delays.

Why it matters: This is your end-to-end delivery speed for code changes. Long cycle times compound. If your average cycle time is 4 days and a feature requires 3 PRs in sequence, that feature takes 12 working days just in review overhead.

Target: Under 24 hours for standard PRs. Under 4 hours for bug fixes.

3. Review Cycles (Iterations)

The number of back-and-forth rounds between author and reviewer before the PR is approved.

Why it matters: High iteration counts signal one of several problems:

  • Misalignment on standards (the author and reviewer disagree about what "good" looks like)
  • Insufficient upfront design (the approach is being debated in the review instead of before coding)
  • Unclear PR scope (the PR is doing too many things, making it hard to review)

Target: 1.5 or fewer average review cycles. If you're regularly above 2, investigate why.
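
There's no single field in the GitHub API for review cycles, so any measurement is a proxy. The sketch below makes a simplifying assumption: each CHANGES_REQUESTED review marks one extra round of rework before approval. That undercounts rounds driven purely by comment threads, but it's a usable first approximation.

bash
# Rough review-cycle count per merged PR (1 + number of CHANGES_REQUESTED reviews)
gh pr list --state merged --limit 20 --json number --jq '.[].number' |
  while read -r pr; do
    rounds=$(gh api "repos/{owner}/{repo}/pulls/${pr}/reviews" \
      --jq '1 + ([.[] | select(.state == "CHANGES_REQUESTED")] | length)')
    echo "PR #${pr}: ~${rounds} review cycle(s)"
  done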

4. Review Distribution (Gini Coefficient)

How evenly review work is spread across the team. A Gini coefficient of 0 means perfectly equal distribution. A coefficient of 1 means one person does all reviews.

Why it matters: Uneven review distribution creates bottlenecks (as in my opening example), burns out senior engineers, and prevents junior engineers from developing review skills.

Target: Gini coefficient below 0.3. This doesn't mean everyone reviews equally (seniors should review more), but no one person should account for more than 25% of reviews on a team of 6+.
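
You can compute the Gini coefficient directly from the per-reviewer counts that the review-distribution command later in this post produces. A minimal sketch, using the standard formula for sorted counts x1 <= ... <= xn, G = 2*sum(i*xi) / (n*sum(xi)) - (n+1)/n:

bash
# Review-distribution Gini coefficient from per-reviewer review counts
gh api 'repos/{owner}/{repo}/pulls?state=closed&per_page=100' --jq '.[].number' |
  xargs -I {} gh api 'repos/{owner}/{repo}/pulls/{}/reviews' --jq '.[].user.login' |
  sort | uniq -c | awk '{print $1}' |
  jq -s 'sort as $x
         | length as $n
         | ($x | add) as $total
         | (2 * ([range($n) as $i | ($i + 1) * $x[$i]] | add)) / ($n * $total) - ($n + 1) / $n'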

5. PR Size

Measured in lines changed (additions + deletions) or files changed.

Why it matters: Large PRs get worse reviews. This isn't opinion. A study by SmartBear found that review quality drops significantly after 400 lines. Reviewers start skimming. Defects slip through. The data from Google's engineering practices confirms this.

Target: Under 400 lines changed per PR. Under 10 files changed.

6. Review Comment Quality Ratio

The ratio of substantive comments (architecture, logic, security) to nitpick comments (formatting, naming, style).

Why it matters: If 80% of review comments are about formatting, your team is wasting time on things a linter should catch. Automate the nitpicks, and review time drops while review quality improves.

Target: At least 70% substantive comments. Automate everything else.
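
Comment quality is the hardest of these to measure automatically, because "substantive" is ultimately a judgment call. A crude keyword heuristic still gives you a baseline to track over time; the keyword list in the sketch below is an assumption, so tune it to the vocabulary your team actually uses for nitpicks.

bash
# Rough substantive-vs-nitpick split across recent review comments
gh api 'repos/{owner}/{repo}/pulls/comments?per_page=100' \
  --jq '.[].body | gsub("\n"; " ")' |
  awk '{ if (tolower($0) ~ /nit:|nitpick|typo|formatting|whitespace|rename/) nits++; else subst++ }
       END { total = nits + subst
             if (total > 0) printf "substantive: %d of %d (%.0f%%)\n", subst, total, 100 * subst / total }'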

How to Collect the Data

GitHub

bash
# PR cycle time for the last 100 merged PRs
gh pr list --state merged --limit 100 --json number,createdAt,mergedAt,additions,deletions,changedFiles | \
  jq '.[] | {
    pr: .number,
    days: ((((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 86400) * 10 | floor / 10),
    lines: (.additions + .deletions),
    files: .changedFiles
  }'

# Review distribution: who reviewed the most PRs
gh api repos/{owner}/{repo}/pulls?state=closed\&per_page=100 --jq '.[].number' | \
  xargs -I {} gh api repos/{owner}/{repo}/pulls/{}/reviews --jq '.[].user.login' | \
  sort | uniq -c | sort -rn

GitLab and Azure DevOps

Both platforms expose similar data through their APIs. The principle is the same: query merged MRs/PRs, calculate time deltas, and aggregate by reviewer.
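
As an example, a roughly equivalent cycle-time query against the GitLab REST API might look like the sketch below. It assumes a project ID in PROJECT_ID and a personal access token in GITLAB_TOKEN, and it strips the millisecond suffix from GitLab's timestamps before parsing them, since older jq versions reject fractional seconds.

bash
# MR cycle time for the last 100 merged MRs (GitLab)
curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://gitlab.com/api/v4/projects/$PROJECT_ID/merge_requests?state=merged&per_page=100" |
  jq '.[] | select(.merged_at != null) | {
    mr: .iid,
    days: ((((.merged_at | sub("\\.[0-9]+"; "") | fromdateiso8601)
          - (.created_at | sub("\\.[0-9]+"; "") | fromdateiso8601)) / 86400 * 10 | floor) / 10)
  }'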

Dedicated Tools

  • LinearB: Purpose-built for engineering metrics. Tracks DORA metrics plus review-specific data.
  • Sleuth: Focuses on deployment frequency and change lead time with review breakdown.
  • Pluralsight Flow (formerly GitPrime): Deep git analytics including review patterns.
  • Custom dashboards: For teams that want full control, a weekly script that queries the GitHub API and writes to a PostgreSQL database works well.

The Framework: REVIEW Health Score

I compute a composite health score for code review processes using six inputs:

R - Response time: Average TTFR. Score: 10 if <2h, 7 if <4h, 4 if <8h, 1 if >8h.

E - Efficiency: Average review cycles. Score: 10 if <1.5, 7 if <2, 4 if <3, 1 if >3.

V - Volume balance: Review distribution Gini coefficient. Score: 10 if <0.2, 7 if <0.3, 4 if <0.5, 1 if >0.5.

I - Input size: Average PR size in lines. Score: 10 if <200, 7 if <400, 4 if <800, 1 if >800.

E - End-to-end time: Average cycle time. Score: 10 if <1d, 7 if <2d, 4 if <4d, 1 if >4d.

W - Worth (quality): Substantive comment ratio. Score: 10 if >80%, 7 if >60%, 4 if >40%, 1 if <40%.

Total score out of 60. Above 45 is healthy. 30-45 needs attention. Below 30 needs urgent intervention.
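
To make the rubric concrete, here's a small scoring sketch in shell. The six numbers at the top are placeholders standing in for whatever you measured with the commands earlier; the score function just maps a value onto the 10/7/4/1 bands above.

bash
# REVIEW health score from six measured inputs (placeholder values shown)
ttfr_hours=3.2 cycles=1.8 gini=0.34 avg_lines=310 cycle_days=1.6 substantive_pct=55

score() { # score <value> <best> <good> <fair> [higher_is_better]
  awk -v v="$1" -v a="$2" -v b="$3" -v c="$4" -v inv="$5" 'BEGIN {
    if (inv) { v = -v; a = -a; b = -b; c = -c }   # flip sign for higher-is-better metrics
    if (v < a) print 10; else if (v < b) print 7; else if (v < c) print 4; else print 1
  }'
}

r=$(score "$ttfr_hours" 2 4 8)            # R - response time (hours)
e1=$(score "$cycles" 1.5 2 3)             # E - efficiency (review cycles)
v=$(score "$gini" 0.2 0.3 0.5)            # V - volume balance (Gini)
i=$(score "$avg_lines" 200 400 800)       # I - input size (lines)
e2=$(score "$cycle_days" 1 2 4)           # E - end-to-end time (days)
w=$(score "$substantive_pct" 80 60 40 1)  # W - worth (substantive %, higher is better)
echo "REVIEW score: $((r + e1 + v + i + e2 + w)) / 60"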

Common Anti-Patterns and Fixes

The Gatekeeper

Pattern: One senior developer reviews almost everything. Fix: Assign a "review buddy" to each PR based on code ownership, not seniority. Use CODEOWNERS files in GitHub to distribute automatically.

The Rubber Stamp

Pattern: PRs get approved with "LGTM" and no comments. Fix: Require at least one substantive comment per review. Some teams add an "I tested this by..." requirement.

The Mega-PR

Pattern: PRs regularly exceed 1,000 lines. Fix: Stacked PRs. Break the work into small, reviewable chunks that build on each other. Tools like Graphite and ghstack make stacked PRs practical.

The Stale PR

Pattern: PRs sit open for days, accumulating merge conflicts. Fix: Set up automated reminders. If a PR has no review after 4 hours, ping the assigned reviewer. If no reviewer is assigned, auto-assign based on CODEOWNERS.
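
A reminder like this doesn't require a dedicated tool. Here's a minimal sketch, assuming it runs on a schedule (cron or CI), that lists open PRs older than four hours with no reviews yet; pipe its output wherever your team will actually see it.

bash
# Open PRs that have waited more than 4 hours for a first review
gh pr list --state open --json number,title,createdAt,reviews |
  jq -r '.[]
    | select((.reviews | length) == 0)
    | select((now - (.createdAt | fromdateiso8601)) > 4 * 3600)
    | "PR #\(.number) (\(.title)): no review after \(((now - (.createdAt | fromdateiso8601)) / 3600) | floor)h"'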

The Bikeshed

Pattern: Reviews focus on naming, formatting, and style rather than logic and architecture. Fix: Automate formatting (Prettier), automate style (ESLint), automate import ordering (eslint-plugin-import). Remove human judgment from anything a tool can decide.

The Contrarian Take: Code Review Isn't About Finding Bugs

I know this is going to be controversial. But the data backs it up: code review catches only 15-30% of defects, according to studies by Microsoft Research and SmartBear.

If you're relying on code review as your primary quality gate, you're building on a weak foundation. Tests, type checking, and static analysis catch more bugs more reliably than human reviewers.

So what is code review for? Knowledge transfer. When a developer reviews another developer's code, they learn about that part of the system. They see new patterns and techniques. They build a shared understanding of how the codebase should evolve.

This reframing changes how you design your review process. Instead of asking "Did the reviewer find bugs?", ask "Did the reviewer learn something?" Instead of assigning reviews to the most experienced developer, assign them to the developer who would benefit most from understanding the changed area.

Your reviews will be faster, your knowledge distribution will improve, and your defect rate won't change because you'll still have tests, types, and linters doing the heavy lifting on bug detection.

Start Tracking Today

Pull your PR data right now. It takes 10 minutes with the GitHub CLI commands above. Calculate your TTFR, cycle time, and review distribution.

I guarantee you'll find at least one surprise. Maybe it's a bottleneck you didn't know about. Maybe it's a team member who's silently drowning in review requests. Maybe it's a pattern of large PRs that's killing your review quality.

The data is sitting in your GitHub history, waiting to tell you something. Go look.
