Build a Code Quality Dashboard Your Team Will Use
I've built 5 code quality dashboards over my career. The first 3 were failures. Not because they had bad data or ugly UI, but because nobody looked at them after the first week. They joined the graveyard of well-intentioned engineering tools that get launched to fanfare and ignored within a month.
The last 2 succeeded. The difference wasn't the metrics I chose or the visualization library I used. It was understanding why dashboards fail and designing around those failure modes from the start. This post is the playbook for building a code quality dashboard that your team will actually check regularly, not just one they'll bookmark and forget.
Why Most Code Quality Dashboards Fail
I surveyed 24 engineering managers about their quality dashboards. 19 of them (79%) said they had one. 6 of them (25%) said their team checks it weekly. 2 of them (8%) said it influences actual decisions. That's a 92% failure rate for influencing behavior, which is supposedly the whole point.
The failure patterns are consistent:
Failure Mode 1: Too Many Metrics. The dashboard shows 25+ metrics because the person who built it figured "more data is better." The result is analysis paralysis. Nobody knows which number matters, so nobody acts on any of them. I've seen dashboards with code coverage, cyclomatic complexity, Halstead metrics, maintainability index, coupling, cohesion, lines of code, comment ratios, and 15 other numbers. Nobody on the team could tell me what a "good" Halstead volume was.
Failure Mode 2: No Actionable Context. The dashboard shows a number going up or down, but doesn't help you understand why or what to do about it. "Code coverage dropped from 78% to 74%" is information. "Code coverage dropped because the new auth module (47 files, 0 tests) was merged on Tuesday" is actionable context.
Failure Mode 3: Not Connected to Workflow. The dashboard exists on a separate URL that engineers have to actively choose to visit. It's not in their daily workflow. Out of sight, out of mind. The dashboards that work are the ones that intersect with activities engineers already do: PR reviews, sprint planning, deployment pipelines.
Failure Mode 4: Vanity Metrics. The dashboard tracks things that feel important but don't correlate with actual quality outcomes. Lines of code per sprint. Number of PRs merged. Story points completed. These measure activity, not quality.
The 7-Metric Dashboard
After experimenting with dozens of metrics, I've settled on exactly 7 that I put on every code quality dashboard. Not 6. Not 8. Seven, because that's the maximum number of metrics a team can track without cognitive overload, and each one maps to a specific quality dimension.
Metric 1: Change Failure Rate
What it measures: Percentage of deployments that cause a failure in production (rollback, hotfix, or incident).
Target: <5% for mature teams, <10% for growing teams.
Why it matters: This is the single best proxy for overall code quality. If your changes break production frequently, your quality process has gaps.
How to calculate:
Change Failure Rate = (Failed Deployments / Total Deployments) x 100
Pull deployment counts from your deployment tool (GitHub Actions, CircleCI, ArgoCD) and cross-reference them with your incident tracker.
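To make the cross-reference concrete, here's a minimal sketch in TypeScript. The Deployment and Incident shapes are assumptions; the key is that your incident tracker records which deployment (if any) caused each rollback, hotfix, or incident. This is the calcChangeFailureRate helper used in the collection script later in the post.

interface Deployment {
  id: string;
  deployedAt: Date;
}

interface Incident {
  // Set when the incident was traced back to a specific deployment
  // (rollback, hotfix, or production incident).
  causedByDeploymentId?: string;
}

function calcChangeFailureRate(deployments: Deployment[], incidents: Incident[]): number {
  if (deployments.length === 0) return 0;
  // A deployment counts as "failed" if at least one incident points back at it.
  const failedIds = new Set(
    incidents
      .map((i) => i.causedByDeploymentId)
      .filter((id): id is string => Boolean(id))
  );
  const failed = deployments.filter((d) => failedIds.has(d.id)).length;
  return (failed / deployments.length) * 100;
}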
Metric 2: PR Review Turnaround Time (p50 and p90)
What it measures: Time from PR opened to first meaningful review.
Target: p50 <4 hours, p90 <24 hours.
Why it matters: Slow reviews create context-switching, merge conflicts, and frustration. They're also a leading indicator of team health. When review times creep up, it usually means the team is overloaded or disengaged.
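Once you have a time-to-first-review value per PR (in hours, say), p50 and p90 are just a sorted-array lookup. A minimal sketch of the calcPercentile helper used in the collection script later in this post:

// Nearest-rank percentile: p50 of 10 sorted values is the 5th, p90 is the 9th.
function calcPercentile(values: number[], percentile: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((percentile / 100) * sorted.length));
  return sorted[rank - 1];
}

// Example: calcPercentile(prs.map((p) => p.timeToFirstReview), 90)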
Metric 3: Escaped Bug Rate
What it measures: Number of bugs found in production per sprint that should have been caught by tests or review.
Target: <2 per sprint for a team of 6-8 engineers.
Why it matters: This measures the effectiveness of your quality gates (tests, reviews, QA). Unlike code coverage, which measures what you tested, escaped bugs measure what you missed.
Metric 4: Test Suite Health
What it measures: Two sub-metrics: (a) percentage of CI runs where tests pass on first attempt, and (b) test suite execution time.
Target: (a) >95% green on first run, (b) <10 minutes for the full suite.
Why it matters: Flaky tests erode trust in the test suite. Slow tests discourage running them. Both lead to engineers skipping tests, which leads to more escaped bugs.
Metric 5: Dependency Health Score
What it measures: Composite score based on: known vulnerabilities in dependencies, percentage of dependencies more than 2 major versions behind, and abandoned dependencies (no commit in 12+ months).
Target: 0 high/critical vulnerabilities. <10% of dependencies significantly outdated.
Why it matters: Your code's quality includes the quality of code you depend on. This metric catches supply chain risk before it becomes an incident.
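How you collapse those three inputs into a single score is a judgment call. Here's one illustrative weighting as a sketch; the field names and penalties are assumptions (not the output format of Trivy or Snyk), so tune them to your own risk tolerance.

interface DependencyReport {
  totalDeps: number;
  highOrCriticalVulns: number; // known high/critical vulnerabilities
  twoPlusMajorsBehind: number; // deps more than 2 major versions behind
  abandoned: number;           // deps with no upstream commit in 12+ months
}

// Start from 100 and subtract penalties; clamp at 0.
function calcDepHealthScore(r: DependencyReport): number {
  if (r.totalDeps === 0) return 100;
  let score = 100;
  score -= r.highOrCriticalVulns * 20;                 // each high/critical vuln hurts a lot
  score -= (r.twoPlusMajorsBehind / r.totalDeps) * 30; // up to 30 points for outdated deps
  score -= (r.abandoned / r.totalDeps) * 20;           // up to 20 points for abandoned deps
  return Math.max(0, Math.round(score));
}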
Metric 6: Code Hotspot Churn
What it measures: Files that are changed most frequently AND have high complexity. These are your "hotspots" where bugs are most likely to emerge.
Why it matters: Not all code needs the same quality investment. A file changed 40 times in 3 months with a cyclomatic complexity of 25 is where your next bug will come from. A file changed once in 6 months doesn't need your attention regardless of its complexity.
How to calculate:
# Get files by change frequency (last 90 days)
git log --since="90 days ago" --name-only --pretty=format: | \
grep -v '^$' | sort | uniq -c | sort -rn | head -20

Cross-reference the output with complexity scores from your static analysis tool (ESLint, SonarQube, etc.).
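And a sketch of that cross-reference in TypeScript, assuming you've parsed the git output into { file, changes } pairs and exported per-file complexity from your analysis tool. The thresholds are illustrative; calibrate them to your repo.

interface FileChurn { file: string; changes: number; }
interface FileComplexity { file: string; complexity: number; }

// A hotspot is a file that is both frequently changed and complex.
function findHotspots(
  churn: FileChurn[],
  complexity: FileComplexity[],
  minChanges = 10,
  minComplexity = 15
): Array<{ file: string; changes: number; complexity: number }> {
  const complexityByFile = new Map<string, number>();
  for (const c of complexity) complexityByFile.set(c.file, c.complexity);

  return churn
    .map((c) => ({ ...c, complexity: complexityByFile.get(c.file) ?? 0 }))
    .filter((c) => c.changes >= minChanges && c.complexity >= minComplexity)
    .sort((a, b) => b.changes * b.complexity - a.changes * a.complexity);
}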
Metric 7: Deployment Frequency
What it measures: How often the team deploys to production.
Target: At least daily for mature teams. At least weekly for growing teams.
Why it matters: Deployment frequency is the contrarian metric. Most people think it's a velocity metric, not a quality metric. I disagree. Teams that deploy frequently have smaller changesets, which are easier to review, easier to test, and easier to roll back. High deployment frequency forces quality practices because the cost of a failure is lower (small blast radius) and the feedback loop is tighter.
Building the Dashboard: Technical Implementation
I recommend building on top of existing tools rather than creating a custom dashboard from scratch. Here's the stack I use:
Data sources:
- GitHub API for PR metrics and code churn
- CI/CD tool API (GitHub Actions, CircleCI) for deployment and test suite metrics
- Dependency scanner (Trivy, Snyk) for dependency health
- Incident tracker (PagerDuty, Opsgenie) for change failure rate
Dashboard tool: Grafana. It's free, connects to everything, and your ops team probably already runs it. If you want something simpler, a Google Sheet with automated data import works surprisingly well for teams under 20.
Data pipeline:
// scripts/collect-quality-metrics.ts
interface QualityMetrics {
changeFailureRate: number;
prReviewP50: number;
prReviewP90: number;
escapedBugs: number;
testFirstPassRate: number;
testSuiteTime: number;
depHealthScore: number;
hotspotCount: number;
deployFrequency: number;
collectedAt: Date;
}
async function collectMetrics(): Promise<QualityMetrics> {
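// Fetch raw inputs in parallel. The fetch* helpers are assumed thin wrappers
// around your data sources (GitHub API, CI/CD API, incident tracker, dependency scanner).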
const [deployments, prs, incidents, testRuns, deps] = await Promise.all([
fetchDeployments("last_sprint"),
fetchPRs("last_sprint"),
fetchIncidents("last_sprint"),
fetchTestRuns("last_sprint"),
fetchDependencyReport(),
]);
return {
changeFailureRate: calcChangeFailureRate(deployments, incidents),
prReviewP50: calcPercentile(prs.map((p) => p.timeToFirstReview), 50),
prReviewP90: calcPercentile(prs.map((p) => p.timeToFirstReview), 90),
escapedBugs: incidents.filter((i) => i.type === "escaped_bug").length,
testFirstPassRate: testRuns.filter((t) => t.firstAttemptPass).length / testRuns.length,
testSuiteTime: calcMedian(testRuns.map((t) => t.duration)),
depHealthScore: deps.healthScore,
hotspotCount: await calcHotspots(),
deployFrequency: deployments.length,
collectedAt: new Date(),
};
}

Run this on a cron job (weekly is sufficient for most teams).
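Where the numbers land is up to you; anything your dashboard tool can read works. A minimal sketch that appends each run to a JSON-lines file from the same script (the file path and the main entrypoint are assumptions):

// (the import belongs at the top of scripts/collect-quality-metrics.ts)
import { appendFile } from "node:fs/promises";

// Append one JSON line per collection run; the dashboard (Grafana, a Sheets
// import script, etc.) reads this file and charts the trend over time.
async function persistMetrics(
  metrics: QualityMetrics,
  path = "./quality-metrics.jsonl"
): Promise<void> {
  await appendFile(path, JSON.stringify(metrics) + "\n");
}

async function main(): Promise<void> {
  const metrics = await collectMetrics();
  await persistMetrics(metrics);
}

main().catch((err) => {
  console.error("Metric collection failed:", err);
  process.exit(1);
});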
The Stealable Framework: Dashboard Adoption in 3 Weeks
Week 1: Build and Seed
- Implement the 7 metrics with data from the last 90 days
- Establish baselines ("here's where we are today")
- Don't set targets yet. Let the team see the data first
Week 2: Integrate Into Workflow
- Add a dashboard summary to the top of every sprint retrospective
- Post a weekly Slack digest (automated) with the 3 most changed metrics (a minimal webhook script is sketched after this list)
- Add a quality gate to PRs: if a PR will increase the hotspot count, flag it
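The Slack digest doesn't need a bot framework; a standard incoming webhook is enough. Here's a sketch that compares the two most recent runs in the JSON-lines file from the collection script and posts the three biggest movers. The SLACK_WEBHOOK_URL environment variable and the file path are assumptions.

import { readFile } from "node:fs/promises";

// Compare the two most recent collection runs and post the three metrics that
// moved the most (by relative change) to a Slack incoming webhook.
async function postWeeklyDigest(path = "./quality-metrics.jsonl"): Promise<void> {
  const lines = (await readFile(path, "utf8")).trim().split("\n");
  if (lines.length < 2) return; // need at least two runs to show a trend
  const [prev, curr] = lines.slice(-2).map((l) => JSON.parse(l));

  const movers = Object.keys(curr)
    .filter((k) => typeof curr[k] === "number" && typeof prev[k] === "number")
    .map((k) => ({
      metric: k,
      from: prev[k],
      to: curr[k],
      delta: prev[k] === 0 ? 0 : Math.abs((curr[k] - prev[k]) / prev[k]),
    }))
    .sort((a, b) => b.delta - a.delta)
    .slice(0, 3);

  const text = [
    "*Weekly code quality digest - biggest movers:*",
    ...movers.map((m) => `- ${m.metric}: ${m.from} -> ${m.to}`),
  ].join("\n");

  // Slack incoming webhooks accept a JSON body with a "text" field.
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}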
Week 3: Set Targets and Iterate
- As a team, agree on targets for each metric (use the benchmarks above as starting points)
- Assign each metric an "owner" who's responsible for investigating when it trends wrong
- Schedule a monthly 30-minute "dashboard review" to assess whether the metrics are still the right ones
The contrarian take on dashboards: the goal is NOT to make all metrics green. The goal is to make trends visible so the team can make informed trade-off decisions. Sometimes you'll consciously let escaped bugs tick up for a sprint because you're shipping a high-risk feature fast. That's fine, as long as it's a conscious decision visible on the dashboard, not a surprise discovered 3 weeks later.
A dashboard that changes behavior is worth building. A dashboard that nobody looks at is waste. Design for the first outcome by embedding the data into workflows your team already uses, keeping the metric count low, and making every number actionable.