How to Run a Codebase Health Check
Last quarter, a VP asked me a question I couldn't answer: "Is our codebase healthy?" I had opinions. I had gut feelings. I didn't have data. That moment forced me to build a repeatable process for measuring codebase health, and the results surprised everyone on the leadership team.
Most engineering orgs treat codebase health like a vibe check. Someone senior says "the code is fine" or "the code is a mess," and everyone nods. That's not engineering. That's folklore. I'm going to walk you through the exact process I now run every quarter, complete with the metrics, tools, and thresholds that actually matter.
Why Codebase Health Checks Matter
Here's the contrarian take: your codebase doesn't need to be "clean." It needs to be changeable. I've seen beautifully architected codebases where every feature takes 3 weeks to ship. I've seen messy codebases where the team delivers daily. The difference isn't code aesthetics. It's whether the codebase actively resists the changes your business needs.
A codebase health check measures changeability, not beauty. If you're optimizing for anything else, you're solving the wrong problem.
The VITAL Signs Framework
After running health checks across 9 codebases over two years, I've settled on 5 dimensions that predict whether a codebase will slow your team down. I call them VITAL signs.
V - Velocity of Change
How fast can you safely make changes? Measure this with:
- Lead time for changes: Time from first commit to production deploy. Healthy: under 24 hours. Concerning: over 1 week.
- Deploy frequency: How often you ship to production. Healthy: daily or more. Concerning: monthly or less.
- PR cycle time: Time from PR open to merge. Healthy: under 12 hours. Concerning: over 48 hours.
# Measure average PR cycle time (in hours) across the last 100 merged PRs
gh pr list --state merged --limit 100 --json createdAt,mergedAt \
--jq '[.[] | ((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600] | add / length'
If your lead time is over a week, the codebase is fighting you. It doesn't matter how elegant the architecture looks on a whiteboard.
I - Incident Correlation
Which parts of the codebase cause production incidents? This is the metric that gets executive attention.
// Track change failure rate per module
interface ModuleIncidentData {
  module: string;
  deploysInPeriod: number;
  incidentsCaused: number;
  changeFailureRate: number;
}
// Real numbers from a health check I ran:
const results: ModuleIncidentData[] = [
  { module: "auth/", deploysInPeriod: 34, incidentsCaused: 8, changeFailureRate: 0.235 },
  { module: "billing/", deploysInPeriod: 21, incidentsCaused: 6, changeFailureRate: 0.286 },
  { module: "notifications/", deploysInPeriod: 45, incidentsCaused: 2, changeFailureRate: 0.044 },
];
A module with a change failure rate above 15% is a health hazard. Above 25%, it's an emergency.
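To make those thresholds concrete, here's a minimal sketch that classifies each module against the 15% and 25% lines. It reuses the shape of the data above; the helper name is arbitrary.
// Classify each module against the 15% / 25% change failure rate thresholds
type HealthStatus = "healthy" | "hazard" | "emergency";
function classifyChangeFailureRate(rate: number): HealthStatus {
  if (rate > 0.25) return "emergency";
  if (rate > 0.15) return "hazard";
  return "healthy";
}
// results.map(m => `${m.module}: ${classifyChangeFailureRate(m.changeFailureRate)}`)
// -> ["auth/: hazard", "billing/: emergency", "notifications/: healthy"]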
T - Test Effectiveness
Notice I said effectiveness, not coverage. Coverage is a vanity metric. I've seen codebases with 90% coverage that break constantly because the tests verified implementation details instead of behavior.
Measure test effectiveness with:
- Defect escape rate: What percentage of bugs make it past your test suite into production? (A sketch for computing this follows the list.)
- Test-to-change ratio: When you change production code, how many test files need updating? If it's more than 1:1, your tests are coupled to implementation.
- Mutation testing score: Tools like Stryker can tell you what percentage of intentionally introduced bugs your tests actually catch. Aim for above 60%.
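Defect escape rate is simple arithmetic once you tag each bug by where it was caught. Here's a minimal sketch; the field names are illustrative, and the counts would come from your issue tracker.
// Defect escape rate: the share of all bugs that reached production
// instead of being caught by tests, CI, or review (field names are illustrative)
interface BugCounts {
  caughtBeforeRelease: number;
  foundInProduction: number;
}
function defectEscapeRate({ caughtBeforeRelease, foundInProduction }: BugCounts): number {
  const total = caughtBeforeRelease + foundInProduction;
  return total === 0 ? 0 : foundInProduction / total;
}
// defectEscapeRate({ caughtBeforeRelease: 46, foundInProduction: 9 }) ≈ 0.16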
# Run mutation testing to measure true test effectiveness
npx stryker run --reporters clear-text
# Look for the mutation score, not line coverage
A - Architectural Clarity
Can a new engineer understand where to put things? I measure this by tracking how often PRs get "wrong location" feedback in code review. If more than 10% of PRs get this feedback, your architecture isn't communicating intent.
Other signals:
- Circular dependencies: Use madge --circular src/ to detect them. Any circular dependency is a health issue.
- God modules: Any module with more than 50 direct dependents is a bottleneck. It'll show up in merge conflicts and slow reviews.
- Layering violations: Are your UI components importing from database modules? Map your intended layers and check for violations; a rough scan is sketched below.
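A layering check doesn't need special tooling to get started; a blunt scan of import statements gets you most of the way. Here's a rough sketch that assumes UI code lives under src/ui and the data layer under src/db; adjust the paths and import pattern to your repo.
// Flag files in the UI layer that import from the database layer
// (directory names and the import pattern are assumptions; adapt to your layout)
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";
function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    return statSync(full).isDirectory() ? walk(full) : [full];
  });
}
const violations = walk("src/ui")
  .filter((file) => /\.(ts|tsx|js|jsx)$/.test(file))
  .filter((file) => /from ['"].*\/db\//.test(readFileSync(file, "utf8")));
console.log(`Layering violations: ${violations.length}`);
violations.forEach((file) => console.log(`  ${file}`));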
L - Legacy Burden
How much of your codebase is effectively frozen because nobody understands it or dares to change it?
# Find files that haven't been modified in over a year but are still imported
git log --all --diff-filter=M --since="1 year ago" --name-only --pretty=format: | \
  grep -v '^$' | sort -u > recently_modified.txt
git ls-files | sort > all_files.txt
comm -23 all_files.txt recently_modified.txt > frozen_candidates.txt
# Cross-reference frozen_candidates.txt against files that are actually imported/used
In one health check, I found that 34% of the codebase hadn't been touched in 18 months but was still actively referenced. That's 34% of code that's effectively unmaintainable because nobody on the current team wrote it or understands it.
Running the Health Check: Step by Step
Step 1: Gather Automated Metrics (2-4 hours)
Pull data from your existing tools. You don't need anything new.
| Metric | Source | Command/Query |
|---|---|---|
| PR cycle time | GitHub/GitLab API | See script above |
| Deploy frequency | CI/CD platform | Count deploys per week |
| Change failure rate | Incident tracker + git | Correlate incidents to commits |
| Test mutation score | Stryker/pit | Run mutation testing |
| Circular deps | madge | madge --circular src/ |
| Code ownership gaps | git-fame or git-of-theseus | See below |
# Identify ownership gaps: list each file's most recent author,
# then compare that list against your current team roster
git ls-files src/ | while read -r f; do
  printf '%s\t%s\n' "$(git log -1 --format='%ae' -- "$f")" "$f"
done | sort
Step 2: Run Developer Surveys (1 day)
Automated metrics miss the human side. Ask your team 5 questions, rated 1-5:
- "I can confidently make changes to any part of the codebase" (measures knowledge distribution)
- "Our test suite catches bugs before production" (measures test confidence)
- "I know where to put new code" (measures architectural clarity)
- "I can understand code written by other team members" (measures readability)
- "Our development environment rarely blocks me" (measures tooling health)
Average score below 3 on any question is a red flag. Below 2 is a five-alarm fire.
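Scoring the survey is just averaging, but automating the thresholds keeps the red flags objective. A small sketch; the responses below are made up.
// Average each question's 1-5 responses and flag anything under the thresholds
type SurveyResults = Record<string, number[]>;
function flagSurvey(results: SurveyResults): void {
  for (const [question, responses] of Object.entries(results)) {
    const avg = responses.reduce((sum, r) => sum + r, 0) / responses.length;
    const flag = avg < 2 ? "five-alarm fire" : avg < 3 ? "red flag" : "ok";
    console.log(`${avg.toFixed(1)}  ${flag.padEnd(15)}  ${question}`);
  }
}
// Illustrative responses:
flagSurvey({
  "I can confidently make changes to any part of the codebase": [2, 1, 3, 2, 2],
  "Our test suite catches bugs before production": [4, 3, 4, 3, 4],
});
// -> 2.0  red flag ...  3.6  ok ...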
Step 3: Correlate and Prioritize (2 hours)
The magic happens when you cross-reference automated data with survey results. In my last health check:
- Automated metrics showed billing/ had a 28.6% change failure rate
- Developer surveys showed billing/ scored 1.8 on "confident making changes"
- Git analysis showed the original billing author left 14 months ago
That's not three separate problems. That's one problem with three symptoms: knowledge loss. The fix wasn't rewriting billing code. It was pairing sessions and documentation sprints.
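To make that cross-referencing systematic, join the two data sets per module and keep only the overlaps. Here's a sketch that assumes you also collect a per-module confidence score from the survey; the thresholds match the ones used earlier.
// Surface modules where the automated metrics AND the survey agree there's a problem
// (per-module confidence scores are an assumption; collect them however fits your survey)
interface ModuleSignals {
  module: string;
  changeFailureRate: number; // from incident correlation
  confidenceScore: number;   // 1-5, from the developer survey
}
function highConvictionProblems(signals: ModuleSignals[]): ModuleSignals[] {
  return signals.filter((s) => s.changeFailureRate > 0.15 && s.confidenceScore < 3);
}
// billing/ numbers are from this health check; the notifications/ row is illustrative
highConvictionProblems([
  { module: "billing/", changeFailureRate: 0.286, confidenceScore: 1.8 },
  { module: "notifications/", changeFailureRate: 0.044, confidenceScore: 3.9 },
]);
// -> only billing/ survives the filter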
Step 4: Build the Health Report
Present findings as a one-page scorecard:
CODEBASE HEALTH REPORT - Q2 2026
=================================
Overall Health Score: 62/100 (Needs Attention)
VITAL Signs:
V - Velocity of Change: 7/10 (Lead time: 18 hours)
I - Incident Correlation: 4/10 (2 modules above 20% CFR)
T - Test Effectiveness: 6/10 (Mutation score: 54%)
A - Architectural Clarity: 8/10 (Low circular deps)
L - Legacy Burden: 5/10 (28% frozen code)
Top 3 Action Items:
1. Pair programming rotation for billing/ module
2. Add mutation testing to CI for auth/ module
3. Schedule architecture decision records (ADR) sprint
The Stealable Framework: Quarterly VITAL Check
Here's the process you can copy directly:
- Week 1, Day 1: Run automated metric collection scripts (save them; they're reusable)
- Week 1, Days 2-3: Send developer survey (5 questions, anonymous, takes 3 minutes)
- Week 1, Day 4: Cross-reference data, build scorecard
- Week 1, Day 5: Present to engineering leadership with 3 prioritized action items
- Weeks 2-12: Execute on action items, track improvement
The key insight that took me too long to learn: don't try to fix everything. Pick the 3 items where automated metrics AND developer surveys agree there's a problem. Those are your highest-conviction bets.
What Good Looks Like
After four quarters of running VITAL checks, the team I worked with went from a health score of 48 to 79. Lead time dropped from 6 days to 14 hours. Change failure rate went from 22% to 7%. But the number I'm most proud of is the developer survey: "I can confidently make changes to any part of the codebase" went from 2.1 to 3.8.
That's what a healthy codebase feels like. Not perfect code. Confident engineers.
Your codebase is talking to you through metrics, incident reports, and the frustration on your team's faces. A health check is just learning to listen systematically. Start this quarter. You'll wonder why you waited.