Why Your Sprint Velocity Is Lying to You
I spent two years tracking sprint velocity religiously. We had Jira dashboards, burndown charts, velocity trend lines. Leadership loved it. "Velocity is up 15% quarter over quarter!" they'd announce at all-hands. Meanwhile, the engineers were miserable, customers were complaining about bugs, and our actual time-to-market for features had gotten worse. Velocity was going up, and we were going backwards.
Sprint velocity is one of the most dangerous metrics in software engineering. Not because it's useless, but because it's convincing. It looks like a productivity metric. It acts like a productivity metric. But it measures something completely different from what most teams think.
The Contrarian Take: Velocity Measures Estimation Consistency, Not Productivity
Here's what velocity actually tells you: how many story points your team completes per sprint. That's it. And since your team defines story points, velocity is a self-referential metric. It's like measuring your income in a currency you print yourself.
The fundamental problem: story points don't have a fixed unit of value. A "5-point story" on Team A might represent 3 days of work. On Team B, it's 1 day. On Team A next quarter (after point inflation), it's 2 days. You're tracking a number that means something different every time you measure it.
I tracked this across 8 teams. Here's what I found:
| Team | Velocity Trend (6 months) | Actual Output Trend | Customer-Facing Features/Month |
|---|---|---|---|
| Alpha | +22% | Flat | 3.2 -> 3.1 |
| Beta | +8% | -15% | 4.1 -> 2.8 |
| Gamma | -5% | +30% | 2.0 -> 3.7 |
| Delta | +35% | +10% | 1.8 -> 2.4 |
Team Gamma's velocity was declining while their actual output was improving by 30%. They'd started estimating more honestly. Team Beta's velocity was rising while they delivered less, because they'd inflated their point estimates to hit a velocity target their manager set.
The Four Ways Velocity Lies
Lie 1: Point Inflation
When velocity becomes a performance target, Goodhart's Law kicks in: the measure stops being a good measure. Engineers learn that estimating a task at 5 points instead of 3 makes the team "faster" without changing anything.
I've watched this happen in real time. A team that averaged 40 points per sprint was told to "increase velocity to 60." Three sprints later, they hit 62. Nothing had changed except the estimates. The same tasks that were 3 points became 5 points. Management celebrated.
Lie 2: Complexity Hiding
Velocity counts completed stories. It doesn't count stories abandoned mid-sprint, stories that got "simplified" (scope cut) to finish on time, or stories that were marked complete but left behind known bugs.
In one audit, I found that 28% of "completed" stories had open follow-up tickets filed within 2 weeks. The work wasn't done. It was just marked done so velocity numbers looked right.
Lie 3: Maintenance Invisibility
Bug fixes, dependency updates, test improvements, and refactoring don't get story points on most teams. So velocity only tracks feature work while ignoring the maintenance work that keeps the codebase shippable.
Teams that skip maintenance show higher velocity in the short term. Then they hit a cliff where everything takes 3x longer because the codebase has degraded. The velocity chart shows a hockey stick followed by a flatline.
Lie 4: The Averaging Problem
Velocity is reported as a team average. This hides everything interesting. A sprint where one engineer completed 30 points and another completed 0 (blocked by infrastructure issues) shows the same velocity as a sprint where everyone contributed evenly. The average hides the dysfunction.
What to Measure Instead
I've replaced velocity with four metrics that actually predict team performance. I call this the SHIP framework.
S - Speed (Cycle Time)
How long does a unit of work take from start to finish? Not from estimation to completion. From first commit to production deploy.
# Measure cycle time from PR creation to merge (proxy for cycle time)
gh pr list --state merged --limit 50 --json number,createdAt,mergedAt \
  --jq '.[] | {
    pr: .number,
    hours: ((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600
  }'
Cycle time is honest. It can't be inflated by changing estimates. It directly measures how fast work flows through your system. Target: median under 48 hours for a standard feature.
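To check that target, here is a minimal TypeScript sketch that computes the median from the same gh data. It assumes you saved the raw output (without the --jq filter) to a file named prs.json; the file name and script are illustrative, not part of the original setup.
// median-cycle-time.ts
// Assumes: gh pr list --state merged --limit 50 --json number,createdAt,mergedAt > prs.json
import { readFileSync } from "node:fs";

interface MergedPr {
  number: number;
  createdAt: string; // ISO 8601 timestamps from the GitHub API
  mergedAt: string;
}

const prs: MergedPr[] = JSON.parse(readFileSync("prs.json", "utf8"));

// Hours from PR creation to merge, sorted so we can take the middle value.
const hours = prs
  .map((pr) => (Date.parse(pr.mergedAt) - Date.parse(pr.createdAt)) / 3_600_000)
  .sort((a, b) => a - b);

const mid = Math.floor(hours.length / 2);
const median = hours.length % 2 === 0 ? (hours[mid - 1] + hours[mid]) / 2 : hours[mid];

console.log(`Median cycle time: ${median.toFixed(1)} hours (target: under 48)`);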
H - Hit Rate
What percentage of started work items get completed without scope changes, rollbacks, or follow-up bug fixes? This measures execution quality, not just quantity.
Track it simply: for every item completed this sprint, check back 2 weeks later. If a follow-up fix or scope-related ticket was filed, the original item doesn't count as a "clean hit."
Target: above 85% clean hit rate. Below 70% means you're shipping incomplete work.
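A minimal sketch of that check in TypeScript (the WorkItem shape and the 14-day window are assumptions for illustration, not a specific tracker's API):
// A completed item is a "clean hit" only if no follow-up fix or scope-related
// ticket was filed within 2 weeks of it being marked done.
interface WorkItem {
  id: string;
  completedAt: Date;
  followUpTicketsFiledAt: Date[]; // filing dates of any follow-up tickets
}

const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

function cleanHitRate(items: WorkItem[]): number {
  if (items.length === 0) return 1;
  const cleanHits = items.filter((item) =>
    item.followUpTicketsFiledAt.every(
      (filedAt) => filedAt.getTime() - item.completedAt.getTime() > TWO_WEEKS_MS
    )
  ).length;
  return cleanHits / items.length;
}

// Above 0.85: healthy. Below 0.70: incomplete work is shipping as "done."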
I - Idle Time
What percentage of time are engineers blocked and unable to make progress? Blocked by code review, blocked by CI, blocked by unclear requirements, blocked by environment issues.
// Track blocking events
interface BlockingEvent {
engineer: string;
startTime: Date;
endTime: Date;
reason: 'code_review' | 'ci_pipeline' | 'requirements' | 'environment' | 'dependencies';
durationHours: number;
}
// Aggregate: what percentage of available engineering hours are spent blocked?
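// A minimal sketch of that aggregation. The 40-hour work week used as the
// availability baseline is an assumption, not something defined above.
function idleTimePercent(
  events: BlockingEvent[],
  engineerCount: number,
  sprintWeeks: number
): number {
  const availableHours = engineerCount * sprintWeeks * 40; // assumed 40h/week
  const blockedHours = events.reduce((sum, e) => sum + e.durationHours, 0);
  return (blockedHours / availableHours) * 100;
}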
// Target: below 15%. Most teams are shocked to find it's above 30%.
P - Predictability
How accurately can you predict when work will be done? Not "how many points will we complete?" but "if we start this feature Monday, when will it ship?"
Measure prediction accuracy: for the last 20 completed items, how far off was the initial estimate from actual delivery?
Predictability Score = 1 - (average |actual - estimate| / average actual)
A score above 0.7 means you're predictable enough for business planning. Below 0.5 means your estimates are coin flips.
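As a sketch, the same score in TypeScript (the DeliveredItem shape is assumed; use whatever consistent unit you estimate in, such as days):
// Predictability score = 1 - (mean absolute error / mean actual duration).
interface DeliveredItem {
  estimatedDays: number; // initial estimate when work started
  actualDays: number; // actual elapsed time to delivery
}

function predictabilityScore(items: DeliveredItem[]): number {
  const meanAbsError =
    items.reduce((sum, i) => sum + Math.abs(i.actualDays - i.estimatedDays), 0) / items.length;
  const meanActual = items.reduce((sum, i) => sum + i.actualDays, 0) / items.length;
  return 1 - meanAbsError / meanActual;
}

// Above 0.7: predictable enough for planning. Below 0.5: estimates are coin flips.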
The Stealable Framework: Replacing Velocity in 4 Sprints
Sprint 1: Start tracking SHIP metrics alongside velocity. Don't remove velocity yet. Just add the new measurements. Use PR data for cycle time, manual tracking for hit rate and idle time.
Sprint 2: Present both sets of metrics to the team. Show where velocity and SHIP metrics diverge. This is where the "aha" moment happens. Teams see that a high-velocity sprint with low hit rate actually produced negative value (shipped bugs).
Sprint 3: Stop setting velocity targets. Replace with cycle time and hit rate targets. Tell the team: "I don't care how many points you complete. I care that work flows quickly (cycle time) and lands cleanly (hit rate)."
Sprint 4: Drop velocity from your dashboards. Report SHIP metrics to leadership. Frame it as: "Instead of measuring how busy we are, we're measuring how fast and how clean we ship."
The Hard Conversation
Dropping velocity requires a conversation with leadership that goes something like this: "We've been reporting a number that makes us look productive but doesn't correlate with customer value. Here's the data proving the disconnect. Here's what we're replacing it with, and here's why these new metrics better predict business outcomes."
Every VP I've had this conversation with has pushed back initially and agreed within one quarter. The SHIP metrics are harder to game, easier to act on, and more directly connected to what the business actually cares about: shipping quality software on a predictable schedule.
Stop measuring how many imaginary points your team completes. Start measuring how fast clean work gets to customers. The difference in what you optimize for will change everything.