Toil in Software Engineering: Finding and Eliminating It
Google's SRE team popularized the concept of toil: work that's manual, repetitive, automatable, and scales linearly with system size. They set a target of keeping toil below 50% of an SRE's time. Most software engineering teams I've worked with don't even measure their toil, and when they do, the number is usually horrifying.
I ran a toil audit on my team last quarter. The result: engineers spent 38% of their time on work that was repetitive, manual, and could be automated. That's 38% of salaries, 38% of energy, 38% of your team's finite capacity spent on tasks that a script could do. And unlike Google's SRE-focused definition, engineering toil extends far beyond operations into the daily development workflow.
The Contrarian Take: Most Engineering "Best Practices" Create Toil
Here's what nobody talks about: many of the processes teams adopt in the name of quality and rigor are actually toil generators. Manual QA checklists. Required approval from 3 reviewers. Hand-written changelog entries. Manually updated dependency graphs. Each one sounds responsible. Together, they create a death spiral where the process of building software takes longer than actually building it.
The question isn't "is this process valuable?" The question is "is this process valuable enough to justify its cost in human time, given that it could be automated?"
Identifying Toil: The Four Properties
Not all manual work is toil. Designing a system architecture is manual but not toil. Writing a one-off migration script is manual but not toil. True toil has four properties:
- Repetitive: You've done this task before, in essentially the same way
- Manual: A human has to execute the steps
- Automatable: A computer could do this with the right tooling
- Scaling: The work grows as your system or team grows
// A toil classifier
interface EngineeringTask {
  name: string;
  isRepetitive: boolean;     // Done more than once per month
  isManual: boolean;         // Requires human execution
  isAutomatable: boolean;    // Could be scripted/automated
  scalesWithGrowth: boolean; // More instances as system grows
}

function isToil(task: EngineeringTask): boolean {
  return (
    task.isRepetitive &&
    task.isManual &&
    task.isAutomatable &&
    task.scalesWithGrowth
  );
}
// Examples:
// Examples:
const tasks: EngineeringTask[] = [
  {
    name: "Update version numbers before release",
    isRepetitive: true,     // every release
    isManual: true,         // engineer edits files
    isAutomatable: true,    // semantic-release does this
    scalesWithGrowth: true, // more packages = more version updates
  }, // TOIL
  {
    name: "Design API for new feature",
    isRepetitive: false,    // each API is unique
    isManual: true,
    isAutomatable: false,   // requires human judgment
    scalesWithGrowth: false,
  }, // NOT TOIL
];

The Toil Catalog: Where Engineering Time Disappears
After auditing four teams, I've cataloged the most common sources of engineering toil. The percentages are each category's average share of total toil across those teams.
Category 1: Release Toil (28% of total toil)
- Manually updating version numbers
- Writing changelog entries by reading git log
- Running manual smoke tests before deploy
- Manually tagging releases and creating GitHub releases
- Coordinating release timing across teams
# Toil example: manually creating a changelog
# Engineer reads through all PRs since last release,
# writes summaries, categorizes changes, formats output
# Time: 30-90 minutes per release
# Automated alternative: conventional commits + auto-changelog
# npx conventional-changelog -p angular -i CHANGELOG.md -s
# Time: 0 minutes (runs in CI)

Category 2: Environment Toil (22% of total toil)
- Setting up local development environments
- Debugging "works on my machine" issues
- Manually seeding test databases
- Resetting environments after failed tests
- Managing local service dependencies
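Much of this toil comes down to state that was set up by hand and can't be reproduced. As a minimal sketch (the `SeedSpec` shape and `generateSeedRows` helper are hypothetical names, not a real library API), you can make test-database seeding deterministic by generating fixtures from a small spec, so every machine gets identical data:

```typescript
// Hypothetical sketch: deterministic seed data so "seed the test DB"
// becomes one command instead of a manual checklist.
interface SeedSpec {
  table: string;
  rows: number;
  columns: string[];
}

function generateSeedRows(spec: SeedSpec): Record<string, string>[] {
  // Values derive only from the spec, so every run on every machine
  // produces identical data -- no more hand-curated fixtures.
  return Array.from({ length: spec.rows }, (_, i) =>
    Object.fromEntries(spec.columns.map((col) => [col, `${col}_${i}`]))
  );
}

const rows = generateSeedRows({ table: "users", rows: 3, columns: ["id", "email"] });
// rows[0] is { id: "id_0", email: "email_0" }
```

In a real setup the generated rows would be inserted by a migration or seed script that runs in CI and on `npm run setup`, not pasted in by hand.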
Category 3: Code Review Toil (19% of total toil)
- Manually checking for style violations (should be linters)
- Verifying test coverage meets thresholds (should be CI)
- Checking for missing documentation (should be CI)
- Reviewing auto-generated code (migration files, schemas)
Category 4: Testing Toil (17% of total toil)
- Manually running integration tests locally
- Updating test fixtures after data model changes
- Rerunning tests due to flaky failures
- Manually testing UI flows that could be automated
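Rerunning flaky tests by hand is the clearest case: a retry wrapper does the rerun for you. This is a simplified synchronous sketch (a real version would wrap async test runs, and `retryFlaky` is an illustrative helper, not part of any test framework):

```typescript
// Hypothetical sketch: auto-retry flaky operations instead of a human
// clicking "re-run" in CI.
function retryFlaky<T>(fn: () => T, attempts: number = 3): T {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return fn(); // Success on any attempt ends the loop.
    } catch (err) {
      lastError = err; // Record the failure and try again.
    }
  }
  throw lastError; // Surface the error only after every attempt fails.
}

// Simulate a test that fails twice, then passes.
let calls = 0;
const result = retryFlaky(() => {
  calls++;
  if (calls < 3) throw new Error("flaky failure");
  return "pass";
});
```

Retries hide flakiness rather than fix it, so pair this with tracking: log every retried test so the flakiest ones get fixed, not just tolerated.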
Category 5: Communication Toil (14% of total toil)
- Status update meetings that could be async
- Writing deployment notifications manually
- Repeating the same onboarding explanations
- Answering the same "how do I...?" questions
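Deployment notifications are a good first target because the message is pure data you already have. A minimal sketch (the `Deploy` shape and `formatDeployNotification` helper are hypothetical, not a real integration):

```typescript
// Hypothetical sketch: build the deployment announcement from data the
// pipeline already has, instead of an engineer writing it in chat.
interface Deploy {
  service: string;
  version: string;
  commits: string[]; // commit subject lines since the last deploy
}

function formatDeployNotification(d: Deploy): string {
  const changes = d.commits.map((c) => `- ${c}`).join("\n");
  return `Deployed ${d.service} ${d.version}\nChanges:\n${changes}`;
}

const msg = formatDeployNotification({
  service: "billing-api",
  version: "v2.4.1",
  commits: ["fix: retry failed webhooks", "feat: add invoice export"],
});
// msg begins "Deployed billing-api v2.4.1"
```

Hook a function like this into the deploy pipeline and post its output to your chat tool's webhook, and one recurring manual message disappears entirely.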
The Toil Elimination Process
Step 1: Toil Tracking (1 week)
Have each engineer log toil for one week using a simple format:
interface ToilEntry {
  task: string;
  category: 'release' | 'environment' | 'review' | 'testing' | 'communication';
  timeMinutes: number;
  frequency: 'daily' | 'weekly' | 'per-release' | 'per-feature';
  automationDifficulty: 'easy' | 'medium' | 'hard';
}

Don't overthink this. A shared spreadsheet works fine. The goal is to make invisible work visible.
Step 2: Calculate the Toil Budget (1 day)
Aggregate the data and compute your team's toil percentage:
Total Toil Hours/Week = sum of all toil entries
Total Available Hours/Week = team size x 40
Toil Percentage = Total Toil Hours / Total Available Hours
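The formulas above can be computed straight from the Step 1 log. In this sketch the frequency-to-weekly multipliers are assumptions (one release and two features per week); substitute your team's actual cadence. The interface is a trimmed re-declaration of Step 1's ToilEntry:

```typescript
// Sketch of Step 2: aggregate logged toil into a team toil percentage.
interface ToilEntry {
  task: string;
  timeMinutes: number;
  frequency: 'daily' | 'weekly' | 'per-release' | 'per-feature';
}

// Assumed cadence: adjust these multipliers to your team's reality.
const WEEKLY_MULTIPLIER: Record<ToilEntry['frequency'], number> = {
  daily: 5,         // workdays per week
  weekly: 1,
  'per-release': 1, // assumption: one release per week
  'per-feature': 2, // assumption: two features shipped per week
};

function toilPercentage(entries: ToilEntry[], teamSize: number): number {
  const toilHoursPerWeek = entries.reduce(
    (sum, e) => sum + (e.timeMinutes / 60) * WEEKLY_MULTIPLIER[e.frequency],
    0
  );
  const availableHoursPerWeek = teamSize * 40;
  return toilHoursPerWeek / availableHoursPerWeek;
}

// Example: 5 engineers, two recurring toil tasks.
const pct = toilPercentage(
  [
    { task: 'reset staging env', timeMinutes: 30, frequency: 'daily' },
    { task: 'write changelog', timeMinutes: 60, frequency: 'per-release' },
  ],
  5
);
// (0.5h * 5 + 1h * 1) / 200h = 3.5 / 200 = 0.0175
```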
Set a target. Google targets below 50% for SRE. For product engineering teams, I target below 20%. If you're above 30%, toil elimination should be your top priority because it's consuming almost a third of your engineering investment.
Step 3: Rank by ROI (1 day)
For each toil item, calculate:
Annual Time Cost (hours) = frequency_per_year x time_per_occurrence x engineers_affected
Automation Cost (days) = estimated engineering days to build the automation
ROI = Annual Time Cost / Automation Cost (i.e., hours saved per engineering day invested)
| Toil Item | Annual Hours | Automation Days | ROI |
|---|---|---|---|
| Manual changelog | 78 | 2 | 39x |
| Dev environment setup | 160 | 10 | 16x |
| Rerunning flaky tests | 312 | 5 | 62x |
| Manual style review | 208 | 3 | 69x |
| Status update meetings | 520 | 1 (async tool) | 520x |
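The ranking step is a one-liner once the data is structured. This sketch reproduces the table above (the `ToilItem` shape is illustrative):

```typescript
// Sketch of Step 3: rank toil items by hours saved per engineering day invested.
interface ToilItem {
  name: string;
  annualHours: number;    // Annual Time Cost, in hours
  automationDays: number; // estimated days to build the automation
}

function rankByRoi(items: ToilItem[]): Array<ToilItem & { roi: number }> {
  return items
    .map((item) => ({ ...item, roi: item.annualHours / item.automationDays }))
    .sort((a, b) => b.roi - a.roi); // highest ROI first
}

// The rows from the table above:
const ranked = rankByRoi([
  { name: 'Manual changelog', annualHours: 78, automationDays: 2 },
  { name: 'Dev environment setup', annualHours: 160, automationDays: 10 },
  { name: 'Rerunning flaky tests', annualHours: 312, automationDays: 5 },
  { name: 'Manual style review', annualHours: 208, automationDays: 3 },
  { name: 'Status update meetings', annualHours: 520, automationDays: 1 },
]);
// ranked[0] is the async-status-tool item, at ROI 520
```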
Step 4: Automate Top 3 per Quarter
Take the top 3 items by ROI. Assign them to engineers as first-class project work, not side projects. Track completion and impact.
The most important cultural shift: toil elimination is not optional "when you have time" work. It's engineering work with measurable ROI that deserves sprint allocation.
The Stealable Framework: The TOIL Dashboard
Build a simple dashboard that tracks toil over time:
interface ToilDashboard {
  currentToilPercentage: number; // Target: below 20%
  toilTrend: 'increasing' | 'decreasing' | 'stable';
  topToilItems: Array<{
    name: string;
    hoursPerWeek: number;
    automationStatus: 'identified' | 'in-progress' | 'automated';
  }>;
  toilEliminatedThisQuarter: number; // Hours saved per week
  cumulativeSavings: number; // Total hours saved since tracking began
}

Review the dashboard monthly. Celebrate toil elimination the same way you celebrate feature launches. An automation that saves 5 hours per week is equivalent to hiring 12.5% of an engineer. That's worth celebrating.
The Compound Effect
Here's what makes toil elimination so powerful: the savings compound. When you automate a task that took 5 hours per week, you don't just save 5 hours this week. You save 5 hours every week, forever. Over a year, that's 260 hours. Over three years, 780 hours. And the engineer-hours freed up can be spent on more toil elimination, creating a virtuous cycle.
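That arithmetic is easy to sanity-check (52 working weeks per year is the simplifying assumption used above):

```typescript
// Weekly savings compound into large annual totals.
function cumulativeHoursSaved(hoursPerWeek: number, years: number): number {
  const WEEKS_PER_YEAR = 52; // simplifying assumption: no holidays deducted
  return hoursPerWeek * WEEKS_PER_YEAR * years;
}

const oneYear = cumulativeHoursSaved(5, 1);    // 260 hours
const threeYears = cumulativeHoursSaved(5, 3); // 780 hours
```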
I tracked this on my team. In Q1, we automated 8 hours per week of toil. In Q2, using some of those freed hours, we automated another 12 hours per week. By Q4, we'd reduced total toil from 38% to 14%. The team shipped 40% more features that year with the same headcount.
That's not magic. It's arithmetic. But you have to actually do the work of identifying, measuring, and eliminating toil. Most teams never start because toil feels like "just part of the job." It isn't. It's waste, and your team deserves better.