How GitHub Copilot Changes Your Code Review Process
Our team's first-pass PR approval rate dropped from 78% to 52% within two months of adopting GitHub Copilot. Not because the code was worse, but because our review process wasn't designed for AI-generated code.
After 10 months of adjusting, our first-pass approval rate is back to 74%. But the review process looks nothing like it did before. Here's exactly what changed.
What Copilot Does to Your PRs
Before Copilot, a typical PR on our team had 80-200 lines of changes. The developer wrote every line intentionally. Each change had a reason.
After Copilot, PRs ballooned to 200-500 lines. The extra lines came from three sources:
- Copilot completions that were accepted but unnecessary. Auto-complete suggests adding error handling, logging, or utility functions that look useful but aren't needed for the current task.
- Boilerplate that Copilot generates beautifully but shouldn't exist. Copilot will generate an entire CRUD API with full error handling in 30 seconds. Impressive, but if you only needed the "read" endpoint, the rest is dead code.
- Inconsistent patterns from different Copilot sessions. Copilot generates slightly different code each time, depending on what files you have open. Morning PRs use one pattern, afternoon PRs use another (illustrated below).
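To make that third point concrete, here is a hypothetical illustration (the function names and endpoint are invented, not taken from our codebase): the same "fetch a user" task completed in two Copilot sessions, each suggesting a different HTTP client and error-handling style. Neither version is wrong on its own; side by side, they fragment the codebase.

```typescript
// Hypothetical example: two PRs, same codebase, same task.

// Morning PR: Copilot suggests axios plus try/catch with a silent fallback.
import axios from "axios";

export async function getUserMorningVersion(id: string) {
  try {
    const res = await axios.get(`/api/users/${id}`);
    return res.data;
  } catch (err) {
    console.error("Failed to fetch user", err);
    return null;
  }
}

// Afternoon PR: Copilot suggests native fetch and a thrown error.
export async function getUserAfternoonVersion(id: string) {
  const res = await fetch(`/api/users/${id}`);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```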
The Five Review Process Changes
Change 1: PR Size Limits Got Strict
We implemented a hard limit: 300 lines per PR. No exceptions.
Before Copilot, this limit felt unnecessary because developers naturally kept PRs manageable. After Copilot, developers would accept a stream of completions and end up with 600+ line PRs without realizing it.
The 300-line limit forces developers to be selective about which Copilot suggestions they keep. That selectiveness is the first quality filter.
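It also helps to enforce the limit in CI rather than relying on reviewers to notice. As a sketch of one way to do that — assuming a Danger setup, which is an illustration rather than our exact tooling — a dangerfile can fail any PR over the threshold:

```typescript
// dangerfile.ts — a minimal sketch of a 300-line PR size gate (illustrative, not our exact check)
import { danger, fail, warn } from "danger";

const MAX_LINES = 300;
const changedLines = danger.github.pr.additions + danger.github.pr.deletions;

if (changedLines > MAX_LINES) {
  fail(
    `This PR changes ${changedLines} lines (limit: ${MAX_LINES}). ` +
      "Split it up, or drop Copilot suggestions that aren't needed for the stated goal."
  );
} else if (changedLines > MAX_LINES * 0.8) {
  warn(`This PR is at ${changedLines} lines — close to the ${MAX_LINES}-line limit.`);
}
```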
Change 2: We Added "Copilot Context" to PR Descriptions
Every PR now includes a section explaining which parts were Copilot-generated and which were hand-written. Not to shame anyone, but to help reviewers allocate their attention correctly.
```markdown
## AI Context
- **Copilot-generated:** API route handler, Zod schema validation
- **Hand-written:** Business logic in calculateShipping(), migration file
- **Modified from Copilot:** Error handling (changed from try/catch to Result pattern)
```

This single change cut our review time by 20%. Reviewers knew exactly where to look closely and where to skim.
Change 3: We Changed What Reviewers Look For
Old review focus: "Is this code correct?" New review focus: "Does this code fit?"
The distinction matters. Copilot code is usually correct in isolation. It does what it says. But it often doesn't fit the codebase. It uses different patterns, imports different libraries, or structures things differently than the rest of the project.
Our reviewer guide now emphasizes:
| Old Focus | New Focus |
|---|---|
| Syntax errors | Pattern consistency |
| Logic bugs | Architectural fit |
| Missing features | Unnecessary code |
| Test presence | Test quality |
| Documentation | Context comments |
Change 4: We Introduced the "Delete First" Rule
When reviewing Copilot-heavy PRs, the first question is always: "What can we remove?"
Copilot generates generous code. It adds comprehensive error handling to functions that are only called from trusted internal code. It generates utility functions "just in case." It creates types for objects that are only used once.
Our reviewers now have explicit permission to comment "remove this" on any code that isn't directly needed for the PR's stated goal. Before Copilot, suggesting code deletion could feel aggressive. Now it's expected.
Real example from our codebase:
```typescript
// Copilot generated a complete validation utility
// for a field that's only set by our own internal service
function validateWebhookPayload(payload: unknown): WebhookPayload {
  if (!payload || typeof payload !== "object") {
    throw new ValidationError("Invalid payload");
  }
  if (!("event" in payload) || typeof payload.event !== "string") {
    throw new ValidationError("Missing event field");
  }
  if (!("data" in payload) || typeof payload.data !== "object") {
    throw new ValidationError("Missing data field");
  }
  // ... 20 more lines of validation
  return payload as WebhookPayload;
}

// What we actually needed (internal service, trusted input):
const payload = body as WebhookPayload;
```

This happens constantly. Copilot doesn't know that the webhook comes from your own service, so it generates full validation. The reviewer needs to know the context to make the call.
Change 5: We Added Automated Pattern Checks
We created a set of custom ESLint rules that catch the most common Copilot inconsistencies:
```js
// .eslintrc.js (partial)
module.exports = {
  rules: {
    // Catch Copilot's tendency to use different HTTP clients
    "no-restricted-imports": ["error", {
      patterns: [{
        group: ["axios", "node-fetch", "got"],
        message: "Use our internal apiClient from @/lib/api",
      }],
    }],
    // Catch Copilot's tendency to use try/catch instead of our Result pattern
    "no-restricted-syntax": ["error", {
      selector: "TryStatement",
      message: "Use Result<T,E> pattern. See ARCHITECTURE.md#error-handling",
    }],
  },
};
```

These rules catch about 40% of pattern violations before the PR reaches a human reviewer. That's 40% less work for the reviewer and 40% less back-and-forth.
The Copilot Review Decision Tree
Use this to decide how deeply to review each file in a Copilot-assisted PR (a code sketch of the same triage follows the list):
1. Is the file listed in the "AI Context" section as hand-written?
   - YES → Standard review (trust the developer's judgment)
   - NO → Continue to step 2
2. Does the file touch auth, payments, or data mutations?
   - YES → Deep review (line by line, check every edge case)
   - NO → Continue to step 3
3. Is the file a test file?
   - YES → Check test quality (behavior vs. implementation)
   - NO → Continue to step 4
4. Is the file UI/presentation only?
   - YES → Light review (visual check, accessibility basics)
   - NO → Standard review with pattern check
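The same triage can be written down as code if you want to wire it into a review bot or a checklist template. This is only a sketch: the FileInfo flags are assumptions about metadata you would pull from the PR's "AI Context" section and file paths, not an API we actually ship.

```typescript
// A sketch of the decision tree above as code. The FileInfo flags are
// assumptions — in practice they come from the PR description and file paths.
type ReviewDepth = "standard" | "deep" | "test-quality" | "light" | "standard+patterns";

interface FileInfo {
  handWritten: boolean;      // listed as hand-written in the AI Context section
  touchesSensitive: boolean; // auth, payments, or data mutations
  isTest: boolean;
  isPresentationOnly: boolean;
}

function reviewDepth(file: FileInfo): ReviewDepth {
  if (file.handWritten) return "standard";      // step 1: trust the developer's judgment
  if (file.touchesSensitive) return "deep";     // step 2: line by line, every edge case
  if (file.isTest) return "test-quality";       // step 3: behavior vs. implementation
  if (file.isPresentationOnly) return "light";  // step 4: visual check, accessibility basics
  return "standard+patterns";                   // default: standard review with pattern check
}
```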
Metrics: Before and After
After 10 months with the new process, here are our numbers:
| Metric | Before Copilot | Copilot (old process) | Copilot (new process) |
|---|---|---|---|
| First-pass approval | 78% | 52% | 74% |
| Avg review time | 22 min | 38 min | 25 min |
| Post-merge bugs | 3.2/month | 7.1/month | 3.8/month |
| PR size (avg lines) | 145 | 387 | 198 |
| Pattern violations | 2.1/PR | 6.8/PR | 2.4/PR |
The new process gets us close to pre-Copilot quality levels while keeping the productivity benefits of AI-assisted coding.
What I Got Wrong
I initially blamed Copilot for the quality drop. "The AI is generating bad code" was my diagnosis. I was wrong. Copilot generates reasonable code. The problem was that our review process assumed every line of code was intentionally written by a human who understood the system.
Once I stopped blaming the tool and started adapting the process, things improved fast. The tool isn't the problem. The process gap is the problem.
If your team just adopted Copilot and your quality metrics are dropping, don't turn off Copilot. Update your review process. Start with the five changes above, measure for a month, and adjust from there.