
Building an AI Code Quality Gate for Your CI/CD Pipeline

Vaibhav Verma
12 min read
Tags: ai, ci-cd, code-quality, automation, devops, github-actions, pipeline

Most teams add AI coding assistants and change nothing about their CI/CD pipeline. That's like installing a turbocharger and keeping the original brakes. You're going faster with the same stopping power.

I've built AI-specific quality gates for 8 teams in the last year. The teams that implemented automated AI code quality checks caught 73% more issues before production than teams relying on code review alone. Here's exactly how to build one.

Why Standard CI/CD Checks Aren't Enough

Your existing pipeline probably runs linting, type checking, and tests. Good. But those checks were designed for human-written code. They miss the specific failure patterns of AI-generated code.

Here's what I mean. AI code consistently passes standard checks while hiding these problems:

  1. Pattern drift - The code works but doesn't match your architecture
  2. Dependency inflation - New packages slip in without review
  3. Test tautology - Tests that verify what the code does, not what it should do (example below)
  4. Security defaults - Missing auth checks, overly permissive CORS, unsanitized inputs
  5. Copy-paste remnants - AI-generated code with references to example domains, placeholder values, or TODO comments buried in the logic

Standard ESLint won't catch most of these. You need purpose-built gates.
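
Number 3 deserves a concrete illustration. Here's a hypothetical Jest-style example (applyDiscount is made up for the sake of the example): the tautological test re-derives its expected value from the same logic as the implementation, so it can never disagree with it, while the spec-driven test encodes the business rule and catches the bug.

typescript
// Hypothetical example of test tautology. applyDiscount is illustrative only.
function applyDiscount(price: number, percent: number): number {
  return price - price * (percent / 100);
}

// Tautology: the expected value is computed the same way as the implementation,
// so this test passes no matter what the function does.
it("applies a discount", () => {
  expect(applyDiscount(100, 150)).toBe(100 - 100 * (150 / 100));
});

// Spec-driven: encodes the business rule "never discount below zero",
// which this implementation violates (it returns -50).
it("never discounts below zero", () => {
  expect(applyDiscount(100, 150)).toBeGreaterThanOrEqual(0);
});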

The Architecture: A 4-Stage Quality Gate

I call this the SCAN pipeline. Each stage catches a different class of AI code issues, and the stages run fastest-first so obvious problems surface before the expensive checks even start.

┌─────────────┐   ┌──────────────┐   ┌──────────────┐   ┌─────────────┐
│  S: Static  │──▶│  C: Context  │──▶│  A: Analysis │──▶│  N: Notify  │
│   Checks    │   │   Matching   │   │   Deep Scan  │   │  & Report   │
│  (~30 sec)  │   │  (~2 min)    │   │  (~5 min)    │   │  (~10 sec)  │
└─────────────┘   └──────────────┘   └──────────────┘   └─────────────┘

Stage 1: Static Checks (The Fast Gate)

These run in under 30 seconds and catch the obvious stuff.

yaml
# .github/workflows/ai-quality-gate.yml
name: AI Code Quality Gate

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  static-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check for new dependencies
        id: dep-check
        run: |
          DIFF=$(git diff origin/main -- package.json)
          if echo "$DIFF" | grep -q '"dependencies"\|"devDependencies"'; then
            NEW_DEPS=$(echo "$DIFF" | grep "^+" | grep -v "^+++" | grep '"' || true)
            echo "::warning::New dependencies detected. Review required."
            echo "$NEW_DEPS"
            echo "new_deps=true" >> "$GITHUB_OUTPUT"
          fi

      - name: Scan for placeholder values
        run: |
          if grep -rn "example\.com\|TODO\|FIXME\|CHANGEME\|your-.*-here\|xxx\|placeholder" \
               --include="*.ts" --include="*.tsx" --include="*.js" src/; then
            echo "::error::Placeholder values found in source code"
            exit 1
          fi

      - name: Check for console.log in production code
        run: |
          LOGS=$(grep -rn "console\.log" --include="*.ts" --include="*.tsx" src/ \
            | grep -v "test\|spec\|__test__\|mock" | wc -l)
          if [ "$LOGS" -gt 0 ]; then
            echo "::warning::Found $LOGS console.log statements in production code"
          fi

Stage 2: Context Matching (The Architecture Gate)

This is where AI-specific quality gets serious. You're checking whether the generated code matches your codebase's patterns.

typescript
// scripts/ai-quality/context-check.ts
import { Project, SyntaxKind } from "ts-morph";

export interface PatternViolation {
  file: string;
  line: number;
  rule: string;
  message: string;
  severity: "error" | "warning";
}

export function checkPatternConformance(
  changedFiles: string[]
): PatternViolation[] {
  const project = new Project({
    tsConfigFilePath: "./tsconfig.json",
  });
  const violations: PatternViolation[] = [];

  for (const filePath of changedFiles) {
    const sourceFile = project.getSourceFile(filePath);
    if (!sourceFile) continue;

    // Rule 1: No class-based services (we use functions)
    const classes = sourceFile.getClasses();
    for (const cls of classes) {
      if (cls.getName()?.endsWith("Service") ||
          cls.getName()?.endsWith("Repository")) {
        violations.push({
          file: filePath,
          line: cls.getStartLineNumber(),
          rule: "no-class-services",
          message: `Class "${cls.getName()}" uses class pattern. ` +
                   `This codebase uses functional pattern for services.`,
          severity: "error",
        });
      }
    }

    // Rule 2: Error handling must use Result pattern
    const tryCatches = sourceFile.getDescendantsOfKind(
      SyntaxKind.TryStatement
    );
    for (const tc of tryCatches) {
      const catchBlock = tc.getCatchClause();
      if (catchBlock) {
        const body = catchBlock.getBlock().getText();
        if (body.includes("console.error") && !body.includes("Result")) {
          violations.push({
            file: filePath,
            line: tc.getStartLineNumber(),
            rule: "use-result-pattern",
            message: "Try/catch swallows error. Use Result<T, E> pattern.",
            severity: "error",
          });
        }
      }
    }

    // Rule 3: No direct fetch calls (use our httpClient)
    const callExpressions = sourceFile.getDescendantsOfKind(
      SyntaxKind.CallExpression
    );
    for (const call of callExpressions) {
      const text = call.getExpression().getText();
      if (text === "fetch" || text === "axios.get" ||
          text === "axios.post") {
        violations.push({
          file: filePath,
          line: call.getStartLineNumber(),
          rule: "use-http-client",
          message: "Direct fetch/axios call. Use httpClient from @/lib/http.",
          severity: "error",
        });
      }
    }
  }

  return violations;
}

The key insight here is that you encode your architecture decisions as automated checks. When AI generates code that uses fetch instead of your internal HTTP client, the gate catches it before a human reviewer has to.
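
For this to run in CI, the checker needs a small entry point that reads the changed files and fails the build when it finds errors. Here's a minimal sketch of what that can look like, assuming the changed file paths arrive as CLI arguments (the file name is illustrative; point the "Run Pattern Check" step at whichever file holds it):

typescript
// scripts/ai-quality/run-context-check.ts (illustrative name)
// Minimal CI entry point: read changed files from argv, emit GitHub
// annotations for each violation, and fail the step on any error.
import { checkPatternConformance } from "./context-check";

const changedFiles = process.argv.slice(2).filter((f) => /\.tsx?$/.test(f));
const violations = checkPatternConformance(changedFiles);

for (const v of violations) {
  // ::error / ::warning workflow commands render inline on the PR diff
  const kind = v.severity === "error" ? "error" : "warning";
  console.log(`::${kind} file=${v.file},line=${v.line}::[${v.rule}] ${v.message}`);
}

if (violations.some((v) => v.severity === "error")) {
  process.exit(1);
}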

Stage 3: Deep Analysis (The Intelligence Gate)

This stage uses AI to review AI. Yes, it sounds circular. But it works because you're giving the reviewing AI your codebase context.

typescript
// scripts/ai-quality/deep-analysis.ts
import { readFileSync } from "fs";

interface AnalysisConfig {
  architectureDoc: string;
  changedFiles: string[];
  existingPatterns: string[];
}

export async function runDeepAnalysis(config: AnalysisConfig) {
  const architectureContext = readFileSync(
    config.architectureDoc,
    "utf-8"
  );

  const prompt = `You are a code reviewer for our codebase.
Our architecture rules:
${architectureContext}

Review these changes for:
1. Pattern violations not caught by static analysis
2. Security issues (auth, input validation, data exposure)
3. Performance concerns (N+1 queries, missing indexes)
4. Test coverage gaps (untested edge cases)

For each issue, provide:
- File and line number
- Severity (critical/warning/info)
- Specific fix recommendation

Changed files:
${config.changedFiles.map(f =>
  `--- ${f} ---\n${readFileSync(f, "utf-8")}`
).join("\n\n")}

Respond in JSON format only.`;

  // Call your preferred AI API here
  const response = await callAIReview(prompt);
  return JSON.parse(response);
}

I know some people bristle at using AI to review AI. But here's the thing: the reviewing AI has your architecture document as context. The generating AI didn't. That context difference makes the review valuable.
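
callAIReview itself is just a thin wrapper around whichever model API you prefer. Here's a rough sketch against OpenAI's chat completions endpoint, for completeness; the model name is a placeholder, and AI_API_KEY is the env var the workflow below provides.

typescript
// Sketch of callAIReview against OpenAI's chat completions REST API.
// Any provider works; the model name here is a placeholder.
async function callAIReview(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.AI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      response_format: { type: "json_object" }, // ask for parseable JSON back
    }),
  });

  if (!res.ok) {
    throw new Error(`AI review request failed: ${res.status}`);
  }

  const data = await res.json();
  return data.choices[0].message.content;
}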

Stage 4: Notification and Reporting

The final stage formats the results and posts them to your PR.

typescript
// scripts/ai-quality/report.ts
import { PatternViolation } from "./context-check";

// One deep-analysis finding (severity levels match the Stage 3 prompt)
interface AnalysisResult {
  severity: "critical" | "warning" | "info";
  message: string;
}

interface QualityReport {
  staticChecks: { passed: boolean; warnings: number; errors: number };
  patternViolations: PatternViolation[];
  deepAnalysis: AnalysisResult[];
  overallScore: number;
  recommendation: "approve" | "review" | "block";
}

export function generateReport(report: QualityReport): string {
  const status = report.recommendation === "approve" ? "PASS" :
                 report.recommendation === "review" ? "WARN" : "FAIL";

  let markdown = `## AI Code Quality Report: ${status}\n\n`;
  markdown += `**Overall Score:** ${report.overallScore}/100\n`;
  markdown += `**Recommendation:** ${report.recommendation}\n\n`;

  if (report.patternViolations.length > 0) {
    markdown += `### Pattern Violations\n`;
    for (const v of report.patternViolations) {
      markdown += `- **${v.file}:${v.line}** - ${v.message}\n`;
    }
  }

  if (report.deepAnalysis.length > 0) {
    markdown += `### Deep Analysis Findings\n`;
    for (const finding of report.deepAnalysis) {
      markdown += `- [${finding.severity}] ${finding.message}\n`;
    }
  }

  return markdown;
}
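
One piece of glue remains: the workflow in the next section reads ai-quality-report.json from disk, so something has to compute the score and write that file. A minimal sketch that sits alongside generateReport, with a naive scoring formula you should tune to your own weighting:

typescript
// Sketch: score the findings, pick a recommendation, and write the JSON file
// that the "Post Report" workflow step reads. The weights are illustrative.
import { writeFileSync } from "fs";

export function finalizeReport(
  partial: Omit<QualityReport, "overallScore" | "recommendation">
): void {
  const errors =
    partial.patternViolations.filter((v) => v.severity === "error").length +
    partial.deepAnalysis.filter((f) => f.severity === "critical").length;
  const warnings =
    partial.staticChecks.warnings +
    partial.patternViolations.filter((v) => v.severity === "warning").length;

  // Naive scoring: start at 100, subtract 15 per error and 5 per warning
  const overallScore = Math.max(0, 100 - errors * 15 - warnings * 5);
  const recommendation =
    errors > 0 ? "block" : warnings > 0 ? "review" : "approve";

  const report: QualityReport = { ...partial, overallScore, recommendation };
  writeFileSync(
    "ai-quality-report.json",
    JSON.stringify({ ...report, markdown: generateReport(report) }, null, 2)
  );
}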

Real-World Configuration

Here's the complete GitHub Actions workflow that ties it all together:

yaml
name: AI Code Quality Gate
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  ai-quality:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: "20"

      - run: npm ci

      - name: Get changed files
        id: changed
        run: |
          FILES=$(git diff --name-only origin/main -- '*.ts' '*.tsx' | tr '\n' ' ')
          echo "files=$FILES" >> "$GITHUB_OUTPUT"

      - name: Run Static Checks
        run: npx tsx scripts/ai-quality/static-checks.ts

      # The pattern and deep-analysis scripts take the changed files as CLI arguments
      - name: Run Pattern Check
        run: npx tsx scripts/ai-quality/context-check.ts ${{ steps.changed.outputs.files }}

      - name: Run Deep Analysis
        if: github.event.pull_request.draft == false
        run: npx tsx scripts/ai-quality/deep-analysis.ts ${{ steps.changed.outputs.files }}
        env:
          AI_API_KEY: ${{ secrets.AI_REVIEW_KEY }}

      - name: Post Report
        uses: actions/github-script@v7
        with:
          script: |
            const report = require('./ai-quality-report.json');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: report.markdown
            });

The Numbers After 6 Months

I implemented this exact pipeline for a 14-person engineering team. Here's what changed:

Metric                              Before Quality Gate    After Quality Gate
Bugs reaching production            8.2/month              2.4/month
Pattern violations per PR           4.7                    0.8
New unapproved dependencies/month   6                      0.5
PR review time (human)              38 min avg             22 min avg
Pipeline run time                   4 min                  9 min

The pipeline adds 5 minutes to CI. In exchange, production bugs dropped 71% and human review time dropped 42%. Reviewers spend less time because the gate catches the mechanical stuff, letting humans focus on logic and design.

Common Pitfalls

Don't make it blocking on day one. Start in warning mode. Let the team see the reports for 2 weeks before you start failing builds. This builds trust in the system and lets you tune false positives.
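
One cheap way to implement warning mode is to gate the failing exit code behind an environment variable, then flip it in the workflow once the team trusts the signal. A sketch, assuming an AI_GATE_MODE variable of your own choosing:

typescript
// Sketch: fail the CI step only when AI_GATE_MODE=block; otherwise just warn.
// AI_GATE_MODE is an assumed convention, set via `env:` in the workflow.
export function enforceGate(errorCount: number): void {
  if (errorCount === 0) return;

  if (process.env.AI_GATE_MODE === "block") {
    process.exit(1); // hard-fail the build
  }
  console.log(`::warning::${errorCount} findings would block this PR once the gate is enforcing`);
}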

Don't skip the architecture document. The context matching stage is only as good as your documented patterns. If you don't have an ARCHITECTURE.md, write one. It takes 2 hours and pays for itself in the first week.

Don't ignore the false positive rate. If more than 15% of findings are false positives, engineers will start ignoring the reports. Tune aggressively for precision over recall.

Don't forget to update the rules. Your architecture evolves. The quality gate rules need to evolve with it. I schedule a monthly 30-minute review of the gate configuration.

Getting Started This Week

You don't need to build all 4 stages at once. Here's my recommended order:

  1. Week 1: Implement Stage 1 (static checks). Takes 2 hours.
  2. Week 2: Add Stage 4 (reporting to PRs). Takes 1 hour.
  3. Week 3-4: Build Stage 2 (pattern matching). Takes 4-6 hours.
  4. Month 2: Add Stage 3 (deep analysis). Takes a day.

Start with the dependency check and placeholder scan. Those two checks alone will catch 40% of the issues I see in AI-generated PRs. Build from there.

The teams that treat AI code quality as a CI/CD problem, not a review problem, are the ones shipping fast without accumulating debt. Your pipeline is the only thing that looks at every line of every PR. Make it smarter.
