The 62% Problem — Security Flaws in AI-Generated Code

The Cloud Security Alliance found security flaws in 62% of the AI-generated code samples it tested. This post covers which types of flaws appear, why AI generates them, and the review and tooling practices that catch them before they reach production.

In early 2026, the Cloud Security Alliance published results from a large-scale analysis of AI-generated code. Across models, languages, and task types, 62% of code samples contained at least one security flaw.

That number gets cited a lot. What gets cited less often is the breakdown: what kinds of flaws, in what contexts, and what the implications actually are for teams using AI-assisted development.

pie title Security Flaw Distribution in AI-Generated Code
    "Injection vulnerabilities" : 18
    "Hardcoded credentials" : 15
    "Insecure dependencies" : 12
    "Insufficient input validation" : 10
    "Error handling leaks" : 7
    "Other" : 18

What the 62% Actually Contains

The flaw distribution in the CSA analysis wasn’t uniform. The headline number covered a range from critical to minor:

Injection vulnerabilities (SQL, command, LDAP): ~18% of samples. AI models trained on tutorial code learn patterns like string-formatted SQL queries — code that works, that’s everywhere in training data, and that’s insecure.
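
To make the string-formatting pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table and the function names are hypothetical, chosen only for illustration; the same contrast applies to any driver that supports bound parameters.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Anti-pattern: user input is formatted into the SQL text itself.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterised: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table, just enough to show the difference.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the payload rewrites the WHERE clause
    print(find_user_safe(conn, payload))    # the payload is treated as a literal name
```

With the payload above, the unsafe version matches every row in the table, while the parameterised version matches none.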

Hardcoded credentials and secrets: ~15%. Models generate working examples with placeholder values that look like real values. Developers copy the pattern and sometimes the values.

Insecure dependencies and imports: ~12%. AI suggests packages without checking versions; older versions of common packages carry known CVEs.

Insufficient input validation: ~10%. Models generate the happy path. Edge cases — malformed input, extreme values, encoding issues — are frequently absent.
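
As a rough illustration of what "beyond the happy path" can look like, the sketch below validates a single query parameter at the entry point. The parameter, the `MAX_PAGE_SIZE` limit, and the function name are assumptions made for this example, not anything taken from the CSA analysis.

```python
MAX_PAGE_SIZE = 100  # hypothetical upper bound for this example


def parse_page_size(raw):
    """Validate a 'page size' query parameter before it is used anywhere."""
    if not isinstance(raw, str):
        raise ValueError("page size must be a string")
    text = raw.strip()
    # isascii() rejects non-ASCII digits that str.isdigit() would accept.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("page size must be a positive integer")
    if len(text) > 4:
        # Cheap guard against absurdly long numeric strings.
        raise ValueError("page size is too large")
    value = int(text)
    if not 1 <= value <= MAX_PAGE_SIZE:
        raise ValueError(f"page size must be between 1 and {MAX_PAGE_SIZE}")
    return value
```

A happy-path version would be a bare `int(raw)`, which accepts negative numbers and values far beyond any sensible limit.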

Improper error handling that leaks information: ~7%. Stack traces returned to API callers. Error messages that reveal internal paths or database structure.
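
One common way to avoid this kind of leak is to log the full exception internally and hand the caller only a generic message plus a correlation ID. The sketch below is framework-agnostic; `handle_request`, `safe_error_response`, and the response shape are hypothetical names for illustration.

```python
import logging
import uuid

logger = logging.getLogger("api")


def safe_error_response(exc):
    """Log full details internally; return only a generic body to the caller."""
    error_id = uuid.uuid4().hex
    # exc_info attaches the traceback to the internal log record only.
    logger.error("request failed (error_id=%s)", error_id, exc_info=exc)
    return 500, {"error": "Internal server error", "error_id": error_id}


def handle_request(handler, *args):
    try:
        return 200, handler(*args)
    except Exception as exc:  # broad on purpose: this is the outermost boundary
        return safe_error_response(exc)
```

The `error_id` lets support staff find the matching log entry without the response revealing paths, queries, or stack frames.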

The distribution matters because it tells you where to focus review effort.


Why AI Generates Insecure Code

The training data problem is structural: the internet has a lot of insecure code. Stack Overflow answers optimise for getting something working, not for security hardening. Tutorial sites show the simple case. Internal codebases with injection vulnerabilities contributed to the training corpus.

The model learned to predict what comes next in code — and in a lot of training examples, what came next was insecure.

There’s also a task framing problem: when you ask an AI to “add a search endpoint,” it generates a search endpoint. The security requirements of that endpoint aren’t in the prompt, so they often aren’t in the output.


The Review Pipeline That Actually Catches This

SAST in the CI pipeline, not just on PR merge:

# .github/workflows/security-scan.yml
name: Security Scan

on: [push, pull_request]

jobs:
  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/security-audit
            p/secrets
            p/owasp-top-ten
          generateSarif: "1"
      
      - name: Upload SARIF
        # Run even when the Semgrep step fails, so findings still get uploaded
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif

SAST that runs on every push, not just on PRs to main, flags issues before the code ever reaches review.

Secret scanning at commit time:

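# Install the tooling
pip install pre-commit detect-secrets

# Create the baseline the hook compares against (run once, then commit the file)
detect-secrets scan > .secrets.baseline

# Register the git hook so it runs on every commit
pre-commit install

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']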

Secrets caught before commit are never in git history. Secrets caught at PR review are in history even if the PR is rejected.

Dependency vulnerability scanning:

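# Python: safety or pip-audit
pip-audit --require-hashes -r requirements.txt

# Node: npm audit
npm audit --audit-level=moderate

# In CI: pip-audit exits non-zero on any known vulnerability (it does not filter
# by severity); --strict also fails the run if a dependency cannot be audited
pip-audit --require-hashes --strict -r requirements.txt

# In CI: fail Node builds only on high/critical advisories
npm audit --audit-level=high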

The AI-Specific Code Review Checklist

When reviewing AI-generated code, go beyond what you’d check for human-written code:

Injection risks:

  • Database queries use parameterised statements, not string formatting
  • Shell commands don’t include user-controlled input
  • Template rendering uses context-aware escaping
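
For the shell-command item, one approach is to pass arguments as a list and avoid the shell entirely when user input cannot be avoided. The sketch below assumes a hypothetical `run_with_arg` helper and uses the current Python interpreter as a stand-in for whatever program would actually be called.

```python
import subprocess
import sys


def run_with_arg(user_value):
    # Argument list and no shell: user_value arrives as a single argv entry,
    # so characters like ';' or '$()' are never interpreted by a shell.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    print(run_with_arg("hello; echo injected"))
```

This only addresses shell metacharacters. A value beginning with `-` can still be read as an option by the target program, so an allow-list or a `--` separator may be needed depending on the command.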

Input validation:

  • Function validates input types and ranges at the entry point
  • File paths are validated and restricted to expected directories
  • Deserialized data is validated against a schema before use
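
The file-path item can be sketched as a containment check with pathlib. The helper name `resolve_inside` is hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Return the resolved path if it stays inside base_dir, else raise."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError("path escapes the allowed directory")
    return candidate
```

Joining an absolute `user_path` onto `base` discards `base`, which is why the check runs on the resolved result rather than on the raw input.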

Credential handling:

  • No hardcoded credentials, API keys, or passwords
  • Secrets loaded from environment or secret management (not config files committed to repo)
  • Connection strings don’t contain credentials
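
A minimal sketch of loading a secret from the environment and failing fast when it is missing. `require_secret`, `MissingSecretError`, and the `DB_PASSWORD` variable name are assumptions for illustration; a secret manager client would slot in the same way.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def require_secret(name, environ=os.environ):
    """Read a secret from the environment and fail fast if it is absent."""
    value = environ.get(name, "")
    if not value:
        # Report the variable name only; never echo a secret value.
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

Calling `require_secret("DB_PASSWORD")` at startup surfaces a missing configuration immediately, instead of letting a placeholder value from generated code slip through.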

Error handling:

  • Exception handlers log details internally and don’t expose them to external callers
  • HTTP error responses don’t include stack traces or internal paths
  • Sensitive data isn’t logged

Dependencies:

  • New imports are necessary (AI sometimes adds unused imports from examples)
  • Versions are pinned and scanned against CVE databases
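
The pinning check can be automated with a rough heuristic like the one below, which flags requirements.txt lines that lack an exact `==` pin. This is a simplified sketch: the `unpinned_requirements` name is hypothetical, and the regex does not cover every form pip accepts (URLs, environment markers, wildcard versions).

```python
import re

# name, optional [extras], then an exact '==' version
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned with '=='."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    sample = "requests==2.31.0\nflask>=2.0\nnumpy\n"
    print(unpinned_requirements(sample))
```

A check like this complements a vulnerability scanner rather than replacing it: pinning makes the scan results reproducible, and the scanner reports whether the pinned versions are affected.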

The Practical Stance

The 62% finding doesn’t mean AI-generated code is net negative for security. Human developers introduce security flaws too — the base rate for human code in production is non-trivial.

What it means is that AI-generated code needs the same review discipline you’d apply to junior developer output: systematic, not trusting, focused on the high-risk patterns.

The teams I’ve seen handle this well have SAST running on every commit (not just nightly), require security review for any AI-generated code touching authentication, external APIs, or data persistence, and run secret scanning as a pre-commit hook that blocks commits rather than warns about them.

The teams that struggle have the tools configured but not enforced — scanning running in advisory mode, security checks that developers can bypass with a flag.


Day 19 of the Production Agentic AI series. Previous: Tool Orchestration at Scale

This post is licensed under CC BY 4.0 by the author.