AI coding tools write functional applications quickly. They also scaffold insecure backends with the same confidence, the same clean code, and zero indication that anything is wrong. Forty-five percent of AI-generated code fails security tests — that is the finding from Veracode's 2025 GenAI Code Security Report, which tested more than 100 language models. The number has not improved as models have gotten larger. The problem is not careless developers. The problem is that AI has a fundamentally different definition of "done" — one that has nothing to do with what an attacker would try.

What "Done" Means to an AI, and Why That Definition Is the Problem

When an AI coding assistant finishes scaffolding a backend, it has passed one test: does the code run? The happy path works. The UI renders. The data saves. The endpoint returns a 200. By the AI's criterion, the job is complete.

An adversary applies a different test: what happens when I send malformed input? What happens when I authenticate as one user and request another user's data? What happens when I call the admin endpoint directly? What happens when I extract the credentials embedded in your JavaScript?

These two definitions of "done" — the developer's and the attacker's — are only compatible if the person building the software thinks to apply the second test after the first one passes. AI does not apply the second test. Today's mainstream tools have no model of an adversary. Security is not part of their success criteria unless you make it part of the prompt, and even then, results are inconsistent.

This is not a criticism of any specific tool. It is a description of how AI coding systems work by design. These tools are trained to produce code that functions. They are not trained to produce code that resists deliberate abuse. That gap — between functional and secure — is where the problem lives.

Functional (left) vs. secure (right) — AI optimizes for the first definition. Security requires a different test.

What I Built, and What I Found

Earlier this year I built an image generation tool: a Node.js backend, a PostgreSQL-based managed database service, a frontend for user uploads and prompt input. Standard setup. The kind of thing an AI-assisted workflow scaffolds in a few hours.

The AI built the entire database layer — tables, queries, API endpoints, authentication flow. It worked. Every feature functioned exactly as intended. Users could sign up, submit prompts, generate images, retrieve their history. Clean code, sensible naming, readable structure.

What the AI did not do:

Row-level security was not enabled. The database was configured so that any authenticated user could query any row in any table. User A could read User B's generation history, stored prompts, and any other data by constructing the right request. The AI created the tables correctly. It simply did not add the access policies that restrict what each authenticated user can see and modify.

The service key — the admin-level credential that bypasses all access controls — was in the frontend code. The managed database service issues two types of credentials: an anonymous key (safe for client-side use, respects access policies) and a service key (server-only, bypasses all policies entirely). The AI used the service key throughout, because the service key is the one that works immediately without access policies in place. The key ended up in the client bundle, where any user could extract it.

There was no server-side input validation. The frontend validated user inputs. The API endpoints did not. Any request sent directly to the API — bypassing the frontend entirely — would be processed without any checks.

None of this was hidden. I was not tricked. The code was exactly what it appeared to be. It was architecturally insecure in ways that only become visible when you think like someone trying to break it, not like someone who built it.

The Four Patterns That Appear in Almost Every AI-Scaffolded Backend

Wiz Research analyzed thousands of applications built with AI coding tools in 2025 and identified four primary failure categories. They match exactly what I found, and they appear consistently across different tools, different developers, and different project types.

1. Missing or misconfigured access policies

Row-level security — the mechanism that restricts which database rows each user can access — is opt-in in most managed PostgreSQL services. AI scaffolds tables and queries that work without it. Adding it requires anticipating unauthorized access patterns, which is outside the AI's default scope.

A security audit of 1,645 applications built with a popular AI coding platform in 2025 found that 170 of them exposed user data including names, emails, financial records, and API keys. The root cause in every case: row-level security was absent or incorrectly configured. The platform had introduced its own security scanning feature months earlier. That scanner flagged whether policies existed — not whether they were correctly written. False confidence compounded the original problem.
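For concreteness, here is roughly what "correctly written" means. This is a sketch, not code from the audit or from my project: the table and column names, the use of the `pg` client, and the Supabase-style `auth.uid()` helper are assumptions standing in for whatever your managed service actually provides.

```typescript
// Minimal sketch: enable row-level security and add a per-user policy,
// run as a one-off migration with the "pg" client.
// Table (generations), column (user_id), and auth.uid() are illustrative assumptions.
import { Client } from "pg";

export async function enforceRowLevelSecurity(connectionString: string): Promise<void> {
  const db = new Client({ connectionString });
  await db.connect();
  try {
    // Without this statement, any role that can reach the table can read every row.
    await db.query("ALTER TABLE generations ENABLE ROW LEVEL SECURITY;");

    // Restrict both reads and writes to rows owned by the requesting user.
    await db.query(`
      CREATE POLICY generations_owner_only ON generations
        FOR ALL
        USING (user_id = auth.uid())
        WITH CHECK (user_id = auth.uid());
    `);
  } finally {
    await db.end();
  }
}
```

The `WITH CHECK` clause matters as much as `USING`: it is what stops a user from inserting or updating rows that claim someone else's `user_id`.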

2. Service keys in client-side code

The distinction between a publishable key (the anonymous, frontend-safe credential that respects access rules) and a service key (server-only, bypasses all access rules) is a concept AI understands when asked and ignores when writing code. The service key is the one that works immediately. It ends up in the frontend because it makes the happy path functional.

In early 2026, a social application built with AI assistance was found to have exposed 1.5 million API tokens and 35,000 email addresses. The service key — admin-level database access — was present in client-side JavaScript. Anyone who loaded the application could extract it and query the entire database.
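The correct shape is not subtle once you see it: the publishable key is the only credential that ever reaches the browser, and the service key stays behind a server boundary. A minimal sketch, assuming a Supabase-style client library; the URLs, key names, and environment variables are placeholders, not details from the incident:

```typescript
// Sketch of the intended split, assuming a Supabase-style client library.
// URLs, keys, and variable names here are illustrative placeholders.
import { createClient } from "@supabase/supabase-js";

// --- client.ts (ships to the browser) ---------------------------------------
// Only the publishable/anon key goes here; every query it makes is still
// filtered by row-level security policies on the server.
const PUBLIC_DB_URL = "https://project-ref.example.co";
const PUBLIC_ANON_KEY = "anon-key-injected-at-build-time";
export const publicDb = createClient(PUBLIC_DB_URL, PUBLIC_ANON_KEY);

// --- server.ts (never imported by frontend code) -----------------------------
// The service key bypasses all access policies, so it is read from a
// server-side environment variable and never appears in any built bundle.
export const adminDb = createClient(
  process.env.DB_URL!,
  process.env.DB_SERVICE_ROLE_KEY!, // leaking this leaks the entire database
);
```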

3. No authentication on endpoints

If the prompt says "build an endpoint that updates invoice status," the AI builds the endpoint. It does not enforce who can call it. Apiiro tracked a 10x increase in APIs missing authorization across AI-generated codebases between December 2024 and June 2025. Missing endpoint authentication is now the dominant category of new security findings in AI-generated code.
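Enforcing this at the server layer is a small amount of code. A sketch using Express and JWT verification; the route, the `JWT_SECRET` environment variable, and the token's `sub` claim carrying the user id are illustrative assumptions:

```typescript
// Sketch of server-side auth enforcement with Express and JWT verification.
import express, { Request, Response, NextFunction } from "express";
import jwt, { JwtPayload } from "jsonwebtoken";

const app = express();
app.use(express.json());

function requireAuth(req: Request, res: Response, next: NextFunction): void {
  const header = req.headers.authorization ?? "";
  const token = header.startsWith("Bearer ") ? header.slice(7) : null;
  if (!token) {
    res.status(401).json({ error: "missing credentials" });
    return;
  }
  try {
    const payload = jwt.verify(token, process.env.JWT_SECRET!) as JwtPayload;
    // Downstream handlers use the verified identity, never a user id from the request body.
    res.locals.userId = payload.sub;
    next();
  } catch {
    res.status(401).json({ error: "invalid credentials" });
  }
}

// The scaffolded version typically registers the handler without requireAuth.
app.post("/invoices/:id/status", requireAuth, (req, res) => {
  // ...update only invoices that res.locals.userId is allowed to touch...
  res.sendStatus(204);
});

app.listen(3000);
```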

4. Frontend-only input validation

React form validation, client-side checks, UI constraints — AI generates these reliably. What it does not generate without explicit instruction is the same validation at the API layer. Any attacker calling the API directly bypasses every check in the frontend. The application appears secure when used normally. It is not.
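What the missing layer looks like, as a sketch using zod on an Express endpoint; the schema fields and limits are illustrative:

```typescript
// Validate at the API layer, independent of anything the frontend checks.
import express from "express";
import { z } from "zod";

const app = express();
app.use(express.json({ limit: "100kb" })); // reject oversized bodies early

const GenerateRequest = z.object({
  prompt: z.string().min(1).max(2000),
  width: z.number().int().min(64).max(2048),
  height: z.number().int().min(64).max(2048),
});

app.post("/api/generate", (req, res) => {
  const parsed = GenerateRequest.safeParse(req.body);
  if (!parsed.success) {
    // A request that skips the UI entirely is still rejected here.
    res.status(400).json({ error: parsed.error.flatten() });
    return;
  }
  // ...proceed using parsed.data, never raw req.body...
  res.sendStatus(202);
});

app.listen(3000);
```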

Why AI's Confident Output Suppresses Your Instinct to Check

Here is the counterintuitive finding from a 2022 Stanford study, peer-reviewed and with no subsequent study finding the opposite: developers who used AI coding assistants produced more security vulnerabilities than developers who wrote code without them — and were more confident their code was secure.

The mechanism matters. AI generates clean, well-commented, readable code. It uses consistent naming. It structures files sensibly. It passes the visual inspection test that a developer's eye applies before moving on. Nothing about the output signals that something might be wrong. There is no comment next to the missing access policy. There is no note that the service key should not leave the server.

The confident presentation of AI output is not incidental to the security problem. It is part of it. The code looks finished because the AI's criterion for finished is met. The developer's pattern-matching — searching for code that appears incomplete or suspicious — finds nothing. So the review is skipped, or abbreviated, or deferred indefinitely.

This is why the Stanford finding holds even for experienced developers. It is not a knowledge gap. It is a trust calibration problem. AI communicates certainty through the texture of its output. Developers read that texture. The instinct to check gets suppressed by the apparent completeness of what has been delivered.

Confident, complete-looking output reduces the instinct to check what might be missing.

Why the Review Step Disappears

Code review exists because writing code and reviewing code require different cognitive modes. Writing optimizes for making something work. Reviewing optimizes for finding what could go wrong. These are not compatible perspectives — you cannot hold both simultaneously with any reliability.

In traditional development, the gap between writing and reviewing is structural. Different people, different time, different context. The review step is enforced by the workflow.

In AI-assisted development, that gap collapses. You describe what you want. The AI builds it. It works. The next prompt is already forming. The workflow is conversational — iterative, fast, momentum-driven. Review is not a separate phase. It feels like stopping to distrust a collaborator who just delivered exactly what you asked for.

This is not a discipline problem that better developers avoid. It is a workflow problem the medium creates. The faster and more conversational the loop, the more review is experienced as interruption rather than process.

Apiiro's data shows a 153% increase in architectural design flaws in AI-generated codebases in 2025. These are not bugs — small errors that could appear anywhere. They are structural decisions about where credentials live, who can read which data, and what the API trusts. Decisions made by the AI and not reviewed by a human.

What the Numbers Say

The research on AI-generated code security is now substantial, and it points the same way across multiple independent sources — Veracode's 2025 GenAI Code Security Report, Wiz Research, Apiiro, Georgetown CSET, and independent vendor scans. The figures below are drawn from these reports, not a single canonical study; the direction and magnitude align across all of them.

Veracode's 2025 report tested more than 100 language models against known security scenarios. Forty-five percent of AI-generated code contains security flaws. AI-generated code is 2.74 times more likely to introduce cross-site scripting vulnerabilities and 1.91 times more likely to introduce insecure object references than human-written code. The failure rate for XSS specifically — across all models tested — is 86%. Georgetown CSET's November 2024 policy brief reached the same 86% XSS failure figure independently.

Apiiro tracked a 10x increase in new security findings per month from AI-generated codebases between December 2024 and June 2025. Privilege escalation paths increased 322%. Architectural design flaws increased 153%.

A scan of approximately 5,600 applications built with AI coding tools in 2025 found more than 2,000 vulnerabilities and over 400 exposed secrets. Broken access control — the OWASP A01:2025 category — saw a 40% surge in reported incidents in the same year.

IT Pro's 2025 analysis found AI-generated code is the cause of one in five enterprise breaches. Sixty-nine percent of security leaders reported finding serious vulnerabilities in AI-generated code.

These numbers do not describe an edge case population of negligent developers. They describe the average output of the current generation of AI coding tools, across a broad range of use cases and developer experience levels.

Who Is Building Software Now

For most of software's history, the population of people capable of deploying a functioning web application included, by definition, people with enough technical background to know that a service key should not be in client-side JavaScript. Building the application required knowing that.

AI coding tools have changed the entry point. Non-developers — founders, marketers, product managers, designers — are now building and deploying production applications that handle real user data. This is a genuine capability shift with real value. It is also a structural security shift with real consequences.

The same 5,600-application scan found 400+ exposed secrets across apps built by this broader population. Most of these builders did not knowingly skip a security step. They did not make a calculated risk decision. They had no mental model that included "what happens when someone queries another user's data" as a question worth asking before deploying.

This is not a solvable problem through better documentation or more security warnings. It is a structural shift in who builds software, at a time when the tools that enable that shift do not communicate their security limitations. The volume of internet-connected software with architectural security flaws is growing — not because more careless people are writing code, but because a new population of builders is creating software with tools that complete the work confidently and say nothing about what they left out.

What to Check Before Shipping Any AI-Scaffolded Backend

Four questions cover the highest-risk patterns consistently found in AI-generated code:

Is row-level security enabled on every table that stores user data? Not just present — correctly configured. A policy that exists but is written incorrectly offers no protection. Test it: create two test users, write data with one, attempt to read it with the other using a direct database query.
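The probe can be a short script rather than a formal test suite. A sketch assuming a Supabase-style client and two pre-seeded test accounts; the table, column, and account details are placeholders:

```typescript
// Two-user RLS probe, assuming a Supabase-style client and seeded test accounts.
import { createClient } from "@supabase/supabase-js";

export async function rlsIsEnforced(url: string, anonKey: string): Promise<boolean> {
  const userA = createClient(url, anonKey);
  const userB = createClient(url, anonKey);

  const { data: a } = await userA.auth.signInWithPassword({
    email: "a@test.dev",
    password: "test-password-a",
  });
  await userB.auth.signInWithPassword({
    email: "b@test.dev",
    password: "test-password-b",
  });

  // User A writes a row it owns...
  const { data: inserted } = await userA
    .from("generations")
    .insert({ user_id: a!.user!.id, prompt: "rls probe" })
    .select()
    .single();

  // ...and user B queries for it directly, bypassing any frontend filtering.
  const { data: visibleToB } = await userB
    .from("generations")
    .select("*")
    .eq("id", inserted!.id);

  // With correct policies this list is empty; any rows here mean the policy is broken.
  return (visibleToB ?? []).length === 0;
}
```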

Where are your credentials, and is the admin-level key anywhere in client-side code? Open your built frontend files and search for any key that appears in your database service dashboard under "service" or "secret" credentials. If it is there, it is public.
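A few lines of Node can do that search across every built file; the `dist` directory and the environment variable name are assumptions:

```typescript
// Recursively scan the built frontend output for a known secret value.
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

function filesContaining(dir: string, secret: string): string[] {
  const hits: string[] = [];
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      hits.push(...filesContaining(path, secret));
    } else if (readFileSync(path, "utf8").includes(secret)) {
      hits.push(path);
    }
  }
  return hits;
}

const secret = process.env.DB_SERVICE_ROLE_KEY;
if (!secret) throw new Error("set DB_SERVICE_ROLE_KEY to the key you are checking for");

// Any hit means the admin credential ships to every visitor's browser.
console.log(filesContaining("dist", secret));
```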

Does every API endpoint enforce authentication at the server layer — not just in the frontend? Call your endpoints directly with a tool like Postman or curl, without going through your UI, and without valid authentication. What do they return?
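The same check as a script, with placeholder host and paths:

```typescript
// Probe endpoints directly: no UI, no credentials. Host and paths are placeholders.
const base = "https://your-app.example.com";
const endpoints = ["/api/generate", "/api/history", "/api/invoices/1/status"];

for (const path of endpoints) {
  const res = await fetch(base + path, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ probe: true }),
  });
  // Anything other than 401 or 403 on an unauthenticated call needs an explanation.
  console.log(res.status, path);
}
```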

Is there server-side input validation on every endpoint that accepts user input? Frontend validation is a UX concern. Server-side validation is a security concern. They are not interchangeable. The API should reject malformed or oversized input regardless of what the UI allows.

These are not comprehensive. They are the four patterns that appear most consistently in AI-generated code that reaches production without a security review.

The Actual Problem

Not speed. Not laziness. Not a generation of developers who should have known better.

The actual problem: AI defines "done" as functional. Security requires a different definition of done — one that includes what happens when the system is used by someone trying to break it. AI does not apply that definition. Today's mainstream tools have no model of an adversary. And their confident, clean, well-structured output actively suppresses the developer's instinct to look for what is missing.

The review step is not optional. The AI's presentation of the work as finished is not evidence that the work is finished.

That distinction — between functional and secure — is still the developer's responsibility. AI has not changed that. It has just made the gap considerably harder to see.