Stop checking AI-generated code. Start generating less of it

May 28, 2026


According to Sonar’s State of Code Developer Survey report for 2026, based on a survey of over 1,100 developers, 42% of committed code is now AI-assisted, and roughly 29% of it gets merged without manual review. Not “light review.” No review at all.

The industry’s response has been predictable: more guardrails. Static analysis. Token linting. Visual regression testing. Accessibility audits. Security scans. Each tool is a reasonable reaction to a real failure mode. Taken together, though, they describe something uncomfortable: a system permanently compensating for its own unreliability. The AI generates. The tooling checks. The developers arbitrate. And the whole apparatus scales linearly with the volume of code being produced.

That is the wrong scaling curve for any enterprise that plans to build more than a handful of applications.

The conventional framing — “How do we build better guardrails for AI-generated code?” — is not wrong. In my opinion, it is just incomplete. The more productive question should be, “How do we reduce the amount of code that needs guardrails in the first place?”

That question leads us to a fundamentally different architecture, one that thoughtfully applies AI on an escalating curve from zero to partial to full code generation. One I call the AI assembly model.

First, let’s take a deeper look at how things work today.

The generate-then-check treadmill

When a generative AI tool produces a UI component from scratch — a data table, a form, a navigation bar — the output is probabilistic. It might be correct. It might also carry a missing authentication check, a hardcoded color value that bypasses the design system, broken accessibility markup, or a state management pattern that collapses under concurrent load. You will not know until you inspect it. And inspection, at enterprise scale, is expensive.

So, the industry layers on post-generation validation. A static analyzer catches potential injection vectors. A linter flags design token drift. A visual regression suite compares the rendered component against a baseline. An accessibility scanner checks ARIA roles and contrast ratios. A DAST tool probes the running application for OWASP Top 10 vulnerabilities. Each of these tools addresses a genuine risk. None of them prevents the risk from occurring. They detect it after the fact.

This is a reactive posture, and it has a structural cost problem. Every new application built on a generate-first model requires the full battery of checks to run again. Every component generated from a prompt is a fresh surface for every category of defect. Double the number of apps, and you double the audit burden. Triple them, and you triple it. There is no compounding advantage. Each generation event starts from zero.

For a team shipping one experimental chatbot, that cost is manageable. For an enterprise program building dozens of internal applications across regulated business lines, it becomes the dominant line item in the development life cycle: not compute costs, but developer hours spent diagnosing wrong output, QA cycles spent catching regressions, and production incidents when defects slip through.

What if most code was never generated at all?

The AI assembly model starts from a different premise. The most reliable code is code that was never generated on demand.

Instead of prompting a large language model (LLM) to write a component from scratch every time, the assembly model maps developer intent — whether expressed through a natural-language prompt, a visual canvas interaction, or a Figma import — to a pre-built, tested, certified component from an enterprise library. The AI’s job is not to write the component. It is to select the right component and configure it.

This is a meaningful architectural distinction, not a marketing one. The assembly model operates across three tiers of generation, each with a different risk profile.

Zero generation: component mapping. Developer intent is matched against the component library. If a certified component exists that satisfies the requirement, it is selected directly. No code generation fires at all. The component arrives with its security posture, accessibility compliance, visual consistency, and cross-platform fidelity already verified. The consuming application inherits all of it.
Minimal generation: configuration and binding. The AI configures the selected component: setting properties, wiring data connectors, binding navigation paths, attaching authentication context. This is schema-bounded work. The configuration space is enumerable and verifiable. An AI misconfiguring a property against a typed schema is a detectable, correctable error — categorically different from an AI inventing a flawed implementation from whole cloth.
Targeted generation: filling genuine gaps. Custom business logic, novel integrations, components that genuinely have no library equivalent — these are generated. This is where AI code generation adds real value, and it is also the only tier where full guardrail checks are necessary. The critical difference is scope. Instead of validating everything, you validate only what was actually generated.
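To make "schema-bounded" concrete, here is a minimal sketch of what checking an AI-proposed configuration against a typed schema might look like. The component, its property names, and the `PropSpec` and `validate_config` helpers are all hypothetical illustrations, not part of any particular platform.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class PropSpec:
    """One configurable property: its type, allowed values, and whether it is required."""
    type_: type
    allowed: Optional[frozenset] = None
    required: bool = False


# Hypothetical schema for a certified "data table" component.
DATA_TABLE_SCHEMA: Dict[str, PropSpec] = {
    "data_source": PropSpec(str, required=True),
    "page_size": PropSpec(int, allowed=frozenset({10, 25, 50})),
    "density": PropSpec(str, allowed=frozenset({"compact", "comfortable"})),
}


def validate_config(schema: Dict[str, PropSpec], config: Dict[str, Any]) -> List[str]:
    """Return a list of errors; an empty list means the configuration is valid."""
    errors = []
    for name, value in config.items():
        spec = schema.get(name)
        if spec is None:
            errors.append(f"unknown property: {name}")
        elif not isinstance(value, spec.type_):
            errors.append(
                f"{name}: expected {spec.type_.__name__}, got {type(value).__name__}"
            )
        elif spec.allowed is not None and value not in spec.allowed:
            errors.append(f"{name}: {value!r} is not an allowed value")
    for name, spec in schema.items():
        if spec.required and name not in config:
            errors.append(f"missing required property: {name}")
    return errors


# A configuration proposed by the AI is either accepted or rejected with reasons.
good = validate_config(DATA_TABLE_SCHEMA, {"data_source": "orders", "page_size": 25})
bad = validate_config(DATA_TABLE_SCHEMA, {"page_size": "25", "colour": "red"})
```

Because every property is enumerated, a wrong configuration fails with a specific, machine-readable reason instead of surfacing later as a defect in generated implementation code.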

The guardrail, in this model, is not a check that fires after generation. It is the routing rule that sends developer intent to a pre-built artifact instead of a generative model. If the library has the answer, generation never starts. When it does start, it is scoped precisely to the gap that triggered it.
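The routing rule described above can be sketched in a few lines. This is an illustration under stated assumptions: the `LIBRARY` index, the component identifiers, and the `route` function are hypothetical, and a plain dictionary lookup stands in for whatever intent matching a real system would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Tier(Enum):
    ZERO = "component mapping"
    MINIMAL = "configuration and binding"
    TARGETED = "targeted generation"


@dataclass
class Plan:
    tier: Tier
    component: Optional[str]      # certified component id, if one was matched
    config: Dict[str, object]
    needs_full_guardrails: bool   # True only when code is actually generated


# Hypothetical library index: normalized intent -> certified component id.
LIBRARY = {"data table": "lib.DataTable", "login form": "lib.LoginForm"}


def route(intent: str, config: Optional[Dict[str, object]] = None) -> Plan:
    """Send intent to a certified component when one exists; generate only for gaps."""
    config = config or {}
    component = LIBRARY.get(intent.strip().lower())
    if component is None:
        # No library equivalent: targeted generation, full guardrail checks apply.
        return Plan(Tier.TARGETED, None, config, needs_full_guardrails=True)
    if config:
        # Library component plus schema-bounded configuration.
        return Plan(Tier.MINIMAL, component, config, needs_full_guardrails=False)
    # Library component used as-is: no generation at all.
    return Plan(Tier.ZERO, component, {}, needs_full_guardrails=False)


plans = [
    route("Data Table"),
    route("login form", {"redirect": "/home"}),
    route("fraud scoring rule"),
]
```

Only the third plan reaches a generative model, so only that plan carries the full validation burden.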

What pre-built components actually guarantee

The assembly model works only if the components in the library are genuinely certified artifacts, not just reusable snippets. Quality must be a property of the component itself, not something the consuming application is responsible for verifying. That means each component in the enterprise library must carry binding guarantees across several dimensions.

Visual consistency. Design tokens, dark mode behavior, responsive breakpoints, and brand compliance are verified at component build time. Every application that assembles from these components inherits visual fidelity without running per-app visual regression on the assembled portion. Token drift — the slow divergence of generated components from a design system — is eliminated for anything sourced from the library.
Security. Authentication scaffolding, CSRF protection, and OWASP compliance are structural properties of the component. You cannot assemble an insecure version of a secure component. This is a stronger guarantee than post-generation scanning, which can tell you only whether a particular generation run introduced a vulnerability. It cannot prevent the vulnerability from being generated in the first place.
Accessibility. WCAG AA compliance is validated once at component build time: color contrast, ARIA roles, focus management, keyboard navigation, screen reader compatibility, and interactive component behavior. Every application that consumes the component inherits the result. This is significant because accessibility defects in AI-generated code are among the most consistently overlooked in post-generation review, and among the most expensive to remediate after deployment.
Cross-platform fidelity. A single component declaration produces both a tested web artifact and a tested mobile artifact. Platform parity is a property of the component, not a testing burden repeated per application. For enterprises maintaining parallel web and mobile portfolios, this alone can eliminate a meaningful fraction of the QA life cycle.

Back-end services: where architectural guardrails matter most

The front-end component story is compelling, but the harder problem — and the higher-stakes one — lives in back-end services. Persistence layers, API endpoints, security filters, service integrations — this is where the most code gets generated in a typical enterprise application, and where architectural mistakes are most consequential.

The AI assembly model handles this by embedding architectural guardrails as structural properties of every generated service — not as optional patterns that developers must remember to follow, but as invariants that the platform enforces. The distinction matters. A pattern that developers can forget to apply is a pattern that will be forgotten, especially under the time pressure that AI-assisted velocity creates.

Six back-end guardrails, in particular, define the difference between code that merely compiles and code that can safely run a regulated business.

Stateless, horizontally scalable services. No session state in the application layer. Any instance can serve any request. Scaling becomes an infrastructure decision — add instances behind a load balancer — rather than an application architecture change. The same service architecture that handles a pilot with fifty users handles a production rollout serving millions. This follows the twelve-factor app methodology’s stateless processes principle, and it means that the gap between “prototype” and “production” is not an architectural rewrite.
Safe, cached, auditable data access. All database interaction runs through a generated persistence layer. There is no pattern in the platform’s output that produces an unguarded, hand-assembled SQL call, the kind that leads to the injection vulnerabilities that have ranked at or near the top of the OWASP Top 10 for over a decade. Frequently accessed data is cached consistently across services. Every write operation carries an automatic audit trail: who changed what, and when. For regulated industries, this is not a convenience. It is a compliance requirement that the architecture satisfies by default.
Secrets isolated from code. No credentials appear in generated service code. API keys, database passwords, and encryption keys are injected at deployment time from a secure secrets vault, never written to source control. Rotating a credential requires no code change and no redeployment of business logic. This is the twelve-factor “externalized config” principle made structural: not a recommendation in a style guide, but a property of the code generation pipeline itself.
Role-based access control, end to end. Most platforms define access rules at the UI layer and leave back-end enforcement to developers. The assembly model generates RBAC as a single continuous constraint that spans every layer. A user sees only what their role permits in the interface. Their API calls are validated against the same role definition before any business logic executes. Their data queries are filtered at the database layer. One definition, enforced everywhere. No gaps. No drift between the access a user appears to have and the access they actually have.
API-bounded service contracts. Every service exposes a typed, versioned API contract. Services communicate through those contracts, never through shared data stores or direct coupling. Each service can be changed and redeployed independently without coordinated releases across the stack. This is what makes microservice architecture actually work in practice, as opposed to the distributed monolith that many teams accidentally build when service boundaries are not enforced by the platform.
Security validated against industry standards. Generated applications are tested against the OWASP Top 10 and verified through dynamic application security testing under real-world conditions. Compliance teams receive independently auditable evidence of security posture at every release — not a developer’s assertion that best practices were followed, but verifiable test results against a known standard.
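As one illustration of the list above, the "one definition, enforced everywhere" idea behind end-to-end RBAC might be sketched as follows. The roles, resources, and function names are hypothetical, and the data-layer check is simplified to an all-or-nothing read filter rather than true row-level filtering.

```python
from typing import Callable, Dict, Iterable, List, Set

# Single source of truth: role -> resource -> permitted actions (hypothetical roles).
ROLE_POLICY: Dict[str, Dict[str, Set[str]]] = {
    "claims_agent": {"claims": {"read"}},
    "claims_manager": {"claims": {"read", "approve"}},
}


def is_allowed(role: str, resource: str, action: str) -> bool:
    """The only place where the access decision is made."""
    return action in ROLE_POLICY.get(role, {}).get(resource, set())


def visible_actions(role: str, resource: str, ui_actions: Iterable[str]) -> List[str]:
    """UI layer: show only the actions the role may perform."""
    return [a for a in ui_actions if is_allowed(role, resource, a)]


def handle_api_call(role: str, resource: str, action: str, handler: Callable[[], str]) -> str:
    """API layer: reject the call before any business logic executes."""
    if not is_allowed(role, resource, action):
        raise PermissionError(f"{role} may not {action} {resource}")
    return handler()


def filter_rows(role: str, resource: str, rows: Iterable[dict]) -> List[dict]:
    """Data layer: return rows only when the role may read the resource."""
    return list(rows) if is_allowed(role, resource, "read") else []


agent_buttons = visible_actions("claims_agent", "claims", ["read", "approve"])
manager_result = handle_api_call("claims_manager", "claims", "approve", lambda: "approved")
```

All three layers call the same `is_allowed` function, so the interface cannot offer an action that the API or the data layer would then refuse, or the reverse.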

None of these are novel ideas in isolation. Twelve-factor apps, OWASP compliance, externalized secrets, end-to-end RBAC — these are well-understood engineering principles. What is novel is making them structural properties of a code generation architecture rather than aspirational items on a checklist. When these guardrails are architectural invariants, they do not depend on developer discipline. They do not erode under deadline pressure. They do not vary between teams.
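To show what "structural" can mean for the data-access guardrail, here is a minimal sketch of a persistence layer in which every query is parameterized and every write records an audit entry in the same transaction. It uses Python's standard `sqlite3` module; the `AuditedStore` class and its table names are hypothetical, not the output of any specific platform.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple


class AuditedStore:
    """All SQL lives here; callers never assemble query strings themselves."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts "
            "(id INTEGER PRIMARY KEY, owner TEXT NOT NULL)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS audit_log "
            "(actor TEXT, action TEXT, row_id INTEGER, at TEXT)"
        )

    def insert_account(self, actor: str, owner: str) -> int:
        # The connection context manager commits both statements or rolls both back.
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO accounts (owner) VALUES (?)", (owner,)
            )
            self.conn.execute(
                "INSERT INTO audit_log (actor, action, row_id, at) VALUES (?, ?, ?, ?)",
                (actor, "insert_account", cur.lastrowid,
                 datetime.now(timezone.utc).isoformat()),
            )
        return cur.lastrowid

    def find_by_owner(self, owner: str) -> List[Tuple[int, str]]:
        # Placeholders keep user input out of the SQL text entirely.
        return self.conn.execute(
            "SELECT id, owner FROM accounts WHERE owner = ?", (owner,)
        ).fetchall()


store = AuditedStore(sqlite3.connect(":memory:"))
hostile = "x'; DROP TABLE accounts; --"
row_id = store.insert_account("alice", hostile)
found = store.find_by_owner(hostile)
```

The hostile string is stored and retrieved as ordinary data, and the write cannot happen without its audit record, because both statements share one transaction.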

The cost argument, honestly

The AI assembly model is not free of trade-offs. It carries a higher context overhead than a bare generative approach. Teaching the system your component library schema, your design token bindings, your architectural constraints — all of this consumes tokens before the first line of useful output is produced. A naive comparison of per-session token cost will favor the generate-first model.

But that comparison is misleading, because it ignores where the real costs accumulate.

In a generate-first model, every component is produced in full, every time. Each generation run burns tokens on implementation code that already exists in a tested form somewhere in the organization’s component library, if only the model knew to use it. Self-correction loops are frequent, because probabilistic output regularly misses the target on the first pass. And every generated component requires the full audit cycle: security, accessibility, visual regression, functional testing.

In the assembly model, the component code already exists. The AI configures rather than constructs. A fraction of the tokens. A fraction of the self-correction loops. A fraction of the output requiring validation. The context overhead is paid once per session. The generation savings compound across every component assembled. And they compound again with every additional application built on the same library.
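A back-of-the-envelope model makes the comparison easier to reason about. Every number below is a hypothetical placeholder, not a measurement: `audit_cost` is an arbitrary unit of validation effort per component, and `gap_fraction` is the assumed share of each application that still has to be generated.

```python
def generate_first_cost(apps: int, components_per_app: int, audit_cost: float) -> float:
    """Every component in every app is generated and fully audited."""
    return apps * components_per_app * audit_cost


def assembly_cost(
    apps: int,
    components_per_app: int,
    audit_cost: float,
    library_size: int,
    gap_fraction: float,
) -> float:
    """Library components are certified once; only generated gaps are audited per app."""
    certify_once = library_size * audit_cost
    per_app = components_per_app * gap_fraction * audit_cost
    return certify_once + apps * per_app


# Hypothetical inputs: 40 components per app, a 60-component library,
# and 25% of each app falling outside the library.
for n in (1, 2, 10):
    print(n, generate_first_cost(n, 40, 1.0), assembly_cost(n, 40, 1.0, 60, 0.25))
```

Under these assumptions the assembly model costs more for a single application, breaks even at two, and pulls ahead from there. Its per-application cost still grows with the number of applications, but only in proportion to the generated gap, while the certification cost of the library is fixed.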

The real advantage, though, is not in token economics. It is in defect cost. Fewer developer hours spent diagnosing incorrect AI output. Fewer QA cycles spent catching regressions that a generate-first model produces stochastically. Fewer production incidents when defects evade the guardrail stack entirely. A pre-built, certified component absorbs those costs once, at build time. Every application that uses it inherits the savings. That is a compounding return on quality investment — the opposite of the linear cost growth that characterizes generate-then-check.

Certified by construction vs. verified by testing

For enterprises operating in regulated industries, such as financial services, health care, government, and insurance, the compliance implications of the assembly model deserve separate attention.

A generate-first model produces a compliance artifact that says, in essence: “We generated this code, and then we tested it, and the tests passed.” That is a valid compliance posture. It is also a fragile one. It depends on the completeness of the test suite, the rigor of the review process, and the assumption that every generation run will be subjected to the same standard of scrutiny. Given that 29% of AI-assisted code is already merging without review, that assumption is under visible strain.

The assembly model produces a different artifact: “This application was assembled from components that were certified at build time against these specific standards. Only the custom-generated portions required runtime validation.” The certified-by-construction approach reduces the compliance surface to the genuinely novel code — the business logic and integrations that no library component could satisfy. Everything else carries its compliance evidence with it, embedded in the component’s certification history.
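One way to picture that artifact is as a summary derived from an application's bill of materials. The sketch below is an assumption about how such a record could be structured; the `Part` type, the `source` values, and the component names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Part:
    name: str
    source: str                              # "library" or "generated"
    certified_against: Tuple[str, ...] = ()  # standards recorded at component build time


def compliance_summary(parts: Iterable[Part]) -> Dict[str, object]:
    """Split an application into inherited evidence and parts that still need validation."""
    inherited: Dict[str, List[str]] = {}
    needs_validation: List[str] = []
    for part in parts:
        if part.source == "library" and part.certified_against:
            inherited[part.name] = list(part.certified_against)
        else:
            # Generated code, or a library part with no recorded certification.
            needs_validation.append(part.name)
    return {"inherited": inherited, "needs_validation": needs_validation}


summary = compliance_summary([
    Part("DataTable", "library", ("WCAG AA", "OWASP Top 10")),
    Part("FraudScoringRule", "generated"),
    Part("LegacyWidget", "library"),
])
```

Here only `FraudScoringRule` and the uncertified `LegacyWidget` land in the validation queue; `DataTable` carries its evidence with it.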

This is not a theoretical distinction. It changes the conversation with auditors, with regulators, and with the internal risk committee. It shifts compliance from a per-release testing exercise to a structural property of the development platform. And it scales: the hundredth application built on a certified library adds only the compliance work for its own custom code, not another full round of validation for everything it contains.

The uncomfortable implication

The AI code generation debate, as currently framed, asks the wrong question. “How do we add better guardrails to AI-generated code?” is a question that accepts the premise of generating everything and then checking everything. It leads to an arms race between generation volume and validation tooling, one in which AI-assisted code already accounts for 42% of committed code and is still rising, while the tooling is perpetually one defect category behind.

The AI assembly model reframes the question. Not “how do we check more effectively?” but “how do we generate less in the first place?” Not “how do we catch defects downstream?” but “how do we make defects structurally impossible for the assembled portion of the application?”

Guardrails are necessary. They will remain necessary for every line of code that AI genuinely generates. The argument here is not against guardrails. It is against a model where guardrails are the primary quality mechanism for an entire application, including the 70% or 80% of it that could have been assembled from certified parts.

The teams that figure this out first will not just ship faster. They will ship with a quality profile that generate-first teams cannot match without proportionally scaling their validation infrastructure — which is to say, without giving back most of the velocity gains that AI-assisted development was supposed to deliver.

—

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
