How AI Made Us Productive and Clueless: A 2025 Retrospective

2026-01-01 · by Matthew van Bird

AI in 2025 & Predicting 2026

In 2025, I reviewed a 200-file pull request that no one on the team could explain, including the author. "Copilot generated most of it," he said. "It compiles and the tests pass." The PR merged the next day.

This wasn't an isolated incident. Across .NET teams, AI became ubiquitous in software development. Scaffolding, refactors, async conversions, test generation, and migrations happened almost instantly. Pull requests grew larger, merged faster, and velocity metrics soared.

But behind the numbers, something fundamental shifted: engineers increasingly couldn't explain the systems they were shipping.

We optimised for output over understanding, and that trade-off is now embedded in our codebases.

Code Reviews Lost Their Purpose

Code reviews once focused on clarity of intent, abstraction boundaries, ownership, failure modes, and scalability. By 2025, they'd become perfunctory checks: "Does it compile? Do the tests pass? Copilot generated most of it anyway."

The problem isn't speed. It's the erosion of reasoning. When reviewers can't trace the logic because they didn't write it and the author didn't fully understand it, we're not reviewing, we're rubber-stamping.

The Illusion of Modular Design

AI-generated architectures often look well-structured but reveal problems under scrutiny. Here's a real example I encountered:

public interface IUserManager
{
    Task<User> GetUserAsync(int id);
}

public class UserManager : IUserManager
{
    private readonly IUserOrchestrator _orchestrator;
    private readonly ILogger<UserManager> _logger;

    public async Task<User> GetUserAsync(int id)
    {
        _logger.LogInformation("UserManager.GetUserAsync called with id: {Id}", id);
        var result = await _orchestrator.GetUserAsync(id);
        _logger.LogInformation("UserManager.GetUserAsync completed for id: {Id}", id);
        return result;
    }
}

public interface IUserOrchestrator
{
    Task<User> GetUserAsync(int id);
}

public class UserOrchestrator : IUserOrchestrator
{
    private readonly IUserRepository _repository;
    private readonly ILogger<UserOrchestrator> _logger;

    public async Task<User> GetUserAsync(int id)
    {
        _logger.LogInformation("UserOrchestrator.GetUserAsync called with id: {Id}", id);
        var result = await _repository.GetByIdAsync(id);
        _logger.LogInformation("UserOrchestrator.GetUserAsync completed for id: {Id}", id);
        return result;
    }
}

Three layers of abstraction. Two interfaces with single implementations. Four log statements. Zero business logic. The "Manager" manages nothing: it logs and delegates. The "Orchestrator" orchestrates nothing: it logs and delegates. This could have been a direct repository call.

When I asked why these layers exist, the answer was: "It's more scalable, the AI structured it this way for separation of concerns." But separation of which concerns? There is no variation in behaviour, no alternative implementations, no actual decisions being made. Just indirection for the sake of indirection.

The result is systems that are simultaneously over-engineered and under-reasoned: harder to maintain, debug, and trace through. Every stack trace now has three extra frames. Every modification requires changing three files. The complexity is real; the benefits are imaginary.

Ritual Without Reason

I've encountered production code with elaborate manual garbage collection routines:

public class MemoryOptimizationService : IHostedService
{
    private readonly ILogger<MemoryOptimizationService> _logger;
    private Timer _timer;

    public Task StartAsync(CancellationToken cancellationToken)
    {
        _timer = new Timer(OptimizeMemory, null, TimeSpan.Zero, TimeSpan.FromMinutes(5));
        return Task.CompletedTask;
    }

    private void OptimizeMemory(object state)
    {
        var gen0Before = GC.CollectionCount(0);
        var gen1Before = GC.CollectionCount(1);
        var gen2Before = GC.CollectionCount(2);
        var memoryBefore = GC.GetTotalMemory(false);

        _logger.LogInformation("Starting memory optimization. Memory: {Memory}MB, Gen0: {Gen0}, Gen1: {Gen1}, Gen2: {Gen2}", 
            memoryBefore / 1024 / 1024, gen0Before, gen1Before, gen2Before);

        GC.Collect(2, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();
        GC.Collect(2, GCCollectionMode.Forced);

        var gen0After = GC.CollectionCount(0);
        var gen1After = GC.CollectionCount(1);
        var gen2After = GC.CollectionCount(2);
        var memoryAfter = GC.GetTotalMemory(true);

        _logger.LogInformation("Completed memory optimization. Memory: {Memory}MB (saved {Saved}MB), Gen0: {Gen0}, Gen1: {Gen1}, Gen2: {Gen2}",
            memoryAfter / 1024 / 1024, (memoryBefore - memoryAfter) / 1024 / 1024,
            gen0After, gen1After, gen2After);
    }
}

This runs every five minutes in production. When I asked why:

"Copilot said it would improve memory performance and fix the memory leak."

Here is what that code actually does.

We already have Datadog monitoring memory, GC pressure, and collection counts. This code duplicates that monitoring while flooding our logs with useless metrics.

The .NET runtime is already optimised. The GC uses generational collection, predictive algorithms, and workload-specific heuristics that decades of research went into. Forcing a Gen2 collection every five minutes undermines all of that.

This doesn't fix memory leaks, it masks them. If memory consumption is growing, forcing collections just delays the inevitable. The leak still exists; you've just added a periodic plaster that makes the real problem harder to diagnose.

The performance impact is real. Forced Gen2 collections pause all threads. Every five minutes, we're introducing stop-the-world latency spikes that can hit active requests.

The most damaging part: the metrics looked good in testing. Memory usage showed a sawtooth pattern, developers saw the "saved MB" logs, and everyone concluded it was working. No one questioned whether this was solving a real problem or understood that modern garbage collectors don't need manual intervention.

This is programming by ritual: performing actions because they appear authoritative, not because they solve the actual problem. The AI suggested it, the metrics looked reassuring, and critical thinking stopped there.

Similarly, I've seen simple email validation replaced with regex patterns spanning 200+ characters:

// Before: simple and maintainable
if (string.IsNullOrWhiteSpace(email) || !email.Contains("@"))
    return false;

// After: Copilot's "improvement"
var pattern = @"^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|""(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$";
if (!Regex.IsMatch(email, pattern)) return false;

The justification: "Copilot provided it, so it must be more robust." No one could explain what the pattern does or which edge cases it handles. No one could say why basic validation in this system needed that complexity, or whether it matched the validation rules used elsewhere in the system. The complexity was trusted because it looked professional.

Async Without Understanding

The misuse of async/await is everywhere. I've seen:

Synchronous work wrapped in Task.Run:

public async Task<User> GetUserAsync(int id)
{
    return await Task.Run(() => _users.FirstOrDefault(u => u.Id == id));
}

This doesn't make anything faster; it adds thread pool overhead to an already synchronous in-memory operation. The "Async" suffix and "await" keyword create the illusion of scalability.
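For contrast, here is a minimal sketch of the same lookup left synchronous. InMemoryUserStore and the shape of User are assumptions for illustration; the original only shows a _users collection.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the User type in the example above.
public record User(int Id, string Name);

// Hypothetical wrapper around the in-memory _users collection.
public class InMemoryUserStore
{
    private readonly List<User> _users;

    public InMemoryUserStore(IEnumerable<User> users) => _users = users.ToList();

    // The work is an in-memory scan with nothing to wait on,
    // so the method is synchronous and says so in its name.
    public User? GetUser(int id) => _users.FirstOrDefault(u => u.Id == id);
}
```

If a caller genuinely needs a Task, for example to satisfy an interface, Task.FromResult wraps the value without borrowing a thread pool thread.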

Async methods that block immediately:

public async Task<Data> FetchDataAsync()
{
    var result = _httpClient.GetStringAsync(url);
    return ProcessData(result.Result); // .Result blocks the thread
}
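A non-blocking version awaits the task instead of reading .Result. The sketch below assumes .NET 5 or later for the GetStringAsync overload that takes a cancellation token. DataFetcher, the Data record, and the body of ProcessData are placeholders, since the original does not show them.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Placeholder result type; the real Data type is not shown in the example above.
public record Data(int Length);

// Hypothetical host class for the method.
public class DataFetcher
{
    private readonly HttpClient _httpClient;
    private readonly Uri _url;

    public DataFetcher(HttpClient httpClient, Uri url)
    {
        _httpClient = httpClient;
        _url = url;
    }

    public async Task<Data> FetchDataAsync(CancellationToken cancellationToken = default)
    {
        // await releases the thread while the request is in flight.
        // There is no .Result and no .Wait() anywhere on this path.
        string body = await _httpClient.GetStringAsync(_url, cancellationToken);
        return ProcessData(body);
    }

    // Placeholder processing step.
    private static Data ProcessData(string body) => new Data(body.Length);
}
```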

Over-parallelization without throttling:

var tasks = items.Select(async item => await ProcessItemAsync(item));
await Task.WhenAll(tasks);

When "items" contains 50,000 records, this spawns 50,000 concurrent operations, overwhelming database connections, API rate limits, and memory.
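One common way to cap concurrency is a SemaphoreSlim used as a gate. This is a sketch rather than a drop-in fix: Throttled.ForEachAsync is a hypothetical helper name, and the right limit depends on what the downstream database or API can actually absorb.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the codebase described above.
public static class Throttled
{
    // Runs processItem for every item with at most maxConcurrency operations in flight.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> processItem)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            // Each item waits here until one of the maxConcurrency slots is free.
            await gate.WaitAsync();
            try
            {
                await processItem(item);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        // The gate is disposed only after every task has finished.
        await Task.WhenAll(tasks);
    }
}
```

A call such as Throttled.ForEachAsync(items, 20, ProcessItemAsync) still creates one Task per item, but only 20 are doing work at any moment. On .NET 6 and later, Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism is a built-in alternative.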

The common thread: "AI suggested async would make it scale better." The code looks modern and performant. Under load, it falls over. Async is a tool for I/O-bound operations with natural waiting - not a performance enhancer you sprinkle on synchronous code.

The Death of the Valuable Unit Test

Unit tests used to catch bugs. Now they often just confirm that code does what it currently does, whether that's correct or not.

The pattern: a developer writes (or AI generates) an implementation, then prompts: "Write unit tests for this class." The AI obliges:

public class DiscountCalculator
{
    public decimal Calculate(decimal price, string customerType)
    {
        if (customerType == "Premium")
            return price * 0.9m;
        return price;
    }
}

// AI-generated tests
[Test]
public void Calculate_WithPremiumCustomer_Returns90PercentOfPrice()
{
    var calculator = new DiscountCalculator();
    var result = calculator.Calculate(100m, "Premium");
    Assert.AreEqual(90m, result);
}

[Test]
public void Calculate_WithRegularCustomer_ReturnsFullPrice()
{
    var calculator = new DiscountCalculator();
    var result = calculator.Calculate(100m, "Regular");
    Assert.AreEqual(100m, result);
}

The tests pass. Coverage looks great. But no one asked: Is this correct?

What about:

Gold tier customers who should get 15% off?
Negative prices?
Null or empty customer types?
Case sensitivity: is "premium" the same as "Premium"?
What happens with "VIP" customers?

The AI wrote tests for what the code does, not what it should do. This is the fundamental failure: tests that verify current behaviour instead of verifying requirements.

I've seen this pattern everywhere:

Buggy validation logic:

public bool IsValidEmail(string email)
{
    return email.Contains("@");
}

// AI-generated test
[Test]
public void IsValidEmail_WithAtSymbol_ReturnsTrue()
{
    Assert.IsTrue(IsValidEmail("user@domain.com"));
}

The test passes, but "@@@@" would also be accepted as a valid email. The implementation is wrong; the test confirms the wrongness.

Off-by-one errors:

public bool IsWorkingHour(int hour)
{
    return hour >= 9 && hour < 17;  // Bug: should be <= 17
}

// AI-generated test
[Test]
public void IsWorkingHour_At16_ReturnsTrue()
{
    Assert.IsTrue(IsWorkingHour(16));
}

No test for hour 17. The bug stays hidden because the test was generated from the implementation, not from the requirement "working hours are 9 AM to 5 PM."
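Tests written from the requirement start at its edges. The sketch below assumes the reading used above, that hour 17 counts as a working hour, and uses NUnit's TestCase attribute. WorkingHours is a hypothetical home for the method.

```csharp
using NUnit.Framework;

// Hypothetical container for the method under test.
public static class WorkingHours
{
    // Assumed requirement: working hours run from 9 AM to 5 PM, with hour 17 included.
    public static bool IsWorkingHour(int hour) => hour >= 9 && hour <= 17;
}

[TestFixture]
public class WorkingHoursTests
{
    // Each case sits on or just outside a boundary named in the requirement.
    [TestCase(8, false)]
    [TestCase(9, true)]
    [TestCase(17, true)]
    [TestCase(18, false)]
    public void IsWorkingHour_AtBoundaries(int hour, bool expected)
    {
        Assert.That(WorkingHours.IsWorkingHour(hour), Is.EqualTo(expected));
    }
}
```

Run against the original implementation, the hour 17 case fails, which is exactly the conversation the generated test never started.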

The value of unit tests used to be in the thinking: understanding edge cases, questioning assumptions, discovering bugs before they ship. That process happened while writing the test. When you prompt AI with "test this class," you outsource the thinking. The AI can't question whether the implementation matches requirements because it doesn't know the requirements. It only knows the code.

The result: test suites with high coverage and low value. They catch regressions in current behaviour but never caught the original bugs. They give false confidence because green checkmarks don't mean correctness, they mean consistency with whatever was written first.

When Comments Lie

AI-generated comments present another subtle problem: they describe what code does, not why it exists.

// Calculates the total price with discount applied
public decimal CalculateTotalPrice(decimal price, decimal discountPercent)
{
    return price - (price * discountPercent / 100);
}

The comment is useless: it restates the method name. What we needed to know:

Why is discount a percentage here but a decimal multiplier in other methods?
Is this pre-tax or post-tax?
What happens with negative discounts (price increases)?
Why don't we use the shared DiscountCalculator?

When the implementation changes, comments rarely get updated:

// Validates email format using regex
public bool IsValidEmail(string email)
{
    return _emailService.ValidateAsync(email).Result;  // Now calls external service
}

The comment describes old behaviour. The code now calls an external service synchronously (another anti-pattern), but the comment fossilises the original approach. No one notices because both the code and comment were generated; neither was written with understanding.

A human writing this would have explained the decision: "We switched to the external service because it checks deliverability, not just format." That context is missing when AI generates both code and comments in one pass.

Security: Vulnerabilities With Confidence

AI generates code that looks secure but often introduces subtle vulnerabilities that pass review because "the AI wrote it."

I've seen SQL queries built like this:

public async Task<User> FindUserAsync(string username)
{
    var query = $"SELECT * FROM Users WHERE Username = '{username}'";
    return await _connection.QueryFirstOrDefaultAsync<User>(query);
}

Classic SQL injection vulnerability. When challenged: "Copilot generated the data access layer." The developer trusted it without question.
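The usual fix is a parameterised query. Here is a minimal sketch, assuming the data access layer uses Dapper, as the QueryFirstOrDefaultAsync call suggests. UserQueries and the User properties are illustrative names.

```csharp
#nullable enable
using System.Data;
using System.Threading.Tasks;
using Dapper;

// Illustrative shape; the real User type is not shown in the example above.
public class User
{
    public int Id { get; set; }
    public string Username { get; set; } = "";
}

// Hypothetical home for the query.
public class UserQueries
{
    // The SQL text is a constant. User input never becomes part of it.
    public const string FindUserSql = "SELECT * FROM Users WHERE Username = @Username";

    private readonly IDbConnection _connection;

    public UserQueries(IDbConnection connection) => _connection = connection;

    public async Task<User?> FindUserAsync(string username)
    {
        // Dapper binds username as a query parameter,
        // so a quote character in the input is treated as data, not SQL.
        return await _connection.QueryFirstOrDefaultAsync<User>(
            FindUserSql, new { Username = username });
    }
}
```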

Or input validation that looks thorough but isn't:

public IActionResult UpdateProfile(string bio)
{
    // AI-generated XSS "protection"
    bio = bio.Replace("<script>", "").Replace("</script>", "");
    
    _userService.UpdateBio(User.Id, bio);
    return Ok();
}

This blocks <script> tags but misses <img>, <iframe>, event handlers like onerror, and case variations like <ScRiPt>. The code looks like it's handling security, which is more dangerous than no validation at all: it creates false confidence.
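A more dependable approach than stripping known-bad strings is to store the bio as typed and encode it wherever it is written into HTML. The sketch below uses WebUtility.HtmlEncode from the base class library. BioRenderer is a hypothetical name, and this covers HTML body content only: attribute, URL, and JavaScript contexts need their own handling.

```csharp
using System.Net;

// Hypothetical rendering helper.
public static class BioRenderer
{
    // Encodes characters such as < > & and quotes, so any markup in the bio
    // is displayed as text instead of being interpreted by the browser.
    public static string ToHtml(string bio) => WebUtility.HtmlEncode(bio);
}
```

Razor views already HTML-encode @ expressions by default, so in many applications the real fix is to delete the Replace calls and avoid writing the value out as raw HTML.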

I've also encountered API endpoints with no authorization checks:

[HttpGet("admin/users/{id}")]
public async Task<User> GetUserDetails(int id)
{
    return await _userRepository.GetByIdAsync(id);
}

The admin in the route suggests this should be protected, but there's no [Authorize] attribute, no role checking, nothing. When asked: "The AI scaffolded the controller based on the repository." No one verified that administrative endpoints were actually protected.
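A protected version might look like the sketch below. It assumes ASP.NET Core with authentication and authorization already configured in the pipeline. The "Admin" role name, the controller name, and the shapes of User and IUserRepository are illustrative.

```csharp
#nullable enable
using System.Threading.Tasks;
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

// Illustrative shapes; the real types are not shown in the example above.
public record User(int Id, string Name);

public interface IUserRepository
{
    Task<User?> GetByIdAsync(int id);
}

[ApiController]
[Authorize(Roles = "Admin")] // assumed role name; applies to every action in the controller
public class AdminUsersController : ControllerBase
{
    private readonly IUserRepository _userRepository;

    public AdminUsersController(IUserRepository userRepository) =>
        _userRepository = userRepository;

    [HttpGet("admin/users/{id}")]
    public async Task<ActionResult<User>> GetUserDetails(int id)
    {
        var user = await _userRepository.GetByIdAsync(id);
        if (user is null) return NotFound();
        return user;
    }
}
```

Putting the attribute on the controller rather than on each action means a newly scaffolded admin action is protected by default.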

The pattern is consistent: AI generates code that handles the happy path and looks professional, but misses security considerations that experienced developers would catch immediately. The plausibility makes these vulnerabilities more likely to reach production.

Duplication as the New Default

Before AI-driven development, repeated patterns triggered conversations about shared libraries. In 2025, they triggered duplication. Here are three implementations I found across different services:

In service 1, which shall remain nameless:

public async Task<T> ExecuteWithRetry<T>(Func<Task<T>> operation)
{
    int attempts = 0;
    while (attempts < 3)
    {
        try
        {
            return await operation();
        }
        catch (HttpRequestException)
        {
            attempts++;
            if (attempts >= 3) throw;
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempts)));
        }
    }
    throw new InvalidOperationException("Max retries exceeded");
}

In service 2, which shall remain nameless:

public async Task<TResult> RetryOnFailure<TResult>(Func<Task<TResult>> action)
{
    for (int i = 0; i < 3; i++)
    {
        try
        {
            return await action();
        }
        catch (Exception ex) when (ex is HttpRequestException || ex is TimeoutException)
        {
            if (i == 2) throw;
            await Task.Delay((int)Math.Pow(2, i + 1) * 1000);
        }
    }
    return default;
}

In service 3, which shall remain nameless:

private async Task<T> TryWithBackoff<T>(Func<Task<T>> func)
{
    var maxAttempts = 3;
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            return await func();
        }
        catch (HttpRequestException e)
        {
            if (attempt == maxAttempts) throw;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt));
            await Task.Delay(delay);
        }
    }
    throw new Exception("Retry failed");
}

A human sees this immediately: it's the same exponential backoff retry pattern written three times with minor variations in variable names, delay calculations, and exception handling. Each was generated independently when a developer needed retry logic. Each "works" in isolation.

AI works locally, not systemically. It doesn't see across repositories or recognise that this is the seventh variant of the same retry pattern. Engineers stop noticing duplication because each instance "works," and architectural drift compounds invisibly.

When one service needs to adjust retry behaviour - say, adding jitter to prevent thundering herds - that fix stays isolated. The pattern continues to proliferate, each copy drifting further from the others.
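For illustration, a single shared helper might look like this. It is a sketch under assumptions: Retry.WithBackoffAsync is a hypothetical name, Random.Shared requires .NET 6 or later, and the "full jitter" delay is one common choice rather than the only one. A maintained library such as Polly would be a reasonable alternative to hand-rolling it at all.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shared helper that the three services could reference.
public static class Retry
{
    // Retries operation on HttpRequestException, using exponential backoff with jitter.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        var unit = baseDelay ?? TimeSpan.FromSeconds(1);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            // On the final attempt the filter is false, so the exception propagates.
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // The ceiling doubles with each attempt. Picking a random delay below it
                // keeps callers that failed together from retrying together.
                var ceilingMs = unit.TotalMilliseconds * Math.Pow(2, attempt);
                var delay = TimeSpan.FromMilliseconds(Random.Shared.NextDouble() * ceilingMs);
                await Task.Delay(delay, cancellationToken);
            }
        }
    }
}
```

With one implementation, a change such as adding jitter lands in every service at once instead of in whichever copy someone happened to open.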

The C-Suite Blind Spot

One project I witnessed illustrates the danger: a PHP-to-.NET migration built with AI tooling from day one. It's behind schedule, plagued with defects, and overcomplicated with patterns like Mediator, CQRS, and a misused Entity Framework implementation. The team is debugging generated architecture instead of solving business problems.

The core issue: lack of understanding of both the original system and the target architecture. AI didn't accelerate delivery; it amplified misunderstanding and complexity.

Leadership needs to understand that AI doesn't automatically speed delivery. Without engineering maturity, domain knowledge, and architectural discipline, AI produces more undelivered, brittle systems, not fewer.

CI/CD: Where Debugging Goes to Die

I've started seeing deployment pipelines with shell scripts that span 300+ lines. Here's a representative sample:

#!/bin/bash
set -e

echo "Starting deployment process..."
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/$TIMESTAMP"
APP_DIR="/var/www/app"
CONFIG_FILE="$APP_DIR/appsettings.json"

if [ ! -d "$BACKUP_DIR" ]; then
    mkdir -p "$BACKUP_DIR"
    if [ $? -ne 0 ]; then
        echo "Failed to create backup directory"
        exit 1
    fi
fi

echo "Backing up current deployment..."
cp -r "$APP_DIR" "$BACKUP_DIR/"
if [ $? -ne 0 ]; then
    echo "Backup failed"
    exit 1
fi

echo "Stopping application..."
systemctl stop myapp
if [ $? -ne 0 ]; then
    echo "Failed to stop application"
    cp -r "$BACKUP_DIR/app" "$APP_DIR"
    systemctl start myapp
    exit 1
fi

echo "Checking database connection..."
DB_HOST=$(grep -oP '(?<=Server=)[^;]+' "$CONFIG_FILE")
DB_PORT=$(grep -oP '(?<=Port=)[^;]+' "$CONFIG_FILE")
timeout 5 bash -c "cat < /dev/null > /dev/tcp/$DB_HOST/$DB_PORT"
if [ $? -ne 0 ]; then
    echo "Database connection failed"
    systemctl start myapp
    exit 1
fi

# ... continues for 250+ more lines

This pattern repeats throughout: error checking after every command, manual rollback logic, brittle text parsing, and no way to test any of it without running the entire deployment.

When I ask why this isn't broken into testable functions or written in a language with proper error handling: "Copilot generated the deployment script based on our requirements."

A human would have immediately recognised this should be Python or JavaScript, languages with:

Functions you can test in isolation
Proper exception handling instead of if [ $? -ne 0 ] after every command
Debuggers with breakpoints
Data structures instead of string parsing with grep and regex

Instead, we have monolithic bash scripts where:

You can't set breakpoints to debug failures
Every change requires a full deployment test
Logic is duplicated because there are no reusable functions
Error messages are generic strings that don't help diagnose issues
Rollback procedures are copy-pasted and drift out of sync

The AI generated what looked like a complete solution. No one questioned whether bash was the right tool. No one broke it into testable components. And now we're debugging deployment failures by adding echo statements and re-running 15-minute pipelines.

The Real Problem: Generation Without Reasoning

AI generates plausible-looking code. It doesn't understand systems, trade-offs, or the specific constraints of your data and domain. That plausibility hides architectural debt, unnecessary duplication, and subtle bugs that surface months later.

AI didn't remove the need for engineering judgement - it made its absence more expensive.

What 2026 Will Reveal

If AI availability becomes constrained, whether by cost, regulation or infrastructure, many teams will discover that their generated systems are the first to fail, because the understanding needed to maintain them is missing.

When the next major outage happens, on-call engineers will discover they can't explain the systems they're debugging. Stack traces will point to layers of abstraction no one understands. Logs will show errors in code no one can mentally trace through. The "temporary" fix will take hours because the engineer has to reverse-engineer decisions that were never made consciously in the first place.

The engineers who succeed will be those who can recognise duplication, rebuild abstractions deliberately, remove unnecessary layers, and reason about systems end-to-end without depending on tooling. Teams that lack this muscle memory will struggle to debug systems they never properly understood.

What We Must Relearn

2026 should bring a return to fundamentals:

Abstraction as a deliberate act, not an AI suggestion
Duplication as a signal, not a shortcut
Performance as a property to reason about, not guess at
Security as a requirement to verify, not assume
Code reviews as critical thinking exercises, not rubber stamps
AI as a tool, not an authority

The engineers who thrive will understand why the system exists, how it behaves under stress, and when to stop and think before accepting a suggestion. Code reviews must return to being conversations between engineers who understand the domain, the constraints, and the trade-offs, not just checkers of whether tests pass.

Final Thought

The most dangerous phrase in a 2025 code review was: "That's how the AI built it."

That is not a decision. It's an abdication. AI can generate code; it cannot replace understanding. And in 2026, that distinction will matter more than ever.

I still use AI tools daily. They're invaluable for boilerplate, exploration, and accelerating tasks I already understand. But I read every line. I question every suggestion. I reject complexity I can't explain. AI is a useful tool, but only in the hands of engineers who know when to say no.

The engineers who thrive won't be those who generate the most code. They'll be those who generate the most understanding.