Experience with AI coding


In the past year, I, along with many other people who program for a living, started using AI coding assistance. I’m just going to plainly discuss my experience, and ramble about some related issues.

This explanation is primarily for the benefit of people who do not code professionally, and thus have little idea what’s going on. If you code professionally, then you already know, and have formed your own opinion about it, which you are welcome to keep. But it’s worth noting that lots of different kinds of people code for a living. It’s not just software engineers, you know. I code on a daily basis but I’m a data scientist. The reader should be cautious about generalizations because what’s true of one profession may not be true of another.

Agentic AI

At work I use what’s called an “agentic” AI application. In practice, that means that the AI is embedded in the code editor. The AI can edit files and execute code. It’s using the same LLM technology as a chatbot, and you can freely choose which model to use, including ChatGPT, Claude, Gemini, and more.

Having the AI embedded in the code editor streamlines the process significantly compared to just having ChatGPT open in a separate browser window. If I’m just talking to ChatGPT in my browser, I might describe what I’m doing, maybe copy a relevant snippet of my code. It might suggest some code to use, which I can then paste into my editor. But with an agentic application, the LLM can just look through my existing code, and make a bunch of suggestions scattered across a dozen locations.

Here’s how I typically approach the AI. I describe a new feature that I want, in as much detail as possible. This usually includes several links to specific documents, such as the files where I want the code to be located, files containing similar code that I want to emulate, and so on. Then I review the changes, approving some and rejecting others. I may ask for further adjustments if I want it to take another approach. Then I test the code, show the AI the error logs, and repeat until the feature is finished.
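
For concreteness, a request might look something like this (the file names and details here are invented for illustration, not taken from a real project):

    “Add a --dry-run option to scripts/export_orders.py. When the flag is set, log every row that would be written, but don’t touch the database. Follow the same pattern as the --verbose flag in scripts/export_users.py, and update the usage notes at the top of the file.”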

I’ve seen people skip some of these steps, and sometimes I skip some steps too. For example, instead of reviewing a feature piece by piece, you could just provide the AI with a whole design document, and let it chug. I am not convinced that you’ll get great results this way, but sometimes you only need it to be “good enough”. Or maybe the reason I don’t like that approach is that I’m not as good at outlining exactly what I want from the start.

There are also uses that don’t generate code at all. You can ask it general questions like: “My colleague wrote this code, what does it do?” “Here’s my plan for this project, do you see anything I’ve missed?” “What are the advantages and disadvantages of this design decision?”

Is it effective?

When it comes to using AI coding assistance, we’re pretty much immediately getting feedback on whether it works. People on the internet can write all the thinkpieces they want, but at the end of the day, either the software works, or it doesn’t. Programmers have to keep an open mind about it, because for any specific task the AI might be surprisingly effective, or surprisingly ineffective. The rest of y’all can confidently spout opinions about AI coding, because you’re not actually doing it, so you never risk the humility of learning that you were wrong.

In my experience, agentic coding is fine. A lot of the time, it just works, and works very fast and clean. Of course, it doesn’t do everything right. Plenty of course correction is required. It’s often unaware of the larger context of my code. And some problems, it just doesn’t know how to solve, so it will talk around in circles proposing solutions that don’t work.  When the AI starts going in circles, I learned to stop wasting time asking it questions, and fall back to more traditional methods.

What about all the invisible ways that AI can hurt the code? For example, there was that study that found that when software engineers use AI, they think they’re working 20% faster, but they’re actually working 20% slower. This is invisible to the typical programmer, because there isn’t an easy way to compare how long you would have taken if you had approached the work differently. This is a legitimate concern, but I don’t put much stock in the specific result. The study is looking at a small sample of software developers, and I’m not even in that category, so how can I know whether it applies to me? This study obviously isn’t the last word; I easily found studies with different findings, and there are surely even more ongoing. Researchers need more time to cook.

There’s also the concept of “code debt”. Code debt is what you call it when you program faster, at the cost of future programmers’ time. For example, the code is buggy, or it’s hard to read, or it doesn’t leave room for expansion. Code debt is a universal problem across all programming. In a large organization, the problem is that programming speed is measurable, while code debt is not. So even if you want to minimize code debt, all the incentives push in the opposite direction.

Does AI coding make the code debt better or worse? Unclear. It pulls in both directions. For example, one thing it’s really good at is writing unit tests. This is slow and tedious work used to test for bugs. They’re common among many software engineers, although in my line of work we hardly ever use them! But with AI, it’s so easy. And this may result in less buggy code.
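
For readers who have never seen one, here is roughly what a unit test looks like, as a minimal sketch in Python using the standard library’s unittest module. The function being tested is made up purely for illustration.

    import unittest

    def average(numbers):
        """Return the mean of a list of numbers."""
        return sum(numbers) / len(numbers)

    class TestAverage(unittest.TestCase):
        def test_simple_case(self):
            # Fails (and flags a bug) if average() ever stops returning 4 here.
            self.assertEqual(average([2, 4, 6]), 4)

        def test_empty_list(self):
            # Deciding what should happen for an empty list is exactly the kind
            # of question that writing tests forces you to confront.
            with self.assertRaises(ZeroDivisionError):
                average([])

    if __name__ == "__main__":
        unittest.main()

Writing a whole suite of these by hand is the tedious part; it’s exactly the kind of thing the AI will happily churn out.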

On the other hand, if the code is AI generated, you may have less of an understanding of what it’s doing. Therefore, you have less capability to modify it, or to explain to other programmers how to use it.

But really, code debt is an organizational problem. If an organization wanted to fight code debt, they could spend resources on it. If they don’t spend resources on it, how much do they really care? If they don’t care, why should we?

Guardrails

What’s to stop the AI from doing something destructive, like deleting your whole database? It’s worth noting that this is not actually a new problem. Humans do it too. I have an anecdote from when I first started. One of my colleagues accidentally deleted the team’s data. And then the next day, I did it again! We were both following a tutorial, and at the end of the tutorial there was an overly broad command to clean up the workspace. Not much of importance was lost, but after that we started backing the data up.

Every software developer knows that the way to address these failures is not to magically write flawless code. The solution is to implement multiple layers of controls and failsafes. When you hear about some catastrophic failure where an AI deletes a database, that isn’t just an AI failure, it’s multiple layers of failure that you’re not hearing about.
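
To make “controls and failsafes” a little more concrete, here is a toy sketch of one such layer: a check that refuses to run obviously destructive commands without an explicit human sign-off. The patterns and the approval flag are invented for illustration; real agentic tools have their own, more elaborate rule systems.

    import re

    # Hypothetical patterns for commands that should never run unattended.
    DESTRUCTIVE_PATTERNS = [
        r"\brm\s+-rf\b",                          # recursive file deletion
        r"\bDROP\s+(TABLE|DATABASE|SCHEMA)\b",    # destructive SQL
        r"\bgit\s+push\s+--force\b",              # rewriting shared history
    ]

    def needs_approval(command: str) -> bool:
        """Return True if the command matches any destructive pattern."""
        return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)

    def run_agent_command(command: str, approved: bool = False) -> None:
        if needs_approval(command) and not approved:
            raise PermissionError(f"Refusing to run without approval: {command}")
        print(f"(pretending to run) {command}")  # real execution would go here

    run_agent_command("ls -la")                   # runs fine
    try:
        run_agent_command("DROP TABLE users;")    # blocked
    except PermissionError as err:
        print(err)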

There are a lot of standard failsafes that long predate AI. Like, none of it is running on my computer, it’s all running on a computer in the cloud. Every project is regularly backed up to GitHub. All our permissions are carefully curated by a dedicated team to mitigate the risk of accidents.

An agentic application ought to have additional controls. So, there’s a very long list of rules about what the AI can and can’t do. There’s a long list of files it can and can’t access. And when it does access files, that data does not get used to train future models.

At least, that’s what the AI providers claim, and of course they could be lying or overselling. But that’s not really my responsibility, nor my concern, since the risk is to the company, not to me. They pay infosec experts to think about it.

Ethics

Although the ethics of AI is a hot topic, almost all the attention goes towards AI art. If people think about AI coding at all, it’s often as an afterthought, relying on generalizations from art into programming.

People also often conflate the question of ethics with the question of efficacy. People say, AI coding is wrong because the code it generates is slop. Okay, but slop isn’t unethical, it’s just bad.

Suppose that AI coding results in worse outcomes. Maybe that study is right, and programmers are 20% slower when using AI. Maybe it also results in additional code debt. So, all these tech companies that are purchasing enterprise subscriptions to agentic applications are basically shooting themselves in the foot. So… what? Why are we crying for tech companies? I get that people are worried about the enshittification of social media. But let’s be real, what’s good for Facebook isn’t what’s good for society, so if Facebook makes a bad investment, then boo hoo for them. I work for a tech company, and my wellbeing is pretty directly tied to my employer’s wellbeing, but there’s got to be a separation, and even more so for mere users of social media.

Or take the question of jobs. To be quite honest, I don’t think this AI technology is good for our jobs. However, my belief is premised on the efficacy of AI coding. If AI is not good at coding, then I’m not worried about our jobs, at least not in the long run. (In the short run, I’m more worried about Trump’s economic policies; I think that’s actually what’s killing the job market right now.)

One of the contentions about AI art is that the AI is just copying stuff from around the internet and pasting it into a Frankenstein’s monster. That’s not really how generative AI works. But it kind of is how traditional programming works. Traditional programming literally involves copying a bunch of code around the internet and pasting it into a Frankenstein’s Monster. Everyone who programs knows this. It’s part of the day-to-day workflow, and it’s also built into the foundations of how coding even works. We put code into packages, and use package management software to make sure we have all the packages we need. And there’s a whole category of bugs where some of the packages aren’t the right version, or the versions are incompatible, etc. That’s just programming life.
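
For the non-programmers, here is a toy sketch of what that version bookkeeping looks like: a script checking whether the packages it depends on are installed at the expected versions. The package names and version numbers are made up for the example.

    # importlib.metadata is in the Python standard library (3.8+).
    from importlib.metadata import version, PackageNotFoundError

    requirements = {"numpy": "1.26", "pandas": "2.2"}  # hypothetical version pins

    for package, wanted in requirements.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed at all")
            continue
        if installed.startswith(wanted):
            print(f"{package}: {installed} OK")
        else:
            print(f"{package}: have {installed}, but this project expects {wanted}.x")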

So, arguing that plagiarism is an issue for AI coding is much more of an uphill battle than it is for AI art.

What about the environmental impact? I’m not going to rehash the discussion about environmental impact, and I don’t really have the relevant expertise anyways. My two cents: the environmental impact comes entirely from computation, and everything that programmers do is already computational. As a data scientist, my job is all about processing large amounts of data hosted in data centers.

The reason to be concerned about the environmental impact of computation is that it’s an “externality”. The cost isn’t borne by the corporation, but by the world at large. However, the cost isn’t entirely external. Corporations are explicitly billed for computation time. That’s how Amazon and other cloud computing providers make their money. So corporations would already like to reduce computation time if they could. It’s just a matter of not prioritizing it. You could get them to prioritize it more by implementing a carbon tax policy; that’s my favored solution, although it may also be bad for jobs.

Conclusion

This was a fairly positive post about AI coding. It just seems pretty effective in my experience. It’s impossible to ignore that the AI gets a lot of stuff wrong, and yet it still seems useful. I acknowledge and am concerned about the possibility that it is less effective than it appears, but I won’t believe it on the basis of a single study.

This seems to be a common opinion among professional programmers I know. But some people have more negative or positive opinions on the efficacy than others. And people who are not programmers are left to pick and choose whichever opinions they want to believe. Personally, I believe both the positive and negative experiences. It could be that it’s good at some things and bad at others.

Comments

  1. Dunc says

    I don’t really have anything to add to the main thrust of the post, as the truth is I’m not actually writing that much code these days, and what I am writing isn’t particularly amenable to AI assistance, at least not with the tools I currently have access to, but there’s something I keep seeing in these discussions that I would like to pick up on:

    For example, one thing it’s really good at is writing unit tests. This is slow and tedious work used to test for bugs. They’re common among many software engineers, although in my line of work we hardly ever use them! But with AI, it’s so easy. And this may result in less buggy code.

    This is a common attitude to unit testing, and definitely one that I started out with (as I think everybody does), but as I grew as a programmer, I came to the opinion that writing unit tests – and more importantly, really thinking hard about unit tests – was one of the most important disciplines for a good software engineer. I’m not necessarily talking full-on Test-Driven Development, but I definitely found that really thinking about testability made for much better design, and really thinking about specific test cases shone light on areas where the functionality or specifications weren’t really thought through enough. Writing tests shouldn’t be a tedious chore that you do at the end, it should be a core part of the process. If anything, I think I’d prefer to think hard about the tests, and then let the AI write the code to satisfy them.

    I’ve seen far too many badly designed unit test suites that technically achieved good code coverage, but didn’t really test anything important, because they’d obviously been written with the mindset of “this is my code, now how do I write something that looks like a unit test that achieves my code coverage target?” rather than “this is what I want to prove my code does, so how should I write it?”. So yeah, maybe not full-on TDD, but certainly more of a test-first mindset…

    I’m currently broadly sceptical of pretty much all claims about productivity (pro and con), because one thing I’ve learned over my time in the industry is that programmers are absolutely terrible at either estimating how long things will take or tracking how long something actually took them. And then there’s the thing that a programmer is usually the sort of person who’ll spend a day writing code to do a one-off task that they could have done by hand in a couple of hours, and call that a win.

    I do have massive concerns about code quality, having lived through several previous waves of automation and had to clean up the resulting messes… The whole premise seems to be that we need to write more code faster and not worry about the quality too much, whereas I tend to take the view that we need to write less code, but of much better quality. The problem with being able to generate a lot of code quickly that basically works is that it’s much easier to generate bad (but working!) code than good code, and then you’ve got a lot of bad code that you need to live with – and probably for much longer than you think. Maybe I have a different attitude here because I’ve stayed in one place for a long time, and have seen some really big, complicated projects right through their entire lifecycle, whereas a lot of programmers never have to deal with the long-term consequences of their earlier choices. (That’s an answer to your question about “if they don’t care [about code debt], then why should we?” by the way – because there’s a decent chance we’re going to have to fix our own mistakes one day.)

    I keep coming back to something Verity Stob wrote once, talking about whatever the latest hot process of the time was (I’ll have to paraphrase as I can’t actually find it): “Process obsession is an attempt to find a mechanical substitute for thought – but programming is thought, and can brook no substitute”. If we’re only using AI as an aid to thought, all well and good – but if we’re trying to use AI instead of thought, then I see trouble ahead.

    OK, that turned out longer than I had planned…

  2. says

    I want to highlight the point that it isn’t only software developers who program, because I don’t disagree with your points, but a lot of that doesn’t obviously (or obviously doesn’t) apply to my work. We’re basically maintaining an assembly line for bespoke data products. We have tools that we reuse, but we’re always rearranging them and rebuilding them as needed in the current moment.

    For example, I may need to write a SQL query to pull data. This is extremely project specific. And then even within the same project, something changes upstream, so now I need to rewrite it again.

    There’s value in having more durable code, but not for everything. And the human-written code has followed a pretty low standard for durability, which may not be so hard for AI to surpass.

  3. Dunc says

    Yeah, those are very good and valid points – “writing code” or even “software engineering” covers a diverse range of disciplines and specialisms, and what works in one area isn’t necessarily applicable to others.

  4. says

    this is the one thing i’d say on the subject – ai coding is huge for total non-programmers, like indie game devs with nothing but self education. if you don’t understand why the code it produces does what it does, you can ask it, and further your education. i’ve seen this work, in the hands of a person smart enough to clock errors and adjust for them.

  5. says

    “And some problems, it just doesn’t know how to solve, so it will talk around in circles proposing solutions that don’t work. When the AI starts going in circles, I learned to stop wasting time asking it questions, and fall back to more traditional methods.”

    Yeah, I get so annoyed with deepseek for this sometimes, the glib and cheerful way it keeps saying things that are just wrong, if it doesn’t know how to solve the problem. Really we should take the AI output as just a suggestion, rather than The Answer, and we decide which parts to use and which parts to ignore.

    “Traditional programming literally involves copying a bunch of code around the internet and pasting it into a Frankenstein’s Monster.” Love the use of “literally” here!

  6. says

    @Bebe Melange,

    I’ve also used AI assistance in a gamedev context (though not agentic). It’s quite useful since I had to learn a new programming language for that and I wasn’t familiar with common patterns. Also helped me debug issues specific to Steam’s backend. But it has a few weaknesses–it’s biased towards programming everything with text rather than using the game engine’s UI. And, it often makes suggestions based on the wrong version of the game engine.

    There’s an open question whether this could be used by a game dev who doesn’t know how to program at all. In game dev circles, sentiment is largely negative–not that you couldn’t or shouldn’t, they just don’t think the results will be good. But, it could depend on the approach and project. I have more confidence in someone trying to use the AI as an education tool rather than someone who just wants the AI to vibe code the whole thing for them.

  7. Hj Hornbeck says

    Siggy @7:

    I have more confidence in someone trying to use the AI as an education tool rather than someone who just wants the AI to vibe code the whole thing for them.

    As an educator of coding, having (possibly) seen AI used as an education tool, I have more confidence in someone using AI for vibe coding than for education.

    I didn’t drop by for that drive-by, though. I’ve got very different opinions on how well LLMs code, but those opinions were formed without actually using an LLM to write code. Even the code example mentioned in my blog post wasn’t generated by interactively asking an AI assistant to code it with me, but just a fluke due to Claude misinterpreting my instructions. It’s entirely possible I’d get better output if I actually bothered to set up and learn how to use an AI assistant, and while that can be a form of The Courtier’s Reply it isn’t always.

    My blog post does lean a bit on that METR study (so thanks for the alternative citations!), but it really places most of its weight on someone else’s blog post. That remains the most convincing argument I’ve seen to date, and even then it doesn’t show AI coding assistants are harmful. I’m not interested in arguing, though, but instead in what kicked off that blog post:

    I was an early adopter of AI coding and a fan until maybe two months ago, when I read the METR study and suddenly got serious doubts. … So, I started testing my own productivity using a modified methodology from that study. I’d take a task and I’d estimate how long it would take to code if I were doing it by hand, and then I’d flip a coin, heads I’d use AI, and tails I’d just do it myself. Then I’d record when I started and when I ended. … I ran that for six weeks, recording all that data, and do you know what I discovered?

    I discovered that the data isn’t statistically significant at any meaningful level. … That lack of differentiation between the groups is really interesting though. Yes, it’s a limited sample and could be chance, but also so far AI appears to slow me down by a median of 21%, exactly in line with the METR study. I can say definitively that I’m not seeing any massive increase in speed (i.e., 2x) using AI coding tools.

    Have you done any comparisons to using AI assistants against not using them? The real lesson I draw from the METR study is not that AI coding agents slow you down by 20%, it’s that the coding experts using them thought they were 20% faster when in fact they were 20% slower. It’s possible you’re making the same error, but it’s also possible your experience and coding projects are better suited to AI coding agents, and that your workflow is made better by including them.

    As an academic and educator in the field, that is of far more interest to me than any debate.

  8. says

    @HJ,
    TBH, I didn’t read your article when it was published, because the title is hyperbolic and I’m allergic to clickbait. Even assuming AI coding is 20% slower does not mean that it is unable to code.

    I’d seen the other blogpost by Mike Judge, though. It’s interesting. I think it would be difficult to set up that experiment in my line of work.

    As that author pointed out, we should see an uptick in the number of new apps for mobile phones, and games published to Steam.

    Oh, people have been talking about the uptick in games for years and years, you just haven’t been paying attention. Nothing to do with AI though. I think Mike Judge is overestimating how obvious the effect of AI should be.

    BTW, I don’t vouch for any of the studies on the effectiveness of AI coding. They’re literally just the top results from Google Scholar. Some of them looked limited, or like they might already be out of date.

  9. Hj Hornbeck says

    Siggy @9:

    TBH, I didn’t read your article when it was published, because the title is hyperbolic and I’m allergic to clickbait. Even assuming AI coding is 20% slower does not mean that it is unable to code.

    Oh no, the headline is accurate. If I may quote myself:

    A computer program is usually the work of multiple authors, and the requirements it has to satisfy often change over time. If the other author put a lot of thought into the higher-level aspects of the code, it becomes easier to add or modify features, or to at least spot when a re-write would be needed. If they’ve instead tossed off a plate of spaghetti code, you waste a tonne of time merely figuring out what’s going on, and often it’s less effort to remove that code and rewrite a cleaner version from scratch. The low-level implementation code flows from a high-level understanding of the problem, and without the latter for guidance the former often becomes an obstacle.

    But this implies that if Claude and other LLMs lack a high-level understanding of what a program is required to do, then they can’t handle the lower-level busywork on your behalf. The code they generate would be worthless, even if it works. At best they’re merely a faster search engine for code, copy-pasting in algorithms other people have written and massaging the variables to (hopefully) mesh with the existing codebase. But at worst, you get exactly what those developers in the METR study found: code that superficially looks fine, but on deeper inspection has serious flaws that demand more effort to correct than code written without LLM assistance.

    If the bit about search engines seems to come out of nowhere, that’s because in prior posts I make the case that LLMs are heavily dependent on memorization. Since it’s almost certain most of these general-purpose LLMs are trained on scraped web pages, AI coding agents are essentially just copy-paste-massaging snippets from Stack Overflow or Github gists. As my last sentence puts it, “if an LLM is no more useful than a search engine at generating code, it’s fair to say it cannot code.”

    I think it would be difficult to set up that experiment in my line of work.

    Dang it! Oh well, you don’t know until you ask.

  10. says

    @HJ,
    Sorry, I continue to find your post hyperbolic. It’s very difficult to take seriously. I respect you, but I have a very negative opinion of your argument, to the extent that I’m unwilling to engage with it.

  11. Dunc says

    The real lesson I draw from the METR study is not that AI coding agents slow you down by 20%, it’s that the coding experts using them thought they were 20% faster when in fact they were 20% slower.

    This comes back to something I said earlier: “programmers are absolutely terrible at either estimating how long things will take or tracking how long something actually took them”. It also reflects the fundamental issues with self-experimentation, which is why we don’t accept “I took x and felt much better!” as valid evidence for the effectiveness of proposed medical interventions.

    A couple of other points I’d take issue with from the OP, having thought about it a bit more…

    In a large organization, the problem is that programming speed is measurable, while code debt is not.

    I’d argue that “measuring programming speed” is actually one of the big unsolved questions in software engineering, at least if you want to do it in any meaningful way, and we definitely do not have any good solutions so far. You can produce various metrics that might look like they measure something, but they all have really fundamental problems and are trivially gameable. This is one of the reasons why people keep inventing new methodologies every few years… Even at the highest and coarsest level, it’s surprisingly difficult to properly answer seemingly trivial questions like “how big is this project?” and “how long did it take?”

    Traditional programming literally involves copying a bunch of code around the internet and pasting it into a Frankenstein’s Monster. Everyone who programs knows this.

    Look, I don’t want to sound nasty here, but… that’s how bad programmers code. The examples you find on Stack Overflow or in the vendor documentation are never fit for actual production use – they can be useful for illustrating an approach for some specific element of some problem, but that’s as far as it goes. Yes, obviously I know a huge amount of this goes on, but that’s a big part of the reason why so much of the software we have is such absolute shit, and is not a state of affairs we should just accept.

  12. Dunc says

    Oh, I maybe should also have mentioned… Just the last couple of days, I’ve been doing some work in an environment that does have agentic AI assistance available, and I have to say it’s been very useful. I haven’t tried using it for anything actually difficult yet, but it’s done a great job of rewriting some really badly structured SQL into something much more reasonable, which is exactly the sort of tedious, error-prone job I want an AI to take off my hands.

  13. says

    @dunc #12,

    Traditional programming literally involves copying a bunch of code around the internet and pasting it into a Frankenstein’s Monster. Everyone who programs knows this.

    Look, I don’t want to sound nasty here, but… that’s how bad programmers code.

    Yeah, well, it’s a common joke, isn’t it, that programming is just copy/pasting? It’s a bit glib though. How often do I actually do a Google search, and how often does that result in copy/pasting a code snippet? How often is the code snippet longer than one line, and how much of it do I change immediately after copy/pasting? Reflecting on my work, more than anything, I’m copy/pasting my own code, and that of colleagues.

    Anyway, in the context of the OP, this is part of my argument about ethics. So, I don’t need to argue copy/pasting code from the internet is universal practice, or even that it’s good practice. Just that it’s normalized, and plagiarism is far away from the top concern. And that’s why all the arguments about AI art don’t really apply here.

    I’d argue that “measuring programming speed” is actually one of the big unsolved questions in software engineering, at least if you want to do it in any meaningful way, and we definitely do not have any good solutions so far.

    Yeah, because nobody knows how long a task “should” take, right? And it’s hard to say when a task begins and ends. That’s a major problem with the METR study, and also why I think it would be really difficult to personally replicate as HJ suggests doing.

    But where you can “see” my programming speed by noticing how long it takes to complete a project, nobody can really see my code debt. And like, I’m not paid to reduce code debt. Proposals that would reduce code debt get shot down all the time!

  14. Dunc says

    Anyway, in the context of the OP, this is part of my argument about ethics.

    Ah, sorry, I missed that context, and only really skimmed that section… The whole question of plagiarism in code is one I’m not really that bothered about, generally. While I’m not a FOSS fundamentalist, I’m more inclined towards the view that we should share good code, rather than hoard it. I’ve occasionally published stuff for that very reason, and I just hope somebody found it helpful.
