1. 82
    1. 81

      Having reviewed PRs that were cowritten by ChatGPT and having maintained code that was cowritten by ChatGPT, I’ve found that AI-generated code tends to be less clear, less performant, and less accurate than code written by hand. What I’ve noticed is that it’s the developers with the shallowest understanding of the projects, the technologies, and the needs of the org who are using ChatGPT the most enthusiastically. At the same time, these developers also tend to crush Jira tickets the fastest, and are generally liked by people outside of engineering.

      These findings strike me as obvious, but at the same time, I’m really excited about this research! Research of this kind is very useful for communicating to non-engineers the impact that ChatGPT has on the effectiveness of our engineering department. I’ll definitely be sharing this with my colleagues.

      1. 23

        At the same time, these developers also tend to crush Jira tickets the fastest, and are generally liked by people outside of engineering.

        Completely true, though not new to AI. The same phenomenon has always existed in the form of “fast, sloppy” vs. “slower, good”, and the same incentives have always made “fast, sloppy” more popular with management. Throw out a few phrases like “gets the job done”, describe sloppy as “balancing delivery time and quality” or “a pragmatic approach”, and you’re golden.

        1. 28

          John Ousterhout calls such menaces Tactical Tornadoes.

          Put in the wrong place they’re just the worst: they crank out kinda-working code faster than everyone else can fix it, they look very productive but actually have negative net productivity, and their colleagues look bad in comparison and are more likely to be driven out of the team. Keep them long enough and cancelling whatever project they’re working on is a mercy for everyone else involved.

          They’re great at quick & dirty prototypes though. And if you take over early enough, their under-engineering approach can actually be simpler and more maintainable than many attempts at future-proofing from the start.

          1. 6

            They’re great at quick & dirty prototypes though.

            I’ve noticed this in myself. I struggle with prototypes, beyond the most rudimentary proof-of-concept stuff. I blame this partly on the fact that I’ve never actually seen a prototype thrown away, so I assume my code will go to production. But there’s more to it than that. I’m actually just kind of bad at it, and I’ve come to appreciate devs who can knock out a prototype really quickly. We’d all be better off, though, if managers would more reliably assign the right people to the right jobs :-)

        2. 15

          The thing is, I have a shallow understanding of lots and lots of parts of my work. Build tools, Bash esoterica, Jenkins Groovy, Rake, Grunt, Docker, soon enough Kubernetes as well. I’m not shallow - I have true expert knowledge of my core competencies - but the domain of modern programming is so big that everybody in enterprise space is shallow in most areas of their work. That’s where the ChatGPT value proposition really shines: saving half an hour of increasingly frantic googling on tasks where quality genuinely does not matter more than succeeding at all, and where nearly all the effort goes into just discovering a workable initial approach. These things are not cognitively difficult, they’re simply obscure, and that’s GPT’s strength.

          1. 35

            hmm, I’ve definitely found that people who use ChatGPT are more cavalier about introducing tools that they and other people don’t know and haven’t used in production, because they don’t expect to have to understand their work. E.g., I was in a room where Groovy specifically was being introduced when people were already succeeding with Bitbucket Pipelines. When I said “yeah, I think it’s beyond our learning budget to add this”, the response was “you don’t have to understand it, you just have to use it; just ask ChatGPT”. It wouldn’t surprise me if, overall, the existence of ChatGPT increases the number of tools and technologies you have to deal with.

            1. 1

              I’m telling people to use ChatGPT too, but it’s not as if ChatGPT is why we use these tools - it didn’t even exist when all this stuff was introduced. “Learning budget” is a nice idea, but I’ve yet to see it prevent the introduction of a new technology.

              1. 16

                I’ve seen “let’s not add that tool, it’s too much to learn for what we’ll get out of it” used as a line of reasoning very frequently by every high-performing team that I’ve been on. The reluctance to adopt this line of reasoning is something I’ve encountered on every single low-performing team I’ve been on, and I think the relationship is causal. “Learning budget” is a parallel concept to “innovation tokens”: https://mcfunley.com/choose-boring-technology

            2. 27

              tasks where quality genuinely does not matter

              I mean, as long as you’re honest about it, sure. But how often do you encounter such tasks professionally?

              And how often will you be pressured into treating it like a task where quality doesn’t matter when actually it does, because someone wants to use an LLM as a replacement for understanding the problem?

              1. 9

                These things are not cognitively difficult, they’re simply obscure, and that’s GPT’s strength.

                My experience - sample size of one, so YMMV - is that GPT was most prone to hallucinating the obscure stuff.

                1. 3

                  So maybe that long list of stuff is part of the problem, not part of the solution? You need one build system, and you can learn that well. You don’t need a dedicated CI system. It’s just a program doing asynchronous work. You probably are already writing other tasks like that. Just write one to run your compiler and tests. You probably don’t need Docker, just a static build and a policy on how your binary and config files will be laid out. And you probably don’t need k8s either unless your system is big enough and bringing in enough money where you can have a dedicated ops team for it.
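
                  To make that concrete, here’s a minimal sketch (Python, with hypothetical “make build” and “make test” targets) of a “CI system” as just a program: poll the repo, run the pipeline on each new commit, report the result.

                  ```python
                  # Minimal sketch of a "CI system" as a plain program. Assumes a git
                  # checkout whose build and tests run via hypothetical "make build"
                  # and "make test" targets.
                  import subprocess
                  import time

                  def current_head() -> str:
                      result = subprocess.run(["git", "rev-parse", "HEAD"],
                                              capture_output=True, text=True, check=True)
                      return result.stdout.strip()

                  def run_pipeline() -> bool:
                      for step in (["make", "build"], ["make", "test"]):
                          if subprocess.run(step).returncode != 0:
                              return False
                      return True

                  last_built = None
                  while True:
                      subprocess.run(["git", "pull", "--ff-only"])
                      head = current_head()
                      if head != last_built:
                          status = "OK" if run_pipeline() else "FAILED"
                          print(f"{head[:10]}: {status}")
                          last_built = head
                      time.sleep(60)  # poll once a minute
                  ```

                  Notifications, history, and parallel workers are all incremental additions to an ordinary program like this.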

                  1. 1

                    “We have 40 microservices using three different build systems. We should all unify it into a simple system to reduce confusion.”

                    A year later: “We have 50 microservices using four different build systems…”

                    That said: Docker and Jenkins are actually enormous timesavers. In effort-to-benefit terms, these tools have paid off a dozen times over. They’re not as good as they should be, but they’re damn good.

                    1. 2

                      Oh, certainly, if you already have a lot of these things, you’re not getting off them any time soon.

                      Docker probably pays for itself in developer environment setup. In production I feel less sanguine. Jenkins, on the other hand…maybe I’ve spent too much time fighting with it and other dedicated CI systems to have much sympathy for them.

                2. 4

                  My problem is: I have a pretty good understanding of most of the things I do professionally, and I’m decent at googling or finding answers in documentation. I really don’t feel the need to ask an AI unless I 1) don’t know the answer and 2) a quick look at the docs or Google proves fruitless. However, those things are usually obscure enough that LLMs are completely useless. So in a sense, a problem is either so easy that it’s easier to do it myself than to ask ChatGPT, or so difficult that ChatGPT doesn’t know either.

                  For the difficult problems, finding people who know more than me on IRC and asking them is infinitely more valuable than any LLM.

                  1. 1

                    I found the opposite. People who refuse to use ChatGPT and related technologies often don’t have any rational reasons for it, and tend to be the most sluggish these days. It’s easy to overlook the boost these tools give you, because after a while you only notice it when somebody lags behind and hasn’t started using them yet.

                    Of course, there are people who type a short and boring prompt, then copy-paste whatever it spits out without reading it, and think it’s going to automate their copywriting and other soulless tasks with a minimum of effort. Utilising AI is a skill you need to acquire, like anything else. Prompt engineering - knowing what to ask for, which parts to use, and how to use it most effectively - is crucial.

                    1. 42

                      did you read the whitepaper

                      1. 5

                        Utilising AI is a skill you need to acquire, like anything else.

                        Where’s the course? I don’t want to waste my time figuring this out on my own, and risk falling into various correctness and copyright pitfalls in the process.

                        1. 1

                          There’s no course. You need to figure it out for yourself.

                    2. 20

                      I have found ChatGPT to be absolutely useless when it comes to anything I have deep expertise in. It’s not just that it’s wrong; it’s useless. It doesn’t understand the complexity and the subtlety of anything I am doing or asking of it.

                      On the other hand, I managed to write some Java code for a demo with ChatGPT, having never used Java before. It took me 15 minutes with ChatGPT, and it would have probably taken me hours to do it without it.

                      It is absolutely clear that the Java code I (it) wrote is absolute garbage. It didn’t matter in this case, it was just for a PoC, but in general it matters a lot.

                      It seems to me that ChatGPT can’t help you at all if you know what you are doing, but it can “help” if you don’t know what you are doing. The fact that so many people think it helps them suggests to me that people do not, in fact, know what they are doing. And that they can now churn out poor-quality work even faster scares me.

                      1. 3

                        I don’t do Rust, but there was a post the other day about using explicit taking and returning instead of borrowing, so I went back and forth with Claude to write a stub that worked like that. I don’t know whether what I ended up with was idiomatic or not. I ended up doing a lot of simplifying of the code it spat out, which didn’t compile anyway. It helped that I could keep compiling the code and pasting in error messages to refine it, and I wasn’t using unsafe, so I trust that if what I wrote compiled, it was probably correct for such a simple case.

                        On the one hand, as a learner, I like that I can get a stub to start playing around with from an LLM versus having to Google to try to figure out which specific library or function I need. On the other hand, based on other code I’ve seen in languages I know better, I worry that I’m ending up with subtly bad or broken code and just not knowing it.

                      2. 17

                        Lots of people are focused on AI’s ability to increase the quantity of code, rather than its quality. I’ve found that, rather than depending on AI to write code for me, pasting code I’ve just written into a chatbot with a prompt like “any way this could be improved? do not rewrite it for me” is much more useful.

                        Occasionally it’ll point out ways that my code could be made clearer and help me break out of the mindset you get into once you’ve been staring at the same thing for hours.

                        1. 1

                          That’s just one of the ways it can be used. Of course, if you mindlessly accept anything it spits out and don’t critically evaluate whether it’s helpful, it’s going to lower quality, but I don’t think that programmers are the kind of people who use their tools this way.

                          The metrics they use in the article can also be interpreted as an overall increase in the speed of code editing. Some of the trends they noticed don’t necessarily have to reflect what they assume they’re showing.

                          Spammers might be muddying the waters, and looking at all of the public code might not be very helpful when assessing whether AI tools can be used productively when you put some thought into it.

                          1. 20

                            The problem I’ve found with Copilot is that it produces code that looks right. It gets comparisons the wrong way around, flips the order of variables in parameters, and does all of the things that I’d do in an underhanded C competition entry: things that look right on the first or second reading but contain subtle bugs. I have now been using it for a couple of months and I think I have still spent more time tracking down bugs that were caused by thinking I understood what it had done but missing that it was almost right than I have saved from being able to enter code faster.

                            1. 2

                              I would blame your test suite. </s>

                              Kidding, but not kidding: a good test suite would notice all, or at least most, of those underhanded LLM mistakes. So you know, fairly early on, that something was broken, and as you track it down you can remember where your code came from and blame the tool’s output.

                              But if you instead did some “normal” quick manual testing & minimal unit tests (no fancy property based testing!), you would have “avoided” those bugs, and by the time they’re found nobody would think of blaming your tools. Heck, even if they could blame you, they’d easily forgive you since you’re cranking out so much more code than grumpy laggards like me.
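
                              To make the “fancy property based testing” jab concrete, here’s a minimal sketch using Python’s hypothesis library. merge_sorted is a hypothetical stand-in for the kind of almost-right code an LLM produces; the properties catch a flipped comparison no matter who wrote it.

                              ```python
                              # Sketch of a property-based test with the hypothesis library.
                              # merge_sorted is a hypothetical example of LLM-ish code: plausible,
                              # and easy to get subtly wrong (e.g. by flipping the <= below).
                              from hypothesis import given
                              from hypothesis import strategies as st

                              def merge_sorted(xs: list[int], ys: list[int]) -> list[int]:
                                  out, i, j = [], 0, 0
                                  while i < len(xs) and j < len(ys):
                                      if xs[i] <= ys[j]:
                                          out.append(xs[i])
                                          i += 1
                                      else:
                                          out.append(ys[j])
                                          j += 1
                                  return out + xs[i:] + ys[j:]

                              @given(st.lists(st.integers()), st.lists(st.integers()))
                              def test_merge_sorted(xs, ys):
                                  result = merge_sorted(sorted(xs), sorted(ys))
                                  assert result == sorted(result)           # output is sorted
                                  assert sorted(result) == sorted(xs + ys)  # nothing lost or invented
                              ```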

                              1. 9

                                Seems a bit circular to me - someone’s gotta write the test suite, and there’s no reason as far as I know to expect the LLM to be any better at that than at writing the code. There’s an argument to be made for test-driven design or whatever it’s called* but I think that’s a completely separate thing.

                                * I am not a professional developer so it’s always possible that I’m using terminology wrong. This also means that I might misunderstand where the bottlenecks are in a real dev’s workflow; I know LLMs don’t help me but I am not representative anyway.

                                1. 4

                                  I wasn’t thinking of writing the test suite with the LLM, just writing it with various degrees of conscientiousness (TDD or no TDD):

                                  • Good test suite: LLM looks bad because we can blame it. But it’s probably not that bad, since we do catch its errors.
                                  • Sloppy test suite: LLM looks good because we catch bugs later, when we no longer know to blame the tool. All we see is that we’re producing more code. But it’s probably the worst, because catching bugs later is actually more expensive.

                                  I might misunderstand where the bottlenecks are in a real dev’s workflow

                                  Common wisdom is, it’s not typing. Typing represents maybe 10% of all work, and that feels like an upper bound for someone who types fairly slowly like me (I just measured 45 WPM in a random online test). Much more important is reading code, reading API docs, reading (and sometimes writing) requirements, thinking about design… there’s an argument to be made that LLMs may help us with more than typing, but personally I wouldn’t trust them with anything other than telling me about some API I didn’t know about.

                                2. 2

                                  Talking about test cases, I’ve been asking ChatGPT to generate unit tests (with pytest). While it worked well 90% of the time, I’ve had to spend a fair bit of time debugging whether it’s my code that is wrong or the ChatGPT-generated test cases. It’s usually the latter.

                                3. 1

                                  This has also been my experience of using ChatGPT 3.5 for coding. Maybe 4 is better.

                                4. 2

                                  but I don’t think that programmers are the kind of people who use their tools this way.

                                  You’ve been very lucky with the people you work with to be able to have this illusion.

                              2. 34

                                Who is shocked by this?

                                1. 34

                                  I have never been less shocked, but it is still good to actually do the study. Otherwise it’s too easy to fall back on gut feeling - for example, all the CEOs who want return to office because they “have a feeling it’s better.”

                                  1. 25

                                    And much like how countless companies are ramming RTO down everyone’s throats in spite of data proving they’re killing their companies by doing so, we will never hear the end of this “AI” nonsense until they’ve unemployed a huge chunk of the industry and forced wages significantly down on the rest of it, no matter what data is shown. Which was, IMO, a goal of these layoffs and the “LLM for code” craze all along.

                                    Companies are “data driven” until the data contradicts leadership’s vibes. We’ve seen this so many countless times in the past decade.

                                    1. 18

                                      Companies are “data driven” until the data contradicts leadership’s vibes. We’ve seen this so many countless times in the past decade.

                                      As a data engineer, I have talked to analysts and data scientists who had to hide or drop findings when they disagreed with leadership’s gut feelings. It’s a pretty nasty place to be.

                                      1. 10

                                        Ellen Tenenbaum and Aaron Wildavsky published “Why Policies Control Data and Data Cannot Determine Policies” forty years ago and it remains as correct as it ever was.

                                  2. 8

                                    Having used Copilot for around 18 months now, I find that the code it writes is a reflection of the code you already have. It’s brilliant for well-documented, well-tested, clear codebases that use idiomatic code and good practices. At times it can write 30-50 lines of code at once, within seconds, at a quality way above what I can expect from the average (and often above-average) open source contributor. Most of the time it writes code I would have written anyway, sometimes noticing problems I missed.

                                    Copilot is not a tool that either writes all your code or none of it. You use it to augment your own coding. It’s also great for refactoring, editing, navigating unfamiliar codebases faster, etc.

                                    When pair programming with my team, I found that I no longer have the patience to pair with people who don’t use Copilot, because they need a lot of time to manually write trivial or obvious code that Copilot would have automated. Suddenly having to type block after block manually is a huge chore I can’t stand. For these reasons, I mandated Copilot department-wide. I haven’t measured the exact speedup, but I can give a ballpark figure of 20-30% faster programming. You no longer have to think about the tedious parts and can save your brain for interesting and challenging problems, which is what all programming ought to be.

                                    1. 56

                                      I mandated Copilot department-wide

                                      I respect your right to rule your department however you wish, but thank you for putting a new thing on my radar to screen for in interviews - this sounds like pure authoritarian hell to me, personally. Somehow this manages to be worse (to me!) than “I mandate the use of VSCode by all my engineers”, something a distant acquaintance decreed for their team once.

                                      1. 3

                                        I find it no different from deciding to use any other tool in a standardised way. For example, we also use Slack, Jira, Google Docs, and many other tools. It’s an obvious productivity boost, and the faster we adopt it, the faster we can beat the competition that is more set in its ways or otherwise reluctant to be in the early adopter/early majority crowd.

                                        1. 29

                                          Sure, and I get grumpy about a lot of those tools, too. Slack eats my RAM, Jira is one of the most obtuse and labyrinthine pieces of software I’ve ever been forced to use, etc.

                                          The thing I find manager types tend to not account for is the productivity drop for people who don’t align with the tool in question. If someone doesn’t want a robot telling them what bug-ridden code blurb should get inserted here, they’re prone to getting slowed down or pissed off by the noise. It’s the same reason I turn off the “Press Tab to auto-complete this sentence in your email” nonsense - no “AI” has ever figured out my style of talking or typing, and the recommendations are almost always varying degrees of (1) off-base entirely, or (2) not the style I’d write in. So I disable them - otherwise I have to actively ignore them every time.

                                          So you do you, I’m just thrilled to not work somewhere that assumes they know how I should populate a text file better than I do after so many years of doing exactly that, quite well, and often quite quickly.

                                          1. 3

                                            That’s not my and my team’s experience with Copilot. It’s a far cry from “bug-ridden code blurbs”. There are people who oppose all sorts of technological progress and productivity boosts, even among programmers - I’ve talked to people who swore that autocomplete or syntax colouring slowed them down and was a nuisance. I don’t treat these complaints seriously, particularly after I saw how unbearably slow and tedious work is without these tools.

                                            1. 24

                                              Sure - but if you’re going to mandate something, this presumes you’re willing to fire them over not doing that thing. And I’d be quite surprised if it’s worth firing an experienced engineer who prefers, say, snippets (whether those are automated through e.g. LSP, or just copy-pasted from a personal library of common blocks they keep in a text file somewhere) over hooking up to a cloud service and asking it what to write. Sure, using neither of those might be slow, depending on the engineer and their experience level. But a hard line of “use this one proprietary tool to achieve the ends of churning out common blocks faster, or go be homeless for all I care” just doesn’t sit well with me.

                                              Again, you do you. I’m thankful we had this conversation, if perhaps so that I know what types of managers are now out there. I can save myself, and those managers, plenty of time by filtering early against this type of management. Legitimately - thanks for being open with this opinion.

                                              1. 3

                                                Hey, no need for the passive-aggressive tone. I like to think that I hire cooperative people, and team players, so I don’t think that I’d ever threaten anyone with firing over a productivity tool, and I don’t think that anyone on my team would even make any fuss over this. Is this a normal course of action for you? Why “go be homeless”? Aren’t you a little too dramatic?

                                                1. 37

                                                  It’s a blunt reality: mandating a thing implies enforcement, and for a manager that includes up to the threat of termination. This isn’t meant to be “too dramatic”; it’s the reality of your position of power - and for what it’s worth, whether your reports phrase it that way or not, this reality is something many of them have in the back of their minds, consciously or subconsciously, when any decree (not just Copilot) about how they must work rolls out. There have been many concerns in my career that never made it to the ears or eyes of a manager - because I knew the alternative was seeking other work. And sometimes your reports will seek that alternative work on their own terms, rather than risk you telling them they have to, now on a time crunch.

                                                  If you happen to have a team that is genuinely, entirely on board with Copilot, great! Congratulations! If you manage to keep hiring only people who are also genuinely on board with Copilot, cool! I’m glad your team meshes well. But I do want to offer the above asterisk about what “cooperation” can sometimes mean in a capitalist hierarchical environment, as food for thought, and as context for why my above comment was worded the way it was. Again - I’m happy for you if your team is truly and honestly on board and reaping the benefits of these decisions.

                                                  1. 1

                                                    You might be used to very abusive workplaces, and think this is normal. I don’t work in a corporation, so we have a very different work environment than you imagine, and very different relationships. No programmer is threatened with homelessness, even when losing a job, and there are many other ways of working together than constantly threatening each other. If your job is as you describe, then you should really consider seeking something better. There’s no shortage of companies that are not run like gulags.

                                                    I have no people on my team who have ever felt threatened by anything I do, and likewise they would never consider leaving just because of a tool. There’s a whole spectrum of communication techniques before you have to resort to firing. In fact, there was only one case in the last 5 years where I had to do it.

                                                    1. 17

                                                      I have no people on my team who have ever felt threatened by anything I do,

                                                      If you truly believe that, I think you might be in serious need of a dose of self-awareness.

                                                      It’s one of the first things you need to learn as a manager: your very position is threatening, and there is nothing you can do about it, except accept that this is so and act accordingly.

                                                      Deluding yourself into thinking this is not so because you are a nice person, or even more laughably because the company is nice, makes it worse. A lot worse.

                                                      1. 2

                                                        I think you’re mistaken about the role of a manager. You, like many people here, might be so used to abysmal working conditions, that you convinced yourself that’s all there is. There are genuinely nice workplaces out there and I encourage you to seek them out. It doesn’t need to be a dog eat dog world.

                                                        1. 13

                                                          I completely agree with @mpwheiher’s analysis. If you are a manager, there is an unavoidable power dynamic. Even if you are not consciously doing anything to intentionally exacerbate it, it is there.

                                                          A lot of the good management training is about how to recognise the kinds of behaviours that arise from this and actively mitigate any potential damage from them. If you are not actively doing that, then you are probably making it worse in a dozen subtle ways every day.

                                                          1. 11

                                                            I think you’re mistaken about the role of a manager.

                                                            I did not talk about the role of a manager. My opinion is that the role of a manager is largely to act as a “VM” for their team.

                                                            What I did talk about was the position of a manager, and that position comes with power dynamics whether you want it to or not. As a manager you need to be aware of that.

                                                            You, like many people here, might be so used to abysmal working conditions,

                                                            No. I’ve had good working conditions and bad working conditions. I’ve been a manager and I learned things about being a manager.

                                                            What I am talking about has nothing whatsoever to do with nice or non-nice workplaces and nice or non-nice people. That is a common delusion: “but I am a nice person, so things are good”.

                                                            No they’re not, not if you’re not aware. In fact, genuinely nice people who are unaware of these dynamics are often worse than actual non-nice people who are. A lot worse.

                                                            You need to realize that, because of the unavoidable power-dynamics, the feedback you are getting from your people will be skewed. Once you are a manager, your jokes will be funnier and people will like you more. That’s not disingenuous or devious, that is how humans are wired.

                                                            1. 0

                                                              I have a great team and very smart and happy people working for me. It says more about you than about me, that you think the above is normal and expected. When you’re so deeply immersed in abuse that it seems like an unavoidable status quo it’s time to do some soul searching. You must be coming from a completely different, alien work culture.

                                                              1. 8

                                                                The vote ratios on all of the comments in this chain say otherwise. Multiple people have been attempting to explain this to you from multiple angles, and scores of onlookers agree with what they’re saying. Take the hint.

                                                                1. 1

                                                                  Truth is not a matter of referendum. That’s why I have post scores hidden with ublock, they cause brain damage. I know better what kinds of people work for me and how they behave than internet randoms.

                                                            2. 7

                                                              The existence of the power dynamic is not inherently abysmal, bad, or undesirable. It’s just how managerial roles work. It’s not an accusation to say that your direct reports are aware of the power you have over their livelihood and adjust their behavior accordingly. A good manager does not fight the dynamic, but does account for it in their decisions.

                                                              I know a lot of people in tech are uncomfortable with power - it’s common to have had bad experiences or to self-identify as one of the workers. That might not be the case for you in particular, but whatever the reason, I have a hard time believing that there’s no recognition of power dynamics in your group or that such a thing would even be desirable.

                                                              I’m not prepared to say that anyone who ignores the power dynamic is automatically a bad manager. It is entirely possible that you are correct that you have very happy people working for you; you are a better judge of that than any of us who don’t know you. There are a lot of different factors that go into good management. But it will almost always make for better management to recognize the power difference.

                                                          2. 15

                                                          That really depends on whether the country you live in guarantees a minimum standard of living, and how much you can count on your family to support you if things go bad. Without either of those things, your only choices are:

                                                            • Comply with the requirements of some workplace.
                                                            • Be your own boss (that usually requires capital most people don’t have).
                                                            • Burn your savings until you become homeless.

                                                            If you enforce anything (and you pretty much have to, there’s a job to be done), people know that failure to comply will cause problems: you’ll inevitably hold it against them, you’ll be less likely to raise their salary or promote them, and more likely to fire them if there’s a serious enough downturn. Heck, even mere suggestions might be interpreted as enforcement by some, especially if they’re on probation.

                                                            And there’s nothing you can do about it.


                                                          I have an anecdote about someone who didn’t see it that way. She was the director of Human Resources, and as such had tremendous power, most notably the power to veto any hire, and a reputation for using it.

                                                          So we were working in an open plan, over 50 people on the same floor (I counted), and it was quite noisy. There weren’t even any walls between the working desks and the central area where the coffee machine and two meeting rooms were. I complained twice in our Scrum retrospectives, and the second time they couldn’t ignore the people playing table soccer almost right next to us. Thus, I was asked to report this problem to HR and make my suggestions known (raising a few walls, hanging noise-dampening panels from the ceiling…).

                                                            I wrote my email, and took the opportunity to warn the HR director about asking people directly: since she holds so much power over them, their answers might be less than truthful.

                                                          Later I got feedback from my Product Owner, telling me she ended up asking around exactly as I’d said she shouldn’t, and found that nobody had any problem with the noise in the open plan. Apparently she was mildly pissed about my email, and he begged me to tone it down so I could be hired (I was contracting at the time, he wanted me on the team, and feared HR’s reaction over this).

                                                          So. Predictable. I wish I could have looked her in the eye and explained this basic fact about human relations under power imbalances. Especially there, in this fast-growing branch of the company, where most people were either fresh hires still on probation or contractors hoping to get hired. My guess is that when she asked “do you feel there’s too much noise in the open plan?”, most people answered something along the lines of “I’m feeling good working for this company”.

                                                          And of course they would. No matter how friendly a face she puts on, there’s nothing she can do about being the boss, having power over people, and people being cognisant of it.

                                                            (Now to be honest I believe many people really didn’t have any problem with the noise level. Especially the youngest ones who had yet to know any other workplace.)

                                                        2. 19

                                                          I don’t think that I’d ever threaten anyone with firing over a productivity tool

                                                          By mandating the use of Copilot you are communicating that yes, you are willing to fire over the decision to not use Copilot when authoring code. If that were not the case, why issue a mandate at all?

                                                          Software engineers are people, and people have preferences for how they work through problems. I personally find Copilot extremely distracting, both solo and when pairing. It is net productivity loss for me and I definitely don’t want to pay for it, let alone be forced to use it because someone else likes it.

                                                          1. 1

                                                        Threatening to fire everyone over minor disagreements is not normal, and it’s not how most software companies or startups function. If yours does, then you’ve got bigger problems than the particular tools you use. That’s why they say that communication is a skill. Now I see why Copilot works better in environments where clear communication is nurtured and encouraged - I had no idea anyone would consider it a normal course of action to fire someone over something so trivial.

                                                            1. 40

                                                              If someone refused your mandate, what would you do?

                                                              If your answer is “do nothing”, is it a mandate?

                                                              1. 10

                                                            Based on what you are saying, there is an easy language adjustment I recommend making: change “mandate” to “encourage” or “support the use of”. This communicates “hey, we like this tool and encourage our developers to try it” rather than “as a developer here you must use this tool”.

                                                        3. 3

                                                      It’s interesting that you mention autocomplete and syntax colouring, because that’s the first thing I thought of when I saw that research: the effects might be the same as those of autocomplete and syntax colouring.

                                                          I do use autocomplete, and syntax colouring, and copilot. They all make me faster. And I think I agree with your argument regarding pairing!

                                                          However, from time to time, I write code in nano, with syntax colouring turned off, no autocomplete and in a 80x24 terminal window. I used to do that a lot 10-15 years ago, less today. But I am pretty sure some of my most elegant code has been written this way.

                                                      Copilot is very good at things such as efficiently passing 6 or 7 keyword arguments to a function. Tab tab tab, done. It’s a real productivity booster.

                                                      But if you write code in a small window with no copy-paste, you’re going to make sure you never need to pass 6 or 7 keyword arguments to a function, since they would either overflow your right margin or eat up half your vertical space.
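
                                                      For illustration, with made-up names: the refactoring that constraint pushes you toward is bundling the knobs into one small object instead of threading seven keyword arguments through every call.

                                                      ```python
                                                      from dataclasses import dataclass

                                                      # Hypothetical example: instead of render(text, font, size, colour,
                                                      # align, wrap, dpi) at every call site, bundle the knobs together.
                                                      @dataclass(frozen=True)
                                                      class RenderOptions:
                                                          font: str = "monospace"
                                                          size: int = 12
                                                          colour: str = "black"
                                                          align: str = "left"
                                                          wrap: bool = True
                                                          dpi: int = 96

                                                      def render(text: str, opts: RenderOptions = RenderOptions()) -> str:
                                                          # Stub renderer: a real one would honour every field of opts.
                                                          return f"[{opts.font} {opts.size}pt {opts.align}] {text}"

                                                      print(render("hello", RenderOptions(size=14, align="centre")))
                                                      ```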

                                                          Don’t get me wrong, Copilot and all the other tools help you write working code faster, and most of the time it’s all that matters. But when you’re working on something really hard, the kind of code where you need the whole mental model in your head to nail the abstractions just right? You should probably turn it off.

                                                          1. 2

                                                            Thanks for sharing your experience with Copilot! Did you notice any difference in quality between different languages and frameworks?

                                                        I’m thinking of using it to help me with pandas, where I’ve noticed I spend a lot of time on Stack Overflow finding the right way of doing things.

                                                            1. 3

                                                          Yes, definitely. It seems to have been trained extensively on JavaScript, Python, TypeScript, and Rust in particular. It is noticeably worse with C and C++. For languages that have similar syntax, it can also confuse them with each other.

                                                        4. 19

                                                          For example we also use Slack, Jira, Google Docs, and many other tools.

                                                          The 3 examples you gave are all communication tools. That’s not the same as stuff like editors, debuggers or large language models, and you can’t use the former as examples to justify mandating the latter.

                                                          1. 5

                                                            Microsoft was rarely, if ever, first to market with anything, yet they’re still successful.

                                                        5. 19

                                                          Why are you writing block after block of trivial, obvious code in the first place?

                                                          1. 3

                                                            You’ve never needed to write boilerplate?

                                                            1. 37

                                                              I routinely remove boilerplate. And if I find myself repeating the same code over and over again, I make it a function, or, more rarely, a macro.

                                                              The time I take to write and rewrite my code 10 times over is so small, compared to the time I spend reading my code, reading about the API I may use, thinking of the simplest approach, making sure I accounted for all edge cases… that it’s not even worth optimising yet. My slow-ish 60 WPM are more than enough.
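
                                                          A sketch of the function case, with hypothetical names: the same check-and-complain block, repeated for every field, collapses into one helper.

                                                          ```python
                                                          # Hypothetical example: repeated validation boilerplate becomes a function.
                                                          def require(value: str, name: str, max_len: int) -> str:
                                                              if not value:
                                                                  raise ValueError(f"{name} is required")
                                                              if len(value) > max_len:
                                                                  raise ValueError(f"{name} is longer than {max_len} characters")
                                                              return value.strip()

                                                          def create_user(name: str, email: str, handle: str) -> dict:
                                                              # Each field used to be its own if/raise block; now it's one call.
                                                              return {
                                                                  "name": require(name, "name", 80),
                                                                  "email": require(email, "email", 254),
                                                                  "handle": require(handle, "handle", 32),
                                                              }
                                                          ```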

                                                              1. 13

                                                            Not really… you mean like writing getters and setters for fields? If you just use, you know, field access, then you don’t need to write getters and setters.

                                                                Typically if you have repetitive code, you can put that repetitive code into a function and then it stops being repetitive.

                                                                1. 3

                                                                  This heavily depends on language support. Accessors can give you read-only fields. Now, languages should support read-only fields or even immutable data types out of the box, but many don’t.
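
                                                              In Python, for instance (a made-up class, as a sketch), an accessor is precisely how you get a read-only field, because plain instance attributes are always writable:

                                                              ```python
                                                              # Sketch: a read-only field via an accessor, in a language without
                                                              # read-only instance attributes out of the box.
                                                              class Account:
                                                                  def __init__(self, owner: str):
                                                                      self._owner = owner  # leading underscore: "private" by convention

                                                                  @property
                                                                  def owner(self) -> str:
                                                                      return self._owner  # readable as account.owner, not assignable

                                                              acct = Account("alice")
                                                              print(acct.owner)     # alice
                                                              # acct.owner = "bob"  # AttributeError: can't set attribute
                                                              ```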

                                                                2. 9

                                                                  As a programmer, one of your main jobs is to reduce, remove and avoid boilerplate.

                                                                  1. 3

                                                                    Of course, but if writing similar boilerplate is a large part of what you write, then it’s time to consider some macros.

                                                                3. 9

                                                                  I mandated Copilot department-wide

                                                              what does this even mean? Does it mean you purchased Copilot for everyone, or that you require people to use it?

                                                                  1. 0

                                                                    I think you should assume that when someone says they mandated something that they mean that they actually mandated it. Rather than asking if they really mean that they didn’t mandate it. :)

                                                                    1. 7

                                                                      I think there is a specificity missing from the language that’s fair to ask for clarification on.

                                                                  2. 4

                                                                    Quick question out of curiosity, how fast do you type and what code editor do you use?

                                                                    1. 2

                                                                      I’ve never used copilot, but at work we have a generative AI code completer. My experience partly matches yours: for Python code it can do some amazing inference and extrapolation that convinces me it understands the code I’m writing.

                                                                  Part of the rest of the time, it offers up subtly wrong code that makes me paranoid.

                                                                  And the remainder of the time, it spits out rather large blocks of garbage, which can be annoying.

                                                                      1. 2

                                                                    Looking forward to seeing where your company is in 5 years, after having adopted this strategic advantage over the competition.

                                                                      2. 4

                                                                        Bad devs will produce worse code faster, good devs will produce better code faster. AI is just going to make the gap bigger, but most devs are bad so overall code will get worse.

                                                                        1. 11

                                                                          Bad devs will produce worse code faster, good devs will produce better code faster.

                                                                          I think the fear is bad devs will produce worse code faster, good devs will produce better code at approximately the same speed. 😬

                                                                        2. 2

                                                                      While the results match this site’s priors (as well as mine), the whitepaper seems to be a bit… minimal? The comparison is not based on two separate bodies of code-writing data - this is just a correlation. It doesn’t feel like a very rigorous analysis.

                                                                          1. 1

                                                                        The findings make sense given the current architecture of LLMs - the limited context window means they don’t see the whole codebase, and they won’t be able to reuse distant functions. The autocomplete/chat interface is also unsuitable for recommending project-wide refactorings.

                                                                            1. 1

                                                                          While I don’t find these studies surprising at all, I also can’t help but think perhaps they’re showing something different than people think? I like Copilot as basically a fancy autocomplete/snippet engine and nothing more. I rarely use the output verbatim, and often don’t use the output directly at all, instead treating it as a jumping-off point for inspiration. Most people I’ve talked to and worked with use it in a similar manner.

                                                                          Yet these studies seem to make it look like people are en masse just committing the output as-is. Additionally, do code reviews not exist in these studies? Poor LLM output is the fault of the AI; poor code quality at the time of commit is the fault of the individual developer; poor code quality at the time of merge is the fault of the team and organization. We as an industry have multiple checks and balances for a reason, and the LLM only injects at the very beginning of the pipeline (pre-commit!).

                                                                          So I wonder if the people in these studies would be writing just as poor code without the use of AI, albeit at a much slower pace - one that is perhaps easier for the team/org to catch, and/or to ignore given the limited impact at such low velocity?

                                                                              1. 18

                                                                                I find it harder to spot bugs in Copilot-generated code than human-written code because the copilot code comes from a statistical model that makes it look like the surrounding code. The errors are in code that blends in with my expectations. That’s sometimes the case for human-written errors but it’s almost always the case for Copilot errors.

                                                                                1. 1

                                                                                  I agree, and have a similar usage pattern. I get most of the value out of copy-paste kinds of tasks, especially writing test cases following an existing pattern.

                                                                                  If I’m writing what you’d call business logic, I tend to not even look at suggestions, because the overhead of me having to understand what it’s doing and checking if it’s actually correct is much higher than just writing code myself, which also tends to be more understandable overall.

                                                                              In addition to that, the results are honestly not that great. Like I said, test cases that are 5-10 lines apiece and follow an existing pattern are generally just fine, and easy enough to verify and tweak if it misunderstands the test prompt. But one of my hopes had been that it would help me in a codebase that I’m not super familiar with (Rails), and it loves to make up non-existent methods.

                                                                              There might also be an aspect of different languages having different levels of quality. I’ve found that my primary language, Rust, tends to produce higher-quality (but still sub-par) suggestions that are also easier to verify than Ruby ones. Maybe that’s because the training corpus is more homogeneous?