Thursday, December 26, 2024

Developers with AI assistants need to follow the pair programming model



Now that generative AI, large language models, and CodeGen applications have been out for a while, we’ve seen developers figure out their strengths, their weaknesses, and how they can deliver value to customers faster without getting hung up on untangling LLM confabulations. CodeGen applications pump out code fast for pretty cheap prices, but it’s not always good. AI-generated code always needs a strong code review, and that can reduce the productivity gains it offers.

However, there’s a programming model that incorporates continuous code review and produces better code: pair programming. In pair programming, two programmers work on the same code together to produce something that is higher-quality than either of them would produce by themselves.

In this article, we’ll discuss how and why pair programming is so effective, how you can treat your AI assistant as a paired programmer, and the best ways to make this pairing work (as well as the methods that don’t).

Way back in 2016, we published a piece on the benefits of pair programming:

In pair programming one participant is the “driver,” who actually writes code, and the other is the “navigator,” who checks the driver’s work as it’s done and keeps an eye on the big picture.

Studies have shown that, contrary to early objections that the practice would be twice as expensive in terms of “man hours,” coding this way actually adds just 15% more time to the development process, and in exchange returns 15% fewer bugs and defects.

You might think that two people working on the same code with only one keyboard would slow down the process. But the study linked above found that when one person wrote the code, then sent it to another for code review, the process actually took longer than pair programming. The coder had to convey everything that they learned from writing the code, which took twice as long as making the original changes. Pairing and doing both the coding and the review simultaneously saved time by parallelizing the learning: both programmers learned at once. And if the reviewer had to convey something to the coder to correct a mistake, they could do that in real time, too, so the coder could avoid building additional code on that mistake.

In a recent question on the Software Engineering Stack Exchange site, user h22 compared pair programming to having two pilots in an airplane cockpit: “In aviation, then there are two pilots, there is a pilot flying and the pilot monitoring. The pilot monitoring is also fully in the course and can take over at any time. This works very well and is unlikely to change, even if technically these aircraft could be flown by a single human.”

For pairings between junior and senior programmers, the knowledge asymmetry may make the exercise feel like a training session, with the senior frustrated and rattling off commands to “add this, change this, no, not like that”. Pair programming is not the same thing as training or mentoring, and unless you’re telling someone how to defuse a bomb via walkie talkie, you’re collaborating. As user Flater wrote, “Work-related conversations are precisely the point of pair programming; it allows the pair to convey their knowledge to each other and/or helps them work together to learn something that’s new to both of them.”

You’re not telling someone how to code; you’re collaborating on the spot and giving/receiving instant peer reviews as solutions are proposed. If you run into roadblocks, overthinking things to death, user candied_orange suggests underthinking things. “The easy cure for analysis paralysis is doing something stupid and making people explain to you why it’s wrong. Iterate on that until you run out of wrong.”

Now, you may already agree with all this, pairing on the regular as super-productive duos. Let’s take a look at how you can take these methods and apply them to your GenAI/CodeGen assistants.

Many people, particularly people who are less familiar with code, think that CodeGen assistants are tools that write code for you based on natural language prompts. A study by GitClear found that copying and pasting is on the rise: since 2022, more code is being copied and pasted from external sources, while less is being moved (a sign of refactoring), leading to greater churn—code updated, removed, or reverted within two weeks. They conclude that “the rise of AI assistants is strongly correlated with ‘mistake code’ being pushed to the repo.”

CodeGen assistants have been found to write code that isn’t always up to snuff. Isaac Lyman, writing in this blog about AI-assisted programmers, said, “Studies have found that [CodeGen] tools deliver code that is ‘valid’ (runs without errors) about 90% of the time, passes an average of 30% to 65% of unit tests, and is ‘secure’ about 60% of the time.” These tools have also conjured libraries, functions, and variables out of thin air. Imagine a newly hired, mediocre junior programmer who’s read tons of documentation, taken every bootcamp, and checked out every Stack Overflow Q&A page. That’s who you’re pairing with when you use CodeGen.

To set up a pair programming paradigm with CodeGen, you take the navigator role, while the AI is the coder. As the knowledgeable one, you should be planning, thinking about design, and reviewing any code produced, while the tool does what it does best: cranks out code fast. As Lyman wrote, “It’s the AI’s job to be fast, but it’s your job to be good.”

When Replit CEO Amjad Masad came on the Stack Overflow podcast, he talked about how many of the current CodeGen tools run the pairing relationship the other way: “They call it Copilot because you’re still the driver. It is looking over your shoulder and giving you suggestions about what you might want to do next.” But he also pointed out the dangers of not giving the human partner the final say. “The reliability, the hallucination problem, is unsolved. That’s the fundamental problem with neural networks, we don’t know, actually, what they’re doing, and therefore we can’t trust them. There will always need to be a software engineer that is actually verifying and looking at the code.”

Programming fundamentals will become more important than ever, as will seasoned programmers who know the ins and outs of what makes for quality code—by following SOLID principles, keeping code simple and easy to read, and building self-contained components. When Marcos Grappeggia, the product manager for Google Duet, joined the Stack Overflow podcast, he was clear on the limits of CodeGen tools: “They’re not a great replacement for day-to-day developers. If you don’t understand your code, that’s still a recipe for failure. The model is still going to help explain the code for you, to get the high level, but it doesn’t replace developers fully understanding the code.”

When you are the navigator to the AI’s coder, syntax and library knowledge may matter less in the long run than architecting, detailing (and pivoting on) requirements, and refactoring. High-level fluency and understanding what makes software well-engineered will make you a better pair partner. When we talked to William Falcon, an AI researcher and creator of PyTorch Lightning, on the podcast, he emphasized the importance of domain knowledge: “If you’re a new developer, you’re just going to copy it. I’m like, ‘I know this is not written by you because it’s too over-engineered and a little bit too complicated.’ You know that there are control flows, you know that there are bad practices around global variables. There’s all these standard things that we all know. It’s like an English lawyer using a translator for French. They’re going to do a great job because they already know the law. But having someone who speaks English that’s not a lawyer try to do law in French won’t work.”

But when you grasp that your partner here is flawed and can adjust on the fly, why, then you can get that mythical 10x productivity. At the end of 2022, right after ChatGPT was loosed upon the wilds, David Clinton wrote about using it to create Bash scripts. While it wasn’t perfect, its imperfections were illuminating. “I began to realize that there was an even more powerful benefit staring me in the face: an opportunity to pair-program with an eminently helpful partner. The AI ultimately failed to solve my problem, but the way it failed was absolutely fascinating. It’s just mind-blowing how the AI is completely engaged in the process here. It remembers its first code, listens to and understands my complaint, and thinks through a solution.”

As we’ll see, approaching CodeGen with this mindset—as a flawed but helpful partner—can help you make the most of the code it gives you.

Are there specific ways to take advantage of the speed, and mitigate the mistakes, of a fast and dumb pair programming partner? I reached out to Bootstrap IT’s David Clinton, who wrote the Bash article linked above, to see if he’d learned how to best work with CodeGen partners. “Embrace multiple LLM tools and interfaces,” he advised. “Results can completely change from one week to the next. That’s why we decided to call my Manning book ‘The Complete Obsolete Guide to GenAI.’”

Leaning into the fast part means that you can get a quick draft/prototype of something and build off it. “There are times when I’ll upload a complex CSV file—or even the unstructured data in a PDF—to ChatGPT Plus and ask it to do its own analytics,” said Clinton. “I appreciate the immediate insights, but GPT also gives me the code it used to do its work, which I can cut-and-paste to jump-start my own analytics. I talk a lot about that in my new Pluralsight course.”

While many developers have spent their careers specializing in a few languages, CodeGen knows most of them. Many programming languages operate with similar logic, so if you let your AI partner handle the syntax, you can create code in languages you don’t know. Anand Das, cofounder and CTO of Bito AI, told us about this dynamic: “People who are coming into the project and are trying to solve bugs and don’t really know a particular language—somebody wrote a script in Python and the guy doesn’t know Python—they can actually understand what that script does and logically figure out that there is an issue and then have AI actually write code.”

As I’ve written about before, one of the things that AI does well is scaffolding—applying a known pattern to new data. Under your guidance, you can get your CodeGen buddy to apply known fixes/templates/type declarations to new items. It’s what automatic security flaw patcher Mobb does. CEO and cofounder Eitan Worcel told us: “Our approach is to build a fix and use the AI to enhance our coverage on that fix. It will take the results of a scan, identify the problem—let’s say SQL injection which is a very known one. We have patterns to find that root cause, and with a mix of our algorithms and GenAI, we will generate a fix for the developer, present that fix to the developer in their GitHub so they don’t need to go anywhere.”
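To make the “known pattern” idea concrete, here is a minimal sketch of the textbook fix for SQL injection: replacing string-built SQL with a parameterized query. It is not Mobb’s implementation; the users table, the function names, and the payload are hypothetical and exist only for illustration. It uses Python’s standard sqlite3 module.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable pattern: user input is spliced directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Known fix: a "?" placeholder makes the driver treat input as data only.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # every row leaks
print(len(find_user_safe(conn, payload)))    # no row matches the literal string
```

Because the fix has the same shape every time, it is the kind of change a navigator can ask a CodeGen partner to apply repeatedly and then verify quickly.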

But Worcel’s experience developing Mobb speaks to the other side of pairing with CodeGen—mitigating the dumb stuff. “The first few researches that we did around AI were underwhelming to the extreme. We got about a 30% success rate with fixes, and even those, sometimes it fixed the problem in a way that no one should do. Sometimes it actually introduced new vulnerabilities. We needed to put guardrails around the AI and not let it go outside of those guardrails and hallucinate stuff.” In a pairing paradigm, you are those guardrails.

You can provide guardrails in two ways. The first is by including detailed requirements in the prompt, including all your variable names. “Include details like actual column and dataframe names in your prompt,” said Clinton. “That way the code you get back won’t need as much rewriting. And don’t be embarrassed to ask for the same dumb syntax over and over again. The LLM doesn’t care how dumb I am.”
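As a minimal sketch of that advice, a prompt can be assembled from the real names in your project rather than typed from memory. The build_prompt function, the sales_df dataframe, and its columns are all hypothetical, invented for illustration; nothing here calls an actual LLM.

```python
def build_prompt(task: str, dataframe_name: str, columns: dict[str, str]) -> str:
    """Assemble a prompt that pins down the real dataframe and column names."""
    column_lines = "\n".join(
        f"- {name} ({dtype})" for name, dtype in columns.items()
    )
    return (
        f"{task}\n"
        f"Use the existing pandas DataFrame named `{dataframe_name}`.\n"
        f"It has exactly these columns:\n{column_lines}\n"
        "Do not rename columns or invent new ones."
    )


# Hypothetical dataframe name and columns, for illustration only.
prompt = build_prompt(
    task="Write Python code that computes total revenue per month.",
    dataframe_name="sales_df",
    columns={"order_date": "datetime64[ns]", "revenue": "float64"},
)
print(prompt)
```

Spelling out the names up front means the code that comes back should slot into your project with less rewriting.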

The second is by testing the pants off of any code the AI gives you, including ensuring that libraries, methods, and APIs actually exist and are implemented in a safe way. “For example, I want to access this API and the API doesn’t exist,” said Das. “You don’t want any model that you’re using to suddenly give you an API which doesn’t exist and you think you can use this. When you start running it, there’s no definition for it.”
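One cheap first check, before running any tests, is confirming that everything the generated code imports actually exists in your environment. The sketch below is one possible approach using Python’s standard ast and importlib modules; the function names and the sample snippet are hypothetical. It only covers absolute imports, and it really imports the modules it checks, so run it in a sandbox or throwaway environment.

```python
import ast
import importlib


def _module_exists(name: str) -> bool:
    # Note: this actually imports the module, running its top-level code.
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def find_missing_names(source: str) -> list[str]:
    """Return imported modules or names in `source` that cannot be resolved."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if not _module_exists(alias.name):
                    missing.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if not _module_exists(node.module):
                missing.append(node.module)
                continue
            module = importlib.import_module(node.module)
            for alias in node.names:
                if alias.name == "*":
                    continue
                full_name = f"{node.module}.{alias.name}"
                # The name may be an attribute of the module or a submodule.
                if not hasattr(module, alias.name) and not _module_exists(full_name):
                    missing.append(full_name)
    return missing


# Hypothetical AI-generated snippet containing two invented names.
generated = (
    "import json\n"
    "import totally_made_up_pkg_xyz\n"
    "from json import dumps, fake_encoder\n"
)
print(find_missing_names(generated))
```

A check like this catches invented imports only; calls to methods that don’t exist on real objects still need tests and a human review.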

Right now, CodeGen tools won’t be writing good code without a knowledgeable developer navigating over their shoulder. Maybe they never will. But humans and GenAI work better together, with the humans getting fast first drafts of code and the AI getting feedback and checks on its instant output. When we talked with Doug Seven, director of software development at AWS and the GM for CodeWhisperer, he framed CodeGen tools like this: “CodeWhisperer is like having a new hire developer join your team. They understand the basics of software development, they know how to write code in lots of different ways, but they don’t understand your code that’s in your organization that’s private and proprietary.”

In other words, it’s the AI’s job to be fast. It’s your job to be good.

Pair programming has proven itself to be a force multiplier on the individuals sharing a keyboard. One focuses on the syntax and implementation, while the other focuses on the big picture and provides instant code review. By making a CodeGen tool your syntax and implementation partner, you can reduce the feedback window between code and code review to minutes, allowing you to iterate and elaborate on ideas without futzing about with semicolons and type definitions.

That said, you still need to understand any code that you and your AI partner push to production. No matter where code comes from, whether AI, copy and paste, or coworkers, understanding it like you wrote it yourself is essential to keeping a codebase humming along.
