GitHub Copilot: The Good, The Bad, The Ugly


GitHub recently announced GitHub Copilot, an AI-powered tool that aims to help programmers write better code. Powered by OpenAI Codex, a model trained on publicly available source code (including public GitHub repositories), it suggests code completions in real time as you type.

As someone who’s always curious about tools that could enhance my workflow, I was quick to sign up for the technical preview. My experiences have revealed some time-saving use cases, unfortunate flaws, and interesting implications.

The Good

Copilot has an impressive ability to suggest exactly what you’re looking for. Writing a unit test? Copilot can auto-fill the test setup for you. Debugging an issue with your code? Copilot can insert console.log statements where you need them. Writing complex queries that use lots of abbreviations? Copilot can reuse patterns in your code to take some of the busywork out of it. More than once, I’ve been so impressed that I pulled my coworkers over to show how I could fill an entire code block using nothing but the Tab and Enter keys.

Copilot highlights the importance of consistent naming schemes. My current project deals with entities called Market Product Variants. Based solely on contextual clues in the codebase, I can write the following code:

const mpv

…and Copilot will automatically fill in the rest of the assignment statement:

const mpv = await ctx.get(MarketProductVariantRecordRepositoryPort).insert({
  id: v4(),
  description: 'new market product variant',
});

Impressive? That’s just the tip of the iceberg. Copilot can automatically fill in it.each template strings, populate function bodies, and even generate entire code blocks from a comment alone. Paired with tools like Vim, Copilot helps keep your hands on the keyboard and away from your mouse, maximizing efficiency.

The Bad

Copilot’s suggestions are pretty useful, but sometimes they’re more like a devil on your shoulder. Copilot often suggests code snippets that sound like what you want but actually misinterpret the patterns and domain model described by your codebase.

In my current project, we often use GraphQL DataLoaders to batch and cache the retrieval of large amounts of data. We frequently need to clear a DataLoader’s cache, which the API exposes as the clearAll() function.

However, I’ve often seen Copilot suggest a completely different, nonexistent function called clearCache() instead. It sounds like the desired function, but the type system says otherwise. Because Copilot was trained on a massive body of open-source code, common practices and conventions tend to emerge in its suggestions. If those don’t match the ones set in your project, Copilot will produce wonky results.
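
To make the clearAll() versus clearCache() mix-up concrete, here is a minimal sketch. It assumes a hypothetical, simplified stand-in class (TinyLoader) rather than the real dataloader package: the real DataLoader is promise-based and batches requests, while this version is synchronous and keeps only the per-key cache. The names TinyLoader, fetchOne, variantNames, and fetchCount are made up for illustration.

```typescript
// Hypothetical, simplified stand-in for a DataLoader-style cache.
// The real `dataloader` class is promise-based and batches requests;
// this sketch is synchronous and keeps only the per-key cache behavior.
class TinyLoader<K, V> {
  private cache = new Map<K, V>();

  constructor(private readonly fetchOne: (key: K) => V) {}

  // Return the cached value, fetching and caching it on a miss.
  load(key: K): V {
    if (!this.cache.has(key)) {
      this.cache.set(key, this.fetchOne(key));
    }
    return this.cache.get(key) as V;
  }

  // Drop every cached entry; returns `this` so calls can be chained.
  clearAll(): this {
    this.cache.clear();
    return this;
  }
}

// Count fetches so the effect of clearing the cache is observable.
let fetchCount = 0;
const variantNames = new TinyLoader<number, string>((id) => {
  fetchCount += 1;
  return `variant-${id}`;
});

variantNames.load(1); // miss: fetchCount becomes 1
variantNames.load(1); // hit: fetchCount stays 1
variantNames.clearAll();
variantNames.load(1); // miss again: fetchCount becomes 2

// The plausible-sounding call below is not declared on the class, so the
// compiler rejects it (TS2339: property does not exist on the type):
// variantNames.clearCache();
```

In a typed codebase, the compiler flags a suggestion like clearCache() right away. In plain JavaScript, the same mistake would only surface at runtime, which is one more reason to read a suggestion before accepting it.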

Additionally, while Copilot is trained to write code, it doesn’t do as well with English text. This makes sense: programming languages (like JavaScript, Python, or Haskell) have far stricter syntactic rules than English, with all its nuances. As a result, when I write comments or test descriptions, Copilot often suggests very generic phrases. Grammatically, these suggestions are usually okay, but I rarely accept them because they don’t fully capture the ideas I’m trying to express.

The Ugly

Worst of all, Copilot doesn’t currently support VS Code’s multi-cursor feature! 😱

Even though GitHub markets it as an “AI pair programmer,” I don’t really think it should be treated that way. Any machine learning model based upon human-generated data is subject to error. While I don’t think it poses a threat to our jobs as programmers (at least, not for a while longer), I don’t believe we should fully trust Copilot to write code for us.

The speed at which it lets you bang out lines of code is certainly convenient, but Copilot makes it easy to write mindless code: code that’s poorly thought through, messy, and buggy. In fact, a recent study of security-relevant scenarios found that roughly 40% of the programs Copilot generated contained vulnerabilities!

GitHub Copilot is a useful tool, but it’s by no means a replacement for traditional pair programming. Here at Atomic, our work is heavily influenced by Kent Beck’s Extreme Programming (XP). We believe that pair programming not only produces a larger volume of code but also allows for feedback, refinement, and accountability. Copilot falls short in most of these areas.

It’s a fun tool to use, but if you plan to use it in your regular workflow, make sure to watch your step while you code. I look forward to seeing how this robotic pair programmer grows and develops, while I continue to utilize actual pair programming in my daily work.