Nobody Cares About Your Git History

January 25, 2024

Article summary

Decoding "Clean Git History"
Why Rebase?
The Upside of a Clean Commit History
The Downside of Over-Curation
Utilizing Git History Correctly
The Ultimate Purpose of Git

The topic of Git history, rebasing, squashing, and commit messaging often surfaces, igniting debates as fiery as those over Vim vs. Emacs, tabs vs. spaces, or pineapple on pizza. For some, Git history is the unsung hero of software teams; for others, it’s a persistent thorn in the side. Opinions vary widely, and those who have them often hold them with conviction (it’s me, hi, I’m the problem).

This post isn’t about preaching or gatekeeping; it’s about sharing thoughts and sparking a dialogue. I don’t claim to have all the answers, nor do I consider myself the ultimate authority on this topic. If you think I’m off the mark, I welcome your corrections in the comments. After all, I might just be on the wrong side of history (pun intended).

Decoding “Clean Git History”

The phrase “clean Git history” commonly conjures up images of a linear, immaculate sequence of commits—each one polished and purposeful. Achieving this often involves rebasing or squashing feature branches to create a simple-to-follow narrative. It may also include tagging, releases, and other Git features. But here’s a thought: is this meticulous curation necessary, or even beneficial?

Mastering your tools, especially Git, is something I firmly believe in. However, it’s important to acknowledge that we’re all at different stages in our development journey. For those just starting out, and already drinking from the firehose of professional software development, a complex Git process could be more daunting and dangerous than helpful.

Opinions on Git usage vary: some see code as the ultimate truth and pay less attention to Git history, while others craft and rewrite their commits like poetry. I tend to rely on Git as a tool for sharing code and tracking changes, and I do recognize the value in the history. This post is for everyone, no matter where you stand.

Why Rebase?

So, who should rebase, and why?

Large organizations and open-source projects

These groups often manage complex codebases with numerous contributors, where a neater history can assist in navigating changes and understanding the evolution of the code. Rebasing helps maintain a readable and coherent history, which is more important when many people are simultaneously working on different features or fixes.

Why: A cleaner commit history in these environments aids in tracking changes and features over time. Individual commits are meaningful and self-contained, which makes them easier to review, test, and potentially revert without affecting unrelated changes. It also simplifies the task of pinpointing where bugs were introduced and facilitates the process of reverting to stable states. Fewer, but more thoughtfully crafted commits will reduce noise for the rest of the contributors.
Addendum: This may especially be true for a project utilizing a monorepo strategy, where a number of different teams and contributors are committing to the same codebase. Maintaining a clean history can be useful to reduce the amount of noise that other teams see.

Small, highly experienced teams

Teams that are well-versed in Git and showcase a number too big to fit on their “days since last rebase accident” sign might find rebasing a useful tool to keep their history concise. They’re less likely to encounter the pitfalls that can come with a complex rebase, and they have the necessary skills to resolve conflicts and understand the implications of rewriting history.

Why: For these teams, rebasing can be part of a routine to ensure that the main branch remains clean and that features are integrated seamlessly. It’s also a way to condense work into logical chunks that make sense to others and to their future selves.

Individual contributors working on isolated features

When working alone on a feature or in a branch that doesn’t have frequent interactions with others, rebasing can be a way to tidy up before merging back into main. It allows the contributor to present their work in a clear and structured manner, which can be helpful for some code reviewers.

Why: The rebase here serves to streamline the individual’s contributions into a narrative that’s easier for others to follow. It can help to highlight the development process of the feature or fix, showing a clear progression from start to finish and making it easier for reviewers to understand the rationale behind each change.

The Upside of a Clean Commit History

Let’s look at the pros and cons of each approach. First, the upside of a clean commit history.

A streamlined log

An orderly history can be easier to read and understand. Single, self-contained commits for feature branches can quickly show when things were introduced.

Commit-centric PR reviews: Some find value in this method, though it’s not universally applicable.

My perspective: Reviewing PRs commit-by-commit can lead to scrutinizing outdated code. While reviewing smaller changes might seem helpful, the lack of context of the full PR can diminish its utility. I don’t recommend this approach.

Less WIP clutter

Carefully rewritten commits mean fewer half-baked or broken changes, avoiding the frustration of build errors and test failures when checking out a teammate’s code.

Easier to revert changes

While not the only way to manage this (see: feature flags), being able to quickly revert a commit is easier than going through the codebase and manually commenting things out. Rolling back a whole release can also be an option, but there are times when it’s not tenable.

The Downside of Over-Curation

So, what’s the downside?

Loss of detail

The development process is inherently messy and non-linear. Squashing commits can remove the context around why certain decisions were made, which can be crucial for understanding the rationale behind the current state of the code when revisiting it in the future.

Finding old code

At times, the code that gets removed during the development process can be as informative as the code that remains. When commits are squashed, the ability to retrieve and examine these discarded snippets is lost, which can be a disadvantage when trying to understand past decisions or resurrect a shelved idea. If you’ve written and removed code in a single branch, it can be nearly impossible to retrieve those changes after a squash or rebase.
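To make that loss concrete, here is a toy Python sketch. It is not how Git stores data (Git keeps snapshots, not lists of added and removed lines), and every name in it (`apply_commit`, `squash`, `commits_touching`, the sample commits) is hypothetical. It only illustrates that a line added and later removed on the same branch shows up in no diff once the branch is squashed, so a history search in the spirit of `git log -p` or `git log -S` has nothing to find.

```python
from typing import Dict, FrozenSet, List

# Toy model only: a "commit" is just the lines it adds and removes.


def apply_commit(lines: FrozenSet[str], commit: Dict) -> FrozenSet[str]:
    """Return the file contents after applying one commit."""
    return (lines - frozenset(commit["removed"])) | frozenset(commit["added"])


def squash(base: FrozenSet[str], commits: List[Dict], msg: str) -> Dict:
    """Collapse a series of commits into one net change against base."""
    final = base
    for commit in commits:
        final = apply_commit(final, commit)
    return {
        "msg": msg,
        "added": sorted(final - base),
        "removed": sorted(base - final),
    }


def commits_touching(commits: List[Dict], needle: str) -> List[str]:
    """List messages of commits that add or remove a line containing needle."""
    return [
        c["msg"]
        for c in commits
        if any(needle in line for line in c["added"] + c["removed"])
    ]


base = frozenset({"def search(q): ..."})
feature_branch = [
    {"msg": "WIP: try an LRU cache", "added": ["cache = LRU(128)"], "removed": []},
    {
        "msg": "Drop cache, build an index instead",
        "added": ["index = build_index()"],
        "removed": ["cache = LRU(128)"],
    },
]
squashed = [squash(base, feature_branch, "Add search index")]

# The unsquashed branch still remembers the abandoned cache experiment.
print(commits_touching(feature_branch, "LRU"))
# The squashed commit only records the net change, so the search comes up empty.
print(commits_touching(squashed, "LRU"))
```

In this model the first search returns both branch commits and the second returns an empty list, because the squashed commit only contains the index line.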

Tracking the author

In collaborative environments, understanding who wrote what can be essential for seeking clarification or addressing issues. Squashing commits often attributes all changes to the person performing the squash, removing the granularity of authorship and complicating accountability. Simply put: git blame is not effective when commits are rewritten.

Git bisect to the rescue

Bisecting is a lifesaver when isolating issues, and it’s far easier with smaller commits than with a huge squashed feature branch. It’s quick to churn through even thousands of commits to find the offending one, but if it only turns up a squashed feature branch, it will take more time to comb through every line of code.
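For a sense of why bisecting stays quick, here is a minimal Python sketch of the binary search idea behind git bisect. It assumes a linear history with one known-good and one known-bad commit. The function name `find_first_bad`, the fake commit IDs, and the `is_bad` check are all made up for illustration, and real Git also handles branching history, which this ignores.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, and commits[-1] is known bad. Returns the first bad commit
    and how many commits had to be checked along the way.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return commits[hi], tested


# Hypothetical history of 2,000 small commits; the bug lands in commit 1337.
history = [f"c{i:04d}" for i in range(2000)]
culprit, tested = find_first_bad(history, lambda c: int(c[1:]) >= 1337)
print(culprit, tested)
```

Each check halves the remaining range, so 2,000 commits need at most 11 checks. If the culprit turns out to be one squashed feature branch, though, the search ends there and the rest is reading the diff by hand.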

Potential for errors

Rebasing, especially when done across branches that are widely divergent, can introduce errors that are difficult to trace. A single misstep during conflict resolution can lead to subtle bugs that may not be immediately apparent, creating a hidden cost that manifests later. A botched rebase can wreak havoc, leading to lost work at worst, or wasted time at best. I know plenty of horror stories of colleagues trying to untangle codebases that have been afflicted by the rebase bug. This isn’t to say that this doesn’t occur via other Git usage (conflicts are a fact of life), but it can be particularly challenging to work through with rebasing. This is especially evident with folks who are new to Git, as the ramifications of rewriting history aren’t as obvious.

When, not if

At some point, even the most experienced of Git users will need to deal with a botched rebase. This is a frustrating experience, a tremendous time-suck, and a risky entry point for new bugs. New errors may be introduced as old commits are rewritten, and finding bugs can be complicated. While a merge may also introduce bugs, there is a single point in the commit history to mark that. Bugs could be introduced at any time in a chain of rewritten commits.

Time consumption

Crafting the perfect history is a significant time sink with debatable benefits. Developers may spend hours resolving conflicts during rebasing, time that could otherwise be spent writing new features or fixing bugs.

Perils of perfectionism

In striving for a pristine commit history, developers might hesitate to commit frequently or push their changes, which can lead to significant portions of work existing only on local machines. This not only increases the risk of data loss but also delays collaboration and feedback. I’d rather a teammate push up a broken WIP commit as they rush out the door than nothing at all.

The value of commit messages

While well-crafted commit messages are helpful, an overemphasis on their importance can detract from the primary goal of version control: collaborating with a team. Developers might spend more time managing their Git artifacts than coding, or worse, avoid committing frequently to avoid having to write messages for “trivial” changes.

They’re just lies, after all

Rewriting old history, especially with rebasing, but even with squashing, is simply that: rewriting history. It’s not a true representation of when changes were made, who made them, how the process went, and most importantly, why they were made. We’re lying to ourselves by saying a commit was created today when it was created days or weeks ago. Some people may choose to do this because it presents a better picture of how something went: no WIP commits, no broken builds, no test failures, and no mistakes taken to arrive at a solution. It’s simply not the truth, but it looks nice on paper.

One rebase, all rebase

Rebasing isn’t a solo activity; it has team-wide implications. When one team member starts rebasing and force-pushing, it requires everyone on the team to adapt to this workflow. This can be disruptive, especially if team members are at different skill levels with Git. Newer members may struggle with the complexities of rebasing, which can lead to errors and frustration. I advocate for managing your Git workflow as you see fit without imposing this standard on everyone.

Why this matters

Rebasing rewrites history, and when that rewritten history is shared, it can cause confusion and conflicts for others who have the old history in their local repositories. Team members must then spend additional time learning advanced Git commands to safely synchronize their work, which can be a steep learning curve for some and may introduce even more potential errors in that process.

The ripple effect

Even a well-intentioned rebase can create a ripple effect requiring others to halt their work to resolve unexpected issues. This can lead to a significant productivity hit for the team as a whole, as members must pause feature development to address problems introduced by history rewrites.

Utilizing Git History Correctly

I seldom pore over commit messages, relying instead on tools like git grep and git log -p for understanding code changes. Commit messages don’t tell the whole story; that is the purpose of the codebase itself. The story of the code is best told first-hand, when commits are introduced, not as an archaeologist or anthropologist going back in time and piecing things together. The original commit message, timestamp, author, and order are all more effective when left untampered.

Respecting diverse workflows

Best practices should be adaptable to fit various team and project dynamics. When in doubt, opt for the approach that’s safer, quicker, and preserves more data.

Avoiding disruption

Frequent context switching between coding and crafting detailed commit messages can hamper productivity. Sometimes it’s best to note it’s a work-in-progress (WIP) and move on. Is it worth revisiting those and fixing them in the future?

Embracing change

As requirements and knowledge evolve, so does our code. Time spent polishing history is pointless if those changes are later altered or discarded.

Finding the right process

Too many processes can get in the way of actually building software. Rebasing can add unnecessary steps to completing a task, and I hesitate to add another one to a constantly growing list in modern development.

Organize and rebase your commits if you like, but consider your team’s preferences, the team-wide implications, and potential drawbacks.

The Ultimate Purpose of Git

How you choose to use Git is up to you, but at its core, it is a tool to share code and track changes with teammates, not a holy text of changelogs.

Rewriting commit history demands a substantial investment of time, adds complexity and risk, obscures the development story, and offers limited benefit. It’s just not worth it.

While it’s not an apples-to-apples comparison, consider this: we change source code via minification and obfuscation at the final stages of the release pipeline to maintain readability and debuggability during the development phase. Rewriting commit history obfuscates and minifies our source code at the first stage of that same pipeline, which obscures the natural, iterative problem-solving that unfolds as the code evolves.

The appeal of a pristine Git history is understandable, yet I’ve often found more practical value in a detailed, albeit cluttered, commit log. Small commits tracking all changes over time and accurate contributor logs are more beneficial than lumping them into one. Time spent managing this is often wasted effort. It can be put to better use.

Finally, my experiences are mostly with smaller teams focused on delivering functional code rather than maintaining spotless commit narratives. As I haven’t worked in an organization or on an open-source project where this is more important, it’s possible that I just haven’t developed the necessary skills, and I’m afraid of change. If you’ve struck a balance between meticulous Git habits and efficiency, I’m all ears.

I hope this post has provided some food for thought, whether you’re a Git purist or a pragmatist. I would love to continue the conversation in the comments. Please tell me why I’m wrong!

Conversation

Rogério R. Alcântara says:

January 26, 2024

Dear Dan,

I just wanted to drop a quick note to say thanks for your latest blog post – it’s been a great read! It’s not every day you come across something that’s so well-balanced and genuinely insightful. Your take on Git practices really got me thinking.

Honestly, it was so compelling that I’m almost swayed over to your point of view! Almost, but not quite. 😉

So, I thought, why not write a response? It’ll be a fun way to explore our different perspectives.

I’m not sure if I can match the high bar you’ve set, but hey, I’ll give it a shot – just give me a few days!

At the very least, I will learn a lot from your approach to picking your brain on this – which is more than enough reason for me.

Talk soon.

Dan Kelch says:

January 26, 2024

Hi Rogério, thank you for your kind words! Glad that you enjoyed reading. I’ll keep an eye on the blog you linked and wait for your post. Looking forward to reading your blog and learning from your perspective!

Jimb says:

January 26, 2024

I just wanted to add that in my experience “git blame” often becomes *more* accurate with squashed commits – when working on a large feature branch all of the changes you made and then reverted leave your name in the blame report, despite no actual changes to those lines (at time of merge).

Dan Kelch says:

January 26, 2024

That’s an interesting point, I hadn’t considered that. I wonder if there’s a difference in how changes are reverted, changing lines back manually versus doing a formal git revert, or git checkout develop -- path/to/file. Is it possible that something is changing on those lines, even whitespace, line endings, etc.?

Anyways, I’ll keep an eye out for this behavior – it might be a mark in favor of squashing 🙂

- Alexandru Pătrănescu says:
  
  January 27, 2024
  
  Nope, you can just easily revert a merge commit. Just have to mention the first parent with -m 1
  
- Jimb says:
  
  January 27, 2024
  
  Usually for my team by “reverted” changes I didn’t mean a specific git operation, just changes that were made during the course of working on a feature and then changed back to how they were before, as part of another commit by the end as part of regular development / PR reviews / etc. Perhaps if it’s an actual “revert” of an entire commit git blame would understand it fine.
  
Alexandru Pătrănescu says:

January 27, 2024

What I usually do is use --first-parent when looking for high-level changes. That is usually available for all git commands that deal with history.

Randy K says:

January 26, 2024

I went into this saying “please mention bisect. Please mention bisect” and was not disappointed. Very few people I encounter seem to know this gem exists. It has literally saved me DAYS of work at this point, and having a bunch of real commits squashed and rebased makes this not nearly as useful.

Jimb says:

January 27, 2024

Interesting! Admittedly it’s been _many_ years since I’ve used bisect (for the current project I’m on I know the code base well enough that “git blame” usually is more immediately useful), but when I did I recall it constantly getting confused and going down rabbit holes due to work-in-progress commits where, for example, all tests were failing, and squashing commits before merging was the solution to keep bisect working well. This was quite a while ago, so maybe it’s been improved in newer Git versions =).

- Dan Kelch says:
  
  March 2, 2024
  
  I think that’s still a valid concern – I’ll admit that’s the problem with the laissez faire method I’m advocating here. I do try and avoid committing broken code as much as possible for this very reason, but I’m not perfect 🙂
  
  Might be fun to try going through a bisect on your project – I wonder if it still feels the same as the last time you used it!
  
Wolf Merrik says:

January 27, 2024

Bisect is actually something I forget exists way too often… but holy hell is it handy in narrowing things down and finding what I’m looking for. And yeah, as you mentioned squash/rebase really diminishes your ability to use this properly (or honestly find anything).

To OP, great article. Having a clean commit history looks (better? I guess) from the outside, but from a functional/developmental standpoint it can actually cripple you and your team. Even from a user point of view in experiencing an issue, or another team working on a completely different piece and experiencing an issue you and your team went through…. It’s endless, you know it lol.

- Dan Kelch says:
  
  March 2, 2024
  
  Thanks for the comment, I appreciate it!
  
  At the end of the day, I just want to reduce the overhead required to deliver new features and fix bugs. I think clean git histories are a great thing to strive for, but the chase for perfection can be more harmful than helpful.
  
  Cheers!
  
Dan Kelch says:

March 2, 2024

No kidding, bisect is a lifesaver.

I’ll admit, it can be a bit annoying to go through broken WIP commits while bisecting, it’s certainly easier when the project builds without errors and the test suite runs. But there’s no guarantee that carefully pruned, rebased, and squashed commits will work without some modification. At the end of the day, looking through small commits to find the offending code is much easier than a squashed feature branch.

Thanks for the comment, and I hope you don’t have to bisect for a long, long time 🙂

Rowan L says:

January 27, 2024

Interesting, balanced post. “quick WIP commit before running out the door”, that’s been me. Worrying about messy history – me. Now retired and mostly develop just for myself, but git is still very important for me. Always learning, never too old. Thank you.

Dan Kelch says:

March 2, 2024

Thanks for the comment Rowan! Git is such a helpful tool for personal projects, just to protect us from inevitably making mistakes and getting back to a good state. No shame in quick, messy commits – don’t work for your tools, make your tools work for you!

Felix says:

February 8, 2024

I disagree with some of those points. Mostly, because I think, cleaning git history should be done by each developer before “publishing” (pushing) their contribution (commits). It’s like the difference of a pile of notes about a topic and a refined presentation. I don’t just throw a pile of notes into the audience and leave it to everyone else (including future me) to try and dissect them. Instead, I show a refined presentation.

> Squashing commits can remove the context around why certain decisions were made, which can be crucial for understanding the rationale behind the current state of the code when revisiting it in the future.

Better describing and preserving the context of decisions was one of the big reasons for us to introduce clean git history in the first place. Because the chaotic thought process during implementation is not helpful with understanding the rationale behind the effective change afterwards, let alone the “fix bug” commits, which are just a crime. Cleaning up the chaotic thought process and putting together the “story of the change” including reasoning significantly helps with PRs and Sherlock Holmesing issues.

> At times, the code that gets removed during the development process can be as informative as the code that remains.

Put the gist of it somewhere else. Keeping it there distracts from the intention of the change and makes it harder to digest it. It’s a digression from the actual story.

> Bisecting is a lifesaver when isolating issues, and it’s far easier with smaller commits than with a huge squashed feature branch.

Cleaned commits shall stay small. The clean up process is there to make meaningful changes, which add up and should actually help with using bisect. If someone misunderstands “clean git history” as squashing all commits of a feature into one commit, that’s not what was meant.

> Rebasing, especially when done across branches that are widely divergent, can introduce errors that are difficult to trace.

Who would do that?

We try to create “clean git history” on a local, per-developer basis. A developer shall have no fear of creating many small commits, e.g. to support the non-linear thought process. “Commit often” is one of the guidelines (every couple minutes). But before pushing those commits, a developer shall clean up the non-linear thought-process-mess and make the “to be pushed” commits a meaningful, revised, “ready for PR” change. The PR is an offer, which gets accepted better, the easier it is to read and understand. Cleaning one’s local commit-mess is a nice process to review one’s own changes and makes them more intentional. It takes time, which invests into making a change good and concise. The “golden rule” applies: Only rebase local commits.

Probably, many of the mentioned points are not applicable to our environment of native mobile apps, which has small teams and project complexity, usually with less than 250k LoC.

Dan Kelch says:

March 2, 2024

Thanks for your insights! It’s clear you put a lot of thought into your Git workflow, and I appreciate you sharing all of your perspectives.

Re: presentation vs. notes — I tend to view commit messages more like behind-the-scenes notes, with the final feature being the main presentation. But your point on tailoring those notes for clarity’s sake does strike a chord, and I definitely see the value in refining the pile of notes into a clear, presentable format.

Re: context — I get where you’re coming from on crafting the story of change. My slight reservation is about possibly losing the genuine, messy journey of development, while the over-curated “story of the change” may differ from reality. I think the suggestion to document removed code elsewhere to keep the main story focused is a practical tip, but I worry that it can get lost (or never actually be maintained). It’s dead simple to run `git log -p` and grep for a phrase or code that I _sort of_ remember, and see all relevant changes, even if the code was deleted. I probably won’t remember to look through old gists, or add one when I’m deleting some code.

Re: bisecting and small commits — You’re right that clean commits shouldn’t equate to mega-commits (squashed feature branches do, though). Bisecting is a lifesaver, but I admit that it’s frustrating to go through a slew of broken commits. That’s absolutely a downside of my laissez faire approach to git maintenance. If we can keep commits small, functional (able to compile, run tests), and do so without investing a ton of time into refining them, I’d say that’s worth striving for.

I really appreciate your golden rule of “Only rebase local commits”. I think a lot of issues are introduced when everyone on the team is force-pushing, especially when there are junior developers with less git experience. Conversely, rebasing can be a destructive operation, and local commits can be lost forever, unless you are well-versed in recovering from the reflog (I’m not!).

Re: project & team dynamics — it’s a good reminder that there’s no one-size-fits-all strategy in development. There are so many moving parts in a software team, and a lot of complicated workflows, and I just haven’t found any payoff with clean git histories… yet. I’m always trying to improve my craft, but perhaps I take this stance because I’m just lazy 🙂 Might be worth giving this a try soon.

Anyways, I really appreciate all of your feedback – you’ve given me a lot to think about. Thanks again for sharing, cheers!

Chris says:

June 12, 2024

I work in a regulated environment (FAA/EASA). Git has always been a hard sell to certification folks due to the history thing. Sometimes losing code, like you mention how deleted code can be squashed or rebased such that you can’t resurrect it, can be a real problem. In general, the commit history needs to be an immutable log (at least at some level) for codebases that are in safety-critical environments.
