What’s Under the Hood in Git?

Git is the ubiquitous version control tool, but most of us work with it only through the higher-level commands. Under the hood, those commands are built on a small set of powerful lower-level ones, often called plumbing commands.

Today, I’m going to walk you through the process Git takes to go from untracked files to commits on master. I’ll cover some of these commands and show how they power Git, but I’ll skip some of the specific flags and details for brevity’s sake.

Git Objects

The simplest way to think of Git is as a really fancy key/value store. Git Objects are the core of this data structure, with each one representing one key/value pair.

These objects are created with the clearly named command git hash-object. Given some data and the -w flag, this command compresses the data with zlib and writes it to a file under .git/objects (without -w, it only prints the hash). The file is named after a SHA-1 hash computed from the data plus a short header recording the object's type and size. Different data yields a different SHA-1 hash, which is what distinguishes one Git Object from another.
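As a minimal sketch of the step above, the following stores one blob and locates the file Git wrote for it. The throwaway repository and the sample string are assumptions made for illustration, and the expected hash assumes Git's default SHA-1 object format.

```shell
# Work in a throwaway repository (temp directory chosen for illustration only)
cd "$(mktemp -d)" && git init -q

# -w stores the object; --stdin reads the data from standard input
blob=$(echo 'hello world' | git hash-object -w --stdin)
echo "$blob"   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# The object file sits in a directory named after the first two hex
# characters of the hash; the file name is the remaining 38 characters
obj_path=".git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"
ls "$obj_path"
```

Running the same command again with the same input produces the same hash and the same file, which is what makes the object store behave like a key/value store.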

To read the data from a saved Git Object, you can use the command git cat-file -p followed by the object's SHA-1 hash. This looks up the object and, if it exists, prints its contents to stdout. The -p flag tells the command to work out the object's type and pretty-print the contents accordingly.
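Here is a small sketch of reading an object back, assuming a throwaway repository and a sample blob created just for this example. The -t flag, which prints only the object's type, is shown alongside -p.

```shell
# Throwaway repository with one stored blob (both are illustrative)
cd "$(mktemp -d)" && git init -q
blob=$(echo 'hello world' | git hash-object -w --stdin)

obj_type=$(git cat-file -t "$blob")      # the object's type
obj_content=$(git cat-file -p "$blob")   # the object's pretty-printed content
echo "$obj_type: $obj_content"           # blob: hello world
```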

We haven’t mentioned anything about types yet, but all of the objects we’ve created use the Blob type. The other two types I’ll discuss are the Commit type and the Tree type.

Git Tree Objects

The next type of object to talk about is the Tree Object. Tree Objects are used to represent a “tree” structure of files and directories. Each file is stored as a Blob Object and each directory as another Tree Object, so the structure of a Tree Object mimics the structure of a Unix directory.

Creating a Tree takes two commands. The first is git update-index, which is roughly the equivalent of running git add to add a file to your staging area (a file that isn't in the index yet needs the --add flag). The second is git write-tree, which takes the current staging area and writes it to a Tree Object. This Tree Object still isn’t quite a commit on a branch, but it’s almost there.
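The two-step process could look roughly like this. The file name hello.txt and its content are hypothetical, and the repository is a throwaway one created for the example.

```shell
# Throwaway repository with one hypothetical file
cd "$(mktemp -d)" && git init -q
echo 'hello world' > hello.txt

# Step 1: stage the file; --add is needed because the index doesn't know it yet
git update-index --add hello.txt

# Step 2: snapshot the index as a Tree Object; write-tree prints the tree's hash
tree=$(git write-tree)

git cat-file -t "$tree"   # tree
git cat-file -p "$tree"   # one entry per item: mode, type, hash, name
```

Note that git update-index also wrote the file's content as a Blob Object, so the tree entry points at an object that already exists in the store.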

Git Commit Objects

Commit Objects are the last object type we’ll discuss here. They store a reference to a Tree Object, author and committer details, a commit message, and, optionally, one or more parent commits.

To create a commit, we’ll use echo 'MESSAGE' | git commit-tree TREE_HASH, where MESSAGE is the commit message and TREE_HASH is the hash of a Tree Object. This creates an un-linked commit with the specified message. To link the new commit to a previous one, pass the parent's hash with -p: echo 'MESSAGE' | git commit-tree TREE_HASH -p PARENT_HASH.
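The sketch below builds two linked commits this way. The file name, contents, commit messages, and the identity set through environment variables are all placeholders; the identity is only there so that git commit-tree works on a machine with no Git user configured.

```shell
# Throwaway repository (illustrative)
cd "$(mktemp -d)" && git init -q

# Placeholder identity so commit-tree does not depend on any global config
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

# First commit: no parent, so it is the root of the history
echo 'version 1' > file.txt
git update-index --add file.txt
first=$(echo 'first commit' | git commit-tree "$(git write-tree)")

# Second commit: a new tree, linked to the first commit with -p
echo 'version two' > file.txt
git update-index file.txt
second=$(echo 'second commit' | git commit-tree "$(git write-tree)" -p "$first")

# Shows the tree, parent, author, committer, and message lines
git cat-file -p "$second"
```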

The commands we’ve covered so far will get us a commit history, but the commits aren't attached to any branch. To attach them to master, you need one more command: echo "COMMIT_HASH" > .git/refs/heads/master, where COMMIT_HASH is the hash of the newest commit. The command might seem too simple to work, but a branch reference is really just a file containing a commit hash.
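Putting it together, a minimal end-to-end sketch might look like the following. It assumes a throwaway repository that uses Git's default file-based ref storage, a hypothetical hello.txt, and a placeholder identity. The git symbolic-ref line is there only because newer Git versions may use a different default branch name.

```shell
# Throwaway repository with HEAD pointing at master (illustrative)
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master

# Placeholder identity so commit-tree does not depend on any global config
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

# Blob -> tree -> commit, using only the low-level commands
echo 'hello world' > hello.txt
git update-index --add hello.txt
commit=$(echo 'first commit' | git commit-tree "$(git write-tree)")

# Point the branch at the commit by writing the hash into the ref file
echo "$commit" > .git/refs/heads/master

git rev-parse master            # prints the same hash as $commit
git --no-pager log --oneline master
```

Writing the file by hand shows what a branch is, but for everyday use git update-ref refs/heads/master followed by the commit hash is the safer way to make the same change.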

That’s it. Following these commands will take you from nothing to a set of commits attached to the master branch. I hope this gave you some insight into how Git works under the hood.