What do the diffs in a GitHub Pull Request actually tell you?

by Aidan Harding - July 15, 2022
You’re reviewing a Pull Request (PR) on GitHub. You see a list of diffs from the files changed in the PR. But what, precisely, do the diffs represent? I had a model for this in my head. Recently, I was forced to realise that my model must be wrong. Reviewing a PR started me on a journey into “two-dot” and “three-dot” diffs in git. That journey updated my mental model for GitHub and I hope that it will also be a useful lesson for you when you are working with PRs on GitHub yourself.

First, terminology

Let’s say that we have two branches: main and feature. The main branch is our production code; the feature branch is where we are working on a new feature. Since the feature branch was created, main and feature have both changed. We have brought the changes from main (commit: m2) into feature by simply copying the code across (commit: f1), rather than by merging. Now we want to merge the changes from feature (commit: f2) back into main using a Pull Request (PR).

[Figure: the commits to the main and feature branches]

I would note that this is not an ideal situation, but we do not live in an ideal world. And it’s the imperfections that teach us things.

To further illustrate this, I built the changes on GitHub here: https://github.com/aidan-harding/pr-diff/

In that repository, m1 is an empty sfdx project; m2 and f1 are additions to a class called MainClass, and f2 adds FeatureClass.

What was my mental model?

My intuition was that the PR diffs would show me what changes will be introduced if I merge the PR. That would mean that the diffs would be the difference between the target branch of the PR, main, and the source branch, feature. So, I was expecting the diffs in the PR to be just the differences introduced by the actual feature in commit f2.

What does GitHub do?

In the diffs for a PR, GitHub actually shows you the accumulation of all the changes made on the source branch since it diverged from the target. Here, it accumulates commits f1 and f2. So, even though commits m2 and f1 make the same changes, that is not clear from the diffs: it looks like f1 is introducing the code from m2, even though that code is already present on main.

You can see this here on GitHub: https://github.com/aidan-harding/pr-diff/pull/1

In plain git, this would be called a “three-dot” diff comparison. We would run it with the command

git diff main...feature
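To make the three-dot behaviour concrete, here is a minimal sketch that rebuilds the m1/m2/f1/f2 history in a throwaway local repository. The file names and contents are hypothetical stand-ins for the classes in the real repository. It relies on the fact that git treats a three-dot diff as a diff from the merge base of the two branches to the second branch.

```shell
set -e
# Build a throwaway repository with the history described above.
# File names and contents are hypothetical stand-ins, not the real classes.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

commit() {  # usage: commit <file> <content> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit README.txt "empty project" m1
git branch feature                           # feature diverges from main at m1
commit MainClass.txt "main change" m2        # m2 lands on main
git checkout -q feature
commit MainClass.txt "main change" f1        # f1 copies the m2 change by hand
commit FeatureClass.txt "feature work" f2    # f2 is the actual feature

# Three-dot diff: everything changed on feature since it diverged from main.
three_dot=$(git diff --name-only main...feature)
# The same comparison spelled out via the merge base (commit m1 here).
from_base=$(git diff --name-only "$(git merge-base main feature)" feature)

echo "$three_dot"
```

Both forms list FeatureClass.txt and MainClass.txt, which matches what the PR on GitHub shows: the copied change from m2 appears as if feature were introducing it.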

What else can git do?

Git has another way to compare two branches for merging: the “two-dot” diff. This more closely accords with the mental model I started with. The two-dot diff directly compares the state of all files at the tips of the two branches, regardless of how each branch got there. So, in this case, the two-dot diff would show simply the code added by f2, because the content of m2 is already present on both branches.

git diff main..feature
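As a rough illustration of the difference, the sketch below rebuilds the same m1/m2/f1/f2 history in a temporary repository and lists the files reported by each comparison. The file names and contents are hypothetical stand-ins chosen for this example.

```shell
set -e
# Throwaway repository; file names and contents are hypothetical stand-ins.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

commit() {  # usage: commit <file> <content> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit README.txt "empty project" m1
git branch feature                           # feature diverges from main at m1
commit MainClass.txt "main change" m2
git checkout -q feature
commit MainClass.txt "main change" f1        # same content as m2, copied by hand
commit FeatureClass.txt "feature work" f2

# Two-dot: compare the tips of main and feature directly.
two_dot=$(git diff --name-only main..feature)
# Three-dot: compare feature against the point where it diverged from main.
three_dot=$(git diff --name-only main...feature)

echo "two-dot files:"
echo "$two_dot"
echo "three-dot files:"
echo "$three_dot"
```

The two-dot comparison reports only FeatureClass.txt, while the three-dot comparison also reports MainClass.txt, because MainClass.txt is identical at both branch tips but did not exist at m1.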

Atlassian’s docs on git diff are nice and readable.

What should you and I do?

First, make sure that your feature branches are short-lived. The shorter the lifetime of the branch, the less likely it is that you will have to care about the difference between two-dot and three-dot diffs.

Second, update your feature branches from main before you create your PR. That way, the two-dot and three-dot comparisons give the same results, and the mental model holds: the diffs in the PR show exactly what changes feature will introduce to main.

If you are going to update feature from main, you have the two options described below. Note that in the diagrams, the feature branch is called feature-merge or feature-rebase instead of just feature. This is so that you can see all the scenarios on GitHub; you wouldn’t really rename the branches for this:

  1. Merge the changes from main to feature (git checkout feature-merge; git merge main). Now you have the extra commit that tells the three-dot diff what’s going on: it knows that m2 is part of the history of the feature branch. You can see the result on GitHub here: https://github.com/aidan-harding/pr-diff/pull/2
    [Figure: result of merging before the PR]
  2. Rebase the feature branch so that it includes the main changes and puts the feature-specific changes at the end of the history (git checkout feature-rebase; git rebase main). The end result is a git history that looks like we did the right thing all along and branched from m2 instead of m1. You can see that here: https://github.com/aidan-harding/pr-diff/pull/3
    [Figure: result of rebase before PR]
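Both options can be tried locally. The following is a minimal sketch, again using a throwaway repository with hypothetical file names, that applies each option to its own copy of the feature branch and then checks that the two-dot and three-dot comparisons agree.

```shell
set -e
# Throwaway repository; file names and contents are hypothetical stand-ins.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

commit() {  # usage: commit <file> <content> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit README.txt "empty project" m1
git branch feature                           # feature diverges from main at m1
commit MainClass.txt "main change" m2
git checkout -q feature
commit MainClass.txt "main change" f1        # same content as m2, copied by hand
commit FeatureClass.txt "feature work" f2

# One copy of the feature branch per option, as in the example repository.
git branch feature-merge feature
git branch feature-rebase feature

# Option 1: merge main into the feature branch.
git checkout -q feature-merge
git merge -q --no-edit main

# Option 2: rebase the feature branch onto main.
git checkout -q feature-rebase
git rebase -q main

merge_two=$(git diff --name-only main..feature-merge)
merge_three=$(git diff --name-only main...feature-merge)
rebase_two=$(git diff --name-only main..feature-rebase)
rebase_three=$(git diff --name-only main...feature-rebase)

echo "merge:  two-dot=$merge_two three-dot=$merge_three"
echo "rebase: two-dot=$rebase_two three-dot=$rebase_three"
```

In this sketch all four comparisons report only FeatureClass.txt, because after either option the tip of main is part of the feature branch's history and so becomes the merge base.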

Both options provide you with a good PR to bring the feature changes back to the main. Again, Atlassian has good documentation on choosing between merging and rebasing.

Final Thoughts

The behaviour of GitHub regarding Pull Requests is documented, but the intuition I had was so strong that I didn’t feel the need to read those docs. And when I did, I could scarcely believe my eyes. I was compelled to read the references and try the above scenarios to confirm that this really is how it works. I don’t think I’ll ever stop learning about git, but this was a nice step forward in that learning.

Have fun and remember:

  • GitHub is not the whole git
  • Keep your branches like your pastries: fresh and quickly consumed
