Recently while I was working on some continuous-integration (CI) test suites for the WordPress Block Editor (Gutenberg) I had to wrestle with some of the details of what happens behind the scenes in a Pull Request (PR) on GitHub and thought I’d share what I learned here in this post. Pull Requests involve some “magic” I have taken for granted for a long time, but there are some interesting details behind the curtain.
In this post we’re going to explore the hidden “merge branch” that tracks a PR’s development and see how GitHub uses that merge branch to provide diff views and previews of how the PR will impact the project once merged. Read on to dive in – this is a rather lengthy post because we build up a repo step by step to examine different scenarios that are possible in different PR situations.
Pull Requests aren’t branches
You may have already known this, but PRs are a separate concept from branches. For one, branches are transient while PRs leave a trace of their activity. Still, a PR mostly refers to a single branch during the time it is developed, and it can only point to one branch. A branch can be reused later on after it’s merged and deleted, but PRs cannot be deleted.
Behind the scenes, GitHub creates its own branches during the life of a PR. Namely, it creates refs/pull/NUMBER/head (which tracks the HEAD of the branch under development) and refs/pull/NUMBER/merge (which previews a merge into the target branch).
The “target branch” is what I often call the base branch. It’s the name of the branch into which the PR will merge when it’s merged. For many people this is usually trunk, but it can be any available branch. It’s not necessarily the branch from which the PR forked; in fact, we can have PRs targeting other PRs and we can have PRs forked from other PRs which target the main branch.
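Here is a minimal local sketch of that layout, assuming a throwaway bare repository stands in for GitHub. We write refs/pull/1/head ourselves (GitHub manages that ref on its side), then delete the branch and check that the PR ref is still there. The names sandbox and feature are made up for the illustration.

```shell
set -eu

# Throwaway sandbox: a bare "remote" plus a working repository.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git remote add origin "$sandbox/remote.git"

echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature work" >> README.md
git commit -q -am "Feature change"
feature_sha=$(git rev-parse HEAD)

# Publish both branches, then play GitHub's part (an assumption of this
# sketch) by pointing a PR head ref at the feature branch's HEAD.
git push -q origin trunk feature
git push -q origin feature:refs/pull/1/head

# Deleting the branch leaves the PR ref alone: it lives outside refs/heads.
git push -q origin --delete feature

branch_after=$(git ls-remote origin refs/heads/feature | cut -f1)
pr_head_after=$(git ls-remote origin refs/pull/1/head | cut -f1)
echo "branch:  ${branch_after:-<gone>}"
echo "PR head: $pr_head_after"
```

The branch comes and goes, while the ref under refs/pull keeps pointing at the last commit the PR saw.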
A PR’s “merge branch” is briefly mentioned in the GitHub Actions docs, but to the best of my searching isn’t defined in their docs beyond a passing mention that diffing views are different when comparing branches than when reading a PR. The merge branch provides a preview of how the target branch will look after merging the PR. There’s a catch when conflicts exist though, which we’ll discuss later.
To that end, a PR is more or less a view of how a chosen target branch will look once a given changeset has been merged into it: a preview of how the PR will impact the project once it’s done.
We’re going to construct a series of PRs to establish different scenarios where GitHub is interacting with the PR branch. We’ll start simple and end with a conflict, and see how the pointers behind the scenes track updates in both the target and feature branches.
This post is a bit off-the-cuff, but I hope that by the end of the first or second scenario some of the why and what behind our explorations will start to clear up.
You can also jump to the conclusions if you want a summary of what this all means.
Single-commit PR with no conflicts merged into target branch.
In this case we can see that when we created the PR, GitHub created dde00c3 as the merge branch; this existed until we merged the PR. At that point it created a new final merge commit, 3a0f58c, and deleted the merge branch. When I deleted the PR’s branch from the GitHub UI, it also deleted the branch single-commit-pr. It left refs/pull/1/head though, as that can be used to recreate the PR at a later time if we choose to restore the branch.
- We forked a new feature branch from another branch.
- In creating a PR, GitHub created a new ref to track the HEAD of that feature branch. This ref stays throughout the life of the repository as an artifact to reference where the PR was when it was last updated.
- GitHub also created a new commit merging the feature branch into its target branch. This commit was created not on the merge base, from where we forked, but on the HEAD of the target branch. In this case they were the same commit.
- Once we merged the PR, GitHub removed our feature branch and its internal merge branch but left the refs/pull/1/head ref/commit for historical reference.
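The steps above can be imitated with plain git. This is only a sketch under assumptions: we create the refs/pull/1/head and refs/pull/1/merge refs by hand in a local sandbox, and we assume GitHub’s preview merge behaves like an ordinary --no-ff merge of the PR head into the target’s HEAD.

```shell
set -eu

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Fork the feature branch and add a single commit.
git checkout -q -b single-commit-pr
echo "1. First PR" >> README.md
git commit -q -am "Add first PR entry"

# Hand-made stand-in for the PR head ref.
git update-ref refs/pull/1/head "$(git rev-parse single-commit-pr)"

# Build the preview merge on a detached HEAD so trunk itself never moves.
git checkout -q --detach trunk
git merge -q --no-ff -m "Merge single-commit-pr into trunk" single-commit-pr
git update-ref refs/pull/1/merge "$(git rev-parse HEAD)"
git checkout -q trunk

first_parent=$(git rev-parse "refs/pull/1/merge^1")
second_parent=$(git rev-parse "refs/pull/1/merge^2")
echo "merge parents: $first_parent (target HEAD), $second_parent (PR head)"
```

The first parent is the target’s HEAD and the second is the PR head, which is the shape the bullet list describes.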
Single-commit PR with no conflicts merged as a fast-forward commit.
There’s nothing that interesting in this scenario. The same kind of merge branch existed until we merged the PR, and since it was a fast-forward merge, trunk doesn’t carry the branching history. We’re still left with refs/pull/2/head pointing at the commit in the PR, but unlike with the normal merge in the first example, this ref now points to a commit in trunk’s history, since the fast-forward updated HEAD instead of creating a new merge commit.
Single-commit PR with no conflicts squash-merged.
While similar to the previous scenario, we can see that with the squash-merge strategy GitHub will create a new commit even for a single-commit PR. Our refs/pull/3/head points to an off-trunk commit because GitHub didn’t use the commit in our PR; it used a new one it created that contains the same changes as our feature branch.
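The difference between the last two scenarios can be checked locally. The sketch below uses plain git merge --ff-only and git merge --squash as stand-ins for GitHub’s merge buttons (an assumption, since GitHub performs these server-side) and hand-made refs/pull/2/head and refs/pull/3/head refs. The branch names ff-pr and squash-pr are hypothetical.

```shell
set -eu

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Fast-forward: trunk simply moves onto the PR's own commit.
git checkout -q -b ff-pr
echo "2. Fast-forward PR" >> README.md
git commit -q -am "Add fast-forward entry"
git update-ref refs/pull/2/head "$(git rev-parse HEAD)"
git checkout -q trunk
git merge -q --ff-only ff-pr

# Squash: trunk gets a brand-new commit carrying the same content.
git checkout -q -b squash-pr
echo "3. Squash PR" >> README.md
git commit -q -am "Add squash entry"
git update-ref refs/pull/3/head "$(git rev-parse HEAD)"
git checkout -q trunk
git merge -q --squash squash-pr
git commit -q -m "Add squash entry (squashed)"

# Is each PR head reachable from trunk?
if git merge-base --is-ancestor refs/pull/2/head trunk; then ff_head_in_trunk=yes; else ff_head_in_trunk=no; fi
if git merge-base --is-ancestor refs/pull/3/head trunk; then squash_head_in_trunk=yes; else squash_head_in_trunk=no; fi

# The squash commit is a different commit, but its snapshot matches the PR head.
trunk_tree=$(git rev-parse "trunk^{tree}")
squash_tree=$(git rev-parse "refs/pull/3/head^{tree}")

echo "ff head in trunk: $ff_head_in_trunk, squash head in trunk: $squash_head_in_trunk"
```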
On to more interesting situations: things get more complicated when the target branch continues to see development while we work on PRs.
Single-commit PR when target branch advances without conflict.
This one starts like the first scenario, before trunk has been updated. The merge branch is created merging our feature branch into the target branch.
A funny thing happens though when we advance trunk. About a minute after doing so, refs/pull/4/merge updates on GitHub’s side to a new commit and the old merge branch has been orphaned.
Notice how the merge branch now has followed trunk, and it did so without any interaction on the PR. This update to the PR was the result of activity outside of the PR. We can see the implications of this by directly diffing the branch and its target.
If we look at the git diff trunk target-update-no-conflict we see updates from trunk as if our branch reverts them while the updates from our feature branch are found as expected. This is because our branch has now fallen behind its target and git stores snapshots, not diffs. When it compares the files in the branch they don’t have those updates from trunk, thus it looks like we reverted them.
diff --git a/README.md b/README.md
index 1253865..8483e4e 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,8 @@
 ### PRs
- 1. Merging a single commit without conflicts dmsnell/gh-pull-request#1
+ 1. Merging a single commit without conflicts #1
  2. Merging a single commit without conflcits, fast-forward #2
  3. Squash-merging a single commit without conflicts #3
+ 4. Target branch updates with non-conflicting changes #4
We can use git diff trunk...target-update-no-conflict to only look at the commits in our feature branch since the merge-base of the two, but if we git diff trunk 38345e3 it produces the same diff. That commit SHA is the refs/pull/4/merge commit, but we can’t tell git diff to use it since it’s not a local ref (we could create a local ref for it, but that’s not the point of this post).
diff --git a/README.md b/README.md
index 1253865..57524d4 100644
--- a/README.md
+++ b/README.md
@@ -5,4 +5,5 @@
  1. Merging a single commit without conflicts dmsnell/gh-pull-request#1
  2. Merging a single commit without conflcits, fast-forward #2
  3. Squash-merging a single commit without conflicts #3
+ 4. Target branch updates with non-conflicting changes #4
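The three comparisons can be reproduced in a small sandbox. This is a sketch with made-up file names (feature.txt, upstream.txt) rather than the README from the demo repository, and the preview merge is again an ordinary local --no-ff merge standing in for refs/pull/NUMBER/merge.

```shell
set -eu

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# The feature branch adds one file.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# Meanwhile trunk advances with a non-conflicting change.
git checkout -q trunk
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Add upstream file"

# Two dots compares the two snapshots, so trunk's new file looks "removed".
two_dot=$(git diff --name-only trunk feature)
# Three dots compares the feature branch against the merge base.
three_dot=$(git diff --name-only trunk...feature)

# Stand-in for the merge branch: feature merged into the current trunk.
git checkout -q --detach trunk
git merge -q --no-ff -m "Merge feature into trunk" feature
merge_sha=$(git rev-parse HEAD)
git checkout -q trunk
merge_diff=$(git diff --name-only trunk "$merge_sha")

printf 'two-dot:\n%s\nthree-dot:\n%s\nagainst merge commit:\n%s\n' \
  "$two_dot" "$three_dot" "$merge_diff"
```

When the merge is clean, diffing trunk against the merge commit lists the same files as the three-dot diff, while the two-dot diff also picks up everything the branch is missing from trunk.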
At this point we can observe that whenever GitHub sees a commit in a branch, it iterates over every PR whose target is that same branch and then creates a new merge commit from that PR’s head branch into the updated target.
For Gutenberg, which at the time I’m writing this has 1,772 open branches, this means that every time a PR merges into trunk something behind the scenes is creating about that many new merge commits.
Single-commit PR when target branch advances with a conflict.
Here’s where it gets more fascinating. What if GitHub tries to update that merge branch but there’s a conflict? Let’s set the stage by recreating the start conditions from the last scenario.
Now we apply a conflicting commit to trunk and…nothing changes! It never updates the merge branch because it can’t.
git diff trunk target-update-with-conflict shows that they are different and we can see the conflict.
diff --git a/README.md b/README.md
index 2ad1742..4bc6174 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,8 @@
 ### PRs
  1. Merging a single commit without conflicts dmsnell/gh-pull-request#1
- 2. Merging a single commit without conflcits, fast-forward dmsnell/gh-pull-request#2
- 3. Squash-merging a single commit without conflicts dmsnell/gh-pull-request#3
- 4. Target branch updates with non-conflicting changes dmsnell/gh-pull-request#4
+ 2. Merging a single commit without conflcits, fast-forward #2
+ 3. Squash-merging a single commit without conflicts #3
+ 4. Target branch updates with non-conflicting changes #4
+ 5. Target branch updates with conflicting changes #5
git diff trunk...target-update-with-conflict still shows the changes we introduced in the feature branch, but we can’t apply those in a merge without resolving the conflict.
diff --git a/README.md b/README.md
index 57524d4..4bc6174 100644
--- a/README.md
+++ b/README.md
@@ -6,4 +6,5 @@
  2. Merging a single commit without conflcits, fast-forward #2
  3. Squash-merging a single commit without conflicts #3
  4. Target branch updates with non-conflicting changes #4
+ 5. Target branch updates with conflicting changes #5
The “Files Changed” tab in GitHub shows this last view: what changes have been introduced on this branch since it forked from its target. However, if we run CI workflows in GitHub Actions, we’ll discover that the GITHUB_SHA still points to that now-frozen merge branch. Frozen? Let’s add another commit to our branch.
Finally we’ve reached the point that motivated this entire exploration. We have a real mess of things. The PR’s “Files Changed” view shows the diff from the commits in the feature branch against its merge-base with the target branch; nothing shows the changes against the latest target branch anymore, and CI jobs will see GITHUB_SHA pointing to an old version of the feature branch.
If our merge branch had already been updated to track non-conflicting changes in trunk, then at this point it would be frozen at the most recent non-conflicting merge, so it would not be showing changes against the merge base. It would show the changes since the last time the PR could be merged into its target without conflict.
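The freezing behavior can be imitated locally. The refresh_merge_ref helper below is hypothetical: it is a guess at the observable behavior (update the merge ref only when the merge is clean), not a description of how GitHub implements it, and the refs/pull/5/* refs are created by hand.

```shell
set -eu

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "status: draft" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "status: feature edit" > README.md
git commit -q -am "Edit status in feature"
git update-ref refs/pull/5/head "$(git rev-parse feature)"
git checkout -q trunk

# Hypothetical helper: try the preview merge on a detached HEAD and move
# refs/pull/5/merge only if git can complete the merge on its own.
refresh_merge_ref() {
  git checkout -q --detach trunk
  if git merge -q --no-ff -m "Merge PR into trunk" refs/pull/5/head >/dev/null 2>&1; then
    git update-ref refs/pull/5/merge "$(git rev-parse HEAD)"
  else
    git merge --abort
  fi
  git checkout -q trunk
}

refresh_merge_ref
merge_before=$(git rev-parse refs/pull/5/merge)

# A conflicting change lands on trunk: same line, different content.
echo "status: trunk edit" > README.md
git commit -q -am "Edit status on trunk"

refresh_merge_ref
merge_after=$(git rev-parse refs/pull/5/merge)
frozen_target=$(git rev-parse "refs/pull/5/merge^1")

echo "merge ref before: $merge_before"
echo "merge ref after:  $merge_after"
```

The merge ref is unchanged after the conflicting commit, and its first parent is the older trunk commit rather than trunk’s current HEAD.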
I’ll close this exploration by doing something less common in practice because GitHub doesn’t support this flow if you have branch protection rules in place. When I run git merge target-update-with-conflict it presents merge conflicts to me. I resolve them and then git merge --continue. GitHub merge commits are usually “empty” in that they combine two parent commits and that’s that. My merge commit contains a resolution to the conflict and so is a bit different than either of its parents. In a way it hides the conflict resolution inside the merge commit.
We have to always remember that git stores snapshots, not diffs, because normal merge commits aren’t truly empty in this sense either. Some people don’t like this strategy because it leaves no trace or clear diff of what changes were made to resolve the conflict – it’s there but you have to examine the diff between the parent commits and compare that against the merge commit to see how they were resolved.
Even so, that merge commit is going to contain the results of the merge in either case. Resolving conflicts in a separate commit does leave an easier marker for how they were resolved, and importantly, lets GitHub resume updating its merge branch so that you can run your CI suite before merge. Many merge commits do involve conflict resolution implicitly like this but if git is able to automatically determine how to resolve the conflicts then it just performs that resolution without asking. The merge commit is a snapshot of how two different parent commits were merged, not a diff of how we got there.
Technically in these cases you can still run your CI suites, but if they rely on the SHA ref for the PR it’ll reference an old version of the code. You would need to build conflict detection into your script to know that something is wrong (this can be done by examining whether the /head ref for the PR points to one of the parents of the /merge ref for the same PR).
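A minimal sketch of that check follows. The helper name head_is_parent_of_merge is made up, and the two SHAs are produced locally here; in a real workflow they would presumably come from something like git ls-remote origin 'refs/pull/NUMBER/*'.

```shell
set -eu

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "first" >> README.md
git commit -q -am "First feature commit"
head_sha=$(git rev-parse HEAD)

# Stand-in for the PR's merge commit, built while the PR head was $head_sha.
git checkout -q --detach trunk
git merge -q --no-ff -m "Merge feature into trunk" feature
merge_sha=$(git rev-parse HEAD)

# The feature branch moves on, but the merge commit is never rebuilt.
git checkout -q feature
echo "second" >> README.md
git commit -q -am "Second feature commit"
new_head_sha=$(git rev-parse HEAD)

# Hypothetical helper: succeeds when $1 (a PR head SHA) is one of the
# parents of $2 (the PR's merge commit SHA).
head_is_parent_of_merge() {
  # rev-list --parents prints the commit followed by its parents.
  for parent in $(git rev-list --parents -n 1 "$2" | cut -d' ' -f2-); do
    if [ "$parent" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

if head_is_parent_of_merge "$head_sha" "$merge_sha"; then before=fresh; else before=stale; fi
if head_is_parent_of_merge "$new_head_sha" "$merge_sha"; then after=fresh; else after=stale; fi
echo "before the new commit: $before, after: $after"
```

A CI job could fail early, or at least warn, whenever the check reports a stale merge commit.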
When I was working with Gutenberg’s CI workflows I started with a misconception about what PRs actually are and what pointers GitHub provides when referencing them. At first I assumed that we were running our jobs comparing what’s in a branch against what’s at the merge-base with its target branch; I was worried that our tests would not capture upstream changes in the target branch which might lead to different outcomes in the CI suites than we would get if we were to run the same workflows against the target branch after merging a PR.
What actually is happening is better for us, because when we run those tests we’re already incorporating any changes from upstream into our feature work and we get the chance to detect if a change in a new version of trunk (or whichever branch is our target) presents a problem for us once we merge.
The sticking point is that we have to pay attention when conflicts arise between the two branches because our CI workflows will continue to run and may even show successes when in fact they are running on an old version of our PR’s code, comparing potentially stale changes in the PR against potentially stale versions of the target branch.
Lastly, it’s worth repeating that upstream changes in other branches can impact your PR even if it hasn’t been updated, rebased, or pulled in the latest from its target branch, because the PR in essence is the end result of the work in the feature branch, not the feature branch itself.
All of the commit SHAs referenced in the text and diagrams correspond to the commits in my demo repository where these scenarios were created in git and the PRs are available in GitHub.
The diagrams were made with the Mermaid Live Editor using
If you want to explore the branches associated with a PR you can run git ls-remote. The command accepts a matching syntax to narrow the results. Narrowing can be handy on big repositories such as Gutenberg, which holds 47,084 refs as I write this sentence. Suppose you open PR #12345:
git ls-remote origin 'refs/pull/12345/*'
We have to quote the wildcard matching because otherwise our shell will try to perform expansion before sending it to git. You can run it directly without quotes if you have a fully-known ref name.
Suppose we want to examine the merge base for our PR. I’m going to explore one of my currently-open PRs. We have to fetch the commit because it doesn’t come over as a branch. It’s just a ref pointing to a commit (despite being called the “merge branch”). We could fetch it by its ref directly, but we need to know the commit SHA anyway for git log (unless we create a local branch for it) so I like to do it this way and get it into my copy buffer.
$ git ls-remote origin refs/pull/46345/merge
f335f7b56c8863538b276ccfe222027a80f35256 refs/pull/46345/merge
$ git fetch origin f335f7b56c8863538b276ccfe222027a80f35256
From github.com:wordpress/gutenberg
 * branch f335f7b56c8863538b276ccfe222027a80f35256 -> FETCH_HEAD
$ git log --graph --topo-order trunk f335f7b56c8863538b276ccfe222027a80f35256
...
| | * commit f335f7b56c8863538b276ccfe222027a80f35256
|/| Merge: ca1acf3fe3 9e752e5810
| | Author: Dennis Snell <email@example.com>
| | Date: Tue Dec 20 12:00:38 2022 -0800
| |
| | Merge 9e752e5810bc4947774363059b0b8c0c442230ee into ca1acf3fe301cfe5fc4ec50edb6e5b3702942bb0
| |
...
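For anyone who prefers a local ref over copying SHAs around, here is a sketch of the alternative: fetch the merge ref by name into a ref of our own. It runs against a throwaway bare repository standing in for GitHub, with hand-made refs/pull/12345/* refs. The destination name refs/remotes/origin/pr/12345/merge is an arbitrary choice for this sketch, not a git or GitHub convention.

```shell
set -eu

sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/writer"
cd "$sandbox/writer"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git remote add origin "$sandbox/remote.git"

echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"
git checkout -q -b feature
echo "feature work" >> README.md
git commit -q -am "Feature change"
head_sha=$(git rev-parse HEAD)
git checkout -q --detach trunk
git merge -q --no-ff -m "Merge feature into trunk" feature
merge_sha=$(git rev-parse HEAD)

# Publish trunk plus hand-made PR refs (our stand-in for GitHub's refs).
git push -q origin trunk \
  "$head_sha:refs/pull/12345/head" \
  "$merge_sha:refs/pull/12345/merge"

# A separate, empty repository plays the part of our local checkout.
git init -q "$sandbox/reader"
cd "$sandbox/reader"
git remote add origin "$sandbox/remote.git"

# Quoted so the shell leaves the wildcard for git to interpret.
pr_ref_count=$(git ls-remote origin 'refs/pull/12345/*' | wc -l | tr -d ' ')

# Fetch the merge ref by name into a local ref of our choosing.
git fetch -q origin refs/pull/12345/merge:refs/remotes/origin/pr/12345/merge
local_merge=$(git rev-parse refs/remotes/origin/pr/12345/merge)
local_second_parent=$(git rev-parse "refs/remotes/origin/pr/12345/merge^2")

echo "refs on the remote for the PR: $pr_ref_count"
echo "local copy of the merge ref:   $local_merge"
```

Once the ref exists locally it can be passed to git log or git diff by name instead of by SHA.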
Bravo if you made it this far! I hope these ramblings make sense. If you notice something I’ve said that needs correction please leave a comment. Would love to hear from you if you found it useful or if you have a good war story that relates to the merge branch, particularly in conflict scenarios.