This is a follow-up article to A tidy, linear Git history, going into more detail on why I think that rewriting your Git commit history (i.e. rebasing) is usually the right thing to do.
Two different history models
When it comes to Git history (or any other version control system history, for that matter), there seem to be two dominant views of what the commit history should represent:
- A record of your work (I’ll call this a work log).
- A recipe that describes how to implement your feature (I’ll call this a recipe).
So, what is the difference?
We can define the work log to contain all the relevant actions that were carried out during the implementation of a specific feature, from the point that the work started. This could include things like:
- Different revisions of your feature (e.g. an early alpha, the draft before code review and the final tested and bug-fixed version).
- Different parts of your feature (e.g. class implementation, build system changes, unit tests, …).
- Merges from upstream branches (e.g. master) and the corresponding conflict resolutions.
- Various fixes along the way (spelling corrections, cross-platform bug fixes, style guide fixes, etc.).
Taking this to the extreme, the work log could include all your local edits (e.g. whenever you save or build your project), but that’s usually not very practical.
A recipe, on the other hand, just includes the minimal set of steps required to produce a specific feature. Essentially, the recipe for a feature is a set of logically separate steps, excluding things like fixes and conflict resolutions.
As an analogy, the word recipe is described in Wikipedia as:
“A recipe is a set of instructions that describes how to prepare or make something, especially a culinary dish.”
In our case, the “culinary dish” is more likely something like a feature in a software program.
Say that you are developing a multi-platform desktop application that has functionality for loading files of various formats (e.g. image files), and your task is to add support for a new (hypothetical) file format “XYZ”.
Here’s an example of how the corresponding work log and recipe for the new feature could look (think of this as a Git commit history of a topic branch that is being merged into master):
As can be seen, the recipe is much more to the point – only telling the story about how to build the feature. Another way to see it is that the recipe is a distilled version of the work log.
Also to be noted is that the work log is never rebased – the whole idea is to preserve the history without altering it. Hence there are a few merges from master (instead of rebasing your feature branch on top of master).
How to turn your history into a recipe
There are two key ingredients to building a recipe type history:
- Rebase your topic branch on top of master instead of merging master into your topic branch. This is generally good practice, and it also simplifies the second ingredient.
- Rewrite your history (typically using interactive rebase). You can do this at several points: while working with your local branch, before pushing it to your remote, after a code review is done (to squash away any fixup commits), and/or before merging it to master.
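The two ingredients above can be sketched as a short shell session. This is a minimal illustration in a throwaway repository, not a prescribed workflow: the branch name `xyz-loader`, the file names and the commit messages are all hypothetical, and it assumes the fix was recorded with `git commit --fixup` so that `--autosquash` can fold it into its target automatically. Setting `GIT_SEQUENCE_EDITOR=:` simply accepts the generated todo list instead of opening an editor.

```shell
#!/bin/sh
# Sketch: turn a small work-log style topic branch into a recipe.
# All names (xyz-loader, file names, commit messages) are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial application"

# Start the topic branch and add the first logical step.
git checkout -q -b xyz-loader
echo "loader" > loader.txt
git add loader.txt
git commit -q -m "Add XYZ loader"

# Meanwhile, master moves on.
git checkout -q master
echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Update docs"

# Back on the topic branch: a fix that logically belongs to the first step.
git checkout -q xyz-loader
echo "loader (fixed)" > loader.txt
git commit -q -a --fixup=HEAD

# Ingredient 1: rebase onto master instead of merging master into the branch.
git rebase -q master

# Ingredient 2: rewrite the history. --autosquash marks the "fixup!" commit
# to be folded into "Add XYZ loader"; the todo list is accepted unchanged.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash master

# Show the resulting commit subjects, newest first.
git log --format=%s
```

After the rewrite, the topic branch should contain a single "Add XYZ loader" commit on top of the latest master, with no merge commit and no separate fixup commit.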
Both of these alter your commit history. Some may think that this is borderline blasphemous, but remember that you’re only altering the history of something that has not yet been integrated into mainline. In fact, it’s not much different from excluding all the back-and-forth edits that you did during development but never included in your commits in the first place.
After all, Git is a distributed version control system, which gives you the power to manage and edit your history before publishing it.
Advantages of the recipe model
The way I see it, there is only one advantage of the work log model: it’s slightly less work for the developer, since you essentially just push your work log without any kind of history editing.
On the other hand, there are a few compelling advantages of the recipe model:
- It can greatly simplify code reviewing (and hence improve the quality of the code review). In the work log case the reviewer either has to wade through loads of irrelevant changes, or review the squashed history (which can be quite overwhelming in some cases). In the recipe case, the reviewer can deal with each commit independently, knowing that each commit does what it’s supposed to.
- It’s a better fit if you want a linear commit history, and it comes with similar advantages (easier to follow history, easier to cherry-pick or revert specific features, etc).
- It generally gives a more compact history compared to the work log model (there is less noise in the history). As a positive side effect, this also translates to a smaller repository (though I’m not sure that it matters that much).
- In terms of preserving information for future maintainers of your code base (or whatever you have under version control), the recipe history tends to be much more to the point and easier to understand.
Which history model to use?
I am obviously a proponent of the recipe model, and I think that whenever it’s feasible to use it, go for it!
If you just need to get things done, and don’t care too much about the history, the work log model can be more efficient at times. But beware: it can be a slippery slope. Just as technical debt can get out of hand, it can be hard to retrofit a recipe model once there’s a critical mass of developers and supporting processes (build systems, code review routines, etc.) that have adopted a work log model.
In reality, every project and every task will use a mix of the two models, and the trick is often to strike a good balance between the two.
5 thoughts on “Git history: work log vs recipe”
Work log could be continuous integration, perhaps? Cf. http://www.martinfowler.com/articles/continuousIntegration.html
CI, as described by Martin Fowler, is based around a centralized VCS (e.g. Perforce, SVN, …), and focuses on getting things into mainline often in order to do post-commit, server-side automated tests and minimize integration costs (potentially sacrificing code and history quality).
With a DVCS where local branches are cheap, I don’t see the same need to push unfinished tasks on a daily basis. Instead, each developer keeps up to date locally by rebasing on the latest master, and thereby minimizes the integration effort (you’re doing local integrations on a daily basis). In Git you can also easily do the server-side automated builds and tests on your topic branch before merging it to master, which further reduces the risk of breaking the mainline. Interesting article on the subject: Continuous Integration is Dead.
It’s true, however, that if you have a policy to push often (even before your task is finished), you will end up with something similar to the work log model. This is one of the reasons why I’m not very fond of CI (at least some of its aspects).
Nice writeup, I generally agree.
One question is: Do you keep your work log, or do you totally discard it? Sometimes I keep chains of “work log” commits as tags, locally. But then only push the cleaned-up version.
Next question: Do you ever push commits you plan to rebase? E.g. as a “preview” branch for code review? And do you ever rebase after a code review? A problem is that code review comments on GitHub will not be linked to their fix after a rebase.
First answer: For anything but trivial rebases, I keep the work log in a temporary local branch until I’m convinced that the rebase was successful (helps with diffing and redoing the rebase), but then I throw it away.
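The temporary-branch approach described in this answer could look roughly like the sketch below. It is only an illustration in a throwaway repository: the branch names `xyz-loader` and `xyz-loader-worklog`, the file names and the commit messages are hypothetical, and the rewrite step is a simple autosquash rather than a full manual interactive rebase.

```shell
#!/bin/sh
# Sketch: keep the work log in a temporary branch while rewriting history.
# Branch names, file names and commit messages are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial application"

# A small work log: one real step plus a later fix for it.
git checkout -q -b xyz-loader
echo "loader" > loader.txt
git add loader.txt
git commit -q -m "Add XYZ loader"
echo "loader (fixed)" > loader.txt
git commit -q -a --fixup=HEAD

# Keep the untouched work log around under a temporary name.
git branch xyz-loader-worklog

# Rewrite the topic branch (here: fold the fixup into its target commit).
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash master

# The rewrite should change the history, not the resulting files. An empty
# diff (exit status 0) means both branches end in the same tree; with
# "set -e" the script stops here if they differ.
git diff --quiet xyz-loader-worklog xyz-loader

# If the rebase had gone wrong, the original state could be restored with:
#   git reset --hard xyz-loader-worklog

# Once convinced, throw the work log away (-D, since it was never merged).
git branch -q -D xyz-loader-worklog
```

Comparing the two branch tips with `git diff` is what makes the temporary branch useful: the commits differ, but the final file contents should be identical.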
Next answer: I’ve written down my thoughts in a couple of other posts: A tidy, linear Git history (which goes into details on pushing preview/unfinished branches) and GitHub pull request != code review (which rants about how bad GitHub is at managing rebase workflows). I’ve just started using GitLab, and I’ve heard that it has better rebase support – we’ll see…