It’s pretty hard to keep the commits in a working branch in a good legible state - certainly it takes work to do it.
In 25 years of professional development I’ve never really had a situation where the commits on a branch would have helped me understand what was going on a year ago when the work was done. That includes pretty big bits of project work.
I’d much rather have a trunk with commits at the granularity of features.
I on the other hand have never come across a scenario where I run git bisect to find a commit that broke something, discover a small commit as a culprit and wish I had instead found a commit that's hundreds of lines long.
What has happened a whole lot though is the exact opposite.
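For readers who haven't watched this play out, here is a minimal sketch in a throwaway repository. The file names, commit messages and the `grep` check standing in for a test suite are all made up for illustration; the point is only that `git bisect run` lands on one small commit rather than a large squashed one.

```shell
#!/bin/sh
set -eu
# Throwaway repository; all file names and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "ok" > status.txt
git add status.txt
git commit -q -m "Initial commit"
good=$(git rev-parse HEAD)

# Six small commits; the fourth one silently breaks status.txt.
for i in 1 2 3 4 5 6; do
  echo "change $i" >> notes.txt
  if [ "$i" -eq 4 ]; then echo "broken" > status.txt; fi
  git add -A
  git commit -q -m "Small change $i"
done

# Mark HEAD as bad and the initial commit as good, then let git drive.
# The grep is a stand-in for a test suite: exit 0 means good, exit 1 means bad.
git bisect start HEAD "$good" >/dev/null
git bisect run grep -q '^ok$' status.txt >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

With a squash-merged history the same search would stop at the single commit containing all six changes.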
It might be better to view a commit as a natural unit of working code. There are a lot of units of working code that would be tedious to introduce only a few lines at a time.
As such, a new codebase is likely to grow by large, unwieldy commits and a mature one by small, targeted commits.
Our strategy is to squash on merge and ensure the JIRA ticket reference is in the MR title. You have the granularity of the feature which is going to help guide you on the intention. It's also much easier to enforce. People like to write and commit code in their own way.
I've had separate commits come in handy several times when `git blame`ing when working with people who actually described what changes were about in their commits (which, unlike comments, don't go out of date).
In 25 years of professional development I have several counter examples where some bit was either a trivial git revert of a single commit - among multiple ones in a branch - away, or an absolute pain because the squash-merge commit had flattened too many concerns together, concerns that were perfectly split in the topic branch but that branch was long gone by virtue of being auto-deleted on PR merge.
Coincidentally, every single squash-merge commit advocate I've had the unfortunate debate with was a regular practitioner of public tmp / tmp / try again / linter / tmp / fix / fix / haaaaaands commits.
Note that I'm not against squashing or history rewriting in general, e.g. `rebase -i` and the like (which I use heavily to present sensible aggregations of code that can be reviewed commit by commit), only squash-merge.
I take it you haven't had the pleasure of working with your average ("dark matter" as they're called here) developers. I wouldn't call myself an "advocate" of squashes, but it's often the only practical way of keeping git history somewhat usable when working with people who refuse to learn their VCS properly.
I chunk my changes into tiny commits ("linter"/"tmp"/"wip"), but then rebase aggressively, turning it into a set of logical changes with well-formed commit messages. git bisect/revert work great with history written in this way even years later.
But: most of the people I've been interacting with also produce lots of "wip"/"tmp", but then skip the rebase. I can only offer my help with learning git rebase for so long before it starts taking too much time from the actual work. So squash it is: at least it produces coherent history without adding thousands of commits into `--ignore-revs-file`.
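For anyone who wants to try the rebase half of that workflow, a minimal sketch follows, assuming a throwaway repository with made-up file names and messages. It uses `git commit --fixup` plus `git rebase -i --autosquash`, which is one way to fold a late fix into the commit it belongs to, not necessarily how the commenter does it. `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so that no editor opens.

```shell
#!/bin/sh
set -eu
# Throwaway repository; all file names and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo "parser" > parser.txt
git add parser.txt
git commit -q -m "Add parser"
parser_commit=$(git rev-parse HEAD)

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Document parser"

# A late fix that logically belongs to "Add parser": record it as a fixup.
echo "parser, fixed" > parser.txt
git commit -q -a --fixup="$parser_commit"

# Fold the fixup into its target commit. The todo list is accepted unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

# Two logical commits remain; the fixup commit is gone.
subjects=$(git log --reverse --format=%s "$base"..HEAD)
commit_count=$(git rev-list --count "$base"..HEAD)
echo "$subjects"
```

The working tree still contains the fix, but the history reads as two logical changes instead of three.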
And sometimes a patch is just that big, especially in UI work, where a single change can cascade down through multiple layers.
> I chunk my changes into tiny commits ("linter"/"tmp"/"wip"), but then rebase aggressively, turning it into a set of logical changes with well-formed commit messages. git bisect/revert work great with history written in this way even years later.
In a PR based workflow, it has become easier to have the PR be a logical unit than to `rebase -i` all the time on my end.
If you work with a ticket system, squash-merge gives you the same granularity, where a commit would refer to a single ticket.
A ticket should be atomic, describing a single change request. The PR in this case is the working room. It can be as messy or as clean as you want. But the goal is to produce a patch that introduces one change. If you ran `rebase -i` at the end, you would end up with a single commit in the PR anyway.
No, you wouldn't. git rebase -i is to remove noise, which is about merging commits that, well, make more sense together than apart. Which is mostly about summarizing trivialities (e.g. several typo fixes) and squashing fixups into commits that introduced a problem in the same branch.
A typical bugfix branch might look like this after rebase -i:
Those look more like noise to me. A squashed merge (or a final squash before PR) would be:
TN-43 - Fix mismatched interface between Foo and Bar
We've moved the X property to a more appropriate place and
improved the documentation for Feature Foo. We've also found and fixed
an O(n^2) implementation in feature Bar.
The ticket TN-43 will have all the details that led to the PR being made: bug reports, investigations, alternative solutions, ...
The commit message is what matters most. I don't think I've ever needed what was in a merged branch. But I've always wanted each commit to have passing tests and a good description of the patch. And all the talk in the engineering team is about tickets, so it makes sense to align the two.
They aren't noise at all, and I have found them useful a bunch in the past when I worked at a place that didn't squash. Commits at this level act as immutable comments that don't get out of date. Provided you do `--no-ff` merges, the merge commit is the feature commit and you can get the "clean" feature history with `git log --merges --first-parent`. Best of both worlds! Being able to `git blame` and get a granular message about why something was done can be really handy, especially when looking at unfamiliar code.
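A minimal sketch of that setup, assuming a throwaway repository with made-up branch, file and commit names. It merges with `--no-ff` so the merge commit carries the feature-level message, then compares the first-parent view with the full history.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Feature branch with two granular commits.
git checkout -q -b fix-interface
echo "move X" >> app.txt
git commit -q -am "Move X property next to its only caller"
echo "docs" >> app.txt
git commit -q -am "Document Feature Foo"

# Merge with an explicit merge commit that carries the feature-level message.
git checkout -q "$trunk"
git merge -q --no-ff -m "TN-43: Fix mismatched interface between Foo and Bar" fix-interface

# Feature-level view: only merge commits on the trunk's first-parent chain.
feature_log=$(git log --merges --first-parent --format=%s)
# Granular view: every commit, including the ones made on the branch.
full_count=$(git rev-list --count HEAD)

echo "$feature_log"
echo "total commits: $full_count"
```

The feature-level log shows one entry per merged branch, while `git blame` and plain `git log` still reach the granular commits underneath.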
I get where you're coming from, but I prefer having a more holistic view of a change, especially from a product perspective. So even when git-blaming, either I’m reading the current file or I go straight to the log of the commit (with message and diff).
I prefer granularity at a product or team level decision. Not workflow details.
I'm not trying to convince you to adopt it or anything, but I'm saying you can have all of that without squashing, with the caveat that you would need an alias to jump to the merge commit. Otherwise, you just treat merge commits as you would a squashed one. Merge commits are just like regular commits: they can have a custom message and show a diff.
> If you work with a ticket system, squash-merge gives you the same granularity, where a commit would refer to a single ticket.
With GitHub you can squash any PR merge. The link to the PR will include the complete history of the feature branch prior to the merge. Even the commit history prior to force pushes is tracked.
100%. I don't want to know how the sausage was made. It's similar to research papers, or history books, where the way we arrive at results or outcomes in the real world is often quite different from the way it's presented in the final form.
A good commit history is more like a well-written sausage recipe than like a TV documentary about scandalous sanitary conditions at Foo sausage factory ;)
I'd much rather reduce the risk of mutation to the trunk, by having small easily reviewable commits direct to trunk.
It's less about reviewing commits from a year ago, than making change low-risk today. And small commits can easily be rolled back. The bigger the commit, the more likely rollback will be entangled.
It's better to have partial features committed and in production, gated behind a feature flag, than to risk having them live in some long-lived branch.
Each commit should be small, have a descriptive commit message and be standalone. I consider the Linux kernel a good example of how to do commits and messages right. Often the commit message is longer than the code change.
I strive to do that when making commits for work too, and that helps when going back through the history to understand why a change was made.
While working I rebase all the time to move changes into the relevant commit, I don't find that particularly hard or time consuming. Doing this upfront is easy, splitting commits after the fact is not.
I consider this standard practice, at least in the sector I work in (industrial equipment control software, some of which is considered human safety critical).
> In 25 years of professional development I’ve never really had a situation where the commits on a branch would have helped me understand what was going on a year ago when the work was done.
My professional experience contrasts with yours. I've even worked at a company where commit history and PRs were so central to understanding and explaining changes that PRs were used as the authoritative sources on how to implement features and use frameworks.
Maybe a slight misinterpretation of what I meant. The commit that goes with a PR is definitely useful context, but I’ve found more granular than that is seldom useful. Even big ones like “move from angular to react” - the details of someone getting something wrong in there don’t matter, it’s the scale of it that just makes me go “oh yeah, this is bound to be a mistake”.
Maybe different in other places, but after 15 years in my codebase, I’m still happy with a simple linear history.
> allows functions to read the context they’re called in
Can you show an example? Seems interesting considering that code knowing about external context is not generally a good pattern when it comes to maintainability (security, readability).
I’ve lived through some horrific 10M line ColdFusion codebases that embraced this paradigm to death - they were a whole other extreme where you could _write_ variables in the scope of where you were called from!
I can write code like:
penguin_sizes <- select(penguins, weight, height)
Here, weight and height are columns inside the dataframe. But I can refer to them as if they were objects in the environment (i.e. without quotes) because the select function looks for them inside the penguins dataframe (its first argument).
This is a very simple example, but it's used extensively in some R paradigms.
Seasoned developers who would not make such a mistake could also be led to think the LLM is writing safe code if they don't ever read it line by line.
As for vibe coders who are not seasoned developers, I'm not sure they would even know that this isn't safe code, even if they read it line by line.
I definitely noticed this trend of article chaining, but it must have been something else in this case, because I have absolutely 0 memory of seeing that post yesterday. Actually, I think my thought came from an Instagram video in my feed of a guy showing the human division algorithm using sticks on a whiteboard.
I guess the internet was looking for something different to my “kick-[ass open]-source software”.