Git's initial commit (github.com/git)
351 points by olalonde on Nov 24, 2014 | 121 comments


Well, while we're looking at FIRST POSTS, here's Mercurial's, self-hosting a month after git, and like git, also created to replace bitkeeper:

http://selenic.com/hg/rev/0#l10.1

The revlog data structure from then is still around, slightly tweaked, but essentially unchanged in almost a decade.


Mercurial is impressive for making Git's UI look intuitive.


Other way around


C'mon, man, you make branches by cloning the repository[1]. That's insanity.

[1] http://hginit.com/05.html


git at revision 0 worked the same way. You can see that there are no references in git at that time either. They're both copying bitkeeper, which worked the same way.

Nowadays git has references (branches), and hg has bookmarks which are the same, plus hg also has the option to label every commit with a permanent branch name. They also still have branching-by-cloning, and if you listen to Linus's original Google code talk about Git, you can see that he conflates "branch" and "clone" because that's what he originally envisioned! Even in 2007 he was still thinking in bitkeeper terms too. I bet that branching with references was Junio Hamano's idea, after Linus did the code hand-off.

I find branching-by-cloning a bit more natural in hg, because you can push to any repo. It's useful for quick, throwaway, local, easy testing out of ideas. In git, by default you can only push if your push doesn't update the branch that is checked out in the target repo (its HEAD), which typically translates into only being able to push to bare repos.


Interesting, thanks for the info. I've only been using Git since 2009 or so. I love Git's model of commits being objects in their own right, allowing you to cherry-pick them across branches, or rebase them to reorder or squash several commits together, for example.

My usual development routine is to make a ton of small commits that add up to a small set of good commits, to promote bisect-ability. I do dozens of rebases, squashes and amends when working on a topic branch. I have to use Mercurial for one of my clients, and it's a nightmare doing my development model in an SCM where I can't toss commits around willy-nilly like I can in Git.


> I have to use Mercurial for one of my clients, and it's a nightmare doing my development model in an SCM where I can't toss commits around willy-nilly like I can in Git.

Yes you can. `hg histedit` is a lot like `git rebase -i`, and `hg rebase` is like `git rebase` without -i and `hg commit --amend` is a lot like `git commit --amend`.

There are also some really cool things that we're working on with hg:

https://www.youtube.com/watch?v=4OlDm3akbqg


hg and git have feature parity at this point

hg just starts out more user friendly, and puts the rest in extensions. I like it more!

ok, hg is a bit slower


I love checking out very early versions of projects. You often get to see the essence before the real world came in and ruined the beauty of it.


I do this as well. It really should be more widely broadcast.

(I've also spent some time thinking about how it's kind of a hack, and what we can do to make it better: http://akkartik.name/post/wart-layers)


There is The Architecture of Open Source Applications series of books (http://aosabook.org/en/index.html), where one of the authors of each piece of software explains the essence of the program.


I know what you mean. The SystemD controversy motivated me to take a look at the initial version of NetBSD's init rc script, which was nicely simple.


My god... the comments. Looks like the reddit culture (i.e. fun for in-jokes, but not particularly professional).


"A marathon of clicking 'next page,' but the view is worth it." So, this commenter practically worships git, but apparently doesn't actually understand it well enough to know a better way to find the hash of the first commit and punch that into Github. Or, it was just a joke and they got there the quick way, but still felt obliged to post a dumb joke to inflate their own ego by "leaving their mark" on git. Maybe I'm being too mean, but yeah, I also think a lot of the comments are pointless.


> Maybe I'm being too mean, but yeah, I also think a lot of the comments are pointless.

Yeah, I think you're being a little mean. If you browse to that user's GitHub page, it looks like it's just somebody new who's excited about software. Good for them.

The comments are pointless, sure, but also harmless. Similar comments might crowd out productive discussion if they were on (say) the head of the master branch, but I doubt that any serious development is happening on git's initial commit anyway. Let the new people have their fun.

As far as newbie disruptiveness goes, it could be far worse. When I was getting started with Linux, I posted this cringeworthy gem to LKML, now enshrined in the archives for all eternity: https://lkml.org/lkml/2000/10/22/69 If newbies today are merely posting "yay, git!" and "thank you!" to a secondary forum where it doesn't disrupt development, I'd say they're doing pretty well in comparison. :)


Yeah, fair enough. Good on you for linking your own cringey post. I think a lot of developers have those early cringe moments, especially if they were young when they started.

As far as disruption, it did occur to me later that somebody may be getting notification emails about these comments. But it's not too bad, as I assume they could just send the emails to /dev/null, since Github is not the official host of git. (As a tangential note, I sort of wish Github would handle this better. So many Github-mirrored projects end up with something like "don't submit pull requests or open issues here, they will be ignored" in their repo description.)


AFAIK you can't search by commit hash. You have to do some URL manipulation.


  git rev-list --max-parents=0 HEAD | tail -1


Without having to pipe:

> git rev-list --reverse HEAD
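
For anyone wondering what "first commit" means to these commands: a root commit is simply one with no parents, and a repository can have more than one of them, which is why the `--max-parents=0` version still pipes through `tail -1`. Here is a rough Python sketch of that walk over a toy history. The `root_commits` name and the dictionary layout are made up for illustration; this is not how git implements it.

```python
from typing import Dict, List


def root_commits(parents: Dict[str, List[str]], head: str) -> List[str]:
    """Walk back from `head` and return every reachable commit with no parents."""
    seen = set()
    stack = [head]
    roots = []
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        commit_parents = parents.get(commit, [])
        if not commit_parents:
            # No parents: this is a root commit.
            roots.append(commit)
        stack.extend(commit_parents)
    # Sort only to make the toy output deterministic.
    return sorted(roots)


if __name__ == "__main__":
    # Hypothetical history: "d" merges a second, unrelated line starting at "x".
    history = {"d": ["c", "x"], "c": ["b"], "b": ["a"], "a": [], "x": []}
    print(root_commits(history, "d"))
```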


It's probably the "I F*cking Love Computer Science" sub-reddits.


It's lots of subreddits. There are some serious ones, but the mainstream ones all contain the usual memes, in-jokes, etc.

I enjoy diving into reddit every now and again. But I use github for work (and code for fun, although it's 'serious' fun). Although open-source collaboration is a fundamentally social activity, I think that mixing source control with a social network inevitably leads to these kinds of comments. And I wouldn't dream of mixing that up with my professional identity.

Maybe it's just a marker of how versatile github is, and the community of people who write programs and put them in source control.


Interesting fact about Git is that it was self hosting in two weeks, IIRC.


How can something that isn't a programming language be self-hosting?


Version control systems are self-hosting when they are used to manage the primary repository of their own source code. This shows confidence because if the program breaks, then it breaks its own configuration management, which could be a headache to unravel. For example if the repository format changes, then the change has to be managed so that the old versions remain accessible through the new version of the software. If this is not managed, and old compiled binaries of the version control system disappear from existence, then it may become impossible to recover the old sources.

Thus, successfully self-hosting a version control system is some measure of evidence that the developers know what they are doing and can manage the changes. (And thus they understand change management and we can trust them to be working on version control software.)

http://en.wikipedia.org/wiki/Self-hosting

"Other programs [than compilers] that are typically self-hosting include kernels, assemblers, command-line interpreters and revision control software."


Overloading the term. The OP presumably meant that the source for git was under git source control.


'Hosting' means 'contain', 'serve'. A building can host a department or a convention, and a married couple can host a dinner party, with neither being required to be a webserver or programming language.


To add to that, IMHO self-hosting for VCSs is closer to the original meaning of the phrase than for compilers.


Yes, that's what I meant.


Maybe I've been drilled too hard by a couple of programming gurus, but I immediately noticed there are quite a lot of repeated yet unnamed magic constants in the (otherwise pretty clean) code. According to wikipedia [1] the rule to not use them is even one of the oldest in programming. Curious what kind of profanity Linus would come up with when confronted with this :]

[1] https://en.wikipedia.org/wiki/Magic_number_%28programming%29...


I've read so many git tutorials, I wish I had seen that README file before.


This. I find that learning from original documentation tends to be much more efficient than learning from third party blogs/tutorials which try to "simplify" things, and usually do the opposite.


It's so short.

The readme is the best explanation of git I've seen.


Does anyone know if the structure of git has changed much? I would like to read this on the assumption that it is pretty close to the current implementation, but I have no idea whether it is. Anyone?


It seems to be mostly the same, except that "Changeset" is now called "Commit" and "Current directory cache" is now called "index", but they are functionally the same.

It's actually really great to see that the model hasn't changed much (there must have been a long phase of thinking before though)

If you want to go deeper, you can check out this page:

http://www.git-scm.com/book/en/v2/Git-Internals-Git-Objects


You can just see the structure with git cat-file

    -> % git cat-file -p 8c48d1a36c3d11db44c75a431d4f09cb0035222f
    tree 288c2d5379768f685f391bdbffd31b8965318c63
    parent 002ae35061beef02453b7fb1045a50fa2f7f30f8
    author Denis Bilenko <denis.bilenko@gmail.com> 1246939605 +0700
    committer Denis Bilenko <denis.bilenko@gmail.com> 1246939605 +0700

    MANIFEST.in: include libevent.h and libevent-internal.h
    -> % git cat-file -p 288c2d5379768f685f391bdbffd31b8965318c63
    100644 blob 6e543dc13df1b556fd95530061ac0c77a9178309    .hgignore
    100644 blob 79c7beb2227ce149c7a71e58e2f7379071b7a189    MANIFEST.in
    100644 blob 0d05178544942a035a82599900bec27fbac1c9c5    README.eventlet
    040000 tree edb8f37fa622315dcf7bf4f7316d5e85c48cfdbd    examples
    040000 tree 64cf252d77a4162099442bb0153985fc20ed5ba3    gevent
    040000 tree 261052e04b4aece469b2e767e394aafbc9d88a32    greentest
    100644 blob 488e805c563dfeeb6af5e7a1a8953b706d9676e3    setup.py
    -> % git cat-file -p 6e543dc13df1b556fd95530061ac0c77a9178309
    syntax: glob
    *~
    *.pyc
    *.orig
    dist
    gevent.egg-info
    build
    htmlreports
    results.*.db
    gevent/core.so

And yeah, it's still very similar, though it currently doesn't store the objects individually but rather packs them together.
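
The hashes given to `cat-file` above are content addresses: the ID is derived from the object's own bytes. A minimal Python sketch of that, assuming the usual object format (SHA-1 over a `<type> <size>` header, a NUL byte, then the content). The function name `git_object_id` is made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID for `content` stored as a git object of type `obj_type`."""
    # The header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID:
    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_object_id("blob", b""))
```

Since the ID depends only on type and content, two identical files always end up as the same blob, no matter where they sit in the tree.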


I wrote about the format of git trees (and other object types) here:

http://alblue.bandlem.com/2011/08/git-tip-of-week-trees.html


While it looks arcane, this comes in handy enough when grepping through history that I actually have "cat-file -p" aliased to 'cf'.


One noteworthy difference is that in the original repository format, a tree object was just a list of named blobs. Nowadays each subdirectory of a tree is its own nested tree object, which means that when you're comparing two trees, you can skip over the directories that are identical.

I'm not sure when that change was made but it must have been very early on, because the repository format has been basically stable for many years now.
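
To illustrate why nested trees make comparison cheap, here is a toy Python sketch of it. It assumes a simplified in-memory model (a dict of tree ID to entries) rather than git's real on-disk format, and `changed_paths`, `Entry`, `Tree` and `Store` are made-up names. The point is that when two subtrees have the same ID, the comparison never looks inside them.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]   # (kind, object id), where kind is "blob" or "tree"
Tree = Dict[str, Entry]   # file or directory name -> entry
Store = Dict[str, Tree]   # tree id -> tree contents


def changed_paths(store: Store, old_id: str, new_id: str, prefix: str = "") -> List[str]:
    """List paths that differ between two trees, skipping identical subtrees."""
    if old_id == new_id:
        # Same ID means same content, so nothing below this point can differ.
        return []
    old, new = store[old_id], store[new_id]
    changed = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        if a == b:
            continue  # identical blob or identical subtree: never opened
        path = prefix + name
        if a and b and a[0] == "tree" and b[0] == "tree":
            # Both sides are directories with different IDs: descend.
            changed.extend(changed_paths(store, a[1], b[1], path + "/"))
        else:
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Hypothetical object IDs; "doc" is unchanged, so "t_doc" is never looked up.
    store = {
        "t_src1": {"main.c": ("blob", "b1"), "util.c": ("blob", "b2")},
        "t_src2": {"main.c": ("blob", "b1x"), "util.c": ("blob", "b2")},
        "root1": {"src": ("tree", "t_src1"), "doc": ("tree", "t_doc")},
        "root2": {"src": ("tree", "t_src2"), "doc": ("tree", "t_doc")},
    }
    print(changed_paths(store, "root1", "root2"))
```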


Thread from 829 days ago. https://hackernews.hn/item?id=4395014


Good memory!