Distributed systems theory for the distributed systems engineer (the-paper-trail.org)
265 points by shakkhar on May 13, 2016 | 23 comments



For people who are not already into distributed systems but want to get started, I blogged my (very) short list of things to read last year [1].

Today I would add a fifth item to that list: "Why Logical Clocks are Easy", which is one of the best explanations of causality I have seen so far [2].

[1] https://blog.separateconcerns.com/2015-07-07-four-easy-reads...

[2] http://queue.acm.org/detail.cfm?id=2917756
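For anyone who wants to see causality tracking in code before reading, here is a minimal vector clock sketch. It is my own illustration, not code from the article, and the names `tick`, `merge`, `happened_before` and `concurrent` are made up for this example. It assumes each node is identified by a string and that a missing entry counts as zero.

```python
from typing import Dict

Clock = Dict[str, int]  # node id -> number of events seen from that node


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of `clock` with `node`'s counter advanced (a local event)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On receive: take the element-wise max, then count the receive as a local event."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, node)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if `a` is causally before `b`: a <= b everywhere and a < b somewhere."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither clock dominates the other (the events are causally unrelated)."""
    keys = set(a) | set(b)
    a_behind = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    b_behind = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    return a_behind and b_behind


# Two nodes each do one local event, then B receives A's message.
a1 = tick({}, "A")
b1 = tick({}, "B")
b2 = merge(b1, a1, "B")
```

Here `a1` and `b1` are concurrent, while `a1` happened before `b2` because B merged A's clock when it received the message.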


Then add it ;)

We can't keep everything we ever write up to date, but there's little point in reading someone's 2015 list of what to read when they've decided there's a good addition in 2016.


You are right, I will.


Practically speaking, I've learned much more from Aphyr's Jepsen test framework (and the write-ups of its test results) than from any other single source.

Ref: https://aphyr.com/tags/Jepsen


I recently took a distributed systems course (https://roxanageambasu.github.io/ds2-class/) in school and our professor referred us to Prof Steve Gribble's videos which, IMHO, are extremely informative and fun to listen to.

Couldn't recommend them more - http://courses.cs.washington.edu/courses/csep552/13sp/video/

Class Webpage - http://courses.cs.washington.edu/courses/csep552/13sp/


I'm not sure how it would read to someone who hasn't been previously reading anything on the subject, but I like aphyr's notes on a two day course on distributed systems as a high level overview of the topics involved: https://github.com/aphyr/distsys-class.


Great timing and submission, thank you for posting! I've been meaning to get more in depth knowledge on distributed systems, but despite having access to several academic (text)books, I felt overwhelmed and didn't know where to start exactly and what sub-topics I might want to focus on.

I just downloaded 'Distributed Systems for Fun and Profit', a book recommended in the article, and sent it to my Kindle. It is a free PDF written by an engineer currently working at Stripe. It's only 62 pages and doesn't feel intimidating!


I'm looking forward to the publication of Martin Kleppmann's book Designing Data-Intensive Applications.

http://shop.oreilly.com/product/0636920032175.do?sortby=publ...


Join Safari Books Online and start reading it -- it's good.


I want to, but I never read incomplete books, and it's still missing a few chapters.


> But I’ve come to thinking that recommending a ton of theoretical papers is often precisely the wrong way to go about learning distributed systems theory (unless you are in a PhD program). Papers are usually deep, usually complex, and require both serious study, and usually significant experience to glean their important contributions and to place them in context. What good is requiring that level of expertise of engineers?

Bingo! We need some "O'Reilly style" distributed systems material. Most of us are not going to be designing new algorithms, but plugging in various pieces. A general understanding of those pieces, where they work well, and when to actually go to the research is kind of missing right now in that world.

Some other links that people might find interesting:

http://videlalvaro.github.io/2015/12/learning-about-distribu...

http://book.mixu.net/distsys/single-page.html

http://dancres.github.io/Pages/


Pretty much the same as most aspects of IT. How often does anyone write their own sort routine? How often does that sort of thing get asked in interviews?


Probably one of the more used books (by universities) on the topic is "Distributed Systems: Principles and Paradigms" by Tanenbaum and van Steen. I just finished a class that used this book and I understand that there are criticisms of it, but it did seem to me to be reasonable given the breadth of the subject. And most, if not all, of those papers are covered to some degree in this book.

Something I'm looking forward to: Pearson has returned the copyright of the book to the authors and they are supposedly updating it. Could be interesting: http://www.distributed-systems.net/index.php?id=distributed-...

The main web site says the 3rd edition is nearing completion.


Can anyone familiar with the linked material comment on whether there is a standard model used in the proofs there and in the DS literature?

I'm thinking of something like Lamport's global time model from "On interprocess communication".


No, there is not.


MIT biology courses teach very fine distributed systems theory :)


> Gwen Shapira, SA superstar and now full-time engineer at Cloudera [...]

Gwen is at Confluent, the Kafka company. Doing a great job there!


The post in the OP is from 2014


(Before you downvote: I have a PhD in distributed systems and fault tolerance. Okay, now you can downvote for the douchebaggery of this prescript)

I think a fundamental and very underrated paper and concept (which actually predates Paxos, yet which Lamport ignored or was unaware of) is the notion of randomized consensus protocols. They are simpler than "structured" leader-type algorithms. Believe Ben Or's algorithm was first.


> Believe Ben Or's algorithm was first.

Ben-Or's "Another Advantage of Free Choice" beat Rabin's "Randomized Byzantine Generals" by a couple of months in 1983. These algorithms show how much people over-extend results like FLP. The result is about a very particular system model, and the addition of even a very tiny extra piece (in Ben-Or's case, a random oracle) makes the consensus problem possible again.
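To make the "tiny extra piece" concrete, here is a rough simulation of a Ben-Or style binary consensus round structure, written from memory as a sketch and not taken from the paper. Loud assumptions: it is the crash-fault variant with n > 2f, processes run in lock-step rounds, nobody actually crashes, and asynchrony is only approximated by letting each process hear a random n - f of the n messages per phase. The function name `ben_or` and its parameters are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional


def ben_or(initial: List[int], f: int, rng: random.Random,
           max_rounds: int = 1000) -> List[Optional[int]]:
    """Simulate Ben-Or style rounds; returns each process's decision (or None)."""
    n = len(initial)
    assert n > 2 * f, "this crash-fault sketch assumes n > 2f"
    x = list(initial)                        # current estimate of each process
    decided: List[Optional[int]] = [None] * n

    for _ in range(max_rounds):
        # Phase 1: everyone broadcasts its estimate; each process hears n - f of them
        # and proposes a value only if a strict majority of ALL n processes could back it.
        proposals: List[Optional[int]] = []
        for _ in range(n):
            heard = rng.sample(x, n - f)
            value, count = Counter(heard).most_common(1)[0]
            proposals.append(value if count > n / 2 else None)

        # Phase 2: everyone broadcasts its proposal (or "no proposal").
        new_x = []
        for p in range(n):
            heard = rng.sample(proposals, n - f)
            votes = Counter(v for v in heard if v is not None)
            if votes:
                # At most one value can be proposed in a round, so this is unambiguous.
                value, count = votes.most_common(1)[0]
                new_x.append(value)
                if count >= f + 1 and decided[p] is None:
                    decided[p] = value
            else:
                # The randomized step: no proposal seen, so flip a local coin.
                new_x.append(rng.randint(0, 1))
        x = new_x

        if all(d is not None for d in decided):
            break
    return decided


unanimous = ben_or([1, 1, 1, 1, 1], f=1, rng=random.Random(0))
mixed = ben_or([0, 1, 0, 1, 1], f=1, rng=random.Random(42))
```

When every process starts with the same value, that value is decided in the first round without any coin flips. With mixed inputs the coin flips are what eventually break the symmetry; termination is only probabilistic, which is why the sketch caps the number of rounds.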

I wouldn't say that these algorithms were really ignored by Lamport when he wrote the Paxos paper. Again, they're solving a different problem in a different system model. If you want to pick on Lamport, talk about Liskov's Viewstamped Replication.

If anybody has a digital copy of Ben-Or's paper that isn't partially cut off, please make it available. Both the copy in the ACM library and the only copy the author himself has are missing some of the right hand side.


I disagree - an ex-colleague at Cornell wrote a paper proving equivalence. Will have to dig that up..


An illustration is bees vs. flies.

A bee trapped indoors will go for the sun: a great algorithm for a forest/thicket but fatal in a house or car with windows. A fly will randomly try until it succeeds, meaning it is slower at escaping a thicket but will eventually find the open door even if it's opposite from the window.
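A toy version of the same illustration, as a sketch only: the room is a one-dimensional corridor of integer positions, the window is closed but bright, and the door is the only exit. The functions `bee` and `fly` and all their parameters are invented for this example.

```python
import random
from typing import Optional


def bee(start: int, door: int, window: int, max_steps: int) -> Optional[int]:
    """Greedy strategy: always step toward the light (the closed window).

    Returns the number of steps taken to reach the door, or None if it never does.
    """
    pos = start
    for step in range(max_steps):
        if pos == door:
            return step
        if pos < window:
            pos += 1
        elif pos > window:
            pos -= 1
        # Once at the window the bee just stays there, bumping against the glass.
    return None


def fly(start: int, door: int, lo: int, hi: int,
        rng: random.Random, max_steps: int) -> Optional[int]:
    """Randomized strategy: step left or right at random, bouncing off the walls."""
    pos = start
    for step in range(max_steps):
        if pos == door:
            return step
        pos = min(hi, max(lo, pos + rng.choice((-1, 1))))
    return None


# Door on the way to the light (the "forest" case): the bee is fast.
bee_lucky = bee(start=5, door=8, window=10, max_steps=1000)
# Door opposite the window (the "house" case): the bee is stuck, the fly gets out.
bee_stuck = bee(start=5, door=0, window=10, max_steps=1000)
fly_out = fly(start=5, door=0, lo=0, hi=10, rng=random.Random(1), max_steps=100_000)
```

The bee reaches a door that lies between it and the light in the minimum number of steps, but never finds one on the other side. The fly has no such guarantee on speed, yet a bounded random walk reaches the door with probability 1 wherever it is.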


Where did you do your degree? I would be interested in doing one.



