Hacker News | jsrfded's comments

This was a skunkworks project from a 2-person team in Mozilla Labs, thinking about how to take user keystrokes out of the browser-navigation workflow. The extension can be slightly laggy, but when it hits, the preview is cool.


It's easy, just add the site you want to delete from the web to your /spam list or the /slashtag that you're negating out of the results.


Partitions are based on a hash of the primary key. The number of buckets in the system has to be a power of 2. But we can split buckets to increase the number, or even have some buckets that have split and some that haven't yet. Each bucket is stored on 3 separate servers (and the assignment makes sure the three servers are on separate racks).
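A minimal sketch of the scheme described above, not blekko's actual code. The hash function (SHA-1), the `BucketMap` class, and the rendezvous-style ranking used to pick servers are all assumptions made for illustration; the comment only says that keys are hashed, that bucket counts are powers of 2, that buckets can split individually, and that the three copies land on separate racks.

```python
import hashlib


def key_hash(key: str) -> int:
    """Stable 64-bit hash of a primary key (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class BucketMap:
    """Hypothetical bucket map: each leaf is (low hash bits, number of bits)."""

    def __init__(self, base_bits: int):
        # Start with 2**base_bits buckets, all at the same depth.
        self.base_bits = base_bits
        self.leaves = {(b, base_bits) for b in range(1 << base_bits)}

    def split(self, bucket: int, bits: int) -> None:
        """Split one bucket in two by consuming one more hash bit."""
        self.leaves.remove((bucket, bits))
        self.leaves.add((bucket, bits + 1))
        self.leaves.add((bucket | (1 << bits), bits + 1))

    def bucket_for(self, key: str):
        """Find the leaf owning this key, whether or not it has split yet."""
        h = key_hash(key)
        max_bits = max(bits for _, bits in self.leaves)
        for bits in range(self.base_bits, max_bits + 1):
            leaf = (h & ((1 << bits) - 1), bits)
            if leaf in self.leaves:
                return leaf
        raise LookupError("bucket map does not cover this hash")


def place_replicas(leaf, servers, copies=3):
    """Pick `copies` servers on distinct racks; `servers` maps name -> rack.

    The ranking by hash is an assumed placement policy, not from the source.
    """
    ranked = sorted(servers, key=lambda s: key_hash(f"{leaf}:{s}"))
    chosen, racks = [], set()
    for server in ranked:
        if servers[server] not in racks:
            chosen.append(server)
            racks.add(servers[server])
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct racks for the requested copies")
```

Splitting only ever adds one more low bit to a single leaf, so a map where some buckets have split and others haven't still assigns every key to exactly one bucket.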


Paxos would be good for electing a master, but we wanted to avoid having any masters in the architecture. There are also scenarios where Paxos can be slow or fail to reach a consensus. We wanted high availability from each node in the cluster regardless of whether 2/3 of the rest of the cluster were down or unreachable; both parts of a partitioned cluster should also be able to continue to function as best they could.

Individual nodes can often make "personal" decisions about what to do in suboptimal situations. If you can answer an incoming request, even with partial or out-of-date data, do so; it's better than not replying. For the repair agent, each node can see its own view of "holes" in the 3-level replication, and offer to make copies of buckets that have fewer than 3 copies, to bring them back up to three.
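Here is a rough sketch of how one node might compute its own view of the holes and decide what to offer, with no master involved. The function names, the shape of `view` (bucket id mapped to the set of nodes believed to hold a copy), and the choice of copy source are assumptions for illustration; how competing offers get arbitrated isn't covered here.

```python
REPLICATION = 3  # target number of copies per bucket


def find_holes(view, live_nodes):
    """Return {bucket: live holders} for buckets below the target, as seen locally."""
    holes = {}
    for bucket, holders in view.items():
        alive = holders & live_nodes
        if len(alive) < REPLICATION:
            holes[bucket] = alive
    return holes


def repair_offers(me, view, live_nodes):
    """List (bucket, source node) copies this node could offer to make.

    A node only offers for buckets it does not already hold, and only when
    some live holder remains to copy from. Picking min() as the source is
    an arbitrary, deterministic stand-in for a real selection policy.
    """
    offers = []
    for bucket, alive in sorted(find_holes(view, live_nodes).items()):
        if alive and me not in alive:
            offers.append((bucket, min(alive)))
    return offers
```

Each node runs this against whatever it can currently see, so the two halves of a partitioned cluster would each produce offers from their own view rather than waiting for agreement.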


This article and you guys' comments in this thread are really interesting and suggestive. I hope you continue writing and talking about it. It's great marketing for blekko too :)


Within the datastore, there are 3 copies of each piece of data. When a get() request is made, it goes out to the "closest" copy; if no answer comes back within some threshold, a 2nd request is made to one of the other replicas. Whoever gets the data back first wins.
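The get() behavior above can be sketched roughly like this. It's an illustration only: `hedged_get`, the `fetch(replica, key)` callable, and the 50 ms default threshold are made up for the example, and the replica list is assumed to be ordered closest first.

```python
import concurrent.futures as cf


def hedged_get(key, replicas, fetch, threshold_s=0.05):
    """Ask the closest replica; if it is slow, also ask a second one.

    `replicas` must contain at least two entries, closest first.
    `fetch(replica, key)` is a caller-supplied blocking function.
    Returns whichever answer arrives first.
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(fetch, replicas[0], key)
        done, _ = cf.wait([first], timeout=threshold_s)
        if done:
            return first.result()
        # Threshold passed with no answer: send a 2nd request to another replica.
        second = pool.submit(fetch, replicas[1], key)
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        # Don't block on the slower request; its result is simply dropped.
        pool.shutdown(wait=False)
```

The second request is only sent when the first one is slow, so the common case costs one request and the extra load shows up only on slow or unreachable replicas.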


Greg has planned a whole series about the combinator architecture behind blekko's datastore. Greg and I have both presented aspects of the system at various conferences, but we're happy to chat about it with you directly too. I think this might be the first time it's been published on the web though.


There's a video of one of Greg's talks (from Surge 2011) here: http://lanyrd.com/2011/surge/shzth/


It didn't make it into the article, but if you use Do Not Track with Firefox, blekko will turn on "SuperPrivacy" and will neither save any logs at all nor pass the query to any third parties.


His name is spelled Marc Andreessen, please fix the title.


The big deal for us with this update is automatically applying slashtags to boost quality in 110 categories.

You can see the full list of categories we're doing this for in a screenshot in the Search Engine Land post: http://searchengineland.com/blekko-slashes-more-spam-with-zo...


Random highly-ranked hacker news threads: http://www.skrenta.com/hn/


Yup, this is the one I mentioned in this tree a little earlier.

