
An interesting comparison point: a single core on a late-2014 MacBook Pro can achieve runtimes for the same graph that are within a factor of 4 for WCC (461 seconds for FlashGraph versus 1700 seconds for the laptop).

http://www.frankmcsherry.org/graph/scalability/cost/2015/02/... (previously on HN: https://news.ycombinator.com/item?id=9001618)

There are also results for PageRank on that graph, which make the difference more pronounced. FlashGraph runs PageRank in 2041 seconds (I'm assuming for 30 iterations, per Section 4 of the paper), whereas the laptop takes 46000 seconds for 20 iterations.
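Since the two PageRank runs use different iteration counts, the raw totals are not directly comparable. A quick back-of-the-envelope sketch in Python, using only the figures quoted above, normalizes them per iteration. The 30-iteration count for FlashGraph is the same assumption made above, and the helper name `per_iteration` is just illustrative.

```python
def per_iteration(total_seconds: float, iterations: int) -> float:
    """Average seconds per iteration for a run."""
    return total_seconds / iterations

# WCC: total runtimes, laptop vs. FlashGraph (seconds).
wcc_ratio = 1700 / 461

# PageRank: normalize by iteration count before comparing.
# ASSUMPTION: FlashGraph's 2041 s covers 30 iterations (per Section 4 of the paper).
flashgraph_per_iter = per_iteration(2041, 30)
laptop_per_iter = per_iteration(46000, 20)
pagerank_ratio = laptop_per_iter / flashgraph_per_iter

print(f"WCC: laptop is {wcc_ratio:.1f}x slower")
print(f"PageRank: {flashgraph_per_iter:.0f} s/iter vs {laptop_per_iter:.0f} s/iter, "
      f"laptop is {pagerank_ratio:.0f}x slower per iteration")
```

Under that assumption the per-iteration gap for PageRank comes out at roughly 34x, compared with roughly 3.7x for WCC. If FlashGraph actually ran a different number of iterations, the PageRank ratio changes accordingly.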




Absolutely spot on - between them, FlashGraph and Frank McSherry's COST work have really pushed the envelope on efficient large-scale graph analysis.

Frank McSherry wrote a "call to arms" for the broader graph community at [1]. The main point is that academic work has generally been compared only against existing distributed graph processing systems, with any improvement celebrated, without awareness of the significant overheads the distributed approach itself brings. Both Frank's work (run on a single laptop) and FlashGraph (run on a single powerful machine) run far faster than the distributed approach and have very few disadvantages.

Note: I'm a data scientist at Common Crawl, and Frank's article on graph computation was a guest post on our blog.

[1]: http://blog.commoncrawl.org/2015/04/evaluating-graph-computa...



