Hacker News Data Analysis (rjmetrics.com)
430 points by robertjmoore on Oct 17, 2012 | 73 comments


#1 Lesson from all of this: Instead of talking about your product to your prospect, talk about something your prospect cares deeply about to your prospect.

I had no idea what you did and didn't really care until you used it in context of something I did care about: Hacker News. Now I know what you do, understand how it applies to me, and best of all, I'm starting to visualize how else I could use it.

We should all approach our prospects like you just did here. Nice job!


I don't see that as an "instead of." He talked directly about his product, he just picked a very good, relatable example. And then showed specific details about how his product helps you do it. Most of Apple's commercials are like this: they show their product directly, and they show how you might use it for specific things. Compare that to some vague "we think you're a good person, and we're good people too" branding commercial that tries to relate without showing what I'd actually get out of using their products, or a product-featuring but otherwise short on examples commercial with people throwing Surface tablets around like frisbees.

So absolutely tell your prospects how your product will let them do things they want to do—if you don't understand what they want to do well enough to be able to do this, they aren't very good prospects yet—but always focus on your product, because that's what you have to sell, and that's what people are going to be willing to pay for.


I would be curious to see how 'meta-submissions' (i.e., posts about Hacker News itself, about Y-Combinator, and about YC-backed startups) rank in the charts versus the five categories shown in the rjmetrics blog post. (The blog post, which made it to the top of the HN front page, is itself a meta-submission!)


The HN Community is fascinated by the outside view of HN and eagerly gazes at the held-up mirror. One of our shared interests is emerging online social structure, of which we're an example.

We're also attracted to self referential expression, such as HN comments about HN self-organization, HN self-policing, and HN self-referential comments.


Which really could just indicate that we all have an interest in HN being a good resource for us to read.


I enjoy finding out what I like, and exploring why I like it. I posit that this is an inherent property of appreciation which allows one to get more out of something.

The "meta-HN" stuff is an incarnation of this, and shows that HN is appreciated by a critical mass of people. This is a huge boon to the community as it leads to things such as people using Markdown syntax for clarity, proper spelling/punctuation, well thought-out comments, and often (but not always) a positive and constructive viewpoint aimed at advancing the discussion.


People generally seem to be drawn in by meta-content, self-reference, recursion and the like. Remember how people went crazy about the central premise of "Inception?"

Self-referential comments do, however, have a tendency to derail conversation from that which is being discussed. In a threaded comment system, that tends to garner frustration after the initial novelty of the ideas has worn off. Case in point: Reddit has a weird attitude toward self-reference. Over the last year or so, there's been a (personally perceived) decline in self-referential memes, and redditors seem to see it as derivative now.


Translation for those that missed it: This post is a (very good) sales pitch.


No, this post is the best kind of sales pitch. Instead of pitching his product, he uses it to create value for his clients. I found it an excellent read.


>#1 Lesson from all of this: Instead of talking about your product to your prospect, talk about something your prospect cares deeply about to your prospect.

Better yet, make your product something that your prospects will care deeply about. That way you can kill both birds with one stone.


Excellent analysis of this analysis. :) Gives me ideas for my own blogging...


Put another way this is classic "story selling" - wherein you use a captivating story to demonstrate all the points you would in a traditional pitch. Good stuff


Actually the reason his posts stopped making it to the frontpage is that the last 3 before this all set off the voting ring detector.

I don't know how accurate his other conclusions are, but it seems unlikely that new signups are down, considering the trend in traffic: http://www.archub.org/hntraffic-17oct12.png


I've never before gotten to see data like this from a popular website, so my questions might be a bit ignorant or naive.

Do popular sites normally have the big (daily?/weekly?) swings seen in the HN traffic data?

Have you done any research correlating the high/low days versus the submissions present on those days? (i.e. are the swings content driven?)


Most websites are busier during the week, while we are all procrastinating at work.


The traffic map most closely matches his active users map, which makes sense. If the number of new signups were still increasing, you should see a more exponential traffic graph.

Was there an error with the voting ring detector?


You can never be certain about something like a voting ring detector, but I'm pretty sure not.


I work with robertjmoore. I, along with a few other colleagues at our small office, voted for his past three blog posts. Many of those votes probably came from our single office IP address. Many were probably also placed after clicking a direct link to the HN posting. There couldn't have been more than about 10 votes like this, because we don't have many employees. And, in the interest of full disclosure, I don't vote for many HN posts besides those written by authors I know.

Is this the behavior that HN's vote ring detector is trying to discourage? I understand that these things are a slippery slope - but if so, it's too bad, because quality content like robertjmoore's last three posts is getting lost, and I would imagine that other authors at small companies like ours are unwittingly falling into the same trap.

If there have been other posts explaining the DOs and DONTs of the HN vote ring detector, I apologize in advance for not having read them.


Just my opinions here:

>Is this the behavior that HN's vote ring detector is trying to discourage?

I hope so. It's great and all that you have 10 people to vote up an article as soon as it's posted... but I don't. Shouldn't the content be voted up based on its own merits vs. how many people you know? An instant 10 votes is a huge unfair advantage.

> if so, it's too bad, because quality content like robertjmoore's last three posts is getting lost

If the problem is the content getting "lost" because you guys are voting up the articles... then stop doing that. If the content is vote-worthy it will get votes.

Additional comment: sure, I get that this wasn't clear to you guys... but come on. On some level you can see how this would be unfair, right?


I would like to continue seeing high-quality content on the front page too. However, upvoting submissions based solely on the author is poor form. Submissions should live or die on their quality alone.

Giving a "boost" to your friend or colleague, even though your intentions are good, is unfair to other submitters.


Another possibility: people have tired of your formula. Andrey Karpov used to submit blog posts to Reddit with the results of running his fancy commercial static analyzer on various open source code. The first several got a lot of upvotes; a while later it became clear that it was mostly hawking a product. The more your blog comes to resemble an infomercial, the less you can expect to be on the front page.


"If anyone out there suspected that the 'old guard' had given up on HN, this chart proves them wrong."

Of the people here since the first year, probably only 25% still participate regularly. Occasionally I'll stumble across some discussion from the early years in Google, and it's crazy how different the site was back then. There are still good comments now, but back then there were entire conversations that were good. I don't even bother to write the kind of comments that I used to, because they wouldn't work at all on the site as it is today.


Alex, based on your kind reply to DanBC, you are tired of people calling you out for relying on Google University for your knowledge on controversial subjects. That may be tiresome, but it may also be good for the overall level of factual discussion here.

My general observation of what excites people on Hacker News is that negative metathreads get more upvotes, by two orders of magnitude, than positive metathreads. I would love to see more replies to the old thread "Ask HN: What do you like about the Hacker News community?"

https://hackernews.hn/item?id=4399678

from 60 days ago, if people are so inclined. I posted that soon after a metathread that complained about comments that were insufficiently kind and affirming, from someone who has asked advice about external website designs in days past. I figure if I ask for advice about a website, people are very well going to give me advice, and I might as well man up and take the advice. But, yeah, one thing I like about HN is that people look things up in good-quality offline sources in many instances, and ask other participants here to check their facts. And there are other good features of the community here that help me learn and develop in my work, in my community citizenship, and in my family life.


"you are tired of people calling you out for relying on Google University for your knowledge on controversial subjects"

I think it's funny how the more books I read, the more 'controversial' my knowledge becomes. I realize this is about as self-serving an argument as possible, but honestly I don't think I'm inherently interested in controversial topics, I think it's just that the more you know about something, the more wrong you're going to sound to the average person.

To give an example that most people on HN will agree with, why do you think the general public thinks eVoting is completely secure, while CS folks are generally horrified by the products that are actually used in elections? The same phenomenon applies to virtually every other area of life. Doesn't matter whether you're talking about medicine, agriculture, religion, climate change, education, etc. The more you know, the more wrong you're going to sound. (Cf. the presidential debates.)

Also, I'm usually fairly anti-Wikipedia these days. That's why I generally avoid linking there in the first place, though also to encourage people to actually read quality books or academic research. Wikipedia is often useful for finding primary sources, but

A) it's rarely comprehensive. Usually they just link to one or two primary sources at random, rather than the best ones or all of them.

B) it generally does a poor job at properly characterizing the arguments for or against something. Usually you just get 'some people believe this, other people believe that' without any indication that the case for one side might be much stronger than the case for another side.

C) It also tends to be out of date at any given time. Often not seriously so, but if you look up any of the statistics that the CDC or BLS publish on an annual basis then more often than not you're getting last year's data.

D) I agree with Jaron Lanier's point of view in his Digital Maoism essay. Specifically I think knowledge is inherently tied together with authorship, and that an article of 'facts' without a voice is just a "faux-authoritative, anti-contextual brew."

Granted Wikipedia has a lot of advantages, but I think it's a poor substitute for reading actual books. And I especially think that it's generally a dick move to pretend that you're an expert on something after having only read the Wikipedia article, except for in certain niche areas where there is no authoritative source.

And again, I'm not trying to claim that I know everything by any stretch of the imagination, just trying to explain why some of my comments may seem 'controversial'.


> I don't even bother to write the kind of comments that I used to

Please do. Good content is always welcome.


I mean it seems like the vast majority of comment replies I get are people asking for citations of basic facts and research they could find themselves in 30 seconds of Googling. So posting anything more complex would pretty much be a non-starter.


If you look at my submission history of my blog then I think it's clear that HN likes things that are original and/or well thought out. My weaker blog posts go nowhere, but ones that are detailed make it. So, if there's a formula for appearing on HN, it's write something original and/or deep.


And then hope some people see it before it gets pushed from the "new" page, after which it doesn't matter how original or deep it is.


Which can happen very quickly. Sometimes I am not certain whether a posting just isn't interesting, or the wind is so strong the voice gets lost in the roar.


I can think of times when I've been the single up-vote and/or single comment on a submission of yours. The most recent one that comes to mind easily is your post on a claim about total memory in ancient computers (I think it was 53K in 1953 if my memory serves).

It wasn't that the post lacked detail or depth; instead, it's a problem of varying interests. Few people have an interest in ancient computer system design and history, let alone know what "mercury delay line" memory is.

When you combine a more esoteric interest with the fast queue of the HN '/newest' page, the result is an exposure, visibility, and discovery problem, rather than any real flaw in your content.

Also, robertjmoore, thanks for a great article with data.


Very useful analysis. After running Hacker Newsletter for the past 2+ years I have seen basically this. However, the analysis seems to miss looking at things on a smaller scale like the day and time you post it which has proven to be a big factor [1]. I know even on a weekly basis (which is what I do for the newsletter), it seems some weeks have an abundance of high quality articles compared to others.

[1]: https://hackernews.hn/item?id=3251877


I do think that the day/time an article was posted and also who posted are fairly large contributors to being on the front page. I've written a few articles that have made the front page this year.

In at least two instances, I posted the article myself with no upvotes. Then another HN user reposted my articles a few days later (my blog is republished by a couple tech sites), and the same exact content makes the front page. Same article content, same title, just posted by someone else and linking to the mirrored site.

Good post Robert. If you're looking for help growing the RJM team, look me up.


I once worked out there were 100:1 visitors to voters for a link.

Most of the people I know who peruse HN regularly are not registered users. They are happy to let others do the commenting (which they read).

http://williamedwardscoder.tumblr.com/post/18839832580/reddi...

It was super-surprising to see my own blog getting an average of 55pts on HN; I hadn't wondered about that before.


Certainly fits in line with the 1% rule http://en.wikipedia.org/wiki/1%25_rule_(Internet_culture)


I suspect the NYT/WSJ gap is more a result of WSJ's much more restrictive paywall.


Yeah. I don't understand how the OP came to believe that the average quality of WSJ articles is drastically lower than that of NYT articles unless he never reads the WSJ or the NYT.


I suspect it is thanks to the technical aspects and quality of data visualization, built by jashkenas and others.


"Interestingly, if you look at the number of upvotes cast each day, the trend is similar. For the past two years, the same number of stories have been competing for about the same number of votes each day." This statement, backed up by the analysis in the submitted blog post, is interesting. I visit the new page

https://hackernews.hn/newest

as many times per day as I visit the front page, looking for good new submissions to upvote. The limit on the number of users who cast upvotes on new stories appears now to set a limit on the number of new stories that have been submitted in the last two years. As the blog author points out, if HN largely stays on topic, there are only so many new stories each day that fit HN's topic.


I found it very interesting that the contributing userbase has gone up but the vote count has not. I've certainly noticed the userbase expanding, but would have guessed that voting followed.


Fascinating. However, one must be careful about jumping to conclusions from analysis like this. I see a few items where the author might have come to the wrong conclusion.

- New user growth. I don't think it's b/c a 'saturation point' has been hit for the HN community, as the article hypothesizes. There was a period in the last few years when HN made a conscious choice to restrict user growth in order to maintain a higher signal-to-noise ratio. Newbies are now marked in green and there is no register link on the homepage. For a while there wasn't a way for new users to sign up.

- The NYT more favored compared to the WSJ? Most likely not due to the quality of the writing but b/c WSJ articles are not available to non-subscribers by default.


Granted, it is natural to want people to hear what you have to say, but I did not think the reason for posting on HN was so you could try to make it to the front page. The blog post could have been titled "How I'm trying to get my submissions to the front page of HN".


My takeaway --- from the fact that Matt Might's domain is second only to pg's --- is that you should write up easy to understand lecture notes on deep PL-related topics.


The fact that this story went straight to the top suggests that a meta-post about HN is the way to go.


Data-based posts about how to get to the front page, sure.

But otherwise, I'm not sure it really pays off.


My interpretation of how this one shot right to the top: Hacker News loves posts about itself. :)

Nice analysis - the user engagement stats were very different from what I was expecting (I think I would have agreed with Jake before I saw the data).


The retention rate actually seems relatively low as an absolute percentage, though the way it plateaus is interesting. I did an analysis of the retention of the oldest Slashdot users (http://www.kmjn.org/notes/early_slashdot_users.html), and it was much higher: about 70% after 2 years, rather than 30%. Took about 10 years to drop to 30%. Granted, that's for the earliest users, so retention rates are probably (much?) lower among later signups.
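For anyone wanting to run the same comparison on their own data, here is a minimal sketch of one way to compute a retention figure like the ones above. It assumes "retained" means a user's last recorded activity falls at least N days after signup, which may not match the definition used in the blog post or in the Slashdot analysis. The function name and the sample dates are made up purely for illustration.

```python
from datetime import date

def retention_rate(users, min_days):
    """Fraction of users whose last activity is at least min_days after signup.

    users is a list of (signup_date, last_seen_date) pairs. This definition
    of retention is an assumption, not the one used in the original post.
    """
    if not users:
        return 0.0
    retained = sum(
        1 for signup, last_seen in users
        if (last_seen - signup).days >= min_days
    )
    return retained / len(users)

# Made-up (signup, last_seen) pairs, only to exercise the function.
SAMPLE = [
    (date(2007, 3, 1), date(2012, 10, 1)),   # still active years later
    (date(2007, 3, 5), date(2007, 4, 1)),    # gone within a month
    (date(2007, 6, 1), date(2010, 1, 15)),   # active for more than two years
    (date(2007, 8, 1), date(2008, 2, 1)),    # gone after six months
]

two_year = retention_rate(SAMPLE, min_days=730)
print(f"2-year retention: {two_year:.0%}")
```

Running this per signup cohort (say, by signup year) would show whether later signups really do retain worse than the earliest users.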


You say the two possible reasons you are not making the front page are: your content is weak, or people's tastes have changed. The fact that the number of submissions has not changed suggests to me a third and more plausible option: the quality of submissions, and therefore the competition for the "front page", has increased.


Doesn't "My content is weak" contain the implicit qualifier "compared to the average within the domain we're discussing"?


No. You can have a domain of discourse where ALL content is weak. In fact, you can have content in this domain that is stronger than most of the content, but it might still be weak.


great point


Maybe I'm understanding it wrong. But the data seems to be saying that HN has succeeded in defeating the Eternal September effect. That'd be big news!


> Also interesting is the enormous gap between the New York Times, whose content tops this list, and the Wall Street Journal, whose content performs among the worst.

I think this might actually be more related to the WSJ paywall. If you don't have a subscription, you can't view many WSJ articles, whereas the reverse is true for the NYT.

On an unrelated note - I wonder how the category of HN-related posts does relative to the others (basically the same analysis as for the "Pinterest" category). Judging by the success of this post, I suspect HN + Data are a good mix. Are posts about "Data" just as successful?


I wish that he had included the stats for titles containing the words "Hacker News".


"I chose to categorize content by the mention of things like big companies (i.e., Amazon, Google), Hot Startups (i.e. Pinterest, Instagram), Sensationalism (i.e. Best, Worst, First), Programming Languages (everything I could think of), and Profanity (which was fun)."

What happens to stories that use sensationalism and profanity? Or sensationalism and a new startup?


I assume those stories would be included in both averages?


The non-exclusive sets may overlap with interesting results, if there is a multiplicative effect. That's a hypothesis worth testing, perhaps through a quick stratification of the data. There is also the issue of multicollinearity. For example, consider modifiers vs. nouns. Is it the specific noun that is of interest? Or the subset of that specific noun, delineated by the modifier? What about the modifier generally, applied to a general noun? Are there specific combinations that are significantly different from the average?
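A quick stratification along those lines could look something like the sketch below. It groups stories by the exact combination of categories their titles hit, so a "sensationalism + profanity" story lands in its own stratum instead of being counted in both averages. The keyword lists, function names, and sample titles and scores are all hypothetical; the blog post's actual keyword lists aren't published in the thread.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keyword lists; the blog post's real lists are not known.
CATEGORIES = {
    "sensationalism": {"best", "worst", "first"},
    "profanity": {"damn", "hell"},
}

def categories_for(title):
    """Return the frozenset of category names whose keywords appear in the title."""
    words = set(title.lower().split())
    return frozenset(
        name for name, keywords in CATEGORIES.items() if words & keywords
    )

def mean_score_by_stratum(stories):
    """Average score per exact category combination for (title, score) pairs."""
    buckets = defaultdict(list)
    for title, score in stories:
        buckets[categories_for(title)].append(score)
    return {combo: mean(scores) for combo, scores in buckets.items()}

# Made-up titles and scores, only to exercise the functions.
SAMPLE = [
    ("The best editor ever", 10),
    ("Damn this bug", 20),
    ("The worst damn outage", 60),
    ("Show HN: my parser", 5),
    ("First look at Go", 30),
]

strata = mean_score_by_stratum(SAMPLE)
for combo, avg in strata.items():
    print(sorted(combo) or ["(none)"], avg)
```

If the combined stratum's average is well above what the single-category strata would predict, that hints at an interaction, though with real data you'd want enough stories per stratum before reading anything into it.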


This is an interesting analysis yet the information can be derived from using your site's analytics and your observational skills to come to the author's conclusion.

It's like a painting - the subject matter is important, yet the stuff around the main subject is what makes it stand out.

Analyze what your stats don't have, or seem to have 'less of', as compared to other content.

I think the data analysis could have been more interesting to a broader audience by making it more 'newsworthy' rather than a raw analysis targeted at a relatively small community (compared to a more general audience).

By 'newsworthy' I mean something along the lines of 'NYTimes, WSJ used by technical users too' - or something like that - or something like - 'Hackers in controversy - observers and participants'.


I think the basis for evaluating the quality of a community lies in the discourse and communication. Submissions are a part of that, but the discussion that follows (i.e. comments) is the most important indicator of change. Personally, there seems to be an influx of reddit-style comments (little substance, meme-oriented) this year, but that could be a general evolution of the English language given the heavy influence of the internet.

That said, evaluating change in the number of comments along with comment upvotes vs. sentiment analysis seems like the only logical way to demonstrate any sort of quality meta analysis. I'm not really versed in qualitative research, so here's my ASK HN: is this even possible?


I think what you are doing is challenging in the sense that you have made your goal to write a post that will go viral on HN. Remember, pretty much every story here is content from somewhere else. You are right that you aren't hitting your audience, but your audience isn't HN, it's those reading your blog. If someone in your audience is also on HN then maybe they will find it relevant to post.

Writing to be a big story on HN is like betting a number in roulette. You had beginner's luck at first, now it's time to find a new game...


Is it really surprising that Hacker News doesn't care about Pinterest?


It's not a YC startup and it's wildly successful.


I've noticed that some submissions drop off the news feed like a rock, while other submissions of the same story posted just a few hours later can gather considerable discussion, with submission time being the only apparent variable.

This leads me to speculate that there may be an optimal submission time or times throughout the day. I'd like to see analytics that look at the variation in the average number of comments/upvotes for submissions (or some other metric) to see if this theory holds any weight.
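As a rough sketch of what that analysis might look like, assuming you had each submission's timestamp and final score (the sample rows below are invented, and the function name is just for illustration), you could bucket by hour of day and compare averages:

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

def mean_score_by_hour(submissions):
    """Average score per UTC hour of day for (submitted_at, score) pairs."""
    buckets = defaultdict(list)
    for submitted_at, score in submissions:
        buckets[submitted_at.hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(buckets.items())}

# Invented submissions, only to exercise the function.
SAMPLE = [
    (datetime(2012, 10, 17, 9, 5, tzinfo=timezone.utc), 4),
    (datetime(2012, 10, 17, 9, 40, tzinfo=timezone.utc), 8),
    (datetime(2012, 10, 17, 15, 0, tzinfo=timezone.utc), 30),
    (datetime(2012, 10, 18, 15, 30, tzinfo=timezone.utc), 50),
]

by_hour = mean_score_by_hour(SAMPLE)
for hour, avg in by_hour.items():
    print(f"{hour:02d}:00 UTC -> {avg}")
```

Averages alone won't separate timing from content or day-of-week effects, so comparing duplicate submissions of the same URL at different hours would probably be a cleaner test of the theory.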


There is a tool that attempts to predict when it is a good time to submit something to HN, http://hnpickup.appspot.com/

It's been discussed several times on HN, for example here: https://hackernews.hn/item?id=4058492 and (original submission I think): https://hackernews.hn/item?id=3251877


Just a heads-up: your site works really poorly on mobile. The text column is too narrow, while the charts are too big, and their interactive features make it hard to scroll the page. They also don't work right (touching a chart seems to mess up the y axis labels), but the impediment to scrolling is more annoying. I only ever read HN on my iPhone, so this is an upvote you're not getting simply because of technical problems with your website.


These are excellent visualizations. I'm glad they put this together and showed it to the HN community while also demonstrating one of RJMetrics' use cases.

A note about the product. How do they differentiate themselves from other DW analytics companies like datameer? (http://www.datameer.com/) I can tell they specialize in e-commerce, but couldn't any DW analytics service give you that AND more?


I work at RJMetrics, thanks for the feedback. While we specialize in e-commerce, we have many clients that are outside of that industry. As robertjmoore demoed in the post, we can take almost any data you can throw at us, and help you find actionable insights in easy to understand charts.


I have to shake my head in admiration. What a powerful story - to start with not one failure, but three failures, and then to use the same tool you were trying to hawk in those failures to figure out why you failed... and then, remarkably (at least for me), succeed wildly.

At least in this case, your tool provided some very valuable insight.


Thanks to these metrics, I've cracked the HN code:

We should always publish our content on paulgraham.com

That's the takeaway of these metrics, right?


This looks cool and now I want to mess around with it. I wish there was a torrent for that dataset.


Wow, I'm number 10! I don't know if I should be happy that people like my stuff, or scared that I've spent so much time submitting and commenting on Hacker News this year...


As someone who's mildly colorblind, a few of your charts were near impossible to read, especially the bottom three lines in Average Score by Category. Just a heads up.


Does this article correct for increased thresholds to perform some actions? The down-vote used to be easier to get, for example.


Just a suggestion: he should compare MongoDB and Riak on Hacker News. For laughs.



