Academics argue over how many people read and cite their papers (2014) (smithsonianmag.com)
56 points by testrun on Oct 31, 2016 | 41 comments



I think that is something worth investigating - because if it is true that 50% of all papers written never get an audience, then I would see that as a pretty major flaw in the way we do science. (As if we don't have enough of those already...) I mean, many people have been saying for years now that the scientific community is much too focused on churning out papers, but this would take things a step further by saying that all those extra papers are essentially useless. And if they are indeed useless, then they represent a phenomenal waste of time and money that needs to be stopped ASAP.

(Personally I doubt those numbers, but I have no hard evidence to back that up.)


'I know half the money I spent on advertising is wasted, but I don't know which half.'

A lot of research is supposed to be new, uncertain, high-risk, high-reward. So it should fail often.

[A lot is also meant to be meticulous fact-checking, building certainty to later build new breakthroughs on.]

IMO, there is a real problem with people doing bad research they know is bad, but I wouldn't say that the 50% unread figure is the thing to focus on.


Your quote is thought-provoking, but I do not quite follow the rest of your argumentation:

> A lot of research is supposed to be new, uncertain, high-risk, high-reward. So it should fail often.

Setting aside the question of whether or not science ought to be the way you describe it, what do you mean by research "failing"?

IMO, there are two levels on which research can fail. The first (and I get the impression that you are describing this type) is the failure of some new theory to be corroborated by the facts, or the failure of a new approach to fulfill its promises, or a later experiment contradicting an earlier one. But this is part and parcel of the scientific method. In fact, this leads to an advancement of science, as we then know which explanation is not correct, or which observations need more detailed scrutiny. And so, such a failure of research is actually a success of science.

However, this scientific process is based on scientists exchanging their findings freely so that they can cross-check each other. Thus, the second and much greater failing of research would be to fail to communicate itself. And that is exactly what appears to be happening. If scientists aren't reading what their colleagues publish, the research itself might be fantastic, but if nobody reads it, who is going to know? This is not a failure of research, this is a failure of science. And that is a much more serious issue altogether.


> whether or not science ought to be the way you describe it,

I'm assuming a charitable readership - but it's frequently stated, and I agree, that industry is for high-value things we know will work, while science/research/academia is geared for less certain / more speculative projects that are difficult to commercialise yet.

> If scientists aren't reading what their colleagues publish, that is not a failure of research

Well, maybe?

Or maybe the paper authors hoped the results would be amazing, but they were only mediocre [research issue]; they decided they might as well publish after getting the results (which is good), and some people skimmed the abstract, but it wasn't as interesting to others as the authors hoped (diverse perspectives; good); or the whole field went a different direction; or a specific competitor found a better technique/result in the interim [research issue] and got all the love.

That's all just how research goes - doesn't mean anyone is failing to communicate necessarily.

> This is not a failure of research, this is a failure of science.

Maybe; maybe not.

I agree there are lots of problems in science, but based on my limited experience, I'd expect and think it's OK for lots of stuff to be unread; lots of dead ends.

Lots of startups should fail too, and that's not a bad thing; it's like there's simply a high Bayes error.

IMO real failure is things like people making up data, or doing research they know is bunk (maybe they hacked their p-values or left out data, or something), or continuing research they discovered is useless but their adviser is politically wedded to, etc.
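
To make the p-hacking aside concrete, here is a minimal simulation (a toy sketch, not from the article; the 20-outcomes-per-study setup is just an assumption for illustration) of how testing many outcomes under the null and reporting only the best-looking one inflates false positives:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies = 1000   # simulated studies with no real effect anywhere
    n_outcomes = 20    # outcomes "tried" per study before reporting
    n_samples = 30     # sample size per group

    false_positives = 0
    for _ in range(n_studies):
        # Data where the null hypothesis is true for every outcome.
        a = rng.normal(size=(n_outcomes, n_samples))
        b = rng.normal(size=(n_outcomes, n_samples))
        # Test each outcome, keep only the smallest p-value.
        pvals = stats.ttest_ind(a, b, axis=1).pvalue
        if pvals.min() < 0.05:
            false_positives += 1

    # Roughly 1 - 0.95**20, i.e. ~64% of no-effect studies come out "significant".
    print(false_positives / n_studies)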


Yes, we should totally incentivize publication of negative results / failed experiments. If anything, we need more publication of negative results, even though most of them are unlikely to ever be cited. It's a big problem that many failed experiments go unnoticed because researchers are disincentivized from publishing negative results.


This raises an interesting question: how many failed experiments are being "replicated" just because the past failed attempts were never published? Said differently, how many researchers are wasting time on something that is bound to fail, because all the researchers who happened to try the same idea in the past failed but didn't publish - after all, who wants to publish failure?


> A lot of research is supposed to be new, uncertain, high-risk, high-reward. So it should fail often.

Except that has nothing to do with papers not being read. Papers that detail failure are usually never written, and if they are, they're not published. But in any event, reports of failure should be read almost as much as reports of success.

In addition, bad research and unread papers are two completely separate problems that may not be related in any way whatsoever.

Papers are not read because researchers must churn out a lot of papers even for results that are partial or not very interesting on their own, so we are flooded with what basically amounts to noise. It obviously makes sense that a lot of research would turn out to be not very interesting, but that shouldn't result in papers being written and published. I guess that even those who publish those papers would rather have more time to bring ideas to the point where they become interesting, or to conclude that an approach has failed, but the system doesn't work that way, and that's a failure of the system.


While I agree that reports of failure can be important, in many cases experimental research is trying to develop a protocol to perform a measurement. A protocol that does not work is simply not as interesting as a protocol that does work.

For example, I am sure there are many plausible gene-editing protocols, but none are as robust as CRISPR. Of course the papers that describe CRISPR are going to be cited more, since that protocol is going to be used in many other studies.


Exactly. Take the same thing, but applied to startup culture. It's true that 50% (actually much more than that) of startups fail spectacularly. But I don't see anyone saying we should stop doing startups.


Nobody is saying we should stop doing science - we just want to see it done better. As for the concept of "failure", see pron's and my comments above.


> A lot of research is supposed to be new, uncertain, high-risk, high-reward. So it should fail often.

Not being read is not a failure in research, it's a total failure in the system.

It's like paying for ads where 50% are never even published. It's a scam, not the cost of doing business.


One problem is measuring what we mean by "paper". I can quite believe this if we include all the fake journals, which feature huge numbers of junk papers.

By the same argument, 99% of webpages are never visited by anyone but bots, but that isn't a "crisis of the internet", just a lot of junk.

It would be nice (and I don't know how) to provide a clearer division between junk and real journals -- I now basically ignore all new journals, unless I know someone involved personally (either as editor, or author). That's the only way I can stay sane, but it doesn't seem like a good system.


Good point. Unfortunately, the links in the Smithsonianmag's article are broken and they don't provide much other information to find the original papers that reported this, so I don't know how the authors worked. Would be interesting to see which sample they took.

[Edit] I found the original papers (see above), but unfortunately the relevant one is closed-access...


> if it is true that 50% of all papers written never get an audience

This cannot be assessed. Even if you could somehow determine whether a paper was read or not, how long after publication do you wait until you declare it "never read"?


If it's not read in even a year or two, I think that can still be considered a failure.


Many of the most important advances in science are ignored for decades. To take one well known example, Mendel’s work on breeding of pea plants was published in the 1860s in an obscure journal and completely ignored by biologists at the time, but was rediscovered and popularized 40+ years later and now forms the foundation of the modern understanding of genetics.


That is far too shortsighted.


Yeah, especially when you consider all the current buzz about Neural Networks. The original papers on MLPs, CNNs, etc. were published more than two decades ago but were not given much attention until recently.

And I certainly wouldn't consider that a failure.

I'm sure there are other examples of research catching on decades later.


> saying that all those extra papers are essentially useless

Having a deliverable is useful. It focuses your efforts, provides clarity of thought, and gives you a target.

It's often only when explaining something that we realize we don't understand it.

Even papers nobody but the author reads are useful.


> Even papers nobody but the author reads are useful.

But to a very, very limited extent. Why do we publish papers? Because we want to share the knowledge we generate. Papers are the very essence of the ideal of open science - science that is accessible not just to the people who did the work, or those who paid for it, but (in theory) to everyone. Papers are there to communicate information, but communication always requires a recipient as well as a sender. Ergo, if our papers aren't being read, there is no recipient, there is no communication: our papers have failed their most basic raison d'etre. And because they are our most important method of communicating, you could say that open science itself has failed.

Yes, there is a value in writing up your thoughts and results just for yourself. But then why go through the hassle of publishing? Why waste paper and ink (in a printed journal); and more importantly, why waste the time of the reviewers, editors and typesetters involved in the process?


Why do we publish papers? Well, I guess for the vast majority of authors, THE reason is that the number of published papers is one of the key metrics at your job. From the lowest levels of PhD students to the PIs, the one thing that you need to show is how many papers you have published and where. Even if the paper is crap (you often realise that years after you've published, of course) it increases the metrics by one point, so... And, even if your work is fantastic, a small number of published papers will effectively put you behind and kill your career. Sad but true.


Unfortunately, you are correct. And if you refer back to my first comment, you will see me saying something very much like that ;-) However, that is not the original intent of a paper, which is what my later argumentation referred to.


Let's not forget that all of these papers passed review, meaning the reviewers/editor felt it made a contribution to the field.

Also, just because it didn't get read right away doesn't mean it's a failure. Ask some gray haired academics how often they've seen (first hand or otherwise) an obscure paper from decades ago become key in solving a modern problem.


> Also, just because it didn't get read right away doesn't mean it's a failure. Ask some gray haired academics how often they've seen (first hand or otherwise) an obscure paper from decades ago become key in solving a modern problem.

True, but don't tell me that all of nearly a million unread papers published this year (accepting the Smithsonianmag's figures as true, for the moment) are going to solve a key problem of science in 50 years...


Unread and uncited are very different things. It appears that the 50% number is some sort of arbitrary (short?) time scale "has it been cited yet" metric. Citing something is much harder than reading; you need to do some research, write it up and go through the publishing process. If 200 people read a paper but don't cite it, is it a failure?
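
To see why the time window matters for a "has it been cited yet" metric, here is a toy sketch (entirely made-up data, just for illustration) of how the uncited fraction shrinks as the window grows:

    # Toy records: (publication_year, [years in which the paper was cited]).
    papers = [
        (2005, [2007, 2012]),
        (2006, []),
        (2007, [2015]),   # only cited long after publication
        (2008, [2009]),
    ]

    def uncited_fraction(papers, window_years):
        """Fraction of papers with no citations within window_years of publication."""
        uncited = sum(
            1 for pub_year, cite_years in papers
            if not any(c <= pub_year + window_years for c in cite_years)
        )
        return uncited / len(papers)

    print(uncited_fraction(papers, 2))   # short window: 0.5 of the papers look "uncited"
    print(uncited_fraction(papers, 10))  # longer window: only 0.25 remain uncited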


I am pretty sure that the vast majority of cited papers have not been read by the people who cite them.


I've often tried to find the "canonical" paper to cite for a certain fact, and found that it barely even mentions the relevant fact at all. It just somehow became canonical because everyone needed a citation for the same thing later on.


Very likely not all :) How do you figure out which of the nearly a million are the useful ones?


Sure, but then you don't need to go through the whole process of publishing it. You could just post it on your blog.


It depends on the journal and not all journals are equal, e.g. every Nature paper has been widely read.


> Hopefully, someone will figure out how to answer this question definitively, so academics can start arguing about something else.

The reason academics argue about citation counts isn't necessarily that they care, but that, at many institutions, citation counts are directly tied to hiring decisions.


So what's the modern academic clickbait? I've only read the greatest hits like The Part-Time Parliament or Goto Considered Harmful. But those were really successful.

Is there something widely read that is not particularly insightful?

Is there a buzzfeed of journals?

Not trying to slight an author, just curious what attention culture looks like in academics. Perhaps it doesn't exist, or it's driven by author reputation or some other effect.


Nature and Science, kinda. Those articles, though very good, can be very buzzfeedy in their 'zeitgeist-ness'.

I got to talking with one of their editors a while back; they know they are the 'gatekeepers' to a scientific career, and they hate it. The guy said that the submissions they receive in just the first week of the year, for the first issue, are so numerous and of such high quality that they could shut down the rest of the submissions for the year and see no drop whatsoever in quality. Getting into Nature or Science is so prestigious, it will make your career if you are a grad student. Unfortunately, it is effectively a lottery to get in.

Not to say you can't "make it" in other journals, but Nature and Science are sure things. If you are a post-doc in some fields, you pretty much have to get into Nature or Science, sometimes twice, to be considered as a faculty recruit worth mentioning. It's a giant god-damn mess of a system. It is very 'up or out' and as such, we waste billions training students that have no chance, doing work that languishes in unranked journals, and generally losing the trust of the public that pays us.

NIH is trying to help, but it and the NSF are so inundated with Bullshit Artists and Yes-men that I doubt they can get away from their sclerotic bureaucracies and change.


Here are the original papers cited:

Meho 2007: http://iopscience.iop.org/article/10.1088/2058-7058/20/1/33/... (not open access)

Evans 2008: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.183...

Lariviere & Gingras 2008(?): https://arxiv.org/pdf/0809.5250v1.pdf

The third paper is especially interesting, as it brings forward what seems to be a pretty solid analysis that goes against what the previous two authors (and the Smithsonianmag) say about the state of science. So maybe there is hope yet :-)


Curious if this takes into account online publishing and online views. These days, with search engines indexing everything, just because something might not seem immediately useful doesn't mean it's worthless.

Sometimes you're just ahead of your time, or in too niche of a field.

All that being said, writing papers does seem to be the academic version of the corporate hamster wheel.


Look into the budding field of altmetrics if that kind of thing interests you.

https://en.wikipedia.org/wiki/Altmetrics


It's just a symptom of how dysfunctional the current funding & promotion practices are. For many years now these decisions have been based directly on measures more or less equivalent to "inbound links" to one's publications, to such an extent that such measures have become a de-facto form of capital -- to be fostered and protected accordingly.
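
For a concrete example of an "inbound links" style measure: the h-index is one common choice (this is just a minimal sketch of that metric, not anything specific to any particular institution's process):

    def h_index(citation_counts):
        """Largest h such that at least h papers have at least h citations each."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        while h < len(counts) and counts[h] >= h + 1:
            h += 1
        return h

    # Made-up citation counts for one researcher's papers.
    print(h_index([25, 8, 5, 3, 3, 0]))  # -> 3 (at least 3 papers have >= 3 citations, but not 4 with >= 4)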


It's not surprising, but it's also not a problem.

First, the need to publish is due to the publish-or-perish strategy, which forces authors to put out 1-2 papers per year in order to stay afloat. If you work in research, you realize at some point that advancing the state of the art in any field is hard. It takes intuition, dedication, some luck, and most importantly a LOT of time. That is, /years/ of work where nothing comes out. The world-class papers you see in Nature are works of years, not months.

Ignore bad research for a moment: imagine you had only great researchers, which is what we'd all like to sustain. This is /fundamentally/ incompatible with the current model. You have two strategies as a researcher: put out crappy work to sustain a larger project, or publish bite-sized pieces of your larger work over time, which leads to incredibly specific papers that taken by themselves seem useless. If you take the second path, it's obvious that you'll also take advantage of the weak spots of the system, and cite yourself on each subsequent paper you publish. This is common knowledge. Because you need whatever you have to stay afloat.

The first strategy (keep the bigger task going) is almost never attainable. In fact, if you need infrastructure, such as a lab of any kind, it's impossible to pull off. This is something that only a tenured researcher working in a very successful team or with good connections might be able to do. Aside from your vision, and any grant you might have, you need mindshare in your fellows and your supervisors to simply let you do it. This is much harder than it needs to be, with too many factors out of your control.

As for the "unread" output, I'm not worried. I assume a good 30% of papers are just churn due to the above system. As said, this very system has pushed authors to publish even more, artificially increasing the number of publications one would otherwise have written. This is also placing a burden on new authors who need to do literature research. Finding all relevant articles about any subject has become a massive task, even ignoring the problem of getting the articles themselves.

But how can you tell if the published works are worth it or not? It's impossible. Consider the evolution of all the odd fields of math, most of which would have seemed just ludicrous at the time of publication for anybody but the author. Stuff like origami to fold an antenna on a space telescope. Again, if you considered all past works and projected forward, you'd realize there's no way an honest reviewer could tell "this is useless".


An idea, probably a bad one: what if, not by accident but by intention, there was a flaw in every paper, and a reader was only allowed to cite it if they could state the line and type of flaw?


Someone would quickly set up a database that would tell for each paper where the mistake is.


Relevant XKCD: https://xkcd.com/1447/.



