
Articles about issues like this have been submitted several times to Hacker News, and I'm glad to see that this article is getting some discussion. Right after I saw the submission here, I looked up articles on the replication problem in psychology, and I came back to the PsychFileDrawer.org top 20 list of studies that are nominated as needing more replication.[1] I'd like to see more replication of several of those studies myself.

Another recent commentary that I don't recall seeing submitted to Hacker News is "Psychology's real replication problem: our Methods sections,"[2] which suggests (quite plausibly to me) that many publications in psychology journals describe the methods of the study so inadequately that it is hard to know whether or not the study can be replicated.

Uri Simonsohn, a scholar of how scientific research is conducted and of the statistical errors that show up in many peer-reviewed publications, has a whole website about "p-hacking" and how to detect it.[3] Simonsohn is a professor of psychology with a better-than-average understanding of statistics, and he and his colleagues are concerned with making scientific papers more reliable. You can use the p-curve software on that site for your own investigations into p values found in published research. Many of the interesting issues brought up in the comments on the article kindly submitted here become much clearer after reading Simonsohn's various articles[4] about p values, what they mean, and other aspects of interpreting published scientific research. And I think Hacker News readers who have thought deeply about statistics will be delighted by the sense of humor with which Simonsohn and his colleagues make their pointed remarks about experimental methods.
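
To get a concrete feel for what a p-curve is before diving into the papers: the argument in the p-curve work is that the distribution of statistically significant p values in a set of studies is right-skewed when a real effect is being studied, roughly flat when there is no effect, and pushed toward .05 by p-hacking. Below is a rough Python simulation of that idea. It is not the p-curve.com software itself, and the effect size, sample sizes, and the single crude p-hacking strategy are all made up for illustration.

    # Toy illustration of the p-curve idea: tabulate the distribution of the
    # statistically significant p-values under three scenarios.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def significant_p_values(effect=0.0, n=20, studies=10000, hack=False):
        """Run two-sample t-tests and keep only the p < .05 results."""
        kept = []
        for _ in range(studies):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(effect, 1.0, n)
            p = stats.ttest_ind(a, b).pvalue
            if hack and p >= 0.05:
                # One crude p-hacking move: collect more data and test again.
                a = np.concatenate([a, rng.normal(0.0, 1.0, n)])
                b = np.concatenate([b, rng.normal(effect, 1.0, n)])
                p = stats.ttest_ind(a, b).pvalue
            if p < 0.05:
                kept.append(p)
        return np.array(kept)

    bins = np.arange(0.0, 0.051, 0.01)   # .00-.01, .01-.02, ..., .04-.05
    scenarios = [
        ("true effect (d = 0.5)", significant_p_values(effect=0.5)),
        ("no effect", significant_p_values(effect=0.0)),
        ("no effect + optional stopping", significant_p_values(effect=0.0, hack=True)),
    ]
    for label, ps in scenarios:
        counts, _ = np.histogram(ps, bins=bins)
        shares = counts / counts.sum()
        print(f"{label:32s}", " ".join(f"{s:.2f}" for s in shares))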

Simonsohn provides an abstract (which links to a full, free download of a funny, thought-provoking paper)[5] with a "twenty-one word solution" to some of the practices most likely to make psychology research papers unreliable. He also has a paper posted on evaluating replication results[6] with more specific tips on that issue.

"Abstract: "When does a replication attempt fail? The most common standard is: when it obtains p > .05. I begin here by evaluating this standard in the context of three published replication attempts, involving investigations of the embodiment of morality, the endowment effect, and weather effects on life satisfaction, concluding the standard has unacceptable problems. I then describe similarly unacceptable problems associated with standards that rely on effect-size comparisons between original and replication results. Finally, I propose a new standard: Replication attempts fail when their results indicate that the effect, if it exists at all, is too small to have been detected by the original study. This new standard (1) circumvents the problems associated with existing standards, (2) arrives at intuitively compelling interpretations of existing replication results, and (3) suggests a simple sample size requirement for replication attempts: 2.5 times the original sample."

I should add that slamming the entire discipline of psychology as one with sloppy methodology goes a bit too far. I have learned about most of the publications that take psychology most to task from working psychology researchers. There are whole departments of psychology[7] that largely have a scientific orientation and are trying to improve the discipline's methodology. Crap psychology abounds, but it is gradually being displaced by psychology grounded in sound methodology. It is of course more methodologically difficult to study the behavior of our fellow human beings than to study clouds or volcanoes or insects, but many scientifically oriented psychologists are working on the problem with good methods and sound statistical analysis. Some thoughtful psychologists have been prompted by the failed studies that came before to stress careful replication.[8]

[1] http://www.psychfiledrawer.org/top-20/

[2] http://psychsciencenotes.blogspot.com/2014/05/psychologys-re...

[3] http://www.p-curve.com/

[4] http://opim.wharton.upenn.edu/~uws/

[5] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588

[6] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2259879

[7] http://www.psych.umn.edu/research/areas/pib/

[8] http://www.psychologicalscience.org/index.php/publications/o...



I'm sorry that I don't have the required knowledge to add to what you've said here, but I have a digressive question: what fields have you worked in and what is your current occupation?

I ask because the topical breadth of your comments is absolutely astounding to me.

Please don't take this in too saccharine or fawning of a manner, but I could only hope to be as generally knowledgeable and contributory as you appear to be were I to buckle down and work my inquisitive ass off for the next 5 decades.

Anyways, I always look forward to your comments. Thanks for them, regardless of how you manage to dish them out.

If you have a blog, I'd be a happy subscriber.

Edit: I read your extensive profile, which answers my questions as to your experience. No need to reply if there's nothing of interest to add :-)


I would like to suggest that the general sentiment here holds true in some other modern scientific disciplines as well. In other words, I don't think these difficulties are caused by psychology transitioning to a more formal scientific approach. I think they have to do with a misalignment of incentives in the present academic world, and with the difficulty of the statistical inference framework.

These problems of reproducibility occur in basic cancer research articles as well. Amgen proved this and wrote a paper detailing issues [1].

Academics are valued for their papers, citations and grants. And papers are difficult to publish if one fails to find an effect. The researcher has some incentive to find the right numbers because their job is, in a way, on the line.

There's lots of writing about the tension between "publish or perish" [2] and scientific integrity. It manifests in p-values often, but p-values are the dominant statistical tool right now in most fields. I think you will see the same tension regardless of your tools for doing science.

[1] Drug development: Raise standards for preclinical cancer research, http://www.nature.com/nature/journal/v483/n7391/full/483531a...

[2] http://en.wikipedia.org/wiki/Publish_or_perish
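
To illustrate the "hard to publish a null result" point, here is a toy simulation: every lab honestly studies the same small true effect, but only the p < .05 results make it into the literature. The published record then overstates the effect even though no individual researcher did anything wrong. All numbers are made up for illustration.

    # Toy model of a publication filter: honest studies of a small true
    # effect, with only the significant ones "published".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_effect, n, labs = 0.2, 30, 5000     # small real effect, modest samples

    all_estimates, published = [], []
    for _ in range(labs):
        a = rng.normal(0.0, 1.0, n)              # control group
        b = rng.normal(true_effect, 1.0, n)      # treatment group
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        d = (b.mean() - a.mean()) / pooled_sd    # observed Cohen's d
        all_estimates.append(d)
        if stats.ttest_ind(b, a).pvalue < 0.05:  # the publication filter
            published.append(d)

    print(f"true effect:                     {true_effect:.2f}")
    print(f"mean estimate, all studies:      {np.mean(all_estimates):.2f}")
    print(f"mean estimate, published only:   {np.mean(published):.2f}")
    print(f"share of studies that 'publish': {len(published) / labs:.1%}")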


> "It is of course more methodologically difficult to study the behavior of our fellow human beings than to study clouds or volcanoes or insects,"

I have always despised this semi-excuse. It should not excuse, or even explain, anything at all. It is a great deal harder to study the interior of Europa than it is to study how humans behave, but nobody would say that "crap astronomy abounds". At least not without some serious backing evidence, and they certainly wouldn't simply concede it to the hypothetical masses who already suspect it.

"It is hard to find the funds to perform this experiment, so excuse us for making shit up!" is something that would never fly, but "It is hard to find ways to perform this experiment ethically, so excuse us for making shit up!" is heard all the fucking time.


The replication problem is more general than psychology; it's a problem across biology. It is perhaps more acute in psychology, where the statistical methods lag a bit, but there are serious problems with the volume of science produced and the statistics used to analyse the aggregate. I feel like we might have a foundational crisis emerging.

http://neuroblog.stanford.edu/?p=3451


The neuroscience review cited in this blog post is originally from Nature Reviews [0]. While I agree with the general notion that neuroscientists need to get better at using statistical methods and tools, I wonder if the author of the blog you linked (Zalocusky) makes the problem out to be somewhat different from how it is described in the original review.

Specifically, Zalocusky is trying to link the neuroscience results to an earlier article by Ioannidis [1]. Whereas article [0] concludes that neuroscience results are unreliable and hard to reproduce, Zalocusky applies the older methods and results from [1] to argue that neuroscience results are not only unreliable but also false, which is a stronger statement. To support this, Zalocusky adds a couple of back-of-the-napkin calculations.

I think a bit more analysis and caution is necessary before making the leap from Ioannidis' 2013 claim about neuroscience research to Zalocusky's stronger claim about neuroscience research.

Incidentally, there was a lot of discussion about Ioannidis' 2005 paper, some of which can be seen in Ioannidis' 2007 response to earlier criticism of the paper [2]. When evaluating claims about entire fields of research, it is important to be careful about how we interpret those claims.

edit: We need to be especially careful when using the word "false". Does that mean "not true"? Or "insufficient to describe the truth"? Or "directly opposite to the truth"?

[0] http://www.nature.com/nrn/journal/v14/n5/full/nrn3475.html

[1] http://www.plosmedicine.org/article/info:doi/10.1371/journal...

[2] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1896210/


Yeah I agree caution in interpretation is important.

However, given enough scientists, all three forms of "false" are being published in parallel (as well as genuinely true results), which is the key problem. Without bounds on publication bias and publication quantity, you can't really derive a rigorous probability of truth from a p-value* in isolation.

(hmm, I suppose meta-studies are a kind of remedy to that so maybe it will all work out anyway)

* or whatever
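
The "probability of truth" point can be made concrete with the positive-predictive-value calculation that underlies Ioannidis' 2005 argument mentioned upthread: the share of significant findings that are true depends on statistical power and on the prior probability that the tested hypotheses are true, not just on the .05 threshold. A quick sketch, with illustrative priors and power levels (0.2 is in the neighborhood of the low median power reported in the neuroscience review upthread):

    # Positive predictive value of a "significant" finding, in the style of
    # Ioannidis (2005). Priors and power levels below are illustrative only.
    def ppv(prior_true, power, alpha=0.05):
        """Share of p < alpha findings that reflect a real effect."""
        true_positives = power * prior_true           # real effects, detected
        false_positives = alpha * (1.0 - prior_true)  # null effects, false alarms
        return true_positives / (true_positives + false_positives)

    for prior in (0.5, 0.1, 0.01):    # how often the tested hypotheses are true
        for power in (0.8, 0.2):      # 0.2 is low power, roughly as surveyed upthread
            print(f"P(effect)={prior:<5} power={power:.1f}  "
                  f"PPV={ppv(prior, power):.2f}")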


I think this is related. Something I have just become aware of in the last year is researchers in the psychology fields pushing back against "replication bullies" (a quick Google search will show what I'm talking about). They claim that they are being battered with unreasonable requests to replicate their work. This is so far outside my knowledge that I wonder if you have an opinion on it.


Robert Kurzban gave a short talk on this topic at HeadCon '13.

http://edge.org/panel/headcon-13-part-iv



