
There was a question yesterday about Evidence Based Medicine vs Science Based Medicine. The SBM criticism of EBM is the over-reliance on Randomized Controlled Trials that meet the p < 0.05 threshold, without looking at the prior probability that a treatment would help.

For example, EBM would say that if you have an RCT that shows that a lucky rabbit's foot works, then you have reasonable evidence to put that into practice. The issue is that, as this article points out, even with a statistically significant result for such research, there's no plausible mechanism by which a rabbit's foot makes you lucky. Therefore the RCT is just one part of the whole picture, and subject to a specific type of manipulation.



Something related to keep in mind when it comes to p-values is the false discovery rate:

> If you run lots of phase 2 trials with different drug candidates where only a minority (let's say 10%) actually work, then with standard trial statistics (80% power and 5% false positive rate) you will get 4.5% false positives and 8% true positives – so less than 2 out of every 3 positive trial results were real. A much lower success rate than the 5% error rate commonly assumed.

http://www.forbes.com/sites/davidgrainger/2015/01/29/why-too...
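The arithmetic in that quote is easy to check directly. A minimal sketch, using the quoted assumptions (10% of candidates truly work, 80% power, 5% false positive rate):

```python
# False-discovery arithmetic from the quoted Forbes piece.
# All three inputs are the assumptions stated in the quote.
prevalence = 0.10   # fraction of drug candidates that actually work
power = 0.80        # P(significant result | drug works)
alpha = 0.05        # P(significant result | drug does not work)

true_positives = prevalence * power            # 0.08 of all trials
false_positives = (1 - prevalence) * alpha     # 0.045 of all trials
fraction_real = true_positives / (true_positives + false_positives)

print(f"fraction of positive trials that are real: {fraction_real:.2f}")  # 0.64
```

So only 64% of "successful" trials reflect a real effect, which is indeed "less than 2 out of every 3".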

The recent open-access review in R. Soc. Open Sci. that largely inspired the Forbes article is worth a read too: http://rsos.royalsocietypublishing.org/content/1/3/140216


> The SBM criticism of EBM is the over-reliance on Randomized Controlled Trials that meet p=0.05, without looking at the prior probability that a treatment would help.

The strength of significance testing is that it purposely doesn't try to tell you how likely something is to be true, only how likely you'd be to see data at least as extreme as yours by chance, assuming the treatment is no better than placebo. You're still taking the prior probability into account when trying to figure out the truth, you're just not putting a number on it.

My concern with Bayesian approaches is that, as with frequentist approaches, the truth is still fundamentally unknowable, only now you're encouraged to put a number on that and pretend that it's science. Bayesian approaches make total sense for determining a patient's likelihood of having some disease, where there is already data on the prevalence in a population and on the sensitivity and specificity of the tests. But using Bayesian logic to weight clinical trials strikes me as highly dubious.

It would be one thing if SBM actually developed a framework to give a weight to each methodological feature of a trial, but so far I haven't seen much work to build a functioning system. Though if you're really honest about all the ways that you can have positive results without something actually being true, it seems like almost no amount of research will ever have a significant effect on the prior.


You're absolutely correct that Bayesian approaches are not magical and do not suddenly supply you with vastly more information than frequentist approaches (particularly when you have a really poor prior, in which case the Bayesian approach will be similarly poor). Bayesian statistics is certainly very popular right now, but it should not be looked at as some sort of panacea for all statistical problems.

However, I would say that Bayesian approaches do have a big advantage in terms of helping with the interpretation problems that plague frequentist significance testing. Namely, as the OP article points out, Bayesian approaches reformulate the testing question in a way that is more intuitive, i.e. "what is the probability of the hypothesis given both the prior probability and the new data?". So yes, Bayesian methods surely do not fix everything, but since interpretation of statistics is such a major concern, they can be quite beneficial.
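That reformulated question can be made concrete with Bayes' theorem. A toy illustration (the numbers are assumptions for this example, reusing the 80% power and 5% false positive rate quoted earlier in the thread):

```python
# P(treatment works | positive trial) via Bayes' theorem.
# power and alpha are assumed trial characteristics, not data from the thread.
def posterior(prior, power=0.80, alpha=0.05):
    evidence = prior * power + (1 - prior) * alpha  # P(positive trial)
    return prior * power / evidence

print(posterior(0.10))   # plausible drug candidate: posterior 0.64
print(posterior(0.001))  # rabbit's foot: posterior still under 2%
```

The same positive trial moves a plausible drug to 64% but barely budges the rabbit's foot, which is exactly the prior-probability point the SBM camp makes.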


> You're still taking the prior probability into account when trying to figure out the truth, you're just not putting a number on it.

This is a danger sign - you are doing the same things the Bayesians do, just informally, less explicitly, and probably incorrectly.

The fact is that to make a good decision, eventually you need to compute a single number. This is an elementary fact of topology:

https://www.chrisstucchio.com/blog/2014/topology_of_decision...

That number will be based on some unprovable assumptions. That's a fact of Gödel's incompleteness theorem, if nothing else. So given this, why is it "dubious" to make those assumptions explicit and obvious?


> The fact is that to make a good decision, eventually you need to compute a single number.

Your linked blog post states that if you make a good decision, then there is a process computing a single number which is equivalent to your process. This is not equivalent to what you claim. As a matter of fact, it's the same kind of confusion that exists around the p-value.

It's not the case that a process explicitly computing such a number automatically makes good decisions, which is what you seem to claim implicitly.

Also, Gödel has nothing whatsoever to do with this.


Eventually you need to compute a number which is either above or below your go/no go threshold. That's the number I'm referring to.

I don't claim you can't arrive at it by some perfect heuristic. I merely claim that you are better off being explicit about your assumptions and formalizing your reasoning. That just makes mistakes more obvious, makes your strong assumptions more clear, and makes it more likely that you will correctly update your beliefs rather than incorrectly discounting/overvaluing evidence.

You are right about Gödel; it's a separate theorem I'm referring to, which says you need unprovable axioms. I misremembered, sorry, wrote that before my coffee.


I see where you're coming from, and I agree with you in large part, especially about making your assumptions explicit.

However, I think it's important to notice that an explicit formula for your thought processes can be difficult (computationally expensive) to find. Our brains have evolved to use heuristics and "gut feelings" to make decisions, and the approach you propose forces you to throw all that away and use the much slower general purpose processing part of your brain to emulate those processes. So there's a tradeoff there.


> That number will be based on some unproveable assumptions.

Given that, would you support using a random number generator as part of the drug approval process to remind people of the importance of the unknown and unknowable?


[edit: I previously said I didn't understand Alex's point.] I now understand the point you were trying to make. A better way to put it - if a random number generator were used in a decision process, I'd favor making the algorithm and random seed explicit.

Any procedure you use will have assumptions. You can't escape this. The only question is whether we show or hide them. Can you give an argument in favor of hidden assumptions and non-explicit procedures?


> Can you give an argument in favor of hidden assumptions and non-explicit procedures?

So as counterintuitive as it sounds, I think there are actually a couple of good arguments that can be made here:

1) With significance testing, the burden of supplying the assumptions and determining meaning is largely on the reader. With Bayesian methods, it's transferred to the author. While it might make sense to use a Bayesian approach for things like the Cochrane report, it's not obvious to me that each person who designs a research study and collects/analyzes data should also be in the business of saying whether some phenomenon is real in light of all other studies.

Essentially each study now becomes a metastudy, with all of the practical and epistemological problems that entails. The fact that it's difficult to figure out what that even means should be a red flag. (And yes, I realize this is the Chewbacca defense.)

2) So TokenAdult actually turned me onto the book Measurement In Psychology, which is all about the epistemological problems with assuming that anything you can assign a number to is a measurement, i.e. something meaningful when interpreted on a ratio scale. The exact argument is kind of esoteric, but the basic takeaway is that it's very easy to trick yourself into thinking that just because you can assign a number to something, it's a measurement. Assigning numbers to things in the first place tends to lead to worse decision making than if you had just used a green/yellow/red system or whatever.


Regarding (1), with significance testing the burden of supplying assumptions is not placed on the reader. The assumptions are implicitly built into the null hypothesis significance test (NHST) rather than explicitly built into the prior.

As for each study becoming a meta-study, that's silly. This is indeed the chewbacca defense. Rather, each empirical study provides Bayes factors which the reader can then use to update their posteriors.

Regarding (2), obviously not every number is a measurement. In Bayesian stats, numbers representing probabilities are quite explicitly opinions. They are meaningful on a ratio scale, and are even asymptotically known to be correct. But they aren't measurements.

(They are correct if your priors are absolutely continuous w.r.t. reality. If you hold a religious belief so strong that evidence can't change it ("100% certainty"), that's not an absolutely continuous prior.)
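The "each study provides Bayes factors which the reader uses to update their posteriors" workflow can be sketched in a few lines. The numbers here are made up purely for illustration:

```python
# Reader-side updating: prior odds times each study's Bayes factor gives
# posterior odds. The prior and the Bayes factors below are hypothetical.
def update_odds(prior_prob, bayes_factors):
    odds = prior_prob / (1 - prior_prob)
    for bf in bayes_factors:
        odds *= bf                 # each study multiplies the odds
    return odds / (1 + odds)       # convert back to a probability

# A skeptical 1% prior plus three studies, each reporting a Bayes factor of 5:
print(f"{update_odds(0.01, [5, 5, 5]):.2f}")  # ~0.56
```

Note how the study authors only report the Bayes factors; the prior, and hence the meta-analytic judgment, stays with the reader.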


But how could a randomized clinical trial make this mistake? What experimental design could make the rabbit's foot look good?

"EMB would say that if you have an RCT that shows that a lucky rabbit's foot works, then you have reasonable evidence to put that into practice."


You have heard "19 times out of 20" described in the news? That is p = 0.05 restated for laypeople. 1 time out of 20 you will get a false positive, in this case a trial showing that the rabbit's foot worked.


Sorry, maybe I'm being dense, but who would take a 1-in-20 success to mean they should start buying rabbit feet?


Nobody. The problem is that if you conduct 20 trials of the efficacy of rabbit feet, you'd expect 1 trial to show a significant effect (if you're interpreting p-values the way many people do, incorrectly).
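This is easy to see by simulation. Under the null hypothesis p-values are uniformly distributed, so a batch of 20 rabbit's-foot trials can be sketched as 20 uniform draws (a simplification, not a full trial simulation):

```python
# How often does a batch of 20 null trials contain at least one
# "significant" (p < 0.05) result? Analytically: 1 - 0.95**20, about 0.64.
import random

random.seed(0)

batches = 10_000
hits = 0  # batches containing at least one false positive
for _ in range(batches):
    pvals = [random.random() for _ in range(20)]  # null p-values are uniform
    if any(p < 0.05 for p in pvals):
        hits += 1

print(f"batches with a false positive: {hits / batches:.2f}")  # ~0.64
```

So even when the treatment does nothing, running 20 trials gives you roughly a 2-in-3 chance of at least one publishable "success".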


When you do a single trial you might show that rabbits feet are effective.

You don't yet know if this is a 1 in 20 result or a 19 in 20 result.

That's why you replicate.


You'd start buying rabbit feet because only the study that was "successful" is published.



