As n gets bigger, n/(n+x) goes from 0 towards 1.
When n equals x, the result is 0.5.
As n gets bigger, the difference between the results for n and n+1 gets smaller.
For two sufficiently large n's, the results are practically equal.
Say somebody told you about a new cafe in town and that it is completely awesome. The best cafe ever. What probability do you assign to it really being an exceptionally awesome cafe? If your x is 3, then the probability after one person praised it is 25%:
1/(1+3) = 0.25
And if another person told you about that cafe being awesome, the probability becomes 40%:
2/(2+3) = 0.4
And after 3 people told you the cafe is awesome, chances are 50% it really is:
3/(3+3) = 0.5
The changes in probability are pretty strong at the beginning. But after 1000 people reported about the awesome cafe, the next report makes almost no difference anymore. It only ups the probability from about 0.997009 to 0.997012.
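A quick Python sketch (the function name trust is just my label for it) that reproduces the numbers above:

    def trust(n, x=3):
        # Belief after n positive reports, with skepticism parameter x.
        return n / (n + x)

    for n in (1, 2, 3, 1000, 1001):
        print(n, round(trust(n), 6))
    # 1 0.25
    # 2 0.4
    # 3 0.5
    # 1000 0.997009
    # 1001 0.997012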
By changing x from 3 to 4, your formula becomes more "suspicious", by changing it from 3 to 2, it becomes more "gullible".
I wonder if this formula has a name already. If not, "the trust formula" might be a candidate.
One way to view this formula is to use the fact that the Beta distribution is a conjugate prior for the binomial distribution.
Essentially if you have a Beta(a, b) prior then your prior mean is a/(a+b) and after observing n samples from a Bernoulli distribution that are all positive, your posterior is Beta(a+n, b) with posterior mean (a+n)/(a+n+b). So in your example you effectively have a Beta(0, x) prior and x (“suspicious”/“gullible”) is directly interpreted as the strength of your prior!
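A minimal sketch of that update for the all-positive case, with the Beta(0, x) prior from above and x = 3 (note Beta(0, x) is improper, but the posterior is well defined once n >= 1):

    def posterior_mean(a, b, positives, negatives=0):
        # The posterior after Bernoulli observations is
        # Beta(a + positives, b + negatives); its mean is the updated belief.
        return (a + positives) / (a + b + positives + negatives)

    for n in (1, 2, 3):
        print(n, posterior_mean(0, 3, n))
    # 1 0.25
    # 2 0.4
    # 3 0.5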
Yeah, that's a lot of jargon associated with Bayesian statistics, but at its root the idea is simple: how to merge information you have before observing some data (a.k.a. the prior) with new information you just observed, to obtain updated information (a.k.a. the posterior) that includes both what you believed initially and the new evidence you observed.
The probability machinery (Bayes' rule) is a principled way to do this, and in the case of count data (the number of positive reviews for the cafe) it works out to be the simple fraction n/(n+x).
Define:
x = parameter for how skeptical you are in general about the quality of cafes (large x means very skeptical),
m = number of positive reviews for the cafe,
p = (m+1) / (m+1+x), your belief (expressed as a probability) that the cafe is good after hearing m positive reviews about it.
Learning about the binomial and the beta distribution would help you see where the formula comes from. People really like Bayesian machinery, because it has a logical/consistent feel: i.e. rather than coming up with some formula out of thin air, you derive the formula based on general rules about reasoning under uncertainty + updating beliefs.
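As a sketch, using exactly these definitions (note the extra implicit success compared to the plain n/(n+x)):

    def belief(m, x):
        # Belief that the cafe is good after m positive reviews;
        # larger x means more skeptical.
        return (m + 1) / (m + 1 + x)

    print(belief(0, 3))  # 0.25 -- this parameterization starts one step ahead
    print(belief(1, 3))  # 0.4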
> Can this way to view the formula be expressed without the terms
You're asking "Can this way of viewing the formula in terms of Bayesian probability be expressed without any of the machinery of Bayesian probability?".
Also, in case anyone is interested, the uninformative Jeffreys prior for this in Bayesian statistics (meaning it does not assume anything and is invariant to certain transformations of the inputs) is Beta(0.5, 0.5). Thus the initial guess is 0.5, and it evolves from there as the data come in.
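For example, plugging Beta(0.5, 0.5) into the same posterior-mean update (a sketch; the helper name is mine):

    def jeffreys_mean(positives, total):
        # Posterior mean under a Beta(0.5, 0.5) Jeffreys prior.
        return (positives + 0.5) / (total + 1.0)

    print(jeffreys_mean(0, 0))  # 0.5 -- the initial guess
    print(jeffreys_mean(3, 3))  # 0.875 after three positive reports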
This reminds me of a simple algorithm to determine which product to choose if all have similar ratings but varying number of votes - add one positive and one negative review and recalculate.
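A sketch of that tie-breaker (the helper name is mine):

    def adjusted_score(positive, negative):
        # Add one phantom positive and one phantom negative vote,
        # then recompute the share of positive votes.
        return (positive + 1) / (positive + negative + 2)

    print(adjusted_score(2, 0))     # 0.75 -- perfect score, but only 2 votes
    print(adjusted_score(180, 20))  # ~0.896 -- the larger sample wins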
Good point! The formula indeed assumes a base probability of zero. That's actually why I put "The best cafe ever" in there and that it is called an "exceptionally awesome cafe". I got a bit lax later in the text just calling it "awesome".
For a cafe aficionado, who spends most of their time in cafes, reading HN and thinking about formulas, the probability that some random cafe becomes their new favorite is virtually zero.
In other words: The more cafes you already know, the closer to zero the chance that a random one will be the best of them all.
So yeah, it is a formula for cafe lovers. Not for the casual person who is happy with a random filtered coffee swill from the vending machine. Those would have to add a base probability, turning the formula into something like b + (1-b) * n/(n+x).
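That variant as a sketch (b is the base probability; names are mine):

    def trust_with_base(n, x, b):
        # Belief starts at b for n = 0 and approaches 1
        # as positive reports accumulate.
        return b + (1 - b) * n / (n + x)

    print(trust_with_base(0, 3, 0.1))  # 0.1
    print(trust_with_base(3, 3, 0.1))  # 0.55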
I think Laplace's Rule of succession [1] could be better here. It assumes there are binary "successes" and "failures" (e.g. thumbs up/down). Let s be the number of "successes", n be the total number of data points (successes+failures), and 1/x the prior probability of a success. Then the probability that the next data point will be a success is:
(s + 1)/(n + x)
E.g. for prior probability 1/2 (success and failure initially equally likely), x = 2, so the probability that the next data point is a success becomes (s + 1)/(n + 2).
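A sketch of the rule, with the x = 2 default from this example:

    def rule_of_succession(s, n, x=2):
        # Probability that the next data point is a success,
        # given s successes out of n, with prior success probability 1/x.
        return (s + 1) / (n + x)

    print(rule_of_succession(0, 0))  # 0.5 -- no data yet
    print(rule_of_succession(3, 3))  # 0.8 after three successes in a row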
Interesting philosophical question: is awesomeness intrinsic or extrinsic (a matter of perception)? Can anything be intrinsic?
If it's intrinsic, then yes, the probability that it is awesome should not be zero if you've never heard of it. Its awesomeness exists independently of any measurement. But, by definition, you can't know its awesomeness until you measure it, so awesomeness quotients only matter after they've been measured. And a measured value must be expressible/observable outside the system (i.e. extrinsic).
I would view this as Laplace smoothing or additive smoothing for binary distributions (https://en.wikipedia.org/wiki/Additive_smoothing). I use it all the time when I'm estimating the rates of events from a limited number of samples.
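A sketch of that estimator for a binary event, where alpha is the pseudocount (alpha = 1 gives classic Laplace smoothing):

    def smoothed_rate(events, trials, alpha=1.0):
        # Additive smoothing for a binary rate: add alpha
        # pseudocounts to each of the two outcomes.
        return (events + alpha) / (trials + 2 * alpha)

    print(smoothed_rate(0, 3))    # 0.2 instead of a hard 0.0
    print(smoothed_rate(48, 50))  # ~0.942 instead of 0.96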
I think the jump from 1 to 2 people telling me it's the greatest cafe ever is a bigger jump than the jump from 0 to 1. Thus I think it would be more like a logarithmic curve.