A New Algorithm for Controlled Randomness (probablydance.com)
89 points by ingve on Aug 31, 2019 | hide | past | favorite | 49 comments


> Together we came up with a solution that solves this with O(n) memory usage where n is the number of choices, and O(log n) CPU cost. And it allows you to decide if you want to be more random or more deterministic.

I think you can do the same thing and solve all the same problems in O(1) time per sample using the alias method plus shuffling while blending between regular uniform samples and jittered uniform samples.

- Build your alias map of M buckets, the distribution you want to sample

- Take a set of N uniform samples, and use a knob to blend between regular and jittered. It looks like: r=lerp(knob, 0.5, rand())/N

- Shuffle the uniform samples, e.g. std::random_shuffle(). This can be done incrementally, per sample (https://en.wikipedia.org/wiki/Fisher–Yates_shuffle).

- Plug the uniform sample into the alias map to find your sampled choice.

- Repeat when you run out of the N uniform samples.

https://en.wikipedia.org/wiki/Alias_method

http://www.keithschwarz.com/darts-dice-coins/
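Fleshing that out as a rough Python sketch (just my reading of the steps above, not tested production code; `knob`, `controlled_stream`, etc. are made-up names, and the alias-table build follows Vose's method):

```python
import random

def build_alias_table(weights):
    """Vose's alias method: O(n) build, O(1) per sample."""
    n, total = len(weights), sum(weights)
    scaled = [w * n / total for w in weights]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob, alias = [1.0] * n, list(range(n))
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def alias_pick(prob, alias, u):
    """Map one uniform u in [0,1) to a bucket index."""
    x = u * len(prob)
    i = int(x)
    return i if (x - i) < prob[i] else alias[i]

def controlled_stream(weights, n, knob, rng=random.Random()):
    """knob=0: regular strata midpoints (deterministic counts per block);
    knob=1: fully jittered strata. Per stratum i: u = (i + lerp)/n,
    where lerp blends 0.5 with rand() as in the parent comment."""
    prob, alias = build_alias_table(weights)
    while True:
        us = [(i + (1 - knob) * 0.5 + knob * rng.random()) / n
              for i in range(n)]
        rng.shuffle(us)          # Fisher-Yates: controls the event order
        for u in us:
            yield alias_pick(prob, alias, u)
```

At knob=0 every block of N samples reproduces the target counts exactly (just shuffled); at knob=1 you get ordinary stratified sampling.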


For some reason alias sampling is not well known, even though it strictly dominates using a search tree when generating randomness (it's faster, it uses less memory, and IMO it's easier to code). It even works better when generalized to the case of preparing superpositions on quantum computers ( https://algassert.com/post/1805 ).


Alias sampling is a really beautiful algorithm, but implementing it correctly and efficiently isn't as easy as you seem to imply.

I would also very much like to know what a generalization of this algorithm to continuous distribution would look like.


You make a list of below-average bars and a list of above-average bars. Iteratively pop the top of both, donate from the larger to the smaller until the smaller is exactly average, then put the larger back into the appropriate list based on its new value. Stop when there are no too-large bars left. Compared to writing a red-black tree it's trivial.
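A hedged sketch of exactly that construction (my own naming; the "average" here is the mean bar height):

```python
def build_alias(weights):
    """Donate from above-average bars to below-average bars until every
    slot holds exactly the average; no balanced tree required."""
    n = len(weights)
    avg = sum(weights) / n
    below = [(w, i) for i, w in enumerate(weights) if w < avg]
    above = [(w, i) for i, w in enumerate(weights) if w >= avg]
    cutoff, alias = [avg] * n, list(range(n))
    while below:
        (small, s), (big, b) = below.pop(), above.pop()
        cutoff[s] = small            # slot s keeps `small` of itself
        alias[s] = b                 # the rest of the slot belongs to b
        big -= avg - small           # b donated (avg - small)
        (below if big < avg else above).append((big, b))
    return cutoff, alias, avg
```

To sample: pick a slot i uniformly, draw r in [0, avg), return i if r < cutoff[i], else alias[i]. A robust implementation does need care with the floating-point comparisons against the average.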


An implementation that is correct wrt the probabilities being exact given a frequency table (the "average" can be a very large rational number), and efficient both when constructing the data structure and during the sampling phase, is IMO not as trivial as you seem to imply.

I have asked people to implement this algorithm during engineering interviews on a regular basis for the better part of five years, and I can promise you, none of the candidates who were subjected to it would describe their experience as "I was asked a trivial question".

But then everyone has their own bar for the word "trivial". I know for example that I have my own bar for the word "arrogance".


Oh wow, how do you prompt the interviewees? Do you tell them about the average value threshold or the guarantee you only need 2N buckets? If not, I’d guess this is probably way too hard of an interview question, even though a decent implementation really isn’t that difficult. Is this like a bonus difficult question for someone who aces all your other questions?

The implementation isn’t hard once you’ve read the Vose paper or the Keith Schwarz link I posted. There’s only one tricky part about handling rounding that is pretty easy to fix.

I would never expect anyone who hasn’t heard of the idea to be able to implement it on the spot without being more or less given a recipe for the math and how to use it. If you ever find someone who can, hire them immediately!

This is one of those problems that only feels somewhat easy in retrospect, but almost nobody could derive any of it on their own. The alias method is extremely clever, and not at all trivial.


> Oh wow, how do you prompt the interviewees?

Back when I was still doing this, I used to start the interview like this:

        - given: a non-uniform, finite frequency table with actual observations of some real-world phenomenon (and if that's too generic, just tell the candidate you've thrown a die a million times and recorded the outcomes).
        - given: a uniform pseudo random number generator
        - please design a pseudo random number generator that will, if sampled for long enough, produce the same distribution (non-generic version: an RNG that statistically behaves like the die).
Unfortunately, a non-negligible proportion of candidates fail to produce any solution at all :(

Decent candidates quickly produce the standard: "build the CDF once and then at sample time, draw a uniform number in the correct range and walk the CDF until we're above that random number, return the bucket where that occurred"
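That standard answer, roughly, as a Python sketch (names are mine):

```python
import random

def sample_walk(counts, rng=random):
    """Draw uniformly in [0, total), walk the CDF until we pass it:
    O(n) time per sample, O(1) extra memory."""
    r = rng.random() * sum(counts)
    acc = 0.0
    for bucket, c in enumerate(counts):
        acc += c
        if r < acc:
            return bucket
    return len(counts) - 1   # guard against float edge cases
```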

I then ask for time and space complexity for the sampling phase.

Slightly-better-than-average candidates notice that the CDF is naturally sorted and can be binary searched.
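With `bisect` that observation is a few lines (again, just a sketch):

```python
import bisect
import random
from itertools import accumulate

def make_sampler(counts):
    """Precompute the CDF once; each draw is an O(log n) binary search."""
    cdf = list(accumulate(counts))
    total = cdf[-1]
    def sample(rng=random):
        return bisect.bisect_right(cdf, rng.random() * total)
    return sample
```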

I then ask those if they can improve what we have to a O(1) solution (independent of the size of the histogram).

The top 20% of candidates usually manage to put their intuition to work: from the fact that O(1) almost always implies a simple table lookup, they come up with: "build a very large array, where the index of each bucket B of the histogram is replicated histo[B] times, and then sample uniformly from it".
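That lookup-table answer, as a sketch - O(1) per sample but O(sum of counts) memory, which is exactly why the practicality discussion matters:

```python
import random

def build_lookup(histogram):
    """Replicate each bucket index `count` times; sample by uniform index."""
    table = []
    for bucket, count in enumerate(histogram):
        table.extend([bucket] * count)
    return table

def sample_lookup(table, rng=random):
    return table[rng.randrange(len(table))]
```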

We then discuss when this algorithm is/isn't practical.

The candidates that get to this point within ~15 minutes can then be gently walked towards redesigning the alias method using some hints.

The starting hint goes like this: what if you were to completely ignore the fact that the histogram is not uniform, and just stubbornly sample from it? Can you quantify how much of a mistake you would make?

The second hint is: for some buckets, picking them will be an overshoot, and for some we will undershoot. Explain when that is.

The third hint: say we picked a bucket that's chosen too often; can we decide to only keep it some of the time (as in: with some carefully chosen probability)?

The fourth hint: and if we decide not to keep it, why waste the die throw? Why not pick another, carefully chosen bucket?

etc ...

The nice property of this question is that it's very gradual: starting with a problem most people with a CS degree should be able to solve, then moving to a solution that's less standard and requires more than rote regurgitation of CS lessons, to finally arrive at what I consider to be (at the risk of repeating myself) the non-trivial state-of-the-art solution to the problem.

The point where candidates start to struggle gives you a nice reading on their strength.

And there were the rare few new grads who vaguely recalled having seen the alias method and could rebuild it cleanly on the whiteboard from scratch and without help. These rare occurrences were very enjoyable.


I agree that the discretizing step is hard to get right. I agree that you don't have enough time in an engineering interview to get Vose's algorithm right (nevermind think of it in the first place). I was definitely imagining that you had a day to implement it and test it, not an hour and a whiteboard.


The closest continuous distribution analogy would be inverse transform sampling - analytically determine the CDF and invert it. Then you have an O(1) function to map a uniform sample into your continuous PDF. Unfortunately, it’s not always possible, lots of continuous functions can’t be integrated & inverted.
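For a distribution where the inversion does work out, e.g. the exponential, the sketch is short (using the standard identity F(x) = 1 - exp(-lam*x), so F^-1(u) = -ln(1-u)/lam):

```python
import math
import random

def sample_exponential(lam, u=None):
    """Inverse transform sampling for Exp(lam): map a uniform u in
    [0,1) through the inverted CDF, -ln(1-u)/lam."""
    if u is None:
        u = random.random()
    return -math.log1p(-u) / lam
```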


There's actually an interesting experiment to run, which is as follows:

       - take some non-uniform continuous distribution with finite support.
       - discretize it into a N-bucket histogram
       - apply the alias algorithm
       - display the result
       - increase N
       - rinse and repeat
The resulting animation is likely going to be interesting.

I suspect it won't be stable at all and might veer into the fractal at some point.


Should be easy to try, maybe I’ll do it if I understand correctly what you’re thinking about...

It sounds like you’re thinking of a feedback loop, where I pick, say 1000 uniform samples, then run them through the alias map, and then run those results back through the alias map, etc., while each time increasing the number of buckets for the alias map?

I would assume the output distribution range is either [0,1] or we re-normalize the samples every time through the alias map?

Is that what you’re thinking of, or are you suggesting to visualize the map directly, and not a bunch of samples?

For a feedback loop on samples, I guess I might expect the samples to eventually converge with all samples at the function’s maximum. But if you mean to track and visualize all the little intervals in the alias map through itself several times, yeah, that might be a bit wild to see.


Thanks for the links. The last one is long, but made it easy to understand how and why alias method works (and it's awesome for providing mathematical proofs along with algorithms).

That said, I'm having trouble understanding the solution you described. I can understand how the alias method can help me efficiently sample an arbitrary discrete probability distribution, but I can't see how adding jitter helps with the problem from TFA - namely, to generate samples that conform to a given distribution long-term, but avoid hitting "annoying" low-probability outcomes like strings of consecutive lower-probability options. Could you please clarify / expand on your explanation?


Sure, my thinking is that N stratified uniform samples plugged into the alias map is the part that prevents the annoying low-probability sequences. This is very similar to the simpler and more common technique of picking from a shuffled list of outcomes, but plugging it into the alias map allows the outcomes to have arbitrary probabilities not quantized by the number of choices.

The shuffle and jitter are the things that control randomness. Shuffle controls the order of events, and jitter controls which events are chosen. Using both you can have different outcomes each time through the N samples.

Regular sampling would give you the same set of events, whereas jittering will change which events are chosen depending on your parameter choices. But! Jitter will only change the number of each event chosen by a limited amount that you control.

The choices for the blending knob and the number N of uniform samples, along with your goal distribution, let you control how often and by how much the events change. For example, by picking an N that gives you uniform strata that are the same size as the smallest item in the goal distribution, you can guarantee that you get at least one choice from each item in the goal distribution. I think this would also ensure that adding jitter changes the number of choices for each goal bucket by at most 1. It might be by at most 2, due to the alias map’s splitting of large buckets, and this cap might need strata that are half the size of the smallest bucket. I’m not sure of the exact details without trying it, but I am pretty sure this gives you control over the amount of variation; the number of events you can possibly have above or below the expected value.

That’s the idea though - choosing N and using the knob allows you to decide how much variation you want on top of the guarantee that you never get sequences that feel wholly improbable.


Thanks for elaborating!

I did some related things, like implementing stratified sampling for the RNG used in Monte Carlo simulations, so I understand some of the concepts here, but I still don't have a good intuition for how jitter and shuffling affect the outcome set, or whether they could replicate manually preventing repetitions in the results. However, you provided enough information that I think I can figure it out now.

Tell me if I'm understanding this correctly:

- You generate a sequence of uniform random numbers using stratified sampling. This increases variation by ensuring the [0, 1) space is sampled more or less evenly, while not breaking the uniformity of the resulting distribution.

- With strata size set to the lowest probability in the desired outcome set, we can ensure even the lowest probability outcome will happen n times (with n = samples per stratum). This allows the low probability events to be observed like a naive expectation would suggest, while still not breaking the overall distribution.

- Alias map is used to translate the stratified uniform sequence to desired distribution (is this the discrete equivalent of rejection sampling?).

- Shuffling is applied to the stratified uniform sequence, and ensures events don't happen in order of decreasing probabilities (modulo shuffling introduced by alias map breaking probabilities into pieces).

- Jitter lets us add additional slight unpredictability, while still not breaking the desired distribution (because jitter is uniformly distributed, so it averages out).

Do I understand this correctly?


Yes, spot on, I think that describes it exactly.

And yes, the alias map serves a very similar function to rejection sampling of a discrete distribution, except that you don’t have to waste any samples. And the alias map serves as a deterministic mapping function; the same input will always yield the same output.

There’s another method for discrete distribution sampling that is closer to alias mapping, and better preserves stratification than the alias method but costs O(log n) per sample: compute the discrete CDF, and then invert it by picking a uniform sample and using that to binary search on the (sorted by definition) CDF to recover the choice, which will be then distributed according to your original discrete PDF array. For sample set size that is known in advance, you can avoid the binary search, and do this in O(1) time per sample.

The method in the article feels similar-ish to the binary search method, maybe it’s equivalent, but I’m not sure.


Nice, seems like a great solution especially in the use case where this service is needed by a game server and good performance is a requirement.


> I don’t know if this problem has a proper name, but in game development you usually don’t want truly random numbers. I’ve seen the solution named a “pseudo random distribution” but “pseudo random” already has a different meaning outside of game design, so I will just call it controlled randomness for this blog post.

The correct term is probably quasirandom or low-discrepancy sequences: https://en.wikipedia.org/wiki/Low-discrepancy_sequence


While those are good terms for deterministic uniform random number generators, that is not what the author’s talking about.

He’s referring to how in games, you sometimes don’t want random patterns. But even when you do, you don’t want the pattern to be truly random, because true randomness sometimes feels unfair and sometimes feels un-fun.

It’s a concept with a broader scope than low-discrepancy sequences.


They don't have to be deterministic, the wiki article has a section on constructing low-discrepancy sequences from independent random numbers by imposing negative correlation between subsequent numbers: https://en.wikipedia.org/wiki/Low-discrepancy_sequence#Rando...

Which sounds pretty similar to what the author is doing - removing some of the independence from a sequence of independent random numbers to make them feel "fairer" (basically making them more in line with misconceptions people have about truly independent random sequences https://en.wikipedia.org/wiki/Randomness#Misconceptions_and_...)


> They don’t have to be deterministic.

Right. Agreed. Still, the idea here is about the perception of randomness at the expense of actual randomness, it’s related to but different from quasi-random ideas.

Edit: Actually I don’t know, maybe you’re totally right about this article, maybe quasi-random is the same thing in this case. I was projecting a broader idea based on having worked on games and run into this problem... my philosophy on it is that often you don’t want any randomness, but sequences that are random seeming at first but are predictable and learnable by the player. Depending on the game feature, that’s sometimes more fun.

In this particular case, they are trying to preserve the intended distribution, and sample a given distribution without shuffling. A low discrepancy sequence does do that, so I’m changing my mind here.


Maybe they have misconceptions about randomness, but are they wrong about their expectations for something that models the real world? I'm thinking the space of possible random events might be too big, or the distribution is wrong.




By the time you are tracking misses on each class of event and adjusting the probabilities, you might as well just use random permutations of the desired distribution of outcomes. Like if the sword is supposed to hit 90%, just run through a 10-bit sequence with 9 ones and a zero, and shuffle it after every 10 attempts.
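A minimal sketch of that grab-bag idea (hypothetical helper name):

```python
import random

def grab_bag(hits, total, rng=random):
    """Endless hit/miss stream: exactly `hits` hits per `total`
    attempts, reshuffled after every pass through the bag."""
    bag = [1] * hits + [0] * (total - hits)
    while True:
        rng.shuffle(bag)
        yield from bag
```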


Famously Tetris typically uses this "grab bag randomizer" (or various other controlled forms of randomness). I was surprised Tetris was not mentioned at all in the article.

https://tetris.fandom.com/wiki/Random_Generator

https://katta.mere.st/different-tetris-algorithms/


Are you talking about randomly shuffling the bits or getting a new permutation? Any suggestions on how to quickly randomly shuffle the bits of a number?


Randomly shuffling the bits _is_ a new permutation (of those bits). There may be a faster way to shuffle the bits of whole bytes/words on some architectures, but for the general case I suspect a normal shuffling algorithm (like Fisher-Yates) and shifts+and+xor swapping is best.
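A sketch of Fisher-Yates over bit positions with the xor-swap trick (my own naming):

```python
import random

def shuffle_bits(x, nbits, rng=random):
    """Fisher-Yates over the low `nbits` bit positions of x,
    swapping pairs of bits with shift/and/xor."""
    for i in range(nbits - 1, 0, -1):
        j = rng.randrange(i + 1)
        d = ((x >> i) ^ (x >> j)) & 1   # 1 iff bits i and j differ
        x ^= (d << i) | (d << j)        # swap them (no-op if equal)
    return x
```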


Oh, sorry for not making it clear: when I said permutation I was referring to the immediately next permutation. I guess after the current set is over you could also take the Xth permutation of the current state, where X is a random integer (so you randomly skip some permutations to keep it unpredictable).


Yeah jumping ahead by X permutations would be good as well.


That sounds a lot easier to me.


At one of my projects I had a similar problem.

The problem: my customer was renting items (to his customers) and wanted those items to be handed out randomly, but not too randomly, to avoid the same (free-to-rent) item being rented so many times that it would wear out faster than the rest. Replacing a single item in the pool was more expensive than replacing the entire pool (think doors on small lockers: getting the whole panel of lockers replaced was cheaper than replacing one door - don't ask me why, modern society is weird).

The solution: mark each item with a score that goes up each time the item gets rented, and apply randomness only to the items at the bottom. In the beginning all items had a score of zero; one got rented and its score became 1. I applied randomness only to those with a zero score until all of them had been rented once. Of course, rental durations were variable, so the algorithm I implemented in the end was more complex, but this was the basic idea.
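The basic idea, as a hedged sketch (just my reading of the description, not the actual production algorithm):

```python
import random

def pick_item(scores):
    """Pick uniformly among the least-rented items, then bump its score."""
    lowest = min(scores.values())
    candidates = [item for item, s in scores.items() if s == lowest]
    choice = random.choice(candidates)
    scores[choice] += 1
    return choice
```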


How is that better than just renting out the next one with the fewest rentals? Maybe it's easier to pick at random than to track an order? But then the randomness is for ease of implementation and not a fundamental part of your solution.


I'd guess the scoring was not linear with number of rentals, because wear isn't. E.g. 2 long rentals will generate more wear than 2 short rentals; probably even more than 3, depending on how much wear is caused by shipping & handling.

I wonder why not derive the score from human input? I assume someone is handling returns; that someone could be trained to eyeball the condition of the item and input that into a database.


No human to oversee this was the goal. The only human was the cleaning lady who cleaned them at closing time, and she was definitely not trained to do anything else, hence everything was algorithm driven. When something broke, a technician was sent to repair or replace it, but nobody stayed onsite.


> How do you arrive at that 5% base chance? I actually don’t know an analytic way of arriving there, but you can get there experimentally pretty easily by just writing a loop that runs through this a million times and then tells you how often it succeeded or failed. Then you build a lookup table that contains the percentages.

I'm no stats expert by any means, but couldn't you calculate the percentages of each event occurring (and the amount to increment) using Bayesian modeling?

Simply taking the product of a string of actions and calculating those seems like an acceptable solution, however I'm not sure.

The author seems to use a calculation approach, which if I understand correctly is a valid method and is sampling the distribution. But then again, not sure.


An easy way to do the odds analytically is to calculate the average expected number of trials needed for a success, and then note that the overall average success rate is the inverse of that.

E.g. in the author's example, where the rate starts at 5% and increases by 5% with each failure, the expected number of trials needed for a success would be:

    sum = 1 * 0.05 +                // hit the first trial
          2 * 0.95 * 0.10 +         // missed first, hit second
          3 * (0.95*0.9) * 0.15 +   // missed first two, hit third
          // ..and so on
Summing up that loop gives 5.29 for the expected number of trials, the inverse of which is 18.9%, matching the author's observed result.
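The same sum as a short loop (assuming, as in the article, that the per-trial rate caps at 100% after enough misses):

```python
def expected_trials(base=0.05, step=0.05):
    """Expected number of trials until success when the chance starts
    at `base` and grows by `step` after every miss (capped at 100%)."""
    expected, survive, k = 0.0, 1.0, 0
    while survive > 1e-15:
        k += 1
        p = min(base + step * (k - 1), 1.0)
        expected += k * survive * p   # P(first hit on trial k) = survive * p
        survive *= 1.0 - p
    return expected
```

`expected_trials()` comes out at about 5.29, whose inverse is about 18.9%.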


The problem is the inverse: starting with 18.9%, how do you get to the 5+5%?


The author doesn't examine that question. They comment that one can run a large number of trials to estimate the 18.9 given the 5+5, and my post is how to get that answer analytically for any sequence of probabilities.

For the reverse case, there are arbitrarily many such sequences that would average out to any given value. In practice someone using this technique for gamedev wouldn't necessarily only want X+X%.


Imagine a binary tree. The root node represents the probability of 5%. If you take the left child to mean a positive result, the probability stays at 5%. If you take the right child to mean a negative result the probability rises to 10%.

Every left child produces a tree from that point which is exactly the same at the root tree.

Every right child produces a probability of +5% until you get to 100% after which the right node is forced back to 5% — meaning it will match the root from that point.

I feel like once you’ve defined the tree to depth 20, you could average all the probabilities and it should equal 19%, because everything from that depth further is just a uniform repetition of the probability tree, but it’s probably slightly more complicated than that.


> I'm no stats expert by any means, but couldn't you calculate the percentages of each event occurring (and the amount to increment) using Bayesian modeling?

No. Probability theory is about working forward from a given model to a set of possible observations. And Bayesian statistics is about working backwards from given observations to which model in a set of models produced them (which is why an old name for Bayesian statistics is 'inverse statistics').

In this case, there is no given set of observations; there's not some game out there which has produced a dataset and you're trying to work backwards to figure out what model that game is using. Instead, you have a bunch of models and you're trying to work forwards to figure out which of them will give you the kind of outputs you want. So, that's probability theory.


AFAICT this is what’s called conditional probability in the scientific literature. The algorithms described would probably be called autoregressive samplers.


Oh, I remember good old Fallout 1-2, where you could have a 42% hit chance and still miss 7 times in a row.

There is also a good Numberphile video on what we think randomness is (and why we're wrong about it): https://youtube.com/watch?v=tP-Ipsat90c


It all comes down to generating deterministic permutations instead of purely random numbers, since the intent is to avoid bad luck.

The most common permutation used is the AES S-box, commonly available through various ISA extensions. There are many ways to generate permutations, though, and it wouldn't require a large, complex algorithm to avoid a 1-in-10-chance event happening three times in a row, nor would it require generating the entire permutation and storing it in memory.


S-boxes compute fixed permutations of the bits in a small fixed length buffer, which is entirely inappropriate for accurately picking random permutations of a variable and usually large number of choices.


For a RNG, the maximum run length of same values would be the period length divided by the output size (in general).

The blog post is horrifying: it feeds the output of the Mersenne Twister into another function to reduce the run length of seemingly biased outcomes. Given the size and complexity of the code, I suspect generating a single value clears the whole CPU cache. The AES S-box, with the seed XORed before and after, would still be a better solution (especially since the probability of certain runs of similar values should be somewhat low).

The theory behind LCGs is a lot simpler, and using a suitably small prime, I think, one could have an appropriately short period without excessive predictability.


I dunno, seems silly. I'm sure there are already proven methods, and the mathematical analysis in the article leaves something to be desired. You should probably just lean on whatever is used in Progressive Slot Machines.


Super interesting. It seems to tap into the psychology of some people who gamble... they think the more times they lose, the more likely they win the next time. Why are humans wired this way?


Human brains are optimized for prediction of future events, because this helps with survival (eg: you can predict a winter coming up, so you stock up on food).

Randomness is by (some) definition unpredictable. But humans are so eager for pattern recognition that they will see, or expect, patterns that just are not there.

"Pareidolia is the tendency to interpret a vague stimulus as something known to the observer, such as seeing shapes in clouds, seeing faces in inanimate objects or abstract patterns, or hearing hidden messages in music." and also: https://en.wikipedia.org/wiki/Apophenia#Causes

On a similar note, humans are terrible at coming up with random/unpredictable sequences. If you ask a group of test subjects to pick a random number between 1 and 10, you get a huge edge when you guess 3 or 7.


I suspect it has to do with the same facilities that evoke magical thinking, and other pattern-event associative behavior. And like those pigeons in the famous experiment which were fed randomly and evolved various ritual-like behavior in association, we think we did something to achieve a win condition (or food condition). In the case of gambling, what we 'did' to win was gamble when 'the cards were hot'.



