meetapoorvgupta on May 28, 2017 | on: View Counting at Reddit

Thanks for the write-up, Lucas. It was very intuitive and I learnt a lot.

I noticed that you used 5000 buckets to store the frequencies of 7000 non-unique words in the section on 'Counting Bloom Filters'. How is that better than using 7000 buckets and a uniformly distributed hash function, which would maintain frequencies perfectly? In a real-world implementation, I would expect the number of buckets to be smaller by an order of magnitude, since the point is to save memory.
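
To make the question concrete, here is a minimal sketch of the trade-off. It is not the code from the article: the word list (7000 words drawn from 700 distinct ones), the bucket counts, and the helper names `bucket_index`, `build_counts`, and `estimate` are all assumptions chosen for illustration, and it uses a single hash function rather than a full counting Bloom filter.

```python
import hashlib
from collections import Counter

def bucket_index(word: str, num_buckets: int) -> int:
    # Deterministic hash of the word, reduced to a bucket position.
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def build_counts(words, num_buckets):
    # One counter per bucket; words that collide share a counter.
    buckets = [0] * num_buckets
    for w in words:
        buckets[bucket_index(w, num_buckets)] += 1
    return buckets

def estimate(word, buckets):
    # The estimate is the shared counter, so it can only overcount.
    return buckets[bucket_index(word, len(buckets))]

# Hypothetical data: 7000 non-unique words, 700 distinct.
words = [f"word{i % 700}" for i in range(7000)]
true_counts = Counter(words)

for num_buckets in (7000, 500):
    buckets = build_counts(words, num_buckets)
    overcounted = sum(
        1 for w, c in true_counts.items() if estimate(w, buckets) > c
    )
    print(num_buckets, "buckets:", overcounted, "of",
          len(true_counts), "words overestimated")
```

With fewer buckets than distinct words, some counters must be shared, so some frequencies are overestimated, but never underestimated. That is the memory-for-accuracy trade the smaller table makes.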

lucasschm on May 28, 2017

Yeah, I should have given more thought to that number. I've updated the example to use N=300. Thanks.