13 Comments
User's avatar
Tony's Blog's avatar

Great post, seeing some things come up again and again.

Expand full comment
Zeyad Sharaf's avatar

Very informative , Thanks for sharing !

Expand full comment
Abhay Bhandari's avatar

Insightful.

Expand full comment
Prathamesh Chavan's avatar

Nice 👍

Expand full comment
Gaurav Poudel's avatar

Nice one

Expand full comment
Vaibhav Sabharwal's avatar

I mean from this it looks like bloom filters are not about giving quick responses for "is this username taken", because it doesn't. We still hit our cache/db in those cases and I imagine those calls are optimised because of good indexing.

So bloom filters are about not making a db call in the case when username is available.

Which seems a bit counterintuitive. When you add a username its a slower call either way, you PUT in your db.

When you're looking for a username and it already exists and we get to know that immediately --> that's what I'd associate with a fast call. But that does not happen. If anything bloom potentially slows us down because there's an extra check.

So why are bloom filters associated with speed? They should be associated with efficiency. Not making unnecessary db calls. Its a subtle difference.

Expand full comment
Omkar Rane's avatar

Wouldn’t the in-memory bloom-filter have consequences? Like during scale-out or server restart, the application needs to have the bloom-filter initialised. Plus all the servers need to be in sync for when new user registers themselves.

Maybe have the bloom-filter in redis itself?

Expand full comment
Bhargav Katabathuni's avatar

Generally, how many hash functions do we need to use for handling billions of usernames?

Expand full comment
Haseeb's avatar

Amazing 😍

Expand full comment
Ajay's avatar

Great post. However, I am not sure about level 4 where trie is used for suggesting similar usernames. In my understanding, they are not efficient at a billion-users scale. 🤔

Expand full comment
shivasaipasham's avatar

What if data is stored in a non-relational database such as mongodb or dynamodb? In that case assuming username as the partition key is the db lookup faster or using bloomfilter? Also if bloomfilter or trie is stored in memory will is not be cleared on server restart?

Expand full comment
Ravi Yadav's avatar

When you talk about fast look ups with Trie, how does it take less time checking in DB?

If DB itself uses Trie data structure, then it makes sense.

If someone explains it here, it’ll be really helpful.

Expand full comment
Ashish Pratap Singh's avatar

Trie is usually stored in-memory while DB fetches from disk. That's why trie is usually faster. Also, trie works in O(m) where m is just the length of the string which is lot faster. Many relational databases O(log N) time for lookups after indexing, where N is the number of records.

Expand full comment