When you try to sign up on a platform like Instagram and type in a username, the system almost instantly tells you whether it's available. If it's taken, it even suggests alternatives on the spot.
Great post, seeing some things come up again and again.
Very informative, thanks for sharing!
Insightful.
Nice 👍
Nice one
From this it looks like bloom filters are not about giving quick responses to "is this username taken?", because they don't. We still hit our cache/DB in those cases, and I imagine those calls are already optimised by good indexing.
So bloom filters are about not making a DB call in the case when the username is available.
Which seems a bit counterintuitive. When you add a username it's a slower call either way: you PUT it in your DB.
When you look up a username that already exists and we get to know that immediately, that's what I'd associate with a fast call. But that does not happen. If anything, the bloom filter potentially slows us down, because there's an extra check.
So why are bloom filters associated with speed? They should be associated with efficiency: not making unnecessary DB calls. It's a subtle difference.
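The trade-off discussed above can be sketched in a few lines. This is a minimal illustrative bloom filter; the bit-array size, the number of hashes, and the salted-SHA-256 scheme are assumptions for the example, not what any real platform uses:

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter sketch for username checks.
    A False answer means "definitely not present" (skip the DB);
    a True answer means "maybe present" (must still check the DB)."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True: could be taken, verify in the DB
```

Note that the saving is exactly the one described above: only the "definitely not present" answer lets you skip the DB round trip; every "maybe" still costs a lookup.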
Wouldn't the in-memory bloom filter have consequences? For example, during scale-out or a server restart, the application needs to re-initialise the bloom filter. Plus, all the servers need to stay in sync whenever a new user registers.
Maybe have the bloom-filter in redis itself?
Generally, how many hash functions do we need to use for handling billions of usernames?
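For sizing, the standard formulas are m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions, where n is the expected item count and p the target false-positive rate. The concrete numbers below (one billion usernames, 1% false-positive target) are an illustration, not a recommendation:

```python
import math

def bloom_params(n, p):
    """Standard bloom filter sizing: n = expected items,
    p = target false-positive rate. Returns (bits, hash functions)."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))  # bits needed
    k = round((m / n) * math.log(2))                      # optimal hash count
    return m, k

bits, hashes = bloom_params(1_000_000_000, 0.01)
print(hashes)            # 7 hash functions
print(bits / 8 / 2**30)  # roughly 1.1 GiB of memory
```

So at a billion usernames with a 1% false-positive rate, around 7 hash functions and a bit over a gigabyte of memory suffice.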
Amazing 😍
Great post. However, I am not sure about level 4 where trie is used for suggesting similar usernames. In my understanding, they are not efficient at a billion-users scale. 🤔
What if the data is stored in a non-relational database such as MongoDB or DynamoDB? In that case, assuming username is the partition key, is the DB lookup faster, or is using a bloom filter? Also, if the bloom filter or trie is stored in memory, won't it be cleared on a server restart?
When you talk about fast lookups with a trie, how does it take less time than checking in the DB?
If the DB itself uses a trie data structure, then it makes sense.
If someone explains it here, it'll be really helpful.
A trie is usually stored in memory, while the DB fetches from disk; that's why the trie is usually faster. Also, a trie lookup runs in O(m), where m is just the length of the string, which is a lot faster. Many relational databases take O(log N) time for lookups even after indexing, where N is the number of records.
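The O(m) walk described above can be sketched as follows. This is a bare-bones illustration; a real service at billion-user scale would shard the structure and worry about memory, which is exactly the concern raised earlier in the thread:

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.terminal = False # True if a stored username ends here

class Trie:
    """Minimal trie: a lookup visits one node per character,
    so cost is O(m) in the key length, independent of how many
    usernames are stored."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.terminal

t = Trie()
t.insert("alice")
print(t.contains("alice"))  # True
print(t.contains("ali"))    # False: only a prefix, not a stored name
```

The same prefix walk is what makes tries a natural fit for suggesting similar usernames: once you reach the node for a prefix, every terminal node below it is a candidate suggestion.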