A notification service is a system responsible for delivering timely and relevant information to users across various channels such as SMS, email, push notifications, and in-app messages.
This was quite amazing and super super helpful. I am an experienced coder but have not had many opportunities to design high scale systems at work or in practice so this is going to help me a ton.
Really happy to hear this, thank you so much!
Nice one Ashish 👍🏻
Excellent study case, Ashish!
Impressive! You are truly a gem. Thank you for dedicating your time and sharing such valuable content with crystal-clear clarity.
Thank you 😊
Love to hear this.
Great content brother
thank you!
Thanks for sharing.
you are welcome!
Ashish,
This is awesome content. Just a few points to add, if you agree:
1) Syncing mechanisms between the cache and the User Preferences NoSQL DB to ensure consistency
2) What the cache payload looks like, along with the key
3) Cache TTL, eviction strategy, invalidation strategy, cache write strategy, and handling data-level conflicts (a discrepancy between the current cache entry and the NoSQL DB entry)
4) Deduplication of notifications so that users don’t get the same notification twice
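On point 4, one common way to deduplicate is to derive a key from the fields that identify "the same notification" and skip sending while that key is still inside a time window. The sketch below is only an illustration under assumptions: the `Deduplicator` class, the choice of `(user_id, event_id, channel)` as the identity, and the in-memory dict (standing in for a shared cache with per-key TTL) are all hypothetical and not part of the article's design.

```python
import hashlib
import time


class Deduplicator:
    """Hypothetical in-memory stand-in for a shared cache with per-key TTL."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._seen = {}  # dedup key -> expiry time

    @staticmethod
    def key_for(user_id, event_id, channel):
        # Assumption: these three fields define "the same notification".
        raw = f"{user_id}:{event_id}:{channel}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def should_send(self, user_id, event_id, channel):
        """Return True the first time a key is seen within the window."""
        now = self.clock()
        key = self.key_for(user_id, event_id, channel)
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window, skip it
        self._seen[key] = now + self.window_seconds
        return True
```

In a multi-instance deployment the check-and-set would need to be atomic in the shared store; the dict here only shows the logic.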
Very nicely explained, especially the details of the request message, the types of DB usage, and the way the notification service crafts the message for each individual channel. Learning a lot from you.
How rate limiting is performed based on user preferences is not clearly described in the above design. The flow for how the service counts the notifications sent to a user and compares that count with the user's preferences is missing, as are the bottlenecks related to it.
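For what it's worth, here is a minimal sketch of how such a check could work, assuming each user's preference exposes a maximum number of notifications per time window. The `PreferenceRateLimiter` name, the sliding-window approach, and the in-memory storage are my assumptions, not something the article specifies.

```python
import time
from collections import defaultdict, deque


class PreferenceRateLimiter:
    """Sliding-window counter per user (hypothetical, in-memory)."""

    def __init__(self, window_seconds=3600, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = defaultdict(deque)  # user_id -> timestamps of recent sends

    def allow(self, user_id, max_per_window):
        """Record and allow a send unless the user's limit is already reached."""
        now = self.clock()
        sent = self._sent[user_id]
        # Drop timestamps that have fallen out of the window.
        while sent and sent[0] <= now - self.window_seconds:
            sent.popleft()
        if len(sent) >= max_per_window:
            return False
        sent.append(now)
        return True
```

The likely bottleneck is that every send needs a read and a write against shared state per user, so in practice the counters would live in a fast shared store rather than in process memory.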
I learnt a lot. One question: if the channel processor fails to deliver a notification for some reason, will it take responsibility for retrying, or can it just update the scheduler database with the next scheduled time, computed with an exponential backoff strategy, so that the scheduler picks it up for the next retry? What would the expected approach be?
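To make the second option concrete, here is a rough sketch of how the next scheduled time could be computed before writing it back to the scheduler database. The function name, the default values, and the convention that `None` means "stop retrying" are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def next_retry_at(failed_attempts, now, base_seconds=30, cap_seconds=3600,
                  max_attempts=5):
    """Return when to retry next, or None once the retry budget is used up.

    failed_attempts is the number of delivery attempts that have failed so
    far (1 after the first failure). The delay doubles per failure and is
    capped at cap_seconds.
    """
    if failed_attempts >= max_attempts:
        return None  # caller could move the notification to a dead-letter store
    delay = min(cap_seconds, base_seconds * 2 ** (failed_attempts - 1))
    return now + timedelta(seconds=delay)


# Usage: after the first failure, schedule the retry 30 seconds from now.
first_retry = next_retry_at(1, datetime.now(timezone.utc))
```

Retry schemes like this usually also add some random jitter to the delay so that many failed notifications do not all retry at the same instant.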
Thanks.
Thank you so much for this amazing explanation. Can this notification system design handle my situation, where I have 30k IoT devices and users can create custom conditions? For example, user1 has device1 and device2 and may create condition1 (for example, notify me if the temperature > 89) for device1 and condition2 (temperature > 60) for device2.
Yes, this design can surely be used to implement this use case. Try starting with a prototype for a few clients and then scale up to 30k devices.
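As a starting point for such a prototype, the per-device conditions could be kept as simple rules and evaluated against each incoming reading, with every match becoming a notification request. This is only a sketch under assumptions: the `Condition` fields, the operator set, and the lookup keyed by device ID are hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators a user may pick when creating a condition.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Condition:
    user_id: str
    device_id: str
    metric: str
    op: str
    threshold: float


def matching_conditions(conditions_by_device, device_id, metric, value):
    """Return the conditions triggered by one reading from one device."""
    return [
        c
        for c in conditions_by_device.get(device_id, [])
        if c.metric == metric and OPERATORS[c.op](value, c.threshold)
    ]


# The example from the question: two devices, one condition each.
rules = {
    "device1": [Condition("user1", "device1", "temperature", ">", 89)],
    "device2": [Condition("user1", "device2", "temperature", ">", 60)],
}
triggered = matching_conditions(rules, "device1", "temperature", 90)
```

Each triggered condition would then be sent to the notification service as a normal request for that user.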
Let’s talk about implementing the notification service with the best design, architecture, and patterns.
I would like to do it in Rust.
Is anyone interested in tagging along?
I am thinking of open-sourcing these components so that many others can benefit.
Great post, Ashish. The main components and flow are well articulated.
I noticed that message content generation is done by the Notification Service. Would it be more resilient and robust to put it behind an inbound queue, to absorb spikes in load on the Notification Service? Each request to the Notification Service to generate a message also holds a web server connection until the generated message is placed in the queue. And if an error occurs during message content generation, will the request be dropped?
That's what I think too. If a failure occurs, will we lose all the data/events that were supposed to be picked up by the Notification Service? Can anyone help here?
Great insights. This article shows how to think broadly and consider every aspect while designing a scalable and reliable system. Can you provide some examples of readily available scheduler services? Also, could you give a deeper explanation of how a scheduler service should be designed?
How will you partition the Kafka topics to support such a large workload? Also, we are sending large email attachments through the Kafka queue. Won't this affect performance? Is there anything we can do to mitigate it?
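One possible mitigation, sketched under assumptions: keep the attachment out of the queue by storing it elsewhere and sending only a reference (often called the claim-check pattern), and key messages by user ID so one user's notifications land on the same partition. The `InMemoryBlobStore`, `partition_for`, and `build_email_event` names are hypothetical, and the hash below only illustrates key-based partitioning; it is not Kafka's own partitioner.

```python
import hashlib
import json


class InMemoryBlobStore:
    """Hypothetical stand-in for an object store holding attachments."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        ref = hashlib.sha256(data).hexdigest()  # content-addressed reference
        self._blobs[ref] = data
        return ref

    def get(self, ref):
        return self._blobs[ref]


def partition_for(user_id, num_partitions):
    """Map a user ID to a stable partition number (illustrative hash only)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def build_email_event(user_id, subject, attachment, store):
    """Store the attachment and return a small message carrying its reference."""
    ref = store.put(attachment)
    event = {
        "user_id": user_id,
        "subject": subject,
        "attachment_ref": ref,
        "attachment_size": len(attachment),
    }
    return json.dumps(event).encode("utf-8")
```

The email channel processor would then fetch the attachment by its reference just before sending, so the queue only ever carries small messages.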