18 Comments
Sep 14Liked by Ashish Pratap Singh

Your Website is slow, but content is great

Expand full comment
author

Hey, I am actually using Substack for my blog.

Expand full comment
Sep 29Liked by Ashish Pratap Singh

Amazing detailed article. I didn't expected Leader Election and other concepts to get covered. Great Work!!

Expand full comment
author

thank you!

Expand full comment
Sep 16Liked by Ashish Pratap Singh

how does a node will know which segments it has to pick from, is it hardcoded.

If the current node fails, how will segment's job will be executed after sometime.

how does the failure of node, can be taken care of, when the job is in processing status ?

Should we have a timeout where the queue itself, after sometime will mark the job execution to fail and that job will be ready to be executed by other node.

Expand full comment
author

Hey, the node segments are managed by the coordinator. Segments can be stored in a key-value store. The coordinator initially assigns segments evenly across the worker nodes but if one of the nodes go down, it will re-assign it's segment to other workers.

The segment assignment is dynamic and coordinator makes sure that the load is evenly balanced (like a load balancer).

When a worker node fails during a job execution, the job it re-assigned to a different node. The coordinator node does regular health-checks/heartbeats to keep track of the node status.

Although I didn't mention it in this post but it's a good idea to have a timeout so that if a certain job is stuck, it can be re-assigned to a different worker.

Expand full comment
Sep 16Liked by Ashish Pratap Singh

Very informative blog with detailed info.

Thank you

Keep it up.

Expand full comment
author

Thank you, great to hear this!

Expand full comment
Sep 12Liked by Ashish Pratap Singh

Extremely helpful and well-detailed. Thank you!

Expand full comment
author

Thank you so much for your kind feedback!

Expand full comment

Well described from A to Z. Like the first part with requirements. Caught me from the first sentence. Thank you

Expand full comment
author

Great to hear this, thank you 😊

Expand full comment

Good details. great work :)

Couple of follow up questions -

>"Limit the rate at which jobs are pushed into the distributed job queue. This can be achieved by implementing queue-level throttling, where only a certain number of jobs are allowed to enter the queue per second or minute."

If the execution time is the same for all jobs and the number of jobs exceeds the queue-level throttling limit, how do you determine which jobs should be submitted to the queue—especially if no priority is defined?

>Rate Limiting at the Worker Node Level

Do we need this ? given we already know available capacity for a single worker and based on that we could be able to decide to submit a new job or not.

If a job belongs to segment X and all workers assigned to it are busy, with no indication of when they will become free, a starvation problem could arise. Although there may be free workers in segment Y, they are unable to execute jobs from segment X.

This raises the question: is there a fallback mechanism to handle such scenarios, or should the system dynamically reassign segments based on worker availability?

Expand full comment

For on demand job, this architecture doesn't give client a way to trigger existing jobs. I see client can interact only for job submission.

Expand full comment

Great article, very useful.

what happen if there are some recurrent jobs taking long time to finish, will it block their subsequent scheduled jobs ? Should we integrate Celery with the message queue ?

Expand full comment

Hey Ashish, very well written. Can you please tell me how are you ensuring that the jobs are scheduled in realtime as there can be some lag introduced by kafka and the worker which is actually executing the task.

Expand full comment
author

Hey,

Yeah, there could be some lag during the peak time but if we are scaled enough, it should be fine. Overall, the lag should be minimal compared to the actual job execution time.

Expand full comment

Hey you re running a newsletter after leaving your full time job. Is it sufficient to cover your expenses?

How much is the growth in a newsletter business?

Expand full comment