A distributed job scheduler is a system designed to manage, schedule, and execute tasks (referred to as "jobs") across multiple computers or nodes in a distributed network.
how does a node will know which segments it has to pick from, is it hardcoded.
If the current node fails, how will segment's job will be executed after sometime.
how does the failure of node, can be taken care of, when the job is in processing status ?
Should we have a timeout where the queue itself, after sometime will mark the job execution to fail and that job will be ready to be executed by other node.
Hey, the node segments are managed by the coordinator. Segments can be stored in a key-value store. The coordinator initially assigns segments evenly across the worker nodes but if one of the nodes go down, it will re-assign it's segment to other workers.
The segment assignment is dynamic and coordinator makes sure that the load is evenly balanced (like a load balancer).
When a worker node fails during a job execution, the job it re-assigned to a different node. The coordinator node does regular health-checks/heartbeats to keep track of the node status.
Although I didn't mention it in this post but it's a good idea to have a timeout so that if a certain job is stuck, it can be re-assigned to a different worker.
Great content ! One question, do we need leader-election for coordinator node, as it is stateless , can we simply add more replicas to need ? Any state like worker states etc can be pushed to S3 or DB.
>"Limit the rate at which jobs are pushed into the distributed job queue. This can be achieved by implementing queue-level throttling, where only a certain number of jobs are allowed to enter the queue per second or minute."
If the execution time is the same for all jobs and the number of jobs exceeds the queue-level throttling limit, how do you determine which jobs should be submitted to the queue—especially if no priority is defined?
>Rate Limiting at the Worker Node Level
Do we need this ? given we already know available capacity for a single worker and based on that we could be able to decide to submit a new job or not.
If a job belongs to segment X and all workers assigned to it are busy, with no indication of when they will become free, a starvation problem could arise. Although there may be free workers in segment Y, they are unable to execute jobs from segment X.
This raises the question: is there a fallback mechanism to handle such scenarios, or should the system dynamically reassign segments based on worker availability?
what happen if there are some recurrent jobs taking long time to finish, will it block their subsequent scheduled jobs ? Should we integrate Celery with the message queue ?
Hey Ashish, very well written. Can you please tell me how are you ensuring that the jobs are scheduled in realtime as there can be some lag introduced by kafka and the worker which is actually executing the task.
Yeah, there could be some lag during the peak time but if we are scaled enough, it should be fine. Overall, the lag should be minimal compared to the actual job execution time.
Your Website is slow, but content is great
Hey, I am actually using Substack for my blog.
Amazing detailed article. I didn't expected Leader Election and other concepts to get covered. Great Work!!
thank you!
how does a node will know which segments it has to pick from, is it hardcoded.
If the current node fails, how will segment's job will be executed after sometime.
how does the failure of node, can be taken care of, when the job is in processing status ?
Should we have a timeout where the queue itself, after sometime will mark the job execution to fail and that job will be ready to be executed by other node.
Hey, the node segments are managed by the coordinator. Segments can be stored in a key-value store. The coordinator initially assigns segments evenly across the worker nodes but if one of the nodes go down, it will re-assign it's segment to other workers.
The segment assignment is dynamic and coordinator makes sure that the load is evenly balanced (like a load balancer).
When a worker node fails during a job execution, the job it re-assigned to a different node. The coordinator node does regular health-checks/heartbeats to keep track of the node status.
Although I didn't mention it in this post but it's a good idea to have a timeout so that if a certain job is stuck, it can be re-assigned to a different worker.
Very informative blog with detailed info.
Thank you
Keep it up.
Thank you, great to hear this!
Extremely helpful and well-detailed. Thank you!
Thank you so much for your kind feedback!
Well described from A to Z. Like the first part with requirements. Caught me from the first sentence. Thank you
Great to hear this, thank you 😊
Great content ! One question, do we need leader-election for coordinator node, as it is stateless , can we simply add more replicas to need ? Any state like worker states etc can be pushed to S3 or DB.
What exactly is a worker node denote here?
Is it an instance of the schedulingService or the Execution service?
Good details. great work :)
Couple of follow up questions -
>"Limit the rate at which jobs are pushed into the distributed job queue. This can be achieved by implementing queue-level throttling, where only a certain number of jobs are allowed to enter the queue per second or minute."
If the execution time is the same for all jobs and the number of jobs exceeds the queue-level throttling limit, how do you determine which jobs should be submitted to the queue—especially if no priority is defined?
>Rate Limiting at the Worker Node Level
Do we need this ? given we already know available capacity for a single worker and based on that we could be able to decide to submit a new job or not.
If a job belongs to segment X and all workers assigned to it are busy, with no indication of when they will become free, a starvation problem could arise. Although there may be free workers in segment Y, they are unable to execute jobs from segment X.
This raises the question: is there a fallback mechanism to handle such scenarios, or should the system dynamically reassign segments based on worker availability?
For on demand job, this architecture doesn't give client a way to trigger existing jobs. I see client can interact only for job submission.
Great article, very useful.
what happen if there are some recurrent jobs taking long time to finish, will it block their subsequent scheduled jobs ? Should we integrate Celery with the message queue ?
Hey Ashish, very well written. Can you please tell me how are you ensuring that the jobs are scheduled in realtime as there can be some lag introduced by kafka and the worker which is actually executing the task.
Hey,
Yeah, there could be some lag during the peak time but if we are scaled enough, it should be fine. Overall, the lag should be minimal compared to the actual job execution time.
Hey you re running a newsletter after leaving your full time job. Is it sufficient to cover your expenses?
How much is the growth in a newsletter business?