Serverless Jobs — a New Paradigm in Cloud
You may ask: why call serverless jobs a new paradigm when everybody already uses cloud services, and serverless features in particular? Here I'm going to share my experience from one of my recent cloud computing projects, so you can judge whether this is a paradigm or not.
What are Serverless Jobs?
- Serverless computing (or serverless for short) is an execution model where the cloud provider (AWS, Azure, or Google Cloud) is responsible for executing a piece of code by dynamically allocating the resources, and charges only for the resources used to run the code.
- A job is a program that solves a problem independently, executed either on a schedule or on demand.
- To combine these two (serverless + job), we need a way to orchestrate the cloud resources and the execution, and to deploy the program.
In our projects I came across multiple frameworks, such as Quartz jobs, that relied heavily on dedicated servers (EC2 instances) running 24/7. (You may wonder why we could not stop the servers when they were idle; the schedules were framed in a way that did not allow it.)
These frameworks fell short on the following needs:
- Job-based resource allocations
- Use a resource only when you need it, to avoid the upfront cost
- Job-based programming language support
- Job-based resource utilization monitoring
- On-demand execution with task workflow (load-based distributed execution)
The solutions below are based on the projects I have worked on and on my experience with AWS.
I used the following AWS services to achieve the serverless jobs paradigm.
- EventBridge (scheduling)
- Step Functions
- Lambda
- ECS (Fargate)
- ECR
- S3
- DynamoDB
- CloudWatch
- API Gateway
- Simple Queue Service
Below I justify each service and explain why we used this stack.
1. EventBridge
Scheduling is the primary element of any job. We created the schedule as a dedicated resource for each job, so a job can be enabled or disabled at any time.
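A per-job schedule of this kind could be expressed as a CloudFormation resource. The sketch below is only an illustration: the function name, the rule and target naming scheme, and the ARNs in the usage note are assumptions, not taken from the project.

```python
import json


def schedule_rule(job_name: str, cron: str, target_arn: str,
                  role_arn: str, enabled: bool = True) -> dict:
    """Build a CloudFormation resource for one job's EventBridge schedule.

    The naming scheme (<job>-schedule, <job>-target) is hypothetical.
    """
    return {
        "Type": "AWS::Events::Rule",
        "Properties": {
            "Name": f"{job_name}-schedule",
            # EventBridge cron expressions have six fields, e.g. "0 2 * * ? *"
            "ScheduleExpression": f"cron({cron})",
            # Flipping this value enables or disables the job's schedule
            "State": "ENABLED" if enabled else "DISABLED",
            "Targets": [{
                "Id": f"{job_name}-target",
                "Arn": target_arn,
                "RoleArn": role_arn,
                # Constant JSON passed to the target on every run
                "Input": json.dumps({"jobName": job_name}),
            }],
        },
    }
```

For example, `schedule_rule("nightly-report", "0 2 * * ? *", workflow_arn, role_arn, enabled=False)` would describe a daily 02:00 UTC schedule that is currently switched off.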
2. Step Functions
Step Functions is the core of the serverless jobs architecture. This single service covers the solutions to all the problems listed above.
It connects multiple pieces: the job stack, the audit Lambda, DynamoDB, and notifications (Slack).
We built a customized workflow around problems such as the following.
Some jobs have to run one at a time. For these, the workflow checks DynamoDB for a trigger entry that is already in the PROGRESS state. If one exists, the workflow cancels the execution; otherwise it executes the trigger.
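The one-at-a-time rule can be sketched as a small guard. Here the trigger entries are plain dictionaries standing in for DynamoDB items, and the key names `jobName` and `status` are assumptions made for illustration.

```python
from typing import Iterable, Mapping

IN_PROGRESS = "PROGRESS"  # status value named in the article


def may_start(job_name: str, triggers: Iterable[Mapping[str, str]]) -> bool:
    """Return True only if no trigger of this job is still in PROGRESS."""
    return not any(
        t["jobName"] == job_name and t["status"] == IN_PROGRESS
        for t in triggers
    )


def decide(job_name: str, triggers: Iterable[Mapping[str, str]]) -> str:
    """Mirror the workflow branch: cancel if already running, else execute."""
    return "EXECUTE" if may_start(job_name, triggers) else "CANCEL"
```

A check followed by a separate write is not atomic, so two executions starting at the same moment could both pass the guard. If that matters, a DynamoDB conditional write is one way to close the gap.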
3. Lambda
Everybody assumes a serverless job will be a Lambda function, and in many cases that is correct.
Lambda solves the following problems:
It supports many languages, and it is the preferable choice for smaller, quicker jobs at low cost, thanks to sub-second (per-millisecond) billing.
But it does not solve every case, since Lambda has its own limits on execution timeout, code bundle size, and memory size.
4. ECS (Fargate)
To get around the Lambda limits, we turned to Elastic Container Service (ECS), which has the capability to solve these problems.
Long-running and memory-intensive jobs are deployed to ECS and run through Step Functions.
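The choice between the two runtimes can be written as a rule of thumb. In the setup described here the type is part of each job's configuration, so the helper below is only an illustration. The constants reflect Lambda's documented limits at the time of writing (15-minute timeout, 10,240 MB memory), and the function name is hypothetical.

```python
LAMBDA_MAX_SECONDS = 900        # Lambda's maximum timeout: 15 minutes
LAMBDA_MAX_MEMORY_MB = 10_240   # Lambda's maximum memory: 10 GB


def choose_runtime(expected_seconds: int, memory_mb: int) -> str:
    """Pick "lambda" for short, small jobs and "ecs" for everything else."""
    if expected_seconds > LAMBDA_MAX_SECONDS:
        return "ecs"  # would hit the Lambda timeout
    if memory_mb > LAMBDA_MAX_MEMORY_MB:
        return "ecs"  # needs more memory than Lambda offers
    return "lambda"
```

A two-minute job with 512 MB stays on Lambda, while a one-hour batch or a job needing 16 GB of memory goes to ECS on Fargate.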
5. ECR
Our solution bundle is built as a container image and deployed to ECR (Elastic Container Registry), specifically for ECS jobs and Lambda container jobs.
6. S3
The bucket holds the Lambda handler bundle.
7. DynamoDB
DynamoDB is crucial for this system since it scales automatically and natively supports event-driven processing. We use three tables:
- Jobs (job configuration: job name, type (Lambda/ECS), memory, language, bundle (binary) location)
- Stacks (where each stack is deployed: Lambda ARN, ECR, CloudWatch path, ECS task, ECS cluster)
- Triggers (each execution of a job: start time, end time, status, success/failure message)
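The three kinds of records listed above could be modelled roughly as follows. The field names are assumptions derived from the attribute lists, and using `job_name` to link a stack or trigger back to its job is a guess at the keying, not a description of the real tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_name: str
    job_type: str          # "lambda" or "ecs"
    memory_mb: int
    language: str
    bundle_location: str   # where the bundle (binary) lives


@dataclass
class Stack:
    job_name: str                      # assumed link back to the job
    lambda_arn: Optional[str] = None
    ecr_image: Optional[str] = None
    cloudwatch_path: Optional[str] = None
    ecs_task: Optional[str] = None
    ecs_cluster: Optional[str] = None


@dataclass
class Trigger:
    job_name: str                      # assumed link back to the job
    start_time: str
    end_time: Optional[str] = None     # unset while the run is in flight
    status: str = "PROGRESS"
    message: Optional[str] = None      # success/failure message
```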
Events are enabled on the Jobs table (CREATE/MODIFY/DELETE). Each event invokes a Lambda function (a small Java program that uses the CloudFormation SDK), which builds a CloudFormation template dynamically from the data, creates, updates, or deletes the stack, and records the stack details in the appropriate table.
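The event-to-stack flow can be sketched as below. The original handler is a Java program; this Python version is only an illustration of the idea. It assumes the events arrive as DynamoDB Streams records (whose event names are INSERT, MODIFY, and REMOVE) with old images enabled, and the attribute names `jobName`, `type`, and `memory` are hypothetical. The templates show only a few properties; a deployable template needs more (roles, code or image location, CPU, and so on).

```python
ACTIONS = {"INSERT": "create", "MODIFY": "update", "REMOVE": "delete"}


def plain(image: dict) -> dict:
    """Flatten DynamoDB attribute values like {"S": "x"} or {"N": "512"}.

    Numbers are assumed to be integers in this sketch.
    """
    out = {}
    for key, typed in image.items():
        if "S" in typed:
            out[key] = typed["S"]
        elif "N" in typed:
            out[key] = int(typed["N"])
    return out


def build_template(job: dict) -> dict:
    """Build a deliberately partial CloudFormation template for one job."""
    if job["type"] == "lambda":
        resource = {
            "Type": "AWS::Lambda::Function",
            "Properties": {"FunctionName": job["jobName"],
                           "MemorySize": job["memory"]},
        }
    else:
        resource = {
            "Type": "AWS::ECS::TaskDefinition",
            "Properties": {"Family": job["jobName"],
                           "Memory": str(job["memory"]),
                           "NetworkMode": "awsvpc",
                           "RequiresCompatibilities": ["FARGATE"]},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09",
            "Resources": {"Job": resource}}


def plan(record: dict) -> dict:
    """Turn one stream record into a stack action plus template."""
    action = ACTIONS[record["eventName"]]
    # A REMOVE record carries only the old image of the deleted item
    image_key = "OldImage" if action == "delete" else "NewImage"
    job = plain(record["dynamodb"][image_key])
    result = {"action": action, "stackName": f"job-{job['jobName']}"}
    if action != "delete":
        result["template"] = build_template(job)
    return result
```

The returned plan would then be handed to the CloudFormation create, update, or delete stack call, and the resulting stack details written to the Stacks table.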
8. CloudWatch
CloudWatch captures all the logs in a single place, which makes the jobs very easy to monitor.
9. API Gateway
On top of the DynamoDB data, we built a simple user interface to check and monitor job, stack, and trigger details efficiently. We also added an on-demand job trigger API, which connects to the step function and runs a job instantly with a dynamic payload.
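An on-demand trigger of this kind comes down to starting a Step Functions execution with the caller's payload as input. The sketch below only builds the request parameters, so it runs without AWS access. The function name, the input shape, and the execution naming scheme are assumptions.

```python
import json
import uuid


def start_execution_request(state_machine_arn: str, job_name: str,
                            payload: dict) -> dict:
    """Build the parameters for a Step Functions StartExecution call."""
    # Execution names are limited to 80 characters and must be unique per
    # state machine: 47 chars of job name + "-" + 32 hex chars = 80 at most.
    name = f"{job_name[:47]}-{uuid.uuid4().hex}"
    return {
        "stateMachineArn": state_machine_arn,
        "name": name,
        # The input must be a JSON string; the dynamic payload rides inside
        "input": json.dumps({"jobName": job_name, "payload": payload}),
    }
```

With boto3 available, the result could be passed as `boto3.client("stepfunctions").start_execution(**request)` from the Lambda function behind the API Gateway route.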
10. Simple Queue Service
Suppose the jobs suddenly have a huge load to process within a certain time limit. We can split the job, run the pieces in parallel, and complete the triggers.
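The article does not spell out how the queue is wired in, so the following is one plausible arrangement rather than the actual design: the workload is cut into chunks, each chunk becomes one SQS message, and the messages are grouped for batch sending (SQS accepts at most 10 entries per SendMessageBatch call). The function names and the message shape are assumptions.

```python
import json
from typing import Iterator, List, Sequence

SQS_BATCH_LIMIT = 10  # maximum entries per SendMessageBatch request


def chunk(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def to_batches(job_name: str, items: Sequence, chunk_size: int) -> List[list]:
    """Split a workload into SQS batch entries, one message per chunk."""
    entries = [
        {
            "Id": str(index),  # must be unique within a batch
            "MessageBody": json.dumps({"jobName": job_name,
                                       "items": list(part)}),
        }
        for index, part in enumerate(chunk(items, chunk_size))
    ]
    return [list(group) for group in chunk(entries, SQS_BATCH_LIMIT)]
```

Each message can then be picked up by its own worker, so the pieces run in parallel and every piece can complete its own trigger entry.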
We have run this model for more than a year, and almost 10 million executions have completed without any issues. This is why I call serverless jobs a new paradigm in cloud. We are still adding more jobs to this model.
Even though we used all these AWS services, the success of this implementation came from providing a simple UI and CLI plugins to manage them.
Author: Sabarinathan Soundararajan