AWS ECS deployments step-by-step

The default ECS deployment type is called rolling update. With this simple approach, running ECS tasks are replaced with new ECS tasks.

You control this process with the deployment configuration, where you define the minimum and maximum number of tasks allowed during a deployment. Through this mechanism you can ensure enough tasks are running to service your traffic, and likewise you’re not overspending by running too many.

To understand the deployment process we’ll take the following simple setup as an example.

ECS service - with a desired count of 1, the service tries to make sure one ECS task is running at a time
Load balancer - in this case we’re using an Application Load Balancer (ALB), which receives user requests through a load balancer listener
Target group - connected to the load balancer listener, the target group contains target IP addresses across which incoming traffic is distributed
Target - automatically created by the ECS service, the target references the private IP of the ECS task

For this setup, let’s have a look at how it looks in the AWS Console.

Going to Services > Elastic Container Service > Clusters, clicking on the cluster name, clicking on the service, then selecting the Tasks tab shows us we have a single ECS task running.

The Deployments tab shows us the status of any ongoing deployments. We have a single deployment listed in the PRIMARY status, meaning this is the most recent deployment. No other records means there is no ongoing deployment.

Lastly, let’s go to Services > EC2 > Target Groups, then select the target group relevant to this service. Selecting the Targets tab shows us we have a single registered target in the healthy status, pointing at our running ECS task.

At each step through the deployment process covered in the rest of this article, we’ll evaluate the state of each these resources.

Default deployment configuration: for this example we’ll use the default deployment configuration, which allows a maximum of 2 tasks (200% of the service’s desired count) and a minimum of 1 task (100% of the desired count). This means during deployment a new task will be created before the old one is terminated.

Now we’re going to step through the five steps of the ECS deployment process for the rolling update deployment. If you want to follow along with this article, feel free to create the same setup in your own account using the CloudFormation described in the Try it out yourself section.

1) Initiate deployment

In a scenario where you want to deploy a new version of an ECS, task you’ll need to initiate a deployment. This might be through CloudFormation, Terraform, the AWS CLI, or AWS Console.

One way to do this is to issue this AWS CLI command, which starts a deployment even if there are no changes to make.

aws ecs update-service --service <service-name> --cluster <cluster-name> --force-new-deployment

Whatever method is used, we’ll end up in a situation where a deployment is initiated in ECS. Note this is the only manual step, with everything from this point onwards being managed automatically by AWS.

Now the Deployments tab shows a new deployment in the PRIMARY state. You can see it has a desired count of 1 task, but a running count of 0 because the task hasn’t started yet.

The deployment we saw before has moved to the ACTIVE state, which means it still has tasks running (1 task), but is in the process of being replaced by a new PRIMARY deployment.

2) Start new ECS task

Now on the Tasks tab we have a new ECS task in the PROVISIONING state. This is the stage where setup happens like creating an elastic network interface.

Shortly afterwards the task moves to the PENDING status, where the container agent is doing things like pulling the Docker image from the repository.

Last, the task moves into the RUNNING status, which means the Docker image has started.

This doesn’t necessarily mean the new task is ready to receive traffic, as the application running inside may have a startup sequence to run.

At this point we have two tasks running, one from each deployment.

Even though the new task is in a RUNNING state, there’s no way to make requests to it until it’s been registered into the load balancer.

3) Register load balancer target

Back in the list of target group targets, we now have two targets. The first is for the old ECS task, and the second for the new task. The initial state means the load balancer is registering the target and performing initial health checks.

Since the application running in the ECS task may not immediately be ready to serve traffic, it may take some time to return a successful health check result and move to the healthy status

4) Drain load balancer target

Once the new target gets a successful health check response it’s marked as healthy and it will start receiving user requests. The target to be replaced goes into a draining state.

When the target is draining it’s in the process of being deregistered from the target group, which means:

no more requests will be sent to the target
any in-flight requests can be completed, as long as it’s done within the target group’s deregistration delay period (more on this later)

At this point we still have two registered targets in the target group, only one of which is receiving new requests.

Once the deregistration delay has elapsed though, the old target is removed from the target group entirely. All traffic is now being served by the new target.

5) Stop ECS Task

At this stage, we have an ECS task that has been deregistered from the load balancer, but is still running. ECS will now schedule the deletion of the old ECS task, following this procedure:

a SIGTERM signal is sent to any containers in the ECS task to tell them to shut down (remember an ECS task can have multiple containers). This is the equivalent of running docker stop <container-id> against a Docker container. Just like with Docker, the ECS task has a default timeout of 30s in which to terminate (more on this later).
If the ECS task has not gracefully shutdown by the end of the stop timeout, it will be forcefully killed with the SIGKILL signal.

At this point we have a single ECS task running.

And in the Deployments tab we have a single deployment listed in the PRIMARY status, indicating we have no ongoing deployment.

Awesome! We’ve successfully reached our desired state of a single newly deployed ECS task, receiving requests through the load balancer.

ECS deployment timeouts

During the deployment process above there were two timeouts mentioned that are crucial to ensure:

your users receive a seamless uninterrupted experience during deployment
your deployments happen as quickly as possible

Deregistration delay

In deployment step 4 above, ECS initiates the deregistration of the ECS task from the load balancer target group. In order to ensure that any in-flight requests can be serviced without interruption, the target group waits for the configured deregistration delay before removing the target.

During the deregistration process:

any in-flight requests can continue to receive traffic
no new connections are established to the target being deregistered

The deregistration delay is configured on a target group, with a default of 300 seconds (5 minutes). You can view and edit the value for a target group in the Group details tab.

It’s worth noting that even if there are no in-flight requests, the load balancer will wait for the full deregistration delay before removing the target. This can make deployments unnecessarily slow, especially if you have an ECS task that doesn’t have any long running requests.

Consider reducing the deregistration delay to speed up deployments of ECS services that have short-lived requests.

Stop timeout

Once the ECS task has been deregistered from the target group, ECS still needs to stop it. This process begins by sending any containers running in the task the SIGTERM signal, giving them a chance to begin shutting down gracefully. By default, a container has 30 seconds to complete any shutdown logic, before the SIGKILL signal is sent to it, forcefully killing it.

You can see the configured stop timeout for a container in the container definition within the task definition (see Services > Elastic Container Service > Task Definitions).

Depending on which launch type you’re using for your ECS task, the stop timeout can have a maximum value of:

120 seconds for the Fargate launch type
no maximum for the EC2 launch type

If your ECS service has any long running requests or background processes, consider increasing the stop timeout.

Try it out yourself

To familiarise yourself with the ECS deployment process and its parameters I suggest creating your own deployment. You can then modify the parameters described above and see how the behaviour changes.

To get going with an example quickly, here’s a simple CloudFormation template I’ve created for you which creates:

a VPC and subnets
an ECS task definition for using the NGINX container (a simple container that returns a default response on port 80)
an ECS service with a desired count of 1
a load balancer and configuration to integrate the ECS service with it

The launch stack button above will open the create stack page in your own AWS account. Just agree to the additional capabilities and click Create stack. After 5 minutes the stack should reach the CREATE_COMPLETE status.

Now you can go to Services > EC2 > Load balancers, click on the newly created load balancer, and copy the DNS name. Paste this into a browser and you’ll get the default Welcome to nginx! response. If you see this, you know everything’s working correctly

Initiate an ECS deployment with:

aws ecs update-service --service <service-name> --cluster default-cluster --force-new-deployment

Using the service name which you can get from Services > Elastic Container Service > Clusters > default-cluster. Watch the deployment process described in this article yourself, through the AWS Console.

Don’t forget to delete the CloudFormation stack when you’re done to avoid any unnecessary charges.

Resources

Check out these AWS docs describing the load balancer deregistration delay
In the docs Task definition parameters there’s more info on the task stop timeout

1) Initiate deployment#

2) Start new ECS task#

3) Register load balancer target#

4) Drain load balancer target#

5) Stop ECS Task#

ECS deployment timeouts#

Deregistration delay#

Stop timeout#

Try it out yourself#

Resources#