A Guide to Deploying a Web App on AWS with Terraform and GitHub Actions
In this guide, I will show you how to deploy a very basic Express.js web application on AWS ECS using the Fargate launch type.
We will use Terraform to provision and manage all the resources needed, like the ECS cluster, VPC, ECR repository, IAM roles, etc. Then we'll use GitHub Actions to build the project and automatically deploy the application and infrastructure changes with the push of a new git tag.
Be aware that some AWS resources created throughout this guide are not free. The costs will depend on your region and the resources allocated. The initial costs will be lower if you qualify for the Free Tier.
Prerequisites
This guide assumes that you already have the following:
Some knowledge of Git, Terraform, AWS, and Docker
Terraform v1.4.1 (latest version at the time of writing)
AWS account
AWS credentials configured on your local machine
GitHub account and a git repository
Node.js installed on your machine
A Brief Introduction to AWS ECS and Fargate
ECS is a fully managed service for running and orchestrating Docker containers. It allows you to create a cluster and run tasks on it, where each task is a containerized application; with the EC2 launch type, those tasks run on EC2 instances that you manage.
Fargate is a serverless compute engine for containers built on top of ECS. When you launch a task or a service on ECS with Fargate, AWS provisions all the necessary infrastructure to run your containers, so you don't have to provision and manage the underlying EC2 instances.
You can read more about ECS and Fargate here and here.
Hello World! A very basic Express.js web application
In an empty git repository, create an app.js file and add the following:
const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})
Next, create a Dockerfile with the following:
FROM node:18
WORKDIR /usr/src/app
COPY package*.json app.js ./
RUN npm install
EXPOSE 3000
CMD ["node", "app.js"]
And finally, create a .gitignore file with the following:
node_modules
terraform/.terraform
Now, inside the project, run npm init -y to auto-generate a package.json file and then run npm install express.
That's it! You can run the app with node app.js and then go to http://localhost:3000. You should see "Hello World!".
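You can also verify the Dockerfile locally. A quick sketch, assuming Docker is installed (the image name my-app is arbitrary):
# Build the image from the Dockerfile in the current directory
docker build -t my-app .
# Run it, mapping the container's port 3000 to localhost:3000
docker run --rm -p 3000:3000 my-app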
Infrastructure Provisioning with Terraform
Before starting, I must mention that some examples in this guide will use the open-source "Cloud Posse" terraform modules. These modules simplify the provisioning of AWS services and the underlying resources. They are well-maintained and have very good documentation. You can find more about the modules and what properties each module supports on their GitHub.
Ok, without further ado, let's jump to the Terraform code.
First, create a folder named terraform in your project and add the following files: backend.tf, main.tf, outputs.tf, provider.tf, variables.tf, and versions.tf.
Let's set up the backend where Terraform will store its state. For this, you must first create an S3 bucket and give it any name, e.g., "tfstate-a123" (the name must be globally unique). This is the only resource we'll create without Terraform.
Tip: You should enable Bucket Versioning on the S3 bucket to allow for state recovery if something goes wrong.
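If you prefer the CLI over the console, here is a minimal sketch using the AWS CLI, assuming the example bucket name tfstate-a123 and the eu-west-1 region used throughout this guide:
# Create the state bucket (for us-east-1, omit --create-bucket-configuration)
aws s3api create-bucket \
  --bucket tfstate-a123 \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1

# Enable versioning to allow for state recovery
aws s3api put-bucket-versioning \
  --bucket tfstate-a123 \
  --versioning-configuration Status=Enabled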
Add the following to backend.tf:
# backend.tf
terraform {
  backend "s3" {
    bucket = "tfstate-a123" # name of the s3 bucket you created
    key = "production/terraform.tfstate"
    region = "eu-west-1" # change to your region
    encrypt = true
  }
}
Now let's add the rest of the Terraform configuration. Add the following to provider.tf:
# provider.tf
provider "aws" {
  region = var.region # value will be set later in variables.tf
  default_tags {
    tags = {
      ManagedBy = "Terraform"
    }
  }
}
This tells Terraform which provider to use and the region where the resources will be created. Here we also set the default_tags property to automatically tag all resources that support tagging.
Next, let's set the required Terraform version and providers. Add the following to versions.tf:
# versions.tf
terraform {
  required_version = ">= 1.0.0"
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}
Now let's set the variables that we'll use throughout the code. Add the following to variables.tf:
# variables.tf
variable "region" {
  type = string
  default = "eu-west-1"
  description = "AWS Region"
}

variable "namespace" {
  type = string
  default = "app"
  description = "Usually an abbreviation of your organization name, e.g. 'eg' or 'cp'"
}

variable "stage" {
  type = string
  default = "prod"
  description = "Usually used to indicate role, e.g. 'prod', 'staging', 'source', 'build', 'test', 'deploy', 'release'"
}

variable "name" {
  type = string
  default = "myproject"
  description = "Project name"
}

variable "image_tag" {
  type = string
  default = "latest"
  description = "Docker image tag"
}

variable "container_port_mappings" {
  type = list(object({
    containerPort = number
    hostPort = number
    protocol = string
  }))
  default = [
    {
      containerPort = 3000
      hostPort = 3000
      protocol = "tcp"
    }
  ]
  description = "The port mappings to configure for the container. This is a list of maps. Each map should contain \"containerPort\", \"hostPort\", and \"protocol\", where \"protocol\" is one of \"tcp\" or \"udp\". If using containers in a task with the awsvpc or host network mode, the hostPort can either be left blank or set to the same value as the containerPort"
}

variable "desired_count" {
  type = number
  description = "The number of instances of the task definition to place and keep running"
  default = 1
}
The values for namespace, stage, and name will dictate how the resources are named. Change them to your needs. With the default values in this example, the resources will be named or prefixed with "app-prod-myproject".
And now comes the fun part: provisioning the services needed to run a web application. Let's start by adding the configuration for the VPC and subnets.
Add the following to main.tf:
# main.tf
module "vpc" {
  source = "cloudposse/vpc/aws"
  version = "2.0.0"
  namespace = var.namespace
  stage = var.stage
  name = var.name
  ipv4_primary_cidr_block = "10.0.0.0/16"
}

module "subnets" {
  source = "cloudposse/dynamic-subnets/aws"
  version = "2.0.4"
  namespace = var.namespace
  stage = var.stage
  name = var.name
  availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"] # change to your AZs
  vpc_id = module.vpc.vpc_id
  igw_id = [module.vpc.igw_id]
  ipv4_cidr_block = [module.vpc.vpc_cidr_block]
  nat_gateway_enabled = true
  max_nats = 1
}
Here, max_nats is set to 1 to save costs at the expense of availability, as NAT Gateways are fairly expensive. If you remove or comment out the max_nats property, a NAT Gateway will be created for each Availability Zone (the number depends on your region). Alternatively, you can set nat_gateway_enabled to false if you don't want to create any NAT Gateways, and then place resources that need access to services outside the VPC in the public subnets.
From AWS documentation:
A NAT gateway is a Network Address Translation (NAT) service. You can use a NAT gateway so that instances in a private subnet can connect to services outside your VPC but external services cannot initiate a connection with those instances.
With that said, if you choose not to use a NAT Gateway, don't worry. The Security Group attached to the ECS Service will only allow incoming requests from the ALB.
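To summarize the no-NAT-Gateway variant in one place, these are the only attributes that change, sketched here as comments (the ecs_alb_service_task module is defined later in this section):
# No-NAT-Gateway variant (sketch only):
#
# module "subnets":
#   nat_gateway_enabled = false
#
# module "ecs_alb_service_task" (defined later):
#   subnet_ids       = module.subnets.public_subnet_ids
#   assign_public_ip = true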
Now let's add the configuration for the Application Load Balancer (ALB). This will be the entry point to the application and, as the name suggests, it will automatically distribute the incoming traffic across multiple targets, i.e., the ECS Tasks (Docker containers).
# main.tf
module "alb" {
  source = "cloudposse/alb/aws"
  version = "1.7.0"
  namespace = var.namespace
  stage = var.stage
  name = var.name
  access_logs_enabled = false
  vpc_id = module.vpc.vpc_id
  ip_address_type = "ipv4"
  subnet_ids = module.subnets.public_subnet_ids
  security_group_ids = [module.vpc.vpc_default_security_group_id]
  # https_enabled = true
  # certificate_arn = aws_acm_certificate.cert.arn
  # http_redirect = true
  health_check_interval = 60
}
The https-related properties are commented out, as HTTPS is out of scope for this guide. There will be a follow-up article on how to create an SSL certificate with AWS Certificate Manager and enable HTTPS.
Next comes the ECS configuration. This will create the ECS Cluster, the ECR repository, the CloudWatch log group, a container definition (which will be used to create the ECS task definition), and the Service that will run and manage the ECS task definition.
# main.tf
# The ECS Cluster that the tasks will run in. It is referenced further down
# by the "ecs_alb_service_task" module as aws_ecs_cluster.ecs_cluster.arn.
resource "aws_ecs_cluster" "ecs_cluster" {
  name = "${var.namespace}-${var.stage}-${var.name}"
}

module "ecr" {
  source = "cloudposse/ecr/aws"
  version = "0.35.0"
  namespace = var.namespace
  stage = var.stage
  name = var.name
  max_image_count = 100
  protected_tags = ["latest"]
  image_tag_mutability = "MUTABLE"
  enable_lifecycle_policy = true
  # Whether to delete the repository even if it contains images
  force_delete = true
}
module "cloudwatch_logs" {
source = "cloudposse/cloudwatch-logs/aws"
version = "0.6.6"
namespace = var.namespace
stage = var.stage
name = var.name
retention_in_days = 7
}
module "container_definition" {
source = "cloudposse/ecs-container-definition/aws"
version = "0.58.1"
container_name = "${var.namespace}-${var.stage}-${var.name}"
container_image = "${module.ecr.repository_url}:${var.image_tag}"
container_memory = 512 # optional for FARGATE launch type
container_cpu = 256 # optional for FARGATE launch type
essential = true
port_mappings = var.container_port_mappings
# The environment variables to pass to the container.
environment = [
{
name = "ENV_NAME"
value = "ENV_VALUE"
},
]
# Pull secrets from AWS Parameter Store.
# "name" is the name of the env var.
# "valueFrom" is the name of the secret in PS.
secrets = [
# {
# name = "SECRET_ENV_NAME"
# valueFrom = "SECRET_ENV_NAME"
# },
]
log_configuration = {
logDriver = "awslogs"
options = {
"awslogs-region" = var.region
"awslogs-group" = module.cloudwatch_logs.log_group_name
"awslogs-stream-prefix" = var.name
}
secretOptions = null
}
}
module "ecs_alb_service_task" {
source = "cloudposse/ecs-alb-service-task/aws"
version = "0.66.4"
namespace = var.namespace
stage = var.stage
name = var.name
use_alb_security_group = true
alb_security_group = module.alb.security_group_id
container_definition_json = module.container_definition.json_map_encoded_list
ecs_cluster_arn = aws_ecs_cluster.ecs_cluster.arn
launch_type = "FARGATE"
vpc_id = module.vpc.vpc_id
security_group_ids = [module.vpc.vpc_default_security_group_id]
subnet_ids = module.subnets.private_subnet_ids # change to "module.subnets.public_subnet_ids" if "nat_gateway_enabled" is false
ignore_changes_task_definition = false
network_mode = "awsvpc"
assign_public_ip = false # change to true if "nat_gateway_enabled" is false
propagate_tags = "TASK_DEFINITION"
desired_count = var.desired_count
task_memory = 512
task_cpu = 256
force_new_deployment = true
container_port = var.container_port_mappings[0].containerPort
ecs_load_balancers = [{
container_name = "${var.namespace}-${var.stage}-${var.name}"
container_port = var.container_port_mappings[0].containerPort
elb_name = ""
target_group_arn = module.alb.default_target_group_arn
}]
}
There are two parameters that I want to mention here, ignore_changes_task_definition and force_new_deployment, as they control how a new version of the app is deployed live. With the first set to false, Terraform tracks changes to the ECS task definition, such as the container_image value, and creates a new task definition revision. With the second, we tell the Service to deploy the latest revision immediately.
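As an aside, if you later want to use the secrets block in the container definition above, a minimal sketch could look like this. The parameter name is purely illustrative, and the task execution role will also need permission (ssm:GetParameters) to read it:
# Example only: look up an existing SecureString parameter in the Parameter Store...
data "aws_ssm_parameter" "example_secret" {
  name = "/app-prod-myproject/SECRET_ENV_NAME"
}

# ...and reference it in the container definition's "secrets" list:
# secrets = [
#   {
#     name      = "SECRET_ENV_NAME"
#     valueFrom = data.aws_ssm_parameter.example_secret.arn
#   },
# ]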
And last, we will create the IAM Role that GitHub Actions will assume and the OpenID Connect (OIDC) identity provider. This will allow GitHub Actions to request temporary security credentials for access to AWS resources.
# main.tf
resource "aws_iam_openid_connect_provider" "github_actions_oidc" {
  url = "https://token.actions.githubusercontent.com"
  client_id_list = [
    "sts.amazonaws.com",
  ]
  thumbprint_list = [
    "6938fd4d98bab03faadb97b34396831e3780aea1"
  ]
  tags = {
    Namespace = var.namespace
    Stage = var.stage
    Name = var.name
  }
}

resource "aws_iam_role" "github_actions_role" {
  name = "github_actions"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Federated = aws_iam_openid_connect_provider.github_actions_oidc.arn
        },
        Action = "sts:AssumeRoleWithWebIdentity",
        Condition = {
          StringLike = {
            "token.actions.githubusercontent.com:sub" : "repo:EXAMPLE_ORG/REPO_NAME:*"
          },
          StringEquals = {
            "token.actions.githubusercontent.com:aud" : "sts.amazonaws.com"
          }
        }
      }
    ]
  })
  managed_policy_arns = ["arn:aws:iam::aws:policy/AdministratorAccess"]
  tags = {
    Namespace = var.namespace
    Stage = var.stage
    Name = var.name
  }
}
The StringLike condition will match and allow any branch, pull request merge branch, or environment from EXAMPLE_ORG/REPO_NAME to assume the github_actions IAM Role. If you want to limit it to a specific branch like main, replace the wildcard (*) with the branch reference, e.g. ref:refs/heads/main. You can read more here.
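For illustration, a condition scoped to the main branch only would look roughly like this (a sketch; with an exact match you can use StringEquals instead of StringLike):
Condition = {
  StringEquals = {
    "token.actions.githubusercontent.com:sub" : "repo:EXAMPLE_ORG/REPO_NAME:ref:refs/heads/main",
    "token.actions.githubusercontent.com:aud" : "sts.amazonaws.com"
  }
}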
For the purpose of this guide, we'll attach the AWS managed AdministratorAccess policy to the role. As the name suggests, the role will have administrator permissions.
It's important to mention that the policies attached to the role determine what GitHub Actions is allowed to do in AWS. As a security best practice, it's highly recommended to grant a role only the least privileges it needs.
And at last, add the following to outputs.tf:
# outputs.tf
output "alb_dns_name" {
  description = "DNS name of ALB"
  value = module.alb.alb_dns_name
}

output "github_actions_role_arn" {
  description = "The ARN of the role to be assumed by the GitHub Actions"
  value = aws_iam_role.github_actions_role.arn
}

output "ecr_repository_name" {
  description = "The name of the ECR Repository"
  value = module.ecr.repository_name
}
Now, inside the terraform folder, run terraform init. Then run terraform plan and check the output to see the resources that will be created. At the end you should see this:
...
Plan: 54 to add, 0 to change, 0 to destroy.
If everything looks good, go on and run terraform apply. After all the resources have been created, copy the alb_dns_name value from the outputs, paste it in the browser and... (drum roll), you get a "503 Service Temporarily Unavailable" error. This is expected, as the Task Definition is configured to use a docker image tag ("latest" or a "1.x.x" version) which does not exist in the ECR yet. That's where GitHub Actions comes into play. Before moving to the next section, copy the github_actions_role_arn value. We'll need it for the next step.
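If you ever need these values again, terraform output prints them without re-applying, for example:
# Run inside the terraform folder
terraform output alb_dns_name
terraform output github_actions_role_arn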
And that's it for the Infrastructure Provisioning part. Congrats for making it this far!
Automation with GitHub Actions
Now that we have the infrastructure ready, let's create the GitHub Actions workflow and automate the deployment process.
Create the folder structure .github/workflows at the root of your project. Inside the workflows folder, create a .yaml file, name it buildAndDeploy.yaml (or any name you want), and copy the following configuration.
# buildAndDeploy.yaml
name: 'Build and deploy with terraform'

on:
  push:
    tags:
      - '*'

env:
  AWS_REGION: eu-west-1 # Change to your region
  IAM_ROLE_ARN: arn:aws:iam::xxxxxxxxxxxx:role/github_actions # Change to github action role arn

permissions:
  id-token: write # This is required for requesting the JWT
  contents: read # This is required for actions/checkout

jobs:
  build:
    name: Build Docker Image
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1-node16
        with:
          role-to-assume: ${{ env.IAM_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build, tag, and push image to Amazon ECR
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: app-prod-myproject # namespace-stage-name
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_REF_NAME -t $ECR_REGISTRY/$ECR_REPOSITORY:latest .
          docker image push -a $ECR_REGISTRY/$ECR_REPOSITORY

  terraform:
    name: Terraform Apply
    needs: build
    runs-on: ubuntu-latest
    environment: production
    defaults:
      run:
        shell: bash
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1-node16
        with:
          role-to-assume: ${{ env.IAM_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.4.1

      - name: Terraform Init
        working-directory: ./terraform
        run: terraform init

      - name: Terraform Plan
        id: plan
        working-directory: ./terraform
        run: terraform plan -var="image_tag=$GITHUB_REF_NAME"
        continue-on-error: true

      - name: Terraform Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        working-directory: ./terraform
        run: terraform apply -var="image_tag=$GITHUB_REF_NAME" -auto-approve
I believe the pipeline jobs and the steps within them are quite self-explanatory, so I will not go into details, but I will mention the important parts. Replace the AWS_REGION and IAM_ROLE_ARN environment values with your AWS region and the github_actions_role_arn value that you copied earlier from the terraform output.
In the build job, under the steps, change the ECR_REPOSITORY value to match the values you set for namespace, stage, and name in the variables.tf file.
That's it! Commit everything, create a tag, and push to GitHub (see the example below). Then go to your repository on GitHub and click the "Actions" tab. If everything was set up correctly, you should see that the "Build and deploy with terraform" workflow has started or is starting.
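For reference, a typical sequence looks like this (the tag name 1.0.0 and the commit message are just examples):
git add .
git commit -m "Initial version"
git tag 1.0.0
# Push the branch (assuming it's named main) and the tag; the tag push triggers the workflow
git push origin main --tags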
After the deployment pipeline has finished successfully, try again to access the ALB DNS name in the browser. You should now see "Hello World!". If you still get a 503, give it a few minutes; it's possible that the Task has not started yet. Otherwise, for troubleshooting, go to ECS > Clusters > {cluster name} > Services in the AWS Console and check the following (a couple of helpful CLI commands are also sketched after this list):
At least 1 Task is in running state
Unhealthy targets in the Target Groups
The "Deployments and events" tab
The logs under the "Logs" tab or in CloudWatch
The application is starting successfully and responds within the 200-399 http code range
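If you prefer the command line, these AWS CLI commands can help with the checks above. The cluster, service, and log group names below assume the default "app-prod-myproject" naming from this guide; adjust them to your setup (you can confirm the exact names in the AWS Console):
# Check the service's deployments, events, and running task count
aws ecs describe-services --cluster app-prod-myproject --services app-prod-myproject
# Tail the application logs (requires AWS CLI v2)
aws logs tail app-prod-myproject --follow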
Awesome, you made it! I hope it was easy to follow along. I tried to keep it as simple and to the point as possible.
If, at this point, you want to delete the AWS resources created by Terraform, run terraform destroy.
Additionally, if you want your website to be accessible under your custom domain, go to your DNS provider, e.g. GoDaddy, and create a CNAME record pointing to the ALB DNS name. If your DNS is managed by Route 53, create an Alias (A) record pointing to the ALB (a Terraform sketch follows).
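If the hosted zone lives in Route 53 and you want to manage the record with Terraform too, a rough sketch (the zone and domain names are hypothetical; alb_zone_id is an output of the Cloud Posse ALB module):
# Example only: assumes a Route 53 hosted zone for example.com already exists
data "aws_route53_zone" "main" {
  name = "example.com"
}

resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = module.alb.alb_dns_name
    zone_id                = module.alb.alb_zone_id
    evaluate_target_health = true
  }
}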
And finally, you can check the full code example in this repo: deploy-web-app-on-aws-with-terraform-and-github-actions
The following follow-up articles will come soon:
Create an SSL certificate with AWS Certificate Manager and enable HTTPS
Setting up AWS CloudFront CDN for your website
ECS autoscaling