
Terraform State Backend & S3 Bucket

This guide explains the structure of a Terraform S3 state backend bucket, including the use of workspaces, key prefixes, and buckets. It details how the backend.tf.json file is used to configure the S3 backend for storing Terraform state, and how DynamoDB is used for state locking and consistency checking. The document provides examples and best practices for managing and accessing the Terraform state backend.

Terraform State Locking via Amazon S3

Terraform 1.10.1 includes built-in state locking via S3, which eliminates the need for a DynamoDB table for state locking.

1 Enable S3 Object Lock

When creating your Terraform state bucket, enable Object Lock. Note that Object Lock requires bucket versioning and cannot be disabled once it has been enabled.

📚 Locking objects with Object Lock Guide

2 Modify Terraform Backend Configuration

terraform {
  backend "s3" {
    bucket       = "tf-state-bucket"
    key          = "state/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}

3 Test in Dev/Staging

Deploy the updated configuration in a non-production environment to validate the changes.
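One way to validate the change in a non-production environment is to look for the lock object while a plan or apply is running. The sketch below assumes Terraform's convention of writing the lock next to the state object with a .tflock suffix; lockfile_key is a hypothetical helper name, and the bucket and key are the placeholder values from the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the lock object key that sits next to a state
# object when use_lockfile = true (assumed to be the state key plus ".tflock").
lockfile_key() {
  local state_key="$1"
  printf '%s.tflock\n' "$state_key"
}

# Example (needs AWS credentials; run while a plan or apply is in progress):
#   aws s3 ls "s3://tf-state-bucket/$(lockfile_key state/terraform.tfstate)"
```

If the lock object shows up during the run and is gone afterwards, locking is going through S3 as intended.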

4 Migrate Production

Once confident, remove the DynamoDB lock table from the production backend configuration and rely solely on S3-based locking.

5 Monitor & Optimize

Use AWS CloudTrail and Terraform logs to monitor state lock behavior.
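For the Terraform side of monitoring, a minimal sketch is to capture debug logs for a single run using the TF_LOG and TF_LOG_PATH environment variables. The wrapper name with_tf_debug_log and the log file path are illustrative assumptions, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run any command with Terraform debug logging enabled
# and written to the given file. TF_LOG and TF_LOG_PATH are standard
# Terraform environment variables; the function name is made up.
with_tf_debug_log() {
  local log_path="$1"
  shift
  TF_LOG=DEBUG TF_LOG_PATH="$log_path" "$@"
}

# Example:
#   with_tf_debug_log ./plan-debug.log terraform plan
#   grep -i lock ./plan-debug.log
```

Debug logs are verbose and may contain sensitive values, so it is probably best to keep them out of shared locations.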

The backend.tf.json File

This file is programmatically generated by Semaphore using all the capabilities of Stacks to deep merge. Every component defines a backend.tf.json, which is what distinguishes it as a root module (as opposed to a Terraform child module). The backend tells Terraform where to find the last known deployed state of the infrastructure for the given component. Since the state is stored in S3, it is easily accessed in a distributed manner by anyone running Terraform.

Info: An identical backend.tf.json file is used by all environments (stacks). Environments are selected using the terraform workspace command, which happens automatically when using Taskfile together with the --stack argument.

For reference, this is the anatomy of the backend configuration. Note that this is just a JSON representation of HCL.

{
  "terraform": {
    "backend": {
      "s3": {
        "acl": "bucket-owner-full-control",
        "bucket": "acme-ue2-root-tfstate",
        "dynamodb_table": "acme-ue2-root-tfstate-lock",
        "encrypt": true,
        "key": "terraform.tfstate",
        "profile": "acme-gbl-root-terraform",
        "region": "us-east-2",
        "workspace_key_prefix": "vpc"
      }
    }
  }
}

Note: Either profile or role_arn can be used here.

S3 Backend

The S3 bucket is created during the cold start using the tfstate-backend component, which is provisioned in the root account.

The state format is s3://{bucket_name}/{component}/{stack}/terraform.tfstate

  • The bucket name format is {namespace}-{optional tenant}-{environment}-{stage}-tfstate

  • We deploy this bucket in the root account, so here are some example bucket names:

acme-ue2-root-tfstate (without tenant)
acme-mgmt-ue2-root-tfstate (with tenant: mgmt)

  • The component name is used as the Terraform state's workspace_key_prefix in each component's backend.tf.json. It is therefore the first S3 key segment after the bucket name.

  • The stack is where the component is provisioned, and it is also the name of the Terraform workspace that gets created.

  • Finally, terraform.tfstate is the key provided in each component's backend.tf.json.
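The naming rules above can be sketched as two small shell functions. The function names tfstate_bucket and tfstate_uri are hypothetical; the formats are the ones stated in this section.

```shell
#!/usr/bin/env bash
# tfstate_bucket NAMESPACE TENANT ENVIRONMENT STAGE
# Pass an empty string for TENANT when no tenant is used.
tfstate_bucket() {
  local namespace="$1" tenant="$2" environment="$3" stage="$4"
  if [ -n "$tenant" ]; then
    printf '%s-%s-%s-%s-tfstate\n' "$namespace" "$tenant" "$environment" "$stage"
  else
    printf '%s-%s-%s-tfstate\n' "$namespace" "$environment" "$stage"
  fi
}

# tfstate_uri BUCKET COMPONENT STACK
# COMPONENT is the workspace_key_prefix and STACK is the workspace name.
tfstate_uri() {
  local bucket="$1" component="$2" stack="$3"
  printf 's3://%s/%s/%s/terraform.tfstate\n' "$bucket" "$component" "$stack"
}

# Example:
#   tfstate_uri "$(tfstate_bucket acme "" ue2 root)" vpc ue2-prod
```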

The following shows the Terraform commands that Taskfile runs for the backend s3://acme-ue2-root-tfstate/vpc/ue2-prod/terraform.tfstate:

task terraform deploy vpc --stack ue2-prod
| task will create the input variables from the YAML and run the following commands
| -- terraform init
| -- terraform workspace select ue2-prod
| -- terraform plan
| -- terraform apply
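To reproduce roughly the same sequence by hand from inside a component directory, something like the following could work. This is only a sketch: deploy_stack, run, and the DRY_RUN switch are hypothetical, and the input variables that Taskfile generates from the YAML are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# deploy_stack STACK
# Mirrors the init / workspace / plan / apply sequence for one stack.
deploy_stack() {
  local stack="$1"
  run terraform init || return 1
  # Select the workspace, creating it on first use.
  run terraform workspace select "$stack" \
    || run terraform workspace new "$stack" \
    || return 1
  run terraform plan -out=planfile || return 1
  run terraform apply planfile
}

# Example: print the commands without touching any infrastructure.
#   DRY_RUN=1 deploy_stack ue2-prod
```

Saving the plan to a file and applying that file keeps the apply step tied to exactly what was reviewed in the plan.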

To better visualize what’s going on, we recommend running the commands below to explore your own state bucket. Make sure to use the correct profile for your organization (acme-gbl-root-admin is just a placeholder).

Find the bucket; it should contain tfstate in its name. In the example below, the vpc component is deployed to use2-auto, use2-corp, use2-dev, use2-qa, use2-sbx01, and use2-staging. The workspace name is constructed as {environment}-{stage}. This pattern is defined in the task.yaml config with the stacks.name_pattern setting (see Semaphore for all settings).

$ aws --profile acme-gbl-root-admin \
    s3 ls s3://{bucket_name} --recursive
...
2021-11-01 19:53:48 120926 vpc/use2-auto/terraform.tfstate # workspace key prefix: vpc, workspace name is `use2-auto`
2021-11-01 19:49:12 123604 vpc/use2-corp/terraform.tfstate
2021-11-01 19:50:18 123486 vpc/use2-dev/terraform.tfstate
2021-11-01 19:48:39 123354 vpc/use2-qa/terraform.tfstate
2021-11-01 19:49:46 123735 vpc/use2-sbx01/terraform.tfstate
2021-11-01 19:50:50 124014 vpc/use2-staging/terraform.tfstate
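A listing like the one above can be reduced to component and workspace pairs with a small filter. This is a sketch that assumes the key layout {component}/{stack}/terraform.tfstate and keys without spaces; list_workspaces is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read "aws s3 ls --recursive" output on stdin and print
# "<workspace_key_prefix> <workspace>" for every state object found.
list_workspaces() {
  awk '{
    # Field 4 is the object key, e.g. vpc/use2-auto/terraform.tfstate
    n = split($4, parts, "/")
    if (n == 3 && parts[3] == "terraform.tfstate") print parts[1], parts[2]
  }'
}

# Example (needs AWS credentials):
#   aws --profile acme-gbl-root-admin s3 ls "s3://{bucket_name}" --recursive | list_workspaces
```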

See which stacks contain state for the vpc component:

aws --profile acme-gbl-root-admin \
s3 ls s3://{bucket_name}/vpc/
Note: If a component is mistakenly deployed somewhere and then destroyed, a leftover terraform.tfstate file with a small file size remains in the bucket. So while listing keys is a good way to search for backends, it is not the best way to determine where a component is currently deployed. The S3 bucket also has versioning enabled, so we can always (manually) revert to a previous state if need be.
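To spot likely leftovers, the listing can be filtered by object size. The sketch below is only a heuristic: small_states is a hypothetical helper, and the default threshold of 1024 bytes is an assumption that should be tuned against what an empty state looks like in your own bucket.

```shell
#!/usr/bin/env bash
# small_states [MAX_BYTES]
# Read "aws s3 ls --recursive" output on stdin and print "<size> <key>" for
# state objects smaller than MAX_BYTES (assumed default: 1024).
small_states() {
  local max_bytes="${1:-1024}"
  awk -v max="$max_bytes" \
    '$4 ~ /terraform\.tfstate$/ && ($3 + 0) < (max + 0) { print $3, $4 }'
}

# Example (needs AWS credentials):
#   aws --profile acme-gbl-root-admin s3 ls "s3://{bucket_name}" --recursive | small_states 1024
```

A small state file is a hint rather than proof that a component was destroyed, so it is worth confirming before cleaning anything up.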

DynamoDB Locking

Find the table. It should contain tfstate-lock in its name.

aws --profile acme-gbl-root-admin \
dynamodb list-tables

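Get the item for a LockID. The entry with the -md5 suffix holds the state digest that Terraform uses for consistency checking.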

aws --profile acme-gbl-root-admin \
dynamodb get-item \
--table-name {table_name} \
--key '{"LockID": {"S": "{bucket_name}/{component}/{stack}/terraform.tfstate-md5"}}'
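Building the --key argument by hand is error-prone, so a small helper can assemble it from the bucket, component, and stack. The function names lock_id and lock_key_json are hypothetical; the LockID layout is the one shown in the command above, and the table name in the example is taken from the sample backend configuration.

```shell
#!/usr/bin/env bash
# lock_id BUCKET COMPONENT STACK
# Build the LockID of the digest (-md5) item for one state file.
lock_id() {
  local bucket="$1" component="$2" stack="$3"
  printf '%s/%s/%s/terraform.tfstate-md5\n' "$bucket" "$component" "$stack"
}

# lock_key_json BUCKET COMPONENT STACK
# Wrap the LockID in the JSON shape expected by "dynamodb get-item --key".
lock_key_json() {
  printf '{"LockID": {"S": "%s"}}\n' "$(lock_id "$@")"
}

# Example (needs AWS credentials):
#   aws --profile acme-gbl-root-admin dynamodb get-item \
#     --table-name acme-ue2-root-tfstate-lock \
#     --key "$(lock_key_json acme-ue2-root-tfstate vpc ue2-prod)"
```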
