Terraform State Management & Design Mindset for Enterprise AWS Multi-Account Landing Zones
When a team of 20 engineers concurrently runs terraform apply across 50 AWS accounts, state management stops being an operational concern and becomes a business risk. State corruption takes hours to diagnose, compliance audits fail when drift goes undetected, and the root cause is almost never the engineers. It is the absence of a principled architecture before the first line of Terraform is written.
This post combines three disciplines that belong together but are rarely addressed as a unified system: the design mindset that prevents state problems from occurring, the S3 native locking strategy that eliminates the DynamoDB tax (ADR-006, saving up to $9,000/year at 50 accounts), and a real production-ready IAM Identity Center module that demonstrates both principles working at enterprise scale in an AWS multi-account Landing Zone.
All code, configuration, and test artifacts referenced here are live in the terraform-aws framework: not theoretical examples, but verified, scored output (97/100 production-readiness) from a running ADLC-governed project.
Section 1 - The Problem: State at Scale
The Reality of Concurrent Terraform Operations
A single engineer running terraform apply on a laptop is a solved problem. Twenty engineers, five CI pipelines, and three environment tiers running Terraform concurrently across 50 AWS accounts is a distributed systems problem, and most organisations discover this only after their first production incident.
The failure modes are well-documented in the industry. According to practitioner surveys, approximately 60% of Terraform production incidents trace back to state management issues: stale locks, corrupted state files, drift between live infrastructure and recorded state, or provider version skew across environments. The business impact compounds quickly:
| Failure Mode | Typical Discovery | Business Impact |
|---|---|---|
| State corruption | terraform apply fails mid-run, partial resource creation | 2-8 hours of incident response; manual terraform import to rebuild state |
| Stale lock | Second engineer's apply blocks indefinitely | Lost engineering time; requires HITL escalation to force-unlock |
| Configuration drift | Console change not reflected in state | Compliance audit finding; manual reconciliation across hundreds of resources |
| Provider version skew | CI fails; works locally | "Works on my machine": delayed release; reproducibility failure |
| Cross-account state collision | Two modules share the same S3 key | State overwrites; resource tracking lost for entire account |
The cost of these failures is not just the hours spent debugging. It is the audit findings when a security team cannot produce evidence that a permission set was applied through a controlled change process. It is the FinOps review that shows $40,000 of untagged resources because a corrupted apply created resources without tags. It is the team velocity that drops 30% when engineers distrust their CI pipeline and start running applies manually.
Before addressing recovery, note that most organisations are paying for the problem twice. The conventional DynamoDB locking pattern adds $5-15 per table per month per account. At 50 accounts, that is $3,000-$9,000 per year in perpetual overhead, for infrastructure that Terraform 1.10+ no longer needs.
The solution is not a better incident runbook. The solution is a principled design approach that makes most of these failure modes structurally impossible.
Section 2 - Design Mindset: Six Principles for Enterprise Cloud
A design mindset is not a philosophy. It is a set of structural decisions made before code is written that constrain how the system can fail. In the context of enterprise Terraform, six principles govern every file, directory, and configuration choice in the terraform-aws framework.
The Six Principles
| # | Principle | Definition | Concrete Project Example |
|---|---|---|---|
| 1 | Modularity | Independently versioned, testable units with clear boundaries | modules/sso/ owns its own variables.tf, outputs.tf, locals.tf, data.tf, tests/, examples/, VERSION, and .pre-commit-config.yaml (fully self-contained) |
| 2 | Abstraction | Hide implementation complexity behind typed, validated interfaces | variables.tf exposes typed map(object({...})) inputs; locals.tf flattens and transforms them into for_each-ready maps before any resource block sees them |
| 3 | Developer Experience (DX) | Under 5 minutes to onboard; zero host-tool dependencies; single-command pipelines | task ci:quick = validate + lint + legal in under 60 seconds; _exec auto-starts the terraform-aws-dev container; 28 tasks across 8 ADLC phases |
| 4 | Iterative Design | Ship → Test → Learn loops with fast feedback at every stage | 3-tier testing: snapshot (2-3s, $0) → LocalStack (30-60s, $0) → real AWS (5-10 min, ~$5-50); ADR process for every architecture decision (ADR-001 through ADR-007) |
| 5 | Cost-Awareness | FinOps-first: quantify cost before provisioning, eliminate idle spend | FOCUS 1.2+ 4-tier tag taxonomy in global/global_variables.tf; task plan:cost per module; ADR-006 eliminates DynamoDB (~$5-15/table/account per month) |
| 6 | Separation of Concerns | Each layer and directory owns exactly one responsibility | modules/ = reusable logic; projects/ = account-level compositions; global/ = shared tag conventions; examples/ = documentation-by-example; tests/ = quality assurance |
The Abstraction Layers Diagram
The most impactful structural decision in the framework is the strict separation between the Interface Layer (variables.tf), the Transform Layer (locals.tf), and the Resource Layer (main.tf). This is not stylistic; it is an enforced data flow contract.
┌──────────────────────────────────────────────────────────────────────
│ examples/ and projects/
│   Compositions (user-facing entry points)
│   Consume modules, bind variables, wire outputs to downstream
│   e.g. examples/create-users-and-groups/main.tf
│        projects/sso/main.tf
├──────────────────────────────────────────────────────────────────────
│ variables.tf
│   Interface Layer: typed, validated inputs
│   map(object({...})) with optional() and validation{} blocks
│   e.g. var.sso_users (21 fields), var.permission_sets (type = any)
│        var.account_assignments (principal + permission + account)
├──────────────────────────────────────────────────────────────────────
│ locals.tf
│   Transform Layer: business logic, no cloud API calls
│   flatten(), for expressions, format() key generation
│   e.g. flatten_user_data, users_and_their_groups
│        principals_and_their_account_assignments
│        pset_aws_managed_policy_maps
├──────────────────────────────────────────────────────────────────────
│ main.tf
│   Resource Layer: cloud API calls only (no logic)
│   aws_ssoadmin_permission_set, aws_identitystore_user
│   aws_ssoadmin_account_assignment, aws_ssoadmin_application
│   Iterates locals.*, never variables.* directly
├──────────────────────────────────────────────────────────────────────
│ data.tf
│   Data Source Layer: read-only lookups, no mutation
│   aws_ssoadmin_instances (SSO instance ARN + store ID)
│   aws_organizations_organization (account ID resolution)
├──────────────────────────────────────────────────────────────────────
│ outputs.tf
│   Output Layer: typed contract for downstream modules
│   Exposes ARNs, IDs, maps for composition
│   Enables Registry publication and module chaining (ADR-007)
└──────────────────────────────────────────────────────────────────────
locals.tf is the only layer permitted to transform var.* inputs into structures consumed by main.tf. Resources never reference var.* directly. This enforces a strict, auditable data flow:
variables → locals → resources → outputs
This rule is not enforced by Terraform itself; it depends on the team's discipline and code review. When it is broken, debugging becomes much harder because business logic bleeds into resource declarations.
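Because Terraform does not enforce the rule, a small pre-merge check can back up code review. The sketch below is one hypothetical way to do that; the function name check_no_var_refs is an assumption and not part of the framework. It flags any var.<name> reference in a resource file and ignores full-line comments. It is a plain text match, so block comments and trailing comments are not handled.

```shell
# check_no_var_refs FILE
# Succeeds when FILE contains no var.<name> reference outside full-line
# comments. Hypothetical helper; the framework's stated gate is code review.
check_no_var_refs() {
  local file="$1" hits
  # grep -n keeps the original line numbers; the second grep drops lines
  # whose content starts with # or //.
  hits=$(grep -nE '(^|[^A-Za-z0-9_.])var\.[A-Za-z_]' "$file" \
    | grep -vE '^[0-9]+:[[:space:]]*(#|//)' || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    echo "FAIL: $file references var.* directly; transform it in locals.tf first" >&2
    return 1
  fi
  return 0
}
```

A call such as check_no_var_refs modules/sso/main.tf could run in CI next to the lint step, so a stray for_each = var.sso_users is caught before review.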
Section 3 - State Management Strategy: ADR-006
Why DynamoDB Locking Is the Wrong Default in 2026
DynamoDB state locking was a reasonable solution when it was introduced. In 2026, with Terraform 1.10+ supporting S3 Conditional Writes natively, it is an unnecessary dependency that adds cost, operational complexity, and IAM surface area to every account in your organisation.
ADR-006 documents the decision to eliminate DynamoDB locking entirely across the terraform-aws framework.
DynamoDB vs S3 Native Locking: 11-Factor Comparison
S3 Native (ADR-006):
bucket = "ams-terraform-org-state"
use_lockfile = true
encrypt = true
DynamoDB (Legacy):
bucket = "ams-terraform-org-state"
dynamodb_table = "terraform-lock"
encrypt = true
| Evaluation Factor | DynamoDB Locking (Legacy) | S3 Native Locking (ADR-006, Selected) |
|---|---|---|
| Terraform version required | Any | >= 1.10 (uses S3 Conditional Writes; experimental in 1.10, generally available from 1.11) |
| Monthly cost per account | ~$5-15 (table + read/write units) | $0 additional; covered by existing S3 request pricing |
| Setup complexity | 3 steps: create table + add IAM policy + dynamodb_table = in backend | 1 step: use_lockfile = true in backend config |
| Additional IAM permissions | dynamodb:GetItem, PutItem, DeleteItem, DescribeTable | No new service; the existing S3 principal also needs s3:DeleteObject on the <key>.tflock object to release the lock |
| Lock mechanism | DynamoDB conditional writes (atomic PutItem with condition expression) | S3 If-None-Match: * header (Conditional Writes, atomic) |
| Lock file artifact | None in S3; lock state held in memory and in a DynamoDB row | <state-key>.tflock object stored in S3 alongside the state while the lock is held |
| Cross-account architecture | DynamoDB table per account OR single shared table (complex cross-account IAM) | Single S3 bucket with key-path isolation (simpler) |
| Failure mode (stale lock) | Stale row in DynamoDB → terraform force-unlock <LOCK_ID> | Stale lock object in S3 → same terraform force-unlock |
| State bucket requirement | S3 (state) + DynamoDB (lock): two services to manage | S3 only: single service boundary |
| Future direction | DynamoDB backend arguments are deprecated (from Terraform 1.11) and slated for removal | Native, documented recommended path |
| Decision | REJECTED: cost, operational complexity, legacy trajectory | SELECTED: zero cost, simpler IAM, future-proof |
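The "create only if absent" semantics behind If-None-Match: * can be illustrated locally. The sketch below is an analogy on a local filesystem, not Terraform's implementation and not an S3 call; the function names are hypothetical. It uses the shell's noclobber option so that writing the lock file fails when the file already exists, which is the same contract a conditional write gives the second writer.

```shell
# acquire_lock PATH OWNER
# Creates PATH only if it does not already exist. With noclobber (set -C)
# the redirection fails instead of overwriting an existing file.
acquire_lock() {
  local lock_path="$1" owner="$2"
  ( set -C; printf '%s\n' "$owner" > "$lock_path" ) 2>/dev/null
}

# release_lock PATH: remove the lock so the next writer can take it.
release_lock() {
  rm -f "$1"
}
```

With these definitions, acquire_lock /tmp/demo.tflock alice succeeds, a following acquire_lock /tmp/demo.tflock bob returns non-zero, and bob can only take the lock after release_lock /tmp/demo.tflock. The second writer is refused rather than allowed to overwrite, which is the property that matters for state.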
Cost Savings at Scale
The financial case for migration is unambiguous. The savings grow linearly with organisational scale:
| Account Count | DynamoDB Annual Cost | S3 Native Annual Cost | Annual Saving |
|---|---|---|---|
| 5 accounts | ~$300-900 | $0 | ~$300-900 |
| 20 accounts | ~$1,200-3,600 | $0 | ~$1,200-3,600 |
| 50 accounts | ~$3,000-9,000 | $0 | ~$3,000-9,000 |
The migration effort is one configuration line change per module. The payback period is immediate.
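The figures in the table follow from simple arithmetic: accounts × monthly table cost × 12. A minimal sketch, assuming one lock table per account and a flat monthly price in whole dollars (the function name is illustrative):

```shell
# annual_lock_cost ACCOUNTS MONTHLY_USD
# Annual DynamoDB lock-table cost, assuming one table per account at a
# flat monthly price in whole dollars.
annual_lock_cost() {
  echo $(( $1 * $2 * 12 ))
}

# Low and high estimates for the three organisation sizes in the table.
for n in 5 20 50; do
  printf '%2d accounts: $%d to $%d per year\n' "$n" \
    "$(annual_lock_cost "$n" 5)" "$(annual_lock_cost "$n" 15)"
done
```

Substituting your own per-table cost from a billing export gives a more reliable estimate than the $5-15 range used here.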
Backend Configuration (Source of Truth)
backend.hcl.example is the single configuration artifact that bootstraps state for any module in any account:
# Multi-Org S3 Backend Configuration
# Usage: terraform init -backend-config=backend.hcl
#
# Copy and customize per account:
# cp backend.hcl.example backend.hcl
# # Edit values below, then:
# terraform init -backend-config=backend.hcl
bucket = "ams-terraform-org-state"
region = "ap-southeast-2"
use_lockfile = true
# Path pattern: tf-org-aws/<account-id>/<module>/terraform.tfstate
# Example for identity-center in account 123456789012:
# key = "tf-org-aws/123456789012/identity-center/terraform.tfstate"
key = "tf-org-aws/<ACCOUNT_ID>/<MODULE_NAME>/terraform.tfstate"
# State bucket lives in management account
# Ensure cross-account access policy exists on this bucket
encrypt = true
# Initialise for a new account and module
cp backend.hcl.example backend.hcl
sed -i 's/<ACCOUNT_ID>/123456789012/' backend.hcl
sed -i 's/<MODULE_NAME>/identity-center/' backend.hcl
terraform init -backend-config=backend.hcl
# Verify state location
terraform state list
- use_lockfile = true: the single line that replaces DynamoDB everywhere; requires Terraform >= 1.10
- region = "ap-southeast-2": data sovereignty; all state remains in the Sydney region (APRA CPS 234 alignment)
- encrypt = true: SSE-S3 minimum; SSE-KMS recommended for regulated workloads
- key pattern: hierarchical isolation (org → account → module); prevents any key collision
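The copy-and-sed steps can be wrapped in a function that validates its inputs before writing anything. This is a sketch under assumptions: render_backend_hcl is a hypothetical helper, and the module naming rule (lowercase letters, digits, hyphens) is an assumption rather than a documented framework constraint. It prints to stdout instead of editing in place, which avoids the differences in sed -i between GNU and BSD sed.

```shell
# render_backend_hcl ACCOUNT_ID MODULE_NAME [TEMPLATE]
# Prints the template to stdout with both placeholders substituted.
render_backend_hcl() {
  local account_id="$1" module_name="$2" template="${3:-backend.hcl.example}"
  # AWS account IDs are exactly 12 digits.
  if ! printf '%s' "$account_id" | grep -Eq '^[0-9]{12}$'; then
    echo "account id must be exactly 12 digits: '$account_id'" >&2
    return 1
  fi
  # Assumed naming rule: lowercase letters, digits and hyphens.
  if ! printf '%s' "$module_name" | grep -Eq '^[a-z0-9][a-z0-9-]*$'; then
    echo "unexpected module name: '$module_name'" >&2
    return 1
  fi
  sed -e "s/<ACCOUNT_ID>/$account_id/g" \
      -e "s/<MODULE_NAME>/$module_name/g" "$template"
}
```

Typical use would be render_backend_hcl 123456789012 identity-center > backend.hcl followed by terraform init -backend-config=backend.hcl. Rejecting a malformed account ID up front prevents a state file from landing under an unexpected key prefix.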
Team Collaboration Safeguards
Nine layers of defence prevent the most common team collaboration failures:
| Safeguard | Implementation | Failure Mode Prevented |
|---|---|---|
| S3 Conditional Writes | use_lockfile = true in backend.hcl.example | Two engineers run terraform apply simultaneously → the second is refused (or waits, with -lock-timeout) instead of corrupting state |
| Provider lock file in VCS | .terraform.lock.hcl committed to repo and in tests/snapshot/ | Provider silently upgrades between CI runs → pinned hashes prevent version skew |
| Encryption at rest | encrypt = true (SSE-S3 minimum) | State contains ARNs, IDs, and sensitive values → encrypted at rest |
| RBAC via IAM roles | Per-account IAM roles scoped to s3:GetObject/PutObject (plus s3:DeleteObject for the .tflock object) on tf-org-aws/<account-id>/* | Engineers from account A cannot read or write state belonging to account B |
| S3 Bucket Versioning | Enabled on ams-terraform-org-state bucket | Accidental terraform state rm → recover prior state version |
| MFA Delete | Configured on state bucket (management account) | Malicious or accidental permanent deletion of state objects |
| Cross-account bucket policy | s3:PutObject allowed from member account IAM roles only with specific key prefix | Member accounts cannot read each other's state |
| Lock file verify gate | task build:lock-verify blocks the PR if .terraform.lock.hcl is missing from any module | Provider lock file absent → committed lock enforced before merge |
| Automated provider upgrade | task build:lock-upgrade + provider-upgrade.yml weekly PR | Provider drift goes undetected for weeks → automated weekly detection + PR |
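The lock file verify gate is simple enough to sketch. The function below is a hypothetical illustration of what such a check might do, not the actual implementation behind task build:lock-verify. It treats every direct subdirectory that contains *.tf files as a module and reports those without a committed .terraform.lock.hcl.

```shell
# modules_missing_lockfile ROOT
# Prints each module directory directly under ROOT that holds *.tf files
# but no .terraform.lock.hcl; returns 1 if any are found.
modules_missing_lockfile() {
  local root="$1" dir missing=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    # Only directories that hold *.tf files count as modules.
    ls "$dir"*.tf >/dev/null 2>&1 || continue
    if [ ! -f "${dir}.terraform.lock.hcl" ]; then
      echo "missing lock file: ${dir%/}"
      missing=1
    fi
  done
  return "$missing"
}
```

Running modules_missing_lockfile modules in a PR pipeline and failing on a non-zero exit status gives the blocking behaviour described in the table.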
Section 4 - Multi-Account State Isolation
S3 Key Hierarchy: Five-Account Example
The key path pattern tf-org-aws/<account-id>/<module>/terraform.tfstate provides deterministic, collision-free isolation across the entire AWS organisation. A five-account hierarchy looks like this:
s3://ams-terraform-org-state/
└── tf-org-aws/
    ├── 111111111111/                      # Management account (AWS Organizations root)
    │   ├── identity-center/
    │   │   ├── terraform.tfstate          # SSO: users, groups, permission sets
    │   │   └── terraform.tfstate.tflock   # S3 native lock file (when held)
    │   └── organizations/
    │       └── terraform.tfstate          # AWS Organizations structure
    │
    ├── 222222222222/                      # Security account (audit, GuardDuty, Config)
    │   └── guardduty/
    │       └── terraform.tfstate
    │
    ├── 333333333333/                      # Operations account (shared services)
    │   ├── ecs-platform/
    │   │   └── terraform.tfstate
    │   └── networking/
    │       └── terraform.tfstate
    │
    ├── 444444444444/                      # Sandbox / development account
    │   └── ecs-platform/
    │       └── terraform.tfstate
    │
    └── 335083429030/                      # State bucket owner account
        └── state-bucket-bootstrap/
            └── terraform.tfstate
Isolation Guarantee Matrix
Five independent isolation layers mean that no single failure can corrupt state across account boundaries:
| Isolation Layer | Mechanism | Guarantee |
|---|---|---|
| Organisation-level | S3 bucket prefix tf-org-aws/ | All state contained within a single auditable bucket |
| Account-level | Key prefix tf-org-aws/<account-id>/ | State from account A cannot overwrite account B |
| Module-level | Sub-prefix <module>/terraform.tfstate | Identity Center state never collides with ECS or networking state |
| Concurrency | use_lockfile = true (S3 If-None-Match Conditional Write) | Two simultaneous terraform apply operations → one waits or fails, never corrupts |
| Data residency | region = ap-southeast-2 | State does not leave the Sydney region (APRA CPS 234 Para 15 data sovereignty) |
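The account-level and module-level guarantees depend on every stack using a distinct key. A minimal sketch of how that could be checked across an inventory of stacks follows; both function names are hypothetical.

```shell
# state_key ACCOUNT_ID MODULE
# Builds the state object key for one module in one account.
state_key() {
  printf 'tf-org-aws/%s/%s/terraform.tfstate\n' "$1" "$2"
}

# duplicate_keys
# Reads one state key per line on stdin and prints any key that repeats.
duplicate_keys() {
  sort | uniq -d
}
```

Piping the keys of all stacks through duplicate_keys prints nothing when the layout is collision-free. The same module name in two accounts (ecs-platform in 333333333333 and 444444444444) is not a collision because the account prefix differs.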
Section 5 - Production-Ready Module: IAM Identity Center
How Design Mindset and State Management Converge
The modules/sso/ module is where both disciplines become concrete. It is not a demonstration module; it is a production-ready, scored implementation (97/100, rising to 99/100 after Q5 legal compliance cleanup) that manages the full lifecycle of AWS SSO identity governance in a multi-account Landing Zone.
| Metric | Value |
|---|---|
| Production-readiness score | 97/100 (pre-cleanup) → 99/100 (post-cleanup) |
| Resource blocks | 17 covering users, groups, permission sets, account assignments, applications |
| Test files | 8 scenario-based .tftest.hcl files + cross-domain snapshot tests |
| Example configurations | 8 covering all major use cases |
| Outputs | 10 typed outputs (ARNs, IDs, names) for downstream module chaining |
| Upstream | aws-ia/terraform-aws-sso v1.0.4 (Apache-2.0) |
The global_variables.tf Tag Convention Layer
Before a single resource is declared, the framework establishes a mandatory tag contract in global/global_variables.tf. This file is not imported by modules (Terraform does not support inter-module variable sharing), but it documents the shared convention that every composition (examples/, projects/) is expected to apply:
# Copyright 2026 [email protected] (oceansoft.io). Licensed under Apache-2.0.
# Global conventions for terraform-aws module library (KISS/LEAN)
#
# Tag Taxonomy (4-tier):
# Tier 1 - Mandatory: Project, Environment, Owner, CostCenter, ManagedBy
# Tier 2 - FinOps: ServiceName, ServiceCategory (FOCUS 1.2+)
# Tier 3 - Compliance: DataClassification, Compliance (APRA CPS 234)
# Tier 4 - Ops: Automation, BackupPolicy, GitRepo
variable "common_tags" {
  description = "Tags applied to all resources - 4-tier taxonomy for FOCUS 1.2+ FinOps and APRA CPS 234 compliance"
  type        = map(string)
  default = {
    # Tier 1 - Mandatory (enforced by AWS Organizations Tag Policy)
    Project     = "terraform-aws"
    Environment = "dev"
    Owner       = "[email protected]"
    CostCenter  = "platform"
    ManagedBy   = "Terraform"
    # Tier 2 - FinOps (FOCUS 1.2+ dimension mapping)
    # ServiceName and ServiceCategory set per-module in locals.tf merge
    # Tier 3 - Compliance (APRA CPS 234 Para 15)
    DataClassification = "internal"
    Compliance         = "none"
    # Tier 4 - Operational
    Automation   = "true"
    BackupPolicy = "default"
    GitRepo      = "terraform-aws"
  }
}
Per-Module Tag Merge Pattern
Each module's locals.tf merges the global tag baseline with module-specific Tier 2 FinOps dimensions. This is the Separation of Concerns principle in action: the global layer owns the mandatory baseline; the module layer owns the service classification:
locals {
  module_tags = merge(var.common_tags, {
    ServiceName     = "IAM Identity Center"
    ServiceCategory = "Security"
  })
}
Resources then reference local.module_tags, never var.common_tags directly. The merge happens exactly once, in locals.tf, consistent with the strict data flow rule.
The locals.tf Transform Layer in Practice
The IAM Identity Center module's locals.tf demonstrates why the Transform Layer is indispensable. Managing user-to-group membership across an enterprise SSO configuration involves deeply nested data structures. The locals block flattens them into for_each-ready maps that resource blocks can iterate cleanly:
locals {
  # Flatten nested user → group membership into a flat list of
  # {user_name, group_name} pairs
  flatten_user_data = flatten([
    for this_user in keys(var.sso_users) : [
      for group in var.sso_users[this_user].group_membership : {
        user_name  = var.sso_users[this_user].user_name
        group_name = group
      }
    ]
  ])
  # Build for_each-ready map keyed by "<user_name>_<group_name>"
  users_and_their_groups = {
    for s in local.flatten_user_data :
    format("%s_%s", s.user_name, s.group_name) => s
  }
}
In main.tf, the aws_identitystore_group_membership resource iterates local.users_and_their_groups: no business logic, no variable references, pure iteration. This is what makes the resource layer auditable: every resource block is a declarative statement of intent, not a procedural algorithm.
Wrapper Pattern (ADR-007): Consume, Don't Copy
The framework follows the Wrapper Pattern for upstream module consumption. Consumers reference the module via source and override only what they need β they never copy-paste the module internals. This is ADR-007:
module "iam_identity_center" {
source = "nnthanh101/terraform-aws/aws//modules/sso"
sso_users = local.sso_users
sso_groups = local.sso_groups
permission_sets = local.permission_sets
account_assignments = local.account_assignments
tags = local.module_tags
}
When the upstream module releases a patch, the consumer updates a version constraint and runs terraform init. No internal files to merge, no logic to reconcile. This is the contract that makes modular Terraform sustainable at scale.
Test Coverage: 8 Scenarios
The module ships with 8 .tftest.hcl test files covering every major deployment scenario. Tests run in Tier 1 (snapshot, zero credentials, 2-3 seconds) before any cloud API is called:
Core Scenarios
| Test File | Scenario |
|---|---|
| 01_mandatory.tftest.hcl | Minimal valid configuration |
| 02_existing_users_and_groups.tftest.hcl | Import and manage pre-existing SSO entities |
| 03_inline_policy.tftest.hcl | Inline policy attachment to permission sets |
Integration Scenarios
| Test File | Scenario |
|---|---|
| 04_google_workspace.tftest.hcl | Google Workspace SAML federation integration |
| 05_create_apps_and_assignments.tftest.hcl | OIDC/SAML application creation and assignment |
| 06_existing_user_groups_and_apps.tftest.hcl | Mixed existing and new entity management |
Advanced Scenarios
| Test File | Scenario |
|---|---|
| 07_instance_access_control_attributes.tftest.hcl | ABAC attribute configuration |
| 08_create-users-and-groups-with-customer-managed-policies.tftest.hcl | Customer-managed policy attachments |
Section 6 - Container-First Developer Experience
Why "Works on My Machine" Is a Deployment Anti-Pattern
The DX principle from Section 2 is made concrete through container-first execution. Every tool-dependent task (terraform fmt, tflint, checkov, trivy, infracost) runs inside the terraform-aws-dev container. No brew install, no pip install, no version negotiation.
The _exec helper in Taskfile.yml abstracts the execution context so that every task command works identically whether the engineer is on a MacBook, a Linux workstation, or a GitHub Actions runner:
if [ -f /.dockerenv ]; then
  eval '{{.CMD}}'  # Already inside container
elif docker exec terraform-aws-dev echo "ok" >/dev/null 2>&1; then
  docker exec -w /workspace terraform-aws-dev bash -c '{{.CMD}}'  # Container running
else
  docker compose up -d devcontainer  # Auto-start container
  docker exec -w /workspace terraform-aws-dev bash -c '{{.CMD}}'
fi
18 Tools, Zero Host Installs
The nnthanh101/terraform:2.6.0 image bundles 18 pre-installed, pinned tools across all CI categories:
| Category | Tools |
|---|---|
| IaC | terraform (>= 1.11.0), terragrunt, terraform-docs |
| Linting | tflint, checkov, trivy, tfsec |
| Formatting | terraform fmt, pre-commit |
| Testing | go (Terratest), terraform test (native) |
| Cost | infracost |
| Security | checkov, trivy, tfsec |
| Utilities | task, git, jq, yq, aws-cli |
Developer machine                  Docker container (nnthanh101/terraform:2.6.0)
─────────────────                  ─────────────────────────────────────────────
task ci:quick        ──_exec()──▶  terraform fmt -check
task build:validate  ──_exec()──▶  terraform validate
task build:lint      ──_exec()──▶  tflint (all modules)
task build:lint      ──_exec()──▶  checkov (security scan)
task test:tier1      ──_exec()──▶  terraform test -verbose (snapshot)
task build:lock      ──_exec()──▶  terraform providers lock (4 platforms)
task plan:cost       ───────────▶  infracost breakdown

bare-metal: Docker + Task only     18 tools pre-installed + pinned
zero brew/pip installs             deterministic across all environments
The result is an onboarding time under 5 minutes: docker compose up -d devcontainer && task ci:quick. A new team member has a validated, lint-clean, test-passing environment before their first PR.
Section 7 - Drift Detection & Recovery
Prevention Is the Primary Strategy
Drift management follows a strict hierarchy: prevent first, detect second, recover only as a last resort. Most recovery scenarios are expensive: in time, in confidence, and occasionally in data. Prevention and detection together eliminate the majority of recovery work.
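For the detection step, terraform plan -detailed-exitcode gives a machine-readable signal: exit status 0 means no changes, 2 means the plan contains changes, and 1 means the plan itself failed. The sketch below wraps that in two hypothetical helpers. It assumes the job runs against already-applied code on the main branch, so any pending change is interpreted as drift.

```shell
# classify_plan_exit CODE
# Maps the exit status of `terraform plan -detailed-exitcode` to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# detect_drift DIR
# Runs a non-interactive plan in DIR and prints the resulting label.
detect_drift() {
  local code=0
  terraform -chdir="$1" plan -detailed-exitcode -input=false \
    >/dev/null 2>&1 || code=$?
  classify_plan_exit "$code"
}
```

A scheduled pipeline could call detect_drift projects/sso for each stack and open a ticket whenever the label is not in-sync, which keeps detection ahead of recovery.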
Recovery Scenario Matrix
When prevention fails, the recovery path must be deterministic. The following table maps each failure scenario to a specific recovery procedure and realistic RTO:
| # | Scenario | Detection Signal | Recovery Steps | RTO |
|---|---|---|---|---|
| 1 | State locked (stale) | Error: state locked on terraform plan | 1. Verify no concurrent op running. 2. terraform force-unlock <LOCK_ID>. 3. Retry terraform plan. | < 5 min |
| 2 | State corrupted | Error: couldn't decode state on init | 1. List S3 object versions for state key. 2. Retrieve prior version. 3. Upload good version. 4. Re-run terraform plan. | 15-30 min |
| 3 | Drift detected (console change) | terraform plan shows unexpected diff | Option A (enforce): terraform apply to reconcile. Option B (import): terraform import if resource created outside Terraform. | 10-20 min |
| 4 | State lost entirely | Error: no state found | 1. Check S3 versioning for soft-deleted object. 2. If unrecoverable: terraform import each resource incrementally. | 60-240 min (HIGH) |
| 5 | Cross-account state conflict | Two modules share the same key path | 1. Verify key pattern uniqueness. 2. Rename conflicting key via aws s3 mv. 3. terraform init -reconfigure. | 20-40 min |
| 6 | Provider version drift in CI | CI fails: required providers not satisfied | 1. task build:lock locally (4 platforms). 2. Commit .terraform.lock.hcl. 3. Re-run CI. | < 10 min |
State loss without S3 versioning enabled can require a complete terraform import rebuild: mapping every live cloud resource back to Terraform addresses. At 50 accounts with hundreds of resources each, this is a multi-day engineering effort. Enable S3 versioning and MFA Delete on your state bucket before your first terraform apply.
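With versioning enabled, the corrupted-state recovery (scenario 2) reduces to three AWS CLI calls: list the versions, download the previous one, and upload it as the new current version. The function below is a sketch under assumptions, not a tested runbook: restore_prior_state is a hypothetical name, it assumes the version immediately before the current one is good, and it should only run when no other Terraform operation is active on that state.

```shell
# restore_prior_state BUCKET KEY
# Re-uploads the second-newest version of KEY as the new current version
# and prints the version id that was restored.
restore_prior_state() {
  local bucket="$1" key="$2" version_id tmp
  # Versions of a key are listed newest first, so index 1 is the version
  # immediately before the current one.
  version_id=$(aws s3api list-object-versions --bucket "$bucket" --prefix "$key" \
    --query "Versions[?Key=='${key}'] | [1].VersionId" --output text) || return 1
  if [ -z "$version_id" ] || [ "$version_id" = "None" ]; then
    echo "no prior version found for s3://$bucket/$key" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  aws s3api get-object --bucket "$bucket" --key "$key" \
    --version-id "$version_id" "$tmp" >/dev/null || return 1
  aws s3api put-object --bucket "$bucket" --key "$key" --body "$tmp" \
    >/dev/null || return 1
  rm -f "$tmp"
  echo "$version_id"
}
```

Because the restore is itself a new version, the corrupted object stays in the version history for later analysis. Follow it with terraform plan to confirm the restored state matches live infrastructure.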
Section 8 - Anti-Patterns and Results
Six Anti-Patterns to Eliminate
| Anti-Pattern | Symptom | Root Cause | Fix | Guard |
|---|---|---|---|---|
| Copy-paste modules | Duplicated aws_ssoadmin_* blocks across stacks | No wrapper pattern discipline | source = "nnthanh101/terraform-aws/aws//modules/sso" (inherit, don't copy) | ADR-007 |
| Bare-metal tool installs | CI fails because tool version differs from local | No container discipline | All tool tasks run via _exec inside terraform-aws-dev | _exec in Taskfile.yml |
| var.* in resource blocks | for_each = var.sso_users in resource directly | Skipping locals transform layer | Transform in locals.tf first: for_each = local.users_and_their_groups | Code review + locals contract |
| DynamoDB state locking | dynamodb_table = "terraform-lock" in backend | Legacy pattern, pre-Terraform 1.10 | Replace with use_lockfile = true in backend.hcl | ADR-006; grep -ri dynamodb *.hcl |
| Completion claims without evidence | "Done" with no tmp/ artifacts | NATO (No Action, Talk Only) | Every completion claim requires tmp/terraform-aws/ evidence path | task monitor:verify |
| Missing .terraform.lock.hcl in VCS | Provider silently upgrades between CI runs | Lock file in .gitignore | Commit .terraform.lock.hcl; task build:lock-verify blocks PR if missing; task build:lock-upgrade automates upgrade; provider-upgrade.yml detects drift weekly | task build:lock-verify + provider-upgrade.yml |
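The grep guard listed for the DynamoDB anti-pattern only covers .hcl files in the current directory. A slightly broader sketch is shown below; the function name is hypothetical, and the --include option assumes GNU or BSD grep. It searches recursively, also covers .tf and .hcl.example files, and matches only real assignments so that commented-out lines are ignored.

```shell
# find_dynamodb_locking ROOT
# Prints every dynamodb_table assignment in *.tf, *.hcl and *.hcl.example
# files under ROOT. Exit status 0 means a legacy backend setting remains.
find_dynamodb_locking() {
  grep -rnE --include='*.tf' --include='*.hcl' --include='*.hcl.example' \
    '^[[:space:]]*dynamodb_table[[:space:]]*=' "$1"
}
```

In a pipeline the logic is inverted: fail the build when find_dynamodb_locking . succeeds, since a match means a backend still depends on the legacy lock table.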
Quantified Results
The combination of design mindset, S3 native locking, container-first DX, and the production-ready IAM Identity Center module delivers measurable outcomes:
| Outcome | Metric | Mechanism |
|---|---|---|
| Cost savings | $3,000-$9,000/year at 50 accounts | DynamoDB elimination (ADR-006) |
| Onboarding time | Under 5 minutes | _exec container + task ci:quick |
| CI pipeline time | Under 60 seconds for task ci:quick | Containerised parallelism |
| Snapshot test speed | 2-3 seconds per test run | Native terraform test (zero credentials) |
| Module production-readiness | 97-99/100 scored | 8 test scenarios, 8 examples, 17 resource types |
| State corruption incidents | Zero (architectural prevention) | S3 Conditional Writes + key-path isolation |
| Provider skew incidents | Zero (lock file gated by CI) | task build:lock-verify PR gate |
| Compliance readiness | APRA CPS 234 Para 15/36/37 traceable | 4-tier tag taxonomy + data residency enforcement |
Call to Action
The terraform-aws framework is available as a reference implementation demonstrating all patterns described in this post:
- Terraform: >= 1.11.0 | AWS Provider: >= 6.28, < 7.0
- Primary region: ap-southeast-2 | Identity Center: us-east-1
- State bucket: ams-terraform-org-state with S3 native locking, versioning, and MFA Delete
Start with these three files and the framework will constrain failure modes before your first apply:
- backend.hcl.example: copy, set ACCOUNT_ID and MODULE_NAME, run terraform init
- global/global_variables.tf: adopt the 4-tier tag taxonomy as your organisation's standard
- modules/sso/: consume via wrapper pattern for SSO identity governance
For the tagging strategy that underpins the common_tags convention used throughout the framework, see the companion post: Enterprise AWS Tagging Strategy: 4-Tier Taxonomy for FinOps & APRA CPS 234 Compliance.
CloudOps Engineering - OceanSoft Corporation | ap-southeast-2
