
BC1: CloudOps Architecture

Status: CA-APPROVED (96%) | Date: 2026-03-07 | Business Case: BC1 CloudOps/DevOps/FinOps (xOps)

Architecture Approval

This architecture was reviewed and approved by the cloud-architect agent at 96% agreement. Open questions are tracked at the bottom of this page.

Stack Overview

Layer | Technology | TF Module | Status
L1 Identity | IAM Identity Center (SCIM 2.0 + OIDC) | terraform-aws-iam-identity-center | PUBLISHED v1.3.0
L2 Compute | ECS Fargate (Graviton3 ARM64) | terraform-aws-ecs | PUBLISHED v1.1.0
L3 Delivery Edge | ALB + CloudFront + WAFv2 + ACM | terraform-aws-web | RQ5a MVP v1.0.2
L4 Data + AI | Aurora PG16 + OpenSearch + Redis | GAP: not started | Blocked
L5 Application | FastAPI 0.115+ + Claude API | ECS service (express-service sub-module) | Pattern exists
L6 AI Interface | Open WebUI 0.6+ | ECS service (express-service sub-module) | Pattern exists

Layer Decision Table

L1: IAM Identity Center

Decision | Rationale
Selected: IAM Identity Center (AWS-native, free) | SCIM 2.0 + OIDC, multi-account console access, $0/mo
Not Cognito | User-pool only; no SSO federation for multi-account console access
Not Auth0/Okta SaaS | Data sovereignty violation (APRA CPS 234 §50)
ADLC mapping | Settings layer: SCPs + permission sets = compliance boundaries

Module: oceansoft/iam-identity-center/aws v1.3.0

L2: ECS Fargate (Graviton3)

Decision | Rationale
Selected: ECS Fargate with ARM64 Graviton3 | Serverless containers, ~20% cheaper vs x86, no control plane
Not EKS | Control plane overhead, unnecessary complexity for this container workload profile
Not Lambda | 15-min timeout, cold starts, no persistent WebSocket support
ADLC mapping | Commands & Hooks layer: ECS task lifecycle hooks

Module: oceansoft/ecs/aws v1.1.0

L3: Delivery Edge

Decision | Rationale
Selected: ALB + CloudFront + WAFv2 | 450+ PoPs zero-ops, WAF rules at edge, ACM managed TLS
Not API Gateway | Not designed for WebSocket streaming (Open WebUI live token streaming)
Not Nginx/HAProxy | Operational overhead; CloudFront provides PoP distribution with zero ops
ADLC mapping | Settings layer: WAF rules + ACM certs = compliance boundaries at edge

Module: oceansoft/web/aws v1.0.2 (RQ5a MVP)

L4: Data + AI

Decision | Rationale
Selected: Aurora Serverless v2 PG16 + pgvector + OpenSearch Serverless + ElastiCache Redis 7 | Auto-scale 0.5–128 ACU, SQL joins for FinOps queries, vector search native
Not RDS PostgreSQL | No auto-scale; Aurora Serverless v2 scales in seconds
Not DynamoDB | No pgvector; SQL joins essential for FinOps cost analysis queries
Not Pinecone/Weaviate | Data sovereignty violation (APRA CPS 234)

Gap Blocker

BC1 production is blocked on three gap modules: terraform-aws-aurora-serverless, terraform-aws-opensearch-serverless, and terraform-aws-elasticache-serverless. Run task sprint:validate to see current gap status.
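
The gap check described above can be pictured as a filter over module statuses. The following is a minimal sketch, not the implementation behind task sprint:validate. The statuses are transcribed from the Stack Overview table, while the names MODULE_STATUS, gap_modules, and production_blocked are hypothetical.

```python
# Module statuses transcribed from this page; the dict layout is an assumption.
MODULE_STATUS = {
    "terraform-aws-iam-identity-center": "PUBLISHED v1.3.0",
    "terraform-aws-ecs": "PUBLISHED v1.1.0",
    "terraform-aws-web": "RQ5a MVP v1.0.2",
    "terraform-aws-aurora-serverless": "GAP: not started",
    "terraform-aws-opensearch-serverless": "GAP: not started",
    "terraform-aws-elasticache-serverless": "GAP: not started",
}


def gap_modules(status):
    """Return the names of modules whose status starts with 'GAP', sorted."""
    return sorted(name for name, state in status.items() if state.startswith("GAP"))


def production_blocked(status):
    """Production counts as blocked while any gap module remains."""
    return bool(gap_modules(status))


if __name__ == "__main__":
    for name in gap_modules(MODULE_STATUS):
        print(f"GAP: {name}")
    print("production blocked:", production_blocked(MODULE_STATUS))
```

Once a gap module is published, updating its status string here would remove it from the blocker list.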

L5: FastAPI Application

Decision | Rationale
Selected: FastAPI 0.115+ on ECS Fargate | Native async, Anthropic prompt caching, model-agnostic
Not Lambda | Cold starts kill streaming UX
Not Django/Flask | Not async-first; async support is partial

Key feature: the OPENAI_API_BASE_URL env var toggles between Ollama (local dev) and the Claude API (prod), with no code change required.
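
As an illustration of the toggle, here is a minimal sketch of how the application might choose a backend from the environment. Only the OPENAI_API_BASE_URL variable name comes from this page. The function name select_backend and the returned labels are hypothetical, and how the real service builds its API client is not shown here.

```python
import os

ENV_VAR = "OPENAI_API_BASE_URL"  # variable name taken from this page


def select_backend(env=None):
    """Return (backend_label, base_url) for the given environment mapping.

    If the variable is set and non-empty, requests go to that
    OpenAI-compatible endpoint (for example Ollama in local dev).
    Otherwise the sketch falls back to the Claude API and returns no URL.
    """
    env = os.environ if env is None else env
    base_url = env.get(ENV_VAR, "").strip()
    if base_url:
        return ("openai-compatible", base_url)
    return ("claude", None)


if __name__ == "__main__":
    print(select_backend({ENV_VAR: "http://ollama:11434/v1"}))
    print(select_backend({}))
```

Passing the environment as a parameter keeps the choice testable without mutating os.environ.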

L6: Open WebUI 0.6+

Decision | Rationale
Selected: Open WebUI 0.6+ (92k+ GitHub stars) | Out-of-box RBAC, MCP pipelines, multi-model routing, pipeline engine
Not AG-UI | AG-UI is a protocol/SDK, not a deployable UI. Building a full CloudOps UI with AG-UI adds ~4 sprints with no differentiated value vs Open WebUI's pipeline engine
Not Dify | Workflow builder IDE, not an operator chat interface
Not LibreChat | No pipeline/MCP-native automation hooks for xOps use cases

Key feature: The Open WebUI pipeline engine maps directly to ADLC Commands & Hooks in a browser UI. MCP-native via mcpo sidecar.
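
To make the Commands & Hooks mapping concrete, the sketch below shows a pipeline that routes slash commands to handlers. It assumes the Pipeline class shape used in the Open WebUI Pipelines examples (a name attribute, startup and shutdown hooks, and a pipe method). Verify that shape against the version you deploy. The /cost command and the cost_summary handler are hypothetical.

```python
from typing import Callable, Dict, List


def cost_summary(args: str) -> str:
    # Hypothetical command handler; a real one would query the FinOps data layer.
    return f"cost summary requested for: {args or 'all accounts'}"


class Pipeline:
    def __init__(self) -> None:
        self.name = "xOps Command Router (sketch)"
        # Command table: slash command -> handler taking the argument string.
        self.commands: Dict[str, Callable[[str], str]] = {"/cost": cost_summary}

    async def on_startup(self) -> None:
        # Hook: runs when the pipelines server starts (nothing to do here).
        pass

    async def on_shutdown(self) -> None:
        # Hook: runs when the pipelines server stops (nothing to do here).
        pass

    def pipe(self, user_message: str, model_id: str, messages: List[dict], body: dict) -> str:
        # Split "/cost prod" into the command and its argument string.
        command, _, args = user_message.strip().partition(" ")
        handler = self.commands.get(command)
        if handler is None:
            return f"unknown command: {command}"
        return handler(args.strip())
```

In this sketch each command is a plain function, so adding one is a single new entry in the commands table.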

Docker-compose Local Dev

Local dev is $0/month: all services run in containers with Ollama replacing the Claude API.

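services:
  open-webui:    # L6 chat UI
    image: ghcr.io/open-webui/open-webui:latest

  ollama:        # local stand-in for the Claude API
    image: ollama/ollama:latest

  postgres:      # local PostgreSQL 16
    image: postgres:16-alpine

  chromadb:      # local vector store
    image: chromadb/chroma:latest

  fastapi:       # L5 application, built from ./app
    build: ./app

  redis:         # local Redis 7
    image: redis:7-alpine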

Model swap: set OPENAI_API_BASE_URL=http://ollama:11434/v1 locally; leave it unset in non-local environments to use the Claude API. No application code changes required.

TF Module Dependency Chain

L1: terraform-aws-iam-identity-center (PUBLISHED v1.3.0)
  └── SCIM 2.0 application assignment for Open WebUI

L2: terraform-aws-ecs (PUBLISHED v1.1.0)
  └── cluster_arn, task_exec_iam_role_arn

L3: terraform-aws-web (RQ5a MVP v1.0.2)
  └── ALB target groups → L2 ECS services

L4: terraform-aws-aurora-serverless (GAP: not started)
L4: terraform-aws-opensearch-serverless (GAP: not started)
L4: terraform-aws-elasticache-serverless (GAP: not started)

L5/L6: ECS express-service sub-module (pattern exists in L2)
  └── consumes L2 cluster_arn + L3 target group ARNs

Compliance Mapping (APRA CPS 234)

CPS 234 Para | Control | Technology
50 | Access control | IAM Identity Center MFA + ABAC
53(a) | MFA enforcement | TOTP/FIDO2 via Identity Center
53(b) | Encryption in transit | TLS 1.3 (ALB policy), CloudFront HTTPS-only
57 | Encryption at rest | Aurora KMS CMK, S3 SSE-KMS
53 | Audit trail | CloudTrail + WAFv2 logs → S3

Cost Model

Environment | Infra | AI | Total
LOCAL-DEV | $0 | $0 (Ollama) | $0/month
TEST/SIT | Variable (run task plan:cost) | Variable | Run task plan:cost
PROD | Variable (run task plan:cost) | Variable | Run task plan:cost

Cost Estimation

Non-local costs depend on traffic, replica counts, and Claude API usage. Run task plan:cost against the target environment tfvars to get a current Infracost estimate.

Open Questions

  1. Aurora Serverless v2 module: who builds first? This is the BC1 production gap blocker.
  2. OpenSearch Serverless vector collection module: separate module or composite with Aurora?
  3. Bedrock Agent vs direct Claude API: for FinOps pipeline agents, evaluate latency and cost trade-offs at RQ6.