BC1: CloudOps Architecture
Status: CA-APPROVED (96%) | Date: 2026-03-07 | Business Case: BC1 CloudOps/DevOps/FinOps (xOps)
This architecture was reviewed and approved by the cloud-architect agent at 96% agreement. Open questions are tracked at the bottom of this page.
Stack Overview
| Layer | Technology | TF Module | Status |
|---|---|---|---|
| L1 Identity | IAM Identity Center (SCIM 2.0 + OIDC) | terraform-aws-iam-identity-center | PUBLISHED v1.3.0 |
| L2 Compute | ECS Fargate (Graviton3 ARM64) | terraform-aws-ecs | PUBLISHED v1.1.0 |
| L3 Delivery Edge | ALB + CloudFront + WAFv2 + ACM | terraform-aws-web | RQ5a MVP v1.0.2 |
| L4 Data + AI | Aurora PG16 + OpenSearch + Redis | GAP (not started) | Blocked |
| L5 Application | FastAPI 0.115+ + Claude API | ECS service (express-service sub-module) | Pattern exists |
| L6 AI Interface | Open WebUI 0.6+ | ECS service (express-service sub-module) | Pattern exists |
Layer Decision Table
L1: IAM Identity Center
| Decision | Rationale |
|---|---|
| Selected: IAM Identity Center (AWS-native, free) | SCIM 2.0 + OIDC, multi-account console access, $0/mo |
| Not Cognito | User-pool only; no SSO federation for multi-account console access |
| Not Auth0/Okta SaaS | Data sovereignty violation (APRA CPS 234 §50) |
| ADLC mapping | Settings layer: SCPs + permission sets = compliance boundaries |
Module: oceansoft/iam-identity-center/aws v1.3.0
L2: ECS Fargate (Graviton3)
| Decision | Rationale |
|---|---|
| Selected: ECS Fargate with ARM64 Graviton3 | Serverless containers, ~20% cheaper vs x86, no control plane |
| Not EKS | Control plane overhead, unnecessary complexity for this container workload profile |
| Not Lambda | 15-min timeout, cold starts, no persistent WebSocket support |
| ADLC mapping | Commands & Hooks layer: ECS task lifecycle hooks |
Module: oceansoft/ecs/aws v1.1.0
L3: Delivery Edge
| Decision | Rationale |
|---|---|
| Selected: ALB + CloudFront + WAFv2 | 450+ PoPs with zero ops, WAF rules at the edge, ACM-managed TLS |
| Not API Gateway | Not designed for WebSocket streaming (Open WebUI live token streaming) |
| Not Nginx/HAProxy | Operational overhead; CloudFront provides PoP distribution with zero ops |
| ADLC mapping | Settings layer: WAF rules + ACM certs = compliance boundaries at the edge |
Module: oceansoft/web/aws v1.0.2 (RQ5a MVP)
L4: Data + AI
| Decision | Rationale |
|---|---|
| Selected: Aurora Serverless v2 PG16 + pgvector + OpenSearch Serverless + ElastiCache Redis 7 | Auto-scales from 0.5 to 128 ACU, SQL joins for FinOps queries, native vector search |
| Not RDS PostgreSQL | No auto-scaling; Aurora Serverless v2 scales in seconds |
| Not DynamoDB | No pgvector; SQL joins essential for FinOps cost analysis queries |
| Not Pinecone/Weaviate | Data sovereignty violation (APRA CPS 234) |
BC1 production is blocked on three gap modules: terraform-aws-aurora-serverless, terraform-aws-opensearch-serverless, and terraform-aws-elasticache-serverless. Run task sprint:validate to see current gap status.
L5: FastAPI Application
| Decision | Rationale |
|---|---|
| Selected: FastAPI 0.115+ on ECS Fargate | Native async, Anthropic prompt caching, model-agnostic |
| Not Lambda | Cold starts kill streaming UX |
| Not Django/Flask | WSGI-first frameworks; async support is retrofitted rather than native |
Key feature: the OPENAI_API_BASE_URL env var toggles between Ollama (local dev) and the Claude API (prod), so no code change is required.
L6: Open WebUI 0.6+
| Decision | Rationale |
|---|---|
| Selected: Open WebUI 0.6+ (92k+ GitHub stars) | Out-of-box RBAC, MCP pipelines, multi-model routing, pipeline engine |
| Not AG-UI | AG-UI is a protocol/SDK, not a deployable UI. Building a full CloudOps UI with AG-UI adds ~4 sprints with no differentiated value vs Open WebUI's pipeline engine |
| Not Dify | Workflow builder IDE, not an operator chat interface |
| Not LibreChat | No pipeline/MCP-native automation hooks for xOps use cases |
Key feature: The Open WebUI pipeline engine maps directly to ADLC Commands & Hooks in a browser UI. MCP-native via mcpo sidecar.
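
To make the "Commands & Hooks" mapping concrete, here is a minimal sketch of a pipeline that routes slash commands to handlers. It is modelled on the class shape used in the Open WebUI Pipelines examples (a `Pipeline` class exposing a `pipe` method); check that shape against the version you deploy. The `/cost` command, the `cost_summary` handler, and the `COMMANDS` registry are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List


def cost_summary(args: str) -> str:
    """Hypothetical handler: a real one would call the FinOps backend."""
    return f"cost summary requested for: {args or 'all accounts'}"


# Hypothetical registry mapping a slash command to its handler (the "hook").
COMMANDS: Dict[str, Callable[[str], str]] = {"/cost": cost_summary}


class Pipeline:
    def __init__(self) -> None:
        # Display name for the pipeline (illustrative value).
        self.name = "xOps Command Router"

    async def on_startup(self) -> None:
        # Called when the pipelines server starts; nothing to initialise here.
        pass

    async def on_shutdown(self) -> None:
        # Called when the pipelines server stops; nothing to clean up here.
        pass

    def pipe(self, user_message: str, model_id: str,
             messages: List[dict], body: dict) -> str:
        # Split "/cost prod" into the command and its argument string.
        command, _, args = user_message.strip().partition(" ")
        handler = COMMANDS.get(command)
        if handler is None:
            return f"unknown command: {command}"
        return handler(args.strip())
```

Calling `Pipeline().pipe("/cost prod", "any-model", [], {})` dispatches to `cost_summary` with `"prod"` as its argument. Keeping the registry as a plain dictionary means new operator commands can be added without touching the routing logic.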
Docker-compose Local Dev
Local dev costs $0/month: all services run in containers, with Ollama replacing the Claude API.
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:latest
      ollama:
        image: ollama/ollama:latest
      postgres:
        image: postgres:16-alpine
      chromadb:
        image: chromadb/chroma:latest
      fastapi:
        build: ./app
      redis:
        image: redis:7-alpine
Model swap: set OPENAI_API_BASE_URL=http://ollama:11434/v1 locally; unset for Claude API in non-local environments. No application code changes required.
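
The model swap can be sketched as a small resolver that the application calls once at startup. This is an illustration under stated assumptions, not the project's actual code: the function name `resolve_llm_backend` and the backend labels `"openai-compatible"` and `"claude-api"` are hypothetical. Only the `OPENAI_API_BASE_URL` variable and the Ollama URL come from this page.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_llm_backend(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Pick the LLM backend from OPENAI_API_BASE_URL.

    Returns (backend_label, base_url). The labels are hypothetical names
    used by this sketch only.
    """
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_API_BASE_URL", "").strip()
    if base_url:
        # Variable set (local dev): talk to the OpenAI-compatible endpoint,
        # e.g. the Ollama container from the compose file.
        return ("openai-compatible", base_url.rstrip("/"))
    # Variable unset or empty (non-local): fall back to the Claude API.
    return ("claude-api", None)
```

With `OPENAI_API_BASE_URL=http://ollama:11434/v1` the resolver returns the Ollama endpoint; with the variable unset it selects the Claude API path. Treating an empty value the same as unset avoids surprises when a deployment template exports the variable with no value.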
TF Module Dependency Chain
    L1: terraform-aws-iam-identity-center (PUBLISHED v1.3.0)
      └── SCIM 2.0 application assignment for Open WebUI
    L2: terraform-aws-ecs (PUBLISHED v1.1.0)
      └── cluster_arn, task_exec_iam_role_arn
    L3: terraform-aws-web (RQ5a MVP v1.0.2)
      └── ALB target groups → L2 ECS services
    L4: terraform-aws-aurora-serverless (GAP, not started)
    L4: terraform-aws-opensearch-serverless (GAP, not started)
    L4: terraform-aws-elasticache-serverless (GAP, not started)
    L5/L6: ECS express-service sub-module (pattern exists in L2)
      └── consumes L2 cluster_arn + L3 target group ARNs
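
The chain above can be expressed as a small dependency graph to derive an apply order and to list which gap modules block a given target. This is a sketch under assumptions: the only edge taken directly from the chain is that the express-service sub-module consumes outputs from L2 and L3. The `bc1-production` node and its edges to every module are an assumption added for illustration, as are the status labels.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Status labels are shorthand for the statuses listed in the chain above.
STATUS = {
    "terraform-aws-iam-identity-center": "published",
    "terraform-aws-ecs": "published",
    "terraform-aws-web": "mvp",
    "terraform-aws-aurora-serverless": "gap",
    "terraform-aws-opensearch-serverless": "gap",
    "terraform-aws-elasticache-serverless": "gap",
    "express-service": "pattern",
}

# node -> set of modules it depends on.
DEPENDS_ON = {
    # From the chain: consumes L2 cluster_arn + L3 target group ARNs.
    "express-service": {"terraform-aws-ecs", "terraform-aws-web"},
    # Assumption: a production rollout needs every module listed.
    "bc1-production": set(STATUS),
}


def apply_order():
    """Return one valid order in which dependencies precede dependents."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def blockers(target):
    """Return the gap modules that `target` transitively depends on."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for dep in DEPENDS_ON.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(m for m in seen if STATUS.get(m) == "gap")
```

Under these assumptions, `blockers("bc1-production")` returns the three L4 gap modules, while `blockers("express-service")` returns an empty list, which matches the statement that the L5/L6 pattern already exists and only the data layer is blocked.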
Compliance Mapping (APRA CPS 234)
| CPS 234 Para | Control | Technology |
|---|---|---|
| 50 | Access control | IAM Identity Center MFA + ABAC |
| 53(a) | MFA enforcement | TOTP/FIDO2 via Identity Center |
| 53(b) | Encryption in transit | TLS 1.3 (ALB policy), CloudFront HTTPS-only |
| 57 | Encryption at rest | Aurora KMS CMK, S3 SSE-KMS |
| 53 | Audit trail | CloudTrail + WAFv2 logs → S3 |
Cost Model
| Environment | Infra | AI | Total |
|---|---|---|---|
| LOCAL-DEV | $0 | $0 (Ollama) | $0/month |
| TEST/SIT | Variable (run task plan:cost) | Variable | Run task plan:cost |
| PROD | Variable (run task plan:cost) | Variable | Run task plan:cost |
Non-local costs depend on traffic, replica counts, and Claude API usage. Run task plan:cost against the target environment tfvars to get a current Infracost estimate.
Open Questions
- Aurora Serverless v2 module: who builds it first? This is the BC1 production gap blocker.
- OpenSearch Serverless vector collection module: separate module, or composite with Aurora?
- Bedrock Agent vs direct Claude API: for FinOps pipeline agents, evaluate latency and cost trade-offs at RQ6.