Highlights
What we build in L64:
A production multi-stage Dockerfile that shrinks the VAIA agent image to ~180 MB while keeping all Gemini, ChromaDB, and FastAPI dependencies intact.
A Docker Compose stack that runs the full local dev environment (agent + Redis + ChromaDB) with a single command, sharing the same service contracts the CI tests from L63 verify.
A GitHub Actions CD workflow that chains L63’s CI jobs → docker build → Trivy CVE scan → registry push → kubectl apply, with hard failure gates at each stage.
Kubernetes Deployment, Service, ConfigMap, and HPA manifests that form the serving target for L65’s locust load tests.
Connection to L63: The CD pipeline imports ci.yml as a prerequisite job. Docker build only triggers after L63’s pytest suite (tool schema checks, prompt robustness tests, agent logic assertions) passes clean. The same GEMINI_API_KEY and CHROMA_HOST values verified in CI become Kubernetes Secrets consumed by the Deployment.
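The hard-gate behaviour of the CD chain can be illustrated with a small Python sketch. This is not the workflow file itself: the stage names and the lambda stand-ins are hypothetical, and the only property shown is that a failing stage prevents every later stage from starting.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    name: str
    passed: bool


def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> List[StageResult]:
    """Run stages in order and stop at the first failure (a hard gate)."""
    results: List[StageResult] = []
    for name, check in stages:
        passed = check()
        results.append(StageResult(name, passed))
        if not passed:
            break  # later stages never start
    return results


# Hypothetical stand-ins for the real jobs; each returns True on success.
stages = [
    ("ci", lambda: True),
    ("docker-build", lambda: True),
    ("trivy-scan", lambda: False),  # simulate a CVE finding that fails the gate
    ("registry-push", lambda: True),
    ("kubectl-apply", lambda: True),
]

results = run_pipeline(stages)
executed = [r.name for r in results]
```

In this simulated run the scan fails, so `executed` stops at `trivy-scan` and neither the registry push nor the deploy is attempted.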
Enables L65: The containerized pod exposes /health, /metrics (Prometheus), and /v1/agent/infer at a fixed port, giving L65’s FastAPI serving layer a stable, load-testable surface. The HPA stub in k8s/hpa.yaml is intentionally left at a low CPU threshold so L65’s locust traffic immediately triggers scale-out.
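To see why a low CPU threshold makes scale-out kick in quickly, here is a simplified Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × observed / target), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The 30% target and the replica bounds in the usage lines are assumed values for illustration, not the contents of k8s/hpa.yaml, and the real controller also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_pct: float,
                     target_cpu_pct: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale proportionally to observed/target utilization."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))


# Assumed numbers: 2 replicas, 30% CPU target, bounds of 2..10.
under_load = desired_replicas(2, 90, 30, min_replicas=2, max_replicas=10)
near_target = desired_replicas(2, 32, 30, min_replicas=2, max_replicas=10)
```

With a 30% target, load that pushes average CPU to 90% already asks for three times the current replica count, which is the effect the low threshold is meant to produce under locust traffic.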
Architecture Context
L64 sits at the hinge point of Module 5. L61–L63 built the code quality and testing machinery; L64 packages the agent as an immutable, scannable artifact and installs the delivery rails. L65–L67 assume a running, containerized agent and focus entirely on serving characteristics — throughput, latency, and autoscaling.
The architecture follows an immutable-image model: every commit produces a uniquely tagged image (sha-<git_sha>, plus a semantic version tag on release). No environment-specific code lives inside the image; environment differences are injected at runtime via a ConfigMap (non-sensitive env vars) and a Secret (API keys). This pattern eliminates “works on my machine” failures: the same image, bit for bit, runs locally via Compose, in staging, and in prod.
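A minimal Python sketch of both halves of this model follows. The helper `image_tags` and the `Settings` class are hypothetical names introduced for illustration; the only names taken from the lesson are the `sha-` tag prefix and the GEMINI_API_KEY and CHROMA_HOST variables. Whether the tag uses the full or the short commit SHA is not specified here, so the function simply uses whatever it is given.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def image_tags(git_sha: str, release_version: Optional[str] = None) -> List[str]:
    """Every commit gets sha-<git_sha>; releases also get a semantic version tag."""
    tags = [f"sha-{git_sha}"]
    if release_version:
        tags.append(release_version)
    return tags


@dataclass(frozen=True)
class Settings:
    """Runtime configuration read from the environment, never baked into the image."""
    gemini_api_key: str
    chroma_host: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        required = ("GEMINI_API_KEY", "CHROMA_HOST")
        missing = [key for key in required if not env.get(key)]
        if missing:
            # Fail at startup rather than on the first request.
            raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
        return cls(gemini_api_key=env["GEMINI_API_KEY"],
                   chroma_host=env["CHROMA_HOST"])


# Example values are placeholders, not real credentials or hosts.
settings = Settings.from_env({"GEMINI_API_KEY": "dummy-key", "CHROMA_HOST": "chromadb"})
```

Because the process only ever reads the environment, the same code path works whether the values come from a Compose `environment:` block or from a ConfigMap and Secret mounted as env vars in the pod.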
The Kubernetes layer is intentionally minimal at this stage: one Deployment with two replicas, a ClusterIP Service, and an Ingress stub. L65 upgrades the HPA and adds load-balanced serving strategies on top of this base.
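For orientation, the shape of the Deployment and Service described above can be sketched as Python dictionaries that serialize to JSON, which kubectl accepts alongside YAML. This is an illustration only, not the lesson's manifest files: the app name, port, ConfigMap and Secret names, and the use of /health as a readiness probe are all assumptions.

```python
import json

APP = "vaia-agent"  # hypothetical resource name
PORT = 8000         # hypothetical container port


def deployment(image: str) -> dict:
    """A two-replica Deployment that takes all configuration from the environment."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": {"app": APP}},
            "template": {
                "metadata": {"labels": {"app": APP}},
                "spec": {
                    "containers": [{
                        "name": APP,
                        "image": image,
                        "ports": [{"containerPort": PORT}],
                        # Assumed object names; env vars come from both sources.
                        "envFrom": [
                            {"configMapRef": {"name": f"{APP}-config"}},
                            {"secretRef": {"name": f"{APP}-secrets"}},
                        ],
                        "readinessProbe": {
                            "httpGet": {"path": "/health", "port": PORT},
                        },
                    }],
                },
            },
        },
    }


def service() -> dict:
    """A ClusterIP Service that routes to the Deployment's pods by label."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP},
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": APP},
            "ports": [{"port": 80, "targetPort": PORT}],
        },
    }


manifest_json = json.dumps(deployment("registry.example.com/vaia-agent:sha-abc1234"), indent=2)
```

The Service selector and the pod template labels must match, which is why both are derived from the same `APP` constant in this sketch.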


