MVP need-to-have: release readiness (staging deploy + smoke sign-off) #90

Closed
opened 2026-02-28 04:28:40 +01:00 by architecture-bot · 17 comments

Background:

  • Phase 3 gameplay and the MVP UI need-to-have guards have now been delivered and closed on the board.
  • The next critical step before release is to make the operations track explicit in the issue flow.

Scope (within existing core practice, not new product scope):

  1. The DevOps runner runs a staging deploy of the latest main (after auth preflight).
  2. The test runner runs the smoke suite (healthz + core gameplay flow).
  3. The result is documented briefly in this issue (PASS/FAIL + links).
  4. On FAIL, concrete bug issues are created with the labels smoke-fail + need-to-have.
  5. On PASS, the MVP release candidate is marked ready for the PO go/no-go.

Acceptance criteria:

  • Staging deploy completed and documented
  • Smoke test result documented
  • Any failures have their own issues
  • #16 updated with status

Related: #16 #17
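
The scope steps above amount to an ordered gate. Below is a minimal sketch of that flow, assuming each step can be wrapped as a callable that returns True on success. The names `GateReport` and `run_gate` are hypothetical and not part of the repository, and stopping at the first failing step is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GateReport:
    result: str                       # "PASS" or "FAIL"
    failed_step: str = ""             # name of the first failing step, if any
    followup_labels: List[str] = field(default_factory=list)


def run_gate(steps: List[Tuple[str, Callable[[], bool]]]) -> GateReport:
    """Run the steps in order and stop at the first one that fails."""
    for name, step in steps:
        if not step():
            # On FAIL the issue asks for follow-up issues with these labels.
            return GateReport("FAIL", name, ["smoke-fail", "need-to-have"])
    # On PASS the release candidate is ready for the PO go/no-go.
    return GateReport("PASS")
```

In practice each callable would shell out to the real command (for example the deploy script or the smoke suite) and return whether it exited with status 0.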

architecture-bot added the bot-task, need-to-have, architect, devops, staging labels 2026-02-28 04:28:40 +01:00
Owner

Scheduler sweep (scope #16/#17):

  • DEV auth preflight: OK
  • Open PR check: #108 has an APPROVED review and green required checks; no open review findings to fix.
  • The next need-to-have in scope is #90 (release readiness).

Plan per the flow (dev -> branch -> PR -> review -> merge):

  1. Integrator merges #108 to main (delete source branch).
  2. DevOps/Test runner runs #90: staging deploy -> smoke -> document PASS/FAIL in #90.
  3. Any smoke failures are filed as new need-to-have issues (including a UI task if one is found during smoke).
Member

Release readiness run for issue #90 (2026-02-28 11:00 UTC)

Result: FAIL (blocker)

Run in order:

  1. wpp-mode get -> active: OK (no stop needed)
  2. Auth preflight MANAGE: OK (ok=true)
  3. Staging deploy of the latest main (branch SHA: 850a364251a523521e030485136894d88310a059): FAIL
    • Command: ./infra/staging/deploy_staging.sh main
    • Error: django.db.utils.OperationalError: attempt to write a readonly database
  4. Smoke with REVIEW role/preflight: preflight OK, smoke FAIL
    • REVIEW preflight: ok=true
    • Healthz: OK ({"ok": true, "service": "weirsoe-party-protocol"})
    • Core gameplay smoke (infra/staging/smoke_suite.sh): FAIL
    • Error: django.db.utils.OperationalError: no such column: fupogfakta_player.session_token

Additional verification in CT 143:

  • systemctl is-active wpp-staging.service -> active
  • /opt/wpp-staging/app/db.sqlite3 ownership/perms: root:root 664 (app dir root:root 755), while migrate runs as wpp.
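
The ownership finding explains the readonly error: with root:root 664 on the file and root:root 755 on the directory, a non-root user outside the root group only gets the "other" bits, which carry no write permission. SQLite also needs to create journal files next to the database, so the directory must be writable too. Here is a minimal sketch of that check, assuming classic Unix mode bits only (no ACLs, no root override) and that the file and directory share one owner. The uid 1001 used for wpp in the usage note is a made-up value.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set) -> bool:
    """Return True if the user gets a write bit from the owner/group/other class."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


def sqlite_can_write(file_mode: int, dir_mode: int, owner_uid: int,
                     owner_gid: int, uid: int, gids: set) -> bool:
    """SQLite must write the db file and create journal files in its directory.

    Creating files also needs search (execute) permission on the directory,
    which this sketch does not check.
    """
    return (can_write(file_mode, owner_uid, owner_gid, uid, gids)
            and can_write(dir_mode, owner_uid, owner_gid, uid, gids))
```

With the observed modes, `sqlite_can_write(0o664, 0o755, 0, 0, 1001, {1001})` is False, while the same call with the owner changed to 1001:1001 (the effect of a chown to wpp) is True.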

Follow-up issues created:

  • #130 [smoke-fail] staging gameplay smoke fails (schema mismatch after failed deploy) (labels: smoke-fail, need-to-have, staging)
  • #131 [staging-blocker] deploy_staging.sh main fails: readonly SQLite during migrate (labels: need-to-have, staging, devops)

No PR approval/merge performed.

Member

Release-readiness rerun (MANAGE) for #90 on main SHA 30e3f1c77f9468b235efe12a213025f89e5d0924.

Status: PARTIAL / BLOCKED for stable readiness

  • Auth preflight MANAGE: OK
  • Deploy command ./infra/staging/deploy_staging.sh main: FAIL (unable to open database file)
  • Drift checks identified ownership mismatch (/opt/wpp-staging/app owned by root:root, service user wpp, sqlite db path under app dir).
  • Manual remediation on staging instance succeeded (chown + migrate + restart).
  • Smoke after remediation: PASS (healthz + gameplay flow).

Assessment:

  • Current staging is green after manual fix.
  • Release-readiness is not yet robust because standard deploy path still fails without manual intervention.
  • No release-tag created in this run.
Member

Follow-up: opened #138 to track permanent deploy automation fix (ownership drift in staging deploy path). Keeping #90 as blocker until deploy+smoke pass without manual remediation.

Owner

Smoke test PASS (staging)

Run after today's new staging deploy verification (manager update ~16:03 UTC).
Executed in CT143: infra/staging/smoke_suite.sh

  • healthz: ok
  • migrate --check: ok (no schema drift)
  • gameplay flow (manage.py smoke_staging): ok
Member

Update with a bit of sass:

Yes, staging insisted on being dramatic and falling back to SQLite 😏, but it has now been tamed.

Status now (verified):

  • DB engine in staging = django.db.backends.mysql
  • Service restarted and stable
  • Smoke suite runs green (healthz, migrate --check, gameplay flow)

Important nuance: this was resolved by manual intervention in the environment (env + runtime adjustments), so we still need to make it permanent in the automation/deploy flow to avoid a sequel on the next deploy.
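
One way to make the fix permanent is a fail-fast guard that refuses to proceed when the environment does not select the expected engine, so a silent SQLite fallback cannot happen. This is a minimal sketch, assuming the engine is set through a `DB_ENGINE` line in a KEY=VALUE env file (the variable name appears in the deploy log later in this thread). The helpers `parse_env` and `check_db_engine` are hypothetical and not part of the deploy script.

```python
EXPECTED_ENGINE = "django.db.backends.mysql"


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def check_db_engine(env: dict, expected: str = EXPECTED_ENGINE) -> None:
    """Raise instead of letting the app fall back to another engine."""
    actual = env.get("DB_ENGINE")
    if actual != expected:
        raise RuntimeError(
            f"DB_ENGINE is {actual!r}, expected {expected!r}; refusing to deploy"
        )
```

Running such a check before migrate would turn a misconfigured environment into an explicit deploy failure rather than a schema mismatch discovered during smoke.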

Member

DevOps re-verify (UTC 2026-02-28) for release-readiness on main.

Artifact

  • Git commit (main): ab08303fc35a7145465339d8b1cd149a50a65b7f
  • Deployed archive SHA256 (/opt/wpp-staging/releases/app.tar.gz): dec9a71293820e33fb69e6a01d9875f5a729316d7b3ecfb8d18cfd35791574af

Executed commands (standard path + smoke)

  1. ./infra/staging/deploy_staging.sh
    • Result: FAIL (exit 7)
    • Failing line: curl: (7) Failed to connect to 127.0.0.1 port 8000
  2. Verification right after failure:
    • ssh proxmox-lan "sudo -n pct status 143" -> running
    • ssh proxmox-lan "sudo -n pct exec 143 -- systemctl is-active wpp-staging.service" -> active
    • ssh proxmox-lan "sudo -n pct exec 143 -- curl -fsS http://127.0.0.1:8000/healthz" -> {"ok": true, "service": "weirsoe-party-protocol"}
  3. Smoke:
    • ssh proxmox-lan "sudo -n pct exec 143 -- bash -lc 'cd /opt/wpp-staging/app && ./infra/staging/smoke_suite.sh'"
    • Result: PASS ([smoke] OK, Smoke flow OK ...)

Conclusion

  • Smoke suite passes on staging for the deployed artifact.
  • Release-readiness remains BLOCKED because the canonical deploy command currently exits non-zero due to a race: the health check runs immediately after the restart, before the service accepts connections.

Concrete next step

  • Fix infra/staging/deploy_staging.sh health check to tolerate service warm-up (e.g. bounded retry/backoff) and re-run this exact verification flow.
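
A bounded retry with backoff could look like the sketch below. It is written in Python for illustration only, since the real check lives in the shell script. The function names, attempt count and delays are assumptions, not the values used in the eventual fix.

```python
import json
import time
import urllib.request
from typing import Callable


def healthz_probe(url: str = "http://127.0.0.1:8000/healthz", timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a JSON body containing ok=true."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("ok") is True
    except (OSError, ValueError):
        # Connection refused during warm-up, timeouts and bad JSON all count as "not ready".
        return False


def wait_for_health(probe: Callable[[], bool], attempts: int = 10,
                    initial_delay: float = 0.5, max_delay: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times, doubling the delay up to max_delay."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

The deploy would then fail only if `wait_for_health(healthz_probe)` returns False after all attempts, which tolerates a short warm-up while still bounding the total wait.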
Owner

Scheduler update: the release-readiness blocker from the latest verify is now formalized as #141 (health-check race in deploy_staging.sh). The dev runner has been retasked to deliver a commit + PR artifact for the fix. #90 stays open until canonical deploy + smoke pass without manual intervention.

Owner

Release-readiness update: the blocker fix for the deploy health-check race is now in PR #142 (SHA c4ea5ca20853b2ee6752013881578e03abfb3970). #90 remains open until the PR is reviewed/merged and canonical deploy_staging.sh + smoke are verified green without manual intervention.

Owner

Smoke test PASS (staging)

Run 2026-02-28 17:40 UTC after a new deploy verification on the release-readiness track.
Command (CT143): cd /opt/wpp-staging/app && ISSUE_ON_FAIL=0 ./infra/staging/smoke_suite.sh
Result: healthz OK, migrate --check --noinput OK, manage.py smoke_staging OK.

Owner

Scheduler update: PR #142 is now merged to main (merge SHA 6fd57d1714ee97c2f855e215e5df0728490df19a) and the source branch has been deleted. The next release-readiness step for #90 is the canonical staging verify: ./infra/staging/deploy_staging.sh + smoke suite, without manual intervention.

Owner

Canonical staging verify rerun (dev-runner) on main after PR #142 merge.

  • UTC: 2026-02-28 17:51 UTC
  • Target SHA: 6fd57d1714ee97c2f855e215e5df0728490df19a
  • Deploy command: ./infra/staging/deploy_staging.sh
  • Smoke command (CT143): cd /opt/wpp-staging/app && ISSUE_ON_FAIL=0 ./infra/staging/smoke_suite.sh

Evidence:

[deploy] host=proxmox-lan CT_ID=143 REF=main
[deploy] loaded staging env: /opt/wpp-staging/.env.staging
DB_ENGINE=django.db.backends.mysql
Running migrations:
  No migrations to apply.
[deploy] health check passed: http://127.0.0.1:8000/healthz
[deploy] OK: staging deploy complete for CT 143 (main)

[smoke] healthz check: http://127.0.0.1:8000/healthz
[smoke] migration consistency check
[smoke] gameplay flow via management command
Smoke flow OK for session 95VX5K
[smoke] OK

Result: PASS for canonical deploy+smoke gate on current main. #90 is ready to close from dev side.
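
The PASS verdict can be derived mechanically from the evidence log. The following is a minimal sketch, assuming the marker lines stay exactly as shown above (`[deploy] OK...`, `[smoke] OK`, `Smoke flow OK for session ...`). The function name `classify_evidence` is hypothetical.

```python
import re


def classify_evidence(log: str) -> dict:
    """Derive a PASS/FAIL verdict and the smoke session id from deploy+smoke output."""
    lines = [line.strip() for line in log.splitlines()]
    deploy_ok = any(line.startswith("[deploy] OK") for line in lines)
    smoke_ok = "[smoke] OK" in lines
    match = re.search(r"Smoke flow OK for session (\w+)", log)
    return {
        "deploy_ok": deploy_ok,
        "smoke_ok": smoke_ok,
        "session": match.group(1) if match else None,
        # The gate passes only when both the deploy and the smoke suite report OK.
        "result": "PASS" if deploy_ok and smoke_ok else "FAIL",
    }
```

Relying on exit codes remains the more robust signal; parsing the log is mainly useful for attaching a short summary (verdict plus session id) to the issue.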

Owner

Scheduler closure: canonical release-readiness gate is now satisfied on main (deploy + smoke PASS at SHA 6fd57d1714ee97c2f855e215e5df0728490df19a, see latest evidence above).

Closing #90 and treating MVP release-readiness as complete. Next step stays with PO/architect go-no-go in the release window.

Owner

Smoke test PASS (staging)

Run 2026-02-28 18:39 UTC after a new canonical deploy verification on #90.
Command (CT143): cd /opt/wpp-staging/app && ISSUE_ON_FAIL=0 ./infra/staging/smoke_suite.sh
Result: healthz OK, migrate --check --noinput OK, manage.py smoke_staging OK.
Artifact: session NRT3TG.


Staging smoke PASS — 2026-03-15 11:15 UTC

  • healthz: OK
  • migration consistency (manage.py migrate --check --noinput): OK
  • gameplay smoke (manage.py smoke_staging): OK
  • staging app mtime observed before run: 2026-02-28T17:50:33Z

No smoke-fail issue created.


Staging smoke PASS, run 2026-03-16 03:06 UTC: /healthz OK, migrate --check OK, gameplay smoke_staging OK (session P2UFQN). A new staging deploy was observed via the /opt/wpp-staging/app mtime (2026-03-15 19:16:39 UTC), so this run covers the latest deploy.


Staging smoke PASS — 2026-03-17 20:04 UTC

  • New staging deploy observed since the last test run (/opt/wpp-staging/app mtime: 2026-03-17 06:54:48 UTC).
  • /healthz: OK
  • manage.py migrate --check --noinput: OK
  • manage.py smoke_staging: OK (session XE95EF)
  • Artifact: /opt/wpp-staging/app/artifacts/smoke/smoke-20260317T200420Z.json

No smoke-fail issue created.
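
The recurring runs decide whether a deploy is "new" by comparing the app directory mtime with the previous smoke run. Below is a minimal sketch of that comparison and of an artifact name in the same shape as the one listed above. The helper names are hypothetical, and the filename pattern is inferred from the single example in this thread.

```python
import os
from datetime import datetime, timezone


def path_mtime_utc(path: str) -> datetime:
    """Read a path's modification time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)


def deploy_is_newer(app_mtime: datetime, last_smoke_run: datetime) -> bool:
    """A deploy needs a fresh smoke run if the app dir changed after the last run."""
    return app_mtime > last_smoke_run


def artifact_name(run_time: datetime) -> str:
    """Build a name like smoke-20260317T200420Z.json from the run time (UTC)."""
    stamp = run_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"smoke-{stamp}.json"
```

Directory mtime only changes when entries directly inside the directory are added, removed or renamed, so it is a heuristic; a deploy marker file written by the deploy script would be a more explicit signal.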


Reference: wpp/weirsoe-party-protocol#90