[smoke-fail] staging gameplay smoke fails (schema mismatch after failed deploy) #130

Closed
opened 2026-02-28 12:01:40 +01:00 by manager-bot · 4 comments
Member

Automatically created in connection with issue #90 release readiness.

Failure during smoke (infra/staging/smoke_suite.sh):
django.db.utils.OperationalError: no such column: fupogfakta_player.session_token

Context:

  • healthz was OK
  • deploy of main first failed on migration with a read-only SQLite database
  • gameplay smoke then fails because of a schema mismatch between code and DB
manager-bot added the need-to-have, smoke-fail, staging labels 2026-02-28 12:01:40 +01:00
Owner

Downstream mitigation landed via PR #135.

Context:

  • #131 root cause included shipping a tracked db.sqlite3 in source archives, which could leave stale/root-owned SQLite state in staging during deploy.
  • PR #135 removes tracked db.sqlite3, reducing schema drift risk after deploy and aligning with MySQL-only staging expectations.

Artifacts:

  • Commit: 12fc12f955618b895f14e2109e24409206bc2fd6
  • PR: #135

After merge, staging deploy + smoke should be re-run to verify the schema mismatch symptom is gone end-to-end.
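
Since staging is expected to be MySQL-only, a deploy could guard against a stale SQLite configuration before running migrations. The following is a minimal sketch, not the project's actual script: it assumes the database is configured through an unquoted `DATABASE_URL=` line in an env file, and the function names `database_scheme` and `require_database_scheme` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard; the env-file layout and function names are assumptions.

# Prints the URL scheme of the last DATABASE_URL entry in the given env file.
database_scheme() {
    local env_file="$1" line url
    line="$(grep -E '^DATABASE_URL=' "$env_file" 2>/dev/null | tail -n 1 || true)"
    [ -n "$line" ] || return 1
    url="${line#DATABASE_URL=}"
    # Everything before the first ':' is the scheme (e.g. sqlite, mysql).
    echo "${url%%:*}"
}

# Fails when the configured scheme differs from the expected one (default: mysql).
require_database_scheme() {
    local env_file="$1" expected="${2:-mysql}" scheme
    scheme="$(database_scheme "$env_file")" || {
        echo "DATABASE_URL not set in $env_file" >&2
        return 1
    }
    if [ "$scheme" != "$expected" ]; then
        echo "unexpected database scheme '$scheme' (expected '$expected')" >&2
        return 1
    fi
}
```

A deploy script could call `require_database_scheme <env-file> mysql` early and abort on a non-zero exit, so a leftover SQLite URL is reported before any migration runs.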
Owner

Follow-up fix pushed for #130 (also supports #90):

  • Added schema consistency pre-check in infra/staging/smoke_suite.sh:
    • manage.py migrate --check --noinput
    • fails with explicit schema drift message before gameplay smoke
  • Updated lobby.management.commands.smoke_staging to send session_token on lie/guess submit (matches current API contract)
  • Added regression test SmokeStagingCommandTests.test_smoke_staging_command_runs_full_flow
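
The schema pre-check described above could look roughly like the sketch below. This is an illustration under assumptions, not the code merged in PR #137: the function name `schema_precheck`, the `MANAGE_CMD` override, and the exact wording of the message are hypothetical. It relies on `manage.py migrate --check` exiting non-zero when unapplied migrations exist.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check helper; the real smoke_suite.sh may differ.

# MANAGE_CMD is an assumed override hook so the check can be exercised without Django.
MANAGE_CMD="${MANAGE_CMD:-python3 manage.py}"

schema_precheck() {
    # `migrate --check` exits non-zero when unapplied migrations are detected.
    # MANAGE_CMD is intentionally unquoted so "python3 manage.py" splits into words.
    if ! $MANAGE_CMD migrate --check --noinput >/dev/null 2>&1; then
        echo "schema drift: unapplied migrations detected, skipping gameplay smoke" >&2
        return 1
    fi
    echo "schema OK"
}
```

Calling `schema_precheck || exit 1` before the gameplay steps would turn a late `no such column` error into an early, explicit drift message.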

Checks run locally:

  • python3 -m py_compile lobby/management/commands/smoke_staging.py lobby/tests.py
  • bash -n infra/staging/smoke_suite.sh
  • bash -n infra/staging/deploy_staging.sh
  • .venv/bin/python manage.py test lobby.tests.SmokeStagingCommandTests lobby.tests.LieSubmissionTests.test_submit_lie_requires_session_token -v 2 (PASS)

PR: #137
Commit: abb656d

Author
Member

DevOps drift verification run (MANAGE) completed on staging CT143.

Result: manual remediation PASS, automation still FAIL-prone.

What happened:

  1. ./infra/staging/deploy_staging.sh main failed with django.db.utils.OperationalError: unable to open database file.
  2. Read-only verification showed:
    • service uses DATABASE_URL=sqlite:////opt/wpp-staging/app/db.sqlite3 (/etc/wpp/staging.env)
    • /opt/wpp-staging/app was owned by root:root after deploy copy step
    • db.sqlite3 missing, and service runs as user wpp
  3. Manual drift remediation on CT143:
    • chown -R wpp:wpp /opt/wpp-staging/app
    • manage.py migrate --noinput
    • restart service
  4. Smoke rerun passed: infra/staging/smoke_suite.sh -> healthz OK + gameplay flow OK.

Conclusion:

  • #130 symptom is mitigated on current staging instance.
  • Root cause now appears to be deploy ownership drift (copy as root without post-copy ownership fix), which can re-break next deploy unless fixed in deploy automation.
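
The missing post-copy ownership fix could be automated along these lines. This is a hedged sketch rather than the change that eventually landed: the function names `ownership_drifted` and `fix_ownership` are hypothetical, the default path and user are taken from the verification notes above, and `stat -c` assumes GNU coreutils on a Linux host.

```shell
#!/usr/bin/env bash
# Hypothetical post-copy step; function names are assumptions.

APP_DIR="${APP_DIR:-/opt/wpp-staging/app}"
APP_USER="${APP_USER:-wpp}"

# Returns 0 when the directory owner differs from the expected service user.
# A directory that cannot be inspected yields an empty owner and counts as drifted.
ownership_drifted() {
    local dir="$1" expected="$2" actual
    actual="$(stat -c '%U' "$dir" 2>/dev/null || true)"   # GNU stat, Linux only
    [ "$actual" != "$expected" ]
}

# Restores ownership only when drift is detected, so a healthy deploy stays untouched.
fix_ownership() {
    local dir="$1" user="$2"
    if ownership_drifted "$dir" "$user"; then
        echo "ownership drift on $dir, restoring $user:$user" >&2
        chown -R "$user:$user" "$dir"
    fi
}
```

In a deploy script this would run as `fix_ownership "$APP_DIR" "$APP_USER"` right after the copy step and before `manage.py migrate --noinput`, mirroring the manual remediation order.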

Resolved by merged PR #140 (env bootstrap in staging smoke suite). Closing this blocker; release-readiness tracking continues in #90.

Reference: wpp/weirsoe-party-protocol#130