goddard bfd6771a9a
Initial Mist scaffold
Successor to the Josh Steam prototypes. Single-VM Docker Compose stack with
the load-bearing core/ logic ported from JoshSteam CDN with bug fixes.

Contents:
- backend/  FastAPI + Celery (same image, two entrypoints)
            core/  hdiff, librsync, chain_replay, manifest, compression,
                   discord, steam, unrealpak, paths
            api/   auth, catalog, admin, builds (skeletons) + downloads (real)
            worker/  Celery factory replacing the missing prototype Tasks/__init__.py
            db/    SQLAlchemy models + Alembic initial migration
- admin-web/  SvelteKit + Tailwind skeleton
- client/    Tauri 2 + Svelte skeleton (Mist placeholder UI)
- mistpipe/  click-based admin CLI with subcommand stubs
- docs/      ARCHITECTURE, DECISIONS (9 ADRs), RUNBOOK
- docker-compose.yml + dev overlay + .github/workflows

Bugs fixed during port:
- Routes/download.py:2 stray backslash on import line
- Utils/celery.py inspect.reserved() missing parens + double active() typo
- Hardcoded OneDrive/Desktop paths replaced with pydantic-settings config
- Discord webhook URL + RabbitMQ password moved to env vars
- Missing Tasks/__init__.py reconstructed as worker/__init__.py

Out of scope for this commit: route bodies, UI screens, mistpipe subcommand
bodies, real image builds.
2026-06-07 19:39:25 -04:00


Mist — Operational Runbook

A short, dense reference for "what do I do when X happens." Fill in as we hit real situations.

Backend VM access

  • SSH: ssh mist@<vm-tailnet-name> (or <vm-tailnet-ip>)
  • Compose lives at: /opt/mist/ (TODO: confirm during deploy)
  • All docker compose commands run from that directory.

Normal operations

Deploy a new image

CI pushes images to GHCR on merge to main. To pull and restart:

cd /opt/mist
docker compose pull
docker compose up -d
docker compose ps

View logs

docker compose logs -f api
docker compose logs -f worker
docker compose logs --tail=200 api worker

Restart the stack

cd /opt/mist
docker compose down
docker compose up -d

Restart a single service

docker compose restart worker

Run a Celery task manually (debugging)

docker compose exec api python -c "from mist.worker.tasks import generate_direct_update; generate_direct_update.delay('Satisfactory', '1.0.0.0', '1.0.0.1')"

Failure scenarios

NAS is unreachable

Symptoms: worker tasks fail with FileNotFoundError for /mnt/nas/..., API /downloads/* returns 404 for non-cached files.

Action:

  1. Verify NAS reachability from the VM: ls /mnt/nas/mist/games/
  2. If empty/error, NFS mount is broken. Check mount: mount | grep nas
  3. Remount: sudo mount -a (assuming /etc/fstab has the entry)
  4. If still broken, log into NAS, verify it's serving NFS
  5. Stack will recover automatically once NAS is back; in-flight jobs will retry per Celery config
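
Steps 1 and 2 above can be folded into one small check. This is a minimal sketch, not part of the Mist codebase: the function name nas_status is hypothetical, and the default path is the one used in step 1.

```shell
# Report whether a directory looks like a healthy NAS mount.
# Prints one of: missing, empty, ok.
nas_status() {
    dir="${1:-/mnt/nas/mist/games}"
    if [ ! -d "$dir" ]; then
        echo "missing"                        # mount point itself is gone
    elif [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "empty"                          # directory exists but lists nothing
    else
        echo "ok"
    fi
}
```

Usage: nas_status /mnt/nas/mist/games. A hard-mounted NFS share can block on ls when the server is down, so wrapping the call in timeout may be worth it on the VM.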

Postgres won't start

Symptoms: api container restarts in a loop, logs show connection refused to postgres.

Action:

  1. docker compose logs postgres — look for the actual error
  2. Common cause: out of disk space. df -h on the VM.
  3. If corrupted volume: stop stack, restore from last pg_dump (see "Restore from backup")
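
For the disk-space check in step 2, a filter over df output saves reading the table by eye. This is a sketch under stated assumptions: full_mounts is a hypothetical helper, and the 90% default threshold is an arbitrary choice.

```shell
# Print mount points whose usage is at or above a threshold percentage.
# Reads `df -P` output on stdin: column 5 is capacity, column 6 the mount point.
full_mounts() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)                 # "95%" -> "95"
            if (use + 0 >= limit + 0) print $6, $5
        }'
}
```

Usage: df -P | full_mounts 90. Mount points containing spaces are not handled.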

Worker queue is backed up

Symptoms: Builds take forever, RabbitMQ UI (http://<vm>:15672/) shows growing queue depth.

Action:

  1. Check worker logs for stuck tasks
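  2. Scale workers: edit docker-compose.yml, set deploy.replicas: 2 under the worker service, then run docker compose up -d
  3. If a specific task is hanging, stop it by ID: docker compose exec worker celery -A mist.worker control terminate SIGTERM <task-id> (use control revoke <task-id> for a task that is still queued). celery -A mist.worker purge is different: it discards every waiting message in the queue and does not stop a task that is already running, so use it only to drop the whole backlog.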
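
Queue depth can also be read from the command line instead of the RabbitMQ UI. The sketch below filters the tab-separated output of rabbitmqctl list_queues name messages. The helper name deep_queues, the threshold of 100, and the Compose service name rabbitmq are assumptions, not taken from the Mist compose file.

```shell
# Print queues whose message count is at or above a threshold.
# Reads `rabbitmqctl list_queues name messages` output on stdin.
# Header and informational lines are skipped because they do not have
# exactly two tab-separated fields with a numeric second field.
deep_queues() {
    awk -F '\t' -v limit="${1:-100}" '
        NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 >= limit + 0 { print $1 ": " $2 }'
}
```

Usage: docker compose exec -T rabbitmq rabbitmqctl list_queues name messages | deep_queues 100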

Cache disk is full

Symptoms: Build jobs fail with OSError: [Errno 28] No space left on device.

Action:

  1. df -h to confirm
  2. docker compose exec api python -m mist.core.paths --clear-cache (TODO: implement this maintenance task)
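  3. Or manually: stop the stack, empty the cache volume as root with sudo sh -c 'rm -rf /var/lib/docker/volumes/mist_cache-vol/_data/*' (confirm the volume name with docker volume ls first), then restart. The sh -c wrapper matters: a plain sudo rm leaves the glob to the unprivileged shell, which normally cannot read /var/lib/docker.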
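
Until the maintenance task in step 2 exists, a gentler alternative to wiping the whole volume is to delete only files that have not been modified recently. This is a sketch, assuming that age-based eviction is acceptable for this cache. prune_cache is a hypothetical helper and the 7-day default is arbitrary.

```shell
# Delete regular files under a cache directory that are older than N days.
# $1 = cache directory, $2 = age in days (default 7). Prints each deleted path.
prune_cache() {
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    # -mtime +N matches files last modified more than N whole days ago.
    find "$1" -mindepth 1 -type f -mtime +"${2:-7}" -print -delete
}
```

Usage (as root, with the stack stopped): prune_cache /var/lib/docker/volumes/mist_cache-vol/_data 7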

Stack won't come back up after VM reboot

Symptoms: SSH in after reboot, docker compose ps shows nothing or services are Exited.

Action:

  1. Verify Docker daemon: systemctl status docker
  2. cd /opt/mist && docker compose up -d
  3. If still failing, check that restart: unless-stopped is set on every service in docker-compose.yml
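
The restart-policy check in step 3 can be done against the created containers rather than by reading the YAML. A minimal sketch: missing_restart_policy is a hypothetical helper that filters "name policy" pairs as produced by docker inspect.

```shell
# Print container names whose restart policy is not "unless-stopped".
# Reads lines of "<name> <policy>" on stdin, e.g. from:
#   docker inspect --format '{{.Name}} {{.HostConfig.RestartPolicy.Name}}' \
#       $(docker compose ps -aq)
missing_restart_policy() {
    awk '$2 != "unless-stopped" { print $1 }'
}
```

No output means every container of the project carries the expected policy.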

Backups

What we back up

  • Postgres (full dump) — daily
  • Mist/.env (passwords, secrets) — versioned outside this repo
  • docker-compose.yml and any host-level config — in git

What we DON'T back up here

  • Game files on NAS — NAS has its own backup story (assumed RAID + remote replication)
  • Hot cache — regenerable from NAS

Take a Postgres backup

docker compose exec -T postgres pg_dump -U mist mist | zstd > /mnt/nas/mist/backups/pg-$(date +%F).sql.zst
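
Daily dumps accumulate on the NAS, so some retention step is likely needed. The runbook does not define a retention policy. The sketch below assumes "keep the newest N dumps" and relies on the pg-YYYY-MM-DD.sql.zst naming from the command above. prune_backups and the default of 14 are hypothetical.

```shell
# Keep only the newest N Postgres dumps in a backup directory.
# $1 = backup directory, $2 = number of dumps to keep (default 14).
prune_backups() {
    dir="$1"
    keep="${2:-14}"
    # ISO dates sort lexicographically, so a reverse sort lists newest first;
    # tail then yields everything after the first $keep entries.
    printf '%s\n' "$dir"/pg-*.sql.zst | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
        if [ -f "$f" ]; then
            rm -f -- "$f"
        fi
    done
}
```

Usage: prune_backups /mnt/nas/mist/backups 14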

Restore from a Postgres backup

docker compose stop api worker
zstd -d < /mnt/nas/mist/backups/pg-YYYY-MM-DD.sql.zst | docker compose exec -T postgres psql -U mist mist
docker compose start api worker

Provisioning a new friend account

(Until the admin portal supports this end-to-end.)

docker compose exec api python -m mist.scripts.create_user <username> <password> [--admin]

(TODO: implement that script.)

Resetting your admin password

docker compose exec api python -m mist.scripts.reset_password <username> <new-password>

(TODO: implement that script.)

Health checks (manual)

curl -s https://api.mist.example/healthz   # expect {"ok": true}
curl -s -o /dev/null -w '%{http_code}\n' https://api.mist.example/readyz    # expect 200 if DB/Redis/RabbitMQ are all reachable
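
After a deploy or restart the API may need a few seconds before /readyz succeeds, so polling is more useful than a single request. This is a sketch: retry and ready are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary.

```shell
# Run a command up to N times, sleeping between failed attempts.
# $1 = attempts, $2 = delay in seconds, remaining args = command to run.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Succeeds only if the URL answers with a non-error HTTP status.
# -f makes curl exit non-zero on HTTP 4xx/5xx responses.
ready() {
    curl -fsS -o /dev/null "$1"
}
```

Usage: retry 10 3 ready https://api.mist.example/readyz && echo "stack is ready"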