Initial Mist scaffold

Successor to the JoshSteam prototypes. Single-VM Docker Compose stack with
the load-bearing core/ logic ported from JoshSteam CDN, with bug fixes.

Contents:
- backend/  FastAPI + Celery (same image, two entrypoints)
            core/  hdiff, librsync, chain_replay, manifest, compression,
                   discord, steam, unrealpak, paths
            api/   auth, catalog, admin, builds (skeletons) + downloads (real)
            worker/  Celery factory replacing the missing prototype Tasks/__init__.py
            db/    SQLAlchemy models + Alembic initial migration
- admin-web/  SvelteKit + Tailwind skeleton
- client/    Tauri 2 + Svelte skeleton (Mist placeholder UI)
- mistpipe/  click-based admin CLI with subcommand stubs
- docs/      ARCHITECTURE, DECISIONS (9 ADRs), RUNBOOK
- docker-compose.yml + dev overlay + .github/workflows

Bugs fixed during port:
- Routes/download.py:2 stray backslash on import line
- Utils/celery.py inspect.reserved() missing parens + double active() typo
- Hardcoded OneDrive/Desktop paths replaced with pydantic-settings config
- Discord webhook URL + RabbitMQ password moved to env vars
- Missing Tasks/__init__.py reconstructed as worker/__init__.py

Out of scope for this commit: route bodies, UI screens, mistpipe subcommand
bodies, real image builds.
2026-06-07 19:39:25 -04:00
commit bfd6771a9a
76 changed files with 3890 additions and 0 deletions
@@ -0,0 +1,151 @@
# Mist — Operational Runbook
A short, dense reference for "what do I do when X happens." Fill in as we hit real situations.
## Backend VM access
SSH: `ssh mist@<vm-tailnet-name>` (or `<vm-tailnet-ip>`).
Compose lives at: `/opt/mist/` (TODO: confirm during deploy).
All `docker compose` commands run from that directory.
## Normal operations
### Deploy a new image
CI pushes images to GHCR on merge to `main`. To pull and restart:
```sh
cd /opt/mist
docker compose pull
docker compose up -d
docker compose ps
```
### View logs
```sh
docker compose logs -f api
docker compose logs -f worker
docker compose logs --tail=200 api worker
```
### Restart the stack
```sh
cd /opt/mist
docker compose down
docker compose up -d
```
### Restart a single service
```sh
docker compose restart worker
```
### Run a Celery task manually (debugging)
```sh
docker compose exec api python -c "from mist.worker.tasks import generate_direct_update; generate_direct_update.delay('Satisfactory', '1.0.0.0', '1.0.0.1')"
```
## Failure scenarios
### NAS is unreachable
**Symptoms:** worker tasks fail with `FileNotFoundError` for `/mnt/nas/...`, API `/downloads/*` returns 404 for non-cached files.
**Action:**
1. Verify NAS reachability from the VM: `ls /mnt/nas/mist/games/`
2. If empty/error, NFS mount is broken. Check mount: `mount | grep nas`
3. Remount: `sudo mount -a` (assuming `/etc/fstab` has the entry)
4. If still broken, log into NAS, verify it's serving NFS
5. Stack will recover automatically once NAS is back; in-flight jobs will retry per Celery config
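
Steps 1 to 3 can be scripted as a quick pre-check. This is a minimal sketch, assuming a Linux VM where `/proc/mounts` is readable and that the NAS is mounted at `/mnt/nas` (inferred from the paths above). The names `find_mount` and `check_nas` are hypothetical and not part of the Mist codebase.

```python
from typing import Optional


def find_mount(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the filesystem type mounted at mount_point, or None.

    mounts_text is the content of /proc/mounts: one mount per line with
    whitespace-separated fields "device mountpoint fstype options ...".
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later entry shadows an earlier one
    return fstype


def check_nas(mount_point: str = "/mnt/nas") -> str:
    """Summarize the mount state. The default mount point is an assumption."""
    with open("/proc/mounts") as f:
        fstype = find_mount(f.read(), mount_point)
    if fstype is None:
        return f"{mount_point} is not mounted; try: sudo mount -a"
    if not fstype.startswith("nfs"):
        return f"{mount_point} has unexpected filesystem type {fstype}"
    return f"{mount_point} is mounted ({fstype})"
```

Running `print(check_nas())` on the VM gives a one-line verdict. Note that a mount can be listed and still hang if the NAS itself is down, so step 4 still applies.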
### Postgres won't start
**Symptoms:** `api` container restarts in a loop, logs show `connection refused` to `postgres`.
**Action:**
1. `docker compose logs postgres` — look for the actual error
2. Common cause: out of disk space. `df -h` on the VM.
3. If the volume is corrupted: stop the stack, then restore from the last `pg_dump` (see "Restore from a Postgres backup")
### Worker queue is backed up
**Symptoms:** Builds take forever, RabbitMQ UI (`http://<vm>:15672/`) shows growing queue depth.
**Action:**
1. Check worker logs for stuck tasks
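2. Scale workers: edit `docker-compose.yml`, set `worker.deploy.replicas: 2`, then run `docker compose up -d`
3. If a specific task is hanging, find its ID with `docker compose exec worker celery -A mist.worker inspect active`, then terminate it: `docker compose exec worker celery -A mist.worker control terminate SIGTERM <task-id>`. Do not reach for `celery -A mist.worker purge` here: it discards every message still waiting in the queue and leaves already-running tasks untouched.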
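
Queue depth can also be read without the UI. The sketch below assumes that port 15672 is served by the RabbitMQ management plugin and that its HTTP API (`GET /api/queues`) is reachable with basic-auth credentials you supply. The function names are hypothetical, and nothing here is confirmed by the Mist configuration.

```python
import base64
import json
import urllib.request


def queue_depths(queues_json: str) -> dict:
    """Map queue name -> total message count from a /api/queues response body."""
    return {q["name"]: q.get("messages", 0) for q in json.loads(queues_json)}


def fetch_queues(base_url: str, user: str, password: str) -> str:
    """Fetch /api/queues from the management API using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url}/api/queues",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode()
```

For example, `queue_depths(fetch_queues("http://<vm>:15672", user, password))` returns a dict that can be printed or compared against a threshold of your choosing.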
### Cache disk is full
**Symptoms:** Build jobs fail with `OSError: [Errno 28] No space left on device`.
**Action:**
1. `df -h` to confirm
2. `docker compose exec api python -m mist.core.paths --clear-cache` (TODO: implement this maintenance task)
3. Or manually: stop stack, `rm -rf /var/lib/docker/volumes/mist_cache-vol/_data/*`, restart
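
The `df -h` check in step 1 can be approximated from Python, which may be handy once the maintenance task from step 2 exists. This is a sketch only: the 90% threshold and the function names are assumptions, and the result may differ slightly from the `Use%` column of `df`, which accounts for reserved blocks.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def cache_is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """True when the filesystem holding path is at or above threshold percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.total, usage.used) >= threshold
```

Call `cache_is_nearly_full(...)` with whatever path the cache volume is mounted at inside the container; that path is not specified in this runbook.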
### Stack won't come back up after VM reboot
**Symptoms:** After SSHing in post-reboot, `docker compose ps -a` shows nothing or shows services as Exited.
**Action:**
1. Verify Docker daemon: `systemctl status docker`
2. `cd /opt/mist && docker compose up -d`
3. If still failing, check `restart: unless-stopped` is set on all services in `docker-compose.yml`
## Backups
### What we back up
- Postgres (full dump) — daily
- `Mist/.env` (passwords, secrets) — versioned outside this repo
- `docker-compose.yml` and any host-level config — in git
### What we DON'T back up here
- Game files on NAS — NAS has its own backup story (assumed RAID + remote replication)
- Hot cache — regenerable from NAS
### Take a Postgres backup
```sh
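# --clean --if-exists makes the dump drop existing objects first, so it can be restored over a populated database
docker compose exec -T postgres pg_dump -U mist --clean --if-exists mist | zstd > /mnt/nas/mist/backups/pg-$(date +%F).sql.zst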
```
### Restore from a Postgres backup
```sh
docker compose stop api worker
zstd -d < /mnt/nas/mist/backups/pg-YYYY-MM-DD.sql.zst | docker compose exec -T postgres psql -U mist mist
docker compose start api worker
```
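
To fill in the `YYYY-MM-DD` placeholder in the restore command, the newest dump can be picked by the date embedded in its filename. A minimal sketch, assuming every backup follows the `pg-YYYY-MM-DD.sql.zst` naming produced by the backup command above; `latest_backup` is a hypothetical helper.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Matches the names produced by the backup command: pg-YYYY-MM-DD.sql.zst
BACKUP_RE = re.compile(r"^pg-(\d{4})-(\d{2})-(\d{2})\.sql\.zst$")


def latest_backup(filenames: Iterable[str]) -> Optional[str]:
    """Return the filename carrying the most recent valid date, or None."""
    best = None  # (date, filename) of the newest backup seen so far
    for name in filenames:
        m = BACKUP_RE.match(name)
        if not m:
            continue  # unrelated file
        try:
            stamp = date(*map(int, m.groups()))
        except ValueError:
            continue  # right shape but not a real calendar date
        if best is None or stamp > best[0]:
            best = (stamp, name)
    return best[1] if best else None
```

On the VM this would be called as `latest_backup(os.listdir("/mnt/nas/mist/backups"))` after `import os`.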
## Provisioning a new friend account
(Until the admin portal supports this end-to-end.)
```sh
docker compose exec api python -m mist.scripts.create_user <username> <password> [--admin]
```
(TODO: implement that script.)
## Resetting your admin password
```sh
docker compose exec api python -m mist.scripts.reset_password <username> <new-password>
```
(TODO: implement that script.)
## Health checks (manual)
```sh
curl -s https://api.mist.example/healthz # expect {"ok": true}
curl -s -o /dev/null -w '%{http_code}\n' https://api.mist.example/readyz   # prints the status code; expect 200 if DB/Redis/RabbitMQ all reachable
```
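
The same two checks can be scripted for use in a cron job or a deploy step. This sketch assumes the response shapes noted in the comments above (`{"ok": true}` for `/healthz`, a plain 200 for `/readyz`); the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def http_get(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (status_code, body). HTTP error statuses are returned, not raised;
    connection failures still raise urllib.error.URLError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


def healthz_ok(status: int, body: str) -> bool:
    """True when /healthz answered 200 with a JSON object whose "ok" is true."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def readyz_ok(status: int) -> bool:
    """True when /readyz answered 200."""
    return status == 200
```

Typical use: `healthz_ok(*http_get("https://api.mist.example/healthz"))`.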