Initial Mist scaffold

Successor to the Josh Steam prototypes. Single-VM Docker Compose stack with
the load-bearing core/ logic ported from JoshSteam CDN with bug fixes.

Contents:
- backend/ FastAPI + Celery (same image, two entrypoints)
    core/ hdiff, librsync, chain_replay, manifest, compression,
          discord, steam, unrealpak, paths
    api/ auth, catalog, admin, builds (skeletons) + downloads (real)
    worker/ Celery factory replacing the missing prototype Tasks/__init__.py
    db/ SQLAlchemy models + Alembic initial migration
- admin-web/ SvelteKit + Tailwind skeleton
- client/ Tauri 2 + Svelte skeleton (Mist placeholder UI)
- mistpipe/ click-based admin CLI with subcommand stubs
- docs/ ARCHITECTURE, DECISIONS (9 ADRs), RUNBOOK
- docker-compose.yml + dev overlay + .github/workflows

Bugs fixed during port:
- Routes/download.py:2 stray backslash on import line
- Utils/celery.py inspect.reserved() missing parens + double active() typo
- Hardcoded OneDrive/Desktop paths replaced with pydantic-settings config
- Discord webhook URL + RabbitMQ password moved to env vars
- Missing Tasks/__init__.py reconstructed as worker/__init__.py

Out of scope for this commit: route bodies, UI screens, mistpipe subcommand
bodies, real image builds.
@@ -0,0 +1,151 @@
# Mist — Operational Runbook

A short, dense reference for "what do I do when X happens?" Extend it as we hit real situations.

## Backend VM access

- SSH: `ssh mist@<vm-tailnet-name>` (or `<vm-tailnet-ip>`).
- Compose lives at: `/opt/mist/` (TODO: confirm during deploy).
- All `docker compose` commands run from that directory.

## Normal operations

### Deploy a new image

CI pushes images to GHCR on merge to `main`. To pull the new images and restart the affected services:

```sh
cd /opt/mist
docker compose pull     # fetch the images CI pushed
docker compose up -d    # recreate only containers whose image or config changed
docker compose ps       # confirm every service is back up
```
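
To script the final `docker compose ps` check, a Python sketch along these lines could flag services that did not come back. It assumes `docker compose ps --format json` is available and that each entry carries `Service` and `State` keys. Depending on the Compose version the output is either a single JSON array or one JSON object per line, so the parser accepts both. The function names are illustrative and not part of Mist.

```python
import json


def parse_compose_ps(output: str) -> list[dict]:
    """Parse `docker compose ps --format json` output (array or one object per line)."""
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        # Some Compose versions emit one JSON array for all containers.
        return json.loads(text)
    # Others emit one JSON object per line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def not_running(containers: list[dict]) -> list[str]:
    """Return the service names whose State is anything other than "running"."""
    return [c.get("Service", "?") for c in containers if c.get("State") != "running"]
```

Feed it the output of `docker compose ps -a --format json`; without `-a`, exited containers are not listed and would go unnoticed.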

### View logs

```sh
docker compose logs -f api                  # follow the API
docker compose logs -f worker               # follow the Celery worker
docker compose logs --tail=200 api worker   # last 200 lines of both, no follow
```

### Restart the stack

```sh
cd /opt/mist
docker compose down
docker compose up -d
```

### Restart a single service

```sh
docker compose restart worker
```

### Run a Celery task manually (debugging)

```sh
docker compose exec api python -c "from mist.worker.tasks import generate_direct_update; generate_direct_update.delay('Satisfactory', '1.0.0.0', '1.0.0.1')"
```
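
The one-liner above only enqueues the task and returns. When debugging it is often more useful to wait for the outcome, which a small helper like the following could do. This is a sketch: `run_and_wait` is a hypothetical name, and it assumes the Celery app has a result backend configured (without one, `.get()` cannot return a value).

```python
def run_and_wait(task, *args, timeout=600):
    """Queue a Celery task, print its ID, and block until it returns or times out."""
    result = task.delay(*args)  # returns an AsyncResult immediately
    print(f"queued {result.id}")
    # .get() needs a configured result backend; it re-raises the task's
    # exception on failure and raises a timeout error after `timeout` seconds.
    return result.get(timeout=timeout)
```

From a Python shell inside the `api` container this would be called as `run_and_wait(generate_direct_update, 'Satisfactory', '1.0.0.0', '1.0.0.1')` after importing the task from `mist.worker.tasks`.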

## Failure scenarios

### NAS is unreachable

**Symptoms:** Worker tasks fail with `FileNotFoundError` for `/mnt/nas/...`, and API `/downloads/*` returns 404 for files that are not cached.

**Action:**
1. Verify NAS reachability from the VM: `ls /mnt/nas/mist/games/`
2. If the listing is empty or errors out, the NFS mount is broken. Check the mount: `mount | grep nas`
3. Remount: `sudo mount -a` (assuming `/etc/fstab` has the entry)
4. If it is still broken, log into the NAS and verify it is serving NFS.
5. The stack recovers automatically once the NAS is back; in-flight jobs retry per the Celery config.
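
Steps 1 and 2 can be combined into one scripted check, for example from cron or a monitoring hook. The sketch below assumes the NFS export is mounted at `/mnt/nas` (the runbook only shows paths beneath it) and uses the `games/` directory from step 1 as the probe. `check_nas` is a hypothetical helper, not existing Mist code.

```python
import os


def check_nas(mount_point="/mnt/nas", probe_dir="/mnt/nas/mist/games"):
    """Return a list of problems; an empty list means the NAS looks usable."""
    problems = []
    # Assumption: the NFS export is mounted exactly at `mount_point`.
    if not os.path.ismount(mount_point):
        problems.append(f"{mount_point} is not a mount point")
    try:
        entries = os.listdir(probe_dir)
    except OSError as exc:
        problems.append(f"cannot list {probe_dir}: {exc}")
    else:
        if not entries:
            problems.append(f"{probe_dir} is empty")
    return problems
```

An empty return value corresponds to a healthy mount; anything else points at step 3 (remount) or step 4 (check the NAS itself).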

### Postgres won't start

**Symptoms:** The `api` container restarts in a loop and its logs show `connection refused` to `postgres`.

**Action:**
1. Run `docker compose logs postgres` and look for the actual error.
2. Common cause: out of disk space. Run `df -h` on the VM.
3. If the volume is corrupted: stop the stack and restore from the last `pg_dump` (see "Restore from a Postgres backup").
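
Because a full disk is the common cause, a periodic check can catch it before Postgres falls over. The following is a minimal sketch using only the standard library; the 90% threshold and the name `disk_report` are assumptions, and the fraction can differ slightly from the `Use%` column of `df`, which accounts for reserved blocks differently.

```python
import shutil


def disk_report(path="/", warn_at=0.90):
    """Report how full the filesystem holding `path` is and whether to warn."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "used_fraction": round(used_fraction, 3),
        "free_gib": round(usage.free / 2**30, 1),
        "warn": used_fraction >= warn_at,
    }
```

Calling `disk_report("/var/lib/docker")` would cover the filesystem that holds the Docker volumes on a default install.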

### Worker queue is backed up

**Symptoms:** Builds take far longer than usual and the RabbitMQ UI (`http://<vm>:15672/`) shows growing queue depth.

**Action:**
1. Check the worker logs for stuck tasks.
2. Scale workers: edit `docker-compose.yml`, set `worker.deploy.replicas: 2`, then run `docker compose up -d`.
3. If a task is hanging and the backlog is disposable, purge the queue: `docker compose exec worker celery -A mist.worker purge`. This discards every waiting message, not only the stuck one, and it does not stop a task that is already running, so affected builds have to be re-queued afterwards.
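
Queue depth can also be read without the browser through the RabbitMQ management HTTP API, which serves a JSON list of queues at `/api/queues` on the same port as the UI. The sketch below assumes the management plugin is enabled (the UI URL above suggests it is) and that credentials come from the environment; the threshold of 100 and both function names are illustrative.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password, timeout=5):
    """GET <base_url>/api/queues from the RabbitMQ management API."""
    req = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def backed_up(queues, threshold=100):
    """Return (name, depth) for queues holding at least `threshold` messages."""
    return [
        (q["name"], q.get("messages", 0))
        for q in queues
        if q.get("messages", 0) >= threshold
    ]
```

For example, `backed_up(fetch_queues("http://<vm>:15672", user, password))` returns an empty list when nothing is piling up.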

### Cache disk is full

**Symptoms:** Build jobs fail with `OSError: [Errno 28] No space left on device`.

**Action:**
1. Run `df -h` to confirm.
2. `docker compose exec api python -m mist.core.paths --clear-cache` (TODO: implement this maintenance task)
3. Or manually: stop the stack, run `sudo sh -c 'rm -rf /var/lib/docker/volumes/mist_cache-vol/_data/*'`, then restart. The glob has to expand as root because `/var/lib/docker` is normally not readable by other users.
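
Since the `--clear-cache` task in step 2 is still a TODO, here is one possible shape for its core, offered as a sketch only. `clear_cache` is a hypothetical function, the cache directory is passed in by the caller because its in-container path is not documented here, and the default is a dry run so that nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir, dry_run=True):
    """Remove everything inside `cache_dir` but keep the directory itself.

    Returns the entries that were (or, in a dry run, would be) removed.
    """
    root = Path(cache_dir)
    removed = []
    # sorted() snapshots the listing so deleting while looping is safe.
    for entry in sorted(root.iterdir()):
        removed.append(entry)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()  # files and symlinks
    return removed
```

Unlike `rm -rf dir/*`, this also removes dotfiles at the top level of the cache directory.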

### Stack won't come back up after VM reboot

**Symptoms:** After SSHing in post-reboot, `docker compose ps` shows nothing or lists services as Exited.

**Action:**
1. Verify the Docker daemon is running and enabled at boot: `systemctl status docker` and `systemctl is-enabled docker`
2. `cd /opt/mist && docker compose up -d`
3. If it is still failing, check that `restart: unless-stopped` is set on all services in `docker-compose.yml`. With that policy, a container that was stopped manually before the reboot stays stopped.

## Backups

### What we back up

- Postgres (full dump): daily
- `Mist/.env` (passwords, secrets): versioned outside this repo
- `docker-compose.yml` and any host-level config: in git

### What we DON'T back up here

- Game files on NAS: the NAS has its own backup story (assumed RAID + remote replication)
- Hot cache: regenerable from NAS

### Take a Postgres backup

```sh
docker compose exec -T postgres pg_dump -U mist mist | zstd > /mnt/nas/mist/backups/pg-$(date +%F).sql.zst
```

### Restore from a Postgres backup

```sh
# Assumes the target database is empty: a plain pg_dump taken without --clean
# does not drop existing objects, so replaying it over live tables will error.
docker compose stop api worker
zstd -d < /mnt/nas/mist/backups/pg-YYYY-MM-DD.sql.zst | docker compose exec -T postgres psql -U mist mist
docker compose start api worker
```
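
Picking the right `pg-YYYY-MM-DD.sql.zst` file, and noticing when the daily dump has stopped running, can be automated. The sketch below relies only on the file naming used by the backup command above; the function names and the one-day staleness limit are assumptions.

```python
import re
from datetime import date
from pathlib import Path

BACKUP_RE = re.compile(r"^pg-(\d{4}-\d{2}-\d{2})\.sql\.zst$")


def latest_backup(backup_dir):
    """Return (path, date) of the newest pg-YYYY-MM-DD.sql.zst dump, or None."""
    best = None
    for path in Path(backup_dir).iterdir():
        match = BACKUP_RE.match(path.name)
        if not match:
            continue
        try:
            stamp = date.fromisoformat(match.group(1))
        except ValueError:  # matches the pattern but is not a real date
            continue
        if best is None or stamp > best[1]:
            best = (path, stamp)
    return best


def is_stale(backup_date, today=None, max_age_days=1):
    """True if the newest dump is older than `max_age_days`."""
    today = today or date.today()
    return (today - backup_date).days > max_age_days
```

`latest_backup("/mnt/nas/mist/backups")` gives the file to substitute into the restore command, and `is_stale` on its date indicates whether the daily job has been skipped.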

## Provisioning a new friend account

(Until the admin portal supports this end-to-end.)

```sh
docker compose exec api python -m mist.scripts.create_user <username> <password> [--admin]
```

(TODO: implement that script.)

## Resetting your admin password

```sh
docker compose exec api python -m mist.scripts.reset_password <username> <new-password>
```

(TODO: implement that script.)

## Health checks (manual)

```sh
curl -s https://api.mist.example/healthz                                   # expect {"ok": true}
curl -s -o /dev/null -w '%{http_code}\n' https://api.mist.example/readyz   # expect 200 if DB/Redis/RabbitMQ all reachable
```
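
The same two probes can be run from a script, for instance after a deploy. This is a sketch using only the standard library: it assumes `/healthz` returns a JSON object with an `ok` field and that `/readyz` signals readiness purely through its status code, as the comments above describe. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def probe(url, timeout=5):
    """Return (status_code, body); status_code is None if the request never completed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:  # 4xx/5xx still carry a status and body
        return exc.code, exc.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError) as exc:
        return None, str(exc)


def is_healthy(status, body):
    """True only for HTTP 200 with a JSON body of the form {"ok": true}."""
    if status != 200:
        return False
    try:
        return json.loads(body).get("ok") is True
    except (ValueError, AttributeError):  # not JSON, or JSON that is not an object
        return False


def check(base_url, timeout=5):
    """Run both probes and return {"healthz": bool, "readyz": bool}."""
    health_status, health_body = probe(f"{base_url}/healthz", timeout)
    ready_status, _ = probe(f"{base_url}/readyz", timeout)
    return {"healthz": is_healthy(health_status, health_body), "readyz": ready_status == 200}
```

`check("https://api.mist.example")` returning `True` for both keys corresponds to the two expected results listed above.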