# Operations

> See also: [Architecture](ARCHITECTURE.md) | [API Reference](API.md) | [Development](DEVELOPMENT.md)

## Deployment Overview

The application runs as two Docker containers:

| Container | Port | Role |
|-----------|------|------|
| `web` | 3000 | Next.js frontend (stateless, no database) |
| `backend` | 3001 | Hono API server (owns SQLite database, runs cron scheduler) |

**Infrastructure requirements:**
- Docker host with Docker Compose
- Container registry (Gitea, Docker Hub, or any OCI-compatible registry)
- Outbound HTTPS access to `d18car1k0ff81h.cloudfront.net` (Six Flags API) and `queue-times.com` (live ride data)
- A reverse proxy (Traefik, nginx, Caddy, etc.) is expected to sit in front for TLS termination and domain routing, but is not included in this repository

See [Architecture](ARCHITECTURE.md) for detailed system design.

---

## Docker Images

### Multi-Stage Build

The project uses a single `Dockerfile` with four stages producing two final images:

```
  builder           backend-deps
  (Next.js build)   (native modules)
      |                   |
      v                   v
    web               backend
  (final)             (final)
```

| Stage | Base | Purpose |
|-------|------|---------|
| `builder` | `node:22-bookworm-slim` | `npm ci` + `npm run build` -- produces Next.js standalone output |
| `backend-deps` | `node:22-bookworm-slim` | Installs `python3`, `make`, `g++` for `better-sqlite3` native compilation, then `npm ci` |
| `web` (final) | `node:22-bookworm-slim` | Copies standalone output from `builder`. Non-root user. ~150MB. |
| `backend` (final) | `node:22-bookworm-slim` | Copies `node_modules` from `backend-deps` + source code. Volume for SQLite. Non-root user. ~200MB. |

### Image Tags

```
{registry}/{owner}/thoosiecalendar:web
{registry}/{owner}/thoosiecalendar:backend
```

### Building Locally

```bash
# Build web image
docker build --target web -t thoosiecalendar:web .

# Build backend image
docker build --target backend -t thoosiecalendar:backend .
```

---

## Docker Compose

The production `docker-compose.yml`:

```yaml
services:
  web:
    image: gitea.thewrightserver.net/josh/thoosiecalendar:web
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - BACKEND_URL=http://backend:3001    # Docker internal networking
      - TZ=America/New_York
    restart: unless-stopped

  backend:
    image: gitea.thewrightserver.net/josh/thoosiecalendar:backend
    ports:
      - "3001:3001"
    volumes:
      - park_data:/app/backend/data         # SQLite database persistence
    environment:
      - NODE_ENV=production
      - TZ=America/New_York                # Timezone for cron schedules
      - PARK_HOURS_STALENESS_HOURS=72      # Hours before re-fetching park data
    restart: unless-stopped

volumes:
  park_data:                               # Named volume for database files
```

**Networking:** The `web` container reaches the backend via Docker's internal DNS at `http://backend:3001`. The backend port is also exposed to the host for manual API access during troubleshooting.

---

## Environment Variables

### Web Container

| Variable | Default | Description |
|----------|---------|-------------|
| `BACKEND_URL` | `http://localhost:3001` | Backend API base URL. Set to `http://backend:3001` in Docker Compose for internal networking. |
| `NODE_ENV` | -- | Set to `production` in Docker. |
| `NEXT_TELEMETRY_DISABLED` | `1` | Disables Next.js telemetry (set in Dockerfile). |
| `PORT` | `3000` | Server listen port (set in Dockerfile). |
| `HOSTNAME` | `0.0.0.0` | Bind address (set in Dockerfile to allow external access). |

### Backend Container

| Variable | Default | Description |
|----------|---------|-------------|
| `TZ` | `UTC` | Process timezone. Controls when cron jobs fire. Set to `America/New_York` in production so schedules align with US Eastern parks. |
| `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is considered stale and re-fetched. Lower values increase API load; higher values increase data lag. |
| `RATE_LIMIT_PER_MIN` | `60` | Maximum requests per minute per client IP for the public API. Over-limit requests return `429 Too Many Requests` with a `Retry-After` header. Enforced by `backend/src/middleware/rate-limit.ts`. Behind a reverse proxy, make sure the proxy sets `X-Forwarded-For`; otherwise every client appears to have the proxy's IP and all traffic shares a single limit. |
| `NODE_ENV` | -- | Set to `production` in Docker. |
| `PORT` | `3001` | Server listen port. |

`backend/src/config.ts` parses and validates these at startup. A bad value (e.g. `PORT=foo`) fails fast with a thrown `Error` rather than surfacing in a request handler later.
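
A minimal sketch of that fail-fast parsing, assuming the config module receives the environment as a plain key/value map. `loadConfig`, `parsePositiveInt`, and the `Config` shape are hypothetical names; only the variable names and defaults come from the tables above.

```typescript
// Hypothetical sketch of fail-fast config parsing; not the actual backend/src/config.ts.
type Env = Record<string, string | undefined>;

interface Config {
  port: number;
  nodeEnv: string | undefined;
  parkHoursStalenessHours: number;
  rateLimitPerMin: number;
}

// Unset or empty -> default. Set but not a positive integer -> throw immediately.
function parsePositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  if (!/^\d+$/.test(raw) || Number(raw) <= 0) {
    throw new Error(`Invalid ${name}: expected a positive integer, got "${raw}"`);
  }
  return Number(raw);
}

function loadConfig(env: Env): Config {
  return {
    port: parsePositiveInt(env, "PORT", 3001),
    nodeEnv: env.NODE_ENV,
    parkHoursStalenessHours: parsePositiveInt(env, "PARK_HOURS_STALENESS_HOURS", 72),
    rateLimitPerMin: parsePositiveInt(env, "RATE_LIMIT_PER_MIN", 60),
  };
}
```

Calling `loadConfig(process.env)` once, before the database is opened, makes a bad value crash the container at boot, where it shows up in `docker compose logs backend`.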

---

## CI/CD Pipeline

### Gitea Actions Workflow

**File:** `.gitea/workflows/deploy.yml`

**Trigger:** Push to `main` branch.

**Steps:**
1. Checkout code (`actions/checkout@v4`)
2. Log in to Gitea container registry
3. Build and push `web` image (`docker/build-push-action@v6`, target: `web`)
4. Build and push `backend` image (`docker/build-push-action@v6`, target: `backend`)

### Required Configuration

| Type | Name | Description |
|------|------|-------------|
| Variable | `REGISTRY` | Container registry URL (e.g. `gitea.thewrightserver.net`) |
| Secret | `REGISTRY_TOKEN` | Authentication token for the registry |

These are configured in the Gitea repository settings under **Settings > Actions > Secrets** and **Settings > Actions > Variables**.

### Setting Up CI/CD from Scratch

1. Create a Gitea repository
2. Add `REGISTRY` as a repository variable (Settings > Actions > Variables)
3. Add `REGISTRY_TOKEN` as a repository secret (Settings > Actions > Secrets)
4. Push to `main` -- the workflow triggers automatically
5. Pull images on your Docker host: `docker compose pull && docker compose up -d`

---

## Initial Deployment Checklist

1. **Create a `docker-compose.yml`** on your Docker host (see the Docker Compose section above, or use the one from the repository).

2. **Pull and start the containers:**
   ```bash
   docker compose pull
   docker compose up -d
   ```

3. **Verify the backend started:**
   ```bash
   docker compose logs backend
   # Look for structured log lines like these (format described under Log Reference):
   #   [INFO] [startup] database initialized
   #   [INFO] [scheduler] cron jobs registered ...
   #   [INFO] [startup] listening url=http://localhost:3001
   ```

4. **Check database status (empty or only partly filled on first run, while the startup scrape is still running):**
   ```bash
   curl http://localhost:3001/api/status
   # { "status": "ok", "database": { "totalDays": 0, ... } }
   ```

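5. **Trigger the initial data scrape (usually optional):**
   ```bash
   curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"
   ```
   On an empty database the backend already starts a scrape on boot (`scrapeToday()` followed by `scrapeFullYear()`; see Startup Behavior), so trigger one manually only if that startup scrape logged errors or you want to run it explicitly. A full scrape covers all 12 months for all 24 parks with a 1-second delay between parks. **Expected duration: 5-10 minutes.**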

6. **Verify data was scraped:**
   ```bash
   curl http://localhost:3001/api/status
   # totalDays should be ~8000-9000
   ```

7. **Open the web UI:** Navigate to `http://your-host:3000`.

The cron scheduler starts automatically and will keep data fresh going forward.

---

## Updating

```bash
docker compose pull && docker compose up -d
```

- The SQLite database lives in a named Docker volume (`park_data`), so it persists across container recreations.
- Schema migrations are applied automatically on backend startup. New columns are added via `ALTER TABLE ... ADD COLUMN` wrapped in try/catch -- if the column already exists, the error is silently caught.
- No manual migration steps are needed.
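
The try/catch migration pattern can be sketched as below. To stay self-contained it uses a hypothetical `Db` interface with a single `exec(sql)` method instead of a real `better-sqlite3` handle, and `addColumnIfMissing` is an illustrative name. It is a slightly stricter variant of the behavior described above: it swallows only SQLite's `duplicate column name` error and rethrows anything else, so a genuine failure (bad SQL, a read-only volume) still surfaces at startup.

```typescript
// Hypothetical minimal handle; better-sqlite3's Database also exposes exec(sql).
interface Db {
  exec(sql: string): void;
}

// Idempotent "add column" migration, safe to run on every startup.
// Returns true if the column was added, false if it already existed.
function addColumnIfMissing(db: Db, table: string, columnDef: string): boolean {
  try {
    db.exec(`ALTER TABLE ${table} ADD COLUMN ${columnDef}`);
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // SQLite reports an existing column as "duplicate column name: <col>".
    if (message.includes("duplicate column name")) return false;
    throw err; // anything else is a real problem
  }
}
```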

---

## Backup and Restore

### What to Back Up

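The SQLite database at `/app/backend/data/parks.db` inside the `park_data` Docker volume, together with its WAL companion files (`parks.db-wal` and `parks.db-shm`). Until SQLite checkpoints them into the main file, recent writes exist only in `parks.db-wal`, so copying `parks.db` alone can lose data. Copying the files one after another while the backend is writing is also not atomic; stopping the backend first avoids an inconsistent snapshot.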

### Backup Methods

**Method 1: Copy from the container**
```bash
docker compose cp backend:/app/backend/data/parks.db ./backup/parks.db
docker compose cp backend:/app/backend/data/parks.db-wal ./backup/parks.db-wal 2>/dev/null
docker compose cp backend:/app/backend/data/parks.db-shm ./backup/parks.db-shm 2>/dev/null
```

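**Method 2: Bind-mount the data directory to the host**
Replace the named volume on the `backend` service in `docker-compose.yml` with a bind mount, so the database files live on the host and can be backed up with ordinary file tools:
```yaml
services:
  backend:
    volumes:
      - ./data:/app/backend/data   # replaces park_data:/app/backend/data
```
The bind mount starts from the contents of `./data`, not from the `park_data` volume, so copy existing database files across first (or re-scrape). The directory must also be writable by the container's non-root user.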

### Restore

1. Stop the backend: `docker compose stop backend`
2. Replace the database files in the volume
3. Restart: `docker compose start backend`

### Note on Reproducibility

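All data is sourced from external APIs and is fully reproducible. If the database is lost, restart the backend: it auto-creates an empty database and, since that database has fewer than 50 rows, runs the startup scrape on its own (see Startup Behavior). To trigger a full scrape explicitly (for example, if the startup scrape logged errors):
```bash
curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"
```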

Backups are recommended for continuity (avoiding the 5-10 minute re-scrape window) but are not critical.

---

## Scheduler Operations

### Tiered Cron Schedule

The backend runs five scraping tiers via `node-cron`:

| Tier | Cron Expression | Schedule | Scope | Delay |
|------|-----------------|----------|-------|-------|
| 1 | `0 * * 3-12 *` | Hourly, March through December | Today's hours for all parks | 500ms |
| 2 | `0 */6 * * *` | Every 6 hours | Current month for all parks | 1000ms |
| 3 | `0 3,15 * * *` | 3 AM and 3 PM | Current + next month | 1000ms |
| 4 | `0 3 * * *` | Daily at 3 AM | Full year (all 12 months) | 1000ms |
| 5 | `*/5 * * * *` | Every 5 minutes | Wait-time samples for currently-open parks into `ride_wait_samples` | parallel chunks of 6 |

**Staleness:** Tiers 2-4 skip any park-month that was scraped within `PARK_HOURS_STALENESS_HOURS` (default 72h). Tier 1 always fetches (uses diff-before-write instead). Tier 5 only samples parks whose `park_days` row marks them open today *and* whose current local time is inside the operating window (with a 1-hour closing buffer).
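
The staleness rule for tiers 2-4 reduces to a timestamp comparison. A minimal sketch, assuming each park-month carries a last-scraped timestamp; `isStale` and its parameters are hypothetical, as is treating a never-scraped month as stale and counting exactly 72 hours as stale.

```typescript
// Hypothetical staleness check for tiers 2-4 (Tier 1 bypasses it entirely).
function isStale(
  lastScrapedAt: Date | null,
  now: Date,
  stalenessHours: number, // PARK_HOURS_STALENESS_HOURS, default 72
): boolean {
  if (lastScrapedAt === null) return true; // never scraped: always fetch
  const ageMs = now.getTime() - lastScrapedAt.getTime();
  return ageMs >= stalenessHours * 60 * 60 * 1000;
}
```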

**Off-season:** Tier 1 only runs from March through December. The month constraint `3-12` in the cron expression skips January and February when most parks are closed. Tier 5 runs year-round but is effectively a no-op when no parks are open.

**Concurrency latches:** Every tier is wrapped in `withLatch()` (see `backend/src/services/scheduler.ts`). If a tick is still running when the next would fire, the new tick is *skipped* and logged with a `previous run still in progress` warning rather than stacking. Each tier has its own latch so a slow Tier-4 doesn't block Tier-5's 5-minute cadence.
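
A minimal sketch of the per-tier latch, assuming each tier's job is an async function. The real `withLatch()` in `backend/src/services/scheduler.ts` may differ in signature and logging; the `onSkip` callback and its `console.warn` default are hypothetical stand-ins for the structured `previous run still in progress` warning.

```typescript
// Wraps an async job so overlapping invocations are skipped instead of stacked.
// Each withLatch() call creates its own independent flag (one per tier).
function withLatch(
  name: string,
  job: () => Promise<void>,
  onSkip: (name: string) => void = (n) =>
    console.warn(`[scheduler.${n}] previous run still in progress`),
): () => Promise<boolean> {
  let running = false;
  return async () => {
    if (running) {
      onSkip(name);
      return false; // tick skipped
    }
    running = true;
    try {
      await job();
      return true; // tick ran
    } finally {
      running = false; // release even if the job throws
    }
  };
}
```

The check and the `running = true` assignment happen synchronously before the first `await`, and Node runs these callbacks on a single thread, so a plain boolean is enough; no mutex is needed.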

**Weather-delayed parks skipped from sampling:** Tier 5 detects the "rides exist but all closed during scheduled hours" case and skips writes for that park, so a storm doesn't poison the uptime statistics with hours of `is_open=0` samples.
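
The weather-delay test can be expressed as a small predicate. A sketch under assumptions: the `RideStatus` shape and the `isWeatherDelayed` name are hypothetical, and the caller is assumed to have already confirmed that the park is inside its scheduled hours.

```typescript
interface RideStatus {
  name: string;
  isOpen: boolean;
}

// True when the park reports rides but none of them are open.
// An empty ride list is not treated as a delay: no data is not "all closed".
function isWeatherDelayed(rides: RideStatus[]): boolean {
  return rides.length > 0 && rides.every((r) => !r.isOpen);
}
```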

### Startup Behavior

On boot, the scheduler checks `getParkDayCount()` against a threshold of 50 rows:

- **Empty / nearly-empty database** (< 50 rows): runs `scrapeToday()` followed by `scrapeFullYear()` in sequence. Logs `[scheduler.startup]` lines for each phase.
- **Populated database** (≥ 50 rows): skips the startup scrape and relies on cron tiers. Logs `skipping startup scrape — relying on cron`.

This replaces the earlier behavior of full-scraping on every container start, which doubled outbound API load and delayed readiness on every deploy.
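
The startup decision is a threshold check followed by two sequential scrapes. A minimal sketch, assuming `getParkDayCount()` counts `park_days` rows; the count and the two scrape functions are passed in as hypothetical parameters rather than imported from the real modules.

```typescript
// Below this many park_days rows, the database counts as empty.
const STARTUP_SCRAPE_THRESHOLD = 50;

// Returns true if a startup scrape ran, false if it was skipped.
async function maybeStartupScrape(
  parkDayCount: number,
  scrapeToday: () => Promise<void>,
  scrapeFullYear: () => Promise<void>,
): Promise<boolean> {
  if (parkDayCount >= STARTUP_SCRAPE_THRESHOLD) {
    return false; // populated: leave it to the cron tiers
  }
  // Sequential, not parallel: today's hours first, then the full-year backfill.
  await scrapeToday();
  await scrapeFullYear();
  return true;
}
```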

### Timezone Sensitivity

Cron expressions execute in the process timezone, controlled by the `TZ` environment variable. In production this is set to `America/New_York` so that "3 AM" aligns with US Eastern time.

The per-park timezone (e.g. `America/Los_Angeles` for Magic Mountain) is used separately for operating window detection -- it does not affect cron schedule timing.

### The 3 AM Switchover

`getTodayLocal()` in `lib/env.ts` implements a 3 AM local-time switchover: before 3 AM, the system considers it "yesterday." This prevents the calendar from flipping to the next day at midnight while park visitors are still out. The switchover uses the server's local time (influenced by `TZ`), not individual park timezones.
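
A sketch of the 3 AM switchover, assuming `getTodayLocal()` returns a `YYYY-MM-DD` string in the process timezone. The optional `now` parameter is a hypothetical addition for testability; the real function in `lib/env.ts` may take no arguments.

```typescript
const SWITCHOVER_HOUR = 3; // before 03:00 local time, "today" is still yesterday

// Returns the operational date as YYYY-MM-DD in the process timezone (TZ).
function getTodayLocal(now: Date = new Date()): string {
  const shifted = new Date(now.getTime());
  if (shifted.getHours() < SWITCHOVER_HOUR) {
    // Step back one calendar day; setDate() handles month and year boundaries.
    shifted.setDate(shifted.getDate() - 1);
  }
  const y = shifted.getFullYear();
  const m = String(shifted.getMonth() + 1).padStart(2, "0");
  const d = String(shifted.getDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}
```

With `TZ=America/New_York`, a call at 02:30 Eastern on April 23, 2026 returns `2026-04-22`; from 03:00 onward it returns `2026-04-23`.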

---

## Manual Scraping

Trigger a scrape at any time via the backend API:

```bash
curl -X POST "http://localhost:3001/api/scrape/trigger?scope=<scope>"
```

### Scope Options

| Scope | Behavior | Duration |
|-------|----------|----------|
| `today` | Fetches today's hours for all 24 parks. Diffs against database before writing. 500ms delay. | ~15s |
| `month` | Current month for all parks. Respects staleness window. 1000ms delay. | ~30s |
| `upcoming` | Current + next month. Respects staleness. | ~1min |
| `full` | All 12 months. Respects staleness. | ~5-10min |
| `force` | All 12 months. **Ignores staleness** -- forces re-fetch of everything. | ~5-10min |

### Response

```json
{
    "scope": "today",
    "fetched": 24,
    "skipped": 0,
    "errors": 0,
    "updated": 3,
    "startedAt": "2026-04-23T14:00:00.000Z",
    "finishedAt": "2026-04-23T14:00:12.000Z"
}
```
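
When scripting against this endpoint, the response can be typed as below. The `ScrapeResult` interface mirrors the example JSON above; the `ScrapeScope` union lists the documented scopes, and the `durationSeconds` helper is a hypothetical convenience for comparing runs against the Duration column.

```typescript
type ScrapeScope = "today" | "month" | "upcoming" | "full" | "force";

// Shape of POST /api/scrape/trigger responses, copied from the example above.
interface ScrapeResult {
  scope: ScrapeScope;
  fetched: number;
  skipped: number;
  errors: number;
  updated: number;
  startedAt: string; // ISO 8601
  finishedAt: string; // ISO 8601
}

function durationSeconds(result: ScrapeResult): number {
  return (Date.parse(result.finishedAt) - Date.parse(result.startedAt)) / 1000;
}
```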

### Typical Use Cases

- **After initial deployment:** `scope=full` to populate the database
- **After an extended outage:** `scope=force` to refresh all data regardless of staleness
- **Investigating a specific park:** `scope=today` refreshes today's hours in ~15s. It still fetches all 24 parks, because no scope targets a single park.
- **Before peak season:** `scope=full` to ensure complete coverage

---

## Health Monitoring

### Health Endpoint

```bash
curl http://localhost:3001/api/status
```

### Response

```json
{
    "status": "ok",
    "uptime": 86400,
    "parks": 24,
    "database": {
        "totalDays": 8760,
        "lastScrape": "2026-04-23T14:00:12.000Z"
    },
    "lastScrapeResult": {
        "scope": "today",
        "fetched": 24,
        "skipped": 0,
        "errors": 0,
        "updated": 3,
        "startedAt": "2026-04-23T14:00:00.000Z",
        "finishedAt": "2026-04-23T14:00:12.000Z"
    }
}
```

### Key Metrics

| Metric | Expected Value | Concern If |
|--------|---------------|------------|
| `status` | `"ok"` | Anything other than `"ok"` (the endpoint currently always returns `"ok"`, so in practice this only confirms it is reachable) |
| `uptime` | Increasing | Drops to 0 (container restarted) |
| `database.totalDays` | 8,000-9,000 (full year) | Much lower (scraping not running) or 0 (empty database) |
| `database.lastScrape` | Within the last hour (during operating season) | More than a few hours old (scheduler may be broken) |
| `lastScrapeResult.errors` | 0 | Consistently high (API may be blocking requests) |

### Suggested Alerting

- Alert if `database.lastScrape` is more than 12 hours old during operating season (March-December)
- Alert if `lastScrapeResult.errors` exceeds 5 on consecutive scrapes
- Alert if the health endpoint is unreachable
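
The alerting rules above can be evaluated from the status payload with a small function. A sketch, assuming the response shape shown under Response; `evaluateStatus`, the trimmed `StatusResponse` type, the nullable `lastScrape`, and having the caller pass in the previous poll's error count are all hypothetical choices.

```typescript
interface StatusResponse {
  status: string;
  uptime: number;
  parks: number;
  database: { totalDays: number; lastScrape: string | null };
  lastScrapeResult: { errors: number } | null;
}

const STALE_SCRAPE_HOURS = 12;
const ERROR_THRESHOLD = 5;

// Returns alert messages; an empty array means healthy.
function evaluateStatus(
  current: StatusResponse,
  previousErrors: number | null, // lastScrapeResult.errors from the previous poll
  now: Date,
): string[] {
  const alerts: string[] = [];
  const month = now.getUTCMonth() + 1; // 1-12 (UTC; fine for a month-level rule)
  const inSeason = month >= 3 && month <= 12;

  if (current.status !== "ok") alerts.push(`status is "${current.status}"`);

  const last = current.database.lastScrape;
  if (inSeason) {
    if (last === null) {
      alerts.push("no scrape recorded");
    } else {
      const ageHours = (now.getTime() - Date.parse(last)) / 3600000;
      if (ageHours > STALE_SCRAPE_HOURS) alerts.push(`last scrape ${ageHours.toFixed(1)}h ago`);
    }
  }

  const errors = current.lastScrapeResult?.errors ?? 0;
  if (errors > ERROR_THRESHOLD && previousErrors !== null && previousErrors > ERROR_THRESHOLD) {
    alerts.push(`errors above ${ERROR_THRESHOLD} on consecutive scrapes (${previousErrors}, ${errors})`);
  }
  return alerts;
}
```

The unreachable-endpoint rule belongs to whatever performs the HTTP request: a failed or timed-out fetch should alert on its own. Polling faster than scrapes finish can see the same scrape twice; comparing `lastScrapeResult.finishedAt` between polls avoids double-counting.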

---

## Troubleshooting

### No data showing in the calendar

1. **Check if the backend is running:**
   ```bash
   docker compose logs backend --tail 50
   ```
   Look for an `[INFO] [startup] listening url=http://localhost:3001` line.

2. **Check if the database has data:**
   ```bash
   curl -s http://localhost:3001/api/status | jq .database.totalDays
   ```
   If 0, trigger a manual scrape: `curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"`

3. **Check the BACKEND_URL in the web container:**
   ```bash
   docker compose exec web env | grep BACKEND_URL
   ```
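   Should be `http://backend:3001`. Not `localhost` (the default when the variable is unset): inside the `web` container, `localhost` refers to the web container itself, not the backend.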

### Ride counts not appearing on the home page

- Ride counts only appear for parks that are **currently within their operating window**, as determined by `isWithinOperatingWindow()`. Outside of park hours, no rides are shown.
- Queue-Times data is cached for 5 minutes. Recent park openings may take up to 5 minutes to appear.
- **Weather delay** (blue badge) means the park is within its hours but all rides report closed -- this is expected during weather-related closures.
- Verify the park has a Queue-Times mapping in `lib/queue-times-map.ts`.

### Stale data / not updating

1. **Check scheduler logs:**
   ```bash
   docker compose logs backend | grep scheduler
   ```
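   You should see periodic lines tagged `[scheduler.tier1]` through `[scheduler.tier5]`, e.g. `[INFO] [scheduler.tier1] scraping today`. Tier 1 lines do not appear in January and February (see Off-season).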

2. **Verify timezone:**
   ```bash
   docker compose exec backend date
   ```
   Should match the `TZ` environment variable (`America/New_York`).

3. **Check staleness threshold:** Data within `PARK_HOURS_STALENESS_HOURS` (default 72h) is skipped by tiers 2-4. If you recently changed park data manually, it may not be re-fetched until the staleness window expires.

4. **Force a refresh:**
   ```bash
   curl -X POST "http://localhost:3001/api/scrape/trigger?scope=force"
   ```

### Rate limited by Six Flags API

- Look for `[rate-limited]` messages in the backend logs:
  ```bash
  docker compose logs backend | grep rate-limited
  ```
- The client uses exponential backoff: 30s, 60s, 120s, then throws a `RateLimitError` and moves to the next park.
- If rate limiting is persistent, increase `PARK_HOURS_STALENESS_HOURS` (e.g. 96 or 120) so tiers 2-4 re-fetch less often. Tier 1 ignores staleness, so its hourly requests are unaffected.
- The inter-park delay is hardcoded at 1000ms (500ms for the today tier) in `backend/src/services/scraper.ts`.
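
The backoff sequence can be sketched as a retry loop. This is an illustration under assumptions, not the code in `lib/scrapers/sixflags.ts`: `fetchWithBackoff`, the injected `doFetch`/`sleep` functions, and this `RateLimitError` constructor are hypothetical; only the 30s/60s/120s schedule, the 429/503 trigger, and the error name come from this document.

```typescript
// Hypothetical error type; the name matches the one this document mentions.
class RateLimitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RateLimitError";
  }
}

// Waits before each retry: 30s, then 60s, then 120s.
const BACKOFF_MS = [30000, 60000, 120000];

// Retries on HTTP 429/503; gives up with RateLimitError after the last wait.
async function fetchWithBackoff(
  doFetch: () => Promise<{ status: number }>,
  sleep: (ms: number) => Promise<void>,
): Promise<{ status: number }> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.status !== 429 && res.status !== 503) return res;
    if (attempt >= BACKOFF_MS.length) {
      // The scraper would catch this and move on to the next park.
      throw new RateLimitError(`still rate-limited after ${attempt} retries`);
    }
    await sleep(BACKOFF_MS[attempt]);
  }
}
```

In production `sleep` would be something like `(ms) => new Promise((r) => setTimeout(r, ms))`; injecting it keeps the schedule testable without actually waiting the full 3.5 minutes.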

### Wrong timezone / incorrect dates

- `getTodayLocal()` uses the server's local time (set by `TZ` env var) with a 3 AM cutover. Before 3 AM, the system considers it "yesterday."
- Each park has its own IANA timezone (stored in `lib/parks.ts`) used for operating window checks. The `TZ` env var only affects cron schedule timing and the "today" determination.
- If dates seem off, check both `TZ` and the server's system clock:
  ```bash
  docker compose exec backend date
  docker compose exec backend node -e "console.log(new Date().toISOString())"
  ```
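
Operating-window checks need the park's local wall-clock time, not the server's. A sketch using the standard `Intl.DateTimeFormat` API; the function signatures, the minutes-since-midnight representation of hours, and the `bufferMinutes` parameter (which would model Tier 5's 1-hour closing buffer) are assumptions, and the real `isWithinOperatingWindow()` may differ. Windows that cross midnight are not handled.

```typescript
// Minutes since local midnight in the given IANA timezone.
function localMinutes(now: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(now);
  // Some engines report midnight as "24" with hour12: false; normalize to 0.
  const hour = Number(parts.find((p) => p.type === "hour")?.value) % 24;
  const minute = Number(parts.find((p) => p.type === "minute")?.value);
  return hour * 60 + minute;
}

// openMin/closeMin are minutes since local midnight (e.g. 10:30 -> 630).
function isWithinOperatingWindow(
  now: Date,
  timeZone: string,
  openMin: number,
  closeMin: number,
  bufferMinutes = 0, // e.g. 60 to model Tier 5's closing buffer
): boolean {
  const t = localMinutes(now, timeZone);
  return t >= openMin && t < closeMin + bufferMinutes;
}
```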

### Database corruption

If the database becomes corrupted (unlikely with SQLite WAL mode, but possible after a hard crash):

1. Stop the backend: `docker compose stop backend`
2. Delete the database files from the volume:
   ```bash
   docker compose run --rm backend rm -f /app/backend/data/parks.db /app/backend/data/parks.db-wal /app/backend/data/parks.db-shm
   ```
3. Restart: `docker compose start backend` (auto-creates empty database)
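4. Re-scrape if needed: the startup scrape repopulates an empty database automatically, or trigger one explicitly with `curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"`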

---

## Log Reference

The backend uses a small structured logger (`backend/src/log.ts`). Every line has the format:

```
<ISO timestamp> [<LEVEL>] [<tag>] <message> key1=value1 key2=value2 …
```

Levels are `INFO`, `WARN`, `ERROR`. `ERROR` writes to stderr; the others write to stdout. Grep-friendly: filter by tag (`grep '\[scheduler.tier1\]'`) or by key (`grep 'park=cedarpoint'`).

| Tag | Source | Meaning |
|-----|--------|---------|
| `startup` | `index.ts` | Config loaded, DB initialized, server listening |
| `shutdown` | `index.ts` | `SIGTERM`/`SIGINT` received; graceful shutdown progress |
| `http` | `index.ts` | One line per request: `method`, `path`, `status`, `ms` |
| `scheduler` | `scheduler.ts` | Cron job registration summary on boot |
| `scheduler.tier1` … `scheduler.tier5` | `scheduler.ts` | Each tier's tick; includes skip-due-to-latch warnings |
| `scheduler.startup` | `scheduler.ts` | Result of the "database empty" startup scrape |
| `today` / `month` | `scraper.ts` | Per-park / per-month scrape results |
| `wait-sampler` | `wait-sampler.ts` | Tier-5 per-park sample writes, errors, weather-delay skips |
| `rate-limit` | `middleware/rate-limit.ts` | `blocked` event with `ip`, `count`, `retryAfter` |
| `rides` | `routes/rides.ts` | Per-request warnings when upstream calls fail |
| `rate-limited` | `lib/scrapers/sixflags.ts` | HTTP 429/503 from Six Flags with backoff timing |
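
The `blocked` events tagged `rate-limit` come from the per-IP limiter configured by `RATE_LIMIT_PER_MIN`. A minimal fixed-window sketch of that idea; the real middleware in `backend/src/middleware/rate-limit.ts` may use a different algorithm (sliding window, token bucket), and `createRateLimiter`, `clientIp`, and taking the first `X-Forwarded-For` entry are assumptions.

```typescript
interface RateDecision {
  allowed: boolean;
  count: number;
  retryAfter: number; // seconds until the window resets (0 when allowed)
}

// Fixed one-minute window per client IP. A long-running version would also
// evict old entries; this sketch keeps every IP it has seen.
function createRateLimiter(limitPerMin: number) {
  const windows = new Map<string, { start: number; count: number }>();
  return function check(ip: string, nowMs: number): RateDecision {
    let w = windows.get(ip);
    if (!w || nowMs - w.start >= 60000) {
      w = { start: nowMs, count: 0 };
      windows.set(ip, w);
    }
    w.count++;
    if (w.count <= limitPerMin) return { allowed: true, count: w.count, retryAfter: 0 };
    const retryAfter = Math.ceil((w.start + 60000 - nowMs) / 1000);
    return { allowed: false, count: w.count, retryAfter };
  };
}

// Behind a proxy, use the first X-Forwarded-For entry. That entry is client-supplied,
// so it is only trustworthy if the proxy overwrites the header.
function clientIp(forwardedFor: string | undefined, socketIp: string): string {
  return forwardedFor?.split(",")[0].trim() || socketIp;
}
```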

**Example log output:**

```
2026-04-23T14:00:00.012Z [INFO] [startup] config loaded port=3001 nodeEnv=production parkHoursStalenessHours=72 rateLimitPerMin=60
2026-04-23T14:00:00.034Z [INFO] [startup] database initialized
2026-04-23T14:00:00.041Z [INFO] [scheduler] cron jobs registered tiers="tier1=hourly(Mar-Dec) tier2=6h tier3=3am+3pm tier4=3am-daily tier5=5min"
2026-04-23T14:00:00.042Z [INFO] [scheduler] skipping startup scrape — relying on cron existingRows=8742
2026-04-23T14:00:00.045Z [INFO] [startup] listening url=http://localhost:3001
2026-04-23T14:00:00.123Z [INFO] [http] GET /api/calendar/week status=200 ms=18
2026-04-23T14:00:10.001Z [INFO] [scheduler.tier1] scraping today
2026-04-23T14:05:00.001Z [INFO] [scheduler.tier5] sample run complete parksSampled=14 parksSkipped=10 samplesWritten=612 weatherDelayed=0 errors=0
```
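
A minimal formatter that produces lines in this shape. It is a sketch, not `backend/src/log.ts`: the `formatLine` and `log` signatures are hypothetical, and quoting values that contain whitespace is inferred from the `tiers="..."` field in the example above.

```typescript
type Level = "INFO" | "WARN" | "ERROR";

// <ISO timestamp> [<LEVEL>] [<tag>] <message> key1=value1 key2=value2 ...
function formatLine(
  now: Date,
  level: Level,
  tag: string,
  message: string,
  fields: Record<string, string | number> = {},
): string {
  const kv = Object.entries(fields).map(([key, value]) => {
    const text = String(value);
    // Quote values containing whitespace so each key=value stays one grep-able token.
    return /\s/.test(text) ? `${key}=${JSON.stringify(text)}` : `${key}=${text}`;
  });
  return [now.toISOString(), `[${level}]`, `[${tag}]`, message, ...kv].join(" ");
}

// ERROR goes to stderr; INFO and WARN go to stdout.
function log(level: Level, tag: string, message: string, fields?: Record<string, string | number>): void {
  const line = formatLine(new Date(), level, tag, message, fields);
  if (level === "ERROR") console.error(line);
  else console.log(line);
}
```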

---

## Performance Tuning

| Aspect | Current Setting | Notes |
|--------|----------------|-------|
| SQLite WAL mode | Enabled | Allows concurrent reads during writes. No configuration needed. |
| In-memory cache | TtlCache (5 min TTL) | Bounded by park count -- at most ~72 entries (24 parks x 3 caches). Memory impact is negligible. |
| Staleness window | 72 hours | Controls how often park data is re-fetched from the API. Lower values = fresher data but more API calls and higher rate-limit risk. |
| Inter-park delay | 1000ms / 500ms | Hardcoded in `scraper.ts`. Provides respectful pacing against the Six Flags API. |
| ISR revalidation | 60-300s per route | Controlled in Next.js fetch calls. Lower values = fresher pages but more backend requests. |
| Next.js standalone | Enabled | Produces a minimal server bundle without unused dependencies. |
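
A minimal TTL cache sketch matching the in-memory cache row above. The class name `TtlCache` comes from the table, but this API (`get`/`set`, an injectable clock, lazy expiry on read) is an assumption, not the project's implementation. Because keys are per park, the map stays small without an explicit size cap.

```typescript
// Entries expire ttlMs after they are set; expired entries are dropped lazily on read.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```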