docs: sync README and docs/ with current codebase

Surfaces features that landed after the last big docs pass: per-ride
history pages, Fast Lane wait times, outage shading on the today chart,
Tier-5 wait-time sampler, production-hardening pieces (rate limiter,
structured logger, env validation, graceful shutdown), and the new
rides + ride_wait_samples tables. Also corrects the weather-delay rule
to match the "open" vs "closing" gate now in rides.ts.
2026-06-02 15:31:50 -04:00
parent 2e9cec0b56
commit f87462385c
5 changed files with 397 additions and 72 deletions
+54 -25
@@ -116,9 +116,12 @@ volumes:
|----------|---------|-------------|
| `TZ` | `UTC` | Process timezone. Controls when cron jobs fire. Set to `America/New_York` in production so schedules align with US Eastern parks. |
| `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is considered stale and re-fetched. Lower values increase API load; higher values increase data lag. |
| `RATE_LIMIT_PER_MIN` | `60` | Per-IP request limit for the public API. Over-limit requests return `429 Too Many Requests` with a `Retry-After` header. Enforced by `backend/src/middleware/rate-limit.ts`. Behind a proxy, ensure `x-forwarded-for` is set or every client looks like the proxy IP. |
| `NODE_ENV` | -- | Set to `production` in Docker. |
| `PORT` | `3001` | Server listen port. |
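The per-IP limit behind `RATE_LIMIT_PER_MIN` can be pictured as a fixed-window counter. The sketch below is an assumption about the general technique, not the contents of `backend/src/middleware/rate-limit.ts`; the class name `FixedWindowLimiter`, the `check` method, and the injected `nowMs` clock are hypothetical and exist only to keep the example self-contained.

```typescript
// Hypothetical fixed-window limiter; the real middleware may use a different algorithm.
interface Decision {
  allowed: boolean;
  retryAfterSeconds: number; // value a caller could send in a Retry-After header
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limitPerMin: number,
    private readonly windowMs: number = 60000,
  ) {}

  // nowMs is passed in (rather than read from Date.now()) so the logic is easy to test.
  check(ip: string, nowMs: number): Decision {
    const current = this.windows.get(ip);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.windows.set(ip, { start: nowMs, count: 1 });
      return { allowed: true, retryAfterSeconds: 0 };
    }
    current.count += 1;
    if (current.count <= this.limitPerMin) {
      return { allowed: true, retryAfterSeconds: 0 };
    }
    // Over the limit: report how long until the window resets.
    const remainingMs = current.start + this.windowMs - nowMs;
    return { allowed: false, retryAfterSeconds: Math.ceil(remainingMs / 1000) };
  }
}
```

A handler built on this would answer a denied request with `429 Too Many Requests` and the computed `Retry-After` value. Because the counter is keyed by IP, the proxy caveat in the table applies: if `x-forwarded-for` is missing, every client shares one key.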
`backend/src/config.ts` parses and validates these variables at startup. A bad value (e.g. `PORT=foo`) fails fast with a thrown `Error` rather than surfacing in a request handler later.
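As a rough illustration of the fail-fast idea, the following sketch validates the numeric variables from the table and throws on a bad value. It is not the actual `backend/src/config.ts`; the `AppConfig` shape and the `positiveInt` and `loadConfig` helpers are hypothetical names, and only the variable names and defaults come from the table.

```typescript
// Hypothetical sketch of fail-fast env validation; the real config.ts may differ.
interface AppConfig {
  port: number;
  parkHoursStalenessHours: number;
  rateLimitPerMin: number;
  nodeEnv: string | undefined;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, using the documented default when the variable is unset.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`Invalid ${name}: expected a positive integer, got "${raw}"`);
  }
  return value;
}

// Build the whole config in one place so any bad value surfaces before the server listens.
function loadConfig(env: Env): AppConfig {
  return {
    port: positiveInt(env, "PORT", 3001),
    parkHoursStalenessHours: positiveInt(env, "PARK_HOURS_STALENESS_HOURS", 72),
    rateLimitPerMin: positiveInt(env, "RATE_LIMIT_PER_MIN", 60),
    nodeEnv: env["NODE_ENV"],
  };
}
```

In a Node process this would be called once with `process.env` before the HTTP server or scheduler starts, so `PORT=foo` stops the boot instead of failing inside a request.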
---
## CI/CD Pipeline
@@ -167,9 +170,10 @@ These are configured in the Gitea repository settings under **Settings > Actions
3. **Verify the backend started:**
```bash
docker compose logs backend
# Look for: [backend] database initialized
# [scheduler] cron jobs registered
# [backend] listening on http://localhost:3001
# Look for (structured log lines, see the Log Reference section):
# [INFO] [startup] database initialized
# [INFO] [scheduler] cron jobs registered ...
# [INFO] [startup] listening url=http://localhost:3001
```
4. **Check database status (will be empty on first run):**
@@ -251,7 +255,7 @@ Backups are recommended for continuity (avoiding the 5-10 minute re-scrape windo
### Tiered Cron Schedule
The backend runs four scraping tiers via `node-cron`:
The backend runs five scraping tiers via `node-cron`:
| Tier | Cron Expression | Schedule | Scope | Delay |
|------|-----------------|----------|-------|-------|
@@ -259,10 +263,24 @@ The backend runs four scraping tiers via `node-cron`:
| 2 | `0 */6 * * *` | Every 6 hours | Current month for all parks | 1000ms |
| 3 | `0 3,15 * * *` | 3 AM and 3 PM | Current + next month | 1000ms |
| 4 | `0 3 * * *` | Daily at 3 AM | Full year (all 12 months) | 1000ms |
| 5 | `*/5 * * * *` | Every 5 minutes | Wait-time samples for currently-open parks into `ride_wait_samples` | parallel chunks of 6 |
**Staleness:** Tiers 2-4 skip any park-month that was scraped within `PARK_HOURS_STALENESS_HOURS` (default 72h). Tier 1 always fetches (uses diff-before-write instead).
**Staleness:** Tiers 2-4 skip any park-month that was scraped within `PARK_HOURS_STALENESS_HOURS` (default 72h). Tier 1 always fetches (uses diff-before-write instead). Tier 5 only samples parks whose `park_days` row marks them open today *and* whose current local time is inside the operating window (with a 1-hour closing buffer).
**Off-season:** Tier 1 only runs from March through December. The month constraint `3-12` in the cron expression skips January and February when most parks are closed.
**Off-season:** Tier 1 only runs from March through December. The month constraint `3-12` in the cron expression skips January and February when most parks are closed. Tier 5 runs year-round but is effectively a no-op when no parks are open.
**Concurrency latches:** Every tier is wrapped in `withLatch()` (see `backend/src/services/scheduler.ts`). If a tick is still running when the next would fire, the new tick is *skipped* and logged with a `previous run still in progress` warning rather than stacking. Each tier has its own latch so a slow Tier-4 doesn't block Tier-5's 5-minute cadence.
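The skip-instead-of-stack behavior can be sketched as a small wrapper around an async job. This is an assumed shape, not the real `withLatch()` from `backend/src/services/scheduler.ts`: the signature, the boolean return value, and the injected `warn` callback are hypothetical.

```typescript
// Hypothetical latch wrapper; the real withLatch() may differ in signature and logging.
type Job = () => Promise<void>;

function withLatch(
  name: string,
  job: Job,
  warn: (message: string) => void,
): () => Promise<boolean> {
  let running = false; // each wrapped job closes over its own flag, so latches are independent

  return async () => {
    if (running) {
      // The previous tick has not finished: skip this one rather than queueing it.
      warn(`[${name}] previous run still in progress`);
      return false;
    }
    running = true;
    try {
      await job();
      return true;
    } finally {
      running = false; // release the latch even if the job throws
    }
  };
}
```

Wrapping each tier separately gives each its own `running` flag, which is what lets a slow full-year scrape keep running while the 5-minute sampler continues to tick.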
**Weather-delayed parks skipped from sampling:** Tier 5 detects the "rides exist but all closed during scheduled hours" case and skips writes for that park, so a storm doesn't poison the uptime statistics with hours of `is_open=0` samples.
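A minimal sketch of the detection rule as described above: rides exist, the park is inside its scheduled hours, and every ride reports closed. The `RideStatus` shape and the `isWeatherDelayed` name are hypothetical, and the real check in the sampler may apply further conditions.

```typescript
// Hypothetical shapes and names; only the rule itself comes from the paragraph above.
interface RideStatus {
  name: string;
  isOpen: boolean;
}

function isWeatherDelayed(rides: RideStatus[], withinScheduledHours: boolean): boolean {
  if (!withinScheduledHours) return false; // closed rides outside hours are expected
  if (rides.length === 0) return false;    // no ride data at all is a different failure mode
  return rides.every((ride) => !ride.isOpen);
}

// Skipping the write keeps a storm from being recorded as hours of is_open=0 samples.
function shouldWriteSamples(rides: RideStatus[], withinScheduledHours: boolean): boolean {
  return !isWeatherDelayed(rides, withinScheduledHours);
}
```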
### Startup Behavior
On boot, the scheduler checks `getParkDayCount()` against a threshold of 50 rows:
- **Empty / nearly-empty database** (< 50 rows): runs `scrapeToday()` followed by `scrapeFullYear()` in sequence. Logs `[scheduler.startup]` lines for each phase.
- **Populated database** (≥ 50 rows): skips the startup scrape and relies on cron tiers. Logs `skipping startup scrape — relying on cron`.
This replaces the earlier behavior of full-scraping on every container start, which doubled outbound API load and delayed readiness on every deploy.
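The boot-time decision reduces to one threshold comparison. The sketch below assumes the row count has already been read (for example from `getParkDayCount()`) and returns the steps to run in order; `planStartupScrape` and `STARTUP_SCRAPE_THRESHOLD` are hypothetical names.

```typescript
// Hypothetical helper; only the threshold of 50 and the two scrape names come from the docs.
const STARTUP_SCRAPE_THRESHOLD = 50;

type StartupStep = "scrapeToday" | "scrapeFullYear";

function planStartupScrape(parkDayCount: number): StartupStep[] {
  if (parkDayCount < STARTUP_SCRAPE_THRESHOLD) {
    // Empty or nearly empty database: fill today first, then the full year.
    return ["scrapeToday", "scrapeFullYear"];
  }
  // Populated database: do nothing at boot and let the cron tiers keep it fresh.
  return [];
}
```

Returning a plan rather than running the scrapes directly keeps the decision testable without touching the network or the database.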
### Timezone Sensitivity
@@ -374,7 +392,7 @@ curl http://localhost:3001/api/status
```bash
docker compose logs backend --tail 50
```
Look for `[backend] listening on http://localhost:3001`.
Look for an `[INFO] [startup] listening url=http://localhost:3001` line.
2. **Check if the database has data:**
```bash
@@ -452,28 +470,39 @@ If the database becomes corrupted (unlikely with SQLite WAL mode, but possible a
## Log Reference
| Prefix | Source | Meaning |
|--------|--------|---------|
| `[backend]` | `index.ts` | Startup messages: DB initialized, server listening |
| `[scheduler]` | `scheduler.ts` | Cron job triggers with tier number |
| `[today]` | `scraper.ts` | Per-park results for the today tier (updated/skipped/error) |
| `[month]` | `scraper.ts` | Per-park-month results (open days count, rate limited, errors) |
| `[rate-limited]` | `sixflags.ts` | HTTP 429/503 with backoff timing and retry attempt count |
The backend uses a small structured logger (`backend/src/log.ts`). Every line has the format:
```
<ISO timestamp> [<LEVEL>] [<tag>] <message> key1=value1 key2=value2 …
```
Levels are `INFO`, `WARN`, and `ERROR`. `ERROR` writes to stderr; the others write to stdout. The format is grep-friendly: filter by tag (`grep '\[scheduler.tier1\]'`) or by key (`grep 'park=cedarpoint'`).
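To make the line format concrete, here is a small formatter that produces it. It is a sketch, not the code in `backend/src/log.ts`: the `formatLine` name and signature are hypothetical, the stdout/stderr routing is left out, and quoting values that contain whitespace is an assumption based on the `tiers="…"` field in the example output.

```typescript
// Hypothetical formatter for the documented line format; the real logger may differ.
type Level = "INFO" | "WARN" | "ERROR";
type Fields = Record<string, string | number | boolean>;

function formatLine(
  now: Date,
  level: Level,
  tag: string,
  message: string,
  fields: Fields = {},
): string {
  const pairs = Object.entries(fields).map(([key, value]) => {
    const text = String(value);
    // Assumption: values containing whitespace are double-quoted so key=value stays one token.
    return /\s/.test(text) ? `${key}="${text}"` : `${key}=${text}`;
  });
  return [`${now.toISOString()} [${level}] [${tag}] ${message}`, ...pairs].join(" ");
}
```

Passing the timestamp in as a `Date` keeps the function pure; a real logger would call it with `new Date()` and write the result to the appropriate stream.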
| Tag | Source | Meaning |
|-----|--------|---------|
| `startup` | `index.ts` | Config loaded, DB initialized, server listening |
| `shutdown` | `index.ts` | `SIGTERM`/`SIGINT` received; graceful shutdown progress |
| `http` | `index.ts` | One line per request: `method`, `path`, `status`, `ms` |
| `scheduler` | `scheduler.ts` | Cron job registration summary on boot |
| `scheduler.tier1` … `scheduler.tier5` | `scheduler.ts` | Each tier's tick; includes skip-due-to-latch warnings |
| `scheduler.startup` | `scheduler.ts` | Result of the "database empty" startup scrape |
| `today` / `month` | `scraper.ts` | Per-park / per-month scrape results |
| `wait-sampler` | `wait-sampler.ts` | Tier-5 per-park sample writes, errors, weather-delay skips |
| `rate-limit` | `middleware/rate-limit.ts` | `blocked` event with `ip`, `count`, `retryAfter` |
| `rides` | `routes/rides.ts` | Per-request warnings when upstream calls fail |
| `rate-limited` | `lib/scrapers/sixflags.ts` | HTTP 429/503 from Six Flags with backoff timing |
**Example log output:**
```
[backend] database initialized
[scheduler] cron jobs registered
tier-1: today — hourly (Mar-Dec)
tier-2: current month — every 6h
tier-3: upcoming — 3 AM + 3 PM
tier-4: full year — 3 AM daily
[backend] listening on http://localhost:3001
[scheduler] tier-1: scraping today @ 2026-04-23T14:00:00.000Z
[today] Great Adventure: updated (open 10am - 6pm)
[today] Cedar Point: updated (open 10am - 8pm)
[today] done: 24 fetched, 3 updated, 0 skipped, 0 errors
2026-04-23T14:00:00.012Z [INFO] [startup] config loaded port=3001 nodeEnv=production parkHoursStalenessHours=72 rateLimitPerMin=60
2026-04-23T14:00:00.034Z [INFO] [startup] database initialized
2026-04-23T14:00:00.041Z [INFO] [scheduler] cron jobs registered tiers="tier1=hourly(Mar-Dec) tier2=6h tier3=3am+3pm tier4=3am-daily tier5=5min"
2026-04-23T14:00:00.042Z [INFO] [scheduler] skipping startup scrape — relying on cron existingRows=8742
2026-04-23T14:00:00.045Z [INFO] [startup] listening url=http://localhost:3001
2026-04-23T14:00:00.123Z [INFO] [http] GET /api/calendar/week status=200 ms=18
2026-04-23T14:00:10.001Z [INFO] [scheduler.tier1] scraping today
2026-04-23T14:05:00.001Z [INFO] [scheduler.tier5] sample run complete parksSampled=14 parksSkipped=10 samplesWritten=612 weatherDelayed=0 errors=0
```
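For tooling that needs more than `grep`, lines in this format can be split back into their parts. The parser below is a sketch written against the example output only; `parseLine` and `ParsedLine` are hypothetical names, and it assumes that the message itself contains no `key=value` token.

```typescript
// Hypothetical parser for the documented line format.
interface ParsedLine {
  timestamp: string;
  level: string;
  tag: string;
  message: string;
  fields: Record<string, string>;
}

function parseLine(line: string): ParsedLine | null {
  const head = /^(\S+) \[([A-Z]+)\] \[([^\]]+)\] (.*)$/.exec(line);
  if (head === null) return null;
  const [, timestamp = "", level = "", tag = "", rest = ""] = head;

  const fields: Record<string, string> = {};
  const pair = /(\w+)=("[^"]*"|\S+)/g; // value is either a quoted string or a run of non-spaces
  let firstPairIndex = rest.length;
  let match: RegExpExecArray | null;
  while ((match = pair.exec(rest)) !== null) {
    const key = match[1];
    const value = match[2];
    if (key === undefined || value === undefined) continue;
    firstPairIndex = Math.min(firstPairIndex, match.index);
    fields[key] = value.replace(/^"|"$/g, ""); // strip surrounding quotes, if any
  }

  // Everything before the first key=value token is treated as the message.
  return { timestamp, level, tag, message: rest.slice(0, firstPairIndex).trim(), fields };
}
```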
---