docs: sync README and docs/ with current codebase
Surfaces features that landed after the last big docs pass: per-ride history pages, Fast Lane wait times, outage shading on the today chart, Tier-5 wait-time sampler, production-hardening pieces (rate limiter, structured logger, env validation, graceful shutdown), and the new rides + ride_wait_samples tables. Also corrects the weather-delay rule to match the "open" vs "closing" gate now in rides.ts.
+54
-25
@@ -116,9 +116,12 @@ volumes:
 |----------|---------|-------------|
 | `TZ` | `UTC` | Process timezone. Controls when cron jobs fire. Set to `America/New_York` in production so schedules align with US Eastern parks. |
 | `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is considered stale and re-fetched. Lower values increase API load; higher values increase data lag. |
+| `RATE_LIMIT_PER_MIN` | `60` | Per-IP request limit for the public API. Over-limit requests return `429 Too Many Requests` with a `Retry-After` header. Enforced by `backend/src/middleware/rate-limit.ts`. Behind a proxy, ensure `x-forwarded-for` is set or every client looks like the proxy IP. |
 | `NODE_ENV` | -- | Set to `production` in Docker. |
 | `PORT` | `3001` | Server listen port. |

+`backend/src/config.ts` parses and validates these at startup. A bad value (e.g. `PORT=foo`) fails fast with a thrown `Error` rather than surfacing in a request handler later.
+
 ---

 ## CI/CD Pipeline
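The fail-fast validation described for `backend/src/config.ts` in the hunk above could look roughly like the sketch below. It is an illustration under assumptions, not the file's actual contents: `AppConfig`, `parsePositiveInt`, and `loadConfig` are hypothetical names, and only the variable names and defaults (`PORT` 3001, `PARK_HOURS_STALENESS_HOURS` 72, `RATE_LIMIT_PER_MIN` 60) come from the table.

```typescript
// Hypothetical config shape; the real config.ts may expose different fields.
interface AppConfig {
  port: number;
  parkHoursStalenessHours: number;
  rateLimitPerMin: number;
  nodeEnv: string | undefined;
}

// Parse a positive integer from an env-style string.
// Unset or empty values fall back to the default; anything else that is not
// a positive integer throws, so a typo is caught at startup.
function parsePositiveInt(
  name: string,
  raw: string | undefined,
  fallback: number,
): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`Invalid ${name}: expected a positive integer, got "${raw}"`);
  }
  return value;
}

// Build the config from an environment-like map (assumed entry point).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    port: parsePositiveInt("PORT", env["PORT"], 3001),
    parkHoursStalenessHours: parsePositiveInt(
      "PARK_HOURS_STALENESS_HOURS",
      env["PARK_HOURS_STALENESS_HOURS"],
      72,
    ),
    rateLimitPerMin: parsePositiveInt(
      "RATE_LIMIT_PER_MIN",
      env["RATE_LIMIT_PER_MIN"],
      60,
    ),
    nodeEnv: env["NODE_ENV"],
  };
}
```

Calling something like `loadConfig(process.env)` once at boot means `PORT=foo` aborts the process before the server binds, which is the behavior the added paragraph describes.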
@@ -167,9 +170,10 @@ These are configured in the Gitea repository settings under **Settings > Actions
 3. **Verify the backend started:**
 ```bash
 docker compose logs backend
-# Look for: [backend] database initialized
-# [scheduler] cron jobs registered
-# [backend] listening on http://localhost:3001
+# Look for (structured log lines, see the Log Reference section):
+# [INFO] [startup] database initialized
+# [INFO] [scheduler] cron jobs registered ...
+# [INFO] [startup] listening url=http://localhost:3001
 ```

 4. **Check database status (will be empty on first run):**
@@ -251,7 +255,7 @@ Backups are recommended for continuity (avoiding the 5-10 minute re-scrape windo

 ### Tiered Cron Schedule

-The backend runs four scraping tiers via `node-cron`:
+The backend runs five scraping tiers via `node-cron`:

 | Tier | Cron Expression | Schedule | Scope | Delay |
 |------|-----------------|----------|-------|-------|
@@ -259,10 +263,24 @@ The backend runs four scraping tiers via `node-cron`:
 | 2 | `0 */6 * * *` | Every 6 hours | Current month for all parks | 1000ms |
 | 3 | `0 3,15 * * *` | 3 AM and 3 PM | Current + next month | 1000ms |
 | 4 | `0 3 * * *` | Daily at 3 AM | Full year (all 12 months) | 1000ms |
+| 5 | `*/5 * * * *` | Every 5 minutes | Wait-time samples for currently-open parks into `ride_wait_samples` | parallel chunks of 6 |

-**Staleness:** Tiers 2-4 skip any park-month that was scraped within `PARK_HOURS_STALENESS_HOURS` (default 72h). Tier 1 always fetches (uses diff-before-write instead).
+**Staleness:** Tiers 2-4 skip any park-month that was scraped within `PARK_HOURS_STALENESS_HOURS` (default 72h). Tier 1 always fetches (uses diff-before-write instead). Tier 5 only samples parks whose `park_days` row marks them open today *and* whose current local time is inside the operating window (with a 1-hour closing buffer).

-**Off-season:** Tier 1 only runs from March through December. The month constraint `3-12` in the cron expression skips January and February when most parks are closed.
+**Off-season:** Tier 1 only runs from March through December. The month constraint `3-12` in the cron expression skips January and February when most parks are closed. Tier 5 runs year-round but is effectively a no-op when no parks are open.

+**Concurrency latches:** Every tier is wrapped in `withLatch()` (see `backend/src/services/scheduler.ts`). If a tick is still running when the next would fire, the new tick is *skipped* and logged with a `previous run still in progress` warning rather than stacking. Each tier has its own latch so a slow Tier-4 doesn't block Tier-5's 5-minute cadence.
+
+**Weather-delayed parks skipped from sampling:** Tier 5 detects the "rides exist but all closed during scheduled hours" case and skips writes for that park, so a storm doesn't poison the uptime statistics with hours of `is_open=0` samples.
+
+### Startup Behavior
+
+On boot, the scheduler checks `getParkDayCount()` against a threshold of 50 rows:
+
+- **Empty / nearly-empty database** (< 50 rows): runs `scrapeToday()` followed by `scrapeFullYear()` in sequence. Logs `[scheduler.startup]` lines for each phase.
+- **Populated database** (≥ 50 rows): skips the startup scrape and relies on cron tiers. Logs `skipping startup scrape — relying on cron`.
+
+This replaces the earlier behavior of full-scraping on every container start, which doubled outbound API load and delayed readiness on every deploy.
+
 ### Timezone Sensitivity

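The skip-instead-of-stack behavior that the hunk above attributes to `withLatch()` can be sketched as follows. This is a minimal illustration, assuming a wrapper that takes a tag, an async task, and a warning callback; the real signature in `backend/src/services/scheduler.ts` is not shown in this diff and may differ.

```typescript
// Hypothetical signature: wraps an async task so overlapping ticks are
// skipped rather than queued. Each call to withLatch() creates its own
// latch, so tiers wrapped separately never block one another.
function withLatch(
  tag: string,
  task: () => Promise<void>,
  warn: (message: string) => void,
): () => Promise<boolean> {
  let running = false;

  return async (): Promise<boolean> => {
    if (running) {
      // A previous tick has not finished: skip this one and say so.
      warn(`[${tag}] previous run still in progress`);
      return false;
    }
    running = true;
    try {
      await task();
      return true;
    } finally {
      // Release the latch even if the task throws, so later ticks can run.
      running = false;
    }
  };
}
```

Under this sketch a scheduler would register the wrapped function with the cron library instead of the raw task, for example one wrapper for the Tier-4 full-year scrape and a separate one for the Tier-5 sampler. The boolean return value is only there to make the skip observable; the source does not say what the real wrapper returns.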
@@ -374,7 +392,7 @@ curl http://localhost:3001/api/status
 ```bash
 docker compose logs backend --tail 50
 ```
-Look for `[backend] listening on http://localhost:3001`.
+Look for an `[INFO] [startup] listening url=http://localhost:3001` line.

 2. **Check if the database has data:**
 ```bash
@@ -452,28 +470,39 @@ If the database becomes corrupted (unlikely with SQLite WAL mode, but possible a

 ## Log Reference

-| Prefix | Source | Meaning |
-|--------|--------|---------|
-| `[backend]` | `index.ts` | Startup messages: DB initialized, server listening |
-| `[scheduler]` | `scheduler.ts` | Cron job triggers with tier number |
-| `[today]` | `scraper.ts` | Per-park results for the today tier (updated/skipped/error) |
-| `[month]` | `scraper.ts` | Per-park-month results (open days count, rate limited, errors) |
-| `[rate-limited]` | `sixflags.ts` | HTTP 429/503 with backoff timing and retry attempt count |
+The backend uses a small structured logger (`backend/src/log.ts`). Every line has the format:
+
+```
+<ISO timestamp> [<LEVEL>] [<tag>] <message> key1=value1 key2=value2 …
+```
+
+Levels are `INFO`, `WARN`, `ERROR`. `ERROR` writes to stderr; the others write to stdout. Grep-friendly: filter by tag (`grep '\[scheduler.tier1\]'`) or by key (`grep 'park=cedarpoint'`).
+
+| Tag | Source | Meaning |
+|-----|--------|---------|
+| `startup` | `index.ts` | Config loaded, DB initialized, server listening |
+| `shutdown` | `index.ts` | `SIGTERM`/`SIGINT` received; graceful shutdown progress |
+| `http` | `index.ts` | One line per request: `method`, `path`, `status`, `ms` |
+| `scheduler` | `scheduler.ts` | Cron job registration summary on boot |
+| `scheduler.tier1` … `scheduler.tier5` | `scheduler.ts` | Each tier's tick; includes skip-due-to-latch warnings |
+| `scheduler.startup` | `scheduler.ts` | Result of the "database empty" startup scrape |
+| `today` / `month` | `scraper.ts` | Per-park / per-month scrape results |
+| `wait-sampler` | `wait-sampler.ts` | Tier-5 per-park sample writes, errors, weather-delay skips |
+| `rate-limit` | `middleware/rate-limit.ts` | `blocked` event with `ip`, `count`, `retryAfter` |
+| `rides` | `routes/rides.ts` | Per-request warnings when upstream calls fail |
+| `rate-limited` | `lib/scrapers/sixflags.ts` | HTTP 429/503 from Six Flags with backoff timing |

 **Example log output:**

 ```
-[backend] database initialized
-[scheduler] cron jobs registered
-tier-1: today — hourly (Mar-Dec)
-tier-2: current month — every 6h
-tier-3: upcoming — 3 AM + 3 PM
-tier-4: full year — 3 AM daily
-[backend] listening on http://localhost:3001
-[scheduler] tier-1: scraping today @ 2026-04-23T14:00:00.000Z
-[today] Great Adventure: updated (open 10am - 6pm)
-[today] Cedar Point: updated (open 10am - 8pm)
-[today] done: 24 fetched, 3 updated, 0 skipped, 0 errors
+2026-04-23T14:00:00.012Z [INFO] [startup] config loaded port=3001 nodeEnv=production parkHoursStalenessHours=72 rateLimitPerMin=60
+2026-04-23T14:00:00.034Z [INFO] [startup] database initialized
+2026-04-23T14:00:00.041Z [INFO] [scheduler] cron jobs registered tiers="tier1=hourly(Mar-Dec) tier2=6h tier3=3am+3pm tier4=3am-daily tier5=5min"
+2026-04-23T14:00:00.042Z [INFO] [scheduler] skipping startup scrape — relying on cron existingRows=8742
+2026-04-23T14:00:00.045Z [INFO] [startup] listening url=http://localhost:3001
+2026-04-23T14:00:00.123Z [INFO] [http] GET /api/calendar/week status=200 ms=18
+2026-04-23T14:00:10.001Z [INFO] [scheduler.tier1] scraping today
+2026-04-23T14:05:00.001Z [INFO] [scheduler.tier5] sample run complete parksSampled=14 parksSkipped=10 samplesWritten=612 weatherDelayed=0 errors=0
 ```

 ---
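Because the documented line format is regular, a log consumer can split it back into fields. The sketch below is one way to do that, written against the format string and example lines in the hunk above; `ParsedLogLine` and `parseLogLine` are hypothetical names and are not part of `backend/src/log.ts`. It assumes values are either unquoted tokens or double-quoted strings, as in the `tiers="..."` example.

```typescript
// Hypothetical result shape for one structured log line.
interface ParsedLogLine {
  timestamp: string;
  level: "INFO" | "WARN" | "ERROR";
  tag: string;
  message: string;
  fields: Record<string, string>;
}

// Parse "<ISO timestamp> [<LEVEL>] [<tag>] <message> key=value ...".
// Returns null for lines that do not match (for example old "[backend] ..." lines).
function parseLogLine(line: string): ParsedLogLine | null {
  const head = /^(\S+) \[(INFO|WARN|ERROR)\] \[([^\]]+)\] (.*)$/.exec(line);
  if (head === null) return null;

  const rest = head[4] ?? "";
  // Fresh regex per call: a /g regex carries lastIndex state between exec() calls.
  const fieldRe = /(\w+)=("[^"]*"|\S+)/g;
  const fields: Record<string, string> = {};
  let messageEnd = rest.length;

  let match: RegExpExecArray | null;
  while ((match = fieldRe.exec(rest)) !== null) {
    // The free-text message ends where the first key=value pair begins.
    if (match.index < messageEnd) messageEnd = match.index;
    const key = match[1] ?? "";
    const raw = match[2] ?? "";
    const quoted =
      raw.length >= 2 && raw.charAt(0) === '"' && raw.charAt(raw.length - 1) === '"';
    fields[key] = quoted ? raw.slice(1, -1) : raw;
  }

  return {
    timestamp: head[1] ?? "",
    level: (head[2] ?? "INFO") as ParsedLogLine["level"],
    tag: head[3] ?? "",
    message: rest.slice(0, messageEnd).trim(),
    fields,
  };
}
```

A message that itself contains `word=value` text would be misread as a field under this sketch, which is a limitation of parsing the flat format rather than something the source addresses.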