josh/SixFlagsSuperCalendar

josh f87462385c

docs: sync README and docs/ with current codebase

Surfaces features that landed after the last big docs pass: per-ride
history pages, Fast Lane wait times, outage shading on the today chart,
Tier-5 wait-time sampler, production-hardening pieces (rate limiter,
structured logger, env validation, graceful shutdown), and the new
rides + ride_wait_samples tables. Also corrects the weather-delay rule
to match the "open" vs "closing" gate now in rides.ts.

2026-06-02 15:31:50 -04:00


Operations

See also: Architecture | API Reference | Development

Deployment Overview

The application runs as two Docker containers:

Container	Port	Role
`web`	3000	Next.js frontend (stateless, no database)
`backend`	3001	Hono API server (owns SQLite database, runs cron scheduler)

Infrastructure requirements:

Docker host with Docker Compose
Container registry (Gitea, Docker Hub, or any OCI-compatible registry)
Outbound HTTPS access to d18car1k0ff81h.cloudfront.net (Six Flags API) and queue-times.com (live ride data)
A reverse proxy (Traefik, nginx, Caddy, etc.) is expected to sit in front for TLS termination and domain routing, but is not included in this repository

See Architecture for detailed system design.

Docker Images

Multi-Stage Build

The project uses a single Dockerfile with four stages producing two final images:

  builder           backend-deps
  (Next.js build)   (native modules)
      |                   |
      v                   v
    web               backend
  (final)             (final)

Stage	Base	Purpose
`builder`	`node:22-bookworm-slim`	`npm ci` + `npm run build` -- produces Next.js standalone output
`backend-deps`	`node:22-bookworm-slim`	Installs `python3`, `make`, `g++` for `better-sqlite3` native compilation, then `npm ci`
`web` (final)	`node:22-bookworm-slim`	Copies standalone output from `builder`. Non-root user. ~150MB.
`backend` (final)	`node:22-bookworm-slim`	Copies `node_modules` from `backend-deps` + source code. Volume for SQLite. Non-root user. ~200MB.

Image Tags

{registry}/{owner}/thoosiecalendar:web
{registry}/{owner}/thoosiecalendar:backend

Building Locally

# Build web image
docker build --target web -t thoosiecalendar:web .

# Build backend image
docker build --target backend -t thoosiecalendar:backend .

Docker Compose

The production docker-compose.yml:

services:
  web:
    image: gitea.thewrightserver.net/josh/thoosiecalendar:web
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - BACKEND_URL=http://backend:3001    # Docker internal networking
      - TZ=America/New_York
    restart: unless-stopped

  backend:
    image: gitea.thewrightserver.net/josh/thoosiecalendar:backend
    ports:
      - "3001:3001"
    volumes:
      - park_data:/app/backend/data         # SQLite database persistence
    environment:
      - NODE_ENV=production
      - TZ=America/New_York                # Timezone for cron schedules
      - PARK_HOURS_STALENESS_HOURS=72      # Hours before re-fetching park data
    restart: unless-stopped

volumes:
  park_data:                               # Named volume for database files

Networking: The web container reaches the backend via Docker's internal DNS at http://backend:3001. The backend port is also exposed to the host for manual API access during troubleshooting.

Environment Variables

Web Container

Variable	Default	Description
`BACKEND_URL`	`http://localhost:3001`	Backend API base URL. Set to `http://backend:3001` in Docker Compose for internal networking.
`NODE_ENV`	--	Set to `production` in Docker.
`NEXT_TELEMETRY_DISABLED`	`1`	Disables Next.js telemetry (set in Dockerfile).
`PORT`	`3000`	Server listen port (set in Dockerfile).
`HOSTNAME`	`0.0.0.0`	Bind address (set in Dockerfile to allow external access).

Backend Container

Variable	Default	Description
`TZ`	`UTC`	Process timezone. Controls when cron jobs fire. Set to `America/New_York` in production so schedules align with US Eastern parks.
`PARK_HOURS_STALENESS_HOURS`	`72`	Hours before park schedule data is considered stale and re-fetched. Lower values increase API load; higher values increase data lag.
`RATE_LIMIT_PER_MIN`	`60`	Per-IP request limit for the public API. Over-limit requests return `429 Too Many Requests` with a `Retry-After` header. Enforced by `backend/src/middleware/rate-limit.ts`. Behind a proxy, ensure `x-forwarded-for` is set or every client looks like the proxy IP.
`NODE_ENV`	--	Set to `production` in Docker.
`PORT`	`3001`	Server listen port.

backend/src/config.ts parses and validates these at startup. A bad value (e.g. PORT=foo) fails fast with a thrown Error rather than surfacing in a request handler later.
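
As a rough illustration of the fail-fast idea, the sketch below validates the three numeric backend variables from the table above. The `BackendConfig` shape and the `parsePositiveInt` and `loadConfig` helpers are hypothetical names chosen for this example; the real `backend/src/config.ts` may be structured differently.

```typescript
// Hypothetical config shape; the real backend/src/config.ts may differ.
interface BackendConfig {
  port: number;
  parkHoursStalenessHours: number;
  rateLimitPerMin: number;
}

// Parse a positive integer, falling back to a default when the variable is unset.
function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`Invalid ${name}: expected a positive integer, got "${raw}"`);
  }
  return value;
}

// Validate everything up front so a bad value stops the process at boot.
function loadConfig(env: Record<string, string | undefined>): BackendConfig {
  return {
    port: parsePositiveInt("PORT", env["PORT"], 3001),
    parkHoursStalenessHours: parsePositiveInt(
      "PARK_HOURS_STALENESS_HOURS",
      env["PARK_HOURS_STALENESS_HOURS"],
      72,
    ),
    rateLimitPerMin: parsePositiveInt("RATE_LIMIT_PER_MIN", env["RATE_LIMIT_PER_MIN"], 60),
  };
}
```

Calling `loadConfig({ PORT: "foo" })` throws immediately, while `loadConfig({})` returns the documented defaults.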

CI/CD Pipeline

Gitea Actions Workflow

File: .gitea/workflows/deploy.yml

Trigger: Push to main branch.

Steps:

Checkout code (actions/checkout@v4)
Log in to Gitea container registry
Build and push web image (docker/build-push-action@v6, target: web)
Build and push backend image (docker/build-push-action@v6, target: backend)

Required Configuration

Type	Name	Description
Variable	`REGISTRY`	Container registry URL (e.g. `gitea.thewrightserver.net`)
Secret	`REGISTRY_TOKEN`	Authentication token for the registry

These are configured in the Gitea repository settings under Settings > Actions > Secrets and Settings > Actions > Variables.

Setting Up CI/CD from Scratch

Create a Gitea repository
Add REGISTRY as a repository variable (Settings > Actions > Variables)
Add REGISTRY_TOKEN as a repository secret (Settings > Actions > Secrets)
Push to main -- the workflow triggers automatically
Pull images on your Docker host: docker compose pull && docker compose up -d

Initial Deployment Checklist

Create a docker-compose.yml on your Docker host (see the Docker Compose section above, or use the one from the repository).

Pull and start the containers:

docker compose pull
docker compose up -d

Verify the backend started:

docker compose logs backend
# Look for (structured log lines, see the Log Reference section):
#   [INFO] [startup] database initialized
#   [INFO] [scheduler] cron jobs registered ...
#   [INFO] [startup] listening url=http://localhost:3001

Check database status (will be empty on first run):

curl http://localhost:3001/api/status
# { "status": "ok", "database": { "totalDays": 0, ... } }

Trigger the initial data scrape:
```
curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"
```
This scrapes all 12 months for all 24 parks with a 1-second delay between parks. Expected duration: 5-10 minutes.

Verify data was scraped:

curl http://localhost:3001/api/status
# totalDays should be ~8000-9000

Open the web UI: Navigate to http://your-host:3000.

The cron scheduler starts automatically and will keep data fresh going forward.

Updating

docker compose pull && docker compose up -d

The SQLite database lives in a named Docker volume (park_data), so it persists across container recreations.
Schema migrations are applied automatically on backend startup. New columns are added via ALTER TABLE ... ADD COLUMN wrapped in try/catch -- if the column already exists, the error is silently caught.
No manual migration steps are needed.
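
The idempotent `ADD COLUMN` pattern described above can be sketched as follows. This is an illustrative variation, not the project's migration code: `SqlExecutor` and `addColumnIfMissing` are hypothetical names, and unlike a blanket try/catch this version only swallows SQLite's "duplicate column name" error so that other failures still surface.

```typescript
// Minimal database surface for this sketch; better-sqlite3's Database#exec fits it.
interface SqlExecutor {
  exec(sql: string): unknown;
}

// Add a column, treating "already exists" as success.
// Returns true if the column was added, false if it was already present.
function addColumnIfMissing(db: SqlExecutor, table: string, columnDef: string): boolean {
  try {
    db.exec(`ALTER TABLE ${table} ADD COLUMN ${columnDef}`);
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // SQLite reports an existing column as "duplicate column name: <name>".
    if (message.includes("duplicate column name")) return false;
    throw err; // Anything else (locked database, syntax error) should not be hidden.
  }
}
```

Running the same call on every startup is safe: the first run adds the column and later runs return `false` without changing anything.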

Backup and Restore

What to Back Up

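The SQLite database at /app/backend/data/parks.db inside the park_data Docker volume. In WAL mode, recent writes may still live in the companion files parks.db-wal and parks.db-shm, so copy all three together. For a guaranteed-consistent snapshot, stop the backend before copying; files copied from a running database can capture a write in progress.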

Backup Methods

Method 1: Copy from the container

docker compose cp backend:/app/backend/data/parks.db ./backup/parks.db
docker compose cp backend:/app/backend/data/parks.db-wal ./backup/parks.db-wal 2>/dev/null
docker compose cp backend:/app/backend/data/parks.db-shm ./backup/parks.db-shm 2>/dev/null

Method 2: Mount the volume to the host

Add a bind mount in docker-compose.yml:

volumes:
  - ./data:/app/backend/data

Restore

Stop the backend: docker compose stop backend
Replace the database files in the volume
Restart: docker compose start backend

Note on Reproducibility

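All schedule data is sourced from external APIs and is reproducible. If the database is lost, restart the backend (which auto-creates an empty database) and trigger a full scrape:

curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"

Backups are recommended for continuity (avoiding the 5-10 minute re-scrape window) but are not critical for schedule data. The wait-time history in ride_wait_samples is the exception: it is sampled live every 5 minutes, so a re-scrape does not bring back samples that were already recorded.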

Scheduler Operations

Tiered Cron Schedule

The backend runs five scraping tiers via node-cron:

Tier	Cron Expression	Schedule	Scope	Delay
1	`0 * * 3-12 *`	Hourly, March through December	Today's hours for all parks	500ms
2	`0 */6 * * *`	Every 6 hours	Current month for all parks	1000ms
3	`0 3,15 * * *`	3 AM and 3 PM	Current + next month	1000ms
4	`0 3 * * *`	Daily at 3 AM	Full year (all 12 months)	1000ms
5	`*/5 * * * *`	Every 5 minutes	Wait-time samples for currently-open parks into `ride_wait_samples`	parallel chunks of 6

Staleness: Tiers 2-4 skip any park-month that was scraped within PARK_HOURS_STALENESS_HOURS (default 72h). Tier 1 always fetches (uses diff-before-write instead). Tier 5 only samples parks whose park_days row marks them open today and whose current local time is inside the operating window (with a 1-hour closing buffer).
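
The staleness rule for Tiers 2-4 boils down to an age comparison. The sketch below is a minimal version of that check; `isStale` is a hypothetical helper, and treating an age of exactly the threshold as stale is an assumption of this example.

```typescript
// True when a park-month should be re-fetched.
// `lastScrapedAt` is null when the park-month has never been scraped.
function isStale(lastScrapedAt: Date | null, now: Date, stalenessHours: number): boolean {
  if (lastScrapedAt === null) return true;
  const ageMs = now.getTime() - lastScrapedAt.getTime();
  // Assumption: an age exactly equal to the threshold counts as stale.
  return ageMs >= stalenessHours * 60 * 60 * 1000;
}
```

With the default of 72 hours, a park-month scraped 71 hours ago is skipped and one scraped 73 hours ago is fetched again.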

Off-season: Tier 1 only runs from March through December. The month constraint 3-12 in the cron expression skips January and February when most parks are closed. Tier 5 runs year-round but is effectively a no-op when no parks are open.

Concurrency latches: Every tier is wrapped in withLatch() (see backend/src/services/scheduler.ts). If a tick is still running when the next one would fire, the new tick is skipped and logged with a "previous run still in progress" warning rather than stacking. Each tier has its own latch, so a slow Tier 4 run doesn't block Tier 5's 5-minute cadence.
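
A latch of this kind can be sketched in a few lines. The signature below is an assumption made for illustration (the real `withLatch()` in `scheduler.ts` may take different arguments and log through the structured logger); the point is only that a busy latch skips the tick instead of queuing it.

```typescript
type Job = () => Promise<void>;

// Wrap a job so overlapping invocations are skipped instead of queued.
// The returned function reports synchronously whether this tick started the job.
function withLatch(
  name: string,
  job: Job,
  warn: (msg: string) => void = console.warn,
): () => boolean {
  let running = false;
  return () => {
    if (running) {
      warn(`[${name}] previous run still in progress, skipping tick`);
      return false;
    }
    running = true;
    // Wrapping in a Promise turns a synchronous throw from `job` into a rejection.
    new Promise<void>((resolve) => resolve(job())).then(
      () => {
        running = false; // Release after success.
      },
      (err: unknown) => {
        running = false; // Release after failure too, so one error cannot wedge the tier.
        warn(`[${name}] job failed: ${String(err)}`);
      },
    );
    return true;
  };
}
```

Because each call to `withLatch` creates its own `running` flag, wrapping every tier separately gives each tier an independent latch.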

Weather-delayed parks skipped from sampling: Tier 5 detects the "rides exist but all closed during scheduled hours" case and skips writes for that park, so a storm doesn't poison the uptime statistics with hours of is_open=0 samples.
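
The weather-delay condition quoted above ("rides exist but all closed during scheduled hours") can be expressed as a small predicate. `RideStatus`, `isWeatherDelayed`, and `samplesToWrite` are hypothetical names for this sketch, and the production rule in the sampler may include further conditions not described here.

```typescript
interface RideStatus {
  name: string;
  isOpen: boolean;
}

// Rides exist, the park is inside its scheduled hours, and every ride reports closed.
function isWeatherDelayed(rides: RideStatus[], withinScheduledHours: boolean): boolean {
  return withinScheduledHours && rides.length > 0 && rides.every((ride) => !ride.isOpen);
}

// Drop the whole batch for a weather-delayed park so no closed samples are recorded.
function samplesToWrite(rides: RideStatus[], withinScheduledHours: boolean): RideStatus[] {
  return isWeatherDelayed(rides, withinScheduledHours) ? [] : rides;
}
```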

Startup Behavior

On boot, the scheduler checks getParkDayCount() against a threshold of 50 rows:

Empty / nearly-empty database (< 50 rows): runs scrapeToday() followed by scrapeFullYear() in sequence. Logs [scheduler.startup] lines for each phase.
Populated database (≥ 50 rows): skips the startup scrape and relies on cron tiers. Logs skipping startup scrape — relying on cron.

This replaces the earlier behavior of full-scraping on every container start, which doubled outbound API load and delayed readiness on every deploy.
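
The boot-time decision is a single threshold comparison. In the sketch below, `planStartup` and the `StartupPlan` labels are hypothetical; only the threshold of 50 rows and the two outcomes come from the description above.

```typescript
const STARTUP_ROW_THRESHOLD = 50;

type StartupPlan = "scrape-today-then-full-year" | "rely-on-cron";

// Decide what to do on boot from the current park_days row count.
function planStartup(parkDayCount: number, threshold: number = STARTUP_ROW_THRESHOLD): StartupPlan {
  return parkDayCount < threshold ? "scrape-today-then-full-year" : "rely-on-cron";
}
```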

Timezone Sensitivity

Cron expressions execute in the process timezone, controlled by the TZ environment variable. In production this is set to America/New_York so that "3 AM" aligns with US Eastern time.

The per-park timezone (e.g. America/Los_Angeles for Magic Mountain) is used separately for operating window detection -- it does not affect cron schedule timing.

The 3 AM Switchover

getTodayLocal() in lib/env.ts implements a 3 AM local-time switchover: before 3 AM, the system considers it "yesterday." This prevents the calendar from flipping to the next day at midnight while park visitors are still out. The switchover uses the server's local time (influenced by TZ), not individual park timezones.
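
One way to implement such a cutover is to shift the current instant back by the cutover length and then read the calendar date. The sketch below does that with an explicit IANA zone; `todayWithCutover` is a hypothetical function, whereas the real `getTodayLocal()` relies on the process timezone. The shift-by-three-hours approach can be off by an hour around DST transitions, which this example does not handle.

```typescript
// Return the "operational" date (YYYY-MM-DD) in the given IANA zone,
// treating the hours before `cutoverHour` as part of the previous day.
function todayWithCutover(now: Date, timeZone: string, cutoverHour: number = 3): string {
  // Shift the instant back so 00:00-02:59 local time lands on yesterday's date.
  const shifted = new Date(now.getTime() - cutoverHour * 60 * 60 * 1000);
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(shifted);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}
```

For example, 02:30 Eastern on April 23 still counts as April 22, while 03:30 Eastern counts as April 23.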

Manual Scraping

Trigger a scrape at any time via the backend API:

curl -X POST "http://localhost:3001/api/scrape/trigger?scope=<scope>"

Scope Options

Scope	Behavior	Duration
`today`	Fetches today's hours for all 24 parks. Diffs against database before writing. 500ms delay.	~15s
`month`	Current month for all parks. Respects staleness window. 1000ms delay.	~30s
`upcoming`	Current + next month. Respects staleness.	~1min
`full`	All 12 months. Respects staleness.	~5-10min
`force`	All 12 months. Ignores staleness -- forces re-fetch of everything.	~5-10min

Response

{
    "scope": "today",
    "fetched": 24,
    "skipped": 0,
    "errors": 0,
    "updated": 3,
    "startedAt": "2026-04-23T14:00:00.000Z",
    "finishedAt": "2026-04-23T14:00:12.000Z"
}
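
For scripts that consume this response, a typed view of the payload can help. The interface below mirrors the fields shown in the example response; `ScrapeResult` and `summarizeScrape` are names invented for this sketch and are not taken from the codebase.

```typescript
// Mirrors the fields in the example response above.
interface ScrapeResult {
  scope: string;
  fetched: number;
  skipped: number;
  errors: number;
  updated: number;
  startedAt: string; // ISO 8601 timestamp
  finishedAt: string; // ISO 8601 timestamp
}

// Produce a one-line summary including the run duration in seconds.
function summarizeScrape(result: ScrapeResult): string {
  const seconds = (Date.parse(result.finishedAt) - Date.parse(result.startedAt)) / 1000;
  return (
    `${result.scope}: ${result.fetched} fetched, ${result.skipped} skipped, ` +
    `${result.updated} updated, ${result.errors} errors in ${seconds}s`
  );
}
```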

Typical Use Cases

After initial deployment: scope=full to populate the database
After an extended outage: scope=force to refresh all data regardless of staleness
Investigating a specific park: scope=today to get fresh data quickly
Before peak season: scope=full to ensure complete coverage

Health Monitoring

Health Endpoint

curl http://localhost:3001/api/status

Response

{
    "status": "ok",
    "uptime": 86400,
    "parks": 24,
    "database": {
        "totalDays": 8760,
        "lastScrape": "2026-04-23T14:00:12.000Z"
    },
    "lastScrapeResult": {
        "scope": "today",
        "fetched": 24,
        "skipped": 0,
        "errors": 0,
        "updated": 3,
        "startedAt": "2026-04-23T14:00:00.000Z",
        "finishedAt": "2026-04-23T14:00:12.000Z"
    }
}

Key Metrics

Metric	Expected Value	Concern If
`status`	`"ok"`	Not `"ok"` (always `"ok"` currently, but confirms the endpoint is reachable)
`uptime`	Increasing	Drops to 0 (container restarted)
`database.totalDays`	8,000-9,000 (full year)	Much lower (scraping not running) or 0 (empty database)
`database.lastScrape`	Within the last hour (during operating season)	More than a few hours old (scheduler may be broken)
`lastScrapeResult.errors`	0	Consistently high (API may be blocking requests)

Suggested Alerting

Alert if database.lastScrape is more than 12 hours old during operating season (March-December)
Alert if lastScrapeResult.errors exceeds 5 on consecutive scrapes
Alert if the health endpoint is unreachable
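
The three alert rules above could be evaluated by an external monitor along these lines. This is a sketch under assumptions: `StatusSnapshot` and `evaluateAlerts` are hypothetical, `null` stands for "endpoint unreachable", the season check uses the UTC month for simplicity, and "consecutive" is read as the current and previous poll.

```typescript
interface StatusSnapshot {
  lastScrape: string | null; // database.lastScrape (ISO 8601)
  errors: number; // lastScrapeResult.errors
}

// Return human-readable alerts for the current poll of /api/status.
function evaluateAlerts(
  current: StatusSnapshot | null,
  previous: StatusSnapshot | null,
  now: Date,
): string[] {
  if (current === null) return ["health endpoint unreachable"];
  const alerts: string[] = [];
  const month = now.getUTCMonth() + 1; // 1-12; UTC month is a simplification
  const inSeason = month >= 3 && month <= 12;
  const ageHours =
    current.lastScrape === null
      ? Infinity
      : (now.getTime() - Date.parse(current.lastScrape)) / (60 * 60 * 1000);
  if (inSeason && ageHours > 12) alerts.push("last scrape older than 12 hours");
  if (previous !== null && current.errors > 5 && previous.errors > 5) {
    alerts.push("errors above 5 on consecutive scrapes");
  }
  return alerts;
}
```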

Troubleshooting

No data showing in the calendar

Check if the backend is running:
```
docker compose logs backend --tail 50
```
Look for an [INFO] [startup] listening url=http://localhost:3001 line.
Check if the database has data:
```
curl http://localhost:3001/api/status | jq .database.totalDays
```
If 0, trigger a manual scrape: curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"
Check the BACKEND_URL in the web container:
```
docker compose exec web env | grep BACKEND_URL
```
Should be http://backend:3001 (not localhost, which inside the web container points at the web container itself rather than the backend).

Ride counts not appearing on the home page

Ride counts only appear for parks that are currently within their operating window, as determined by isWithinOperatingWindow(). Outside of park hours, no rides are shown.
Queue-Times data is cached for 5 minutes. Recent park openings may take up to 5 minutes to appear.
Weather delay (blue badge) means the park is within its hours but all rides report closed -- this is expected during weather-related closures.
Verify the park has a Queue-Times mapping in lib/queue-times-map.ts.

Stale data / not updating

Check scheduler logs:
```
docker compose logs backend | grep scheduler
```
You should see periodic [scheduler.tierN] lines (for example, [scheduler.tier1] scraping today).
Verify timezone:
```
docker compose exec backend date
```
Should match the TZ environment variable (America/New_York).
Check staleness threshold: Data within PARK_HOURS_STALENESS_HOURS (default 72h) is skipped by tiers 2-4. If you recently changed park data manually, it may not be re-fetched until the staleness window expires.

Force a refresh:

curl -X POST "http://localhost:3001/api/scrape/trigger?scope=force"

Rate limited by Six Flags API

Look for [rate-limited] messages in the backend logs:

docker compose logs backend | grep rate-limited

The client uses exponential backoff: 30s, 60s, 120s, then throws a RateLimitError and moves to the next park.
If rate limiting is persistent, increase PARK_HOURS_STALENESS_HOURS to reduce scrape frequency (e.g. 96 or 120).
The inter-park delay is hardcoded at 1000ms (500ms for the today tier) in backend/src/services/scraper.ts.
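
The backoff sequence described above (30s, 60s, 120s, then give up) can be sketched as a doubling delay list plus a retry loop. Everything here is illustrative: `backoffDelaysMs` and `withBackoff` are hypothetical helpers, the convention that `request` resolves to `null` when rate limited is an assumption of this sketch, and so is the final attempt after the last wait. `RateLimitError` is named in the text above, but its definition here is a stand-in.

```typescript
class RateLimitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RateLimitError";
  }
}

// Delays double from the base: 30s, 60s, 120s with the defaults.
function backoffDelaysMs(baseMs: number = 30000, attempts: number = 3): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Retry `request` until it returns a non-null result or the delays run out.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function withBackoff<T>(
  request: () => Promise<T | null>,
  sleep: (ms: number) => Promise<void>,
  delays: number[] = backoffDelaysMs(),
): Promise<T> {
  for (const delayMs of delays) {
    const result = await request();
    if (result !== null) return result;
    await sleep(delayMs); // Rate limited: wait before the next attempt.
  }
  const last = await request(); // One final attempt after the longest wait.
  if (last !== null) return last;
  throw new RateLimitError(`still rate limited after ${delays.length} retries`);
}
```

A caller that catches `RateLimitError` can log it and continue with the next park, which matches the "moves to the next park" behavior described above.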

Wrong timezone / incorrect dates

getTodayLocal() uses the server's local time (set by TZ env var) with a 3 AM cutover. Before 3 AM, the system considers it "yesterday."
Each park has its own IANA timezone (stored in lib/parks.ts) used for operating window checks. The TZ env var only affects cron schedule timing and the "today" determination.

If dates seem off, check both TZ and the server's system clock:

docker compose exec backend date
docker compose exec backend node -e "console.log(new Date().toISOString())"

Database corruption

If the database becomes corrupted (unlikely with SQLite WAL mode, but possible after a hard crash):

Stop the backend: docker compose stop backend

Delete the database files from the volume:

docker compose run --rm backend rm -f /app/backend/data/parks.db /app/backend/data/parks.db-wal /app/backend/data/parks.db-shm

Restart: docker compose start backend (auto-creates empty database)
Re-scrape: curl -X POST "http://localhost:3001/api/scrape/trigger?scope=full"

Log Reference

The backend uses a small structured logger (backend/src/log.ts). Every line has the format:

<ISO timestamp> [<LEVEL>] [<tag>] <message> key1=value1 key2=value2 …

Levels are INFO, WARN, ERROR. ERROR writes to stderr; the others write to stdout. Grep-friendly: filter by tag (grep '\[scheduler.tier1\]') or by key (grep 'park=cedarpoint').
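
To make the line format concrete, here is a small formatter that produces lines in the shape shown above. It is a sketch, not the contents of `backend/src/log.ts`: `formatLogLine`, `LogLevel`, and `LogFields` are hypothetical names, and quoting values that contain whitespace is an assumption based on the example output.

```typescript
type LogLevel = "INFO" | "WARN" | "ERROR";
type LogFields = Record<string, string | number | boolean>;

// Build "<ISO timestamp> [<LEVEL>] [<tag>] <message> key=value ..." as a single line.
function formatLogLine(
  timestamp: Date,
  level: LogLevel,
  tag: string,
  message: string,
  fields: LogFields = {},
): string {
  const pairs = Object.entries(fields).map(([key, value]) => {
    const text = String(value);
    // Assumption: values containing whitespace are quoted so pairs stay splittable.
    return /\s/.test(text) ? `${key}="${text}"` : `${key}=${text}`;
  });
  return [`${timestamp.toISOString()} [${level}] [${tag}] ${message}`, ...pairs].join(" ");
}
```

Keeping the tag in square brackets and the fields as bare `key=value` pairs is what makes the `grep` filters above work without a log parser.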

Tag	Source	Meaning
`startup`	`index.ts`	Config loaded, DB initialized, server listening
`shutdown`	`index.ts`	`SIGTERM`/`SIGINT` received; graceful shutdown progress
`http`	`index.ts`	One line per request: `method`, `path`, `status`, `ms`
`scheduler`	`scheduler.ts`	Cron job registration summary on boot
`scheduler.tier1` … `scheduler.tier5`	`scheduler.ts`	Each tier's tick; includes skip-due-to-latch warnings
`scheduler.startup`	`scheduler.ts`	Result of the "database empty" startup scrape
`today` / `month`	`scraper.ts`	Per-park / per-month scrape results
`wait-sampler`	`wait-sampler.ts`	Tier-5 per-park sample writes, errors, weather-delay skips
`rate-limit`	`middleware/rate-limit.ts`	`blocked` event with `ip`, `count`, `retryAfter`
`rides`	`routes/rides.ts`	Per-request warnings when upstream calls fail
`rate-limited`	`lib/scrapers/sixflags.ts`	HTTP 429/503 from Six Flags with backoff timing

Example log output:

2026-04-23T14:00:00.012Z [INFO] [startup] config loaded port=3001 nodeEnv=production parkHoursStalenessHours=72 rateLimitPerMin=60
2026-04-23T14:00:00.034Z [INFO] [startup] database initialized
2026-04-23T14:00:00.041Z [INFO] [scheduler] cron jobs registered tiers="tier1=hourly(Mar-Dec) tier2=6h tier3=3am+3pm tier4=3am-daily tier5=5min"
2026-04-23T14:00:00.042Z [INFO] [scheduler] skipping startup scrape — relying on cron existingRows=8742
2026-04-23T14:00:00.045Z [INFO] [startup] listening url=http://localhost:3001
2026-04-23T14:00:00.123Z [INFO] [http] GET /api/calendar/week status=200 ms=18
2026-04-23T14:00:10.001Z [INFO] [scheduler.tier1] scraping today
2026-04-23T14:05:00.001Z [INFO] [scheduler.tier5] sample run complete parksSampled=14 parksSkipped=10 samplesWritten=612 weatherDelayed=0 errors=0

Performance Tuning

Aspect	Current Setting	Notes
SQLite WAL mode	Enabled	Allows concurrent reads during writes. No configuration needed.
In-memory cache	TtlCache (5 min TTL)	Bounded by park count -- at most ~72 entries (24 parks x 3 caches). Memory impact is negligible.
Staleness window	72 hours	Controls how often park data is re-fetched from the API. Lower values = fresher data but more API calls and higher rate-limit risk.
Inter-park delay	1000ms / 500ms	Hardcoded in `scraper.ts`. Provides respectful pacing against the Six Flags API.
ISR revalidation	60-300s per route	Controlled in Next.js fetch calls. Lower values = fresher pages but more backend requests.
Next.js standalone	Enabled	Produces a minimal server bundle without unused dependencies.
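
For readers unfamiliar with the in-memory cache row above, a TTL cache can be as small as the sketch below. The class name matches the table, but this implementation is only an illustration: the constructor parameters, the injected clock, and the lazy eviction on read are assumptions rather than a description of the project's `TtlCache`.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL is 5 minutes; the clock is injectable to keep tests deterministic.
  constructor(ttlMs: number = 5 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // Evict lazily on read.
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because keys are per park, the map never grows beyond the number of parks times the number of caches, which is why the memory impact stays negligible.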
