From eeed646203101b1257a86f5fd3cbf096da0e7a48 Mon Sep 17 00:00:00 2001
From: josh
Date: Sat, 4 Apr 2026 16:48:22 -0400
Subject: [PATCH] docs: rewrite deployment section for two-image setup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Document web/:web and scraper/:scraper image split
- Full first-time setup flow: pull → discover → RCDB IDs → scrape → up
- Document all scraper env vars (TZ, PARK_HOURS_STALENESS_HOURS, COASTER_STALENESS_HOURS)
- Add manual scrape commands, update workflow, add npm test to dev section

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 90 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 78 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index da9e9d1..9e4d2ee 100644
--- a/README.md
+++ b/README.md
@@ -63,7 +63,7 @@ Scrape operating hours for the full year:
 npm run scrape
 ```
 
-Force a full re-scrape (ignores the 7-day staleness window):
+Force a full re-scrape (ignores the staleness window):
 ```bash
 npm run scrape:force
 ```
@@ -79,6 +79,12 @@ npm run debug -- --park kingsisland --date 2026-06-15
 ```
 
 Output is printed to the terminal and saved to `debug/{parkId}_{date}.txt`.
 
+### Run tests
+
+```bash
+npm test
+```
+
 ### Run the dev server
 ```bash
@@ -91,33 +97,93 @@ Open [http://localhost:3000](http://localhost:3000). Navigate weeks with the `
 
 ## Deployment
 
-The app uses Next.js standalone output. The SQLite database is stored in a Docker volume at `/app/data`.
+The app ships as two separate Docker images that share a named volume for the SQLite database:
+
+| Image | Tag | Purpose |
+|-------|-----|---------|
+| Next.js web server | `:web` | Reads DB, serves content. No scraping tools. |
+| Scraper + scheduler | `:scraper` | Nightly data refresh. No web server. |
+
+Images are built and pushed automatically by CI on every push to `main`.
+
+### First-time setup
+
+**1. Pull the images**
+
+```bash
+docker pull gitea.thewrightserver.net/josh/sixflagssupercalendar:web
+docker pull gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper
+```
+
+**2. Discover park API IDs**
+
+This one-time step opens a headless browser for each park to find its internal Six Flags API ID. Run it against the scraper image so Playwright is available:
+
+```bash
+docker run --rm -v thoosie_park_data:/app/data \
+  gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper \
+  npm run discover
+```
+
+**3. Set RCDB IDs for the coaster filter**
+
+Open `data/park-meta.json` in the Docker volume and set `rcdb_id` for each park to the numeric ID from the RCDB URL (e.g. `https://rcdb.com/4529.htm` → `4529`). You can curl it directly from the repo:
+
+```bash
+curl -o /var/lib/docker/volumes/thoosie_park_data/_data/park-meta.json \
+  https://gitea.thewrightserver.net/josh/SixFlagsSuperCalendar/raw/branch/main/data/park-meta.json
+```
+
+**4. Run the initial scrape**
+
+```bash
+docker run --rm -v thoosie_park_data:/app/data \
+  gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper \
+  npm run scrape
+```
+
+**5. Start services**
 
 ```bash
 docker compose up -d
 ```
 
-### Seed the database inside the container
+Both services start. The scraper runs nightly at 3 AM (container timezone, set via `TZ`).
 
-The production image includes Playwright and Chromium, so discovery and scraping run directly against the container's volume:
+### Updating
 
 ```bash
-docker compose exec web npm run discover
-docker compose exec web npm run scrape
+docker compose pull && docker compose up -d
 ```
 
-Or as a one-off against the named volume:
+### Scraper environment variables
+
+Set these in `docker-compose.yml` under the `scraper` service to override defaults:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `TZ` | `UTC` | Timezone for the nightly 3 AM run (e.g. `America/New_York`) |
+| `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is re-fetched |
+| `COASTER_STALENESS_HOURS` | `720` | Hours before RCDB coaster lists are re-fetched (720 = 30 days) |
+
+### Manual scrape
+
+To trigger a scrape outside the nightly schedule:
 
 ```bash
-docker run --rm -v sixflagssupercalendar_park_data:/app/data \
-  gitea.thewrightserver.net/josh/sixflagssupercalendar:latest \
-  npm run scrape
+docker compose exec scraper npm run scrape
+```
+
+Force re-scrape of all data (ignores staleness):
+
+```bash
+docker compose exec scraper npm run scrape:force
 ```
 
 ---
 
 ## Data Refresh
 
-The scraper skips any park + month already scraped within the last 72 hours. The nightly Docker scraper service handles this automatically. Parks or months not yet in the database show a `—` placeholder; parks with no open days in the displayed week are hidden from the calendar automatically.
+The scraper skips any park + month already scraped within the staleness window (`PARK_HOURS_STALENESS_HOURS`, default 72h). Past dates are never overwritten — once a day occurs, the API stops returning data for it, so the record written when it was a future date is preserved forever. The nightly scraper handles refresh automatically.
 
-Roller coaster lists (from RCDB) are refreshed every 30 days on each `npm run scrape` run, for parks with a configured `rcdb_id`.
+Roller coaster lists (from RCDB) are refreshed per `COASTER_STALENESS_HOURS` (default 720h = 30 days) for parks with a configured `rcdb_id`.
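
The `docker compose up -d` step above relies on a `docker-compose.yml` defining both services against the shared volume, but that file is not part of this patch. A minimal sketch of what it could look like, where the service names, host port, restart policy, and example env values are assumptions drawn from the commands in the diff rather than the repo's actual compose file:

```yaml
# Hypothetical docker-compose.yml for the two-image layout documented above.
services:
  web:
    image: gitea.thewrightserver.net/josh/sixflagssupercalendar:web
    ports:
      - "3000:3000"            # assumed Next.js default port
    volumes:
      - park_data:/app/data    # read side of the shared SQLite volume
    restart: unless-stopped

  scraper:
    image: gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper
    environment:
      TZ: America/New_York               # nightly 3 AM run in local time
      PARK_HOURS_STALENESS_HOURS: "72"   # defaults from the env-var table
      COASTER_STALENESS_HOURS: "720"
    volumes:
      - park_data:/app/data    # write side of the shared SQLite volume
    restart: unless-stopped

volumes:
  park_data:
    name: thoosie_park_data    # same named volume the docker run steps use
```

The explicit `name: thoosie_park_data` makes compose reuse the exact volume the one-off `docker run -v thoosie_park_data:/app/data` commands populate, instead of creating a project-prefixed one.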