docs: rewrite deployment section for two-image setup
All checks were successful
Build and Deploy / Build & Push (push) Successful in 52s
- Document web/:web and scraper/:scraper image split
- Full first-time setup flow: pull → discover → RCDB IDs → scrape → up
- Document all scraper env vars (TZ, PARK_HOURS_STALENESS_HOURS, COASTER_STALENESS_HOURS)
- Add manual scrape commands, update workflow, add npm test to dev section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
README.md (90 lines changed)
@@ -63,7 +63,7 @@ Scrape operating hours for the full year:
 npm run scrape
 ```
 
-Force a full re-scrape (ignores the 7-day staleness window):
+Force a full re-scrape (ignores the staleness window):
 
 ```bash
 npm run scrape:force
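The only difference between `npm run scrape` and `npm run scrape:force` is whether the staleness window is consulted. A minimal sketch of that decision (hypothetical shell, not the repo's actual code; the scraper itself is a Node app):

```bash
# Hypothetical sketch of the staleness check; timestamps are Unix seconds.
STALENESS_HOURS="${PARK_HOURS_STALENESS_HOURS:-72}"  # default per this diff
now=$(date +%s)
last_scraped=$(( now - 100 * 3600 ))                 # pretend: scraped 100h ago
age_hours=$(( (now - last_scraped) / 3600 ))
if [ "$age_hours" -ge "$STALENESS_HOURS" ]; then
  result=stale   # `npm run scrape` re-fetches this park+month
else
  result=fresh   # skipped; `npm run scrape:force` re-fetches regardless
fi
echo "$result"   # prints "stale" (100h old vs a 72h window)
```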
@@ -79,6 +79,12 @@ npm run debug -- --park kingsisland --date 2026-06-15
 
 Output is printed to the terminal and saved to `debug/{parkId}_{date}.txt`.
 
+### Run tests
+
+```bash
+npm test
+```
+
 ### Run the dev server
 
 ```bash
@@ -91,33 +97,93 @@ Open [http://localhost:3000](http://localhost:3000). Navigate weeks with the `
 
 ## Deployment
 
-The app uses Next.js standalone output. The SQLite database is stored in a Docker volume at `/app/data`.
+The app ships as two separate Docker images that share a named volume for the SQLite database:
+
+| Image | Tag | Purpose |
+|-------|-----|---------|
+| Next.js web server | `:web` | Reads DB, serves content. No scraping tools. |
+| Scraper + scheduler | `:scraper` | Nightly data refresh. No web server. |
+
+Images are built and pushed automatically by CI on every push to `main`.
+
+### First-time setup
+
+**1. Pull the images**
+
+```bash
+docker pull gitea.thewrightserver.net/josh/sixflagssupercalendar:web
+docker pull gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper
+```
+
+**2. Discover park API IDs**
+
+This one-time step opens a headless browser for each park to find its internal Six Flags API ID. Run it against the scraper image so Playwright is available:
+
+```bash
+docker run --rm -v thoosie_park_data:/app/data \
+gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper \
+npm run discover
+```
+
+**3. Set RCDB IDs for the coaster filter**
+
+Open `data/park-meta.json` in the Docker volume and set `rcdb_id` for each park to the numeric ID from the RCDB URL (e.g. `https://rcdb.com/4529.htm` → `4529`). You can curl it directly from the repo:
+
+```bash
+curl -o /var/lib/docker/volumes/thoosie_park_data/_data/park-meta.json \
+https://gitea.thewrightserver.net/josh/SixFlagsSuperCalendar/raw/branch/main/data/park-meta.json
+```
+
+**4. Run the initial scrape**
+
+```bash
+docker run --rm -v thoosie_park_data:/app/data \
+gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper \
+npm run scrape
+```
+
+**5. Start services**
 
 ```bash
 docker compose up -d
 ```
 
-### Seed the database inside the container
+Both services start. The scraper runs nightly at 3 AM (container timezone, set via `TZ`).
 
-The production image includes Playwright and Chromium, so discovery and scraping run directly against the container's volume:
+### Updating
 
 ```bash
-docker compose exec web npm run discover
-docker compose exec web npm run scrape
+docker compose pull && docker compose up -d
 ```
 
-Or as a one-off against the named volume:
+### Scraper environment variables
+
+Set these in `docker-compose.yml` under the `scraper` service to override defaults:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `TZ` | `UTC` | Timezone for the nightly 3 AM run (e.g. `America/New_York`) |
+| `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is re-fetched |
+| `COASTER_STALENESS_HOURS` | `720` | Hours before RCDB coaster lists are re-fetched (720 = 30 days) |
+
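For reference, a compose file matching that table might look like the sketch below. Everything here is inferred from the commands in this diff (service names `web`/`scraper`, the `thoosie` project prefix on the `park_data` volume, port 3000 from the dev-server section), not taken from the repo's actual `docker-compose.yml`:

```yaml
# Hypothetical docker-compose.yml sketch; names and ports are assumptions.
services:
  web:
    image: gitea.thewrightserver.net/josh/sixflagssupercalendar:web
    ports:
      - "3000:3000"            # serves the calendar UI
    volumes:
      - park_data:/app/data    # read side of the shared SQLite volume
  scraper:
    image: gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper
    environment:
      TZ: America/New_York               # nightly run fires at 3 AM in this zone
      PARK_HOURS_STALENESS_HOURS: "72"
      COASTER_STALENESS_HOURS: "720"
    volumes:
      - park_data:/app/data    # write side of the shared SQLite volume

volumes:
  park_data:   # with a "thoosie" project name, becomes thoosie_park_data
```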
+### Manual scrape
+
+To trigger a scrape outside the nightly schedule:
+
 ```bash
-docker run --rm -v sixflagssupercalendar_park_data:/app/data \
-gitea.thewrightserver.net/josh/sixflagssupercalendar:latest \
-npm run scrape
+docker compose exec scraper npm run scrape
+```
+
+Force re-scrape of all data (ignores staleness):
+
+```bash
+docker compose exec scraper npm run scrape:force
 ```
 
 ---
 
 ## Data Refresh
 
-The scraper skips any park + month already scraped within the last 72 hours. The nightly Docker scraper service handles this automatically. Parks or months not yet in the database show a `—` placeholder; parks with no open days in the displayed week are hidden from the calendar automatically.
+The scraper skips any park + month already scraped within the staleness window (`PARK_HOURS_STALENESS_HOURS`, default 72h). Past dates are never overwritten — once a day occurs, the API stops returning data for it, so the record written when it was a future date is preserved forever. The nightly scraper handles refresh automatically.
 
-Roller coaster lists (from RCDB) are refreshed every 30 days on each `npm run scrape` run, for parks with a configured `rcdb_id`.
+Roller coaster lists (from RCDB) are refreshed per `COASTER_STALENESS_HOURS` (default 720h = 30 days) for parks with a configured `rcdb_id`.
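The diff says the scraper fires nightly at 3 AM in the container's `TZ`, but not how the wait is computed. One plausible sketch using GNU `date` (an assumption about the scheduler's shape, not the repo's actual code):

```bash
# Hypothetical: compute seconds to sleep until the next 3 AM local time.
now=$(date +%s)
target=$(date -d "03:00" +%s)              # today's 3 AM as a Unix timestamp
if [ "$target" -le "$now" ]; then
  target=$(date -d "tomorrow 03:00" +%s)   # already past 3 AM; use tomorrow's
fi
sleep_secs=$(( target - now ))
echo "next scrape in ${sleep_secs}s"
# a scheduler loop could then: sleep "$sleep_secs" && npm run scrape
```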