docs: update README for web + backend architecture

Remove references to Playwright discovery, RCDB scraping, scraper container, and npm run scripts. Document the new two-container setup, tiered scheduling, backend API endpoints, and local dev workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-23 21:54:45 -04:00
parent c5c9f750a3
commit 4922dce8ac
1 changed file with 76 additions and 108 deletions
@@ -1,6 +1,6 @@
 # Thoosie Calendar

-A week-by-week calendar showing operating hours for all Six Flags Entertainment Group theme parks — including the former Cedar Fair parks. Data is scraped from the Six Flags internal API and stored locally in SQLite. Click any park to see its full month calendar and live ride status with current wait times.
+A week-by-week calendar showing operating hours for all Six Flags Entertainment Group theme parks — including the former Cedar Fair parks. Data is fetched from the Six Flags internal API via a backend service and stored in SQLite. Click any park to see its full month calendar and live ride status with current wait times.

 ## Parks

@@ -14,14 +14,26 @@ A week-by-week calendar showing operating hours for all Six Flags Entertainment
 | **Texas & South** | Over Texas, Fiesta Texas (TX), Frontier City (OK) |
 | **West & International** | Magic Mountain (CA), Discovery Kingdom (CA), Knott's Berry Farm (CA), California's Great America (CA), Mexico |

+## Architecture
+
+The app runs as two containers:
+
+| Container | Port | Purpose |
+|-----------|------|---------|
+| **web** | 3000 | Next.js frontend — pure presentation layer, fetches all data from the backend API |
+| **backend** | 3001 | Hono API server — owns the SQLite database, runs tiered cron scheduling, handles all external API calls |
+
+The frontend makes no direct database or external API calls. All data flows through the backend.
+
 ## Tech Stack

 - **Next.js 15** — App Router, Server Components, standalone output
 - **Tailwind CSS v4** — `@theme {}` CSS variables, no config file
- **SQLite** via `better-sqlite3` — persisted in `/app/data/parks.db`
- **Playwright** — one-time headless browser run to discover each park's internal API ID
- **Six Flags CloudFront API** — `https://d18car1k0ff81h.cloudfront.net/operating-hours/park/{id}?date=YYYYMM`
- **Queue-Times.com API** — live ride open/closed status and wait times, updated every 5 minutes
+- **Hono** — lightweight TypeScript API framework for the backend
+- **SQLite** via `better-sqlite3` — owned exclusively by the backend
+- **node-cron** — tiered scheduling (hourly → daily) for data freshness
+- **Six Flags CloudFront API** — park operating hours and ride schedules
+- **Queue-Times.com API** — live ride open/closed status and wait times

 ## Ride Status

@@ -29,14 +41,31 @@ The park detail page shows ride open/closed status using a two-tier approach:

 1. **Live data (Queue-Times.com)** — when a park is operating, ride status and wait times are fetched from the [Queue-Times.com API](https://queue-times.com/en-US/pages/api) and cached for 5 minutes. All 24 parks are mapped. Displays a **Live** badge with per-ride wait times.

-2. **Schedule fallback (Six Flags API)** — the Six Flags operating-hours API drops the current day from its response once a park opens. When Queue-Times data is unavailable, the app falls back to the nearest upcoming date from the Six Flags schedule API as an approximation.
+2. **Schedule fallback (Six Flags API)** — when Queue-Times data is unavailable, the app falls back to the nearest upcoming date from the Six Flags schedule API as an approximation.

 ### Roller Coaster Filter

-When live data is shown, a **Coasters only** toggle appears if roller coaster data has been populated for that park. Coaster lists are sourced from [RCDB](https://rcdb.com) and stored in `data/park-meta.json`. To populate them:
+When live data is shown, a **Coasters only** toggle filters to roller coasters. Coaster lists are hardcoded in `lib/coaster-data.ts`.

-1. Open `data/park-meta.json` and set `rcdb_id` for each park to the numeric RCDB park ID (visible in the URL: `https://rcdb.com/4529.htm` → `4529`).
-2. Run `npm run scrape` — coaster lists are fetched from RCDB and stored in the JSON file. They refresh automatically every 30 days on subsequent scrapes.
+## Data Refresh
+
+The backend runs a tiered scraping schedule via node-cron:
+
+| Tier | Schedule | Scope |
+|------|----------|-------|
+| 1 | Hourly (Mar–Dec) | Today's hours for all parks |
+| 2 | Every 6 hours | Current month for all parks |
+| 3 | Twice daily (3 AM, 3 PM) | Current + next month |
+| 4 | Daily at 3 AM | Full year (respects 72h staleness window) |
+
+Past dates are never overwritten. The hourly tier compares live data against the database before writing — unchanged data is skipped.
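The write rules in the paragraph above can be sketched as a small decision function. This is an illustration only: the `DayHours` shape, the `decideWrite` name, and the field names are assumptions, since the README does not show the backend's actual schema or code.

```typescript
// Hypothetical record shape; the real database schema is not shown in the README.
interface DayHours {
  date: string;         // YYYY-MM-DD
  open: string | null;  // e.g. "10:00", or null when the park is closed
  close: string | null;
}

type WriteDecision = "skip-past" | "skip-unchanged" | "write";

// Decide whether a freshly fetched day should be written to the database.
function decideWrite(
  stored: DayHours | undefined,
  live: DayHours,
  today: string
): WriteDecision {
  // ISO dates (YYYY-MM-DD) sort correctly as plain strings.
  if (live.date < today) return "skip-past";
  // Identical hours mean there is nothing new to persist.
  if (stored !== undefined && stored.open === live.open && stored.close === live.close) {
    return "skip-unchanged";
  }
  return "write";
}
```

Comparing before writing keeps the hourly tier cheap: most runs see no change and touch nothing.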
+
+A manual trigger is available via the backend API:
+
+```bash
+curl -X POST "http://localhost:3001/api/scrape/trigger?scope=today"
+# scope: today | month | upcoming | full | force
+```
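For callers that trigger scrapes from code rather than curl, the scope values listed above could be modeled as a TypeScript union. This is a minimal sketch: the names `SCRAPE_SCOPES`, `isScrapeScope`, and `scrapeTriggerUrl` are hypothetical and not taken from the repository.

```typescript
// Scope values as listed in the README's curl example.
const SCRAPE_SCOPES = ["today", "month", "upcoming", "full", "force"] as const;
type ScrapeScope = (typeof SCRAPE_SCOPES)[number];

// Type guard for untrusted input such as a CLI argument.
function isScrapeScope(value: string): value is ScrapeScope {
  return (SCRAPE_SCOPES as readonly string[]).indexOf(value) !== -1;
}

// Build the manual-trigger URL for a given backend base URL.
function scrapeTriggerUrl(baseUrl: string, scope: ScrapeScope): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/api/scrape/trigger?scope=${encodeURIComponent(scope)}`;
}
```

The resulting URL would then be sent as a `POST`, for example with `fetch(url, { method: "POST" })`.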

 ---

@@ -45,29 +74,29 @@ When live data is shown, a **Coasters only** toggle appears if roller coaster da
 **Prerequisites:** Node.js 22+, npm

 ```bash
+# Install frontend dependencies
 npm install
-npx playwright install chromium
+
+# Install backend dependencies
+cd backend && npm install && cd ..
 ```

-### Seed the database
-
-Run once to discover each park's internal API ID (opens a headless browser per park):
+### Start the backend

 ```bash
-npm run discover
+cd backend
+npm run dev
 ```

-Scrape operating hours for the full year:
+The backend starts on port 3001, initializes the database, and begins the cron schedule. On first run it creates an empty database; the schedulers populate it automatically, or you can trigger a manual scrape to fill it right away.
+
+### Start the frontend

 ```bash
-npm run scrape
+npm run dev
 ```

-Force a full re-scrape (ignores the staleness window):
-
-```bash
-npm run scrape:force
-```
+Open [http://localhost:3000](http://localhost:3000). Navigate weeks with the `←` / `→` buttons, or pass `?week=YYYY-MM-DD` directly. Click any park name to open its detail page.

 ### Debug a specific park + date

@@ -77,78 +106,38 @@ Inspect raw API data and parsed output for any park and date:
 npm run debug -- --park kingsisland --date 2026-06-15
 ```

-Output is printed to the terminal and saved to `debug/{parkId}_{date}.txt`.
-
 ### Run tests

 ```bash
 npm test
 ```

-### Run the dev server
-
-```bash
-npm run dev
-```
-
-Open [http://localhost:3000](http://localhost:3000). Navigate weeks with the `←` / `→` buttons, or pass `?week=YYYY-MM-DD` directly. Click any park name to open its detail page.
-
 ---

 ## Deployment

-The app ships as two separate Docker images that share a named volume for the SQLite database:
-
-| Image | Tag | Purpose |
-|-------|-----|---------|
-| Next.js web server | `:web` | Reads DB, serves content. No scraping tools. |
-| Scraper + scheduler | `:scraper` | Nightly data refresh. No web server. |
-
-Images are built and pushed automatically by CI on every push to `main`.
-
-### First-time setup
-
-**1. Pull the images**
-
-```bash
-docker pull gitea.thewrightserver.net/josh/sixflagssupercalendar:web
-docker pull gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper
-```
-
-**2. Discover park API IDs**
-
-This one-time step opens a headless browser for each park to find its internal Six Flags API ID. Run it against the scraper image so Playwright is available:
-
-```bash
-docker run --rm -v root_park_data:/app/data \
-  gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper \
-  npm run discover
-```
-
-**3. Set RCDB IDs for the coaster filter**
-
-Open `data/park-meta.json` in the Docker volume and set `rcdb_id` for each park to the numeric ID from the RCDB URL (e.g. `https://rcdb.com/4529.htm` → `4529`). You can curl it directly from the repo:
-
-```bash
-curl -o /var/lib/docker/volumes/root_park_data/_data/park-meta.json \
-  https://gitea.thewrightserver.net/josh/SixFlagsSuperCalendar/raw/branch/main/data/park-meta.json
-```
-
-**4. Run the initial scrape**
-
-```bash
-docker run --rm -v root_park_data:/app/data \
-  gitea.thewrightserver.net/josh/sixflagssupercalendar:scraper \
-  npm run scrape
-```
-
-**5. Start services**
+The app ships as two Docker images, one per container (web and backend). Start both with Docker Compose:

 ```bash
 docker compose up -d
 ```

-Both services start. The scraper runs nightly at 3 AM (container timezone, set via `TZ`).
+Images are built and pushed automatically by CI on every push to `main`.
+
+### Environment variables
+
+**web:**
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `BACKEND_URL` | `http://backend:3001` | Backend API base URL (Docker internal networking) |
+
+**backend:**
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `TZ` | `UTC` | Timezone for cron schedules (e.g. `America/New_York`) |
+| `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is re-fetched |
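As an illustration of how a staleness window like `PARK_HOURS_STALENESS_HOURS` might be applied, the sketch below parses the setting with the documented default of 72 and checks a timestamp against it. The function names are hypothetical, and treating data as stale at exactly the threshold is an assumption; the backend's real logic is not shown in the README.

```typescript
const DEFAULT_STALENESS_HOURS = 72; // documented default

// Parse the raw environment value, falling back to the default
// when it is missing, empty, non-numeric, or not positive.
function parseStalenessHours(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_STALENESS_HOURS;
  const parsed = Number(raw);
  return isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_STALENESS_HOURS;
}

// Data that was never scraped, or was scraped at least
// `stalenessHours` ago, is considered due for a re-fetch.
function isStale(lastScrapedAt: Date | null, now: Date, stalenessHours: number): boolean {
  if (lastScrapedAt === null) return true;
  const ageMs = now.getTime() - lastScrapedAt.getTime();
  return ageMs >= stalenessHours * 60 * 60 * 1000;
}
```

In a Node process the raw value would come from `process.env.PARK_HOURS_STALENESS_HOURS`.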

 ### Updating

@@ -156,34 +145,13 @@ Both services start. The scraper runs nightly at 3 AM (container timezone, set v
 docker compose pull && docker compose up -d
 ```

-### Scraper environment variables
+### Backend API endpoints

-Set these in `docker-compose.yml` under the `scraper` service to override defaults:
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `TZ` | `UTC` | Timezone for the nightly 3 AM run (e.g. `America/New_York`) |
-| `PARK_HOURS_STALENESS_HOURS` | `72` | Hours before park schedule data is re-fetched |
-| `COASTER_STALENESS_HOURS` | `720` | Hours before RCDB coaster lists are re-fetched (720 = 30 days) |
-
-### Manual scrape
-
-To trigger a scrape outside the nightly schedule:
-
-```bash
-docker compose exec scraper npm run scrape
-```
-
-Force re-scrape of all data (ignores staleness):
-
-```bash
-docker compose exec scraper npm run scrape:force
-```
-
---
-
-## Data Refresh
-
-The scraper skips any park + month already scraped within the staleness window (`PARK_HOURS_STALENESS_HOURS`, default 72h). Past dates are never overwritten — once a day occurs, the API stops returning data for it, so the record written when it was a future date is preserved forever. The nightly scraper handles refresh automatically.
-
-Roller coaster lists (from RCDB) are refreshed per `COASTER_STALENESS_HOURS` (default 720h = 30 days) for parks with a configured `rcdb_id`.
+| Endpoint | Description |
+|----------|-------------|
+| `GET /api/calendar/week?start=YYYY-MM-DD` | Week calendar for all parks |
+| `GET /api/calendar/:parkId/month?month=YYYY-MM` | Month calendar for one park |
+| `GET /api/parks/:id/rides` | Live rides or schedule fallback |
+| `GET /api/parks` | Park list with metadata |
+| `GET /api/status` | Health check, scrape timestamps, DB stats |
+| `POST /api/scrape/trigger?scope=...` | Manual scrape trigger |
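A client for the endpoints in the table above could start from a few path builders. This is a sketch under assumptions: the helper names are hypothetical, and the response formats are not documented here, so only the request paths are shown. The `kingsisland` ID used in the usage note comes from the README's debug command; whether the API accepts the same ID is also an assumption.

```typescript
// Path for the week calendar; `start` must be YYYY-MM-DD.
function weekCalendarPath(start: string): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(start)) {
    throw new Error(`expected YYYY-MM-DD, got "${start}"`);
  }
  return `/api/calendar/week?start=${start}`;
}

// Path for one park's month calendar; `month` must be YYYY-MM.
function monthCalendarPath(parkId: string, month: string): string {
  if (!/^\d{4}-\d{2}$/.test(month)) {
    throw new Error(`expected YYYY-MM, got "${month}"`);
  }
  return `/api/calendar/${encodeURIComponent(parkId)}/month?month=${month}`;
}

// Path for a park's rides (live data or schedule fallback).
function parkRidesPath(parkId: string): string {
  return `/api/parks/${encodeURIComponent(parkId)}/rides`;
}
```

For local development these paths would be appended to `http://localhost:3001`, for example `monthCalendarPath("kingsisland", "2026-06")`.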