Convert Scraper to proper backend #17

Open
opened 2026-04-05 16:57:41 -04:00 by josh · 1 comment
Owner

So that maybe we can dig ourselves out of this nice hole

Done. Here's what changed end-to-end:

fetchToday(apiId, revalidate?) — new function, hits the dateless endpoint, returns today's DayResult with a 5-minute ISR cache when called from server components
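A minimal sketch of what fetchToday could look like, assuming a Next.js server-component context. The host, endpoint path, and DayResult shape below are placeholders for illustration, not the real Six Flags API:

```typescript
// Hypothetical sketch — field names and the endpoint are assumptions.
const BASE = "https://api.sixflags.example"; // placeholder host

interface DayResult {
  date: string;            // ISO date, e.g. "2026-04-05"
  openTime: string | null;
  closeTime: string | null;
  weatherDelay: boolean;
}

function todayUrl(apiId: string): string {
  // Omitting the date path segment asks the API for "today".
  return `${BASE}/parks/${apiId}/hours`;
}

async function fetchToday(
  apiId: string,
  revalidate = 300 // seconds; the 5-minute ISR window
): Promise<DayResult> {
  // `next.revalidate` is Next.js's per-request ISR cache control.
  const res = await fetch(todayUrl(apiId), { next: { revalidate } } as RequestInit);
  if (!res.ok) throw new Error(`fetchToday ${apiId}: HTTP ${res.status}`);
  return (await res.json()) as DayResult;
}
```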

upsertDay — changed the date comparison from > to >= so today's row can be updated during the day (weather delays, early closures)
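The comparison change is small but load-bearing, so here it is spelled out. shouldUpsert and the DayResult shape are illustrative names, not the project's actual code:

```typescript
// Hypothetical sketch of the upsertDay guard — names are assumptions.
interface DayResult {
  date: string;        // ISO date, e.g. "2026-04-05"
  hours: string | null;
  weatherDelay: boolean;
}

// Before: only rows strictly after today were written (date > today),
// so today's row could never be refreshed intraday.
// After: date >= today, so weather delays and early closures land in the DB.
function shouldUpsert(row: DayResult, today: string): boolean {
  return row.date >= today; // previously: row.date > today
}
```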

Scrape script — after the monthly loop, does a second pass fetching today fresh for every park and writing it to the DB

Homepage — fetches live today data for all parks in parallel (5-min ISR), merges it into the calendar data before computing open/closing/weather delay status. Combined with router.refresh() every 2 minutes, today's hours are always ≤5 minutes stale
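The merge step on the homepage could look roughly like this. CalendarDay and the Map keyed by park ID are assumptions for illustration, not the actual types:

```typescript
// Hypothetical sketch of merging live "today" rows into calendar data
// before computing open/closing/weather-delay status.
interface CalendarDay {
  parkId: string;
  date: string;        // ISO date
  hours: string | null;
  weatherDelay: boolean;
}

function mergeToday(
  calendar: CalendarDay[],
  live: Map<string, CalendarDay> // parkId -> freshly fetched today row
): CalendarDay[] {
  return calendar.map((day) => {
    const fresh = live.get(day.parkId);
    // Only today's row is replaced; past and future days keep DB values.
    return fresh && fresh.date === day.date ? fresh : day;
  });
}
```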

Park detail page — reads live today from the API instead of the DB, so hours shown match Six Flags' current state (including weather delay banners)
Author
Owner

Transitioning Scraper to a Formal Backend

Current Design Issues:

  • Scraper has to run as a second container (not terrible, since a backend would also be a second container), but it just runs hacky npm commands
  • Scraper is very limited in how flexible it can be. With an actual backend, we could fetch today every hour, this week every day, the next 30 days every 3 days, and so on
  • Having to seed the database with discover isn't a huge flaw, but we need to move park-meta out of /data so it can ship with the container
  • With this approach we could also run a small comparison when fetchToday runs and only upsert against the database when there is an actual difference (an update)
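The tiered schedule a real backend enables could be sketched like this. Job names, intervals, and the fetchRange helper are illustrative, not the project's actual functions:

```typescript
// Hypothetical sketch of tiered refresh jobs replacing the npm-script scrape.
type Job = { name: string; everyMs: number; run: () => Promise<void> };

const HOUR = 3_600_000;
const DAY = 24 * HOUR;

// Tiers from the issue: today hourly, this week daily, next 30 days every 3 days.
// fetchRange(days) is an assumed helper that refreshes the next N days.
function makeJobs(fetchRange: (days: number) => Promise<void>): Job[] {
  return [
    { name: "today", everyMs: HOUR, run: () => fetchRange(1) },
    { name: "this-week", everyMs: DAY, run: () => fetchRange(7) },
    { name: "next-30-days", everyMs: 3 * DAY, run: () => fetchRange(30) },
  ];
}

function schedule(jobs: Job[]): ReturnType<typeof setInterval>[] {
  return jobs.map((job) =>
    setInterval(() => {
      // Log and swallow failures so one bad fetch doesn't kill the loop.
      job.run().catch((err) => console.error(`${job.name} failed:`, err));
    }, job.everyMs)
  );
}
```

A production version would more likely use a cron library or the platform's scheduler, but the point is the same: per-tier intervals instead of one monolithic scrape.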

Foreseen Impact:

  • Will likely need to refactor a fair portion of the project, and probably rewrite a fair amount of code to properly separate the two. I figure we make the backend an API and have the front end call it
  • To simplify things we may need to rewrite the scraper functions. Currently we don't actually scrape anything for the Six Flags data; we just use their API. We also just use the API for queue-times. The only thing we actually scrape is RCDB, but I want to reapproach that if possible
  • CI/CD flow will need to be rewritten

Questions Worth Asking:

  • Do we absolutely need the discover command? How are we getting the park APIs from Six Flags? Is it something the backend could do on its own?

Wants But Not Needs:

  • I want to entirely move away from any NPM scripts. The biggest hangup is the discovery function. I would really like to rethink our approach for something that fits more neatly into a production workflow. Ideally I do not have to manually update or seed anything in the long run, but I understand that the coasters list might be tricky.
Reference: josh/SixFlagsSuperCalendar#17