@diosmosis opened this Issue on November 6th 2019 Member

Now that we invalidate archives when a visit is tracked, we have a couple more opportunities for refactoring:

  1. during invalidation, if no archive exists yet for a given date, we can create a new archive with the done flag set to DONE_INVALIDATED
  2. in CronArchive, instead of pulling individual sites and checking whether those sites have had visits, we can pull individual invalidated archives, set each one's done flag to DONE_IN_PROGRESS, and initiate archiving for just that archive.

This should allow splitting archiving of large websites across separate processes and simplify the code quite a bit.
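To make the two steps concrete, here is a minimal sketch in Python with SQLite. It is not Matomo code: the `archive` table layout, the function names, and the numeric flag values are all assumptions made for illustration. It shows invalidation creating a placeholder row when no archive exists, and a worker claiming one invalidated archive with a conditional update so that two processes cannot pick up the same one.

```python
import sqlite3

# Illustrative flag values only; the real constants may differ.
DONE_INVALIDATED = 4
DONE_IN_PROGRESS = 5


def create_schema(conn):
    # Hypothetical single archive table; real archive tables look different.
    conn.execute(
        "CREATE TABLE archive ("
        " idarchive INTEGER PRIMARY KEY AUTOINCREMENT,"
        " idsite INTEGER NOT NULL,"
        " period TEXT NOT NULL,"
        " date TEXT NOT NULL,"
        " done INTEGER NOT NULL,"
        " UNIQUE (idsite, period, date))"
    )


def invalidate(conn, idsite, period, date):
    """Mark an archive as invalidated, creating a placeholder row if none exists."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE archive SET done = ?"
            " WHERE idsite = ? AND period = ? AND date = ?",
            (DONE_INVALIDATED, idsite, period, date),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO archive (idsite, period, date, done)"
                " VALUES (?, ?, ?, ?)",
                (idsite, period, date, DONE_INVALIDATED),
            )


def claim_next_invalidated(conn):
    """Claim one invalidated archive for this worker; return its row or None."""
    while True:
        row = conn.execute(
            "SELECT idarchive, idsite, period, date FROM archive"
            " WHERE done = ? ORDER BY idarchive LIMIT 1",
            (DONE_INVALIDATED,),
        ).fetchone()
        if row is None:
            return None
        with conn:
            # The extra "done = ?" condition makes the claim atomic: only one
            # worker can flip a given row from invalidated to in progress.
            cur = conn.execute(
                "UPDATE archive SET done = ? WHERE idarchive = ? AND done = ?",
                (DONE_IN_PROGRESS, row[0], DONE_INVALIDATED),
            )
        if cur.rowcount == 1:
            return row
        # Another worker claimed this row first; look for the next one.
```

Each worker process would call `claim_next_invalidated` in a loop and archive whatever it gets back, which is what would let one large site be spread across several processes.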

Some things to keep in mind:

  • by default we would only look at the archive table for today's date and the tables for the dates in the invalidated sites list, rather than iterating over every table, though there could be an option to scan all tables.
  • we should not even start archiving an invalidated archive if its raw data has been deleted.
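The two points above could be sketched roughly as follows. This assumes one archive table per month named like `archive_numeric_2019_11`, only handles day-level dates, and uses a hypothetical `oldest_raw_day` value to stand in for however the code would detect that raw data has been deleted; none of these helper names come from the codebase.

```python
from datetime import date


def table_for(day):
    # Assumed naming scheme: one archive table per calendar month.
    return f"archive_numeric_{day.year:04d}_{day.month:02d}"


def tables_to_scan(today, invalidated_days):
    """Tables for today plus every invalidated date, without duplicates."""
    names = {table_for(today)}
    names.update(table_for(d) for d in invalidated_days)
    return sorted(names)


def can_archive(archive_day, oldest_raw_day):
    """Return False when raw data for the archive's date has been deleted.

    oldest_raw_day is a hypothetical cutoff: the earliest day that still has
    raw data, or None if nothing has been deleted.
    """
    if oldest_raw_day is None:
        return True
    return archive_day >= oldest_raw_day


if __name__ == "__main__":
    days = [date(2019, 10, 31), date(2019, 11, 1)]
    print(tables_to_scan(date(2019, 11, 6), days))
    print(can_archive(date(2019, 10, 1), date(2019, 10, 15)))
```

An option to scan every table would simply replace `tables_to_scan` with a full listing of the archive tables.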
@tsteur commented on November 7th 2019 Member