@tsteur opened this Issue on August 28th 2019 Member

We have a weekly cronjob in CoreAdminHome\Tasks that deletes orphaned segment archives, meaning we delete all archives of segments that no longer exist.

This behaviour currently works like this:

  • Get a list of all segments
  • Iterate over each archive table
    • Get the idArchives of archives whose done flag is not in the list of existing segments
    • Delete the idArchives
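
For illustration, the current logic can be sketched roughly as follows. This is a minimal Python sketch, not the actual PHP task code: the function names are hypothetical, and it assumes that a segment's done flag is "done" followed by the MD5 hash of the segment definition, optionally followed by a ".PluginName" suffix.

```python
import hashlib


def done_flag(definition: str) -> str:
    # Assumption: done flag = "done" + MD5 hash of the segment definition.
    return "done" + hashlib.md5(definition.encode("utf-8")).hexdigest()


def find_orphaned_archives(archive_rows, existing_definitions):
    """Return the idarchives whose done flag matches no existing segment.

    archive_rows: iterable of (idarchive, name) rows from one archive table.
    existing_definitions: definitions of all segments that still exist.
    """
    valid_flags = {done_flag(d) for d in existing_definitions}
    valid_flags.add("done")  # plain flag: archive without any segment

    orphaned = set()
    for idarchive, name in archive_rows:
        if not name.startswith("done"):
            continue  # regular metric row, not a done flag
        # Drop an optional ".PluginName" suffix before comparing.
        base_flag = name.split(".", 1)[0]
        if base_flag not in valid_flags:
            orphaned.add(idarchive)
    return orphaned
```

Note that every done flag in every archive table has to be compared against the full list of existing segments, even when no segment was deleted at all.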

Now that this task has run a few times and all orphaned segment archives have been cleaned up, and now that we are implementing a proper soft delete in https://github.com/matomo-org/matomo/issues/14826, we could make this behaviour more efficient:

  • Get all segments where deleted = 1 and ts_last_edit >= 1 week ago (assuming we run the task weekly). These are the segments that were deleted within the last week
  • Iterate over all archive tables
    • Get the idarchives of archives that have the done flag of a deleted segment
    • Delete these idarchives from the numeric and blob tables
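
A rough sketch of the proposed flow, again in Python rather than the real PHP code. All function names are hypothetical. It assumes each segment record exposes `hash`, `deleted` and `ts_last_edit` fields, and that the done flag is "done" followed by that hash (optionally with a ".PluginName" suffix).

```python
from datetime import datetime, timedelta


def recently_deleted_hashes(segments, now, window=timedelta(weeks=1)):
    """Hashes of segments that were soft-deleted within the window."""
    cutoff = now - window
    return {
        s["hash"]
        for s in segments
        if s["deleted"] == 1 and s["ts_last_edit"] >= cutoff
    }


def archives_to_purge(archive_rows, deleted_hashes):
    """idarchives whose done flag belongs to one of the deleted segments."""
    flags = {"done" + h for h in deleted_hashes}
    return {
        idarchive
        for idarchive, name in archive_rows
        # Drop an optional ".PluginName" suffix before comparing.
        if name.split(".", 1)[0] in flags
    }


def purge(numeric_rows, blob_rows, idarchives):
    """Remove the given idarchives from both the numeric and the blob rows."""
    def keep(rows):
        return [row for row in rows if row[0] not in idarchives]

    return keep(numeric_rows), keep(blob_rows)
```

If `recently_deleted_hashes` returns an empty set, the task can stop before touching any archive table.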

I was thinking we would only need to loop over the archive tables since ts_created. However, we likely need to loop over all archive tables, because some users will have enabled automatic archiving of segments for all historical data. This is also the default: process_new_segments_from = "beginning_of_time".

This way things will be faster, as we only execute the logic if a segment was deleted.
