Example use case:
- archiving completes without any errors,
- data is missing for one of the periods (e.g. a week) or for one of 100 defined segments.
This is only one case, but it happens very often on production servers and appears to be random. I know the main goal should be to fix such bugs in the Piwik platform itself, but that is not possible right now (I don't know how to reproduce the issue of randomly vanishing archives).
A scheduled task triggered after archiving would do the following:
- iterate over recently archived periods (the list could probably be cached during archiving to keep the check from becoming too expensive?)
- for each done flag, count the number of numeric values and blobs
- blobs should be counted only at level 0 (we cannot predict how many subtables there can be for each period)
- we should be able to calculate the expected number of blobs and numerics (maybe based on the list of active plugins, or just take the maximum count for the given period, because it's unlikely that ALL sites have broken archives; at least one site should always have a complete list of metrics)
- if the number of numerics is smaller than expected, that idarchive is suspected to be incomplete
- there should be a set of allowed discrepancies, e.g. no ecommerce report if the site isn't e-commerce enabled, or 0 in nb_visits (in which case an empty archive is OK), etc. In particular, this task should be aware that some values are never inserted into archives (e.g. empty datatables, zeroes in numerics).
- after iterating over all recently archived ids, a report should be logged listing which idsites, periods, segment names and dates seem to be invalid
- it would also be nice to log values that make it easy to invalidate such archives along with the piwik_option timestamps, so that the next archiving run reprocesses them?
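
To make the counting idea more concrete, here is a rough sketch in Python (not Piwik code) of the "take the maximum for the given period" variant. Everything in it is an assumption made for illustration: the `ArchiveSummary` record, its field names and the `find_suspect_archives` helper are hypothetical, and the counts are assumed to have been collected already, one record per done flag. It also checks the level-0 blob count against the maximum, not only the numerics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveSummary:
    # Hypothetical record, one per done flag; not Piwik's actual schema.
    idarchive: int
    idsite: int
    period: str         # e.g. "day", "week"
    date: str           # e.g. "2015-03-02"
    segment: str        # segment name, "" when no segment is applied
    numeric_count: int  # numeric rows stored for this idarchive
    blob_count: int     # level-0 blobs only, subtables are not counted
    nb_visits: int


def find_suspect_archives(summaries, tolerance=0):
    """Return the summaries whose counts are below the per-period maximum."""
    # Expected counts per period = the largest counts seen for that period,
    # assuming at least one site has a complete archive.
    expected = defaultdict(lambda: (0, 0))
    for s in summaries:
        num, blob = expected[s.period]
        expected[s.period] = (max(num, s.numeric_count), max(blob, s.blob_count))

    suspects = []
    for s in summaries:
        # Allowed discrepancy: an archive with no visits may be empty.
        if s.nb_visits == 0:
            continue
        exp_num, exp_blob = expected[s.period]
        if (s.numeric_count + tolerance < exp_num
                or s.blob_count + tolerance < exp_blob):
            suspects.append(s)
    return suspects


if __name__ == "__main__":
    sample = [
        ArchiveSummary(1, 1, "week", "2015-03-02", "", 40, 12, 500),
        ArchiveSummary(2, 2, "week", "2015-03-02", "", 12, 3, 480),
        ArchiveSummary(3, 3, "week", "2015-03-02", "", 0, 0, 0),
    ]
    for s in find_suspect_archives(sample):
        print(f"suspect: idarchive={s.idarchive} idsite={s.idsite} "
              f"period={s.period} date={s.date} segment={s.segment!r}")
```

The `tolerance` parameter is only a crude stand-in for the allowed discrepancies; a real check would need per-plugin rules (e.g. skipping ecommerce metrics for sites that are not e-commerce enabled).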