@quba opened this Issue on October 2nd 2014 Contributor

Example use case:

  • archiving completes without any errors,
  • data is missing for one of the periods (e.g. week) or for one of 100 defined segments.

This is only one case, but it happens very often on production servers and appears to be random. I know the main goal should be to fix such bugs in the Piwik platform itself, but that is not possible right now (I don't know how to reproduce the issue of randomly vanishing archives).

Proposed solution:
A scheduled task triggered after archiving would do the following:

  • iterate over recently archived periods (these could probably be cached during archiving so that the check does not become overkill)
  • for each done flag, count the number of numeric values and blobs
  • count blobs only at level 0 (we cannot predict how many subtables there will be for each period)
  • calculate the expected number of blobs and numerics (maybe based on the list of active plugins, or just take the maximum number seen for the given period: it is unlikely that ALL sites have broken archives, and at least one site should always have the complete list of metrics)
  • if the number of numerics is smaller than expected, treat that idarchive as suspected to be incomplete
  • allow for a number of legitimate discrepancies, e.g. no ecommerce report if the site isn't ecommerce-enabled, or an archive with 0 in nb_visits (such an archive is OK to be empty). In particular, the task should be aware that some values are never inserted into archives (e.g. empty datatables, zeroes in numerics).
  • after iterating over all recently archived ids, log a report stating which idsites, periods, segment names and dates seem to be invalid
  • it would also be nice to log some values that make it easy to invalidate such archives, along with the piwik_option timestamps, so that the next archiving run reprocesses them
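
To make the idea above a bit more concrete, here is a rough sketch of the counting check in Python. It is only an illustration, not Piwik code: the `ArchiveSummary` record, its field names and the `tolerance` parameter are all made up for this example, and it assumes the per-archive counts (numerics and level-0 blobs) have already been collected from the archive tables. The expected count is taken as the maximum seen for each period type, as suggested in the list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ArchiveSummary:
    """Hypothetical per-archive summary; not the real Piwik schema."""
    idarchive: int
    idsite: int
    period: str         # e.g. "day", "week"
    date: str
    segment: str        # empty string means "no segment"
    nb_visits: int
    numeric_count: int  # number of numeric values stored for this archive
    blob_count: int     # number of level-0 blobs only (subtables ignored)


def find_suspect_archives(archives, tolerance=0):
    """Return archives whose counts fall below the per-period maximum.

    Assumption: at least one site per period type has a complete archive,
    so the maximum count is a usable estimate of the expected count.
    """
    expected = defaultdict(lambda: (0, 0))
    for a in archives:
        max_numeric, max_blob = expected[a.period]
        expected[a.period] = (max(max_numeric, a.numeric_count),
                              max(max_blob, a.blob_count))

    suspects = []
    for a in archives:
        # Allowed discrepancy: an archive with no visits may be empty.
        if a.nb_visits == 0:
            continue
        exp_numeric, exp_blob = expected[a.period]
        if (a.numeric_count < exp_numeric - tolerance
                or a.blob_count < exp_blob - tolerance):
            suspects.append(a)
    return suspects


def describe(a):
    """Build one report line for a suspect archive."""
    segment = a.segment or "(none)"
    return (f"idarchive={a.idarchive} idsite={a.idsite} period={a.period} "
            f"date={a.date} segment={segment}")
```

A scheduled task could then log `describe(a)` for every entry returned by `find_suspect_archives(...)`. The `tolerance` value stands in for the "allowed discrepancies" point; a real check would more likely need per-site rules (for example, skipping ecommerce metrics on sites without ecommerce).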

refs #5805

@mattab commented on October 23rd 2014 Member

Thanks for the suggestion!

Such a scheduled task would be very useful for ensuring high data availability in Piwik even when edge-case bugs are present.

In Piwik 2.8.3 there are reasons to hope that there will be fewer cases of "missing data"
(that's why I won't set the Major label until the bug is confirmed to occur again quite regularly).

@mattab commented on October 31st 2014 Member

Moving this out of 2.9.0 until we have confirmation that this bug still occurs in 2.8.3 - please comment if this bug is reproduced in 2.8.3

@mattab commented on July 18th 2016 Member

Since this issue was created we have released some better tools, such as the core:purge-old-archive-data console command in #7377 #7181, which seems to have solved the problem for users :+1:

This Issue was closed on July 18th 2016