Add a task that validates archiving process #6370

Closed
quba opened this issue Oct 2, 2014 · 3 comments
Labels
duplicate For issues that already existed in our issue tracker and were reported previously. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.

@quba
Contributor

quba commented Oct 2, 2014

Example use case:

  • archiving completes without any errors,
  • there's missing data for one of the periods (e.g. week) or for one of 100 defined segments.

This is only one case, but it happens very often on production servers and appears to be random. I know the main goal should be to improve the Piwik platform to fix such bugs, but that is not possible right now (I don't know how to reproduce the issue with randomly vanishing archives).

Proposed solution:
A scheduled task triggered after archiving would do the following:

  • iterate over recently archived periods (the list could probably be cached during archiving to avoid the overhead of checking everything?)
  • for each done flag, count the number of numeric values and blobs
  • blobs should be counted only at level 0 (as we cannot predict how many subtables there can be for each period)
  • we should be able to calculate the expected number of blobs and numerics (maybe based on the list of active plugins, or just take the maximum number for the given period, because it's unlikely that ALL sites have broken archives; at least one site should always have a complete metrics list)
  • if the number of numerics is smaller than expected, that idarchive is suspected to be incomplete
  • there should be a number of allowed discrepancies, e.g. no ecommerce report if the site isn't e-commerce enabled, or an archive with 0 in nb_visits (such an archive is OK to be empty), etc. In particular, this task should be aware that some values are not inserted into archives (e.g. empty datatables, zeroes in numerics).
  • after iterating over all recently archived ids, a report should be logged stating which idsites, periods, segment names and dates seem to be invalid
  • it would also be nice to log some values that make it easy to invalidate such archives along with the piwik_option timestamps, so that the next archiving run reprocesses them?
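
To make the idea more concrete, here is a minimal Python sketch of the check described in the list above. It is only an illustration, not Piwik code: the `ArchiveSummary` record, its field names, `find_suspect_archives` and the `allowed_missing` tolerance are all hypothetical, and it assumes the per-archive counts have already been collected from the archive tables. The expected counts are taken as the maximum seen for each period type, which is one of the two options mentioned above, and the same threshold is applied to top-level blobs as to numerics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveSummary:
    """Counts for one finished archive (one done flag). Hypothetical shape."""
    idarchive: int
    idsite: int
    period: str         # e.g. "day", "week", "month"
    date: str           # e.g. "2014-09-29"
    segment: str        # "" for the unsegmented archive
    nb_visits: int
    numeric_count: int
    blob_count: int     # level-0 blobs only, subtables excluded


def find_suspect_archives(archives, allowed_missing=0):
    """Return archives whose counts fall short of the per-period maximum."""
    # Expected counts: the maximum seen for each period type, assuming
    # that at least one site has a complete archive for that period type.
    expected = defaultdict(lambda: (0, 0))
    for a in archives:
        numerics, blobs = expected[a.period]
        expected[a.period] = (max(numerics, a.numeric_count),
                              max(blobs, a.blob_count))

    suspects = []
    for a in archives:
        if a.nb_visits == 0:
            continue  # an archive with no visits is allowed to be empty
        exp_numerics, exp_blobs = expected[a.period]
        if (exp_numerics - a.numeric_count > allowed_missing
                or exp_blobs - a.blob_count > allowed_missing):
            suspects.append(a)
    return suspects


# Made-up sample data, for illustration only.
sample = [
    ArchiveSummary(1, 1, "week", "2014-09-29", "", 120, 40, 25),
    ArchiveSummary(2, 2, "week", "2014-09-29", "", 80, 12, 3),
    ArchiveSummary(3, 3, "week", "2014-09-29", "", 0, 1, 0),
]
for a in find_suspect_archives(sample, allowed_missing=5):
    print(f"suspect: idarchive={a.idarchive} idsite={a.idsite} "
          f"period={a.period} date={a.date} segment={a.segment!r}")
```

A single `allowed_missing` number is a crude stand-in for the allowed discrepancies: a real task would probably need per-plugin rules (e.g. for sites without ecommerce) rather than one global tolerance.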

refs #5805

@mattab mattab added this to the Piwik 2.9.0 milestone Oct 23, 2014
@mattab mattab added the Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. label Oct 23, 2014
@mattab
Member

mattab commented Oct 23, 2014

Thanks for the suggestion!

Such a scheduled task would be very useful to ensure high data availability in Piwik even when edge-case bugs are present.

In Piwik 2.8.3 there are reasons to hope that there will be fewer cases of "missing data"
(that's why I won't set the Major label until the bug is confirmed to still occur regularly).

@mattab
Member

mattab commented Oct 31, 2014

Moving this out of 2.9.0 until we have confirmation that this bug still occurs in 2.8.3. Please comment if you can reproduce it in 2.8.3.

@mattab mattab modified the milestones: Short term, Piwik 2.9.0 Oct 31, 2014
@mattab mattab modified the milestones: Short term, Mid term Apr 7, 2015
@mattab
Member

mattab commented Jul 18, 2016

Since the issue was created we have released some better tools, such as the core:purge-old-archive-data console command in #7377 #7181, which seems to have solved the problem for users 👍

@mattab mattab closed this as completed Jul 18, 2016
@mattab mattab added the duplicate For issues that already existed in our issue tracker and were reported previously. label Jul 18, 2016