Not all visits taken into account during cron archiving #11027

Closed
adaqus opened this issue Dec 16, 2016 · 4 comments
Labels
answered For when a question was asked and we referred to forum or answered it. Bug For errors / faults / flaws / inconsistencies etc. Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users.

Comments

@adaqus

adaqus commented Dec 16, 2016

This issue happens when the following conditions are met:

  1. The website has 1-2 visits per day.
  2. The total number of websites to archive is large, so archiving takes a few hours.
  3. Neither the --force-all-websites nor the --force-idsites option is passed to core:archive.

Expected behaviour: archiving of a given website should be triggered by any new visit since the last archiving, regardless of when that visit took place.
Actual behaviour: when a visit takes place during archiving, it won't trigger the website's preprocessing during the next archiving run.

This is problematic for small websites with few visits per day, since it leaves some days without reports.

Detailed description

Archiving works as follows:

  1. Websites to archive are fetched.
  2. For each website, it is checked whether there are new visits since the last full archiving run finished successfully.
  3. If there are new visits, the website is archived.

But:

Let's say archiving started at 10:00. The website with ID 1 is processed at the beginning of the archiving run, and at that point it has no new visits. Archiving processes the remaining 5000 websites in 5.5 hours, and the last successful archiving time is saved (15:30). Meanwhile, at 14:00, another visit took place. The next archiving run starts at 16:00. It checks whether there were new visits since the last successful archiving (15:30) and skips website 1, since there are none. The visit at 14:00 is missed.
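The scenario above can be reproduced with a small Python model. This is a sketch under stated assumptions, not Matomo code: `sites_with_new_visits` is a hypothetical helper that treats a site as needing archiving only if it has a visit after the end of the previous successful run, and the first run is assumed to look back to midnight.

```python
from datetime import datetime, timedelta

# Hypothetical model of the described behaviour: a site is archived only if it
# has a visit after the end of the previous successful archiving run.
def sites_with_new_visits(visits, last_run_end, now):
    """Return ids of sites with at least one visit in (last_run_end, now]."""
    return {site for site, t in visits if last_run_end < t <= now}

day = datetime(2016, 12, 16)
run1_start = day.replace(hour=10)
run1_end = run1_start + timedelta(hours=5, minutes=30)  # 15:30
visits = [(1, day.replace(hour=14))]                     # visit during run 1

# Run 1: site 1 is checked at 10:00, before the 14:00 visit exists
# (assumption: the previous run ended at midnight).
checked_in_run1 = sites_with_new_visits(visits, day, run1_start)
# Run 2 starts at 16:00 and only looks at visits after 15:30.
run2_start = day.replace(hour=16)
checked_in_run2 = sites_with_new_visits(visits, run1_end, run2_start)

print(checked_in_run1, checked_in_run2)  # set() set()
```

Neither run picks up site 1, even though it had a visit at 14:00.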

To avoid this situation, archiving could check for new visits since the last successful archiving of the given website for the day period. Some visits may still be missed, but the chance is much smaller.
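A minimal sketch of the proposed fix, assuming a hypothetical per-site record of when each site's day archive was last built (`last_site_archive` and `needs_archiving` are illustrative names, not an existing Matomo table, option or method):

```python
from datetime import datetime

# Hypothetical sketch: compare visits against the time *this* site was last
# archived, instead of the end time of the whole core:archive run.
def needs_archiving(site_id, visits, last_site_archive, now):
    since = last_site_archive[site_id]
    return any(s == site_id and since < t <= now for s, t in visits)

day = datetime(2016, 12, 16)
visits = [(1, day.replace(hour=14))]
# Site 1 was archived at 10:00, at the start of the long run.
last_site_archive = {1: day.replace(hour=10)}

print(needs_archiving(1, visits, last_site_archive, day.replace(hour=16)))  # True
```

Recording the time when a site's archiving starts, rather than when it finishes, is one way to keep visits that arrive while that site is being processed inside the next window.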

The fastest workaround is to use the --force-all-websites option of core:archive.

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Dec 25, 2016
@mattab
Member

mattab commented Dec 25, 2016

Thanks @adaqus for the detailed report.

"For each website, it is checked whether there are new visits since the last full archiving run finished successfully."

Did you manage to locate this in the code?

I just took a quick (5-minute) look and couldn't find this logic implemented. What I saw was that it skips websites that haven't had a visit since midnight in the website's timezone, which seems correct. I also noticed the code comment was incorrect, so I fixed it in #11079.
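A minimal Python sketch of the "visit since midnight in the website's timezone" check as described here, assuming visits are timestamped in UTC and the site's timezone is a fixed UTC offset (DST ignored). `had_traffic_since_midnight` is a hypothetical helper, not Matomo's `CronArchive` code. It shows why the timezone matters: the same visit can fall after local midnight but before UTC midnight.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: does the site have a visit between local midnight
# (in the site's timezone) and now?
def had_traffic_since_midnight(visit_times_utc, site_tz, now_utc):
    local_now = now_utc.astimezone(site_tz)
    local_midnight = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Aware datetimes compare correctly across different timezones.
    return any(local_midnight <= t <= local_now for t in visit_times_utc)

now = datetime(2016, 12, 16, 3, 0, tzinfo=timezone.utc)
visit = datetime(2016, 12, 15, 20, 0, tzinfo=timezone.utc)
utc_minus_5 = timezone(timedelta(hours=-5))

print(had_traffic_since_midnight([visit], utc_minus_5, now))   # True
print(had_traffic_since_midnight([visit], timezone.utc, now))  # False
```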

Looking forward to hear more details from your investigation 👍

@mattab mattab added this to the 3.0.0 Backlog milestone Dec 25, 2016
@mattab mattab added the Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users. label Dec 25, 2016
@adaqus
Author

adaqus commented Jan 2, 2017

@mattab Yes, CronArchive::hadWebsiteTrafficSinceMidnightInTimezone checks whether there were new visits since midnight or since the last successful archiving (https://github.com/piwik/piwik/blob/3.x-dev/core/CronArchive.php#L1176).

Let's assume that we run core:archive a few times a day (I should have mentioned that). In that case, the number of seconds since the last successful archiving is used, because that period is shorter than the time since midnight (https://github.com/piwik/piwik/blob/3.x-dev/core/CronArchive.php#L1177). So if a website has 1-2 visits a day and they happen during the archiving process, they won't trigger archiving for this website.
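A minimal Python sketch of the look-back window described here, assuming naive local datetimes (timezone handling omitted) and a hypothetical `lookback_seconds` helper; it mirrors the described behaviour, not the actual PHP in CronArchive.php. With the previous run finishing at 15:30 and the next one starting at 16:00, the shorter "since last run" window wins:

```python
from datetime import datetime

# Hypothetical reconstruction: the look-back period is the smaller of
# "seconds since midnight" and "seconds since the last successful run".
def lookback_seconds(now, last_successful_run_end):
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    since_midnight = (now - midnight).total_seconds()
    since_last_run = (now - last_successful_run_end).total_seconds()
    return int(min(since_midnight, since_last_run))

now = datetime(2016, 12, 16, 16, 0)
last_end = datetime(2016, 12, 16, 15, 30)
window = lookback_seconds(now, last_end)
print(window)  # 1800 -> only visits after 15:30 are considered
```

A visit at 14:00 lies outside this 1800-second window, which is how it gets missed.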

@adaqus
Author

adaqus commented Jan 2, 2017

Also, I'm not sure whether the changes introduced in #11079 are correct. The updated comment may be misleading, since the time since the last archiving may be used instead of the time since midnight.

@mattab
Member

mattab commented Jun 12, 2020

Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!

@mattab mattab closed this as completed Jun 12, 2020
@mattab mattab added the answered For when a question was asked and we referred to forum or answered it. label Jun 12, 2020
Projects
None yet
Development

No branches or pull requests

2 participants