New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL query to find websites with traffic since last successful archiving can take 6+ hours #8571
Conversation
Follow up note:
|
Marked method as deprecated |
FYI: We're running the archiver on a big instance and the big query is definitely gone, the archiver starts immediately |
FYI: it breaks MetaSites because there's no raw data for them. |
Added more log output. Re meta sites I need to have a look how it was worked around there. In theory it should not change behaviour |
Think I found it already... |
To make sure we're compatible with MetaSites plugin I added https://github.com/PiwikPRO/plugin-MetaSites/pull/17 |
Can someone please have a look again? |
Let's test it tomorrow. |
Rebased :) just FYI |
@@ -21,6 +21,9 @@ This is a changelog for Piwik platform developers. All changes for our HTTP API' | |||
### Internal Change | |||
* The option `branch` of the console command `development:sync-system-test-processed` was removed as it is no longer needed. | |||
|
|||
### APIs | |||
* There is a new event `CronArchive.getIdSitesNotUsingTracker` that allows you to set a list of idSites that do not use the Tracker API to make sure we archive these sites if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that important, but I think CronArchive.getIdSitesToForceArchivingFor
would be a better name (since it is explicit about what the site IDs will be used for). getIdSitesNotUsingTracker
I think requires knowing how CronArchive works.
Also, I don't think this event is enough to force archiving for the sites, since if there are no visits archiving will be skipped for them (see: https://github.com/piwik/piwik/blob/master/core/ArchiveProcessor/Loader.php#L121). This is a problem for plugins that display 3rd party data, for example. Should be noted here and the event docs.
OR we could call this event in the core archiver to force archiving even if there are no visits for that site... though perhaps this clashes w/ MetaSites. In this case, a new event could be created. Anyway, this hypothetical new event isn't required for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it something like getIdSitesToForceArchivingFor
(before I squashed them) but found it's not the right name as forcing might be misleading etc. Eg when someone is using skip-sites
or force-sites
this site might be not archived and therefore not forced. Also it might be not forced if there are no visits see your link. I found the word force
too strong. I think from a 3rd party developer point of view it's maybe better understandable to have it getIdSitesNotUsingTracker
. It only requires knowing whether a sites is using the Tracker or not. Maybe getIdSitesNotUsingTrackerAPI
would be clearer?
I added it to make it work for MultiSites but haven't tested it for any actual 3rd data that only uses it's own plugin archiver (like MobileAppStore maybe does?). Do you think it's easy to fix? One problem that I'm not sure about is that it looks like we're working with multiple idSites there and not just one site see https://github.com/piwik/piwik/blob/master/core/ArchiveProcessor/Loader.php#L133 $this->params->getIdSites()
. Probably it's always only one site though. I reckon that's what you suggested and I now understand what you mean with it clashes with MetaSites. It does sound like a new event would be needed but that's kinda too complex. Would it be something like getIdSitesNotUsingLogTables
? It's maybe a bit too technical as many developers won't know the log tables.
- MetaSites is not using Tracker API but Log tables
- MobileAppStore is not using Tracker API and not Log tables (I think)
Do you have an idea for that event name? I could maybe implement it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe getIdSitesNotUsingTrackerAPI would be clearer?
In general, the issue I have w/ these names is that it's not clear what the intent is. The event is used to force CronArchive to send API requests to initiate archiving, but getIdSitesNotUsingTrackerAPI
doesn't convey that information. I think it might be very easy for a plugin developer to overlook the event entirely. Maybe CronArchive.getIdSitesToForceArchivingRequestsFor
? Or if you think getIdSitesNotUsingTrackerAPI
might be re-used in a different context, it would be ok.
Do you have an idea for that event name? I could maybe implement it
How about Archiver.getIdSitesDisplayingThirdPartyData
or Archver.getIdSitesDisplayingNonPiwikData
? This isn't needed for this PR, but I think it would be a good stop-gap measure for 2.15. Since 3.0 won't be released for say at least 6 months or so, this event would help plugin developers in the meantime. What do you think? If you agree, I'll create a new issue.
Created follow up issue re: the archiving event here: #8631 |
@tsteur do you have anything else to say regarding the event name or shall I merge? |
Nah not really. I wouldn't wanna use any event that contains I think after refactoring the archiver we might not need an event anymore at all, hopefully. And if so, we can maybe find a better name or provide a method for it somehow. |
Ok, will merge then. |
Fixes #8066, do not use SQL query in CronArchive to find websites with traffic since last successful archiving since it can take 6+ hours, instead use fast query for each individual website and use shorter timeframe for visits check.
refs #8066
Previous behaviour:
Difference to before:
websiteDayHasFinishedSinceLastRun
or ifisOldReportInvalidatedForWebsite
We might try to run the archiver on a large Piwik system tmrw. I'm not really sure how to test it and to make sure we actually do not regress anything. Maybe we can deploy it on all our Piwik demos as well