@tsteur opened this Pull Request on August 13th 2015 Member

refs #8066

Previous behaviour:

  • We either processed only the explicitly specified sites
  • or we processed all sites if that was requested
  • or by default we checked which sites had visits since the last archiver run and which sites needed to be reprocessed because of invalidation etc. We got the list of all sites that had visits since the last archiving run by doing one big query
  • We checked whether there were visits for a rather long timeframe

Difference to before:

  • Instead of executing the slow query each time an archiver starts we only execute one small query for just one site before a site is actually about to be processed.
  • We do not have to execute this query if we have to archive the site anyway because of other reasons: eg if websiteDayHasFinishedSinceLastRun or if isOldReportInvalidatedForWebsite
  • We check for visits in a shorter timeframe (since midnight in the website's timezone or since the last archiving run, whichever is shorter)
  • The log output of core archiver can be bigger since we by default archive all websites and log each site that can be skipped (eg if it had no visits)
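
As a rough illustration of the per-site decision described in the list above, here is a minimal sketch. It is written in Python rather than Piwik's PHP, and it is not the actual CronArchive code: `SiteState`, `visit_window_start`, `should_archive`, the fixed UTC offset, and the `count_visits_since` callback are all hypothetical names invented for this example. Only the two flags mirror names mentioned above (websiteDayHasFinishedSinceLastRun and isOldReportInvalidatedForWebsite).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple


@dataclass
class SiteState:
    """Hypothetical per-site state; not a Piwik class."""
    id_site: int
    utc_offset_hours: float        # assumption: site timezone as a fixed UTC offset
    last_archive_run: datetime     # timezone-aware time of the last archiver run
    day_finished_since_last_run: bool
    old_report_invalidated: bool


def visit_window_start(site: SiteState, now: datetime) -> datetime:
    """Start of the shorter window: since site-local midnight or since the last run."""
    site_tz = timezone(timedelta(hours=site.utc_offset_hours))
    local_midnight = now.astimezone(site_tz).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # The later start time gives the shorter timeframe.
    return max(local_midnight.astimezone(timezone.utc), site.last_archive_run)


def should_archive(
    site: SiteState,
    now: datetime,
    count_visits_since: Callable[[int, datetime], int],
) -> Tuple[bool, str]:
    """Decide for one site, running the visit query only when nothing else forces archiving."""
    if site.day_finished_since_last_run:
        return True, "day finished since last run"
    if site.old_report_invalidated:
        return True, "old report invalidated"
    # Small query for this one site, executed just before it would be processed.
    if count_visits_since(site.id_site, visit_window_start(site, now)) > 0:
        return True, "visits in window"
    return False, "skipped: no visits"
```

The point of passing `count_visits_since` as a callback is that the query is never issued for sites that have to be archived anyway, and each skipped site yields a reason string that can be logged.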

We might try to run the archiver on a large Piwik system tomorrow. I'm not really sure how to test it and how to make sure we actually do not regress anything. Maybe we can deploy it on all our Piwik demos as well.

@mattab commented on August 13th 2015 Member

Follow up note:

  • Maybe in Piwik 3.0 we can delete the API SitesManager.getSitesIdWithVisits since:
    • it's not used in core,
    • it's not tested,
    • it's slow when there are many websites.

@tsteur commented on August 14th 2015 Member

Marked method as deprecated

@tsteur commented on August 14th 2015 Member

FYI: We're running the archiver on a big instance and the big query is definitely gone, the archiver starts immediately

@quba commented on August 14th 2015 Contributor

FYI: it breaks MetaSites because there's no raw data for them.

@tsteur commented on August 14th 2015 Member

Added more log output. Re meta sites, I need to have a look at how it was worked around there. In theory it should not change behaviour.

@tsteur commented on August 14th 2015 Member

Think I found it already...

@tsteur commented on August 17th 2015 Member

To make sure we're compatible with MetaSites plugin I added https://github.com/PiwikPRO/plugin-MetaSites/pull/17

@tsteur commented on August 17th 2015 Member

Can someone please have a look again?

@quba commented on August 18th 2015 Contributor

Let's test it tomorrow.

@tsteur commented on August 21st 2015 Member

Rebased :) just FYI

@diosmosis commented on August 24th 2015 Member

Created follow up issue re: the archiving event here: https://github.com/piwik/piwik/issues/8631

@diosmosis commented on August 30th 2015 Member

@tsteur do you have anything else to say regarding the event name or shall I merge?

@tsteur commented on August 31st 2015 Member

Nah, not really. I wouldn't wanna use any event that contains "force" in archiving since it's simply not really forcing, and there would be multiple events needed that contain the word "force". getIdSitesDisplayingNonPiwikData or getIdSitesDisplayingThirdPartyData is not really correct either; see e.g. the MultiSites plugin that actually uses this event currently. It's not third party data that is shown. The only difference to normal behaviour is that it doesn't use the tracker.

I think after refactoring the archiver we might not need an event anymore at all, hopefully. And if so, we can maybe find a better name or provide a method for it somehow.

@diosmosis commented on August 31st 2015 Member

Ok, will merge then.

This Pull Request was closed on August 31st 2015