Background: Our Piwik instance currently has 13,000 sites associated with it, but only 100-200 of them are active at any given time (receiving views and having their reports / analytics viewed). As a result, our core:archive output looks like this:
INFO CoreConsole[2015-06-30 14:25:31] - Will process 101 websites with new visits since 1 days 0 hours , IDs: [...]
INFO CoreConsole[2015-06-30 14:25:31] - Will process 91 other websites because some old data reports have been invalidated (eg. using the Log Import script) , IDs: [...]
INFO CoreConsole[2015-06-30 14:25:31] - Will process 13212 other websites because the last time they were archived was on a different day (in the website's timezone) , IDs:
Processing 100-200 sites efficiently is pretty easy, but processing all 13,000 takes about 7 days (!).
Proposed Solution: Add a flag to the core:archive command, such as "--ignore-untouched-sites", that skips sites whose only reason for being archived is that they were last archived on a different day, i.e. sites with no new views or actions. If these sites receive views later, archiving them may take longer at that point, but I would rather have the option to take the time savings up front and ignore sites that have not been interacted with in the last day (or in whatever time period is specified by --force-all-periods).
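To make the proposal concrete, here is a rough sketch of the selection logic I have in mind. It is written in Python purely for illustration and is not Piwik's actual code: the `Site` fields, the `select_sites` function, and the `ignore_untouched_sites` parameter are all hypothetical names I made up to mirror the three groups in the log output above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    # All fields are hypothetical; they stand in for whatever Piwik tracks internally.
    id: int
    new_visits: int       # visits logged since the last archive run
    invalidated: bool     # some old reports were invalidated (e.g. by a log import)
    archived_today: bool  # last archive happened today, in the site's timezone


def select_sites(sites: List[Site], ignore_untouched_sites: bool = False) -> List[int]:
    """Return the IDs of the sites an archive run would process."""
    selected = []
    for site in sites:
        if site.new_visits > 0 or site.invalidated:
            # Groups 1 and 2 from the log: always processed.
            selected.append(site.id)
        elif not site.archived_today and not ignore_untouched_sites:
            # Group 3: only reason is "archived on a different day".
            # The proposed flag would skip exactly these sites.
            selected.append(site.id)
    return selected


sites = [
    Site(id=1, new_visits=5, invalidated=False, archived_today=False),
    Site(id=2, new_visits=0, invalidated=True, archived_today=True),
    Site(id=3, new_visits=0, invalidated=False, archived_today=False),
    Site(id=4, new_visits=0, invalidated=False, archived_today=True),
]

default_run = select_sites(sites)
flagged_run = select_sites(sites, ignore_untouched_sites=True)
```

With the flag off, site 3 is picked up only because its last archive was on a different day. With the flag on, it is skipped until it receives visits or has reports invalidated.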
Let me know what you think!
Edit: it appears I can't label this, but if I could it'd be a feature request / enhancement!
I think this is a duplicate of #5922. Please comment or reopen if not. Cheers!