We can safely multithread the archive.sh script by processing different website IDs on multiple cores. Archiving for the same website cannot run on more than one core at a time, but processing different website IDs concurrently will improve performance.
The gain should be largest for a Piwik install with hundreds or thousands of small or medium websites.
Attachment:
archive.multithreaded.sh.diff
Here is a first version against trunk. It is not compatible with #2440, and I'm not sure both features can coexist, at least with the approach I took (using xargs).
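For readers without the attachment at hand, here is a minimal sketch of the general xargs idea. It is not the contents of the diff. The `WORKERS` variable, the `run_archives` function and the `echo` placeholder are all hypothetical; a real script would issue the per-website archiving request where the `echo` is. Note that `-P` is not part of POSIX xargs, although both GNU and BSD xargs support it.

```shell
#!/bin/sh
# Number of websites to archive concurrently (assumed value; tune to the host).
WORKERS=4

# run_archives (hypothetical name) reads website IDs on stdin, one per line,
# and starts up to WORKERS jobs at once. Each ID is handed to exactly one job,
# so a given website is never archived by two processes at the same time.
run_archives() {
    # The echo is a placeholder for the real per-website archiving command.
    xargs -n 1 -P "$WORKERS" sh -c 'echo "archiving idsite=$1"' _
}

# Example run with five made-up website IDs; output order is not guaranteed.
printf '%s\n' 1 2 3 4 5 | run_archives
```

Because each website ID goes to a single job, the "one website on one core" constraint holds without any locking.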
See also #2327
I've read in several tickets related to this one that multi-threading should not be enabled on a single-core machine.
Please correct me if I'm wrong, but archiving involves a lot of communication with the database. As far as I understand scheduling, the scheduler switches away from threads that are waiting on resources (i.e. network and database). During that time, threads that need the CPU (i.e. for calculations and aggregations) get to run.
Does it logically follow that enabling multi-threading on a single core would increase performance when the web server and the database are located on different hardware?
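A small experiment can illustrate the reasoning, under the assumption that waiting on a remote database looks like sleeping to the local scheduler. The `wait_on_db` function is a hypothetical stand-in, not part of Piwik. Real archiving also consumes CPU, so the actual gain depends on how much of the run is spent waiting.

```shell
#!/bin/sh
# Hypothetical stand-in for a worker blocked on a remote database:
# it uses no CPU while it waits.
wait_on_db() {
    sleep 1
}

start=$(date +%s)
# Start three waiting jobs in the background, then wait for all of them.
wait_on_db &
wait_on_db &
wait_on_db &
wait
end=$(date +%s)

elapsed=$((end - start))
echo "three overlapping waits took ${elapsed}s"
```

Even on a single core the three waits overlap, so the total stays well under the three seconds a sequential run would need.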
Considering the huge amount of work done on the new archive.php, I think we should completely deprecate archive.sh and its .ps1 counterpart, change all documentation to use archive.php, and maintain only that script.