@tsteur opened this Pull Request on October 14th 2021 Member

Description:

This PR is based on https://github.com/matomo-org/matomo/pull/18152 which we'll want to merge first.

Previously, we were executing all cli commands asynchronous. This comes with a big overhead that we need to check the list of processes constantly, need to work with pid files, write output to disk, etc. On one of our systems we noticed this can lead to an overhead of close to 40% CPU on the archiving. It's probably bit over-reported but it had definitely a big impact and it's maybe more realistically maybe 10-20%. However, because a lot of the time is archiving spent is waiting for the dedicated DB the CPU increase percentage spent on this might be higher after all.

When core:archive runs with --concurrent-requests-per-website=1 then there's actually no need to run a command asynchronous. Instead we could simply run the command synchronous meaning we won't need to check constantly if the process is still running and then sleep for a bit and check again. Instead we will be notified immediately when the command is finished. Also we 100% accurately notice immediately if command is aborted, avoid writing result to disk, using pids etc.

This would not be a massive improvement for a regular system but for some systems where there are heaps of archivers launching constantly this can make quite an improvement.

Technically, this way we could in the future even support more systems possibly using CLI archiving as we wouldn't need awk and ps command to work anymore etc. However, changing that be a bit more complicated and it's impact is limited because it's only the case when using --concurrent-requests-per-website=1 when the default value is 3. So by default, nobody will benefit from this improvement unless that --concurrent-requests-per-website=1 is set. I wouldn't advice setting --concurrent-requests-per-website=1 for this performance improvement when previously running multiple in parallel. It be mostly a nice performance benefit if already using --concurrent-requests-per-website=1 for various reasons anyway.

Added some tests for the StaticOutput class. There are actually already tests for CliMulti when using CLI and when there is only one concurrent archiver so didn't add any other ones.

Review

This Pull Request was closed on October 18th 2021
Powered by GitHub Issue Mirror