Make cli archiving faster if only one concurrent request per website is configured #18157
Conversation
```diff
@@ -162,48 +165,67 @@ private function start($piwikUrls)

         if ($shouldStart === Request::ABORT) {
             // output is needed to ensure same order of url to response
-            $output = new Output($cmdId);
+            $output = new StaticOutput($cmdId);
```
It was actually not needed to write this to disk; instead we can store it in memory.
```diff
         } else {
             $output = new StaticOutput($cmdId);
```
Same here: there was actually never a need to write this to disk. Now that we have StaticOutput, we no longer need to use the regular Output (which would ideally be named something like FileOutput, but I wasn't sure if it's used anywhere else).
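The comments above describe keeping command output in memory instead of writing it to disk. A minimal sketch of such an in-memory output class (the class and method names here are assumptions for illustration; the actual StaticOutput in this PR may differ):

```php
<?php
// Hypothetical sketch of an in-memory output store. The write()/get()/
// exists()/destroy() method names are assumptions, not the PR's exact API.
class InMemoryOutput
{
    /** @var string|false */
    private $content = false;
    private $outputId;

    public function __construct($outputId)
    {
        $this->outputId = $outputId;
    }

    public function getOutputId()
    {
        return $this->outputId;
    }

    public function write($content)
    {
        // Keep the response in memory; no file handles or disk I/O needed.
        $this->content = (string) $content;
    }

    public function get()
    {
        return $this->content;
    }

    public function exists()
    {
        return $this->content !== false;
    }

    public function destroy()
    {
        $this->content = false;
    }
}
```

Because nothing touches the filesystem, there is also no cleanup of stale output files to worry about when the process ends.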
```diff
@@ -415,7 +452,9 @@ private function requestUrls(array $piwikUrls)
         $elapsed = time() - $startTime;
         $timeToWait = $this->getTimeToWaitBeforeNextCheck($elapsed);

-        usleep($timeToWait);
+        if (!count($this->processes)) {
+            usleep($timeToWait);
```
When we use HTTP or sync CLI we will now no longer need to wait, as we already know the result at this point and the command is finished.
Had a look through the code changes. Merged in the latest changes from 4.x-dev. If tests are passing, I guess this should be good to merge.
Description:
This PR is based on #18152 which we'll want to merge first.
Previously, we were executing all CLI commands asynchronously. This comes with a big overhead: we need to check the list of processes constantly, work with pid files, write output to disk, etc. On one of our systems we noticed this can add close to 40% CPU overhead to the archiving. That figure is probably somewhat over-reported, and realistically the impact is maybe 10-20%, but it definitely had a big effect. However, because a lot of the archiving time is spent waiting for the dedicated DB, the share of CPU spent on this overhead might be higher after all.
When `core:archive` runs with `--concurrent-requests-per-website=1`, there's actually no need to run the command asynchronously. Instead we can simply run it synchronously, meaning we won't need to constantly check whether the process is still running, sleep for a bit, and check again; we are notified immediately when the command finishes. We also notice immediately and with 100% accuracy if a command is aborted, avoid writing the result to disk, avoid using pids, etc.

This would not be a massive improvement for a regular system, but for systems where heaps of archivers are launching constantly it can make quite an improvement.
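To illustrate the difference (illustrative only, not the PR's actual implementation; the function names here are made up): a synchronous call blocks until the command finishes and hands back the output directly, so there is no polling loop, no pid file, and no output file, whereas the asynchronous approach has to poll and sleep:

```php
<?php
// Illustrative contrast between synchronous execution and the old
// poll-and-sleep pattern. Function names are hypothetical.
function runSynchronously(string $command): string
{
    // shell_exec() blocks until the command is done, so when it returns
    // we know the command finished and already hold its output in memory.
    $output = shell_exec($command . ' 2>&1');
    return $output === null ? '' : $output;
}

// The asynchronous approach instead has to poll, sleeping between checks.
function waitForAsync(callable $isStillRunning, int $timeToWaitUs): void
{
    while ($isStillRunning()) {
        usleep($timeToWaitUs); // wasted latency once the process has ended
    }
}
```

With only one concurrent request per website there is nothing to run in parallel anyway, so the blocking call costs nothing in throughput.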
Technically, this way we could in the future even support CLI archiving on more systems, as we wouldn't need the `awk` and `ps` commands to work anymore, etc. However, changing that would be a bit more complicated, and its impact is limited because it only applies when using `--concurrent-requests-per-website=1`, while the default value is 3. So by default, nobody will benefit from this improvement unless `--concurrent-requests-per-website=1` is set. I wouldn't advise setting `--concurrent-requests-per-website=1` just for this performance improvement when previously running multiple requests in parallel. It would mostly be a nice performance benefit if you are already using `--concurrent-requests-per-website=1`
for various reasons anyway.

Added some tests for the `StaticOutput` class. There are actually already tests for CliMulti when using CLI with only one concurrent archiver, so I didn't add any other ones.
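A hedged sketch of what a unit test for the in-memory output class could look like, in PHPUnit style (the namespace, class, and method names here are assumptions; the PR's actual tests may differ):

```php
<?php
use PHPUnit\Framework\TestCase;

// Assumed API: write() stores the string in memory, get() returns it,
// exists() reports whether anything was written. Namespace is a guess.
class StaticOutputTest extends TestCase
{
    public function testWriteKeepsContentInMemory()
    {
        $output = new \Piwik\CliMulti\StaticOutput('cmd-1');

        $output->write('response-body');

        $this->assertSame('response-body', $output->get());
        $this->assertTrue($output->exists());
    }
}
```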