@EreMaijala opened this Pull Request on November 16th 2021 Contributor

Description:

This change makes it possible to limit the archiver process lifetime to avoid excessive memory usage (possibly due to PHP bugs such as #79519).

Rationale:

We have an issue with the archiver and a large number (~3900) of sites where the archiver process runs out of memory. We have tried to increase memory_limit up to 12 GB to no avail, so this is an attempt to limit the lifetime of the archiver to avoid this excessive memory usage.

As a side note, there seem to be some static or long-lived arrays that consume memory during archiving, such as the transient cache, but they don't seem to explain everything. I suspect PHP issues such as #79519 contribute to this.
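
To sketch the idea in isolation: a minimal, hypothetical loop with a lifetime guard. The names $maxLifetime and doUnitOfWork() are illustrative stand-ins, not the actual CronArchive code.

    <?php
    // Minimal sketch of the lifetime-limit idea (not the PR's actual code):
    // stop the main loop once the process has run for a configured number of
    // seconds, so that the next cron run resumes the remaining queue with a
    // fresh (unleaked) memory footprint.
    $maxLifetime = 3600; // seconds; 0 would mean "no limit"
    $start = time();

    function doUnitOfWork(): bool {
        static $remaining = 5;   // stand-in for "archive the next site"
        return --$remaining > 0; // false once the queue is empty
    }

    while (doUnitOfWork()) {
        if ($maxLifetime > 0 && (time() - $start) >= $maxLifetime) {
            echo "Maximum process lifetime reached, exiting.\n";
            break;
        }
    }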

Review

@EreMaijala commented on November 16th 2021 Contributor

Looks like I can't add the "Needs Review" label.

@tsteur commented on November 17th 2021 Member

@EreMaijala @bx80 I wonder if it would generally make more sense to stop after a certain number of processed archives? That way it might work better when one site has heaps of invalidations and therefore lots to archive (which can happen e.g. when creating or changing a segment), and it would potentially also help other people who have fewer sites but a similar issue. Not sure if that would help or make any sense?

@EreMaijala commented on November 17th 2021 Contributor

@tsteur Stopping after a certain number of archives would probably be even better, but as the process currently works, the next execution always seems to pick the next site rather than the next archive to process, and making it work smoothly at the archive level would need much more extensive changes.

@EreMaijala commented on November 17th 2021 Contributor

@tsteur To elaborate on the previous point: since the queue contains site IDs and each archiver takes a site ID from the queue and processes that site's archives, it's not easy to stop in the middle of a site. Perhaps it would be possible to put the site back into the queue, but that feels like a hack to me.
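
To illustrate the shape of the problem, here's a rough, hypothetical sketch (the queue, data, and names below are illustrative, not Matomo's actual classes):

    <?php
    // The work queue holds site IDs, so a worker that claims an ID is
    // committed to archiving every pending invalidation for that site
    // before returning to the queue -- which is why stopping mid-site
    // is awkward.
    $siteQueue = [423, 424, 425];
    $invalidations = [
        423 => ['week', 'month', 'year'],
        424 => ['week', 'month', 'year'],
        425 => ['week', 'month', 'year'],
    ];

    while (($idSite = array_shift($siteQueue)) !== null) {
        foreach ($invalidations[$idSite] as $period) {
            echo "Archiving site $idSite, period $period\n";
            // Breaking out here would strand the site's remaining
            // invalidations until they are detected as stale and retried.
        }
    }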

@tsteur commented on November 17th 2021 Member

@EreMaijala Not sure if it helps, but I was thinking something like this might work:

diff --git a/core/CronArchive.php b/core/CronArchive.php
index bbdd8a691..a48e1098c 100644
--- a/core/CronArchive.php
+++ b/core/CronArchive.php
@@ -389,6 +389,13 @@ class CronArchive

             $successCount = $this->launchArchivingFor($archivesToProcess, $queueConsumer);
             $numArchivesFinished += $successCount;
+
+
+            if ($this->maxArchivesToProcess && $numArchivesFinished >= $this->maxArchivesToProcess) {
+                $this->logger->info("Maximum number of archives to process per execution has been reached.");
+                break;
+            }
+
         }

It's not the full code, as maxArchivesToProcess isn't set anywhere. I also haven't tested it.
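
For illustration, a hedged sketch of that missing piece, assuming a plain public property populated from a CLI option. The property name matches the diff; the wiring shown is illustrative, not the merged code.

    <?php
    // Hypothetical sketch: give CronArchive a maxArchivesToProcess
    // property and feed it from the command line before the run.
    class CronArchive
    {
        /** @var int 0 means "no limit" */
        public $maxArchivesToProcess = 0;
    }

    $archiver = new CronArchive();
    // e.g. fed from a console option such as --max-archives-to-process:
    $archiver->maxArchivesToProcess = (int) ($argv[1] ?? 0);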

@EreMaijala commented on November 18th 2021 Contributor

@tsteur I believe that's what I initially tried, but it resulted in the situation I described, where execution can stop in the middle of a site and the next execution then starts from the next site. As far as I can see, the archiver will detect the skipped ones after a day (with the message "Found X failed jobs (ts_invalidated older than 1 day), resetting status to try them again."), but that means their processing will be delayed, and they could be skipped again if processing stops at just the wrong moment.

To illustrate this, I added some output and a hard-coded limit of 10 archives:

            $successCount = $this->launchArchivingFor($archivesToProcess, $queueConsumer);
            $numArchivesFinished += $successCount;
            echo "Archives finished: $numArchivesFinished\nLast processed:\n";
            foreach ($archivesToProcess as $current) {
                echo "Site: {$current['idsite']}, archive: {$current['idarchive']}, invalidation: {$current['idinvalidation']}, period: {$current['period']}\n";
            }
            if ($numArchivesFinished >= 10) {
                echo "STOP\n";
                break;
            }

Results from first execution:

INFO [2021-11-18 10:27:52] 61920  START
INFO [2021-11-18 10:27:52] 61920  Starting Matomo reports archiving...
INFO [2021-11-18 10:27:52] 61920  Start processing archives for site 423.
INFO [2021-11-18 10:27:53] 61920  Archived website id 423, period = week, date = 2021-11-15, segment = '', 1 visits found. Time elapsed: 0.822s
Archives finished: 1
Last processed:
Site: 423, archive: , invalidation: 1765, period: 2
INFO [2021-11-18 10:27:53] 61920  Archived website id 423, period = month, date = 2021-11-01, segment = '', 1 visits found. Time elapsed: 0.584s
Archives finished: 2
Last processed:
Site: 423, archive: , invalidation: 1766, period: 3
INFO [2021-11-18 10:27:54] 61920  Archived website id 423, period = year, date = 2021-01-01, segment = '', 1 visits found. Time elapsed: 0.570s
Archives finished: 3
Last processed:
Site: 423, archive: , invalidation: 1768, period: 4
INFO [2021-11-18 10:27:54] 61920  Finished archiving for site 423, 3 API requests, Time elapsed: 2.445s [1 / 580 done]
INFO [2021-11-18 10:27:54] 61920  Start processing archives for site 424.
INFO [2021-11-18 10:27:55] 61920  Archived website id 424, period = week, date = 2021-11-15, segment = '', 1 visits found. Time elapsed: 0.805s
Archives finished: 4
Last processed:
Site: 424, archive: , invalidation: 1769, period: 2
INFO [2021-11-18 10:27:56] 61920  Archived website id 424, period = month, date = 2021-11-01, segment = '', 1 visits found. Time elapsed: 0.574s
Archives finished: 5
Last processed:
Site: 424, archive: , invalidation: 1770, period: 3
INFO [2021-11-18 10:27:57] 61920  Archived website id 424, period = year, date = 2021-01-01, segment = '', 1 visits found. Time elapsed: 0.586s
Archives finished: 6
Last processed:
Site: 424, archive: , invalidation: 1772, period: 4
INFO [2021-11-18 10:27:57] 61920  Finished archiving for site 424, 3 API requests, Time elapsed: 2.413s [2 / 580 done]
INFO [2021-11-18 10:27:57] 61920  Start processing archives for site 425.
INFO [2021-11-18 10:27:58] 61920  Archived website id 425, period = week, date = 2021-11-15, segment = '', 1 visits found. Time elapsed: 0.820s
Archives finished: 7
Last processed:
Site: 425, archive: , invalidation: 1773, period: 2
INFO [2021-11-18 10:27:58] 61920  Archived website id 425, period = month, date = 2021-11-01, segment = '', 1 visits found. Time elapsed: 0.574s
Archives finished: 8
Last processed:
Site: 425, archive: , invalidation: 1774, period: 3
INFO [2021-11-18 10:27:59] 61920  Archived website id 425, period = year, date = 2021-01-01, segment = '', 1 visits found. Time elapsed: 0.582s
Archives finished: 9
Last processed:
Site: 425, archive: , invalidation: 1776, period: 4
INFO [2021-11-18 10:27:59] 61920  Finished archiving for site 425, 3 API requests, Time elapsed: 2.421s [3 / 580 done]
INFO [2021-11-18 10:27:59] 61920  Start processing archives for site 426.
INFO [2021-11-18 10:28:00] 61920  Archived website id 426, period = week, date = 2021-11-15, segment = '', 1 visits found. Time elapsed: 0.821s
Archives finished: 10
Last processed:
Site: 426, archive: , invalidation: 1777, period: 2
STOP
INFO [2021-11-18 10:28:00] 61920  Done archiving!

Results from second execution:

INFO [2021-11-18 10:28:10] 61981  START
INFO [2021-11-18 10:28:10] 61981  Starting Matomo reports archiving...
INFO [2021-11-18 10:28:10] 61981  Start processing archives for site 427.
INFO [2021-11-18 10:28:11] 61981  Archived website id 427, period = week, date = 2021-11-15, segment = '', 1 visits found. Time elapsed: 0.832s
Archives finished: 1
Last processed:
Site: 427, archive: , invalidation: 1781, period: 2
[...]

It starts from the next site (427) instead of finishing periods 3 and 4 for site 426. This is the reason I made it stop after processing all archives for a site.

That said, it would be possible to do the check only when getNextArchivesToProcess returns an empty array, which indicates that archiving for the current website has completed. That, however, would make the stopping point inexact, more like 'stop after processing at least X archives', which would make it not so different from the max-sites limit, but it might be easier to use when large batches of invalidations only turn up occasionally. I'll add that as an option, and you can then decide whether to drop either one. For us it doesn't make much difference, as we can usually process something like 3000 invalidations before memory_limit is exceeded.
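
As a rough sketch of that boundary-check variant, with simplified stand-ins for the real queue consumer and launchArchivingFor:

    <?php
    // The limit is only tested once a site's archives are exhausted (the
    // point where the real getNextArchivesToProcess() would return an
    // empty array), giving the "at least X archives" semantics: with a
    // limit of 2, all 3 archives of site 426 still complete before the
    // loop stops.
    $sites = [
        426 => ['week', 'month', 'year'],
        427 => ['week', 'month', 'year'],
    ];
    $maxArchivesToProcess = 2; // illustrative soft limit
    $numArchivesFinished = 0;

    foreach ($sites as $idSite => $periods) {
        foreach ($periods as $period) {
            $numArchivesFinished++; // stand-in for launchArchivingFor()
            echo "Archived site $idSite, period $period\n";
        }
        // Site boundary reached; only now is the limit checked:
        if ($maxArchivesToProcess && $numArchivesFinished >= $maxArchivesToProcess) {
            echo "Processed at least $maxArchivesToProcess archives, stopping.\n";
            break;
        }
    }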

@EreMaijala commented on November 18th 2021 Contributor

@tsteur New option added. I think I'd prefer to keep both options, since that allows specifying both and stopping when either limit is reached. I also considered naming the option something other than max-archives-to-process, but decided against it, since it could become a hard maximum if the queue processing changes in the future.

@tsteur commented on November 18th 2021 Member

That, however, would make the stopping point inexact, more like 'stop after processing at least X archives', which would make it not so different from the max-sites limit, but it might be easier to use when large batches of invalidations only turn up occasionally.

I reckon it wouldn't be a big issue, since most of the time it's only launching about 3 archives in parallel.

I believe that's what I initially tried, but it resulted in the situation I described, where execution can stop in the middle of a site and the next execution then starts from the next site. ... It starts from the next site (427) instead of finishing periods 3 and 4 for site 426. This is the reason I made it stop after processing all archives for a site.

Indeed. It would continue with the next sites, then, once all the other sites have been worked on, start again from the beginning and process the ones that haven't been processed yet. Depending on how quick the archiving is and how many sites there are, it would get back to the aborted site in, say, 2 hours or maybe later. It should usually not be an issue, but it depends on the situation.

@EreMaijala commented on November 19th 2021 Contributor

@tsteur Ah, I'd missed that part and thought it would always take 24 hours to resume. It's not that bad then, sorry for the confusion. I'm still not convinced it's no problem at all, though, as you could get somewhat wonky results when looking at analytics for a site that's only partially archived, no? Obviously the data simply hasn't been processed if it's not available at all, but if you have, say, day and week but no month or year for several hours, it could be confusing.

@tsteur commented on November 21st 2021 Member

@EreMaijala It totally depends on the users of Matomo. For us it's not an issue; on our own Matomo we even calculate the month and year periods less often, for example, as having them up to date is not as important. If the year's data is, say, 2-4 hours behind, out of up to 8800 hours, then it's not too big of an issue. But if you were to sum up every week and day manually and perhaps come to a different result than the month period shows, that could be confusing.

@EreMaijala commented on November 22nd 2021 Contributor

@tsteur Fair enough. I've now updated this to make --max-archives-to-process a hard limit while keeping --max-websites-to-process as well. I believe this allows enough flexibility for most use cases.
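
To contrast with the earlier boundary-check sketch, a minimal, hypothetical illustration of the hard-limit semantics (stand-in data and names, not the actual loop): the check now runs after every batch, so the run may stop mid-site, while --max-websites-to-process still stops at site boundaries.

    <?php
    // With a hard limit of 2, the loop stops after site 426's "month"
    // archive; the leftover periods are picked up by a later run.
    $batches = [[426, 'week'], [426, 'month'], [426, 'year'], [427, 'week']];
    $maxArchivesToProcess = 2; // hard limit
    $numArchivesFinished = 0;

    foreach ($batches as [$idSite, $period]) {
        $numArchivesFinished++; // stand-in for launchArchivingFor()
        echo "Archived site $idSite, period $period\n";
        if ($maxArchivesToProcess && $numArchivesFinished >= $maxArchivesToProcess) {
            echo "Hard limit of $maxArchivesToProcess archives reached.\n";
            break; // may stop mid-site; leftovers are retried later
        }
    }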

@bx80 commented on November 24th 2021 Contributor

This all looks good to me and works well in testing, thanks @EreMaijala for adding both options. @tsteur Okay to merge?

@tsteur commented on November 24th 2021 Member

Yes, all good to merge. Technically we might want to create an FAQ for it. Maybe it needs to be mentioned at least on https://matomo.org/docs/setup-auto-archiving/

@bx80 commented on November 25th 2021 Contributor

I've updated the core:archive help output in the setup auto archiving user guide and also created a draft FAQ explaining how these options work and where they should be added.

@EreMaijala commented on November 25th 2021 Contributor

Thank you all!

This Pull Request was closed on November 25th 2021