@Pilot001 opened this Issue on February 3rd 2021

After upgrading our testing system (with very little traffic) from 3.14.1 to 4.1.1 our monitoring system alerted due to a rapid increase of the database binary logs. Investigation showed that the archive cron job (running once an hour), which usually takes a few seconds to finish, was running about 2 hours (with 2GB max_memory) and then stops because of out of memory.
Next time, the job started, it invalidated the previously created archive and started again. We already tried to invalidate it manually, but it did not help.
Running the job manually with -vvv we see a strange some strange things...
Here is the debug output of the run (just the last lines):

DEBUG [2021-02-03 12:36:33] 91274  Starting archiving for ?module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution&trigger=archivephp&pluginOnly=1
DEBUG [2021-02-03 12:36:34] 91274  Running command: /usr/bin/php -q -c /srv/apache/php-fcgi/piwik.ini -d memory_limit=4G /srv/apache/htdocs/console climulti:request -q --matomo-domain='piwik.local' --superuser 'module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution&trigger=archivephp&pluginOnly=1&pid=577e08ac369c7b794c3b4f77df3cd96c35b6d4edf4e70841f96167b72eba602030f37cfd84d04392163627384568ab62ef390&runid=91274' > /srv/apache/htdocs/tmp/climulti/577e08ac369c7b794c3b4f77df3cd96c35b6d4edf4e70841f96167b72eba602030f37cfd84d04392163627384568ab62ef390.output 2>&1 &

Meanwhile the job generates csv files in tmp directory, e. g. "archive_blob_2021_01-1e61491d2fbb205b6f8eeafd01da42d4.csv" (even here, just a few lines):

"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_11_prior90"     "x�K�2���O�"
"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_12_prior7"      "x�K�2���O�"
"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_12_prior30"     "x�K�2���O�"

On one hand, the dates in columns 3 and 4 are out of the time period that the job should be working on (2021-01-01 to 2021-02-03). And when we continue to watch these csv files, we see that the timestamp in column 6 is always just a few seconds behind current time.

@sgiehl commented on February 3rd 2021 Member

@Pilot001 thanks for creating the issue. We are already working on some archiving related issues. Could be possible that the issue causing your problems might already be fixed.
Guess @diosmosis might be able to say more here.

@diosmosis commented on February 4th 2021 Member

@Pilot001 could you explain what you mean by the core:archive command generating CSV files in the tmp directory? This shouldn't be something that command does (at least not with raw blob data).

Did you recently install or activate MultiChannelConversionAttribution? If so, can you deactivate it and see if the archive job runs?

@Pilot001 commented on February 4th 2021

@diosmosis Well, it is active... but not "recently". We had it active with v3.x before and now with 4.x it is still active.

@diosmosis commented on February 5th 2021 Member

@Pilot001 could you answer my other question and if possible provide core:archive output w/ the -vvv option?

@Pilot001 commented on February 5th 2021

@diosmosis Oh, I'm sorry... A misunderstanding. I understood your second sentence about the MultiChannelConversionAttribution, as it was an option how it would be possible that core:archive generates those csv files.

Well, I run an strace on the php process, hoping to find a hint about what was causing that endless run. There I saw those calls "open" to files names "archiveblob*.csv" within the matomo tmp directory. Those files exist less than a second.

Pls. find the log attached up to the moment it gets stuck.
archive-2021-02-05.log

@diosmosis commented on February 8th 2021 Member

@Pilot001

Well, I run an strace on the php process, hoping to find a hint about what was causing that endless run. There I saw those calls "open" to files names "archiveblob*.csv" within the matomo tmp directory. Those files exist less than a second.

I see, those are probably temporary batch insert files (ie, for LOAD DATA INFILE).

Pls. find the log attached up to the moment it gets stuck.

Ok, looking at the logs I can see the issue, and good news is that we've encountered this on cloud a little while ago and have since found a fix (the problem is in the MultiChannelConversionAttribution plugin). The fix isn't out just yet, but we should be able to release it soon. In the meantime, deactivating the plugin would solve the issue for now. When the plugin is updated and reactivated, I would recommend setting [General] rearchive_reports_in_past_last_n_months to last1 (otherwise matomo will try to rearchive the last 6 months for MultiChannelConversionAttribution which would be a waste).

I will for now keep this issue open as a reminder.

@Pilot001 commented on February 8th 2021

Great! Thank you so much for your help and advice!

@diosmosis commented on February 9th 2021 Member

@Pilot001 actually it looks like the change was already released (in version 4.0.4). Can you upgrade your plugin to the latest version?

@Pilot001 commented on February 10th 2021

OK, this version was published right after I downloaded the previous one. ;-)
It looks much better now. Thanks again!

This Issue was closed on February 10th 2021
Powered by GitHub Issue Mirror