Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archiving job runs forever due to MultiChannelConversionAttribution #17185

Closed
Pilot001 opened this issue Feb 3, 2021 · 9 comments
Closed

archiving job runs forever due to MultiChannelConversionAttribution #17185

Pilot001 opened this issue Feb 3, 2021 · 9 comments

Comments

@Pilot001
Copy link

Pilot001 commented Feb 3, 2021

After upgrading our testing system (with very little traffic) from 3.14.1 to 4.1.1 our monitoring system alerted due to a rapid increase of the database binary logs. Investigation showed that the archive cron job (running once an hour), which usually takes a few seconds to finish, was running about 2 hours (with 2GB max_memory) and then stops because of out of memory.
Next time, the job started, it invalidated the previously created archive and started again. We already tried to invalidate it manually, but it did not help.
Running the job manually with -vvv we see a strange some strange things...
Here is the debug output of the run (just the last lines):

DEBUG [2021-02-03 12:36:33] 91274  Starting archiving for ?module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution&trigger=archivephp&pluginOnly=1
DEBUG [2021-02-03 12:36:34] 91274  Running command: /usr/bin/php -q -c /srv/apache/php-fcgi/piwik.ini -d memory_limit=4G /srv/apache/htdocs/console climulti:request -q --matomo-domain='piwik.local' --superuser 'module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution&trigger=archivephp&pluginOnly=1&pid=577e08ac369c7b794c3b4f77df3cd96c35b6d4edf4e70841f96167b72eba602030f37cfd84d04392163627384568ab62ef390&runid=91274' > /srv/apache/htdocs/tmp/climulti/577e08ac369c7b794c3b4f77df3cd96c35b6d4edf4e70841f96167b72eba602030f37cfd84d04392163627384568ab62ef390.output 2>&1 &

Meanwhile the job generates csv files in tmp directory, e. g. "archive_blob_2021_01-1e61491d2fbb205b6f8eeafd01da42d4.csv" (even here, just a few lines):

"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_11_prior90"     "x�K�2���O�"
"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_12_prior7"      "x�K�2���O�"
"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_12_prior30"     "x�K�2���O�"

On one hand, the dates in columns 3 and 4 are out of the time period that the job should be working on (2021-01-01 to 2021-02-03). And when we continue to watch these csv files, we see that the timestamp in column 6 is always just a few seconds behind current time.

@sgiehl
Copy link
Member

sgiehl commented Feb 3, 2021

@Pilot001 thanks for creating the issue. We are already working on some archiving related issues. Could be possible that the issue causing your problems might already be fixed.
Guess @diosmosis might be able to say more here.

@diosmosis
Copy link
Member

@Pilot001 could you explain what you mean by the core:archive command generating CSV files in the tmp directory? This shouldn't be something that command does (at least not with raw blob data).

Did you recently install or activate MultiChannelConversionAttribution? If so, can you deactivate it and see if the archive job runs?

@Pilot001
Copy link
Author

Pilot001 commented Feb 4, 2021

@diosmosis Well, it is active... but not "recently". We had it active with v3.x before and now with 4.x it is still active.

@diosmosis
Copy link
Member

@Pilot001 could you answer my other question and if possible provide core:archive output w/ the -vvv option?

@Pilot001
Copy link
Author

Pilot001 commented Feb 5, 2021

@diosmosis Oh, I'm sorry... A misunderstanding. I understood your second sentence about the MultiChannelConversionAttribution, as it was an option how it would be possible that core:archive generates those csv files.

Well, I run an strace on the php process, hoping to find a hint about what was causing that endless run. There I saw those calls "open" to files names "archive_blob_*.csv" within the matomo tmp directory. Those files exist less than a second.

Pls. find the log attached up to the moment it gets stuck.
archive-2021-02-05.log

@diosmosis
Copy link
Member

diosmosis commented Feb 8, 2021

@Pilot001

Well, I run an strace on the php process, hoping to find a hint about what was causing that endless run. There I saw those calls "open" to files names "archive_blob_*.csv" within the matomo tmp directory. Those files exist less than a second.

I see, those are probably temporary batch insert files (ie, for LOAD DATA INFILE).

Pls. find the log attached up to the moment it gets stuck.

Ok, looking at the logs I can see the issue, and good news is that we've encountered this on cloud a little while ago and have since found a fix (the problem is in the MultiChannelConversionAttribution plugin). The fix isn't out just yet, but we should be able to release it soon. In the meantime, deactivating the plugin would solve the issue for now. When the plugin is updated and reactivated, I would recommend setting [General] rearchive_reports_in_past_last_n_months to last1 (otherwise matomo will try to rearchive the last 6 months for MultiChannelConversionAttribution which would be a waste).

I will for now keep this issue open as a reminder.

@diosmosis diosmosis changed the title archiving job runs forever archiving job runs forever due to MultiChannelConversionAttribution Feb 8, 2021
@Pilot001
Copy link
Author

Pilot001 commented Feb 8, 2021

Great! Thank you so much for your help and advice!

@diosmosis
Copy link
Member

@Pilot001 actually it looks like the change was already released (in version 4.0.4). Can you upgrade your plugin to the latest version?

@Pilot001
Copy link
Author

OK, this version was published right after I downloaded the previous one. ;-)
It looks much better now. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants