@baptx opened this Issue on April 7th 2020

I have a 20 MB backup made with mysqldump (Ver 10.13 Distrib 5.5.62) using MySQL (Ver 14.14 Distrib 5.5.62 on Debian 8) on Matomo / Piwik version 3.10.0 that I had to import on a new server.
This backup was imported on the command line with MariaDB (Ver 15.1 Distrib 10.3.22-MariaDB on Debian 10) and used with the latest Matomo version, 3.13.4.
To access the Matomo administration area, I had to run an upgrade: "Matomo database will be upgraded from version 3.10.0 to the new version 3.13.4.".

After logging in to Matomo, I clicked on "Visitors" -> "Overview" and selected the previous month or year.
Using htop on the server, I saw that RAM usage started at about 300 MB and kept increasing to more than 1.5 GB while "Loading Visits Over Time..." was displayed.
It failed with the error "Oops… there was a problem during the request. Maybe the server had a temporary issue, or maybe you requested a report with too much data. Please try again. If this error occurs repeatedly please contact your Matomo administrator for assistance.".
In /var/log/apache2/error.log, I saw "PHP Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/website/piwik/core/DataTable/Manager.php on line 86" and the same error in /var/www/website/piwik/libs/Zend/Db/Statement/Pdo.php on line 290.

Sometimes the error is even worse on my 2 GB RAM VPS from OVH (SSD 2018): "PHP Fatal error: Out of memory (allocated 127926272) (tried to allocate 131072 bytes) in /var/www/website/piwik/core/DataAccess/ArchiveWriter.php".

If the data was loaded successfully, it is displayed immediately the next time, so the issue seems to happen only the first time we load data. However, once the data has been loaded, a database export can produce a backup double the original size.

Do you have an idea where the problem comes from, and is it possible to fix it? I think the PHP setting max_execution_time = 30 in /etc/php/7.3/apache2/php.ini is enough, and the problem comes from Matomo. I have already increased memory_limit from 128M to 512M, as recommended (https://matomo.org/docs/setup-auto-archiving/#increase-php-memory-limit).
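For reference, this is what the change in /etc/php/7.3/apache2/php.ini (the Apache SAPI's ini file on Debian 10) looks like; the value 512M is the one recommended in the Matomo docs linked above:

```ini
; /etc/php/7.3/apache2/php.ini
; Raised from the Debian default of 128M, as recommended by Matomo
memory_limit = 512M
```

Note that the CLI uses a separate file, /etc/php/7.3/cli/php.ini, where Debian ships memory_limit = -1 (unlimited).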

It is also very slow; this issue may be related: https://github.com/matomo-org/matomo/issues/9532

@tsteur commented on April 7th 2020 Member

Just checking: can you confirm that you have disabled browser archiving and configured a cron job to do the archiving? And is this for the regular "All visits" segment? Do you know what the filter limit is in the evolution graph (the value in the bottom right of the screenshot below)? And do you know whether this happens for the day period as well as for the week and month periods?

[screenshot]

This could be related to https://github.com/matomo-org/matomo/issues/15726

@baptx commented on April 7th 2020

@tsteur in the general settings, I have "Archive reports when viewed from the browser" set to "Yes", and I don't have a cron job, since it should not be necessary for a small personal website, as the Matomo settings state ("For medium to high traffic websites, it is recommended to disable Matomo archiving to trigger from the browser").
The filter limit value in the bottom right is "30", as in your screenshot. The problem does not happen for the day period. It takes time for the week period, but month and year are the worst: all RAM gets used and processes can get killed automatically. I have seen the web server go down because of this issue.

@tsteur commented on April 7th 2020 Member

The problem is then likely that browser archiving is enabled and it is archiving heaps of data. Depending on what your dump included, this error might go away on its own, since it only needs to archive the data that far back once. Can you still reproduce the issue after reloading and trying several times?

@baptx commented on April 8th 2020

@tsteur if the data was already displayed once, I cannot reproduce the issue (I updated my report yesterday to mention this), but if I select another month in a previous year, or another year that was not already displayed, the issue happens again.
When all RAM is used, I also see this message in the error log: "mmap() failed: [12] Cannot allocate memory".

Update: a workaround is to use auto-archiving with a cron job and set "Archive reports when viewed from the browser" to "No". Running /path/to/matomo/console core:archive on the server for the first time took 10 minutes.
However, it still uses a lot of RAM if I select several years instead of just one in the Visitors Overview.
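For reference, a minimal cron entry for this kind of setup (the schedule and the /etc/cron.d location are assumptions; the console path is the one used in this thread, and per the discussion below the --url flag was not needed here):

```
# /etc/cron.d/matomo-archive -- run the archiver hourly as the web-server user
5 * * * * www-data /var/www/website/piwik/console core:archive > /dev/null 2>&1
```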

I have some questions:
Is it possible to fix these RAM issues in a future release? (They also happen with auto-archiving if we select several years.)
Is it normal that after archiving, a new database export with mysqldump takes 35 MB instead of the original 20 MB?
Is it possible to undo the archiving to get a lighter database backup?

Maybe you can also update the documentation at https://matomo.org/docs/setup-auto-archiving/, which says "--url=http://example.org/matomo/ is the only required parameter in the script". In my case I did not need it, so I think it is not actually required.
Thanks for your help.

@tsteur commented on April 8th 2020 Member

if the data was already displayed once, I cannot reproduce the issue (I updated my report to mention it yesterday) but if I select another month in a previous year or another year that was not already displayed, the issue happens again.

This would be expected since it would aggregate many reports.

Is it possible to fix these RAM issues in a next release? (which also happen with auto-archiving if we select several years)

I reckon there are usually no issues because archiving normally happens step by step. In your case it may cause memory issues because it needs to archive a lot of missing data at once; on a daily/regular basis it would only process a tiny fraction of that, compared to many months or years of data in one go. I'd expect the memory errors in the archiver to disappear very soon, or they might already be gone. Your memory limit is 128MB, right? We usually recommend at least 256MB, and we likely won't do much here, considering the limit is fairly low, it had to archive a lot of data at once, and the issue should disappear afterwards. I agree 128MB is on the low side, but memory is usually quite cheap these days and this is maybe a rather rare edge case. It would be great to know if these errors go away.

Is it normal that after archiving, the new database export with mysqldump uses 35MB instead of 20MB originally?

Yes, that's very much expected, since we aggregate the data for each day, week, month, and year and have to store this aggregated data in the database.

@baptx commented on April 9th 2020

@tsteur with auto-archiving there are no more errors. However, if I select several years in the Visitors Overview (for example 2014 to 2020), I have to set memory_limit = 512M in /etc/php/7.3/apache2/php.ini to avoid the error "PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 16384 bytes) in /var/www/website/piwik/libs/Zend/Db/Statement/Pdo.php on line 228" (the default is 128M, and even 256M is not enough and causes this error). For PHP on the command line, as opposed to the Apache web server, the default in /etc/php/7.3/cli/php.ini is memory_limit = -1.

Before displaying several years, RAM usage was 300 MB; it can increase to 1.4 GB if I reload the page several times, which is also very slow and could run out of memory entirely if I display additional years.
Maybe there is a way to display data for several years faster and with less RAM, by showing a summary of the data instead of loading all of it. After closing the page, RAM usage stays above 900 MB instead of returning to the original 300 MB or less; could this mean there is a memory leak? (I have to restart the Apache web server to get back to 300 MB of RAM usage.)

Do you also have an answer to my other question:
Is it possible to undo the archiving to have a lighter database backup?

And can you confirm that using --url=http://example.org/matomo/ is not needed to archive data with a cron job? (I did not need it, so this could be clarified on https://matomo.org/docs/setup-auto-archiving/, which says it is required.)

@tsteur commented on April 12th 2020 Member

@baptx when you go to Admin => Privacy => Anonymize Data, you can configure the "Delete old aggregated report data" section to delete this data automatically, e.g. anything older than 2 months.
[screenshot]

You may want to keep basic metrics, as they don't need much storage, or delete all reports. Note: it may mean that if/when you request this data, it will be re-archived (I'm not sure whether that is the case when "Keep basic metrics" is enabled).

You can also use this command / API (https://matomo.org/faq/how-to/faq_155/); the invalidated reports would then be deleted at some point.

@baptx commented on April 13th 2020

@tsteur if I check "Regularly delete old reports from the database", uncheck "Keep basic metrics", and click "Purge DB Now", it looks like all day/week statistics (older than the number of months entered) are lost for good: I cannot see them again when selecting a day or week in "Visits Over Time", even after archiving again with /var/www/website/piwik/console core:archive --force-all-websites --force-all-periods=1. Can you confirm whether this is a bug, or is there another way to get the data back by archiving again?
Do you also have an idea about the two other questions in my previous comment?

@tsteur commented on April 13th 2020 Member

If "Keep basic metrics" is disabled and all data was deleted, I think it would be expected to archive again. If the checkbox is enabled, I would assume it wouldn't archive again. I think you want to use, for example, --force-all-periods=86400 as the parameter to archive one day back.

The other questions, are you referring to this?

Maybe there is a way to display data for several years, faster and without using so much RAM, by displaying a summary of the data instead of loading all data. When closing the page, the RAM usage stays at more than 900 MB instead of less than 300 MB originally, could this mean that there is a memory leak?

It's hard to say; these things take quite a long time to replicate and investigate. I'm not sure what you mean by the usage staying above 900MB after closing the page. Do you mean you close the browser page? Do you know if the process is still running in the background? Once a request finishes, PHP should definitely free the memory for that particular request.

@baptx commented on April 14th 2020

@tsteur if "Keep basic metrics" was unchecked before clicking "Purge DB Now", I cannot get the old data back in "Visits Over Time", even when archiving again with --force-all-periods=86400.

By closing the page, I mean just closing the tab or the web browser, but that should not matter, since the server's RAM usage should drop as soon as the data has been loaded and displayed. The processes I can see on the server belong to Apache and MariaDB, so they are always running. Could it be that the RAM does not decrease because the web server or the database is caching data?

I think the issue is easy to replicate on any Matomo installation with several years of data: just select several years in "Visitors Overview" (tested from 2014 to 2020), which results in a lot of RAM usage.

I was also referring to this question:

And can you confirm that using --url=http://example.org/matomo/ is not needed to archive data with a cronjob? (I did not need it so it could be clarified on https://matomo.org/docs/setup-auto-archiving/, which says it is required)

@tsteur commented on April 14th 2020 Member

And can you confirm that using --url=http://example.org/matomo/ is not needed to archive data with a cronjob?

It should not be needed 👍

Could it be that the RAM does not decrease because the web server or database is caching data?

No, that can't really be it. Is it possible that a cron job is running in the background, e.g. doing archiving? You could run ps aux | grep climulti or ps aux | grep archive to see if anything is running in the background.

@baptx commented on April 15th 2020

@tsteur no, I cannot see any other process with your commands.
But here are the processes using the most RAM, found with htop after the data for several years is displayed in "Visits Over Time" (I am not sure whether it is normal to see so many apache2 / mysqld processes):

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command                                                                                                         
 3814 www-data   20   0  354M  197M 55700 S  0.0 10.2  0:20.10 /usr/sbin/apache2 -k start                                                                                                     
  737 mysql      20   0 1227M  171M 18432 S  0.5  8.8  1:38.16 /usr/sbin/mysqld
  748 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.00 /usr/sbin/mysqld
  772 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.15 /usr/sbin/mysqld
  781 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:03.12 /usr/sbin/mysqld
  783 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:06.48 /usr/sbin/mysqld
  784 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:03.45 /usr/sbin/mysqld
  785 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:03.06 /usr/sbin/mysqld
  786 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:03.13 /usr/sbin/mysqld
  787 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:03.80 /usr/sbin/mysqld
  788 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:05.28 /usr/sbin/mysqld
  789 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:04.93 /usr/sbin/mysqld
  790 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:05.03 /usr/sbin/mysqld
  791 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:04.46 /usr/sbin/mysqld
  792 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:09.29 /usr/sbin/mysqld
  824 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:03.00 /usr/sbin/mysqld
  825 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:05.37 /usr/sbin/mysqld
  826 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.57 /usr/sbin/mysqld
  827 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.43 /usr/sbin/mysqld
  828 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.58 /usr/sbin/mysqld
  832 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:04.57 /usr/sbin/mysqld
  833 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.58 /usr/sbin/mysqld
  834 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.05 /usr/sbin/mysqld
  835 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.05 /usr/sbin/mysqld
  836 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.06 /usr/sbin/mysqld
  837 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.16 /usr/sbin/mysqld
  838 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:02.92 /usr/sbin/mysqld
  839 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.00 /usr/sbin/mysqld
  840 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.00 /usr/sbin/mysqld
  846 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.00 /usr/sbin/mysqld
  847 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.00 /usr/sbin/mysqld
 7367 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:01.96 /usr/sbin/mysqld
 7370 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:01.76 /usr/sbin/mysqld
 7371 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.98 /usr/sbin/mysqld
 7374 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:01.45 /usr/sbin/mysqld
 7375 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:15.18 /usr/sbin/mysqld
 7376 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.73 /usr/sbin/mysqld
 7377 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:01.42 /usr/sbin/mysqld
 7378 mysql      20   0 1227M  171M 18432 S  0.0  8.8  0:00.91 /usr/sbin/mysqld
 3813 www-data   20   0  270M  113M 55312 S  0.0  5.8  0:04.47 /usr/sbin/apache2 -k start
 7396 www-data   20   0  278M  103M 36452 S  0.0  5.3  0:03.61 /usr/sbin/apache2 -k start
@tsteur commented on April 16th 2020 Member

Sorry, I might not be of much help here. When a request finishes, PHP should release all its memory. I suppose there is no other application running on the same server? It's a bit hard for me to say what could be happening, as we can't reproduce this; I'm not sure whether it's an issue with that particular PHP version or what request is running there exactly. Are you maybe familiar enough with Apache to see what is happening there?

@baptx commented on April 18th 2020

@tsteur no other app is running; I just have WordPress and Matomo installed. I am using the default Apache configuration with mod_php from Debian 10 (on an OVH SSD 1 VPS with 2 GB RAM), so I don't know what else I could do if there is a memory leak. I noticed that the memory is freed after a bit less than 24 hours, probably because of this message displayed at midnight in /var/log/apache2/error.log.1 before log rotation:
[Sat Apr 18 00:00:02.939333 2020] [mpm_prefork:notice] [pid 739] AH00171: Graceful restart requested, doing restart.

About my previous comment, do you have an idea how to get the old data back?

If "Keep basic metrics" was unchecked before the click on "Purge DB Now", I cannot get the old data back in "Visits Over Time", even when archiving again with --force-all-periods=86400.

And is it possible to display data for several years in "Visits Over Time" faster and with less RAM, for example by displaying a summary of the data instead of loading all of it?

@tsteur commented on April 20th 2020 Member

is it possible to display data for several years in "Visits Over Time", faster and without using so much RAM? For example by displaying a summary of the data instead of loading all data.

Matomo should already load only the data needed to display that view. It loads in a few ms here when browser archiving is disabled. Unless you want to view a point for every day within the last few years, in which case it might indeed take up to 30 seconds (when loading the data for e.g. 700 different days).

About my previous comment, do you have an idea how to get the old data back?

If you have access to the database, you could simply delete all tables starting with matomo_archive_*, and Matomo will automatically recreate these tables and archive all data again from the raw data. You'd probably also want to disable the "delete report data" feature again.
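A minimal sketch of that cleanup, assuming shell access to MySQL/MariaDB and the default "matomo_" table prefix (the database name "matomo_db" is also an assumption); it prints SQL that generates one DROP TABLE statement per archive table, so the list can be reviewed before anything is dropped:

```shell
#!/bin/sh
# Sketch (names are assumptions): emit SQL that lists DROP TABLE statements
# for Matomo's archive tables. Adjust "matomo_db" and "matomo_" to your setup.
drop_archive_tables_sql() {
    db="$1"
    prefix="$2"
    printf "SELECT CONCAT('DROP TABLE ', table_name, ';')\n"
    printf "FROM information_schema.tables\n"
    printf "WHERE table_schema = '%s'\n" "$db"
    printf "  AND table_name LIKE '%sarchive_%%';\n" "$prefix"
}

drop_archive_tables_sql matomo_db matomo_
```

Review the generated statements, then feed them back in, e.g. `drop_archive_tables_sql matomo_db matomo_ | mysql -N matomo_db | mysql matomo_db`.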

@baptx commented on April 20th 2020

@tsteur thanks, deleting the tableprefix_archive_* tables allowed me to archive the data again with a command like /var/www/website/piwik/console core:archive --force-all-websites --force-all-periods=86400, but I noticed an out-of-memory error on my 2 GB RAM VPS.
memory_limit = -1 (unlimited) was set in /etc/php/7.3/cli/php.ini; even memory_limit = 512M was not enough.
The PHP script tried to archive data for several years (2014 to 2020) at once. As a workaround, I had to archive no more than 2 years at a time with these commands:

/var/www/website/piwik/console core:archive --force-all-websites --force-all-periods=86400 --force-date-range=2014-01-01,2015-12-31
/var/www/website/piwik/console core:archive --force-all-websites --force-all-periods=86400 --force-date-range=2016-01-01,2017-12-31
/var/www/website/piwik/console core:archive --force-all-websites --force-all-periods=86400 --force-date-range=2018-01-01,2019-12-31
/var/www/website/piwik/console core:archive --force-all-websites --force-all-periods=86400
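The four commands above can be generated instead of typed by hand; this sketch prints one two-year invocation per window plus the final unranged run (the console path is the one from this thread, and the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: print the chunked core:archive commands, two years per invocation,
# so each archiver run stays within the PHP memory limit.
CONSOLE=/var/www/website/piwik/console

print_archive_commands() {
    year="$1"
    last="$2"   # first year NOT covered by a --force-date-range chunk
    while [ "$year" -lt "$last" ]; do
        echo "$CONSOLE core:archive --force-all-websites --force-all-periods=86400 --force-date-range=$year-01-01,$((year + 1))-12-31"
        year=$((year + 2))
    done
    # Final run without --force-date-range covers the remaining period.
    echo "$CONSOLE core:archive --force-all-websites --force-all-periods=86400"
}

print_archive_commands 2014 2020
```

Once the printed commands look right, the output can be piped to sh to run them.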

As you can see, it is not very convenient to execute all these commands instead of just one. It should be possible to improve the archiving script by freeing memory as soon as each year (or month, week, or even day, for very large websites with a lot of data) has been archived, instead of only freeing it after each period type (day, week, month, year) across all years, as is done currently:

INFO [2020-04-20 13:56:59] 7528  START
INFO [2020-04-20 13:56:59] 7528  Starting Matomo reports archiving...
INFO [2020-04-20 13:57:00] 7528  Will pre-process for website id = 1, period = day, date = 2014-01-01,2015-12-31
INFO [2020-04-20 13:57:00] 7528  - pre-processing all visits
INFO [2020-04-20 13:59:11] 7528  Archived website id = 1, period = day, 0 segments, 999 visits in days included in: 2014-01-01,2015-12-31, Time elapsed: 131.493s

-> RAM was freed at this time (checked with htop)

INFO [2020-04-20 13:59:11] 7528  Will pre-process for website id = 1, period = week, date = 2014-01-01,2015-12-31
INFO [2020-04-20 13:59:11] 7528  - pre-processing all visits
INFO [2020-04-20 14:00:04] 7528  Archived website id = 1, period = week, 0 segments, 1002 visits in weeks included in: 2014-01-01,2015-12-31, Time elapsed: 52.796s

-> RAM was freed at this time (checked with htop)

INFO [2020-04-20 14:00:04] 7528  Will pre-process for website id = 1, period = month, date = 2014-01-01,2015-12-31
INFO [2020-04-20 14:00:04] 7528  - pre-processing all visits
INFO [2020-04-20 14:00:17] 7528  Archived website id = 1, period = month, 0 segments, 999 visits in months included in: 2014-01-01,2015-12-31, Time elapsed: 13.127s

-> RAM was freed at this time (checked with htop)

INFO [2020-04-20 14:00:17] 7528  Will pre-process for website id = 1, period = year, date = 2014-01-01,2015-12-31
INFO [2020-04-20 14:00:17] 7528  - pre-processing all visits
INFO [2020-04-20 14:00:19] 7528  Archived website id = 1, period = year, 0 segments, 999 visits in years included in: 2014-01-01,2015-12-31, Time elapsed: 2.210s

-> RAM was freed at this time (checked with htop)

INFO [2020-04-20 14:00:19] 7528  Archived website id = 1, 4 API requests, Time elapsed: 199.657s [1/1 done]
INFO [2020-04-20 14:00:19] 7528  Done archiving!

If a website collected data with Matomo for several years without archiving, it will probably hit an out-of-memory error unless the first archiving run uses the --force-date-range parameter.
The RAM issue I originally had with browser archiving is probably similar.

Powered by GitHub Issue Mirror