@toebu opened this Issue on November 8th 2018

Since about 3 weeks the archive data in one of our Matomo instances is growing at a much larger rate than usual. There was no change on the server or the traffic that could explain this:

  • Matomo version 3.5.1 was running for a few months already
  • No new plugins were added or other changes were made on the server
  • Traffic tracked by the instance stayed at around 500k pageviews per week
  • Archiving continued to be done by a cron job running every 15 minutes

The log of the archiving showed the following error, that started at around the same time than the database started to increase in size:

Error: Got invalid response from API request: ?module=API&method=API.get&idSite=11&period=year&date=last7&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:63:"SQLSTATE[HY000]: General error: 2006 MySQL server has gone away";}'

Following the recommendations for that error from the FAQ #183, we tried the following:

  • Set max_allowed_packet to 1GB
  • The wait_timeout was already set to 8 hours (and the archiving script only runs for a few minutes)

This didn't change anything, the same error is still happening.

Other measures we took:

  • Updated Matomo to the latest version (3.6.1)
  • Updated all plugins to the latest versions (HeatmapSessionRecording, MediaAnalytics, SecurityInfo, UsersFlow). Note that we also have 4 custom developed plugins running (already for several years without problems).
  • Installed the DBStats plugin
  • Based on the error, the problem comes from the yearly archive of site id 11 (the site with about 95% of the total traffic). This was confirmed by running archiving only for other sites and other time periods, which completed successfully.

The DBStats plugin confirmed that the yearly archive is bloating the DB. Although also the monthly archive table seems to have way too many rows compared to previous months. Some cleanup seems to happen on some tables, because at least the blob_2018_10 table shrunk drastically in the last days. Other weird stuff seems to happen as there are big changes in tables that should no longer be touched, e.g. blob_2017_12 (if I'm understanding the archiving process correctly).

Here's are two screenshots of the report tables sizes:
5th of November:
screenshot_20181105_104648

8th of November:
screenshot_20181108_112526

Checking the DB directly, we can see the many duplicate archive entries (again, if I'm understanding the process correctly, this query should return no results):

SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_numeric_2018_01 GROUP BY idsite, date1, date2, period, name HAVING count > 1 ORDER BY count DESC;

Here's the first 10 results (the result set has 1550 rows):

+--------+------------+------------+--------+--------------------------------------+-------+
| idsite | date1      | date2      | period | name                                 | count |
+--------+------------+------------+--------+--------------------------------------+-------+
|     11 | 2018-01-01 | 2018-12-31 |      4 | bounce_count                         |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | max_actions                          |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | nb_actions                           |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | nb_visits                            |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | sum_visit_length                     |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | UserCountry_distinctCountries        |  1959 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_sum_time_generation          |  1925 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_nb_hits_with_time_generation |  1925 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_nb_pageviews                 |  1925 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_nb_uniq_pageviews            |  1925 |
+--------+------------+------------+--------+--------------------------------------+-------+

Any hints what might be happening here and how we could go about fixing it? At the current rate of increased disk usage, we have about 2 weeks left before the disk is full. So first thing would be to find a way to delete duplicate archiving data manually to buy us more time. But we of course have to find a real solution that gets the yearly archiving working correctly again (which will hopefully also get the archive purging working again).

(This issue is related to #11322, #10439 and #7181)

@tsteur commented on November 8th 2018 Member

Is browser archiving disabled? Have you thought about running optimize table on egpiwik_archive_blob_2018_01`? If there is lots of duplicate data, it may not change much though (I know InnoDB doesn't support optimize table but it'll recreate the table basically).

I'm not 100% sure but I think this query should give us some information on how many invalidated archives there are etc:

select `value`, period, count(*) from piwik_archive_numeric_2018_01 where `name` like "%done%" group by `value`, period

Do you know if the DB was already this size before updating to 3.6.1? Just checking to find out if there's a problem with archive deletion.

Also just to double check after changing the MySQL settings to fix MySQL server has gone away" MySQL was restarted or you double checked that the settings were applied?

@toebu commented on November 9th 2018

Thanks for the fast answer!

  • Browser archiving is disabled
  • I didn't think about running optimize table. I might try that during during off-hours. But I doubt that it will help.
  • Here's the output of the query you proposed:
    +-------+--------+----------+
    | value | period | count(*) |
    +-------+--------+----------+
    |     1 |      1 |     2499 |
    |     1 |      2 |      383 |
    |     1 |      3 |       65 |
    |     2 |      4 |       42 |
    |     3 |      4 |     3572 |
    |     4 |      1 |        4 |
    |     4 |      2 |        9 |
    |     4 |      3 |        9 |
    |     4 |      4 |     2099 |
    +-------+--------+----------+
  • I'm not completely sure if the DB was already that size before the update to 3.6.1, we only installed the DBStats plugin after. But given that the disk size usage increase stayed the same before and after the update, and that there's nothing else running on the server, I doubt that the update changed anything.
  • I restarted MySQL after changing the db settings, yep. I was wondering if it's possible that the packet that Matomo is trying to insert is bigger than 1 GB and so even the maximum possible max_allowed_packet is not big enough. Do you have any ideas how one might check this?

Another question I didn't yet mention: in #11322 the command ./console core:purge-old-archive-data all is used to clean up the db. Should I try that? Is this save to run (no risk of deleting too much data) and is there a risk of tables locking or the db otherwise being stressed so that is should be done during off-hours?

@tsteur commented on November 11th 2018 Member

I restarted MySQL after changing the db settings, yep. I was wondering if it's possible that the packet that Matomo is trying to insert is bigger than 1 GB and so even the maximum possible max_allowed_packet is not big enough. Do you have any ideas how one might check this?

nope, not really unfortunately.

fyi it seems there are 3572 temporary archives, and 2099 invalidated archives. I'm not quite sure but this could be reasonable as I suppose many archives for the year might be temporary. The invalidated archives could be likely removed I suppose but hard to tell without knowing all the content.

There shouldn't be any risk of deleting too much data with this command, and least I haven't heard of anything. This plus optimize table could reduce the DB size.

@Mayeu commented on November 12th 2018

Dropping in since I am a colleague of @toebu.

I did run the table optimisation via mysqlcheck:

$ mysqlcheck -o piwik -u root -p

This ended up gaining a little bit, around 2 or 3 GB. The screenshot provided in the first comment was done after the optimisation.

@tsteur commented on November 12th 2018 Member

I recommend you try running the other command to purge old archives and maybe this helps. It's hard to say why more archives exist in January than the others without looking at the data. Could be possible there were maybe some other websites tracked that were then deleted etc.

@mattab commented on November 13th 2018 Member

. It's hard to say why more archives exist in January than the others

fyi likely it's because January tables also store yearly archives.

@tsteur commented on November 13th 2018 Member

I compared it with other DBs and there shouldn't be that many more for the yearly. Except when maybe some sites were deleted for example etc.

Powered by GitHub Issue Mirror