
Archive tables not cleaned up and growing uncontrollably #13685

Closed
toebu opened this issue Nov 8, 2018 · 12 comments
Labels
answered For when a question was asked and we referred to forum or answered it.

Comments

@toebu

toebu commented Nov 8, 2018

For about three weeks now, the archive data in one of our Matomo instances has been growing at a much higher rate than usual. There was no change on the server or in the traffic that could explain this:

  • Matomo version 3.5.1 was running for a few months already
  • No new plugins were added or other changes were made on the server
  • Traffic tracked by the instance stayed at around 500k pageviews per week
  • Archiving continued to be done by a cron job running every 15 minutes

The archiving log showed the following error, which started around the same time as the database began to grow:

Error: Got invalid response from API request: ?module=API&method=API.get&idSite=11&period=year&date=last7&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:63:"SQLSTATE[HY000]: General error: 2006 MySQL server has gone away";}'

Following the recommendations for that error in FAQ #183, we tried the following:

  • Set max_allowed_packet to 1GB
  • The wait_timeout was already set to 8 hours (and the archiving script only runs for a few minutes)
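For reference, the values actually in effect can be checked from a MySQL client (these are standard MySQL variables; settings in my.cnf only apply after a restart):

```sql
SHOW VARIABLES LIKE 'max_allowed_packet';
SHOW VARIABLES LIKE 'wait_timeout';
```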

This didn't change anything, the same error is still happening.

Other measures we took:

  • Updated Matomo to the latest version (3.6.1)
  • Updated all plugins to the latest versions (HeatmapSessionRecording, MediaAnalytics, SecurityInfo, UsersFlow). Note that we also have 4 custom developed plugins running (already for several years without problems).
  • Installed the DBStats plugin
  • Based on the error, the problem comes from the yearly archive of site id 11 (the site with about 95% of the total traffic). This was confirmed by running archiving only for other sites and other time periods, which completed successfully.

The DBStats plugin confirmed that the yearly archive is bloating the DB. The monthly archive table also seems to have far too many rows compared to previous months. Some cleanup does seem to happen on some tables, as at least the blob_2018_10 table shrank drastically in the last few days. Other odd things seem to be happening as well: there are big changes in tables that should no longer be touched, e.g. blob_2017_12 (if I'm understanding the archiving process correctly).

Here are two screenshots of the report table sizes:
5th of November:
screenshot_20181105_104648

8th of November:
screenshot_20181108_112526

Checking the DB directly, we can see the many duplicate archive entries (again, if I'm understanding the process correctly, this query should return no results):

SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_numeric_2018_01 GROUP BY idsite, date1, date2, period, name HAVING count > 1 ORDER BY count DESC;

Here are the first 10 results (the result set has 1550 rows):

+--------+------------+------------+--------+--------------------------------------+-------+
| idsite | date1      | date2      | period | name                                 | count |
+--------+------------+------------+--------+--------------------------------------+-------+
|     11 | 2018-01-01 | 2018-12-31 |      4 | bounce_count                         |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | max_actions                          |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | nb_actions                           |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | nb_visits                            |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | sum_visit_length                     |  2010 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | UserCountry_distinctCountries        |  1959 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_sum_time_generation          |  1925 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_nb_hits_with_time_generation |  1925 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_nb_pageviews                 |  1925 |
|     11 | 2018-01-01 | 2018-12-31 |      4 | Actions_nb_uniq_pageviews            |  1925 |
+--------+------------+------------+--------+--------------------------------------+-------+

Any hints as to what might be happening here and how we could go about fixing it? At the current rate of disk usage growth, we have about two weeks left before the disk is full. So the first step would be to find a way to delete duplicate archive data manually to buy us more time. But of course we also need a real fix that gets the yearly archiving working correctly again (which will hopefully also get archive purging working again).

(This issue is related to #11322, #10439 and #7181)

@tsteur
Member

tsteur commented Nov 8, 2018

Is browser archiving disabled? Have you thought about running optimize table on e.g. `piwik_archive_blob_2018_01`? If there is a lot of duplicate data it may not change much, though. (I know InnoDB doesn't support optimize table, but it will basically recreate the table.)

I'm not 100% sure but I think this query should give us some information on how many invalidated archives there are etc:

select `value`, period, count(*) from piwik_archive_numeric_2018_01 where `name` like "%done%" group by `value`, period

Do you know if the DB was already this size before updating to 3.6.1? Just checking to find out if there's a problem with archive deletion.

Also, just to double check: after changing the MySQL settings to fix "MySQL server has gone away", was MySQL restarted, or did you double-check that the settings were applied?

@toebu
Author

toebu commented Nov 9, 2018

Thanks for the fast answer!

  • Browser archiving is disabled
  • I didn't think about running optimize table. I might try that during off-hours, but I doubt it will help.
  • Here's the output of the query you proposed:
+-------+--------+----------+
| value | period | count(*) |
+-------+--------+----------+
|     1 |      1 |     2499 |
|     1 |      2 |      383 |
|     1 |      3 |       65 |
|     2 |      4 |       42 |
|     3 |      4 |     3572 |
|     4 |      1 |        4 |
|     4 |      2 |        9 |
|     4 |      3 |        9 |
|     4 |      4 |     2099 |
+-------+--------+----------+
  • I'm not completely sure whether the DB was already that size before the update to 3.6.1; we only installed the DBStats plugin afterwards. But given that the disk usage increase stayed the same before and after the update, and that nothing else runs on the server, I doubt the update changed anything.
  • I restarted MySQL after changing the DB settings, yep. I was wondering whether the packet Matomo is trying to insert could be bigger than 1 GB, so that even the maximum possible max_allowed_packet is not big enough. Do you have any ideas how one might check this?
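(One rough way to gauge this might be to look at the largest blob Matomo has stored so far, assuming the compressed reports live in the value column of the blob tables:

```sql
-- Largest single archive blob in bytes (sketch; value is the blob column)
SELECT MAX(LENGTH(value)) AS max_blob_bytes
FROM piwik_archive_blob_2018_01;
```

though the failing INSERT could of course be larger than anything already stored.)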

Another question I didn't mention yet: in #11322 the command ./console core:purge-old-archive-data all is used to clean up the DB. Should I try that? Is it safe to run (no risk of deleting too much data), and is there a risk of tables locking or the DB otherwise being stressed, so that it should be done during off-hours?

@tsteur
Member

tsteur commented Nov 11, 2018

I restarted MySQL after changing the db settings, yep. I was wondering if it's possible that the packet that Matomo is trying to insert is bigger than 1 GB and so even the maximum possible max_allowed_packet is not big enough. Do you have any ideas how one might check this?

nope, not really unfortunately.

FYI, it seems there are 3572 temporary archives and 2099 invalidated archives. I'm not quite sure, but this could be reasonable, as I suppose many archives for the year might be temporary. The invalidated archives could likely be removed, I suppose, but it's hard to tell without knowing all the content.

There shouldn't be any risk of deleting too much data with this command; at least I haven't heard of anything. This plus optimize table could reduce the DB size.
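A cautious sequence could look like this (sketch only; the DB name, table names, and credentials are examples, and ./console assumes you are in the Matomo directory):

```shell
# Back up the affected archive tables first (adjust DB/table names)
mysqldump -u root -p piwik \
  piwik_archive_numeric_2018_01 piwik_archive_blob_2018_01 \
  > archive_2018_01_backup.sql

# Then purge outdated archives for all dates
./console core:purge-old-archive-data all
```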

@Mayeu

Mayeu commented Nov 12, 2018

Dropping in since I am a colleague of @toebu.

I did run the table optimisation via mysqlcheck:

$ mysqlcheck -o piwik -u root -p

This ended up gaining a little, around 2 or 3 GB. The screenshot provided in the first comment was taken after the optimisation.

@tsteur
Member

tsteur commented Nov 12, 2018

I recommend you try running the other command to purge old archives; maybe that helps. It's hard to say why more archives exist in January than in the other months without looking at the data. It's possible some other websites were tracked and then deleted, etc.

@mattab
Member

mattab commented Nov 13, 2018

It's hard to say why more archives exist in January than the others

FYI, it's likely because the January tables also store the yearly archives.

@tsteur
Member

tsteur commented Nov 13, 2018

I compared it with other DBs, and there shouldn't be that many more for the yearly archives, except when maybe some sites were deleted, for example.

@toebu
Author

toebu commented Nov 15, 2018

So, I have run the archive purging. It finished almost immediately and didn't change anything, so I'm guessing this task is normally already executed through the cron job.

I have also found the command diagnostics:analyze-archive-table that is provided by the Matomo CLI. It gives the following summary for the 2018_01 period:

Total # Archives: 9987
Total # Invalidated Archives: 2497
Total # Temporary Archives: 4476
Total # Error Archives: 67
Total # Segment Archives: 6739
Total Size of Blobs: 11.8 G

For comparison, some other time period:

2017_01:
Total # Archives: 2255
Total # Invalidated Archives: 1023
Total # Temporary Archives: 0
Total # Error Archives: 0
Total # Segment Archives: 2175
Total Size of Blobs: 71.9 M

2018_08:
Total # Archives: 1934
Total # Invalidated Archives: 91
Total # Temporary Archives: 0
Total # Error Archives: 0
Total # Segment Archives: 1718
Total Size of Blobs: 51.3 M

2018_09:
Total # Archives: 1506
Total # Invalidated Archives: 0
Total # Temporary Archives: 0
Total # Error Archives: 0
Total # Segment Archives: 1296
Total Size of Blobs: 46.8 M

2018_10:
Total # Archives: 8932
Total # Invalidated Archives: 12
Total # Temporary Archives: 7342
Total # Error Archives: 0
Total # Segment Archives: 5492
Total Size of Blobs: 146.1 M

2018_11:
Total # Archives: 29017
Total # Invalidated Archives: 32
Total # Temporary Archives: 28396
Total # Error Archives: 0
Total # Segment Archives: 21090
Total Size of Blobs: 1.2 G

I also ran table optimisation again (mysqlcheck -o piwik -u root -p). This didn't change anything significant either.

@tsteur
Member

tsteur commented Nov 15, 2018

You may want to check whether there were more idsites in that time range, maybe by executing

select distinct idsite from piwik_archive_blob_2018_01 group by idsite;
select distinct idsite from piwik_archive_blob_2017_01 group by idsite;

etc.

It's hard to say what is happening there otherwise and whether it is a bug or legit without knowing the exact contents.

If I were you, I would probably make a backup of the table, manually delete all temporary and invalidated archives (or even better, the whole piwik_archive_blob_2018_01 and piwik_archive_numeric_2018_01 tables), run the archiver again, and see what happens.

Those archives can be identified through the done flags that are stored in the value column of the numeric table. E.g. value 3 identifies a temporary archive, 4 an invalidated archive.
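A sketch of that manual cleanup, assuming the done flags are in rows whose name starts with "done" and that blob rows are linked to the numeric rows via idarchive (back up both tables first):

```sql
-- Delete blob rows belonging to temporary (3) or invalidated (4) archives
DELETE b
FROM piwik_archive_blob_2018_01 b
JOIN piwik_archive_numeric_2018_01 n ON n.idarchive = b.idarchive
WHERE n.name LIKE 'done%' AND n.value IN (3, 4);

-- Then delete the numeric rows of those archives; the derived table works
-- around MySQL's restriction on deleting from a table it is selecting from
DELETE FROM piwik_archive_numeric_2018_01
WHERE idarchive IN (
  SELECT idarchive FROM (
    SELECT DISTINCT idarchive
    FROM piwik_archive_numeric_2018_01
    WHERE name LIKE 'done%' AND value IN (3, 4)
  ) AS t
);
```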

@toebu
Author

toebu commented Nov 22, 2018

Sorry for not getting back earlier.

We have finally found the source of the issue. After an embarrassingly long time, we found the following crucial message in the syslog:

[ERROR] InnoDB: The total blob data length (11161224) is greater than 10% of the total redo log size (100663296). Please increase total redo log size.

We therefore set innodb_log_file_size = 128M in the MySQL config (it was previously at the default of 48 MB). After the server was restarted and the new log file size was in effect, Matomo archiving finally completed successfully again.
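For reference, the relevant config fragment (path and section as on a standard MySQL install; note that on older MySQL versions a clean shutdown is required before the resized redo log files take effect):

```ini
# /etc/mysql/my.cnf — increase the InnoDB redo log from the 48M default
[mysqld]
innodb_log_file_size = 128M
```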

It then took a couple of days for Matomo to clean up the tables, but the DB is finally down to a more normal size again. The diagnostics:analyze-archive-table command for the period 2018_01 now shows:

Total # Archives: 3221
Total # Invalidated Archives: 82
Total # Temporary Archives: 191
Total # Error Archives: 1
Total # Segment Archives: 2971
Total Size of Blobs: 326.8 M

All in all, this was a DB issue, not a Matomo issue. If you agree, this additional scenario causing the "MySQL server has gone away" error should be added to this FAQ question. Are these FAQs stored in a repo? If so, I could make a PR with an updated answer.

@tsteur
Member

tsteur commented Nov 22, 2018

Updated the FAQ 👍 That's an awesome find and for sure super valuable. I'll check if I can find related open issues. Cheers for that, and glad the size is reducing now 🚀

@tsteur tsteur closed this as completed Nov 22, 2018
@tsteur tsteur added the answered For when a question was asked and we referred to forum or answered it. label Nov 22, 2018
@toebu
Author

toebu commented Nov 26, 2018

Awesome, thanks again for your support!
