New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Archive tables not cleaned up and growing uncontrollably #13685
Comments
Is browser archiving disabled? Have you thought about running I'm not 100% sure but I think this query should give us some information on how many invalidated archives there are etc: select `value`, period, count(*) from piwik_archive_numeric_2018_01 where `name` like "%done%" group by `value`, period Do you know if the DB was already this size before updating to 3.6.1? Just checking to find out if there's a problem with archive deletion. Also just to double check after changing the MySQL settings to fix |
Thanks for the fast answer!
Another question I didn't yet mention: in #11322 the command |
nope, not really unfortunately. fyi it seems there are 3572 temporary archives, and 2099 invalidated archives. I'm not quite sure but this could be reasonable as I suppose many archives for the year might be temporary. The invalidated archives could be likely removed I suppose but hard to tell without knowing all the content. There shouldn't be any risk of deleting too much data with this command, and least I haven't heard of anything. This plus optimize table could reduce the DB size. |
Dropping in since I am a colleague of @toebu. I did run the table optimisation via
This ended up gaining a little bit, around 2 or 3 GB. The screenshot provided in the first comment was done after the optimisation. |
I recommend you try running the other command to purge old archives and maybe this helps. It's hard to say why more archives exist in January than the others without looking at the data. Could be possible there were maybe some other websites tracked that were then deleted etc. |
fyi likely it's because January tables also store yearly archives. |
I compared it with other DBs and there shouldn't be that many more for the yearly. Except when maybe some sites were deleted for example etc. |
So, I have run the archive purging. It finished almost immediately and didn't change anything. So I'm guessing this task is normally executed through the cron job. I have also found the command
For comparison, some other time period:
I also ran table optimisation again ( |
You may wanna check if there were more idsites in that time range or so maybe by executing
etc. It's hard to say what is happening there otherwise and whether it is a bug or legit without knowing the exact contents. If I was you, I would probably make a backup of the table, manually delete all temporary and invalid archives (or even better the whole Those archives can be identified through the done flags that are stored in the value column in the numeric table. Eg value |
Sorry for not getting back earlier. We have finally found the source of the issue. After an embarrassingly long time, we found the following crucial message in the syslog:
We therefore changed the MySQL config It now took a couple of days for Matomo to clean up the tables, but the DB is finally down to a more normal size again. The
All in all, this was a DB issue, not an issue of Matomo. If you agree, this additional scenario for the cause of the |
Updated the FAQ 👍 That's an awesome find and be for sure super valuable. I'll check if I find related open issues. Cheers for that and glad the size reduces now 🚀 |
Awesome, thanks again for your support! |
Since about 3 weeks the archive data in one of our Matomo instances is growing at a much larger rate than usual. There was no change on the server or the traffic that could explain this:
The log of the archiving showed the following error, that started at around the same time than the database started to increase in size:
Error: Got invalid response from API request: ?module=API&method=API.get&idSite=11&period=year&date=last7&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:63:"SQLSTATE[HY000]: General error: 2006 MySQL server has gone away";}'
Following the recommendations for that error from the FAQ #183, we tried the following:
max_allowed_packet
to 1GBwait_timeout
was already set to 8 hours (and the archiving script only runs for a few minutes)This didn't change anything, the same error is still happening.
Other measures we took:
The DBStats plugin confirmed that the yearly archive is bloating the DB. Although also the monthly archive table seems to have way too many rows compared to previous months. Some cleanup seems to happen on some tables, because at least the blob_2018_10 table shrunk drastically in the last days. Other weird stuff seems to happen as there are big changes in tables that should no longer be touched, e.g. blob_2017_12 (if I'm understanding the archiving process correctly).
Here's are two screenshots of the report tables sizes:
5th of November:
8th of November:
Checking the DB directly, we can see the many duplicate archive entries (again, if I'm understanding the process correctly, this query should return no results):
SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_numeric_2018_01 GROUP BY idsite, date1, date2, period, name HAVING count > 1 ORDER BY count DESC;
Here's the first 10 results (the result set has 1550 rows):
Any hints what might be happening here and how we could go about fixing it? At the current rate of increased disk usage, we have about 2 weeks left before the disk is full. So first thing would be to find a way to delete duplicate archiving data manually to buy us more time. But we of course have to find a real solution that gets the yearly archiving working correctly again (which will hopefully also get the archive purging working again).
(This issue is related to #11322, #10439 and #7181)
The text was updated successfully, but these errors were encountered: