Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large Database, Deleting Old Raw Data Seems to Not Work #17431

Closed
mddvul22 opened this issue Apr 7, 2021 · 18 comments
Closed

Large Database, Deleting Old Raw Data Seems to Not Work #17431

mddvul22 opened this issue Apr 7, 2021 · 18 comments
Labels
answered For when a question was asked and we referred to forum or answered it.

Comments

@mddvul22
Copy link

mddvul22 commented Apr 7, 2021

We have, what I believe is a very large Matomo database. Up until this week, we have never scheduled any regular deletion of old raw data. Some example table sizes in our database, as of two days ago:
piwik_log_link_visit_action.ibd : 1009G
piwik_log_visit.ibd: 161G
piwik_log_action.ibd: 19G

As you can see, those three tables, combined are over 1TB.

So, yesterday, I configured the section: "Regularly delete old raw data", and told it to delete logs older than 730 days.
image

Well, according to Matomo, it deleted data earlier this morning:
image

But, this morning I logged in to our server and checked and our database has not gotten smaller, at all. Today, it shows:

piwik_log_link_visit_action.ibd : 1010G
piwik_lpiwik_log_action.ibd: 162G
piwik_log_action.ibd: 19G

Why aren't these tables getting smaller? According to this page: https://matomo.org/faq/troubleshooting/faq_42/ it appears that deleting raw data should "free significant database space". Am I doing something wrong?

Expected Behavior

I would expect that those tables would have gotten much smaller.

Current Behavior

Two of the tables actually increased in size. I think the increase is because of new data having been imported overnight. So, most likely, the purge of old data did nothing.

Context

Our database is becoming unmanageable and is causing problems for us, with backups and with Matomo upgrades. We need to reduce its size.

Your Environment

  • Matomo Version: 3.14.1
  • PHP Version: 7.2.24
  • Server Operating System: CentOS 8.3
  • Additionally installed plugins:
  • Browser: Firefox
  • Operating System: Linux
@mddvul22 mddvul22 added the Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. label Apr 7, 2021
@diosmosis
Copy link
Member

Hi @mddvul22, it should have worked to delete the logs, it's possible the task that deletes the logs is failing somehow... would you be able to check the following?

  • Find the visit_last_action_time and idvisit of the first visit in your log_visit table. If it's newer than 730 days, then it's possible the data gets deleted but server space isn't freed for some reason.
  • If you're not seeing any data being deleted, you could try running the delete logs command (note: this would take a lot of time to finish if it runs successfully), for example:
./console core:delete-logs-data --dates=2001-01-01,2019-04-08 --optimize-tables

(Note: 2019-04-08 is 730 days before today, or thereabouts. You could do a shorter range just to test. Also note that optimizing the tables also takes a long time but might also be the reason you're not seeing a change in table size.)

@mddvul22
Copy link
Author

mddvul22 commented Apr 8, 2021

Thanks @diosmosis When looking at the log_visit table, it appears that the data did get deleted. The oldest data in that table is dated April 8th, 2019. What should I be seeing when I look at the log_link_visit_action table? The server_time column of the first row in that table is showing a timestamp of "2014-04-1 13:35:07".

@diosmosis
Copy link
Member

@mddvul22

For the log_visit table, would you be able to OPTIMIZE it to see if it decreases the size? This should be done as part of the purging process, but if the mysql user is not allowed to or it's disabled by matomo config, it won't happen.

Regarding the log_link_visit_action table, rows should be deleted from them as well, however, we don't delete via server_time, but through idvisit. We collect the list of idvisits from log_visit to delete, then delete entries in log_link_visit_action whose idvisit matches one. Can you check if the first row in that table has an idvisit that is still in log_visit?

log_action should also have rows deleted. This is done by deleting every row that is no longer referenced in other tables, and is a rather slow process.

@mddvul22
Copy link
Author

Thanks @diosmosis I think I'm making progress. I ran the following command (giving it a shorter time period, as you mentioned):

./console core:delete-logs-data --dates=2001-01-01,2015-04-08 --optimize-tables

After a few hours, I got the error that PHP had run out of memory. I was already giving it 2GB. But I raised it to 3GB, and started over. This worked and the first step of deleting visits completed. So, I think the core problem the other day was that it ran out of memory.

Then, it started optimizing table log_visit and completed, successfully. Then, it started optimizing table log_link_visit_action. Here is where I ran into a problem that I've not seen before:

Optimizing table log_link_visit_action... WARNING [2021-04-10 10:50:32] 1976738 /var/www/webanalytics/matomo/libs/Zend/Db/Statement/Pdo.php(228): Warning - PDOStatement::exec
ute(): MySQL server has gone away - Matomo 3.14.1 - Please report this message in the Matomo forums: https://forum.matomo.org (please do a search first as it might have been r
eported already)
WARNING [2021-04-10 10:50:32] 1976738 /var/www/webanalytics/matomo/libs/Zend/Db/Statement/Pdo.php(228): Warning - PDOStatement::execute(): Error reading result set's header -
Matomo 3.14.1 - Please report this message in the Matomo forums: https://forum.matomo.org (please do a search first as it might have been reported already)
ERROR [2021-04-10 10:50:32] 1976738 Uncaught exception: /var/www/webanalytics/matomo/libs/Zend/Db/Statement/Pdo.php(234): SQLSTATE[HY000]: General error: 2006 MySQL server ha
s gone away
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
#0 /var/www/webanalytics/matomo/libs/Zend/Db/Statement.php(300): Zend_Db_Statement_Pdo->_execute(Array)
#1 /var/www/webanalytics/matomo/libs/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array)
#2 /var/www/webanalytics/matomo/libs/Zend/Db/Adapter/Pdo/Abstract.php(238): Zend_Db_Adapter_Abstract->query('OPTIMIZE TABLE ...', Array)
#3 /var/www/webanalytics/matomo/core/Db/Adapter/Pdo/Mysql.php(309): Zend_Db_Adapter_Pdo_Abstract->query('OPTIMIZE TABLE ...', Array)
#4 /var/www/webanalytics/matomo/core/Db.php(276): Piwik\Db\Adapter\Pdo\Mysql->query('OPTIMIZE TABLE ...', Array)
#5 /var/www/webanalytics/matomo/core/Db.php(460): Piwik\Db::query('OPTIMIZE TABLE ...')
#6 /var/www/webanalytics/matomo/plugins/CoreAdminHome/Commands/DeleteLogsData.php(190): Piwik\Db::optimizeTables(Array)
#7 /var/www/webanalytics/matomo/plugins/CoreAdminHome/Commands/DeleteLogsData.php(101): Piwik\Plugins\CoreAdminHome\Commands\DeleteLogsData->optimizeTables(Object(Symfony\Comp
onent\Console\Output\ConsoleOutput))
#8 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(257): Piwik\Plugins\CoreAdminHome\Commands\DeleteLogsData->execute(Object(
Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#9 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component
Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#10 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Piwik\Plugin
s\CoreAdminHome\Commands\DeleteLogsData), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#11 [internal function]: Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput)
)
#12 /var/www/webanalytics/matomo/core/Console.php(140): call_user_func(Array, Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\Consol
eOutput))
#13 /var/www/webanalytics/matomo/core/Access.php(644): Piwik\Console->Piwik{closure}()
#14 /var/www/webanalytics/matomo/core/Console.php(141): Piwik\Access::doAsSuperUser(Object(Closure))
#15 /var/www/webanalytics/matomo/core/Console.php(93): Piwik\Console->doRunImpl(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\Cons
oleOutput))
#16 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Piwik\Console->doRun(Object(Symfony\Component\Console\Input\ArgvInput),
Object(Symfony\Component\Console\Output\ConsoleOutput))
#17 /var/www/webanalytics/matomo/console(32): Symfony\Component\Console\Application->run()
#18 {main},
caused by: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
#0 /var/www/webanalytics/matomo/libs/Zend/Db/Statement/Pdo.php(228): PDOStatement->execute(Array)
#1 /var/www/webanalytics/matomo/libs/Zend/Db/Statement.php(300): Zend_Db_Statement_Pdo->_execute(Array)
#2 /var/www/webanalytics/matomo/libs/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array)
#3 /var/www/webanalytics/matomo/libs/Zend/Db/Adapter/Pdo/Abstract.php(238): Zend_Db_Adapter_Abstract->query('OPTIMIZE TABLE ...', Array)
#4 /var/www/webanalytics/matomo/core/Db/Adapter/Pdo/Mysql.php(309): Zend_Db_Adapter_Pdo_Abstract->query('OPTIMIZE TABLE ...', Array)
#5 /var/www/webanalytics/matomo/core/Db.php(276): Piwik\Db\Adapter\Pdo\Mysql->query('OPTIMIZE TABLE ...', Array)
#6 /var/www/webanalytics/matomo/core/Db.php(460): Piwik\Db::query('OPTIMIZE TABLE ...')
#7 /var/www/webanalytics/matomo/plugins/CoreAdminHome/Commands/DeleteLogsData.php(190): Piwik\Db::optimizeTables(Array)
#8 /var/www/webanalytics/matomo/plugins/CoreAdminHome/Commands/DeleteLogsData.php(101): Piwik\Plugins\CoreAdminHome\Commands\DeleteLogsData->optimizeTables(Object(Symfony\Component\Console\Output\ConsoleOutput))
#9 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(257): Piwik\Plugins\CoreAdminHome\Commands\DeleteLogsData->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#10 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#11 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Piwik\Plugins\CoreAdminHome\Commands\DeleteLogsData), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#12 [internal function]: Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#13 /var/www/webanalytics/matomo/core/Console.php(140): call_user_func(Array, Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#14 /var/www/webanalytics/matomo/core/Access.php(644): Piwik\Console->Piwik{closure}()
#15 /var/www/webanalytics/matomo/core/Console.php(141): Piwik\Access::doAsSuperUser(Object(Closure))
#16 /var/www/webanalytics/matomo/core/Console.php(93): Piwik\Console->doRunImpl(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#17 /var/www/webanalytics/matomo/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Piwik\Console->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /var/www/webanalytics/matomo/console(32): Symfony\Component\Console\Application->run()
#19 {main}

[Zend_Db_Statement_Exception]
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

[PDOException]
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

core:delete-logs-data [--dates="..."] [--idsite[="..."]] [--limit="..."] [--optimize-tables]

I'm not really sure what that means. The mysql server is still running. It did reduce the size of the piwik_log_link_visit_action table from 1010GB to 844GB.

Do you have any ideas about the mysql server has gone away error?

Thanks!

@diosmosis
Copy link
Member

@mddvul22 it means the connection was automatically closed because it hadn't been used in a certain amount of time. We have a faq about how to fix these errors: https://matomo.org/faq/troubleshooting/faq_183/. Can you see whether any of these work for you?

@mddvul22
Copy link
Author

Hi @diosmosis I've been following that page you linked to for the past several days, raising the values each time and retrying the command to optimize tables, and still it fails each time with:

[PDOException]
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

Here are my current settings, as specified on that page:

I don't have wait_timeout specified. But according to the documentation online, 28800 is already the default. Other values, as requested in that link:
innodb_log_file_size = 1024M
max_allowed_packet = 1024M

And, if I'm understanding my matomo config/global.ini.php file correctly, we are already using PDO/MYSQL:
adapter = PDO\MYSQL

@diosmosis
Copy link
Member

Hi @mddvul22, is it taking longer than 28800 seconds (8 hours) to see this error or less time? Are you able to check if there are any long running DELETE queries in your mysql via SHOW PROCESSLIST?

@mddvul22
Copy link
Author

Definitely more than 8 hours to see the error. I'm now running the command again, but this time I prefaced it with "time", so I can get a real idea of when the error occurs. Also, I'll check what SHOW PROCESSLIST can show throughout this run time.

@diosmosis
Copy link
Member

@mddvul22 ok, if it takes more than 8 hours, it might be that there's just too much data to remove. Upping the wait_timeout to something higher might help for this initial run.

@mddvul22
Copy link
Author

Ok, I've raised the wait_timeout to 86400. Now running the command again.

@mddvul22
Copy link
Author

Ran the script yesterday, after increasing the wait_timeout to 86400 (24 hours). I got the same error as before. It appears to have generated the error just about 10 minutes ago. At this moment, SHOW_PROCESSLIST doesn't show any long running processes, but I wasn't on the server just before it died. It ran for 16 hours and 4 minutes before it generated the error. So, it appears that the problem is not wait_timeout. I will also add that the error comes during the optimization of the log_link_visit_action table:

Optimizing table log_visit... done. Time elapsed: 5718.357s
Optimizing table log_link_visit_action... WARNING [2021-04-16 06:52:05] 2907794 /var/www/webanalytics/matomo/libs/Zend/Db/Statement/Pdo.php(228): Warning - PDOStatement::execute(): MySQL server has gone away - Matomo 3.14.1 - Please report this message in the Matomo forums: https://forum.matomo.org (please do a search first as it might have been reported already)

@diosmosis
Copy link
Member

@mddvul22 the tables are optimized at the end of the process. Could you try and manually optimize the tables?

@mddvul22
Copy link
Author

I'd be happy to try. The command I have been running, all along, is:

./console core:delete-logs-data --dates=2001-01-01,2019-04-08 --optimize-tables

What should I run for manually optimizing the tables? Is it just:
./console database:optimize-archive-tables all

-Bryan

@diosmosis
Copy link
Member

@mddvul22 I meant running the sql to optimize a single table at a time. If it takes so long to optimize a single table that the mysql connection times out, it might be better to run the queries one by one yourself. I'm hoping this is just because the tables are so large right now, and won't be an issue on subsequent runs. Given the log_visit table optimize finishes you can probably skip that one. (I'd also check the size of the log_visit table to see if it was reduced from the size mentioned in the first comment you made here.)

@mddvul22
Copy link
Author

The piwik_log_visit.ibd is definitely smaller. Originally, it was 161GB. Now, it is 138GB. Last night, I ran "OPTIMIZE TABLE piwik_log_link_visit_action;" from inside mariadb. In the mysql output, before I went to bed, it showed that it had completed 100% of the optimization, but it was still doing something, as I had not been returned to a mysql prompt. Checking it this morning, I see that I got the same error:

ERROR 2013 (HY000): Lost connection to MySQL server during query

So, looking at the size of the table, it has gotten much smaller. It is now 775GB, as opposed to its original 1009GB. Here is what I see in /var/log/messages:

Apr 17 04:42:11 sitka kernel: mysqld[2991042]: segfault at 0 ip 000055c21df95a47 sp 00007fa853bafa50 error 6 in mysqld[55c21d315000+1228000]
Apr 17 04:42:11 sitka kernel: Code: c7 45 00 00 00 00 00 8b 7d cc 4c 89 e2 4c 89 f6 e8 5e 4b 83 ff 49 89 c7 49 39 c4 0f 84 02 01 00 00 e8 4d 19 00 00 41 8b 4d 00 <89> 08 85 c9
74 33 49 83 ff ff 0f 84 a9 00 00 00 f6 c3 06 75 24 4d
Apr 17 04:42:11 sitka systemd[1]: Started Process Core Dump (PID 3214473/UID 0).
Apr 17 04:42:15 sitka systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
Apr 17 04:42:15 sitka systemd[1]: mariadb.service: Failed with result 'signal'.

Then there is a long log entry that begins with:
Apr 17 04:42:18 sitka systemd-coredump[3214474]: Process 2991022 (mysqld) of user 27 dumped core ...

Any thoughts? BTW, this machine has 16GB of RAM if that is important.

@diosmosis
Copy link
Member

It sounds like the connection timed out but the query still ran (as it should). I think the tables are just too large to optimize in time.

I would suggest setting the [General] enable_sql_optimize_queries INI option to 0 to make sure Matomo doesn't optimize the tables and run into a connection timeout. Then every week or every month or however often you'd want to run a query like that, optimize them outside of Matomo (ie, manually or via a script).

You'll also want to run OPTIMIZE on the other tables to get their size down.

@mddvul22
Copy link
Author

I wanted to come back to this. I've continued to work on this.

I finally got our raw logs trimmed down to our desired two years. I then completed the table optimization via the mysql command line. Then, I made a few php.ini changes, based upon the information on this page: https://haydenjames.io/mysql-server-has-gone-away-error-solutions/

Finally, I ran the following command last night:

./console core:delete-logs-data --dates=2001-01-01,2019-04-21 --optimize-tables

And it completed successfully! I'm not sure whether the changes to php.ini helped, or not. Perhaps it was just getting the database down to a size that Matomo could manage. Regardless, I was happy it worked without failure.

@diosmosis
Copy link
Member

That's great to hear @mddvul22! I'll close this since it's working for you.

@mattab mattab added the answered For when a question was asked and we referred to forum or answered it. label May 17, 2021
@tsteur tsteur removed the Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. label Sep 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
answered For when a question was asked and we referred to forum or answered it.
Projects
None yet
Development

No branches or pull requests

4 participants