How to handle the log_* tables when years of data and hundreds of Gb #6848
Comments
As you pointed out, one particular problem is the DB schema upgrade. I have a 160GB database and it took nearly an entire day to upgrade from 2.8.1 to 2.10.0.
I know the Piwik team is thinking about supporting alternative DBMSes, so it may not be a good idea to focus on a MySQL-specific solution here (cf. idea 3).
We are still several versions back because of this issue. Basically, what we have come down to is an annual Piwik upgrade cycle. We want to keep the old log tables, but they will never be upgraded. So, once a year, we make a backup of all of the log tables, remove all but the most recent data, and run an upgrade. Even if the process that removes data after X days archived the data instead of deleting it, that would be good enough for us to upgrade on a more regular basis.
A first step in the right direction for this issue would be to inform all Piwik users of the best database to use that would allow hot ALTER schema upgrades. For example, MariaDB and MySQL 5.6 support online (in-place) DDL. Feel free to share your findings so they help other Piwik admins.
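To illustrate what "hot ALTER" means in practice, on MySQL 5.6+ / MariaDB a schema change can request the in-place algorithm so the table stays available for reads and writes; the statement fails fast instead of silently copying the table if the change is not supported in place. A sketch (the index name is hypothetical; `piwik_log_visit` is used as an example table):

```sql
-- Hypothetical example: add an index to a large log table using
-- online DDL (MySQL 5.6+ / MariaDB) so the table stays writable.
-- If the in-place algorithm is not supported for this change,
-- the statement errors out rather than falling back to a table copy.
ALTER TABLE piwik_log_visit
  ADD INDEX example_idx (idsite, visit_last_action_time),
  ALGORITHM=INPLACE, LOCK=NONE;
```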
I'm aware of the purging feature. Purging is not the same as archiving: with purging there really is nothing to go back to. Archiving (i.e. something like "INSERT INTO piwik_log_visit_archive SELECT * FROM piwik_log_visit WHERE idvisit < X" followed by a DELETE) allows for a backup and makes upgrades go a lot faster. You only need to have Piwik down during the ALTER TABLE for piwik_log_visit and can bring it back up while the ALTER TABLE runs on the archive.
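A minimal sketch of that archive-then-purge flow, assuming an archive table named `piwik_log_visit_archive` and a hypothetical `idvisit` cutoff:

```sql
-- Hypothetical sketch: move old visits into an archive table, then purge them.
SET @cutoff = 100000;  -- example idvisit cutoff, e.g. "older than one year"

-- Create the archive table once, with the same schema as the live table.
CREATE TABLE IF NOT EXISTS piwik_log_visit_archive LIKE piwik_log_visit;

-- Copy the old rows, then remove them from the live table.
INSERT INTO piwik_log_visit_archive
  SELECT * FROM piwik_log_visit WHERE idvisit < @cutoff;
DELETE FROM piwik_log_visit WHERE idvisit < @cutoff;
```

With the old rows out of the way, the ALTER TABLE on the (now much smaller) live table finishes quickly, and the slow ALTER on the archive table can run while Piwik is back online.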
FYI: If we had such an archive log table, we would still need to update this archive log table via …
As time goes by, users store more and more historical data in Piwik (and the Cloud!), and as data grows, we must think of how to store it efficiently and keep Piwik manageable. There are loads of users with 100GB+ databases, and at InnoCraft we have a client with a 600GB database.
The goal of this issue is to discuss how to approach the technical challenges of Piwik storing hundreds of gigabytes of log_* data. One particular problem is that DB schema upgrades can take days or even weeks on such huge databases. How can we make the situation better?