Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle the log_* tables when years of data and hundreds of Gb #6848

Open
mattab opened this issue Dec 12, 2014 · 7 comments
Open

How to handle the log_* tables when years of data and hundreds of Gb #6848

mattab opened this issue Dec 12, 2014 · 7 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo.

Comments

@mattab
Copy link
Member

mattab commented Dec 12, 2014

As time goes by, users store more and more historical data in Piwik (and the Cloud!) and as data grows, we must think of how to store it efficiently and keep Piwik manageable. There are loads of users who have 100Gb+ databases and at InnoCraft we have a client with a 600Gb database.

The goal of this issue is to discuss how to approach the technical challenges of Piwik storing hundreds of Gigabytes of log_* data. One particular problem is that DB schema Upgrades can take days or even weeks on such huge databases. How can we make the situation better?

@mattab mattab added c: Performance For when we could improve the performance / speed of Matomo. RFC Indicates the issue is a request for comments where the author is looking for feedback. labels Dec 12, 2014
@mattab mattab added this to the Short term milestone Dec 12, 2014
@mrjoops
Copy link

mrjoops commented Jan 29, 2015

As you pointed it out, one particular problem is the DB schema upgrade. I have a 160GB database and it took nearly and entire day to upgrade from 2.8.1 to 2.10.0.
Actually, it could be more acceptable if only tracking worked during that time.
Upgrading the core tables was "quite fast" whereas the archive tables took soooo long and we lost an entire day of data.

@mrjoops
Copy link

mrjoops commented Jan 29, 2015

I know the piwik team is thinking about implementing alternate DBMS so it may not be good idea to focus on a mysql-specific solution here (cf. idea 3).

@mattkolb
Copy link

We are still several versions back because of this issue. Basically, what we have come down to is an annual Piwik upgrade cycle. We want to keep the old log tables, but they will never be upgraded. So, once a year, we make a backup of all of the log tables, remove all but the most recent data, and run an upgrade. Even if the process that removes data after X days archived the data instead of deleting it, that would be good enough for us to upgrade on a more regular basis.

@mnapoli
Copy link
Contributor

mnapoli commented Mar 23, 2015

@mattkolb You can setup Piwik to purge old log data (e.g. older than 180 days), see this FAQ.

@mattab
Copy link
Member Author

mattab commented Apr 7, 2015

A first step in the right direction for this issue would be to inform all Piwik users of the best database to use that would allow hot ALTER schema upgrades. For example Mariadb and Mysql 5.6 have good features.

feel free to share your findings so it helps other Piwik admins

@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Apr 9, 2015
@mattab mattab modified the milestones: Mid term, Short term Apr 9, 2015
@mattab mattab removed the RFC Indicates the issue is a request for comments where the author is looking for feedback. label Sep 13, 2015
@mattkolb
Copy link

mattkolb commented Oct 8, 2015

I'm aware of the purging feature. Purging is not the same as archiving. With purging there really is nothing to go back to. By archiving (aka something like "INSERT INTO piwik_log_visit_archive SELECT * FROM piwik_log_visit WHERE idvisit < X" followed by a DELETE), it allows for a backup and makes update time go a lot faster. You only need to have Piwik down during the ALTER TABLE for piwik_log_visit and can bring it back up while the ALTER TABLE runs for the archive.

@mattab
Copy link
Member Author

mattab commented Dec 22, 2015

it allows for a backup and makes update time go a lot faster.

fyi: If we had such an archive log table, we would still need to Update this archive log table via ALTER and it would be time consuming. advantage is that such ALTER wouldn't "block" tracking in log_* tables, but still one would have to wait hours or days for the archive log tables to be altered. We recommend to use tokudb and mariadb for now, which provide the best performance. feel free to talk to us also as we can help with high traffic piwik servers.

@mattab mattab added duplicate For issues that already existed in our issue tracker and were reported previously. and removed Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. labels Dec 5, 2016
@mattab mattab modified the milestones: Long term, Mid term Dec 5, 2016
@mattab mattab removed the duplicate For issues that already existed in our issue tracker and were reported previously. label Dec 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo.
Projects
None yet
Development

No branches or pull requests

4 participants