Add auxiliary database support for big instances #7554
Comments
Yeah, it would be great if you could specify a read-only database to use. I would be hesitant to use it for archiving or something of the sort, though: what if the data is out of sync and you don't know it? A read-only DB would be great for use in the front end when displaying reports, so as not to take up connections on the master server.
To take away problems with Tracker performance, let's first try using QueuedTracking: https://github.com/piwik/plugin-QueuedTracking. After using this plugin, performance should get much better, as the inserts are wrapped in a transaction. We may still have tracker IO contention issues after using QueuedTracking, but let's try this first (it will happen in the next days and weeks).
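The transaction batching that makes queued insertion faster can be sketched in a few lines (Python with an in-memory SQLite database standing in for MySQL; the table layout and helper name are made up for illustration, not Piwik's actual schema):

```python
import sqlite3

def insert_batch(conn, hits):
    """Insert a batch of tracking hits inside a single transaction.

    Wrapping the whole batch in one transaction means one commit for
    the batch instead of one per hit, which is where most of the
    speedup over hit-by-hit tracking comes from.
    """
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO log_visit (idvisitor, server_time) VALUES (?, ?)",
            hits,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_visit (idvisitor TEXT, server_time TEXT)")
insert_batch(conn, [("abc", "2015-05-01 10:00:00"), ("def", "2015-05-01 10:00:01")])
print(conn.execute("SELECT COUNT(*) FROM log_visit").fetchone()[0])  # 2
```

As the comment above notes, the win is mostly in batching and caching across requests, not in the queue itself.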
As @MooASh mentioned, for archiving it might not work easily / might be risky, since there can be a delay on the slave, etc. Reports in the UI / API shouldn't bother the database too much, as they are more or less simple selects and there shouldn't be too many entries. Starting from Piwik 2.13 the archive blob tables should have far fewer entries and be faster. The archive tables can maybe become a problem when having thousands of websites and thousands of segments. It might be worth it for the Live API if one queries it a lot, but then again, if there's a delay it might show outdated data, and that's not always OK. The problem is rather the possible replication delay.

As you said, using the slave for archiving would be nice, but it can probably have side effects. It would require some investigation.

@mattab If tracker performance is too slow because of long-running archiving all the time, it will make inserting from the Queue into the DB rather slow as well. The Queue does not really have a huge performance improvement over the Tracker, if at all. The queue was mainly built to handle peaks easily and to avoid having too many connections to the database etc., not for performance reasons. The insertion from Queue to DB can be faster than normal tracking, but that's rather because of having things cached in PHP across requests and not necessarily related to the DB. Depending on the number of log entries it can be faster because of transactions, but it doesn't have to be.
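The replication-delay concern above can be guarded against by checking the replica's lag before routing reads to it. MySQL exposes this as the Seconds_Behind_Master column of SHOW SLAVE STATUS; here is a small sketch of such a guard (the function name and threshold are hypothetical, and the status row is passed in as a plain dict):

```python
MAX_ACCEPTABLE_LAG_SECONDS = 30  # hypothetical threshold; tune per use case

def should_use_replica(slave_status):
    """Decide whether the replica is fresh enough to serve reads.

    `slave_status` is expected to be shaped like one row of MySQL's
    SHOW SLAVE STATUS output. Seconds_Behind_Master is NULL (None)
    when replication is not running, so treat that as 'do not use'.
    """
    lag = slave_status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return lag <= MAX_ACCEPTABLE_LAG_SECONDS

# A replica a few seconds behind is fine; a stalled one is not.
print(should_use_replica({"Seconds_Behind_Master": 5}))     # True
print(should_use_replica({"Seconds_Behind_Master": None}))  # False
```

Falling back to the master when the check fails keeps features like the Live API correct at the cost of load, rather than silently showing stale data.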
What comes to mind after quickly thinking about this:
Those kinds of things came to my mind as well. It'd be worth it if it actually relieves the database / tracking. Therefore, more importantly, before working on this we need to profile the database on an instance where there are problems and collect "evidence" that this is actually a problem, that it would make things "faster", by how much, etc.
There are 2 situations where this approach will definitely help:
This feature would be very helpful. Currently, DB deadlocks cause the front end to fail to load.
The goal of this ticket is to discuss and hopefully implement a solution allowing a single Piwik instance to query two replica databases.
Outline of issue:
However, slave servers could be put to much better use if we could redirect archiving queries onto the slave DB.
Long-running and heavy queries could be run against the slave, their results processed in PHP, and the resulting archives written to the master server.
This would significantly relieve the tracker and take away some other problems with DB performance when a traffic peak meets long-running archiving.
For the initial scope, maybe only SELECT queries triggered by LogAggregator could be run against the auxiliary DB server; in the future we could add an option for developers to decide which DB to use.
Even moving the basic LogAggregator queries onto the slave would give a significant boost to performance.
What do you think?