@mgazdzik opened this Issue on March 27th 2015 Contributor

The goal of this ticket is to discuss and hopefully implement a solution allowing a single Piwik instance to query two replicated databases.

Outline of issue:

  • big Piwik instances often have master-slave db replication,
  • currently the only use (in our case) of the slave servers is to perform db backups, and in emergency cases to switch over to master to provide HA

However, slave servers could be put to much better use if we could redirect archiving queries onto the slave db.
Long-running and heavy queries could be triggered against the slave, their results processed in PHP, and the archives written to the master server.
This would significantly relieve the tracker and take away some other db performance problems that arise when a traffic peak meets long-running archiving.

For the initial scope, maybe only SELECT queries triggered by LogAggregator could be launched against the auxiliary db server, and in the future we could add an option for developers to decide which db to use.
Even putting just the basic LogAggregator queries onto the slave would give a significant performance boost.
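For illustration, here is a minimal sketch of the read-from-slave / write-to-master split using plain PDO. The host names, credentials, and the simplified table columns are assumptions, and a real implementation would go through Piwik's own Db layer rather than raw PDO:

```php
<?php
// Sketch only: heavy aggregation runs on the slave, the small result is
// written to the master. Hosts, credentials and columns are hypothetical.
$reader = new PDO('mysql:host=slave-db.example.com;dbname=piwik', 'piwik', 'secret');
$writer = new PDO('mysql:host=master-db.example.com;dbname=piwik', 'piwik', 'secret');
$reader->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$writer->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// LogAggregator-style SELECT against the slave's log tables.
$stmt = $reader->prepare(
    'SELECT COUNT(*) FROM log_visit
     WHERE idsite = ? AND visit_last_action_time >= ? AND visit_last_action_time < ?'
);
$stmt->execute([1, '2015-03-01 00:00:00', '2015-03-02 00:00:00']);
$nbVisits = (int) $stmt->fetchColumn();

// Write the aggregated value to the master's archive table (columns simplified).
$insert = $writer->prepare(
    'INSERT INTO archive_numeric_2015_03
         (idarchive, name, idsite, date1, date2, period, ts_archived, value)
     VALUES (?, ?, ?, ?, ?, ?, NOW(), ?)'
);
$insert->execute([123, 'nb_visits', 1, '2015-03-01', '2015-03-01', 1, $nbVisits]);
```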

What do you think?

@jloh commented on March 28th 2015 Contributor

Yeah, it would be great if you could specify a read-only database to use. I would be hesitant to use it for archiving or anything of the sort though; what if the data is out of sync and you don't know it?

A read-only DB would be great for use in the front end when displaying reports, so as not to take up connections on the master server.

@mattab commented on March 29th 2015 Member

This would significantly relieve the tracker and take away some other db performance problems that arise when a traffic peak meets long-running archiving.

To take away problems with Tracker performance, let's first try using QueuedTracking: https://github.com/piwik/plugin-QueuedTracking

After using this plugin, performance should get much better, as the inserts are wrapped within a transaction. Maybe we will still have tracker IO contention issues after using QueuedTracking, but let's try this first (it will happen in the next days and weeks).
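To illustrate why the queue helps, here is a minimal sketch (not the plugin's actual code) of what "wrapped within a transaction" buys: one commit for a whole batch of queued hits instead of one commit per tracking request. Table and column names are reduced to hypothetical minimums:

```php
<?php
// Sketch only: simplified stand-in for QueuedTracking-style batch insertion.
$db = new PDO('mysql:host=master-db.example.com;dbname=piwik', 'piwik', 'secret');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$queuedHits = [ /* batch of tracking requests pulled from the queue */ ];

$db->beginTransaction();
try {
    $stmt = $db->prepare(
        'INSERT INTO log_link_visit_action (idsite, idvisit, server_time) VALUES (?, ?, ?)'
    );
    foreach ($queuedHits as $hit) {
        $stmt->execute([$hit['idsite'], $hit['idvisit'], $hit['server_time']]);
    }
    $db->commit(); // one fsync for the whole batch instead of one per hit
} catch (Exception $e) {
    $db->rollBack();
    throw $e;
}
```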

@tsteur commented on April 23rd 2015 Member

As @Mooash mentioned, using the slave for archiving might not work easily / might be risky, since there can be a delay on the slave etc.

Reports in the UI / API shouldn't bother the database too much, as they are more or less simple selects and there shouldn't be too many entries. Starting from Piwik 2.13 the archive blob tables should have far fewer entries and be faster. The archive tables can maybe become a problem when having like thousands of websites and thousands of segments.

It might be worth it for the Live API if one queries it a lot, but then again, if there's a delay it might show outdated data, and that's not always ok.

The problem is rather the log tables. Here, one can purge log entries, e.g. older than 6 months, when those tables become too big. Partitioning might also help; I haven't tried it yet. Other storage engines like TokuDB might help as well.
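As a sketch of that purge idea, assuming MySQL and a hypothetical 6-month retention; note that Piwik's built-in log deletion is the safer route, since related rows in log_link_visit_action / log_conversion need purging too:

```php
<?php
// Sketch only: delete old visits in small batches so the purge never holds
// locks on the (huge) log_visit table for long. Related log_* tables would
// need the same treatment.
$db = new PDO('mysql:host=master-db.example.com;dbname=piwik', 'piwik', 'secret');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

do {
    $deleted = $db->exec(
        'DELETE FROM log_visit
         WHERE visit_last_action_time < NOW() - INTERVAL 6 MONTH
         LIMIT 10000'
    );
} while ($deleted > 0);
```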

As you said, using the slave for archiving would be nice but could have side effects; it would require some investigation.

@mattab If tracker performance is too slow because of long-running archiving all the time, it will make inserting from the Queue to the DB rather slow as well. The Queue does not really bring a huge performance improvement over the Tracker, if any: it was mainly built to handle peaks easily and to avoid having too many connections to the database etc., not for performance reasons. The insertion from Queue to DB can be faster than normal tracking, but that's rather because of having things cached in PHP across requests, and not necessarily related to the DB. Depending on the number of log entries it can be faster because of transactions, but it doesn't have to be.

@mattab commented on April 27th 2015 Member

As you said, using the slave for archiving would be nice but could have side effects; it would require some investigation.

What comes to mind after quickly thinking about this:

  • when we run archiving queries on the slave, how do we make sure the slave is not "too far behind", so we don't run queries on invalid / incomplete log_* data? (see the sketch after this list)
  • to run archiving on the slave, if we query the log_* tables on the slave and insert the archive blobs into the master DB, then I guess the only way to do this is to connect to the 2 DBs at once in the same PHP process and decide where to read from and write to.
  • to do: investigate further what edge cases there are
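One possible safeguard for the first point: check the slave's reported replication lag before archiving, and skip the slave when it is too far behind. A sketch, assuming classic MySQL replication and a connection privileged to run SHOW SLAVE STATUS; the 60-second threshold is an arbitrary example:

```php
<?php
// Sketch only: refuse to run archiving SELECTs on the slave if it lags.
$reader = new PDO('mysql:host=slave-db.example.com;dbname=piwik', 'piwik', 'secret');
$reader->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$status = $reader->query('SHOW SLAVE STATUS')->fetch(PDO::FETCH_ASSOC) ?: [];
$lag = $status['Seconds_Behind_Master'] ?? null; // NULL when replication is broken

if ($lag === null || (int) $lag > 60) {
    // Slave is broken or too far behind: its log_* data is incomplete.
    throw new RuntimeException('Slave unusable for archiving, lag: ' . var_export($lag, true));
}
// ...otherwise it is safe enough to point the archiving reads at $reader.
```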
@tsteur commented on April 27th 2015 Member

Those kinda things came to my mind as well. It'd be worth it if it actually relieves the database / tracking.

Therefore, more importantly, before working on this we need to profile the database on an instance where there are problems, and collect "evidence" that this is actually a problem, that it would make things "faster", by how much, etc.
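As a starting point for that profiling, one could time a representative heavy LogAggregator-style query against both servers; a naive sketch with assumed hosts and an assumed query (real evidence would come from the slow query log, many runs, and realistic concurrency):

```php
<?php
// Sketch only: compare wall-clock time of one heavy query on master vs. slave.
$sql = 'SELECT COUNT(*) FROM log_visit WHERE visit_last_action_time >= ?';
$params = ['2015-04-01 00:00:00'];

$hosts = [
    'master' => 'master-db.example.com',
    'slave'  => 'slave-db.example.com',
];
foreach ($hosts as $name => $host) {
    $db = new PDO("mysql:host=$host;dbname=piwik", 'piwik', 'secret');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $start = microtime(true);
    $stmt = $db->prepare($sql);
    $stmt->execute($params);
    $stmt->fetchAll();
    printf("%s: %.3fs\n", $name, microtime(true) - $start);
}
```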

@quba commented on June 17th 2015 Contributor

There are 2 situations where this approach will definitely help:

  • querying the Live API (or APIs in general), especially the Live API with segments applied,
  • real-time segments: in the worst-case scenario we would have DB replication lag instead of issues with the DB used to track visitors.
@esapozhnikov commented on July 21st 2015

This feature would be very helpful.

Currently, DB deadlocks cause the front end to fail to load.
For people like myself who use a separate server for archiving, it would be very useful to have a front end with a read-only DB, so that the front end always loads reliably.

This Issue was closed on July 22nd 2019