
Add auxiliary database support for big instances #7554

Closed
mgazdzik opened this issue Mar 27, 2015 · 7 comments · Fixed by #14681
Labels
c: Performance · For when we could improve the performance / speed of Matomo.
Enhancement · For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.
Major · Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.

Comments

@mgazdzik
Contributor

The goal of this ticket is to discuss, and hopefully implement, a solution allowing a single Piwik instance to query two replicated databases.

Outline of the issue:

  • big Piwik instances often have master-slave DB replication,
  • currently the only use (in our case) for slave servers is to perform DB backups and, in emergencies, to be promoted to master to provide HA.

However, slave servers could be put to much better use if we could redirect archiving queries onto the slave DB.
Long-running, heavy queries could be run against the slave, with the results processed in PHP and the archives written to the master server.
This would significantly relieve the tracker and remove some other DB performance problems that occur when a traffic peak coincides with long-running archiving.

For the initial scope, maybe only SELECT queries triggered by LogAggregator could be launched against the auxiliary DB server; in the future we could add an option for developers to decide which DB to use (a rough sketch of such routing follows below).
Even sending just the basic LogAggregator queries to the slave would give a significant performance boost.
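
To make the idea concrete, here is a minimal sketch of what such read/write routing could look like. The `DbPool` class, hosts, credentials, and the `PIWIK_READER_HOST` variable are all invented for illustration and are not Piwik's actual API:

```php
<?php
// Hypothetical sketch: LogAggregator-style SELECTs go to a reader (slave)
// connection, everything else to the writer (master). DbPool is an
// invented name, not Piwik's actual API.
class DbPool
{
    private $writer;
    private $reader;

    public function getWriter()
    {
        if ($this->writer === null) {
            $this->writer = new PDO('mysql:host=master.db;dbname=piwik', 'user', 'pass');
        }
        return $this->writer;
    }

    public function getReader()
    {
        if ($this->reader === null) {
            // Fall back to the master when no slave is configured.
            $host = getenv('PIWIK_READER_HOST') ?: 'master.db';
            $this->reader = new PDO("mysql:host=$host;dbname=piwik", 'user', 'pass');
        }
        return $this->reader;
    }
}

$pool = new DbPool();

// A heavy LogAggregator-style aggregation runs against the reader...
$rows = $pool->getReader()
    ->query('SELECT idsite, COUNT(*) AS nb_visits FROM log_visit GROUP BY idsite')
    ->fetchAll(PDO::FETCH_ASSOC);

// ...while archive writes would keep going through $pool->getWriter().
```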

What do you think?

@mgazdzik mgazdzik added the Enhancement and RFC labels Mar 27, 2015
@jloh

jloh commented Mar 28, 2015

Yeah, it would be great if you could specify a read-only database to use. I would be hesitant to use it for archiving or anything of the sort, though: what if the data is out of sync and you don't know it?

A read-only DB would be great for use in the front end when displaying reports, so as not to take up connections on the master server.

@mattab
Member

mattab commented Mar 29, 2015

This would significantly relieve the tracker and remove some other DB performance problems that occur when a traffic peak coincides with long-running archiving.

To take away problems with tracker performance, let's first try using QueuedTracking: https://github.com/piwik/plugin-QueuedTracking

After using this plugin, performance should get much better, as insertion is wrapped within a transaction. Maybe we will still have tracker I/O contention issues after using QueuedTracking, but let's try this first (it will happen in the next days and weeks).
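
For context on why the transaction helps: wrapping a batch of queued inserts in one transaction pays the commit (and its fsync) cost once per batch instead of once per row. A generic illustration, not QueuedTracking's actual code; the sample data and connection details are placeholders:

```php
<?php
// Generic illustration: batching inserts in a single transaction so the
// commit (and its fsync) happens once per batch rather than once per row.
// This is not QueuedTracking's actual code; the data below is made up.
$pdo = new PDO('mysql:host=master.db;dbname=piwik', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$queuedRequests = [
    ['name' => '/home',    'type' => 1],
    ['name' => '/pricing', 'type' => 1],
];

$stmt = $pdo->prepare('INSERT INTO log_action (name, hash, type) VALUES (?, CRC32(?), ?)');

$pdo->beginTransaction();
try {
    foreach ($queuedRequests as $req) {
        $stmt->execute([$req['name'], $req['name'], $req['type']]);
    }
    $pdo->commit();   // one commit for the whole batch
} catch (Exception $e) {
    $pdo->rollBack(); // keep the queue consistent on failure
    throw $e;
}
```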

@mattab mattab removed the RFC label Mar 29, 2015
@mattab mattab changed the title from "Add auxuliary database support for big instances" to "Add auxiliary database support for big instances" Mar 29, 2015
@mattab mattab added this to the Mid term milestone Mar 29, 2015
@tsteur
Member

tsteur commented Apr 23, 2015

As @MooASh mentioned, it might not work easily for archiving / might be risky, since there can be a delay on the slave etc.

Reports in the UI / API shouldn't bother the database too much, as they are more or less simple SELECTs and there shouldn't be too many entries. Starting from Piwik 2.13 the archive blob tables should have far fewer entries and be faster. The archive tables can maybe become a problem when there are thousands of websites and thousands of segments.

It might be worthwhile for the Live API if one queries it a lot, but then again, if there's a delay it might show outdated data, and that's not always OK.

The problem is rather the log tables. Here, one can purge log entries, e.g. those older than 6 months, when those tables become too big. Partitioning might also help; I haven't tried it yet. Other DB engines like TokuDB might help as well.
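
As one concrete way to do such a purge, old rows can be deleted in bounded chunks so that no single DELETE locks the table for long. A sketch assuming Piwik's log_visit schema; the chunk size, pause, and connection details are placeholders, and the related log_* tables (log_link_visit_action, log_conversion, ...) would need the same treatment:

```php
<?php
// Sketch: purge log_visit rows older than 6 months in small chunks so a
// single huge DELETE doesn't hold locks for minutes. Shown for one table
// only; the other log_* tables would need the same treatment.
$pdo = new PDO('mysql:host=master.db;dbname=piwik', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare(
    'DELETE FROM log_visit
     WHERE visit_last_action_time < DATE_SUB(NOW(), INTERVAL 6 MONTH)
     LIMIT 10000'
);

do {
    $stmt->execute();
    $deleted = $stmt->rowCount();
    usleep(100000); // brief pause between chunks to let other queries through
} while ($deleted > 0);
```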

As you said, using the slave for archiving would be nice but could have side effects; it would require some investigation.

@mattab If tracker performance is always too slow because of long-running archiving, it will make inserting from the queue to the DB rather slow as well. The queue does not really have a huge performance improvement over the tracker, if any at all. It was mainly built to handle peaks easily and to avoid having too many connections to the database etc., not for performance reasons. The insertion from queue to DB can be faster than normal tracking, but that's rather because of having things cached in PHP across requests and not necessarily related to the DB. Depending on the number of log entries it can be faster because of transactions, but it doesn't have to be.

@mattab
Member

mattab commented Apr 27, 2015

As you said, using the slave for archiving would be nice but could have side effects; it would require some investigation.

What comes to mind after quickly thinking about this:

  • when we run archiving queries on the slave, how do we make sure the slave is not "too far behind", so that we don't run queries on invalid / incomplete log_* data? (see the sketch after this list)
  • to run archiving on the slave, if we query the log_* tables on the slave and insert the archive blobs into the master DB, then I guess the only way to do this is to connect to the two DBs at once in the same PHP process and decide where to read from / write to.
  • to do: investigate further what edge cases there are
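
A rough sketch of how the first two points might be handled together: open separate master and slave connections in the same PHP process, and only read from the slave when `Seconds_Behind_Master` from MySQL's `SHOW SLAVE STATUS` is below a threshold. Hosts, credentials, and the 30-second threshold are placeholders, not a vetted policy:

```php
<?php
// Rough sketch: two connections in one PHP process, with a replication-lag
// check deciding whether the slave's log_* data is fresh enough to archive
// from. Hosts, credentials, and the threshold are made-up placeholders.
$master = new PDO('mysql:host=master.db;dbname=piwik', 'user', 'pass');
$slave  = new PDO('mysql:host=slave.db;dbname=piwik', 'user', 'pass');

// SHOW SLAVE STATUS reports Seconds_Behind_Master; NULL (or no row at all)
// means replication is broken, which we treat as "too far behind" too.
$status = $slave->query('SHOW SLAVE STATUS')->fetch(PDO::FETCH_ASSOC);
$lag = is_array($status) ? $status['Seconds_Behind_Master'] : null;

$maxLagSeconds = 30;
$readDb = ($lag !== null && $lag <= $maxLagSeconds) ? $slave : $master;

// Heavy log aggregation runs on whichever side is safe to read from...
$rows = $readDb->query(
    'SELECT idsite, COUNT(*) AS nb_visits FROM log_visit GROUP BY idsite'
)->fetchAll(PDO::FETCH_ASSOC);

// ...and the computed archives are always written back to the master.
// (Table name follows Piwik's monthly archive naming convention.)
$insert = $master->prepare(
    'INSERT INTO archive_numeric_2015_03
        (idarchive, name, idsite, date1, date2, period, ts_archived, value)
     VALUES (?, ?, ?, ?, ?, ?, NOW(), ?)'
);
```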

@tsteur
Member

tsteur commented Apr 27, 2015

Those kinds of things came to my mind as well. It'd be worth it if it actually relieves the database / tracking.

More importantly, therefore: before working on this we need to profile the database on an instance that has these problems, and collect "evidence" that this is actually a problem, that the change would make things "faster", by how much, etc.

@mattab mattab added the c: Performance label Jun 12, 2015
@quba
Contributor

quba commented Jun 17, 2015

There are two situations where this approach will definitely help:

  • querying the Live API (or APIs in general), especially the Live API with segments applied,
  • real-time segments: in the worst-case scenario we would have DB replication lag instead of issues with the DB used to track visitors.

@esapozhnikov-wish

This feature would be very helpful.

Currently, DB deadlocks cause the front end to fail to load.
For people like myself who use a separate server for archiving, it would be very useful to have a front end with a read-only DB, so that the front end always loads reliably.

@mattab mattab modified the milestones: Long term, Mid term Dec 5, 2016
@mattab mattab added the Major label Jul 23, 2019