Piwik more efficient: sharding the data in several databases #471

mattab · 2008-12-09T11:24:57Z

In Piwik, all data is stored in a single monolithic database. This becomes a problem when Piwik has to track a very large amount of traffic: the database server struggles and queries take too long to finish. One solution is to record data in several databases within the same Piwik instance. Piwik would automatically route the data to the right database using the “idsite”.

For example:
- sites 1-1000 in serverA
- sites 1001-1100 in serverB

We need to have the idsite in every SQL query (as a parameter or in a comment), then automatically parse each query for it and route the query to the right server. The (idsite, server) pairs are stored in a configuration file.
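To make the routing idea concrete, here is a minimal sketch in Python (not the plugin's actual code, which is PHP and is not reproduced in this ticket). The `/* idsite=N */` comment convention, the `SHARD_RANGES` table, and all function names are assumptions made up for illustration; only the site ranges and server names come from the example above.

```python
import re
from typing import Optional

# Hypothetical shard map: (first idsite, last idsite, server name).
# In the proposal these pairs would be read from a configuration file.
SHARD_RANGES = [
    (1, 1000, "serverA"),
    (1001, 1100, "serverB"),
]

# Hypothetical comment convention for tagging a query with its idsite.
IDSITE_COMMENT = re.compile(r"/\*\s*idsite\s*=\s*(\d+)\s*\*/")


def extract_idsite(sql: str) -> Optional[int]:
    """Return the idsite tagged in the query comment, or None if absent."""
    match = IDSITE_COMMENT.search(sql)
    return int(match.group(1)) if match else None


def server_for_idsite(idsite: int) -> str:
    """Look up the server whose range contains the given idsite."""
    for first, last, server in SHARD_RANGES:
        if first <= idsite <= last:
            return server
    raise LookupError(f"no shard configured for idsite {idsite}")


def route_query(sql: str) -> str:
    """Pick the server for a query based on its idsite comment."""
    idsite = extract_idsite(sql)
    if idsite is None:
        raise ValueError("query carries no idsite comment, cannot route it")
    return server_for_idsite(idsite)
```

For instance, `route_query("SELECT * FROM log_visit /* idsite=42 */")` would pick `serverA`. A query without the comment raises an error instead of silently going to a default server, which is one possible way to surface the queries that still need the idsite added.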

THIS IS NOT FINISHED AND NEEDS MORE WORK.

Johan Mathe built the first dev version of the plugin, attached in the ticket.
We updated the core to add the idsite in some queries, but there is more to do.
This plugin was developed in August 2008 and Piwik has changed slightly since then (more queries have been added, etc.).

This plugin would be incredibly useful to all the big users of Piwik; some people are using Piwik to monitor thousands of websites, millions of visits, etc.

Attached is the current development version of the plugin. This is DEV only (it won’t work with current trunk). It is helpful to give an idea of how it could work.

Please post a comment here if you are interested in this plugin development and would like to participate.

mattab · 2008-12-09T11:26:13Z

Attachment: DEV version of sharding Piwik plugin
[Sharding.zip](http://issues.piwik.org/attachments/471/Sharding.zip)

anonymous-matomo-user · 2009-03-26T01:39:10Z

Skype provides its engine for sharding databases with PostgreSQL. It is designed exactly for this purpose.
For those interested, have a look: [https://developer.skype.com/SkypeGarage/DbProjects/PlProxy]

It may also be better to distribute the idsite values in a non-linear way. The above link provides an example based on a hash.
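As a rough illustration of hash-based dispatch (a generic sketch in Python, not PL/Proxy's API and not code from the linked page), the server can be chosen from a stable hash of the idsite instead of a fixed range. The `SERVERS` list and the function name are hypothetical.

```python
import zlib

# Hypothetical list of database servers; the order must stay stable.
SERVERS = ["serverA", "serverB"]


def hashed_server_for_idsite(idsite: int) -> str:
    """Map an idsite to a server using a stable CRC32 hash modulo the server count."""
    digest = zlib.crc32(str(idsite).encode("ascii"))
    return SERVERS[digest % len(SERVERS)]
```

CRC32 is used here only because it gives the same result on every run and every machine. One trade-off of plain modulo hashing is that adding or removing a server changes the mapping for most sites, so existing data would have to be moved.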

robocoder · 2009-10-08T19:36:03Z

This plugin will need to be updated to reflect db abstraction changes.

In the absence of sharding, consider providing an option to remove the sharding comments to work around a query cache bug on older MySQL versions.

mattab · 2009-10-12T00:46:30Z

We should not remove the comments. Even though the plugin is currently not in a working state, it is there as a proof of concept. Sharding in Piwik would be a must-have feature for high-traffic Piwik instances.

What is the issue with the MySQL query cache? If it is fixed in stable MySQL releases, it is not a blocker for us.

robocoder · 2010-05-11T05:49:22Z

Thought: investigate using the Spider storage engine for MySQL as a more transparent method for partitioning/sharding.

mattab · 2010-11-24T11:27:08Z

Sharding as such is not the way to go... we can open specific tickets for specific implementations (e.g. the MySQL Spider storage engine) if someone starts work on it.

erickhuang17 · 2020-02-26T09:06:23Z

So, how's it going now?

eramirezprotec · 2020-03-06T14:17:00Z

This was never implemented, right?

Related tickets:

https://forum.matomo.org/t/attach-multiple-dbs-to-piwik-server/9261

https://forum.matomo.org/t/one-config-file-to-support-multiple-database-locations-or-schemas/10374

#4133

tsteur · 2020-03-07T02:41:01Z

It wasn't implemented. We now support a reader though: https://matomo.org/faq/how-to-install/faq_35746/

Also, if you do need sharding, you could check whether you can configure it directly in MySQL.

mattab added this to the Future releases milestone Jul 8, 2014

mattab added T: Bug labels Jul 8, 2014

mattab assigned zawadzinski Jul 8, 2014

mattab unassigned zawadzinski Nov 23, 2014

This issue was closed.
