In Piwik, all data is stored in a monolithic database. That is a problem when you have a huge traffic to monitor with Piwik: the database server struggles, the queries take too long to finish. One solution is to record data in several databases, within the same Piwik instance. Piwik will route automatically the data to the right database using the "idsite".
We need to have the idsite in all SQL queries (as a parameter, or in a comment) and then automatically grep their content and route to the right server. The pairs (idsite, server) are stored in a configuration file.
THIS IS NOT FINISHED AND NEEDS MORE WORK.
Johan Mathe built the first dev version of the plugin, attached in the ticket.
We updated the core to add the idsite in some queries, but there is more to do.
This plugin was developed in August 2008 and Piwik has slightly changed since, added more queries, etc.
This plugin would be incredibly useful to all the big users of Piwik; some people are using Piwik to monitor thousands of websites, millions of visits, etc.
Attached is the current development version of the plugin. This is DEV only (it won't work with current trunk). It is helpful to give an idea on how it could work.
Please post a comment here if you are interested in this plugin development and would like to participate.
SKype provide its engine for sharding database with PostgreSQL. It is exactly deisgned for the purpose here.
Let's have a look for those interested : [https://developer.skype.com/SkypeGarage/DbProjects/PlProxy]
And it can be better to dispatch the idsite in a non linear way. The above link provide an example based on a hash.
This plugin will need to be updated to reflect db abstraction changes.
In the absence of sharding, consider providing an option to remove the sharding comments to workaround a query cache bug on older MySQL versions.
We should not remove the comments. Even though the plugin is now not in a working state, it is there as a proof of concept. Sharding in Piwik would be a must have feature for high traffic piwik instances.
What is the issue with mysql cache? if it is fixed in stable mysql releases, it is not a blocker for us.
Thought: investigate using the Spider storage engine for MySQL as a more transparent method for partitioning/sharding.
Sharding as such is not the way to go... we can open specific tickets for specific implementations (eg. Mysql spider storage) if someone starts work on it.