New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Piwik more efficient: sharding the data in several databases #471
Comments
Attachment: DEV version of sharding Piwik plugin |
SKype provide its engine for sharding database with PostgreSQL. It is exactly deisgned for the purpose here. And it can be better to dispatch the idsite in a non linear way. The above link provide an example based on a hash. |
This plugin will need to be updated to reflect db abstraction changes. In the absence of sharding, consider providing an option to remove the sharding comments to workaround a query cache bug on older MySQL versions. |
We should not remove the comments. Even though the plugin is now not in a working state, it is there as a proof of concept. Sharding in Piwik would be a must have feature for high traffic piwik instances. What is the issue with mysql cache? if it is fixed in stable mysql releases, it is not a blocker for us. |
Thought: investigate using the Spider storage engine for MySQL as a more transparent method for partitioning/sharding. |
Sharding as such is not the way to go... we can open specific tickets for specific implementations (eg. Mysql spider storage) if someone starts work on it. |
So,how's it going now? |
This was never implemented, right? Related tickets: https://forum.matomo.org/t/attach-multiple-dbs-to-piwik-server/9261 https://forum.matomo.org/t/one-config-file-to-support-multiple-database-locations-or-schemas/10374 |
It wasn't implemented. We now support a reader though: https://matomo.org/faq/how-to-install/faq_35746/ Also in case you did need sharding you could look if you can configure something through MySQL directly. |
In Piwik, all data is stored in a monolithic database. That is a problem when you have a huge traffic to monitor with Piwik: the database server struggles, the queries take too long to finish. One solution is to record data in several databases, within the same Piwik instance. Piwik will route automatically the data to the right database using the “idsite”.
For example:
- sites 1-1000 in serverA
- site 1001-1100 in serverB
We need to have the idsite in all SQL queries (as a parameter, or in a comment) and then automatically grep their content and route to the right server. The pairs (idsite, server) are stored in a configuration file.
THIS IS NOT FINISHED AND NEEDS MORE WORK.
Johan Mathe built the first dev version of the plugin, attached in the ticket.
We updated the core to add the idsite in some queries, but there is more to do.
This plugin was developed in August 2008 and Piwik has slightly changed since, added more queries, etc.
This plugin would be incredibly useful to all the big users of Piwik; some people are using Piwik to monitor thousands of websites, millions of visits, etc.
Attached is the current development version of the plugin. This is DEV only (it won’t work with current trunk). It is helpful to give an idea on how it could work.
Please post a comment here if you are interested in this plugin development and would like to participate.
The text was updated successfully, but these errors were encountered: