The binlog files used to replicate data grow so large that the slave DB cannot keep up with them. Currently, within 2 minutes we see 400 huge queries against
`piwik_option` updating this value. The situation will get even worse when running several archiving threads, which is a must-have when you have thousands of sites.
Maybe we could use Redis to store this option and maybe even all Piwik settings?
It's a matter of `\Piwik\CronArchive\SharedSiteIds::getNextSiteId()`.
If it's caused by
`\Piwik\CronArchive\SharedSiteIds::getNextSiteId()`, then we probably have to use Redis for it, as we pretty much always have to perform this update.
If there are like 10,000 sites, shouldn't there be at most ~10,000 queries updating the option table entry? To put this in relation, we might execute far more inserts for archive record entries (eg ~100 per site, which would be 1 million). Is it actually inserting or updating? Probably doesn't make a difference :)
Standard insert or:

```sql
(option_name, option_value, autoload) VALUES ('SharedSiteIdsToArchive', '', '') ON DUPLICATE KEY UPDATE option_value = '' ...
```
A workaround could be to use `force-idsites` one by one, but that makes it really complicated if you want to start several threads.
Summary of what we discussed:
We already have various caches like
`Piwik\Cache\Transient` (see https://github.com/piwik/piwik/blob/2.16.1-b1/core/Cache.php#L17-L57) that can be configured to use different kinds of backends in
`global.ini.php` (see https://github.com/piwik/piwik/blob/2.16.1-b1/config/global.ini.php#L74-L97).
- `Database`, which we will configure as the default backend for this distributed cache. The database backend will simply store its content in the
  `piwik_option` table. We possibly won't add this database backend to https://github.com/piwik/component-cache as it's very Piwik-specific, and the work needed to make it generic with an abstraction is probably not needed for now.
- `Sentinel`, by using https://github.com/colinmollenhour/credis, as we cannot use the existing
  `Redis` backend adapter since it does not support Sentinel. We'll basically need to add a new backend to https://github.com/piwik/component-cache and configure it like here https://github.com/piwik/component-cache/blob/master/src/Backend/Factory.php#L58 to make it use
  `Redis`. A connection via
  `credis` can be made like this: https://github.com/piwik/plugin-QueuedTracking/commit/2c89543914cfe9421483abd3acd46b9550228f30#diff-6e558271034627f36a0be1d5152fffaeR22 .
  The `setRedis()` method may need to accept a different interface.
- The `Distributed` cache needs to be used here https://github.com/piwik/piwik/blob/2.16.0/core/CronArchive/SharedSiteIds.php#L96-L115 instead of the hardcoded call to the Piwik option table.
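To make that change concrete, here is a minimal sketch of what `getNextSiteId()` could look like when it talks to a generic cache backend instead of the option table. The `DistributedBackend` interface and its method names are assumptions for illustration (not the actual component-cache API), and an in-memory backend stands in for Redis:

```php
<?php
// Hypothetical backend interface; the real component-cache API may differ.
interface DistributedBackend
{
    public function doFetch($id);
    public function doSave($id, $data);
}

// In-memory stand-in for the Redis/Database backend, for illustration only.
class ArrayBackend implements DistributedBackend
{
    private $data = array();

    public function doFetch($id)
    {
        return isset($this->data[$id]) ? $this->data[$id] : false;
    }

    public function doSave($id, $data)
    {
        $this->data[$id] = $data;
        return true;
    }
}

function getNextSiteId(DistributedBackend $backend, $cacheId = 'SharedSiteIdsToArchive')
{
    $siteIds = $backend->doFetch($cacheId);

    if (empty($siteIds)) {
        return null; // all sites have been processed
    }

    // Take the first remaining site and write the rest back. This replaces
    // the UPDATE on the piwik_option table. Note it is not atomic, matching
    // the current state where no proper lock is in place yet.
    $nextSiteId = array_shift($siteIds);
    $backend->doSave($cacheId, $siteIds);

    return $nextSiteId;
}
```

With a Redis backend plugged in, every archiver process would pop site IDs from the shared list without touching the option table at all.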
- `redis` will become possible to use for the distributed cache. The config name could be eg
  `[DistributedCache]` with `backend=redis`, or maybe something else.
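If that section name sticks, the `global.ini.php` fragment could look roughly like this (the section and key names are only a suggestion at this point, not an agreed format):

```ini
; hypothetical section, mirroring the existing cache config style
[DistributedCache]
backend = redis
```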
We are aware that several servers could request the same idSite at the same time because we do not have a proper lock in place, but this is OK for now. Ideally we need to fix the problem that multiple archivers could work on the same idSite at the same time and slow each other down.
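The missing per-site lock could eventually look something like this minimal sketch, assuming a Redis-style "set if not exists" (`SETNX`). The class and method names are made up for illustration, and an in-memory client stands in for Redis:

```php
<?php
// Illustrative stand-in for a Redis client; not the credis API.
class FakeRedisClient
{
    private $keys = array();

    // Mimics Redis SET key value NX: only succeeds if the key is absent.
    public function setIfNotExists($key, $value)
    {
        if (isset($this->keys[$key])) {
            return false;
        }
        $this->keys[$key] = $value;
        return true;
    }

    public function delete($key)
    {
        unset($this->keys[$key]);
    }
}

function acquireArchiveLock(FakeRedisClient $redis, $idSite)
{
    // One lock per idSite so two archivers never process the same site
    // concurrently. Against real Redis this should also carry an expiry
    // (the EX option) so a crashed worker cannot hold the lock forever.
    return $redis->setIfNotExists('archiveLock.' . $idSite, getmypid());
}
```

An archiver that fails to acquire the lock would simply move on to the next idSite, which would also address the duplicate-archive problem mentioned above.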