New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make dimensions cache site aware since CustomDimensions adds different dimensions based on the current site. #12618
Conversation
@@ -747,7 +747,7 @@ protected function generateIdFromClass($className) | |||
*/ | |||
public static function getAllDimensions() | |||
{ | |||
$cacheId = CacheId::pluginAware('AllDimensions'); | |||
$cacheId = CacheId::siteAware(CacheId::pluginAware('AllDimensions')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it is in another PR, but can be also make widgetsProvider siteAware and maybe reportsProvider? There is also DimensionsProvider
class which needs to be siteAware (getMapOfNameToDimension
). This is a new class that will at some point replace the old Dimension::getAll()
as well.
core/CacheId.php
Outdated
public static function siteAware($cacheId, array $idSites = null) | ||
{ | ||
if ($idSites === null) { | ||
$idSitesStr = Common::getRequestVar('idSite', false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we cast idSite
to int? Just to be sure it doesn't in the end include like ../../../etc/passwd
or something funny?
I was thinking to otherwise use Site::getIdSitesFromIdSitesString(Common::getRequestVar('idSite', false))
but this could potentially result in lots of different caches depending on what idSite=all
translates to.
I was also wondering if there could be any security issue when idSite=all
and the cache was generated by a super user, but then later accessed by an admin who was only access to some sites... but I believe this should not be a problem and the same case as it is currently anyway.
Maybe we can check idSiteStr
is numeric, and if not, use 0
or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it is 'all'? Will 0 work in that case? I at first wanted to call the SitesManager API, but thought that would be too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0
will work. Calling sitesManager (eg done by Site::getIdSitesFromIdSitesString
) would result in heaps of different caches which we maybe want to avoid. (As each user may have access to a different set of sites)
core/ArchiveProcessor/Loader.php
Outdated
@@ -63,20 +64,30 @@ protected function mustProcessVisitCount($visits) | |||
|
|||
public function prepareArchive($pluginName) | |||
{ | |||
$this->params->setRequestedPlugin($pluginName); | |||
$originalIdSite = Common::getRequestVar('idSite', false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we may need to be careful here since this will read from $_GET+$_POST
. I'm thinking of possible edge cases where an attacker might send a different value for $_GET
and $_POST
and I wonder if this could result in random outcomes or accessing data later after the restoring of originalIdSite
for example. I haven't thought it through but there could be certainly cases I suppose depending on the application code that this might be a problem. Also the reset would maybe not work or not be applied since it wouldn't overwrite a posted idsite.
2b1b9b0
to
f7c902c
Compare
Updated with tests, code to handle idSites query param, md5 for long cache keys, sort/array_unique (no filter since it will remove 0 values). Used siteAware in a couple other places. ReportsProvider::getAllReports() doesn't seem to depend on site, ReportsProvider::getMapOfModuleActionsToReport() already has site id in cache key & WidgetsProvider doesn't seem to use any caching. |
core/ArchiveProcessor/Loader.php
Outdated
return $idArchive; | ||
// temporarily set the idSite query parameter so archiving will end up using | ||
// the correct site aware caches | ||
$originalIdSite = isset($_GET['idSite']) ? $_GET['idSite'] : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks better! I wonder what happens when someone posts idSite
though... I think plugins that use Common:getRequestVar('idSite')
eg in addDimensions()
would still use the POST value. This might be an issue when archiving through webarchiving/browser archiving eg in an API request. Should be maybe also have some logic for this and check also POST?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think $_GET + $_POST
will use $_GET
over $_POST
, and here we don't care if there's a $_POST
, since we are really just passing the idSite to downstream code during archiving.
But it would definitely be safer to check for $_POST too, I'll add it.
…t dimensions based on the current site.
f7c902c
to
e367a7f
Compare
@tsteur updated again |
core/ArchiveProcessor/Loader.php
Outdated
$originalPostIdSite = isset($_POST['idSite']) ? $_POST['idSite'] : null; | ||
|
||
try { | ||
$_GET['idSite'] = $_POST['idSite'] = $this->params->getSite()->getId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis I tested it and it worked 👍 Its good to merge from my point of view. I'm now a little bit concerned by this here but I don't think it is actually a problem. I was thinking of random cases like what if $_GET[idSites]=3
is set, we then go in here and archive the child sites 1
and 2
where we then set for example $_GET[idSites]=1
. I was wondering if this could cause a problem when eg in the archiver another API is called eg via API\Request::processRequest()
and suddenly we might end up with some random result and hard to explain bugs. For example https://github.com/matomo-org/matomo/blob/3.2.0/plugins/API/API.php#L269-L278 here we accept idSites
and idSite
(because we want to deprecate idSites
but not break BC). Or some random behaviour for example with the caches in siteAware
(but I checked the code and it is fine).
I don't think it is an actual problem though and more a problem in theory. To be safe I now wonder if we should also temporarily set GET/POST[idSites]=$site->getId()
but not sure if this could cause a bug, don't think so. Maybe it would be safe to unset temporarily any possible GET/POST[idSites]
value?
I leave it up to you, I'm not quite sure. I'm thinking I would maybe unset(idsites)
temporarily to make sure we work in the context of that site.
core/CacheId.php
Outdated
|
||
private static function getIdSiteList($queryParamName) | ||
{ | ||
$idSiteParam = Common::getRequestVar($queryParamName, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benaka I just had a look how I handled this cacheId implementation just recently and found https://github.com/matomo-org/matomo/blob/3.2.0/core/Columns/MetricsList.php#L97-L108 . However, it does not take into consideration idSites
but the different $_GET
and $_POST
values. I wonder, to be safe, if we should mix the logic of that implementation for the cache and this implementation.
Specifically, MetricsList
is missing the idSites
parameter, and here we could add to also check for $_GET and $_POST
and maybe also idsite
in lower case. The lower case idsite
is important to recognize as well when someone is using CacheId::siteAware()
during tracking.
Looking at $_GET and $_POST and adding both values to the cache key could prevent some random security issues if someone passes different values in $_GET and $_POST and for some reason some plugin only looks at $_GET (mainly 3rd party plugins I would say as we always use Common::getRequestVar
and in most cases do not set a custom $requestArrayToUse
).
@tsteur updated the PR w/ more changes |
core/CacheId.php
Outdated
$originalPostIdSite = isset($_POST['idSite']) ? $_POST['idSite'] : null; | ||
|
||
$originalGetIdSites = isset($_GET['idSites']) ? $_GET['idSites'] : null; | ||
$originalPostIdSites = isset($_POST['idSites']) ? $_POST['idSites'] : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check if we are in TrackerMode
and then also update idsite
(lowercase)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would end up adding the ID to the cache twice, right? Eg, for idSite=1, idsite=1, the key would be $key-1-1
, will that be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true... now I wonder if we should do something like if trackermode then idsite else idSite
... so complicated... I think it would be fine to end up like this as usually in tracking there would be only idsite
set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i'll set idsite in 3...2...1...
…et values if they were not set to begin w/ + add test for CacheId::overwriteIdSiteForCache().
@tsteur updated the PR, but looks like tests are failing now, not sure why |
core/CacheId.php
Outdated
* @param callable $callback | ||
* @return mixed returns result of $callback | ||
*/ | ||
public static function overwriteIdSiteForCache($idSite, $callback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diosmosis sorry to ping here again. It's funny I'm just working on a case where I actually next exactly the same feature. I have a task that iterates over an array of entries that belong to different sites and would need this as well. As this feature is not in any way cache related and can be also used in other places, maybe we move this awesome new feature somewhere else? eg to the Site
class or so? Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could move it to a Context
class, like, Context::changeIdSite()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works as well 👍
@tsteur Updated the PR and fixed another test failure (apparently some code sets |
413dfbb
to
83b00db
Compare
Looks good to me if tests succeed 👍 |
Looks like a very nicely done implementation! 👍 Looking forward to having this live and confirming our two long lasting important issues finally solved. |
Fixes an issue where if multiple sites w/ custom dimensions are archived in the same process, archiving of latter sites in the list will not have access to the correct dimensions (since the previous site's dimensions will be cached).
Quick fix that makes the dimensions cache site aware, and sets the idSite query param for the to-be-archived site in ArchiveProcessor/Loader.
Note: all segments cache is already site aware: https://github.com/matomo-org/matomo/blob/3.x-dev/plugins/API/API.php#L178-L181
refs #12560