Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dimensions cache site aware since CustomDimensions adds different dimensions based on the current site. #12618

Merged
merged 10 commits into from Mar 20, 2018

Conversation

diosmosis
Copy link
Member

@diosmosis diosmosis commented Mar 15, 2018

Fixes an issue where if multiple sites w/ custom dimensions are archived in the same process, archiving of latter sites in the list will not have access to the correct dimensions (since the previous site's dimensions will be cached).

Quick fix that makes the dimensions cache site aware, and sets the idSite query param for the to-be-archived site in ArchiveProcessor/Loader.

Note: all segments cache is already site aware: https://github.com/matomo-org/matomo/blob/3.x-dev/plugins/API/API.php#L178-L181

refs #12560

@mattab mattab added this to the 3.3.1 milestone Mar 15, 2018
@@ -747,7 +747,7 @@ protected function generateIdFromClass($className)
*/
public static function getAllDimensions()
{
$cacheId = CacheId::pluginAware('AllDimensions');
$cacheId = CacheId::siteAware(CacheId::pluginAware('AllDimensions'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is in another PR, but can be also make widgetsProvider siteAware and maybe reportsProvider? There is also DimensionsProvider class which needs to be siteAware (getMapOfNameToDimension). This is a new class that will at some point replace the old Dimension::getAll() as well.

core/CacheId.php Outdated
public static function siteAware($cacheId, array $idSites = null)
{
if ($idSites === null) {
$idSitesStr = Common::getRequestVar('idSite', false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we cast idSite to int? Just to be sure it doesn't in the end include like ../../../etc/passwd or something funny?

I was thinking to otherwise use Site::getIdSitesFromIdSitesString(Common::getRequestVar('idSite', false)) but this could potentially result in lots of different caches depending on what idSite=all translates to.

I was also wondering if there could be any security issue when idSite=all and the cache was generated by a super user, but then later accessed by an admin who was only access to some sites... but I believe this should not be a problem and the same case as it is currently anyway.
Maybe we can check idSiteStr is numeric, and if not, use 0 or so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it is 'all'? Will 0 work in that case? I at first wanted to call the SitesManager API, but thought that would be too much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 will work. Calling sitesManager (eg done by Site::getIdSitesFromIdSitesString) would result in heaps of different caches which we maybe want to avoid. (As each user may have access to a different set of sites)

@@ -63,20 +64,30 @@ protected function mustProcessVisitCount($visits)

public function prepareArchive($pluginName)
{
$this->params->setRequestedPlugin($pluginName);
$originalIdSite = Common::getRequestVar('idSite', false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW we may need to be careful here since this will read from $_GET+$_POST. I'm thinking of possible edge cases where an attacker might send a different value for $_GET and $_POST and I wonder if this could result in random outcomes or accessing data later after the restoring of originalIdSite for example. I haven't thought it through but there could be certainly cases I suppose depending on the application code that this might be a problem. Also the reset would maybe not work or not be applied since it wouldn't overwrite a posted idsite.

@diosmosis diosmosis force-pushed the site-aware-dimensions-cache branch 2 times, most recently from 2b1b9b0 to f7c902c Compare March 15, 2018 03:12
@diosmosis
Copy link
Member Author

Updated with tests, code to handle idSites query param, md5 for long cache keys, sort/array_unique (no filter since it will remove 0 values).

Used siteAware in a couple other places. ReportsProvider::getAllReports() doesn't seem to depend on site, ReportsProvider::getMapOfModuleActionsToReport() already has site id in cache key & WidgetsProvider doesn't seem to use any caching.

return $idArchive;
// temporarily set the idSite query parameter so archiving will end up using
// the correct site aware caches
$originalIdSite = isset($_GET['idSite']) ? $_GET['idSite'] : null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better! I wonder what happens when someone posts idSite though... I think plugins that use Common:getRequestVar('idSite') eg in addDimensions() would still use the POST value. This might be an issue when archiving through webarchiving/browser archiving eg in an API request. Should be maybe also have some logic for this and check also POST?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think $_GET + $_POST will use $_GET over $_POST, and here we don't care if there's a $_POST, since we are really just passing the idSite to downstream code during archiving.

But it would definitely be safer to check for $_POST too, I'll add it.

@diosmosis
Copy link
Member Author

@tsteur updated again

$originalPostIdSite = isset($_POST['idSite']) ? $_POST['idSite'] : null;

try {
$_GET['idSite'] = $_POST['idSite'] = $this->params->getSite()->getId();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis I tested it and it worked 👍 Its good to merge from my point of view. I'm now a little bit concerned by this here but I don't think it is actually a problem. I was thinking of random cases like what if $_GET[idSites]=3 is set, we then go in here and archive the child sites 1 and 2 where we then set for example $_GET[idSites]=1. I was wondering if this could cause a problem when eg in the archiver another API is called eg via API\Request::processRequest() and suddenly we might end up with some random result and hard to explain bugs. For example https://github.com/matomo-org/matomo/blob/3.2.0/plugins/API/API.php#L269-L278 here we accept idSites and idSite (because we want to deprecate idSites but not break BC). Or some random behaviour for example with the caches in siteAware (but I checked the code and it is fine).

I don't think it is an actual problem though and more a problem in theory. To be safe I now wonder if we should also temporarily set GET/POST[idSites]=$site->getId() but not sure if this could cause a bug, don't think so. Maybe it would be safe to unset temporarily any possible GET/POST[idSites] value?
I leave it up to you, I'm not quite sure. I'm thinking I would maybe unset(idsites) temporarily to make sure we work in the context of that site.

core/CacheId.php Outdated

private static function getIdSiteList($queryParamName)
{
$idSiteParam = Common::getRequestVar($queryParamName, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benaka I just had a look how I handled this cacheId implementation just recently and found https://github.com/matomo-org/matomo/blob/3.2.0/core/Columns/MetricsList.php#L97-L108 . However, it does not take into consideration idSites but the different $_GET and $_POST values. I wonder, to be safe, if we should mix the logic of that implementation for the cache and this implementation.

Specifically, MetricsList is missing the idSites parameter, and here we could add to also check for $_GET and $_POST and maybe also idsite in lower case. The lower case idsite is important to recognize as well when someone is using CacheId::siteAware() during tracking.

Looking at $_GET and $_POST and adding both values to the cache key could prevent some random security issues if someone passes different values in $_GET and $_POST and for some reason some plugin only looks at $_GET (mainly 3rd party plugins I would say as we always use Common::getRequestVar and in most cases do not set a custom $requestArrayToUse).

@diosmosis
Copy link
Member Author

@tsteur updated the PR w/ more changes

core/CacheId.php Outdated
$originalPostIdSite = isset($_POST['idSite']) ? $_POST['idSite'] : null;

$originalGetIdSites = isset($_GET['idSites']) ? $_GET['idSites'] : null;
$originalPostIdSites = isset($_POST['idSites']) ? $_POST['idSites'] : null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check if we are in TrackerMode and then also update idsite (lowercase)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would end up adding the ID to the cache twice, right? Eg, for idSite=1, idsite=1, the key would be $key-1-1, will that be ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true... now I wonder if we should do something like if trackermode then idsite else idSite... so complicated... I think it would be fine to end up like this as usually in tracking there would be only idsite set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i'll set idsite in 3...2...1...

…et values if they were not set to begin w/ + add test for CacheId::overwriteIdSiteForCache().
@diosmosis
Copy link
Member Author

@tsteur updated the PR, but looks like tests are failing now, not sure why

core/CacheId.php Outdated
* @param callable $callback
* @return mixed returns result of $callback
*/
public static function overwriteIdSiteForCache($idSite, $callback)
Copy link
Member

@tsteur tsteur Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis sorry to ping here again. It's funny I'm just working on a case where I actually next exactly the same feature. I have a task that iterates over an array of entries that belong to different sites and would need this as well. As this feature is not in any way cache related and can be also used in other places, maybe we move this awesome new feature somewhere else? eg to the Site class or so? Any thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move it to a Context class, like, Context::changeIdSite()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works as well 👍

@diosmosis
Copy link
Member Author

@tsteur Updated the PR and fixed another test failure (apparently some code sets idSites to an actual array and not just an array of strings).

@tsteur
Copy link
Member

tsteur commented Mar 16, 2018

Looks good to me if tests succeed 👍

@mattab
Copy link
Member

mattab commented Mar 20, 2018

Looks like a very nicely done implementation! 👍

Looking forward to having this live and confirming our two long lasting important issues finally solved.

@mattab mattab merged commit 715f2d6 into 3.x-dev Mar 20, 2018
@mattab mattab deleted the site-aware-dimensions-cache branch March 20, 2018 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants