Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to define plugin/site specific custom tracker config and use in tracker clients #15374

Closed
wants to merge 21 commits into from

Conversation

diosmosis
Copy link
Member

Fixes #15191

Also includes an new UI test that tests the JS tracker.

@diosmosis diosmosis added the Needs Review PRs that need a code review label Jan 10, 2020
@diosmosis diosmosis added this to the 4.0.0 milestone Jan 10, 2020
@tsteur tsteur changed the base branch from 3.x-dev to 4.x-dev January 14, 2020 04:00
return;
}

saveTrackerExtraConfig(trackerIdSite, config);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis so far the PR looks straight forward. Haven't tested it yet though.

Re saving it in session...

How Heatmaps & Session Recording worked so far is that:

  • It does a request on each page like configs.php?url=https://example.com
  • Then the logic checks for magic heatmaps or sessions, and outputs the matching ones
  • It also does some other server side logic based on configured screenshot_url etc.

Now that we cache this info (which is great), I reckon we simply return all recorded heatmap/session recording information and then do this logic client side? It means we basically do return the info of all configured heatmaps/sessions to the user but they can kind of see it anyway I suppose (if they were to view all page urls) so should be all good.

I'm planning to spend some time next week to rewrite that logic in heatmap/session recording to verify the PR works well with plugins.

Will probably do the same for form analytics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 sounds like that makes sense, less server side logic, and just one cached query server side in the tracker, if I understand what you're saying.

@@ -109,6 +110,16 @@ public function main(Handler $handler, RequestSet $requestSet)
$this->init();
$handler->init($this, $requestSet);

// if the tracker config is requested, don't bother trying to track anything
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw one thing that might be interesting looking at is similar behaviour to Configs class in getCachedWebsiteAttributes method in HeatmapSessionRecording where we basically try to read the website specific tracker cache and if it exists, avoid loading plugins. We might be able to later add some small performance tweak in matomo.php to behave similar as in configs.php that we directly get $_GET['idsite']; when the configs=1 parameter is set, and then before even doing anything tracker specific only check if the website cache exists. This could also have the benefit that it would maybe work quite well with QueuedTracking etc. To be seen though and something we can add later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this would be a problem for a plugin that didn't want to use the tracker cache, for example, getting the next page to scan?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting the next page to scan

Not sure what you mean here? It's eventually cached in the browser anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client side cache isn't important for this use case, since the scan would happen once at the start of the visit and then, some other visit would scan another page. But server side, we would not cache the "next page to scan", since it would have to be fetched each time. If we check for whether the website specific tracker cache exists and avoid loading the tracker plugins, then we won't be able to execute the event and the plugin won't be able to return the right data, correct? Ie, some plugins may not want to use the cache. I'm not sure I'm understanding the proposed optimization though, it is to not have to load tracker plugins and execute the event server side, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite getting it yet re next page scan but the idea is basically, since this request would be executed for each visit at least once (and many visits have only one page view), that we try to make this as efficient as possible - if possible. Basically, it would mean moving more decision making into the client when possible to save a lot of resources etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I understood the idea to be, but it poses a problem for plugins that don't use the cache, since even if the cache exists, there will be no value to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it poses a problem for plugins that don't use the cache, since even if the cache exists, there will be no value to read.

Sorry not getting this part. Not sure which cache or plugins you mean etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's say a plugin uses this event to:

function getTrackerConfigs(&$configs)
{
    $configs['MyPlugin']['nextPageToScan'] = $this->getNextPageToScan(); // ... performs a query on the DB ...
}

then the Tracker checks for a cache entry:

$cache = Cache::getCacheWebsiteAttributes($idSite);
if (!empty($cache['trackerconfigs'])) {
    return $cache['trackerconfigs']; // in this case, `$configs['MyPlugin']['nextPageToScan']` is either absent because we don't cache it, or incorrect and old (if we cache the result of the event)
}

// ... load tracker plugins ...
Piwik::postEvent('Tracker.getTrackerConfigs', $configs);
return $configs;

Since the value used by the plugin is dynamic it can't be cached. For this optimization, we'd have to do something like:

$pluginsToLoad = [];
foreach ($trackerPlugins as $plugin) {
    if (empty($cache['trackerconfigs'][$plugin]) && isPluginUsingTrackerCacheForConfigs($plugin)) {
        $pluginsToLoad[] = $plugin;
    }
}

// load $pluginsToLoad
Piwik::postEvent('Tracker.getTrackerConfigs', $configs);
$cache['trackerconfigs'] = array_merge($cache['trackerconfigs'], getConfigsForPluginsUsingCache($configs));

I hope that makes sense. This isn't for an existing plugin, btw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it now 👍 And we clearly wouldn't want to return all nextPagesToScan as it could be a long list etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsteur I added the optimization for this, I haven't tested it since I'm not completely setup for it, but it should be possible to avoid loading some plugins when it's called if data is cached.

core/Tracker.php Outdated
* ]
* ```
*/
Piwik::postEvent('Tracker.getTrackerConfigs', [&$configs]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not too important, but might be good to post some additional params here like we have in some places. eg idSite? eg Piwik::postEvent('Tracker.getTrackerConfigs', [&$configs, array('idSite' => $idSite)]); or Piwik::postEvent('Tracker.getTrackerConfigs', [&$configs, $idSite]);? This way don't have to get the idSite in every event and might be easier to test it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

js/piwik.js Outdated
function loadExtraConfig(tracker, callback) {
extraTrackerConfigCallbacks.push(callback);

if (isAlreadyFetchingExtraConfig) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if important but do we maybe need to wait to make sure that tracker.getPiwikUrl and tracker.getIdSite() both return a value and are configured? I see there was some code around this in Heatmaps...

Eg there could be an edge case when user configures the tracker manually, that maybe no Matomo URL or idSite is configured just yet when executing this logic (Heatmaps executes this logic on various occasions like onDomReady and onLoad). Eg we recommend to first only set a tracker URL and then at a later point the idSite see eg matomo-org/tag-manager@f0e368d#diff-45027e0449eb2627a9d7ae2d953aa2ef . If there was some time between these two events, it be technically possible that a plugin asks to load the config in that moment. Practically, it is rather unlikely though that this logic would be executed after a tracker URL was set and before setting an ID since JS wouldn't execute an event in between those lines... It could happen though depending how a user sets the idSite.

Also the flow could be like this:

  • new Tracker('trackerUrl')
  • A plugin is listening to the TrackerSetup event
  • During that event a plugin requests the config to have it available as soon as possible
  • The idSite might not be set yet

I reckon in this case it be a false implementation of the tracker plugin as it shouldn't do that.

Probably my comment is confusing and I don't think we need it just yet maybe unless it's easy to implement in which case it would help with edge cases maybe.

I suppose it could be for the idSite eg like this

this.setIdSite = function (idSite) {
    configIdSite = idSite;
    if (extraTrackerConfigCallbacks.length > 0 && ! isAlreadyFetchingExtraConfig) {
        loadExtraConfig(this);
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed this case, can you take a look?

@tsteur
Copy link
Member

tsteur commented Feb 3, 2020

FYI @diosmosis I've added some new URL API to the tracker which I need in Heatmaps and Forms... basically matching page rules used to be done server side on every request, but now I need to basically return these configurations as part of the configs and match them client side. To not needing to re-implement these things all the time I added it to Piwik. Thinking might eventually also add other things I'm using repeatedly between plugins. Eg type.isArray, type.isString etc. Might be good to expose more APIs

@tsteur
Copy link
Member

tsteur commented Feb 4, 2020

FYI:
image

  • Haven't looked yet why Matomo would print a stack trace when there is a notice... wondering if an error handler needs to be registered to basically mute these errors? Otherwise the JSONP won't work...

  • I noticed the session storage key is eg mtm.extraTrackerConfig.1. I reckon this could be problematic since we cannot guarantee that each tracker always get the same ID? Even ourselves on some of our pages we might sometimes create a different tracker first and then the logic wouldn't work anymore. I suppose we would maybe need to include the idSite and matomo tracker url as a hash instead of the unique tracker ID?

  • Also I wonder if we could maybe send the config request along with a trackPageView tracking request for example (like &send_config=1)? It wouldn't be needed but I reckon it could save us some extra request on occasion. That would of course only work when it is not using bulk tracking and not using queued tracking I suppose. In many or most cases it would maybe improve things though. As often visits have only one page view, it could save us quite a bit. The problem with this config request is, that in worst case it double our tracking requests eg on Cloud (but also for a regular user). It won't double it, but maybe increase tracking requests by say 50% if there are many visitors with only one page view. So if we could avoid needing to request this info in a separate request, that could be a great improvement. Here's some of the thoughts:

Eg FormAnalytics...
Most pages have forms (eg search). When page is loaded, and Form Analytics detects a form, we would need to issue a configs request. Meaning when someone is using Form Analytics (which we do eg on Cloud), then per visit there will be at least one request to configs. However, if the config be returned as part of eg trackPageView, then this extra request could be avoided. Meanwhile, during a page view, it would basically only mean we need to trigger that event and don't need to bootstrap Matomo again etc

Note for myself: If we wanted to troubleshoot the ExtraConfig logic, and needed to basically clear the cache, we can execute eg sessionStorage.removeItem('mtm.extraTrackerConfig.1') or sessionStorage.clear() as part of the tracking code.

Otherwise it'll still take me a few more days to fully integrate Heatmaps and Form Analytics as it's not that trivial.

@diosmosis
Copy link
Member Author

I suppose we would maybe need to include the idSite and matomo tracker url as a hash instead of the unique tracker ID?

It should be using the idSite, not the tracker ID. It used the tracker ID at first then I realized it would make more sense to set it based on idSite.

Also I wonder if we could maybe send the config request along with a trackPageView tracking request for example (like &send_config=1)?

I thought about this, but noticed it wasn't simple. I can't remember exactly why, but I believe it was due to the fact that in the tracker we use an Image to get the response. We'd have to conditionally do something different then?

@tsteur
Copy link
Member

tsteur commented Feb 4, 2020

It should be using the idSite, not the tracker ID. It used the tracker ID at first then I realized it would make more sense to set it based on idSite.

Wouldn't it also need to include the trackerUrl since you might track to different Matomo's that use both same idSite? We do these things partially eg on matomo.org

@diosmosis
Copy link
Member Author

Wouldn't it also need to include the trackerUrl since you might track to different Matomo's that use both same idSite? We do these things partially eg on matomo.org

Ah, I didn't think of that, will make that change as well.

@tsteur
Copy link
Member

tsteur commented Feb 4, 2020

I thought about this, but noticed it wasn't simple. I can't remember exactly why, but I believe it was due to the fact that in the tracker we use an Image to get the response. We'd have to conditionally do something different then?

Yeah I would say the first request could do something like if (!extraConfigLoading && !extraConfigLoaded) { $requestUrl.= 'send_image=0&send_configs=1; $extraConfigLoading = true; }`

The problem is then though... that we might request this info even though no plugins actually requests it and in the end we transfer more data. I suppose in this case the output/response of configs be empty though so it wouldn't be a major overhead. Or we could find some way to potentially ask plugins to register very early while the tracking code is being parsed that they need extra config and only then we'd do this... (that be not too complicated eg they would only need to listen to the TrackerSetup event) but it may not be needed.

I just realise actually that we have now enabled sendBeacon by default and AFAIK we cannot get the response from this so the whole idea would actually not work anyway. :(

@diosmosis
Copy link
Member Author

I just realise actually that we have now enabled sendBeacon by default and AFAIK we cannot get the response from this so the whole idea would actually not work anyway. :(

That's it! That was the reason why I couldn't figure out how to do it :)

@diosmosis
Copy link
Member Author

Haven't looked yet why Matomo would print a stack trace when there is a notice... wondering if an error handler needs to be registered to basically mute these errors? Otherwise the JSONP won't work...

Added an error handler for this case.

@diosmosis
Copy link
Member Author

I noticed the session storage key is eg mtm.extraTrackerConfig.1. I reckon this could be problematic since we cannot guarantee that each tracker always get the same ID? Even ourselves on some of our pages we might sometimes create a different tracker first and then the logic wouldn't work anymore. I suppose we would maybe need to include the idSite and matomo tracker url as a hash instead of the unique tracker ID?

Fixed. Issues should be fixed.

@diosmosis
Copy link
Member Author

Note: added an extra url= parameter to the JS query so plugins can see what the current page URL is.

@diosmosis
Copy link
Member Author

Actually adding the url= parameter is probably a mistake... I'll remove it.

  • remove url= param

@diosmosis diosmosis closed this Feb 7, 2020
@diosmosis diosmosis deleted the 15191-tracker-config branch April 15, 2020 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standardize site-specific tracker configuration based on server side data
2 participants