@sgiehl opened this Pull Request on May 20th 2022 Member

Description:

While writing tests for the referrer attribution, I came across an issue where the referrer url was not updated correctly.
Having a closer look at the code around dimensions, visitor properties and how they are updated showed, that we might have an issue there.

Currently when a new action happens all visitor properties are loaded and the onExistingVisit hook is triggered. This one might return a value if it needs to be changed. If that is the case the property is directly updated.

This might actually cause problems as it's not possible to rely on the data that is provided. I'll try to explain that with a simple example around the referrer detection.

The referrer url has an onExistingVisit hook, that checks if the referrer type is DIRECT. If that is the case, the URL will be updated.
The referrer type basically does the same. The problem that might occur here is if the referrer type property is updated before the referrer url, the type might have already been changed to another value, causing the referrer url not to update, as the referrer type isn't DIRECT anymore.
If both dimensions are updated in reversed order both dimensions would change their value. So the result basically depends on the order of the dimensions, which depends on the order the plugins are loaded. So that can be kind of random.

So what this PR basically changes is, that (when triggering an hook for updating dimensions) the hooks for all dimensions will be triggered and the values that need to be changed will be gathered. After all hooks were triggered all dimensions will be updated at once. This guarantees that all dimension hooks will be triggered with the same set of properties.

As this is quite a fundamental change on how that works, it might be good if @tsteur might have a look here as well.

Additionally this PR adds tests with around 440 test cases for referrer attribution.

refs #18612

Review

@tsteur commented on May 23rd 2022 Member

@sgiehl have you checked every dimension (including non-core plugins) to see if this could cause any regression there?

Do I see this right that we have a method $visit->getPreviousVisitColumn() for this use case? Would utilising this work and be less risky of a change? I believe it was supposed to work as is in that it updates the values directly, and if we wanted to access a previous value we would use that method.

@sgiehl commented on May 24th 2022 Member

@tsteur I was thinking about using that method in the first place. But looking through the code I wasn't sure if it really contains what I would have expected. The previous visit property might either contain the current visit data or the previous visit data, if a new one was started.

Anyway, shouldn't we use the same data for triggering every dimensions hook? Otherwise it can always end in unknown results based on the order the dimensions are walked through.
Or we might need some sort of dependency. So e.g. the referrer url column can define it depends on der referrer type, so the dimensions are triggered in an order where dependent dimensions are triggered later. But this for sure might be a bit more effort.

I also had a look through the usages of getVisitorColumn. Besides in core, where all usages seem to be not affected by the changes, the method is also used in https://github.com/matomo-org/plugin-MarketingCampaignsReporting/blob/aabd08e7dea97077c39c281b61270ac037831551/Columns/CampaignName.php#L79, but there it should actually also have a positive effect like for the referrer columns in core.

@tsteur commented on May 29th 2022 Member

But looking through the code I wasn't sure if it really contains what I would have expected. The previous visit property might either contain the current visit data or the previous visit data, if a new one was started.

What was the unexpected there @sgiehl ? I'm not so much into that part right now so it's not quite clear. If it's the different data for existing vs new visit, maybe we could look changing this?

Anyway, shouldn't we use the same data for triggering every dimensions hook? Otherwise it can always end in unknown results based on the order the dimensions are walked through.

I don't think that needs to be the case if we have clear naming for those methods for whether we access data from previous visit or current visit that may have been updated already. Maybe getVisitorColumn could be renamed in Matomo 5 to something like getCurrentVisitorColumn and then we have the flexibility to either access a potentially modified value or to access data from a previous visit.

AFAIK dimensions are sorted depending on the plugin order. Meaning core plugins are updated first and other non-core plugins might want to see for example what was updated etc.

Does that make sense? Possible I misunderstand something

@sgiehl commented on May 30th 2022 Member

I got confused by the naming of the method already. For me getPreviousVisitColumn sound like it would serve the value of the column the visitor had in the previous visit. And not that it serves the column the visitor previously had in the current visit.
Looking at the code it seems the method might be doing a bit of both. If I didn't see that wrong, for a new visit, the method would return data from the previous visit, but for an existing visit it will return the previously set data.

But back to the actual problem I'm seeing with directly updating column properties. As I tried to describe before, doing that might bring up results in an unexpected state. The order in which the columns are processed is kind of defined by the plugins list in global config. All additional plugins are processed afterwards in sorted order. The problem I'm seeing with that, is that the result for getVisitorColumn in the hooks can be different, depending on whether the plugin calling it is processed earlier or later than the column being requested.
If working with updated values is required in some cases, we might need to introduce some sort of dependency, to ensure other columns are updated before.

The current problem with incorrectly set referrer url in some cases might also be solvable by changing the columns to use getPreviousVisitColumn. Maybe that would even be more correct, as they should never use updated values in that case. Nevertheless we imho should try to solve the other problem, to prevent any similar issues, that are really hard to identify.
But we could maybe also postpone this to Matomo 5 and revisit this topic then again 🤔

@tsteur commented on May 31st 2022 Member

Regarding the dependency I think the dependency is maybe less between non-core plugins but more like that non-core plugins should be executed after core plugins.

Indeed the naming is really not clear. I think that's the biggest problem here. Because with getPreviousVisitColumn we can already access the data that we want. It's just that the names of all these methods are not clear and maybe with Matomo 5 we could rename them to make it more clear. Then meanwhile referrerUrl etc could use getPreviousVisitColumn?

This way we would keep having the flexibility to access the original values or the changed values (knowing that there's only basic dependency management which may be good enough)

@sgiehl commented on May 31st 2022 Member

@tsteur I tried updating the PR to use getPreviousVisitColumn instead, but that actually doesn't work due to the code structure of the Referrers plugin. It uses the same detection methods for detection the referrer for a new and an existing visit (e.g. onNewVisit and onExistingVisit hooks) as well as for detecting new visits (shouldForceNewVisit hook). The problem here is, that the previousVisitProperties for new visits contain the data from the previous visit or are not even set (in shouldForceNewVisit hook). So this currently results in incorrect data, which can be seen in the failing tests.

Refactoring and renaming the methods in Matomo 5 sounds good. We maybe should also remove the different use cases for previousVisitProperties or at least implement some possibility to check if it contains the current visits properties or the previous visits ones.

But to solve this here now, we either use the "update all at once" I had implemented before, or I need to spend another day on refactoring the referrer detection.

@tsteur commented on June 1st 2022 Member

While writing tests for the referrer attribution, I came across an issue where the referrer url was not updated correctly.

Coming back to the original issue. Can we maybe understand the problem a bit better to know how big the problem actually is? And then potentially we could actually create an issue for it so it can be scheduled.

I'm not really sure we want to apply the original update all at once as it could cause a regression and it's really hard to tell. I looked at some code but it's hard to tell if it would be fine or not and it removes some flexibility maybe.

Maybe for now we only merge the tests and create a new bug report for this issue so it can be prioritised and scheduled if needed?

@sgiehl commented on June 1st 2022 Member

@tsteur The tests are failing without the change, as the referrer url is not updated correctly.
Each time the initial tracking request doesn't contain a referrer (e.g. direct), but an ongoing request contains one, the referrer is updated. Currently the referrer_url is staying empty, which possibly causes some incorrect numbers in the referrer reports, at least where the url is fully shown.
I can for sure adjust the tests so the reflect the current incorrect behavior 🤷

@tsteur commented on June 2nd 2022 Member

I can for sure adjust the tests so the reflect the current incorrect behavior 🤷

👍 Let's do this for now and create a new issue describing the problem what it means for users. It's not really fully clear yet what the impact of it is. And be also good to qualify if it would create a new visit in some cases?

I've tried to confirm implications re the initial solution but it would take quite a while to be fully sure it doesn't cause any regression thinking of a case like location_provider where I'm not sure it be fine to change it or not for example.

Updating them after a hook it's also still not clear that the next time a different hook is triggered, the visit is updated after all.

Maybe in Matomo 5 it could be fixed properly or maybe Tracker/Visitor would need yet another method for this case or we would just need two different ways: One to access the unchanged visit/action, and one to change the changed/updated visit maybe.

@sgiehl commented on June 3rd 2022 Member

@tsteur I've now adjusted the tests so they work without any other changes, but added some todos so it's clear which parts are actual incorrect behavior.

Also created a follow-up issue for Matomo 5: #19316 Feel free to update the description or add a comment if I forgot to mention something important.

@tsteur commented on June 7th 2022 Member

Great, thanks 👍

@sgiehl commented on June 9th 2022 Member

Tests are passing, so I'll merge this one now. Everything else will be handled in the follow-up issue.

This Pull Request was closed on June 9th 2022
Powered by GitHub Issue Mirror