Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds tests for referrer attribution of visits & conversions #19250

Merged
merged 7 commits into from Jun 9, 2022

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented May 20, 2022

Description:

While writing tests for the referrer attribution, I came across an issue where the referrer url was not updated correctly.
Having a closer look at the code around dimensions, visitor properties and how they are updated showed, that we might have an issue there.

Currently when a new action happens all visitor properties are loaded and the onExistingVisit hook is triggered. This one might return a value if it needs to be changed. If that is the case the property is directly updated.

This might actually cause problems as it's not possible to rely on the data that is provided. I'll try to explain that with a simple example around the referrer detection.

The referrer url has an onExistingVisit hook, that checks if the referrer type is DIRECT. If that is the case, the URL will be updated.
The referrer type basically does the same. The problem that might occur here is if the referrer type property is updated before the referrer url, the type might have already been changed to another value, causing the referrer url not to update, as the referrer type isn't DIRECT anymore.
If both dimensions are updated in reversed order both dimensions would change their value. So the result basically depends on the order of the dimensions, which depends on the order the plugins are loaded. So that can be kind of random.

So what this PR basically changes is, that (when triggering an hook for updating dimensions) the hooks for all dimensions will be triggered and the values that need to be changed will be gathered. After all hooks were triggered all dimensions will be updated at once. This guarantees that all dimension hooks will be triggered with the same set of properties.

As this is quite a fundamental change on how that works, it might be good if @tsteur might have a look here as well.

Additionally this PR adds tests with around 440 test cases for referrer attribution.

refs #18612

Review

@sgiehl sgiehl added Bug For errors / faults / flaws / inconsistencies etc. Needs Review PRs that need a code review labels May 20, 2022
@sgiehl sgiehl added this to the 4.11.0 milestone May 20, 2022
@sgiehl sgiehl requested a review from tsteur May 20, 2022 15:44
@sgiehl sgiehl force-pushed the attributiontests branch 2 times, most recently from 0d450f6 to b5d7709 Compare May 20, 2022 21:38
@tsteur
Copy link
Member

tsteur commented May 23, 2022

@sgiehl have you checked every dimension (including non-core plugins) to see if this could cause any regression there?

Do I see this right that we have a method $visit->getPreviousVisitColumn() for this use case? Would utilising this work and be less risky of a change? I believe it was supposed to work as is in that it updates the values directly, and if we wanted to access a previous value we would use that method.

@sgiehl
Copy link
Member Author

sgiehl commented May 24, 2022

@tsteur I was thinking about using that method in the first place. But looking through the code I wasn't sure if it really contains what I would have expected. The previous visit property might either contain the current visit data or the previous visit data, if a new one was started.

Anyway, shouldn't we use the same data for triggering every dimensions hook? Otherwise it can always end in unknown results based on the order the dimensions are walked through.
Or we might need some sort of dependency. So e.g. the referrer url column can define it depends on der referrer type, so the dimensions are triggered in an order where dependent dimensions are triggered later. But this for sure might be a bit more effort.

I also had a look through the usages of getVisitorColumn. Besides in core, where all usages seem to be not affected by the changes, the method is also used in https://github.com/matomo-org/plugin-MarketingCampaignsReporting/blob/aabd08e7dea97077c39c281b61270ac037831551/Columns/CampaignName.php#L79, but there it should actually also have a positive effect like for the referrer columns in core.

@sgiehl sgiehl force-pushed the attributiontests branch 2 times, most recently from be5c5d8 to ce9fc46 Compare May 24, 2022 14:34
@tsteur
Copy link
Member

tsteur commented May 29, 2022

But looking through the code I wasn't sure if it really contains what I would have expected. The previous visit property might either contain the current visit data or the previous visit data, if a new one was started.

What was the unexpected there @sgiehl ? I'm not so much into that part right now so it's not quite clear. If it's the different data for existing vs new visit, maybe we could look changing this?

Anyway, shouldn't we use the same data for triggering every dimensions hook? Otherwise it can always end in unknown results based on the order the dimensions are walked through.

I don't think that needs to be the case if we have clear naming for those methods for whether we access data from previous visit or current visit that may have been updated already. Maybe getVisitorColumn could be renamed in Matomo 5 to something like getCurrentVisitorColumn and then we have the flexibility to either access a potentially modified value or to access data from a previous visit.

AFAIK dimensions are sorted depending on the plugin order. Meaning core plugins are updated first and other non-core plugins might want to see for example what was updated etc.

Does that make sense? Possible I misunderstand something

@sgiehl
Copy link
Member Author

sgiehl commented May 30, 2022

I got confused by the naming of the method already. For me getPreviousVisitColumn sound like it would serve the value of the column the visitor had in the previous visit. And not that it serves the column the visitor previously had in the current visit.
Looking at the code it seems the method might be doing a bit of both. If I didn't see that wrong, for a new visit, the method would return data from the previous visit, but for an existing visit it will return the previously set data.

But back to the actual problem I'm seeing with directly updating column properties. As I tried to describe before, doing that might bring up results in an unexpected state. The order in which the columns are processed is kind of defined by the plugins list in global config. All additional plugins are processed afterwards in sorted order. The problem I'm seeing with that, is that the result for getVisitorColumn in the hooks can be different, depending on whether the plugin calling it is processed earlier or later than the column being requested.
If working with updated values is required in some cases, we might need to introduce some sort of dependency, to ensure other columns are updated before.

The current problem with incorrectly set referrer url in some cases might also be solvable by changing the columns to use getPreviousVisitColumn. Maybe that would even be more correct, as they should never use updated values in that case. Nevertheless we imho should try to solve the other problem, to prevent any similar issues, that are really hard to identify.
But we could maybe also postpone this to Matomo 5 and revisit this topic then again 🤔

@tsteur
Copy link
Member

tsteur commented May 31, 2022

Regarding the dependency I think the dependency is maybe less between non-core plugins but more like that non-core plugins should be executed after core plugins.

Indeed the naming is really not clear. I think that's the biggest problem here. Because with getPreviousVisitColumn we can already access the data that we want. It's just that the names of all these methods are not clear and maybe with Matomo 5 we could rename them to make it more clear. Then meanwhile referrerUrl etc could use getPreviousVisitColumn?

This way we would keep having the flexibility to access the original values or the changed values (knowing that there's only basic dependency management which may be good enough)

@sgiehl
Copy link
Member Author

sgiehl commented May 31, 2022

@tsteur I tried updating the PR to use getPreviousVisitColumn instead, but that actually doesn't work due to the code structure of the Referrers plugin. It uses the same detection methods for detection the referrer for a new and an existing visit (e.g. onNewVisit and onExistingVisit hooks) as well as for detecting new visits (shouldForceNewVisit hook). The problem here is, that the previousVisitProperties for new visits contain the data from the previous visit or are not even set (in shouldForceNewVisit hook). So this currently results in incorrect data, which can be seen in the failing tests.

Refactoring and renaming the methods in Matomo 5 sounds good. We maybe should also remove the different use cases for previousVisitProperties or at least implement some possibility to check if it contains the current visits properties or the previous visits ones.

But to solve this here now, we either use the "update all at once" I had implemented before, or I need to spend another day on refactoring the referrer detection.

@tsteur
Copy link
Member

tsteur commented Jun 1, 2022

While writing tests for the referrer attribution, I came across an issue where the referrer url was not updated correctly.

Coming back to the original issue. Can we maybe understand the problem a bit better to know how big the problem actually is? And then potentially we could actually create an issue for it so it can be scheduled.

I'm not really sure we want to apply the original update all at once as it could cause a regression and it's really hard to tell. I looked at some code but it's hard to tell if it would be fine or not and it removes some flexibility maybe.

Maybe for now we only merge the tests and create a new bug report for this issue so it can be prioritised and scheduled if needed?

@sgiehl
Copy link
Member Author

sgiehl commented Jun 1, 2022

@tsteur The tests are failing without the change, as the referrer url is not updated correctly.
Each time the initial tracking request doesn't contain a referrer (e.g. direct), but an ongoing request contains one, the referrer is updated. Currently the referrer_url is staying empty, which possibly causes some incorrect numbers in the referrer reports, at least where the url is fully shown.
I can for sure adjust the tests so the reflect the current incorrect behavior 🤷

@tsteur
Copy link
Member

tsteur commented Jun 2, 2022

I can for sure adjust the tests so the reflect the current incorrect behavior 🤷

👍 Let's do this for now and create a new issue describing the problem what it means for users. It's not really fully clear yet what the impact of it is. And be also good to qualify if it would create a new visit in some cases?

I've tried to confirm implications re the initial solution but it would take quite a while to be fully sure it doesn't cause any regression thinking of a case like location_provider where I'm not sure it be fine to change it or not for example.

Updating them after a hook it's also still not clear that the next time a different hook is triggered, the visit is updated after all.

Maybe in Matomo 5 it could be fixed properly or maybe Tracker/Visitor would need yet another method for this case or we would just need two different ways: One to access the unchanged visit/action, and one to change the changed/updated visit maybe.

@sgiehl
Copy link
Member Author

sgiehl commented Jun 3, 2022

@tsteur I've now adjusted the tests so they work without any other changes, but added some todos so it's clear which parts are actual incorrect behavior.

Also created a follow-up issue for Matomo 5: #19316 Feel free to update the description or add a comment if I forgot to mention something important.

@sgiehl sgiehl added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. c: Tests & QA For issues related to automated tests or making it easier to QA & test issues. and removed Bug For errors / faults / flaws / inconsistencies etc. labels Jun 3, 2022
@sgiehl sgiehl changed the title Attribution properties may not be updated correctly in some cases Adds tests for referrer attribution of visits & conversions Jun 3, 2022
@justinvelluppillai justinvelluppillai modified the milestones: 4.11.0, 4.12.0 Jun 7, 2022
@tsteur
Copy link
Member

tsteur commented Jun 7, 2022

Great, thanks 👍

@sgiehl
Copy link
Member Author

sgiehl commented Jun 9, 2022

Tests are passing, so I'll merge this one now. Everything else will be handled in the follow-up issue.

@sgiehl sgiehl merged commit b3fa7d8 into 4.x-dev Jun 9, 2022
@sgiehl sgiehl deleted the attributiontests branch June 9, 2022 13:01
peterhashair pushed a commit that referenced this pull request Jun 16, 2022
* Adds first bunch of referrer attribution tests

* simplify tests

* refactor tests to use data provider, so its easier adding new tests

* Add new test cases

* Refactor tests into automically created test cases

* implement tests for campaign handling

* Adjust tests so they pass
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Tests & QA For issues related to automated tests or making it easier to QA & test issues. Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants