Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude some url parameters from referrer urls #15905

Merged
merged 6 commits into from May 7, 2020
Merged

Exclude some url parameters from referrer urls #15905

merged 6 commits into from May 7, 2020

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented May 4, 2020

fixes #15902

@sgiehl sgiehl added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels May 4, 2020
@sgiehl sgiehl added this to the 3.13.6 milestone May 4, 2020
core/Tracker/PageUrl.php Outdated Show resolved Hide resolved
@sgiehl sgiehl force-pushed the instaparams branch 2 times, most recently from b95515b to 7344e55 Compare May 5, 2020 08:14
@sgiehl sgiehl changed the title Exclude tracking parameters from instagram referrer urls Exclude some url parameters from referrer urls May 5, 2020
@sgiehl sgiehl requested a review from tsteur May 5, 2020 08:16
}
if (!empty($this->referrerHost) && strpos($this->referrerHost, 'bing.com') !== false) {
$parametersToExclude[] = 'cvid';
$parametersToExclude[] = 'refig';
$parametersToExclude[] = 'elv';
$parametersToExclude[] = 'plvar';
$parametersToExclude[] = 'setlang';
$parametersToExclude[] = 'qs';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure you want to exclude setlang and cc? those are set to iso country and language codes based on the users selection. As bing is not using country domains that might be the only way to get that info (if it's relevant for someone)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should be able to see this based on geolocation and detected language I suppose. Removed it cause was seeing heaps of referrers from different bing countries / languages. Then each referrer url had like only 1 visit and weren't visible vs if we remove these parameters, they would have had maybe a count of 4 or 5 and then these referrers would have become visible in the reports because they represent same search term. It's really about trying to group as many referrers as meaningful together as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. But then we could also group together all google tlds 🤷
Actual I'm wondering if that is the correct way. We are manipulating the tracked referrer and might cut of information that might possibly be useful for anyone.
Where exactly does it cause memory problems having too many different referrer urls? Wondering if it wouldn't be enough to do the referrer url manipulation only while archiving the reports and keep storing the original referrer in the database for live reports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes issues in MySQL... like queries taking 6 hours to run because they do group by "referer_type", "referer_name", "referer_keyword", "referer_url" resulting in millions of rows vs now we're like doing with 30K rows.

We are manipulating the tracked referrer and might cut of information that might possibly be useful for anyone.

Sure some might be interested, and they could maybe write a plugin. But it would be a very small percentage that is impacted by this likely vs say 95% would profit from it. Eventually we will be even cutting out a lot more by default eg for privacy reason as otherwise a lot of personal data is tracked requiring tracking consent see #15426

It's also making the reports more useful as more information is grouped together and things become more accurate. Super important referrers suddenly show up in reports (we can see the effect already) that were invisible before.

AFAIK there wouldn't be much of a way for users to utilise the full referrer url currently anyway? Except for maybe segmenting?

So far it really helps with memory, performance, correctness of data, and even improves privacy.

Of course if needed can keep some parameters like lang or country if really needed, just makes the result maybe less accurate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. makes sense. Should we implement a possibility to disable that manipulation?
If not, feel free to review and merge if everything is fine now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always add a setting later if / when needed. Not wanting to add too many settings that may not be needed.

@tsteur
Copy link
Member

tsteur commented May 7, 2020

Cheers @sgiehl

@tsteur tsteur merged commit bd9aa9d into 3.x-dev May 7, 2020
@tsteur tsteur deleted the instaparams branch May 7, 2020 20:27
jonasgrilleres pushed a commit to 1024pix/pix-analytics that referenced this pull request Sep 22, 2020
jbuget pushed a commit to 1024pix/pix-analytics that referenced this pull request Sep 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instagram generates heaps of different referrer urls causing out of memory issues
2 participants