Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix archiving too many segments that aren't needed and showing 0 conversions for new visits/returning visitors #18255

Merged
merged 8 commits into from Nov 9, 2021

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Nov 2, 2021

Description:

Noticed while looking into #18088 that we're archiving way too many segments when eg viewing a goals overview page see the issue on how to reproduce it.

Previously we would have archived 9 archives when no segment was used, and 13 archives when a segment was used. Now we're only archiving 6 archives each which is:

  • an archive for all segment (or the given segment)
  • an archive for the segment + new visitors
  • an archive for the segment + returning visitors

Each of these get archived twice. Once for goals and once for visits summary. Previously we would have archived a lot of segments but never looked at these.

Eg previously we would have archived roughly like this

  • an archive for all segment (or the given segment)
  • an archive for the segment + new visitors
  • an archive for the segment + returning visitors
  • new visitors + an archive for the segment
  • returning visitors + an archive for the segment
  • an archive for the segment wrong encoding + new visitors
  • an archive for the segment wrong encoding + returning visitors
  • new visitors + an archive for the segment + returning visitors
  • returning visitors + an archive for the segment + new visitors

Review

plugins/VisitFrequency/API.php Show resolved Hide resolved
plugins/Goals/API.php Show resolved Hide resolved
@tsteur tsteur added the Needs Review PRs that need a code review label Nov 2, 2021
@tsteur tsteur added this to the 4.6.0 milestone Nov 2, 2021
@tsteur tsteur changed the title Segmentfixes Fix archiving too many segments that aren't needed and showing 0 conversions for new visits/returning visitors Nov 2, 2021
@tsteur
Copy link
Member Author

tsteur commented Nov 2, 2021

I can reproduce this actually on our own Matomo how we're seeing 0 conversions for new visits/returning visitors depending on what segment is selected.

image

// vs here we would use
// userId!@%40matomo.org;userId!=hello%40matomo.org;visitorType==new
// thus these would result in different segment hashes and therefore the reports would either show 0 or archive the data twice
$newSegment = Segment::combine($params->getSegment()->getOriginalString(), SegmentExpression::AND_DELIMITER, $segment);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some segments where originalString != getString this would actually result in a different segment hash and therefore a different done flag. However, it would have also not worked earlier AFAIK as the UI/API would have requested a different segment string and therefore also would have used a different hash.

Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Should indeed safe the archiving of unneeded segments 👍
Do you think it might make sense to somehow invalidate the segments that were archived without any use. We could eg go through all pre-archived segments and prepend new & returning visitors segment and invalidate them? But maybe that's not worth the effort in order to safe some database space...

@tsteur
Copy link
Member Author

tsteur commented Nov 3, 2021

@sgiehl I noticed on cloud that Matomo will try to re-archive some data because "originalString" is now used in dependent archives. I thought it wouldn't be a problem as it would only archive current day, week, ...

But of course when it eg tries to archive the current year, it will also archive all subperiods for that period then.

I wrote a migration script that I partially tested on our Matomo to migrate basically previous done flags to the new done flag so this re-archiving won't be happening as it can cause quite a bit of load.

Can you have a look at this script to see if you notice any issues with that https://gist.github.com/tsteur/13721a2657838b6b71c1f688c9ac3fdd ?

I could potentially add it as a DB upgrade script in Matomo then (if wanted)

@sgiehl
Copy link
Member

sgiehl commented Nov 4, 2021

@tsteur had a look at it. In general I guess it should work. The only edge case that might possibly happen, is that archives for both segment variations might already exist upfront, and then the rename could fail with a duplicate key error. not sure if that is something that should be handled.

@tsteur
Copy link
Member Author

tsteur commented Nov 4, 2021

Thanks @sgiehl I reckon duplicates shouldn't be an issue as the index is on idarchive,name meaning duplicate names are no problem as long as the idarchive is different
image

@tsteur
Copy link
Member Author

tsteur commented Nov 4, 2021

Added a migration script @sgieh
Screenshot 2021-11-05 at 10 18 13 AM
l've also tested the script locally and on our cloud. Making sure it now generates the same segment hashes as would be used by the logic itself and seems to all work.

@tsteur
Copy link
Member Author

tsteur commented Nov 4, 2021

We could check if the archives actually include any of the segment hashes and only show the needed update statements. Makes a few more selects though.

@sgiehl
Copy link
Member

sgiehl commented Nov 5, 2021

@tsteur not sure if it would be better to first do some simple selects. Currently it would fire an update statement on each archive table, even though it might not need to change anything. Guess for people executing those queries manually it would be better to have only required queries listed.

@tsteur
Copy link
Member Author

tsteur commented Nov 7, 2021

@sgiehl added the select query 👍

Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsteur I've merged in the latest 4.x-dev changes and bumped the version number, so the update script is triggered. For me locally, not even one query was needed. Seems the segments were already generated correctly, for my case.
Nevertheless, if that worked for you, feel free to merge this PR if tests are passing.

@tsteur
Copy link
Member Author

tsteur commented Nov 8, 2021

@sgiehl it's only a problem for some segments. Like actions>1 isn't a problem, but more complicated segments like userId!@%40matomo.org;userId!=hello%40matomo.org;visitorType==new would be affected. It worked for me and we'll run this on production soon. Can wait till then.

@sgiehl sgiehl removed the Needs Review PRs that need a code review label Nov 8, 2021
@tsteur
Copy link
Member Author

tsteur commented Nov 9, 2021

The migration script seems to have worked quite well for us.

@tsteur tsteur merged commit 4fe950a into 4.x-dev Nov 9, 2021
@tsteur tsteur deleted the segmentfixes branch November 9, 2021 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants