@tsteur opened this Pull Request on November 2nd 2021 Member

Description:

Noticed while looking into https://github.com/matomo-org/matomo/issues/18088 that we're archiving way too many segments when eg viewing a goals overview page see the issue on how to reproduce it.

Previously we would have archived 9 archives when no segment was used, and 13 archives when a segment was used. Now we're only archiving 6 archives each which is:

  • an archive for all segment (or the given segment)
  • an archive for the segment + new visitors
  • an archive for the segment + returning visitors

Each of these get archived twice. Once for goals and once for visits summary. Previously we would have archived a lot of segments but never looked at these.

Eg previously we would have archived roughly like this

  • an archive for all segment (or the given segment)
  • an archive for the segment + new visitors
  • an archive for the segment + returning visitors
  • new visitors + an archive for the segment
  • returning visitors + an archive for the segment
  • an archive for the segment wrong encoding + new visitors
  • an archive for the segment wrong encoding + returning visitors
  • new visitors + an archive for the segment + returning visitors
  • returning visitors + an archive for the segment + new visitors

Review

@tsteur commented on November 2nd 2021 Member

I can reproduce this actually on our own Matomo how we're seeing 0 conversions for new visits/returning visitors depending on what segment is selected.

image

@tsteur commented on November 3rd 2021 Member

@sgiehl I noticed on cloud that Matomo will try to re-archive some data because "originalString" is now used in dependent archives. I thought it wouldn't be a problem as it would only archive current day, week, ...

But of course when it eg tries to archive the current year, it will also archive all subperiods for that period then.

I wrote a migration script that I partially tested on our Matomo to migrate basically previous done flags to the new done flag so this re-archiving won't be happening as it can cause quite a bit of load.

Can you have a look at this script to see if you notice any issues with that https://gist.github.com/tsteur/13721a2657838b6b71c1f688c9ac3fdd ?

I could potentially add it as a DB upgrade script in Matomo then (if wanted)

@sgiehl commented on November 4th 2021 Member

@tsteur had a look at it. In general I guess it should work. The only edge case that might possibly happen, is that archives for both segment variations might already exist upfront, and then the rename could fail with a duplicate key error. not sure if that is something that should be handled.

@tsteur commented on November 4th 2021 Member

Thanks @sgiehl I reckon duplicates shouldn't be an issue as the index is on idarchive,name meaning duplicate names are no problem as long as the idarchive is different
image

@tsteur commented on November 4th 2021 Member

Added a migration script @sgieh
Screenshot 2021-11-05 at 10 18 13 AM
l've also tested the script locally and on our cloud. Making sure it now generates the same segment hashes as would be used by the logic itself and seems to all work.

@tsteur commented on November 4th 2021 Member

We could check if the archives actually include any of the segment hashes and only show the needed update statements. Makes a few more selects though.

@sgiehl commented on November 5th 2021 Member

@tsteur not sure if it would be better to first do some simple selects. Currently it would fire an update statement on each archive table, even though it might not need to change anything. Guess for people executing those queries manually it would be better to have only required queries listed.

@tsteur commented on November 7th 2021 Member

@sgiehl added the select query 👍

@tsteur commented on November 8th 2021 Member

@sgiehl it's only a problem for some segments. Like actions>1 isn't a problem, but more complicated segments like userId!@%40matomo.org;userId!=hello%40matomo.org;visitorType==new would be affected. It worked for me and we'll run this on production soon. Can wait till then.

@tsteur commented on November 9th 2021 Member

The migration script seems to have worked quite well for us.

This Pull Request was closed on November 9th 2021
Powered by GitHub Issue Mirror