New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix archiving too many segments that aren't needed and showing 0 conversions for new visits/returning visitors #18255
Conversation
// vs here we would use | ||
// userId!@%40matomo.org;userId!=hello%40matomo.org;visitorType==new | ||
// thus these would result in different segment hashes and therefore the reports would either show 0 or archive the data twice | ||
$newSegment = Segment::combine($params->getSegment()->getOriginalString(), SegmentExpression::AND_DELIMITER, $segment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some segments where originalString != getString this would actually result in a different segment hash and therefore a different done flag. However, it would have also not worked earlier AFAIK as the UI/API would have requested a different segment string and therefore also would have used a different hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Should indeed safe the archiving of unneeded segments 👍
Do you think it might make sense to somehow invalidate the segments that were archived without any use. We could eg go through all pre-archived segments and prepend new & returning visitors segment and invalidate them? But maybe that's not worth the effort in order to safe some database space...
@sgiehl I noticed on cloud that Matomo will try to re-archive some data because "originalString" is now used in dependent archives. I thought it wouldn't be a problem as it would only archive current day, week, ... But of course when it eg tries to archive the current year, it will also archive all subperiods for that period then. I wrote a migration script that I partially tested on our Matomo to migrate basically previous done flags to the new done flag so this re-archiving won't be happening as it can cause quite a bit of load. Can you have a look at this script to see if you notice any issues with that https://gist.github.com/tsteur/13721a2657838b6b71c1f688c9ac3fdd ? I could potentially add it as a DB upgrade script in Matomo then (if wanted) |
@tsteur had a look at it. In general I guess it should work. The only edge case that might possibly happen, is that archives for both segment variations might already exist upfront, and then the rename could fail with a duplicate key error. not sure if that is something that should be handled. |
Thanks @sgiehl I reckon duplicates shouldn't be an issue as the index is on |
We could check if the archives actually include any of the segment hashes and only show the needed update statements. Makes a few more selects though. |
@tsteur not sure if it would be better to first do some simple selects. Currently it would fire an update statement on each archive table, even though it might not need to change anything. Guess for people executing those queries manually it would be better to have only required queries listed. |
@sgiehl added the select query 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsteur I've merged in the latest 4.x-dev changes and bumped the version number, so the update script is triggered. For me locally, not even one query was needed. Seems the segments were already generated correctly, for my case.
Nevertheless, if that worked for you, feel free to merge this PR if tests are passing.
@sgiehl it's only a problem for some segments. Like |
The migration script seems to have worked quite well for us. |
Description:
Noticed while looking into #18088 that we're archiving way too many segments when eg viewing a goals overview page see the issue on how to reproduce it.
Previously we would have archived 9 archives when no segment was used, and 13 archives when a segment was used. Now we're only archiving 6 archives each which is:
Each of these get archived twice. Once for goals and once for visits summary. Previously we would have archived a lot of segments but never looked at these.
Eg previously we would have archived roughly like this
Review