New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segments using "Event category is not X" or "Page URL does not contain" etc. will list visits with these actions in the Visits log #13785
Comments
this is because of d89a08b#diff-3a1e3a40187c4a2123a617b97506e102R183 No idea why we would match null values when using not equals and not contains in https://github.com/matomo-org/matomo/blob/3.8.0-b2/core/Segment/SegmentExpression.php#L233 maybe @mattab remembers |
@tsteur I think the idea is:
|
I see... didn't even know MySQL wouldn't return those results... just double tested it locally with a null value
not sure how to fix that issue then . I reckon #11900 might fix it, but hard to tell... |
Moving this to Matomo 4 as #11900 is in this milestone as well |
fyi: was hoping this was maybe fixed along #11900 but still seeing some visits that have events that should be excluded in the segment. For example at this url in this instance there are visits with the |
Visitor Log is a totally different story... and this issue in particular anyway see previous comments |
Same issue with "Page URL does not contain" was reported in #15708 |
maybe a solution to this issue could be to introduce a new Segment "All Page URLs in the visit". As opposed to just "Page URL" this new segment would ensure that all pages in the visit match the condition. for example:
Wondering if this would be feasible technically and how much effort that would be? cc @tsteur Edit: one challenge is that we'd need to duplicate all action segments to have their "All .... in the visit" segment. Would be a lot of new segments. |
I reckon this be more work and even a more hacky solution cause somewhere in the segmentation code it would require some |
I reckon it be good to fix this as part of Matomo 4 as early as possible just in case it needs some bigger changes eg how tables are joined etc. This allows us to have this feature tested from the beginning of the first beta and increases the chances to notice regressions. |
AFAIK the goal be as shown in #13785 (comment) The visit isn't expected to be shown. Whether we'll merge it in the end depends on the fix whether it causes big performance regressions etc. |
Had some thoughts on that issue and tried to check which parts of the code would need to be adjusted. I guess that might be a tough one. SELECT log_visit.*
FROM matomo_log_visit AS log_visit
LEFT JOIN matomo_log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE (log_visit.idsite IN (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ?) AND ((
(log_link_visit_action.idaction_url NOT IN (
SELECT idaction
FROM matomo_log_action
WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 1))) OR
(log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 3))) OR
(log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 2))) OR
(log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 10)))
))
GROUP BY log_visit.idvisit
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12 The only way I can think of how to adjust the query so it does not return any visits that had an action that contained that url is to kind of invert the query. That means fetching the visitid that had an action with that url and use that result to fetch the visits that are not included in that list. SELECT log_visit.*
FROM matomo_log_visit AS log_visit
WHERE (log_visit.idsite IN (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ?) AND
(log_visit.idvisit NOT IN (
SELECT log_link_visit_action.idvisit
FROM matomo_log_link_visit_action AS log_link_visit_action
WHERE
(log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 1))) OR (log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 3))) OR (log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 2))) OR (log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 10)))
))
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12 But guess it would be hard to adjust the code so it does that query instead and the subquery for all visits that contains something might have a lot results, which might be a performance issue. Or does anyone see another possibility to adjust the query so it has the result we want to have? |
Can't think of any other solution. I can see how that would be very difficult to implement plus how it may be a performance issue. Personally, would maybe simply keep existing behaviour as it's not necessarily wrong but of course also not ideal. Also I reckon this is maybe not something that has to be done in Matomo 4 maybe as it's not breaking API. Note that any change here would also apply to regular reports I suppose and not just visits log. So it will also make these reports slower. Any thoughts @mattab ? |
if i understand correctly it would make the segments that use "Does not contain" slower, but would not affect other reports or segments. because currently it does not work as expected which creates confusion and also frustration (when you actually need to select visits that didn't view a particular event), it would be great to fix it soon-ish. but clearly we don't have to do it in 4.0 though and it would be fine to do it in 4.x but if possible, it would be valuable to fix this issue this year.. |
Ok. Shall we move the issue to 4.1 or 4.2 then? |
Sure can move it to 4.1. Although I'm wondering now if we do it anyway we might want to do this in 4.0 since this might have implication on some core logic that could maybe affect plugins, not sure. At least there will be a longer beta/RC period so better chance to find errors because of this change and what it impacts and there would be some time to revert this change again if needed etc. Let's do it then in 4.0 |
Yes, agreed that it would be better and safer to do it in 4.0. Otherwise It would be a bit risky/stressful to push it later without a long beta cycle |
I had some more thoughts on that now, but actually I'm not sure how we could solve that without a big performance decrease. I still have no idea how we can solve that with a nice query. So there is only the query I mentioned above: SELECT log_visit.*
FROM matomo_log_visit AS log_visit
WHERE (log_visit.idsite IN (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ?) AND
(log_visit.idvisit NOT IN (
SELECT log_link_visit_action.idvisit
FROM matomo_log_link_visit_action AS log_link_visit_action
WHERE
(log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 1))) OR (log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 3))) OR (log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 2))) OR (log_link_visit_action.idaction_url IN (
SELECT idaction
FROM matomo_log_action
WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 10)))
))
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12 But with the way we are currently building queries it won't be possible to build a query like that. We can't simply concatenate all single segment query expressions like it is currently done. We kind of need to combine them by table so we are able to only have one subquery as above. Otherwise we might would have multiple subqueries with each action type. But even with only one subquery that query might become very slow, as the subquery might have too many results (it might even select all Also I guess such subqueries shouldn't be implemented for all cases, but only for those where it's needed. But that will actually make the code complexity for that even worse. @tsteur @diosmosis Has anyone a idea how we could solve that a better way? Otherwise I'll give it a try and will start implementing a POC tomorrow. |
I guess the subquery or multiple subqueries (if multiple NOT IN are used) is not avoidable and it will make visits log and archiving slow. The idea be to somewhere use $segment = '// part of the defined segment that should be matched in NOT IN eg eventCategory==="my-value"';
$segment = new Segment($segment, array($idSite));
$params = new ArchiveProcessor\Parameters($site, $period, $segment);
$logAggregator = new LogAggregator($params);
$select = 'log_visit.idvisit';
$from = 'log_visit';
$where = $logAggregator->getWhereStatement('log_visit', 'visit_last_action_time');
$query = $logAggregator->generateQuery($select, $from, $where, 'log_visit.idvisit', ''); This gives us the inner query... eg if the segment is Then the actual segment expression needs to be rewritten to use From a first look this can be probably done in I reckon the best place to make this work might be Not sure if that helps? |
Hi guys,
I am using Matomo 3.7.0
when I create a segment such as, Event Category is not 'my-value' the visitor log report keep showing me visits which include this event.
Is that a known behavior?
The text was updated successfully, but these errors were encountered: