@NanorPiwik opened this Issue on December 1st 2018

Hi guys,
I am using Matomo 3.7.0.
When I create a segment such as "Event Category is not 'my-value'", the visitor log report keeps showing me visits which include this event.
Is that a known behavior?

@tsteur commented on December 1st 2018 Member

No idea why we would match null values when using not equals and not contains in https://github.com/matomo-org/matomo/blob/3.8.0-b2/core/Segment/SegmentExpression.php#L233

@mattab commented on December 3rd 2018 Member

@tsteur I think the idea is:

  • for not equals eg. "City is not equal to London" then we expect to return all visits that had another city set, or those that didn't have a city set/geo-located.
  • for not contains eg. "Keyword does not contain X" should return entries that had any keyword set, and also no keyword set.

@tsteur commented on December 3rd 2018 Member

I see... didn't even know MySQL wouldn't return those results... just double tested it locally with a null value

select config_device_type from piwik_log_visit where idvisit = 5281842 and config_device_type <> 'foo' (for a visit where config_device_type is null)... the entry was not returned...

Not sure how to fix that issue then. I reckon https://github.com/matomo-org/matomo/issues/11900 might fix it, but hard to tell...
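
The NULL behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a one-column stand-in for the visit table and using SQLite from Python's standard library rather than MySQL; the three-valued comparison logic shown here is standard SQL and applies to both. The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down stand-in for the log_visit table.
conn.execute(
    "CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, config_device_type TEXT)"
)
conn.executemany(
    "INSERT INTO log_visit VALUES (?, ?)",
    [(1, "foo"), (2, "bar"), (3, None)],
)

# NULL <> 'foo' evaluates to NULL (unknown), so the WHERE clause drops visit 3.
plain = [row[0] for row in conn.execute(
    "SELECT idvisit FROM log_visit "
    "WHERE config_device_type <> 'foo' ORDER BY idvisit"
)]

# An explicit IS NULL branch keeps visits that have no value set.
with_null = [row[0] for row in conn.execute(
    "SELECT idvisit FROM log_visit "
    "WHERE config_device_type <> 'foo' OR config_device_type IS NULL "
    "ORDER BY idvisit"
)]

print(plain)      # [2]
print(with_null)  # [2, 3]
```

This is why a "not equals" segment needs the extra null match if visits without a value are expected in the result.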

@tsteur commented on July 14th 2019 Member

Moving this to Matomo 4 as https://github.com/matomo-org/matomo/issues/11900 is in this milestone as well

@mattab commented on October 22nd 2019 Member

FYI: I was hoping this was maybe fixed along with #11900, but I'm still seeing some visits that have events that should be excluded by the segment. For example, at this url in this instance there are still visits with the MediaVideo event category.

@tsteur commented on October 22nd 2019 Member

Visitor Log is a totally different story... and this issue in particular anyway, see previous comments.

@mattab commented on March 19th 2020 Member

Same issue with "Page URL does not contain" was reported in https://github.com/matomo-org/matomo/issues/15708

@mattab commented on March 23rd 2020 Member

Maybe a solution to this issue could be to introduce a new segment "All Page URLs in the visit". As opposed to just "Page URL", this new segment would ensure that all pages in the visit match the condition. For example:

  • "All Page URLs in the visit" is not https://host/page -> would return only visits where none of the page URLs is https://host/page
  • "All Page URLs in the visit" does not contain STRING -> would return only visits where none of the page URLs contains STRING
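
The difference between the two semantics can be sketched in a few lines. This is only an illustration, not Matomo code: the `visits` data and both helper functions are hypothetical, and "current" models the behaviour reported in this issue (a visit is kept as soon as one of its pages passes the condition).

```python
from typing import Dict, List

# Hypothetical visits: visit id -> page URLs viewed during that visit.
visits: Dict[int, List[str]] = {
    1: ["https://host/page", "https://host/other"],
    2: ["https://host/other"],
    3: ["https://host/page"],
}


def some_page_differs(urls: List[str], excluded: str) -> bool:
    """Reported behaviour of "Page URL is not X": one non-matching page is enough."""
    return any(url != excluded for url in urls)


def no_page_equals(urls: List[str], excluded: str) -> bool:
    """Proposed "All Page URLs in the visit is not X": no page may be X."""
    return all(url != excluded for url in urls)


excluded = "https://host/page"
current = [v for v, urls in visits.items() if some_page_differs(urls, excluded)]
proposed = [v for v, urls in visits.items() if no_page_equals(urls, excluded)]

print(current)   # [1, 2]
print(proposed)  # [2]
```

Visit 1 is the problematic case: it contains the excluded page, but it also contains another page, so the per-page check lets it through.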

Wondering if this would be feasible technically and how much effort that would be? cc @tsteur

Edit: one challenge is that we'd need to duplicate all action segments to have their "All .... in the visit" segment. Would be a lot of new segments.

@tsteur commented on March 24th 2020 Member

I reckon this would be more work and an even more hacky solution, because somewhere in the segmentation code it would require some "IF segmentName = XYZ then change the behaviour of does not contain". Instead, we need to change the way the does not contain filter works in general. Otherwise you have the same issue for other segments as well, and things get very messy, unpredictable and untestable.

@sgiehl commented on April 22nd 2020 Member

@tsteur @mattab how should we proceed with this issue? What exactly should be changed?
Btw. is it still important for the first Matomo 4 RC, or should it be postponed?

@tsteur commented on April 22nd 2020 Member

I reckon it would be good to fix this as part of Matomo 4 as early as possible, just in case it needs some bigger changes, e.g. to how tables are joined. This allows us to have this feature tested from the beginning of the first beta and increases the chances of noticing regressions.

@tsteur commented on April 22nd 2020 Member

AFAIK the goal is as shown in https://github.com/matomo-org/matomo/issues/13785#issuecomment-601419663

image

The visit isn't expected to be shown. Whether we'll merge it in the end depends on the fix, e.g. whether it causes big performance regressions.

@sgiehl commented on June 3rd 2020 Member

Had some thoughts on that issue and tried to check which parts of the code would need to be adjusted. I guess that might be a tough one.
Querying the visitor log for a segment like pageurl!@diving would currently result in a query like this:

SELECT log_visit.*
FROM matomo_log_visit AS log_visit
LEFT JOIN matomo_log_link_visit_action AS log_link_visit_action
    ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE (log_visit.idsite IN (?)
       AND log_visit.visit_last_action_time >= ?
       AND log_visit.visit_last_action_time <= ?)
  AND ((
    (log_link_visit_action.idaction_url NOT IN (
        SELECT idaction
        FROM matomo_log_action
        WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 1)))
    OR (log_link_visit_action.idaction_url IN (
        SELECT idaction
        FROM matomo_log_action
        WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 3)))
    OR (log_link_visit_action.idaction_url IN (
        SELECT idaction
        FROM matomo_log_action
        WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 2)))
    OR (log_link_visit_action.idaction_url IN (
        SELECT idaction
        FROM matomo_log_action
        WHERE (name NOT LIKE CONCAT('%', ?, '%') AND TYPE = 10)))
  ))
GROUP BY log_visit.idvisit
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12
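
To see why a query of this shape still returns visits that contain the excluded URL, here is a minimal reproduction. It is a sketch under several assumptions: the three tables are reduced to the columns that matter, only one action type is checked, the rows are invented, and SQLite is used in place of MySQL (so `||` replaces `CONCAT`). The filter is applied per joined action row, so a visit survives as soon as one of its actions passes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced versions of the three tables involved.
conn.executescript("""
CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY);
CREATE TABLE log_action (idaction INTEGER PRIMARY KEY, name TEXT, type INTEGER);
CREATE TABLE log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);

INSERT INTO log_visit VALUES (1), (2);
INSERT INTO log_action VALUES (10, 'host/diving', 1), (11, 'host/home', 1);
-- Visit 1 viewed both pages, visit 2 only the home page.
INSERT INTO log_link_visit_action VALUES (1, 10), (1, 11), (2, 11);
""")

rows = conn.execute("""
SELECT log_visit.idvisit
FROM log_visit
LEFT JOIN log_link_visit_action
    ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE log_link_visit_action.idaction_url IN (
    SELECT idaction
    FROM log_action
    WHERE name NOT LIKE '%' || ? || '%' AND type = 1)
GROUP BY log_visit.idvisit
ORDER BY log_visit.idvisit
""", ("diving",)).fetchall()

matched = [row[0] for row in rows]
print(matched)  # [1, 2]
```

Visit 1 is returned because its 'host/home' row passes the condition, even though the same visit also viewed 'host/diving'.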

The only way I can think of to adjust the query so that it does not return any visits that had an action containing that URL is to kind of invert the query. That means fetching the idvisit of every visit that had an action with that URL and using that result to fetch the visits that are not included in that list.
So some query like this:

SELECT log_visit.*
FROM matomo_log_visit AS log_visit
WHERE (log_visit.idsite IN (?)
       AND log_visit.visit_last_action_time >= ?
       AND log_visit.visit_last_action_time <= ?)
  AND (log_visit.idvisit NOT IN (
    SELECT log_link_visit_action.idvisit
    FROM matomo_log_link_visit_action AS log_link_visit_action
    WHERE
        (log_link_visit_action.idaction_url IN (
            SELECT idaction
            FROM matomo_log_action
            WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 1)))
        OR (log_link_visit_action.idaction_url IN (
            SELECT idaction
            FROM matomo_log_action
            WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 3)))
        OR (log_link_visit_action.idaction_url IN (
            SELECT idaction
            FROM matomo_log_action
            WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 2)))
        OR (log_link_visit_action.idaction_url IN (
            SELECT idaction
            FROM matomo_log_action
            WHERE (name LIKE CONCAT('%', ?, '%') AND TYPE = 10)))
  ))
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12
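
A reduced version of this inverted query can be run against toy data to check that it yields the expected visits. As before this is only a sketch: the schema is cut down to the relevant columns, a single action type is used, the rows are invented, and SQLite stands in for MySQL (`||` instead of `CONCAT`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced versions of the three tables involved.
conn.executescript("""
CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY);
CREATE TABLE log_action (idaction INTEGER PRIMARY KEY, name TEXT, type INTEGER);
CREATE TABLE log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);

INSERT INTO log_visit VALUES (1), (2), (3);
INSERT INTO log_action VALUES (10, 'host/diving', 1), (11, 'host/home', 1);
-- Visit 1 viewed both pages, visit 2 only the home page, visit 3 has no actions.
INSERT INTO log_link_visit_action VALUES (1, 10), (1, 11), (2, 11);
""")

rows = conn.execute("""
SELECT log_visit.idvisit
FROM log_visit
WHERE log_visit.idvisit NOT IN (
    SELECT log_link_visit_action.idvisit
    FROM log_link_visit_action
    WHERE log_link_visit_action.idaction_url IN (
        SELECT idaction
        FROM log_action
        WHERE name LIKE '%' || ? || '%' AND type = 1))
ORDER BY log_visit.idvisit
""", ("diving",)).fetchall()

inverted = [row[0] for row in rows]
print(inverted)  # [2, 3]
```

Visit 1 is now excluded because one of its actions matches. Visit 3, which has no actions at all, is included, since it never viewed the excluded URL.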

But I guess it would be hard to adjust the code so that it runs that query instead, and the subquery for all visits that contain something might return a lot of results, which might be a performance issue.

Or does anyone see another possibility to adjust the query so it has the result we want to have?
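
For reference, the same inversion can be written in other standard SQL shapes, for example a correlated NOT EXISTS or a GROUP BY with a HAVING count of matching actions. These are not a different approach, only alternative spellings of the exclusion, and nothing here says anything about how they would perform on real Matomo tables. The sketch below assumes the same reduced, hypothetical schema and SQLite instead of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced versions of the three tables involved.
conn.executescript("""
CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY);
CREATE TABLE log_action (idaction INTEGER PRIMARY KEY, name TEXT, type INTEGER);
CREATE TABLE log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);

INSERT INTO log_visit VALUES (1), (2), (3);
INSERT INTO log_action VALUES (10, 'host/diving', 1), (11, 'host/home', 1);
INSERT INTO log_link_visit_action VALUES (1, 10), (1, 11), (2, 11);
""")

# Variant A: keep a visit only if no matching action exists for it.
not_exists = [row[0] for row in conn.execute("""
SELECT v.idvisit
FROM log_visit AS v
WHERE NOT EXISTS (
    SELECT 1
    FROM log_link_visit_action AS l
    JOIN log_action AS a ON a.idaction = l.idaction_url
    WHERE l.idvisit = v.idvisit
      AND a.type = 1
      AND a.name LIKE '%' || ? || '%')
ORDER BY v.idvisit
""", ("diving",))]

# Variant B: count matching actions per visit and keep visits with a count of zero.
having = [row[0] for row in conn.execute("""
SELECT v.idvisit
FROM log_visit AS v
LEFT JOIN log_link_visit_action AS l ON l.idvisit = v.idvisit
LEFT JOIN log_action AS a ON a.idaction = l.idaction_url AND a.type = 1
GROUP BY v.idvisit
HAVING SUM(CASE WHEN a.name LIKE '%' || ? || '%' THEN 1 ELSE 0 END) = 0
ORDER BY v.idvisit
""", ("diving",))]

print(not_exists)  # [2, 3]
print(having)      # [2, 3]
```

Both variants still have to look at every action of every candidate visit, so the performance concern raised above applies to them as well.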

@tsteur commented on June 4th 2020 Member

Can't think of any other solution. I can see how that would be very difficult to implement, plus how it may be a performance issue. Personally, I would maybe simply keep the existing behaviour, as it's not necessarily wrong, but of course also not ideal.

Also, I reckon this is maybe not something that has to be done in Matomo 4, as it's not breaking the API.

Note that any change here would also apply to regular reports, I suppose, and not just the visits log. So it will also make these reports slower.

Any thoughts @mattab ?
