Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong data segmentation w/ segment w/ multiple action dimension conditions #17388

Open
heurteph-ei opened this issue Mar 24, 2021 · 21 comments
Open
Labels
Bug For errors / faults / flaws / inconsistencies etc.

Comments

@heurteph-ei
Copy link

heurteph-ei commented Mar 24, 2021

Expected Behavior

When I use a segmentation on a value for an action dimension, I should have only results on this value on the action dimension.

Current Behavior

Segmentation configuration

In the example below, banque is an action dimension (Note: Est means is).
image

Segmentation result

image

Segmented log

This is confirmed when I click on the segmented log icon: No result as expectd when segmented on banque equals E and M
image

Note: The segmented log on banque equals E work perfectly
image

Steps to Reproduce (for Bugs)

Custom dimensions configuration

image

Matomo Tag Manager with the configuration

image

Context

Your Environment

Browser

Browser: Firefox 78.8.0esr (64 bits) fr-fr but also Chrome Version 89.0.4389.72 (Build officiel) (64 bits)
Operating System: Windows 8.1 Entreprise

Mandatory checks

Version PHP >= 7.2.5: ✔ 7.3.20
PDO Extension: ✔
PDO\MYSQL Extension: ✔
MYSQLI Extension: ✔
Extensions requises: ✔ zlib ✔ SPL ✔ iconv ✔ json ✔ mbstring ✔ Reflection
Fonctions requises: ✔ debug_backtrace ✔ eval ✔ hash ✔ gzcompress ✔ gzuncompress ✔ pack
Configuration PHP requise (php.ini): ✔ session.auto_start = 0 ✔ max_execution_time = 0 OR >= 30
Répertoires avec les droits d'accès en écriture: ✔ $DOC_ROOT/tmp ✔ $DOC_ROOT/tmp/assets ✔ $DOC_ROOT/tmp/cache ✔ $DOC_ROOT/tmp/climulti ✔ $DOC_ROOT/tmp/latest ✔ $DOC_ROOT/tmp/logs ✔ $DOC_ROOT/tmp/sessions ✔ $DOC_ROOT/tmp/tcpdf ✔ $DOC_ROOT/tmp/templates_c
Répertoires avec accès en écriture pour Gestionnaire de balises: ✔ $DOC_ROOT/js

Optional checks

Intégrité du fichier: ✔
Statut du traqueur: ✔
Limite mémoire: ✔ 128M
Fuseau horaire: ✔
Ouvrez l'URL: ✔ curl
PageSpeed désactivé: ✔
GD > 2.x + Freetype (graphiques): ✔
Autres extensions: ✔ json ✔ libxml ✔ dom ✔ SimpleXML
Autres fonctions: ✔ shell_exec ✔ set_time_limit ✔ mail ✔ parse_ini_file ✔ glob ✔ gzopen ✔ md5_file
Système de fichiers: ✔
Setup Cron (faster report loading): ⚠ Warning: Pour des performances optimales et un Matomo rapide, il est vivement recommandé de mettre en place une entré CRON pour archiver automatiquement vos rapports, et de désactiver le déclanchement depuis l'interface depuis les paramètres Matomo.
Setup Cron - Gestion des processus via la ligne de commande: ✔ Ok
Capacités de la base de données: ✔ UTF8mb4 charset ⚠ Warning: LOAD DATA INFILE
L'utilisation de LOAD DATA INFILE va grandement accélérer la vitesse du processus d'archivage de Matomo. Pour le rendre disponible à Matomo, essayez de mettre à jour PHP & MySQL et assurez-vous que votre utilisateur MySQL a les privilèges de FILE.
Si votre serveur Matomo suit des sites à fort trafic (ex. > 100000 pages par mois), nous recommandons d'essayer de corriger ce problème.
Erreur: LOAD DATA INFILE failed... Error was:
Try 1: LOAD DATA INFILE : SQLSTATE[28000]: Invalid authorization specification: 1045 Access denied for user (using password: YES)[28000],
Try 2: LOAD DATA LOCAL INFILE : SQLSTATE[42000]: Syntax error or access violation: 1148 The used command is not allowed with this MySQL version[42000]
✔ CREATE TEMPORARY TABLES ✔ Changing transaction isolation level
Taille maximale des paquets: ⚠ Warning: Il est important de configurer une taille 'max_allowed_packet' dans votre base de données MySQL d'au moins 64MB. 32MB configuré en ce moment.
Connexion SSL forcée: ⚠ Warning: Nous recommandons d'utiliser Matomo uniquement avec des connexions sécurisées SSL. Pour empêcher l'accès non sécurisée par HTTP, ajoutez force_ssl = 1 à la section General dans votre fichier conf/config.ini.php de Matomo.
Géolocalisation: ⚠ Warning: La géolocalisation fonctionne, mais vous n'utilisez pas un des fournisseurs recommandés. Si vous devez importer des fichiers de journaux ou bien effectuer quoi que ce soit d'autre qui requiert la définition des adresses IP, utilisez l'implémentation PHP GeoIP l'extension maxminddb.
Mettre à jour via HTTPS: ✔
Traceur JavaScript inscriptible ("/matomo.js"): ✔
Heatmap & Session Recording Tracking: ⚠ Warning: Requesting '$MATOMO_URLplugins/HeatmapSessionRecording/configs.php?idsite=1&trackerid=5lX6EM&url=http%3A%2F%2Ftest.test%2F' resulted in an SSL error. Maybe you are using a self signed certificate? Please open the URL manually in a browser to see if the response contains 'Piwik.HeatmapSessionRecording'. If not, you might need to modify your server configuration as this file needs to be accessible via a browser from the Internet or Intranet.

Informational results

Matomo Version: 4.0.5
Matomo Update History: 4.0.5,
Matomo Install Version: 4.0.5
PHP_OS: Linux
PHP SAPI: apache2handler
Timezone Version: 0.system
PHP Timezone: UTC
PHP Time: 1614954151
PHP Datetime: 2021-03-05 14:22:31
PHP INI max_execution_time: 30
PHP INI post_max_size: 8M
PHP INI max_input_vars: 1000
PHP INI zlib.output_compression:
Curl Version: 7.29.0, NSS/3.44
Suhosin Installed: 0
DB Prefix: matomo_
DB Charset: utf8mb4
DB Adapter: PDO\MYSQL
MySQL Version: 5.7.33-enterprise-commercial-advanced-log
Num Tables: 69
Browser Segment Archiving Enabled: 1
Development Mode Enabled: 0
Internet Enabled: 1
Multi Server Environment: 0
Custom User Path: 0
Custom Include Path: 0
Plugins Activated: API, AbTesting 4.0.6, Actions, ActivityLog 4.0.2, Annotations, BulkTracking, Cohorts 4.0.2, Contents, CoreAdminHome, CoreConsole, CoreHome, CorePluginsAdmin, CoreUpdater, CoreVisualizations, CustomDimensions, CustomJsTracker, CustomReports 4.0.4, Dashboard, DevicePlugins, DevicesDetection, Diagnostics, Events, Feedback, FormAnalytics 4.0.2, Funnels 4.0.3, GeoIp2, Goals, Heartbeat, HeatmapSessionRecording 4.0.6, ImageGraph, Insights, Installation, Intl, IntranetMeasurable, LanguagesManager, Live, LoginLdap 4.3.0, MobileMessaging, Monolog, Morpheus, MultiSites, Overlay, PagePerformance, PrivacyManager, ProfessionalServices, Proxy, Referrers, Resolution, RollUpReporting 4.0.1, RssWidget, SEO, ScheduledReports, SearchEngineKeywordsPerformance 4.1.1, SegmentEditor, SitesManager, TagManager, Tour, Transitions, TwoFactorAuth, UserCountry, UserCountryMap, UserId, UserLanguage, UsersFlow 4.0.2, UsersManager, VisitFrequency, VisitTime, VisitorInterest, VisitsSummary, WebsiteMeasurable, Widgetize
Plugins Deactivated: CustomVariables, DBStats, Ecommerce, Login, LoginSaml 4.0.3, Marketplace (yes we had to deactivate it!), MediaAnalytics 4.0.2, MobileAppMeasurable, MultiChannelConversionAttribution 4.0.2, Provider, WhiteLabel 4.0.1, WooCommerceAnalytics 4.0.3
Plugins Invalid:
Server Info: Apache
Had visits in last 1 day: 1
Had visits in last 3 days: 1
Had visits in last 5 days: 1
Archive Time Last Started: 0
Archive Time Last Finished: 0
Num invalidations: 80 queued, 0 in progress
User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0
Browser Language: fr,fr-fr,en-us,en
Anonymize Referrer:
Do Not Track enabled: 0

@heurteph-ei heurteph-ei added the Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. label Mar 24, 2021
@heurteph-ei
Copy link
Author

Maybe related to #14664 ?

@diosmosis
Copy link
Member

Hi @heurteph-ei, segments that work on actions (including custom action dimensions), don't select only the actions that match the segment, but visits that have at least one action that matches the segment. Can you check whether the visits in the visitor log in this segment all have the action? If there are visits that don't have an action that matches, then there would be an issue.

@heurteph-ei
Copy link
Author

heurteph-ei commented Mar 25, 2021

Yes, I understand this point of view, BUT in this case, the segmented visit log:
image
should show this fact. Instead of, I got no result:
image

Note: you are right, the dimension value changed duting the single visit and took all values that are displayed in the widget, and no other values

@diosmosis
Copy link
Member

Hi, @heurteph-ei can you provide the exact steps you got to display the segmented visitor log like that? Are you able to reproduce this on https://demo.matomo.cloud ? If not, can you check if the same issue occurs on the latest Matomo version?

@heurteph-ei
Copy link
Author

Hi @diosmosis
Unfortunately, I can't test on a local latest version of Matomo. Also, I cannot add new segment to matomo.cloud, maybe you could do it for me?
For example, please add a segment on Page Author == 'Git-17388', then I'll be able to do a tracking on a page to reproduce the use case.

@heurteph-ei
Copy link
Author

Also, I tried several times to be tracked on divezone.net in order to appear on https://demo.matomo.cloud, but it never works. I even tried on my own computer, and on several browsers (Chrome, Edge, Firefox).
Note: on Firefox, I cannot disable the HTTP parameter DNT:1

@diosmosis
Copy link
Member

Hi @heurteph-ei, demo.matomo.cloud is a special instance that uses test data to showcase matomo features. It's not directly connected to divezone.net. I can add a segment for the page author today. Would you be able to post the tracking request(s) you'd use here in the meantime? It might help in reproducing locally.

@heurteph-ei
Copy link
Author

@diosmosis okay, I have pushed some tracking on matomo.cloud...
But the Page Author behaviour seems to encounter some problems... I'll wait untill tomorrow in order to check it the server needs time to update its data...

@heurteph-ei
Copy link
Author

OK, my data are well stored:
image
@diosmosis can you add the segment Page Author == 17388?

Anyway, on this sample, I enconter some strange behaviour:

Prerequisite

Behaviour > Page Author
Date: 31st of March, 2021 (day)
Segment: All visits

Step 1

Add segment comparison: Returning users
OK, the data seem good (as my tests were not done as returning users)
image

Step 2

Open the Segmented visits log of Page Author D2 for Segment Returning users. I expect getting no result, but this is not the case:
image
Also note that all these logs are not related to my "visit" at all, as I was just "tracked" via the HTTP API.

@diosmosis
Copy link
Member

@heurteph-ei
Copy link
Author

No worries :-)

Visits log

The Visitors > Visits Log works as expected on the new segment: The log displays the whole session in which the Page Author == Git-17388.

  • As soon as the user hits Page Author == Git-17388 during his session : I can see all events of the session.
  • If a user does not hit Page Author == Git-17388 during his session : The session is not visible at all.

Page Author behaviour

Now I have to wait a few time for the new segment to be processed by the system for Page Author ;-)
I won't be available until April, the 6th. So I'll continue my test on Tuesday :-)

Enjoy your weekend!

@heurteph-ei
Copy link
Author

Hello, I'm back
So the behaviour I encounter on premises the same as in the cloud.

Please have a look at: https://demo.matomo.cloud/index.php?module=CoreHome&action=index&idSite=1&period=day&date=yesterday#?idSite=1&period=day&date=2021-03-31&segment=dimension2%3D%3DGit-17388&category=General_Actions&subcategory=customdimension2

image

The table displays all the author values I encountered during the visit in which I saw the famous Page Author Git-17388. I don't know if it as an expected behaviour of Matomo, or if we expect to have only one single row: Git-17388.

But as the table displays 3 actions for Page Author Julien, when I click on the segmented visit log of Julien row, I should get some data instead of nothing:
image

@diosmosis diosmosis added Bug For errors / faults / flaws / inconsistencies etc. and removed Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. labels Apr 11, 2021
@diosmosis diosmosis changed the title Wrong data segmentation on action dimension Wrong data segmentation w/ segment w/ multiple action dimension conditions Apr 11, 2021
@diosmosis
Copy link
Member

Thanks @heurteph-ei for going through all the trouble of reproducing it on demo.matomo.cloud! It's certainly a bug. Seems to have to do with segments like dimension2==Git-17388;dimension2==another-author. I think it should be interpreted as "select visits that contain an action w/ dimension2 == Git-17388 AND an action w/ dimension2 == another-author", but it's clearly not doing anything.

@sgotre
Copy link

sgotre commented Nov 28, 2021

Dear @diosmosis I invested several hours on this bug to find a solution. First of all I can confirm it is a bug.

It was introduced with 48c9f2c which was despite this bug an awesome work by @tsteur 👍 🥇

The LogAggregator is using a temporary table to speed up calculation. Whilst this is super efficient it does not respect action based dimensions.

This is what the LogAggregator does:

  • retrieve all visit ids where there was at least one action matching the segment containing the action based dimension
  • Filter to only these visits when querying the results from the log_link_visit_action table.
  • The result contains not only actions matching the customDimension but instead contains ALL actions of these visits

Suggestion

While the use of a temporary table is very sufficient for visit based segments it might not be as sufficient for segments containing action based segments.
Filtering the log_link_visit_action table with custom_dimension_x = ? might reduce the resultset in the same way as log_link_visit_action.idvisit in tempTable.idvisit (with a join) does currenlty.
Do you agree with that @tsteur ?

Therefore I would suggest to not use a temporary table when the segment contains an action based dimension.

Proposed Code Change

Currently

public function generateQuery($select, $from, $where, $groupBy, $orderBy, $limit = 0, $offset = 0)
{
$segment = $this->segment;
$bind = $this->getGeneralQueryBindParams();
if (!$this->segment->isEmpty() && $this->isSegmentCacheEnabled()) {

Suggested Change:

    private function segmentContainsActionBasedDimensions()
    {
        if (strpos($this->segment->getString(), 'dimension') < 0) {
            // containing no dimension at all
            return false;
        }

        $segmentExpressionSql = $this->segment->getSegmentExpression()->getSql();
        // the sql will contain the log_link_visit_action table name if the dimension is action based
        return strpos($segmentExpressionSql['where'], 'log_link_visit_action.') >= 0;
    }

    public function generateQuery($select, $from, $where, $groupBy, $orderBy, $limit = 0, $offset = 0)
    {
        $segment = $this->segment;
        $bind = $this->getGeneralQueryBindParams();

        // if segment is based on a custom dimension which is action based do not use TMP table
        if (!$this->segment->isEmpty() && $this->isSegmentCacheEnabled() && !$this->segmentContainsActionBasedDimensions()) {

If anybody uses this code on her website: make sure to invalidate all reports, since the results get archived/cached.

I am willing to provide a PR, but was struggling to create a proper Unit Test.
My company would pay someone to get a fix into Matomo Core, so please leave a comment if you are interested

Thank you for your time reading this comment 😄 ❤️

@tsteur
Copy link
Member

tsteur commented Nov 29, 2021

@sgotre if I understand it right you are maybe after this feature: #15438

from my perspective it works correct, also for actions just like segmentations should. The behaviour you are maybe after are "Filters": #15438

Existing workarounds for previous behaviour would be to set [General] enable_segments_cache = 0 in the config file or to use our Custom Reports feature.

Let me know if that's not what you are after regarding the Filters feature.

@sgotre
Copy link

sgotre commented Nov 29, 2021

@tsteur thanks for your super fast reply. You are right, we all have been trying to use segments as filters.

Disabling the segment cache has a huge performance impact, right? With about 100 Mio hits per month this might not be the best solution.

We will use custom reports instead.

For all wondering why @tsteur is right: Segments are by definition visit based. Filters are by definition action based.

Segment – For a segment, every visit is checked to see if it satisfies the conditions of the segment. For sessions that satisfy the condition, all rows are returned. For sessions that do not, no rows are returned.

Filter – For a filter all the rows for ALL visits are considered, then only the rows that satisfy the conditions of the filter will be returned.

Source

I think you can close this issue in favor of #15438
@tsteur this bug is not a duplicate, as it originally reported another bug as i was assuming. See this comment for root cause description of this reported UI issue

@tsteur tsteur added duplicate For issues that already existed in our issue tracker and were reported previously. Bug For errors / faults / flaws / inconsistencies etc. and removed Bug For errors / faults / flaws / inconsistencies etc. duplicate For issues that already existed in our issue tracker and were reported previously. labels Nov 29, 2021
@heurteph-ei
Copy link
Author

@sgotre I think this cannot be closed as the segment on 2 action dimensions doesn't work (I am not in the configuration of a filter)

@sgotre
Copy link

sgotre commented Nov 29, 2021

@heurteph-ei you can set the global config [General] enable_segments_cache = 0 to make segments work like filters. The behaviour we are seeing here ist epected by design: **Segments are by definition visit based. **

@heurteph-ei
Copy link
Author

@sgotre I don't want to use filter but segment.
See #17388 (comment)
If you segment 2 values on an action dimension, the segment doesn't work.
In the example, I wanted all the visits where users set dimension2 to Git-17388 during a part of its visit, and set dimension2 to Julien during another part of its visit...

@sgotre
Copy link

sgotre commented Nov 29, 2021

@heurteph-ei You are right, segmentation works as you expected but its not possible to show the segmented visit logs. seems like the query here is using segments as filters. I'll check the sql to see where this error occurs.

@sgotre
Copy link

sgotre commented Nov 29, 2021

@heurteph-ei this is NOT a SQL issue but an UI issue.

When requesting the segmented visit logs the URL is

https://demo.matomo.cloud/index.php?segment=dimension2==Git-17388;dimension2==D2;actionUrl=$divezone.net%2Fheurteph%2Ftest&date=2021-03-31&module=Live&action=indexVisitorLog&disableLink=1&small=1&enableAddNewSegment=1&period=day&idDimension=2&idSubtable=4&idSite=1

Please notice segment=dimension2==Git-17388;dimension2==D2;
=> resulting query is: "give me all visits having actions where action.dimension2 is Git-17388 AND action.dimension2 is D2'
=> result is of course empty

Query Executed:

SELECT log_visit.*
FROM matomo_log_visit AS log_visit
         LEFT JOIN matomo_log_link_visit_action AS log_link_visit_action
                   ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE (log_visit.idsite in (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ?)
  AND 
    (log_link_visit_action.custom_dimension_3 = ? 
      AND log_link_visit_action.custom_dimension_3 = ?  /** <---- this is not intended*/
    AND
       ((log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 1))) OR
        (log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 3))) OR
        (log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 2))) OR
        (log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 10)))))
GROUP BY log_visit.idvisit
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12

This would be correct if segment where filters.

Correct URL

Instead the URL should contain this segment segment=dimension2==Git-17388;dimension2==Git-17388,dimension2==D2

=> resulting query: "give me all visits that had action.dimension2=Git-17388 AND (dimension2=Git 17388 OR dimension2==D2)

Query Executed:

SELECT log_visit.*
FROM matomo_log_visit AS log_visit
         LEFT JOIN matomo_log_link_visit_action AS log_link_visit_action
                   ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE (log_visit.idsite in (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ?)
  AND (log_link_visit_action.custom_dimension_3 = ? AND
       (log_link_visit_action.custom_dimension_3 = ? OR log_link_visit_action.custom_dimension_3 = ?) /** <-- thats it */
    AND
       ((log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 1))) OR
        (log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 3))) OR
        (log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 2))) OR
        (log_link_visit_action.idaction_url IN
         (SELECT idaction FROM matomo_log_action WHERE (name LIKE CONCAT('%', ?) AND type = 10)))))
GROUP BY log_visit.idvisit
ORDER BY log_visit.idsite DESC, log_visit.visit_last_action_time DESC
LIMIT 0, 12;

When the global option enable_segments_cache is 1 the URL is correct (but it will never be called in this way since such a result cannot appear in a report)
If the segment cache is enabled the generated segment ist wrong and the segment part of this URL should be changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

5 participants