@tsteur opened this Issue on September 21st 2021 Member

I'm meaning this query https://github.com/matomo-org/matomo/blob/4.5.0-b2/plugins/Ecommerce/VisitorDetails.php#L154-L169 which looks like this

SELECT COALESCE(SUM(ROUND(revenue,2)), 0) as lifeTimeRevenue, COUNT(*) as lifeTimeConversions,COALESCE(SUM(ROUND(items,2)), 0) as lifeTimeEcommerceItems FROM log_visit AS log_visit LEFT JOIN log_conversion AS log_conversion ON log_visit.idvisit = log_conversion.idvisit WHERE log_visit.idsite = '2' AND log_visit.idvisitor = 'VètO' AND log_conversion.idgoal = 0

At first glance the query looks good as it uses indexes etc.

This is usually also the case
image

However, things change when one visitor has hundreds or thousands of visits. Then it looks like this:

image

Here is more explain format=json information:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "365.16"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "log_conversion",
          "access_type": "ALL",
          "possible_keys": [
            "PRIMARY"
          ],
          "rows_examined_per_scan": 738,
          "rows_produced_per_join": 73,
          "filtered": "10.00",
          "cost_info": {
            "read_cost": "261.84",
            "eval_cost": "14.76",
            "prefix_cost": "276.60",
            "data_read_per_join": "2M"
          },
          "used_columns": [
            "idvisit",
            "idgoal",
            "items",
            "revenue"
          ],
          "attached_condition": "((`test`.`log_conversion`.`idgoal` = 0) and (`test`.`log_conversion`.`idvisit` is not null))"
        }
      },
      {
        "table": {
          "table_name": "log_visit",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY",
            "index_idsite_config_datetime",
            "index_idsite_datetime",
            "index_idsite_idvisitor"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "idvisit"
          ],
          "key_length": "8",
          "ref": [
            "test.log_conversion.idvisit"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 3,
          "filtered": "5.00",
          "cost_info": {
            "read_cost": "73.80",
            "eval_cost": "0.74",
            "prefix_cost": "365.16",
            "data_read_per_join": "113K"
          },
          "used_columns": [
            "idvisit",
            "idsite",
            "idvisitor"
          ],
          "attached_condition": "((`test`.`log_visit`.`idvisitor` = (@`id`)) and (`test`.`log_visit`.`idsite` = '2'))"
        }
      }
    ]
  }
}

FYI Overall, in that specific case, there is actually no entry that matches idsite = 2 and idgoal = 0 in log_conversion.

refs https://github.com/matomo-org/matomo/issues/16904

Not sure if we can improve this query, or maybe not execute it in the first place. Like if we are requesting the data for only one site, we could eg check if ecommerce is enabled for that site. In this specific example where we had this problem it would have fixed it. Maybe we can also find a way though to further improve this.

For more details, or if you want me to test queries please ping me.

To summarise the todo list

@tsteur commented on September 21st 2021 Member

what seems to have fixed this for me was

    SELECT /*+ JOIN_FIXED_ORDER() */ COALESCE(SUM(ROUND(revenue,2)), 0)  ...

or also using STRAIGHT_JOIN vs left join but not sure if there's any issue with that .

image

It changes the plan to

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4155.07"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "log_visit",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "index_idsite_config_datetime",
            "index_idsite_datetime",
            "index_idsite_idvisitor"
          ],
          "key": "index_idsite_idvisitor",
          "used_key_parts": [
            "idsite",
            "idvisitor"
          ],
          "key_length": "12",
          "ref": [
            "const",
            "const"
          ],
          "rows_examined_per_scan": 2798,
          "rows_produced_per_join": 2798,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "7.82",
            "eval_cost": "559.60",
            "prefix_cost": "567.42",
            "data_read_per_join": "84M"
          },
          "used_columns": [
            "idvisit",
            "idsite",
            "idvisitor"
          ],
          "attached_condition": "(`test`.`log_visit`.`idvisitor` = (@`id`))"
        }
      },
      {
        "table": {
          "table_name": "log_conversion",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "idvisit",
            "idgoal"
          ],
          "key_length": "12",
          "ref": [
            "test.log_visit.idvisit",
            "const"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 3948,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "2798.00",
            "eval_cost": "789.65",
            "prefix_cost": "4155.07",
            "data_read_per_join": "144M"
          },
          "used_columns": [
            "idvisit",
            "idgoal",
            "items",
            "revenue"
          ]
        }
      }
    ]
  }
}
@tsteur commented on September 21st 2021 Member

image

Below some performance tests. Seems STRAIGHT_JOIN (last 5 entries) performed fastest vs the other ones maybe actually weren't faster.

@samjf commented on September 21st 2021 Contributor

@tsteur Two ideas came to mind here. I can't verify if either will work without experimentation, but just something else to consider.

  1. Potentially there could be a composite index on idsite and idvisitor (in the order that best reduces the set) for the log_visit table.
  2. Potentially, and this might not help the query as a singular, but here we see the query is run again with different options. I'm wondering if it could be sped up by using a prepared statement?
@tsteur commented on September 21st 2021 Member

@samjf there is already such an index on the log_visit table, but because it matches so many rows (like 1000) it seems to prefer a different strategy and not use it.

Didn't realise we execute it twice 👍 It should be using a prepared statement already AFAIK although the idgoal could be a parameter too. Generally though it's interesting we could combine those two queries into one as it shouldn't make it slower like

SELECT log_conversion.idgoal, ... WHERE ... (log_conversion.idgoal = 0 or log_conversion.idgoal = -1) group by  log_conversion.idgoal
@samjf commented on October 5th 2021 Contributor

Something to be careful of when we test this:

SQL_NO_CACHE

The server does not use the query cache. It neither checks the query cache to see whether the result is already cached, nor does it cache the query result. (Due to a limitation in the parser, a space character must precede and follow the SQL_NO_CACHE keyword; a nonspace such as a newline causes the server to check the query cache to see whether the result is already cached.)

https://dev.mysql.com/doc/refman/5.6/en/query-cache-in-select.html

Interestingly "a space character must precede and follow the SQL_NO_CACHE keyword"

@tsteur commented on October 5th 2021 Member

@samjf that's good to know! could you mention this on https://developer.matomo.org/guides/profiling-code#profiling-a-specific-query where we talk about SQL_NO_CACHE? There is an edit button on the bottom right which allows you to edit the page using markdown. You can edit it directly in the Github UI. Then you can create a PR from this change by setting a branch name
image

@samjf commented on October 5th 2021 Contributor

@tsteur Sure thing! That is a good idea. I'll put a notice there too.

This Issue was closed on October 11th 2021
Powered by GitHub Issue Mirror