Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise ecommerce life time metric query #18036

Closed
4 tasks
tsteur opened this issue Sep 21, 2021 · 7 comments · Fixed by #18097
Closed
4 tasks

Optimise ecommerce life time metric query #18036

tsteur opened this issue Sep 21, 2021 · 7 comments · Fixed by #18097
Assignees
Labels
c: Performance For when we could improve the performance / speed of Matomo. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Milestone

Comments

@tsteur
Copy link
Member

tsteur commented Sep 21, 2021

I'm meaning this query https://github.com/matomo-org/matomo/blob/4.5.0-b2/plugins/Ecommerce/VisitorDetails.php#L154-L169 which looks like this

SELECT COALESCE(SUM(ROUND(revenue,2)), 0) as lifeTimeRevenue, COUNT(*) as lifeTimeConversions,COALESCE(SUM(ROUND(items,2)), 0) as lifeTimeEcommerceItems FROM log_visit AS log_visit LEFT JOIN log_conversion AS log_conversion ON log_visit.idvisit = log_conversion.idvisit WHERE log_visit.idsite = '2' AND log_visit.idvisitor = 'Vè�tO' AND log_conversion.idgoal = 0

At first glance the query looks good as it uses indexes etc.

This is usually also the case
image

However, things change when one visitor has hundreds or thousands of visits. Then it looks like this:

image

Here is more explain format=json information:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "365.16"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "log_conversion",
          "access_type": "ALL",
          "possible_keys": [
            "PRIMARY"
          ],
          "rows_examined_per_scan": 738,
          "rows_produced_per_join": 73,
          "filtered": "10.00",
          "cost_info": {
            "read_cost": "261.84",
            "eval_cost": "14.76",
            "prefix_cost": "276.60",
            "data_read_per_join": "2M"
          },
          "used_columns": [
            "idvisit",
            "idgoal",
            "items",
            "revenue"
          ],
          "attached_condition": "((`test`.`log_conversion`.`idgoal` = 0) and (`test`.`log_conversion`.`idvisit` is not null))"
        }
      },
      {
        "table": {
          "table_name": "log_visit",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY",
            "index_idsite_config_datetime",
            "index_idsite_datetime",
            "index_idsite_idvisitor"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "idvisit"
          ],
          "key_length": "8",
          "ref": [
            "test.log_conversion.idvisit"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 3,
          "filtered": "5.00",
          "cost_info": {
            "read_cost": "73.80",
            "eval_cost": "0.74",
            "prefix_cost": "365.16",
            "data_read_per_join": "113K"
          },
          "used_columns": [
            "idvisit",
            "idsite",
            "idvisitor"
          ],
          "attached_condition": "((`test`.`log_visit`.`idvisitor` = (@`id`)) and (`test`.`log_visit`.`idsite` = '2'))"
        }
      }
    ]
  }
}

FYI Overall, in that specific case, there is actually no entry that matches idsite = 2 and idgoal = 0 in log_conversion.

refs #16904

Not sure if we can improve this query, or maybe not execute it in the first place. Like if we are requesting the data for only one site, we could eg check if ecommerce is enabled for that site. In this specific example where we had this problem it would have fixed it. Maybe we can also find a way though to further improve this.

For more details, or if you want me to test queries please ping me.

To summarise the todo list

  • Improve query performance
  • Only execute query if ecommerce feature is enabled
  • We should also check if there are few other similar queries that query by where idvisitor = ? and join other tables
  • Group the two queries in
    $sql = $this->getSqlEcommerceConversionsLifeTimeMetricsForIdGoal(GoalManager::IDGOAL_ORDER);
    $ecommerceOrders = $this->getDb()->fetchRow($sql, array($idSite, @Common::hex2bin($idVisitor)));
    $sql = $this->getSqlEcommerceConversionsLifeTimeMetricsForIdGoal(GoalManager::IDGOAL_CART);
    $abandonedCarts = $this->getDb()->fetchRow($sql, array($idSite, @Common::hex2bin($idVisitor)));
    into one see Optimise ecommerce life time metric query #18036 (comment)
@tsteur tsteur added Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. c: Performance For when we could improve the performance / speed of Matomo. labels Sep 21, 2021
@tsteur tsteur added this to the 4.6.0 milestone Sep 21, 2021
@tsteur
Copy link
Member Author

tsteur commented Sep 21, 2021

what seems to have fixed this for me was

	SELECT /*+ JOIN_FIXED_ORDER() */ COALESCE(SUM(ROUND(revenue,2)), 0)  ...

or also using STRAIGHT_JOIN vs left join but not sure if there's any issue with that .

image

It changes the plan to

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4155.07"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "log_visit",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "index_idsite_config_datetime",
            "index_idsite_datetime",
            "index_idsite_idvisitor"
          ],
          "key": "index_idsite_idvisitor",
          "used_key_parts": [
            "idsite",
            "idvisitor"
          ],
          "key_length": "12",
          "ref": [
            "const",
            "const"
          ],
          "rows_examined_per_scan": 2798,
          "rows_produced_per_join": 2798,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "7.82",
            "eval_cost": "559.60",
            "prefix_cost": "567.42",
            "data_read_per_join": "84M"
          },
          "used_columns": [
            "idvisit",
            "idsite",
            "idvisitor"
          ],
          "attached_condition": "(`test`.`log_visit`.`idvisitor` = (@`id`))"
        }
      },
      {
        "table": {
          "table_name": "log_conversion",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "idvisit",
            "idgoal"
          ],
          "key_length": "12",
          "ref": [
            "test.log_visit.idvisit",
            "const"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 3948,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "2798.00",
            "eval_cost": "789.65",
            "prefix_cost": "4155.07",
            "data_read_per_join": "144M"
          },
          "used_columns": [
            "idvisit",
            "idgoal",
            "items",
            "revenue"
          ]
        }
      }
    ]
  }
}

@tsteur
Copy link
Member Author

tsteur commented Sep 21, 2021

image

Below some performance tests. Seems STRAIGHT_JOIN (last 5 entries) performed fastest vs the other ones maybe actually weren't faster.

@tsteur tsteur modified the milestones: 4.6.0, 4.7.0 Sep 21, 2021
@samjf
Copy link
Contributor

samjf commented Sep 21, 2021

@tsteur Two ideas came to mind here. I can't verify if either will work without experimentation, but just something else to consider.

  1. Potentially there could be a composite index on idsite and idvisitor (in the order that best reduces the set) for the log_visit table.
  2. Potentially, and this might not help the query as a singular, but here we see the query is run again with different options. I'm wondering if it could be sped up by using a prepared statement?

@tsteur
Copy link
Member Author

tsteur commented Sep 21, 2021

@samjf there is already such an index on the log_visit table, but because it matches so many rows (like 1000) it seems to prefer a different strategy and not use it.

Didn't realise we execute it twice 👍 It should be using a prepared statement already AFAIK although the idgoal could be a parameter too. Generally though it's interesting we could combine those two queries into one as it shouldn't make it slower like

SELECT log_conversion.idgoal, ... WHERE ... (log_conversion.idgoal = 0 or log_conversion.idgoal = -1) group by  log_conversion.idgoal

@samjf
Copy link
Contributor

samjf commented Oct 5, 2021

Something to be careful of when we test this:

SQL_NO_CACHE

The server does not use the query cache. It neither checks the query cache to see whether the result is already cached, nor does it cache the query result. (Due to a limitation in the parser, a space character must precede and follow the SQL_NO_CACHE keyword; a nonspace such as a newline causes the server to check the query cache to see whether the result is already cached.)

https://dev.mysql.com/doc/refman/5.6/en/query-cache-in-select.html

Interestingly "a space character must precede and follow the SQL_NO_CACHE keyword"

@tsteur
Copy link
Member Author

tsteur commented Oct 5, 2021

@samjf that's good to know! could you mention this on https://developer.matomo.org/guides/profiling-code#profiling-a-specific-query where we talk about SQL_NO_CACHE? There is an edit button on the bottom right which allows you to edit the page using markdown. You can edit it directly in the Github UI. Then you can create a PR from this change by setting a branch name
image

@samjf
Copy link
Contributor

samjf commented Oct 5, 2021

@tsteur Sure thing! That is a good idea. I'll put a notice there too.

@justinvelluppillai justinvelluppillai changed the title SQL query to calculate life time metric can be slow Optimise ecommerce life time metric query Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants