
Improve performance of some archiving queries #18797

Merged
merged 1 commit into from Feb 21, 2022
Conversation

tsteur
Member

@tsteur tsteur commented Feb 15, 2022

Description:

We had an issue where a custom reports query took longer than 45 minutes to execute every single time, so archiving was aborted.

I noticed there wasn't all that much data on the instance (log_visit 5.5M rows, log_link_visit_action 15M, log_action 300K). The query without a segment usually runs very fast, and with a segment applied on its own it runs fast as well. But once we join it with the temporary segment table, it becomes VERY slow and aborts after 45 minutes. After adding the primary key, the query executed fast (within one minute).

I'm not sure why I didn't add the primary key initially. I assume I thought it wouldn't be needed, since the index would contain the same information as the table itself, and skipping an unneeded index is faster. Seeing this case, however, it's clearly faster to have the index. I haven't tested any other query, but I would assume some other archive queries with segments would get faster as well.
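In short, the change is to declare `idvisit` as the primary key when creating the temporary segment table. A minimal sketch of the before/after definitions (using the table name from the reproduction queries below):

```sql
-- Before: temporary table with no index; every join probe
-- against it has to scan all of its rows.
CREATE TEMPORARY TABLE logtmpsegment010ccc0a9e16636173df1de1bfc6a263 (
    idvisit BIGINT(10) UNSIGNED NOT NULL
);

-- After: the primary key lets MySQL look up idvisit directly
-- when joining log_visit / log_link_visit_action against it.
CREATE TEMPORARY TABLE logtmpsegment010ccc0a9e16636173df1de1bfc6a263 (
    idvisit BIGINT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (`idvisit`)
);
```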

Queries I used to reproduce

```sql
-- drop temporary table logtmpsegment010ccc0a9e16636173df1de1bfc6a263;
create temporary table logtmpsegment010ccc0a9e16636173df1de1bfc6a263 (idvisit BIGINT(10) UNSIGNED NOT NULL);
-- create temporary table logtmpsegment010ccc0a9e16636173df1de1bfc6a263 (idvisit BIGINT(10) UNSIGNED NOT NULL, PRIMARY KEY (`idvisit`));

insert into logtmpsegment010ccc0a9e16636173df1de1bfc6a263 (idvisit)
select idvisit from log_visit
where idsite = 1
  and visit_last_action_time >= '2022-02-06 23:00:00'
  and visit_last_action_time < '2022-02-07 23:00:00';

SELECT SQL_NO_CACHE
    CASE
        WHEN counter = 50001 THEN '__mtm_ranking_query_others__'
        ELSE `CoreHome.VisitLastActionDayOfMonth`
    END AS `CoreHome.VisitLastActionDayOfMonth`,
    `sum_events_totalevents`
FROM (
    SELECT
        `CoreHome.VisitLastActionDayOfMonth`,
        CASE
            WHEN @counter = 50001 THEN 50001
            ELSE @counter := @counter + 1
        END AS counter,
        `sum_events_totalevents`
    FROM
        (SELECT @counter := 0) initCounter,
        (
            SELECT
                DAYOFMONTH(log_inner.visit_last_action_time) AS 'CoreHome.VisitLastActionDayOfMonth',
                sum(log_inner.visit_total_events) AS 'sum_events_totalevents'
            FROM (
                SELECT
                    log_visit.visit_last_action_time,
                    log_visit.visit_total_events
                FROM logtmpsegment010ccc0a9e16636173df1de1bfc6a263
                INNER JOIN log_visit AS log_visit
                    ON log_visit.idvisit = logtmpsegment010ccc0a9e16636173df1de1bfc6a263.idvisit
                RIGHT JOIN log_link_visit_action AS log_link_visit_action
                    ON logtmpsegment010ccc0a9e16636173df1de1bfc6a263.idvisit = log_link_visit_action.idvisit
                LEFT JOIN log_action AS log_action_idaction_event_category
                    ON log_link_visit_action.idaction_event_category = log_action_idaction_event_category.idaction
                WHERE
                    log_action_idaction_event_category.type = "10"
                    AND log_link_visit_action.idsite = 1
                GROUP BY
                    DAYOFMONTH(log_visit.visit_last_action_time), `log_visit`.`idvisit`
                ORDER BY NULL
            ) AS log_inner
            GROUP BY DAYOFMONTH(log_inner.visit_last_action_time)
            ORDER BY sum_events_totalevents
        ) actualQuery
    ) AS withCounter
GROUP BY counter
```

Explain when the primary key is there (query fast)

[screenshot of EXPLAIN output]

Explain when the primary key is not there (query slow)

Notice the 56,497,021,392 rows.

[screenshot of EXPLAIN output]
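The two plans can be regenerated locally by creating the temporary table each way (with and without the primary key) and prefixing the inner join of the ranking query with `EXPLAIN`; a minimal sketch:

```sql
-- Run once per table definition and compare the estimated
-- row counts MySQL reports for each join step.
EXPLAIN
SELECT log_visit.visit_last_action_time, log_visit.visit_total_events
FROM logtmpsegment010ccc0a9e16636173df1de1bfc6a263
INNER JOIN log_visit AS log_visit
    ON log_visit.idvisit = logtmpsegment010ccc0a9e16636173df1de1bfc6a263.idvisit
RIGHT JOIN log_link_visit_action AS log_link_visit_action
    ON logtmpsegment010ccc0a9e16636173df1de1bfc6a263.idvisit = log_link_visit_action.idvisit
LEFT JOIN log_action AS log_action_idaction_event_category
    ON log_link_visit_action.idaction_event_category = log_action_idaction_event_category.idaction
WHERE log_action_idaction_event_category.type = "10"
  AND log_link_visit_action.idsite = 1;
```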

Performance change for insert query

The insert into logtmpsegment010ccc0a9e16636173df1de1bfc6a263 showed no big performance difference with or without the index for this amount of data, although it was inserting only 160K visits. The difference was around 15%: without the index the query usually took around 1.1 seconds, with the index around 1.3 seconds.
This is probably why we didn't add the index initially, to save that 15%.

On a different table where 450K visits were inserted, it weirdly always took 3.1 seconds, no matter whether the index was there or not. On a result set of 2M visits (rows), the index slowed the insert down from 9.3 seconds to 9.8 seconds.

I executed each query many times to get an average.

Generally, with segments the result sets are often smaller, since the segment filters out many visits, so I don't think this is a real issue.
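The insert timings above can be reproduced with a sketch like the following, run once with and once without the primary key (the mysql command-line client prints the wall-clock time per statement; `logtmp_bench` is a hypothetical table name used for the benchmark):

```sql
-- Variant A: no index. For variant B, uncomment the PRIMARY KEY line.
DROP TEMPORARY TABLE IF EXISTS logtmp_bench;
CREATE TEMPORARY TABLE logtmp_bench (
    idvisit BIGINT(10) UNSIGNED NOT NULL
    -- , PRIMARY KEY (`idvisit`)
);

-- Time this statement in each variant; widen the date range to
-- vary the number of inserted rows (160K / 450K / 2M in the tests above).
INSERT INTO logtmp_bench (idvisit)
SELECT idvisit FROM log_visit
WHERE idsite = 1
  AND visit_last_action_time >= '2022-02-06 23:00:00'
  AND visit_last_action_time <  '2022-02-07 23:00:00';
```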

Review

@tsteur tsteur added the Needs Review PRs that need a code review label Feb 15, 2022
@tsteur tsteur added this to the 4.8.0 milestone Feb 15, 2022
Member

@sgiehl sgiehl left a comment


Logically that makes sense. It's hard to say whether it might cause issues somewhere. But I guess if we experience any other issue with the new index on cloud later, we can always consider changing or improving it again...

@tsteur
Member Author

tsteur commented Feb 17, 2022

FYI I will keep monitoring how the CPU usage changes based on this change and merge next week.

@tsteur
Member Author

tsteur commented Feb 17, 2022

So far it looks like the CPU decreases by 50%.
