Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removed order by idsite for Model #9684

Conversation

sebastianpiskorski
Copy link
Contributor

It prevented display of visits to be ordered by most important dimension - Time when MetaSites plugin was enabled.

It prevented display of visits to be ordered by most important dimension - Time when MetaSites plugin was enabled.
@tsteur
Copy link
Member

tsteur commented Feb 2, 2016

Can we wait with this one for 2.16.1 maybe? Not sure if we might introduce any performance regressions and should maybe beta test first.

FYI: We talked about idSite also here but it's not really related #9524

@sebastianpiskorski
Copy link
Contributor Author

I've checked this query with EXPLAIN EXTENDED and it shouldn't have any issue on performance. Looks like ORDER BY doesn't use the an index.

@tsteur
Copy link
Member

tsteur commented Feb 2, 2016

We should maybe run it with various database sizes. Did you check it with a big instance having many sites, many requests etc?

@tsteur
Copy link
Member

tsteur commented Feb 2, 2016

Might be still good as it's actually in the where clause.

@sebastianpiskorski
Copy link
Contributor Author

Yes, it is still in the WHERE clause. And I've only checked EXPLAIN, but I think this is fair enough as the algorithm execution doesn't change.

@mattab
Copy link
Member

mattab commented Feb 2, 2016

think this is fair enough as the algorithm execution doesn't change.

the algorithms change very much based on the dataset, ie. mysql will choose to use an INDEX or not depending on metrics processed in real time (and pre-aggregated statistics stored in the table statistics eg. cardinality of keys) by the mysql optimiser...

@sebastianpiskorski
Copy link
Contributor Author

@mattab @tsteur We have tested queries on large database 17GB (41717299 rows total)

And query without idsite in GROUP BY is twice as fast for large datasets. But it shows only for really large data sets. We had to obtain data set of size 5374869 to see the difference. It was for month period.

Here are EXPLAIN results:
No idsite:

SELECT sub.​* FROM ( SELECT log_visit.*​ FROM piwik_log_visit AS log_visit WHERE log_visit.idsite in ('1') AND log_visit.visit_last_action_time >= '2015-11-01 00:00:00' AND log_visit.visit_last_action_time <= '2015-12-01 00:00:00' ORDER BY  visit_last_action_time DESC LIMIT 15, 100) AS sub GROUP BY sub.idvisit ORDER BY sub.visit_last_action_time DESC ;

EXPLAIN:

+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+---------------------------------+
| id   | select_type | table      | type | possible_keys                                                             | key                   | key_len | ref   | rows     | Extra                           |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+---------------------------------+
|    1 | PRIMARY     | <derived2> | ALL  | NULL                                                                      | NULL                  | NULL    | NULL  |      115 | Using temporary; Using filesort |
|    2 | DERIVED     | log_visit  | ref  | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_datetime | 4       | const | 17030982 | Using where                     |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+---------------------------------+
100 rows in set (4 min 26.13 sec)

With idsite:

SELECT sub.​* FROM ( SELECT log_visit.*​ FROM piwik_log_visit AS log_visit WHERE log_visit.idsite in ('1') AND log_visit.visit_last_action_time >= '2015-11-01 00:00:00' AND log_visit.visit_last_action_time <= '2015-12-01 00:00:00' ORDER BY idsite, visit_last_action_time DESC LIMIT 15, 100) AS sub GROUP BY sub.idvisit ORDER BY sub.visit_last_action_time DESC ;

EXPLAIN:

+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+----------------------------------------------------+
| id   | select_type | table      | type | possible_keys                                                             | key                   | key_len | ref   | rows     | Extra                                              |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+----------------------------------------------------+
|    1 | PRIMARY     | <derived2> | ALL  | NULL                                                                      | NULL                  | NULL    | NULL  |      115 | Using temporary; Using filesort                    |
|    2 | DERIVED     | log_visit  | ref  | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_datetime | 4       | const | 17030951 | Using index condition; Using where; Using filesort |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+----------------------------------------------------+
100 rows in set (8 min 11.95 sec)

So I guess it will have impact only on reaaaaally big clients and it will be a positive one :)

@tsteur
Copy link
Member

tsteur commented Feb 4, 2016

17GB is not really a large database :) can you also try what happens when there are many sites given in the IN like it would be done by metasites?

@tsteur
Copy link
Member

tsteur commented Feb 4, 2016

When you did performance tests, did you clear the MySQL caches between the tests?

@sebastianpiskorski
Copy link
Contributor Author

Yes, the cache was cleared, beside test without idsite was running first.

@tsteur
Copy link
Member

tsteur commented Feb 4, 2016

Idea in order to have it in 2.16.0 is following: When only one idSite is given, still leave the idSite in the order by clause. When multiple sites are given, remove the idSite. This way we make sure not to regress anything for most users. Do you know what I mean?

@tsteur
Copy link
Member

tsteur commented Feb 4, 2016

Let me know if you can't do it now and I will issue the PR

@tsteur
Copy link
Member

tsteur commented Feb 4, 2016

See #9694

We will try to merge this one in 3.0 or 2.16.X

@sebastianpiskorski
Copy link
Contributor Author

17GB this was the size of log_visit table only. I have written it in the wrong way. I will deliver results on bigger database today.

@mattab mattab changed the title PPCDEV-379 removed order by idsite for Model removed order by idsite for Model Jun 8, 2016
@tsteur
Copy link
Member

tsteur commented Aug 30, 2016

@mattab do you have any thoughts on this? The index for idsite, time should be used for the where clause. It would be better if it could use the index for the order by as well as it has to sort all rows for the limit to work. However, I'm pretty sure that the index was not used before as @sebastianpiskorski because MySQL does not use the index for "ORDER BY" queries when ASC and DESC is mixed see http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html . Current order by is ORDER BY idsite, visit_last_action_time DESC in most of the cases. However, if you sort the visitor log the other way around it becomes ORDER BY idsite, visit_last_action_time ASC and it will use the index. @sebastianpiskorski can you do a test when sorting the time with ASC?

If we do not merge this and keep the idsite in there, we could actually improve the performance of this query in most cases by using ORDER BY idsite $filterSortOrder , visit_last_action_time $filterSortOrder.

@tsteur
Copy link
Member

tsteur commented Aug 30, 2016

If w do not merge this we should really create a follow up issue to improve the performance of that query as mentioned. If we merge this PR it needs a rebase

@tsteur tsteur changed the base branch from master to 2.x-dev September 1, 2016 06:39
@mattab mattab changed the base branch from 2.x-dev to 3.x-dev September 23, 2016 04:06
@mattab
Copy link
Member

mattab commented Sep 23, 2016

Hi @sebastianpiskorski
Could you please re-create this PR against 3.x-dev and resolve the merge conflicts? it would be good to have and would be happy to merge it 👍 I'm closing for now as if you don't re-create it we won't work on it ourselves. Thanks!

@mattab mattab closed this Sep 23, 2016
@tsteur
Copy link
Member

tsteur commented Sep 23, 2016

@matt let's maybe create an issue to speed the query up #9684 (comment) Will create an issue for it next week

@mattab
Copy link
Member

mattab commented Sep 26, 2016

@tsteur created #10567

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants