@sebastianpiskorski opened this Pull Request on February 2nd 2016 Contributor

It prevented display of visits to be ordered by most important dimension - Time when MetaSites plugin was enabled.

@tsteur commented on February 2nd 2016 Member

Can we wait with this one for 2.16.1 maybe? Not sure if we might introduce any performance regressions and should maybe beta test first.

FYI: We talked about idSite also here but it's not really related https://github.com/piwik/piwik/issues/9524

@sebastianpiskorski commented on February 2nd 2016 Contributor

I've checked this query with EXPLAIN EXTENDED and it shouldn't have any issue on performance. Looks like ORDER BY doesn't use the an index.

@tsteur commented on February 2nd 2016 Member

We should maybe run it with various database sizes. Did you check it with a big instance having many sites, many requests etc?

@tsteur commented on February 2nd 2016 Member

Might be still good as it's actually in the where clause.

@sebastianpiskorski commented on February 2nd 2016 Contributor

Yes, it is still in the WHERE clause. And I've only checked EXPLAIN, but I think this is fair enough as the algorithm execution doesn't change.

@mattab commented on February 2nd 2016 Member

think this is fair enough as the algorithm execution doesn't change.

the algorithms change very much based on the dataset, ie. mysql will choose to use an INDEX or not depending on metrics processed in real time (and pre-aggregated statistics stored in the table statistics eg. cardinality of keys) by the mysql optimiser...

@sebastianpiskorski commented on February 3rd 2016 Contributor

@mattab @tsteur We have tested queries on large database 17GB (41717299 rows total)

And query without idsite in GROUP BY is twice as fast for large datasets. But it shows only for really large data sets. We had to obtain data set of size 5374869 to see the difference. It was for month period.

Here are EXPLAIN results:
No idsite:

SELECT sub.​* FROM ( SELECT log_visit.*​ FROM piwik_log_visit AS log_visit WHERE log_visit.idsite in ('1') AND log_visit.visit_last_action_time >= '2015-11-01 00:00:00' AND log_visit.visit_last_action_time <= '2015-12-01 00:00:00' ORDER BY  visit_last_action_time DESC LIMIT 15, 100) AS sub GROUP BY sub.idvisit ORDER BY sub.visit_last_action_time DESC ;

EXPLAIN:

+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+---------------------------------+
| id   | select_type | table      | type | possible_keys                                                             | key                   | key_len | ref   | rows     | Extra                           |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+---------------------------------+
|    1 | PRIMARY     | <derived2> | ALL  | NULL                                                                      | NULL                  | NULL    | NULL  |      115 | Using temporary; Using filesort |
|    2 | DERIVED     | log_visit  | ref  | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_datetime | 4       | const | 17030982 | Using where                     |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+---------------------------------+
100 rows in set (4 min 26.13 sec)

With idsite:

SELECT sub.​* FROM ( SELECT log_visit.*​ FROM piwik_log_visit AS log_visit WHERE log_visit.idsite in ('1') AND log_visit.visit_last_action_time >= '2015-11-01 00:00:00' AND log_visit.visit_last_action_time <= '2015-12-01 00:00:00' ORDER BY idsite, visit_last_action_time DESC LIMIT 15, 100) AS sub GROUP BY sub.idvisit ORDER BY sub.visit_last_action_time DESC ;

EXPLAIN:

+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+----------------------------------------------------+
| id   | select_type | table      | type | possible_keys                                                             | key                   | key_len | ref   | rows     | Extra                                              |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+----------------------------------------------------+
|    1 | PRIMARY     | <derived2> | ALL  | NULL                                                                      | NULL                  | NULL    | NULL  |      115 | Using temporary; Using filesort                    |
|    2 | DERIVED     | log_visit  | ref  | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_datetime | 4       | const | 17030951 | Using index condition; Using where; Using filesort |
+------+-------------+------------+------+---------------------------------------------------------------------------+-----------------------+---------+-------+----------+----------------------------------------------------+
100 rows in set (8 min 11.95 sec)

So I guess it will have impact only on reaaaaally big clients and it will be a positive one :)

@tsteur commented on February 4th 2016 Member

17GB is not really a large database :) can you also try what happens when there are many sites given in the IN like it would be done by metasites?

@tsteur commented on February 4th 2016 Member

When you did performance tests, did you clear the MySQL caches between the tests?

@sebastianpiskorski commented on February 4th 2016 Contributor

Yes, the cache was cleared, beside test without idsite was running first.

@tsteur commented on February 4th 2016 Member

Idea in order to have it in 2.16.0 is following: When only one idSite is given, still leave the idSite in the order by clause. When multiple sites are given, remove the idSite. This way we make sure not to regress anything for most users. Do you know what I mean?

@tsteur commented on February 4th 2016 Member

Let me know if you can't do it now and I will issue the PR

@tsteur commented on February 4th 2016 Member

See https://github.com/piwik/piwik/pull/9694

We will try to merge this one in 3.0 or 2.16.X

@sebastianpiskorski commented on February 4th 2016 Contributor

17GB this was the size of log_visit table only. I have written it in the wrong way. I will deliver results on bigger database today.

@tsteur commented on August 30th 2016 Member

@mattab do you have any thoughts on this? The index for idsite, time should be used for the where clause. It would be better if it could use the index for the order by as well as it has to sort all rows for the limit to work. However, I'm pretty sure that the index was not used before as @sebastianpiskorski because MySQL does not use the index for "ORDER BY" queries when ASC and DESC is mixed see http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html . Current order by is ORDER BY idsite, visit_last_action_time DESC in most of the cases. However, if you sort the visitor log the other way around it becomes ORDER BY idsite, visit_last_action_time ASC and it will use the index. @sebastianpiskorski can you do a test when sorting the time with ASC?

If we do not merge this and keep the idsite in there, we could actually improve the performance of this query in most cases by using ORDER BY idsite $filterSortOrder , visit_last_action_time $filterSortOrder.

@tsteur commented on August 30th 2016 Member

If w do not merge this we should really create a follow up issue to improve the performance of that query as mentioned. If we merge this PR it needs a rebase

@mattab commented on September 23rd 2016 Member

Hi @sebastianpiskorski
Could you please re-create this PR against 3.x-dev and resolve the merge conflicts? it would be good to have and would be happy to merge it :+1: I'm closing for now as if you don't re-create it we won't work on it ourselves. Thanks!

@tsteur commented on September 23rd 2016 Member

@matt let's maybe create an issue to speed the query up https://github.com/piwik/piwik/pull/9684#issuecomment-243579422 Will create an issue for it next week

@mattab commented on September 26th 2016 Member
This Pull Request was closed on September 23rd 2016
Powered by GitHub Issue Mirror