@bx80 opened this Issue on August 19th 2022 Contributor

TiDB currently cannot push COUNT(DISTINCT x) queries to TiFlash so on large datasets these queries can perform slowly and use a lot of memory. Reworking COUNT(DISTINCT x) queries to use sub-queries instead solves this issue for TiDB, but sub-queries perform ~2.5x slower on MySQL, so a hybrid solution will be required.

In order to support TiDB as an alternative database we need to provide the option for certain queries to be generated using sub-queries instead of COUNT(DISTINCT x). This could be a beneficial optimization for other databases too so it would be good to implement in a generic manner.

Suggested approach

  • Identify performance sensitive archiving queries that use COUNT(DISTINCT x) (33 occurrences in non-test code, mostly in core/DataAccess/LogAggregator).
  • Add a new core/Db/AdaptorInterface property preferSubqueries and implement it on the standard Matomo MySQL adapter to return false. A future TiDB PDO adapter can return true.
  • For each performance sensitive query using COUNT(DISTINCT x), rework the query generation code to optionally replace COUNT(DISTINCT x) with an appropriate sub-query if the PDO adapter preferSubqueries option is set.
Powered by GitHub Issue Mirror