Based on the discussions in #766, this ticket is the place for code that limits the number of results returned from MySQL.
The idea is to create a class that wraps an arbitrary SQL query with more SQL that limits the number of results while grouping the rest to "Others" and allows for some more fancy things that can be configured via an instance of the ranking query class.
(In [6803]) refs #3330 ranking query
(In [6804]) refs #3330 properties for RankingQuery.php
(In [6805]) refs #3330 PHPUnit tests for RankingQuery
(In [6812]) refs #3330 Transitions
The previous commit belongs to #3332. Too many commits for one day... ;-)
(In [6966]) Refs #3330, #766 add integration test for blob limiting behavior present in certain plugins. Also, fix three bugs:
Timo, amazing work on this feature!! :-)
This will make a huge difference in piwik performance and memory usage. Congrats...
Benaka, thanks for adding the integration tests & the recursive fix. Looking forward to seeing the RankingQuery code used in all plugins and archiving calls! :-)
Great commits Timo! - I'll test it within a lower memory environment in the next weeks. Judging from the source code, this should be a huge step for the memory consumption problems!
I'll post the results of the testing in this ticket.
@Peter, wait for the next RC which Benaka is working on to add RankingQuery to all archiving. However it will be limited to 50,000 so unless your DB has more than 50k unique URLs you won't see much improvement...
(In [6977]) Refs #3330, fix small bug in BlobReportLimitingTest.
(In [6981]) Refs #3330, add missing test results.
(In [6986]) Refs #3330, make sure ranking query limit is >= datatable_archiving_maximum_rows*_actions options & refactor a little.
(In [7065]) Refs #3330 Refactoring of the Actions.php and moving code to: Archiving and ArchivingHelper
@capedfuzz, can you please list, what you think is the remaining work in this ticket, in order to apply the code to all other reports where it makes sense? Or do we only apply RQ to Actions report, for now, which seems acceptable too.
Should we however create a ticket to port "Update in place" to other plugins archiving as this would save memory?
Replying to matt:
@capedfuzz, can you please list, what you think is the remaining work in this ticket, in order to apply the code to all other reports where it makes sense? Or do we only apply RQ to Actions report, for now, which seems acceptable too.
It can be applied fairly easily to custom variables, maybe not referers, however I'm not sure it's really necessary, since archiving is pretty fast w/o it. Also, since truncating in place can result in large speedups, that should be done first. I think if RQ needs to be added, it can wait.
Should we however create a ticket to port "Update in place" to other plugins archiving as this would save memory?
Another ticket sounds like a good idea. It would take a lot of refactoring to do it in some places. Also, for CustomVariables + Referers, the way they perform queries will have to change (which means testing for performance degradation).
It would work for referrers. If you want, take a look at the queryExternalReferrers-method in Transitions.
Decreasing priority for now.
@capedfuzz would you mind creating a ticket for applying Update-in-place to all reports in 1.9.x roadmap (normal priority)?
Setting
'archiving_ranking_query_row_limit' to 0 (no limit) causes archiving process not to process Actions.
The following errors shows up during archiving:
Notice:</strong> <i>Undefined index: type</i> in <b>public_html/piwik/plugins/Actions/ArchivingHelper.php</b> on line <b>49</b>
<br /><br />Backtrace --><div style="font-family:Courier;font-size:10pt"><br />
<a href='/0'>#0</a> Piwik_ErrorHandler(...) called at [public_html/piwik/plugins/Actions/ArchivingHelper.php:49]<br />
<a href='/1'>#1</a> Piwik_Actions_ArchivingHelper::updateActionsTableWithRowQuery(...) called at [public_html/piwik/plugins/Actions/Archiving.php:478]<br />
<a href='/2'>#2</a> Piwik_Actions_Archiving->archiveDayQueryProcess(...) called at [public_html/piwik/plugins/Actions/Archiving.php:282]<br />
<a href='/3'>#3</a> Piwik_Actions_Archiving->archiveDayActionsTime(...) called at [public_html/piwik/plugins/Actions/Archiving.php:100]<br />
<a href='/4'>#4</a> Piwik_Actions_Archiving->archiveDay(...) called at [public_html/piwik/plugins/Actions/Actions.php:592
...
var_dump'ing $row at plugins/Actions/ArchivingHelper.php:36, shows:
array(6) {
["idaction"]=>
string(7) "1792657"
[17]=>
string(1) "1"
[19]=>
string(1) "1"
[20]=>
string(1) "3"
[21]=>
string(3) "279"
[22]=>
string(1) "0"
}
so we have a problem with translation of keys in the row.
Thanks for the report, I created a ticket at: #3482
This feature is done, we may improve it later or reuse RankingQuery in other archivers.