Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Ranking Query" to decrease archiving memory consumption and transfered data between MySQL and PHP #3330

Closed
timo-bes opened this issue Aug 17, 2012 · 24 comments
Assignees
Labels
c: Performance For when we could improve the performance / speed of Matomo. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Milestone

Comments

@timo-bes
Copy link
Member

Based on the discussions in #766, this ticket is the place for code that limits the number of results returned from MySQL.

The idea is to create a class that wraps an arbitrary SQL query with more SQL that limits the number of results while grouping the rest to "Others" and allows for some more fancy things that can be configured via an instance of the ranking query class.

@timo-bes
Copy link
Member Author

(In [6803]) refs #3330 ranking query

  • Piwik_RankingQuery encapsulates the logic to apply the limit + grouping of others to an arbitrary select query
  • queryActionsByDimension() and queryVisitsByDimension() in Piwik_ArchiveProcessing_Day get new parameters to use the ranking query

@timo-bes
Copy link
Member Author

(In [6804]) refs #3330 properties for RankingQuery.php

@timo-bes
Copy link
Member Author

(In [6805]) refs #3330 PHPUnit tests for RankingQuery

@timo-bes
Copy link
Member Author

(In [6812]) refs #3330 Transitions

  • Archiving code that doesn't hook into the actual archiving
  • API code that fakes an archive processing instance and calls archiving
  • PHPUnit test

@timo-bes
Copy link
Member Author

The previous commit belongs to #3332. Too many commits for one day... ;-)

@diosmosis
Copy link
Member

(In [6966]) Refs #3330, #766 add integration test for blob limiting behavior present in certain plugins. Also, fix three bugs:

  • CustomVariables plugin does not sort by visit before applying truncation to reports.
  • ReplaceSummaryLabel should be recursive so subtables w/ limited rows show 'Others' as label.
  • Pass user agent & language explicitly when doing bulk tracking in PiwikTracker.

@mattab
Copy link
Member

mattab commented Sep 11, 2012

Timo, amazing work on this feature!! :-)

This will make a huge difference in piwik performance and memory usage. Congrats...

Benaka, thanks for adding the integration tests & the recursive fix. Looking forward to seeing the RankingQuery code used in all plugins and archiving calls! :-)

@peterbo
Copy link
Contributor

peterbo commented Sep 11, 2012

Great commits Timo! - I'll test it within a lower memory environment in the next weeks. Judging from the source code, this should be a huge step for the memory consumption problems!

I'll post the results of the testing in this ticket.

@mattab
Copy link
Member

mattab commented Sep 12, 2012

@peter, wait for the next RC which Benaka is working on to add RankingQuery to all archiving. However it will be limited to 50,000 so unless your DB has more than 50k unique URLs you won't see much improvement...

@diosmosis
Copy link
Member

(In [6976]) Refs #3330, #766 improved BlobReportLimitingTest.

@diosmosis
Copy link
Member

(In [6977]) Refs #3330, fix small bug in BlobReportLimitingTest.

@diosmosis
Copy link
Member

(In [6980]) Refs #3330, #766, #3227 use RankingQuery and truncate tables as they are created in Actions plugin. Also modified phpunit integration testing mechanism so all API calls are tested and outputted before a test case throws.

@diosmosis
Copy link
Member

(In [6981]) Refs #3330, add missing test results.

@diosmosis
Copy link
Member

(In [6986]) Refs #3330, make sure ranking query limit is >= datatable_archiving_maximum_rows*_actions options & refactor a little.

@mattab
Copy link
Member

mattab commented Sep 26, 2012

(In [7065]) Refs #3330 Refactoring of the Actions.php and moving code to: Archiving and ArchivingHelper

@diosmosis
Copy link
Member

(In [7080]) Refs #3330, #766 refactor truncate-as-you-go approach used in Actions. Added maximumAllowedRows member and walkPath method to DataTable and some other utility methods to DataTable_Row types.

@diosmosis
Copy link
Member

(In [7083]) Refs #3330, #766 remove unnecessary methods and rename one function for clarity.

@mattab
Copy link
Member

mattab commented Oct 19, 2012

@capedfuzz, can you please list, what you think is the remaining work in this ticket, in order to apply the code to all other reports where it makes sense? Or do we only apply RQ to Actions report, for now, which seems acceptable too.

Should we however create a ticket to port "Update in place" to other plugins archiving as this would save memory?

@diosmosis
Copy link
Member

Replying to matt:

@capedfuzz, can you please list, what you think is the remaining work in this ticket, in order to apply the code to all other reports where it makes sense? Or do we only apply RQ to Actions report, for now, which seems acceptable too.

It can be applied fairly easily to custom variables, maybe not referers, however I'm not sure it's really necessary, since archiving is pretty fast w/o it. Also, since truncating in place can result in large speedups, that should be done first. I think if RQ needs to be added, it can wait.

Should we however create a ticket to port "Update in place" to other plugins archiving as this would save memory?

Another ticket sounds like a good idea. It would take a lot of refactoring to do it in some places. Also, for CustomVariables + Referers, the way they perform queries will have to change (which means testing for performance degradation).

@timo-bes
Copy link
Member Author

It would work for referrers. If you want, take a look at the queryExternalReferrers-method in Transitions.

@mattab
Copy link
Member

mattab commented Oct 20, 2012

Decreasing priority for now.

@capedfuzz would you mind creating a ticket for applying Update-in-place to all reports in 1.9.x roadmap (normal priority)?

@zawadzinski
Copy link
Contributor

Setting
'archiving_ranking_query_row_limit' to 0 (no limit) causes archiving process not to process Actions.

The following errors shows up during archiving:

Notice:</strong> <i>Undefined index: type</i> in <b>public_html/piwik/plugins/Actions/ArchivingHelper.php</b> on line <b>49</b>
<br /><br />Backtrace --&gt;<div style="font-family:Courier;font-size:10pt"><br />
#0  Piwik_ErrorHandler(...) called at [public_html/piwik/plugins/Actions/ArchivingHelper.php:49]<br />
#1  Piwik_Actions_ArchivingHelper::updateActionsTableWithRowQuery(...) called at [public_html/piwik/plugins/Actions/Archiving.php:478]<br />
#2  Piwik_Actions_Archiving-&gt;archiveDayQueryProcess(...) called at [public_html/piwik/plugins/Actions/Archiving.php:282]<br />
#3  Piwik_Actions_Archiving-&gt;archiveDayActionsTime(...) called at [public_html/piwik/plugins/Actions/Archiving.php:100]<br />
#4  Piwik_Actions_Archiving-&gt;archiveDay(...) called at [public_html/piwik/plugins/Actions/Actions.php:592
...

var_dump'ing $row at plugins/Actions/ArchivingHelper.php:36, shows:

array(6) {
  ["idaction"]=>
  string(7) "1792657"
  [17]=>
  string(1) "1"
  [19]=>
  string(1) "1"
  [20]=>
  string(1) "3"
  [21]=>
  string(3) "279"
  [22]=>
  string(1) "0"
}

so we have a problem with translation of keys in the row.

@mattab
Copy link
Member

mattab commented Oct 26, 2012

Thanks for the report, I created a ticket at: #3482

@mattab
Copy link
Member

mattab commented Jan 10, 2014

This feature is done, we may improve it later or reuse RankingQuery in other archivers.

@timo-bes timo-bes added this to the 2.x - The Great Piwik 2.x Backlog milestone Jul 8, 2014
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Projects
None yet
Development

No branches or pull requests

5 participants