Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve serialize dataTable performance #12289

Merged
merged 4 commits into from Nov 20, 2017
Merged

Improve serialize dataTable performance #12289

merged 4 commits into from Nov 20, 2017

Conversation

tsteur
Copy link
Member

@tsteur tsteur commented Nov 20, 2017

This seems to improve the performance of serializing a data table which is useful when archiving. In my case it made the archiving of a monthly archive almost twice as fast from about 120 seconds down to like 70seconds.

Before:
image

After
image

Why?
getSerialized is called here 385 times and took before 51 seconds, afterwards about 10 seconds. The thing is that getSerialized gets called recursive, and on the 2nd layer, it is called almost 8000 times, and on the third layer (when having 3 nested dataTables) it is called 40.000 times:

image

When it is called 40.000 times, it doesn't take much time but on the 2nd layer, it spends about 40seconds in the method itself (excluding calling any methods). Looking at the function itself I realized the $aSerializedDataTable + $subTable->getSerialized($maximumRowsInSubDataTable, $maximumRowsInSubDataTable, $columnToSortByBeforeTruncation, $aSerializedDataTable) call which merges BIG arrays over time and seems to be the slow part.

So I changed it to getSerialized($maximumRowsInDataTable = null, $maximumRowsInSubDataTable = null, $maximumRowsInSubDataTable = null, $columnToSortByBeforeTruncation = null) $columnToSortByBeforeTruncation = null, $aSerializedDataTable = array()) which made it like 2 or 3 times slower because it is copying the arrays even more etc.

But when passing as reference it is suddenly fast (less copying):

image

It is down from 40 seconds to 2 seconds.

This seems to improve the performance of serializing a data table which is useful when archiving. In my case it made the archiving of a monthly archive twice as fast from about 120 seconds down to like 70seconds.
@tsteur tsteur added c: Performance For when we could improve the performance / speed of Matomo. Needs Review PRs that need a code review labels Nov 20, 2017
@tsteur tsteur added this to the 3.2.1 milestone Nov 20, 2017
@tsteur tsteur requested a review from mattab November 20, 2017 02:20
It is known that array_key_exists is quite a bit slower than `isset`. As this is executed easily a few million times this makes a difference in performance but here `array_key_exists` and `isset` work the same in logic as we never set `$consecutiveSubtableIds[$id]=null`
@@ -1292,7 +1293,7 @@ public function getSerialized($maximumRowsInDataTable = null,
// we then serialize the rows and store them in the serialized dataTable
$rows = array();
foreach ($this->rows as $id => $row) {
if (array_key_exists($id, $consecutiveSubtableIds)) {
if (isset($consecutiveSubtableIds[$id])) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is known that array_key_exists is quite a bit slower than isset. As this is executed easily a few million times this makes a difference in performance but here array_key_exists and isset work the same in logic as we never set $consecutiveSubtableIds[$id]=null

@mattab mattab merged commit 7846b9a into 3.x-dev Nov 20, 2017
@mattab mattab deleted the serializedatatable branch November 20, 2017 07:53
@matomo-org matomo-org deleted a comment from MatomoForumBot Dec 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants