Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for goals by pages #19974

Merged
merged 32 commits into from Jan 4, 2023
Merged

Conversation

bx80
Copy link
Contributor

@bx80 bx80 commented Nov 8, 2022

Description:

This is a rework of #18221 to improve performance and decrease memory usage when archiving.

Changes:

  • Aggregation and attribution is no longer done in code and is performed as part of the database query.
  • Queries to retrieve goal metric data for pages are now run individually for each goal to decrease peak load on the database server and to reduce memory usage when enriching the actions array in code.
  • Page URL and page titles unit tests have been expanded to cover multiple days

Refs: L3-313

Review

…al at a time, reverted in-code aggregation, expanded page tests to cover multi-day aggregation
@bx80 bx80 added the c: Performance For when we could improve the performance / speed of Matomo. label Nov 8, 2022
@bx80 bx80 added this to the 4.12.5 milestone Nov 8, 2022
@bx80 bx80 self-assigned this Nov 8, 2022
@bx80 bx80 added the Needs Review PRs that need a code review label Nov 24, 2022
core/DataAccess/LogAggregator.php Outdated Show resolved Hide resolved
core/DataAccess/LogAggregator.php Outdated Show resolved Hide resolved
@justinvelluppillai justinvelluppillai modified the milestones: 4.12.5, 4.13.2 Nov 28, 2022
@justinvelluppillai
Copy link
Contributor

@bx80 is this still ready for review or are you addressing @sgiehl 's previous comment still?

@bx80
Copy link
Contributor Author

bx80 commented Nov 29, 2022

@justinvelluppillai I'm currently looking at a solution for supporting segments with the custom query. We definitely don't want to duplicate the segment query building logic, so it's probably a case of either improving the generateQuery method so it can build the complicated custom goals pages query or at least support having a pre-built full query passed to it and then apply the segment wrapping.

@justinvelluppillai justinvelluppillai removed the Needs Review PRs that need a code review label Nov 29, 2022
@justinvelluppillai
Copy link
Contributor

ok thanks, I've removed the Needs review label meanwhile 👍🏽

@justinvelluppillai
Copy link
Contributor

@bx80 @tsteur mentioned a few times that he's done something similar with multi attribution so maybe worth talking to him about this

@bx80
Copy link
Contributor Author

bx80 commented Dec 6, 2022

Thanks for the feedback @sgiehl 🙂

I've applied all the suggested improvements. For large dataset performance the query was tested on a real world database where a single goal had 500k+ conversions in a single day. The new query now runs for each goal individually and it took just under 5minutes to complete archiving for this goal, whereas previously it failed to complete goal actions archiving at all. There are more details about the performance testing in the comments for L3-313

@bx80 bx80 added the Needs Review PRs that need a code review label Dec 6, 2022
Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left another comment. Otherwise this looks fine.

core/DataAccess/LogAggregator.php Outdated Show resolved Hide resolved
@justinvelluppillai justinvelluppillai modified the milestones: 4.13.2, 4.13.1 Dec 8, 2022
@bx80 bx80 requested a review from tsteur December 12, 2022 20:50
Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that looks really good @bx80 . Great this got a lot simpler now and that we'll have this query use less memory soon. I didn't do an in-depth review but looks overall all good. Only left a few minor comments to consider.

core/DataAccess/LogAggregator.php Outdated Show resolved Hide resolved
plugins/Actions/Archiver.php Outdated Show resolved Hide resolved
core/DataAccess/LogAggregator.php Outdated Show resolved Hide resolved
core/DataAccess/LogAggregator.php Outdated Show resolved Hide resolved
@bx80 bx80 requested review from tsteur and sgiehl December 14, 2022 23:54
Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested it @bx80 but looks good 👍

@bx80
Copy link
Contributor Author

bx80 commented Dec 16, 2022

@sgiehl When you have a chance, could you please give this a final check and merge it if there are no outstanding issues? Thanks!

Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments. Otherwise the code looks fine. Archiving seems to work as expected, even for segments.
@bx80 Did you check if the results of the tests have the correct numbers? As you change the existing test fixtures all numbers changed, which makes it hard to verify if they are "still" correct. I tried checking out the old test files, but using the new code for running the tests. The resulting files then differed slightly. Not sure if that is correct or not...

tests/PHPUnit/Integration/DbHelperTest.php Outdated Show resolved Hide resolved
plugins/Actions/ArchivingHelper.php Show resolved Hide resolved
plugins/Actions/ArchivingHelper.php Show resolved Hide resolved
core/DbHelper.php Outdated Show resolved Hide resolved
@sgiehl sgiehl removed the Needs Review PRs that need a code review label Dec 16, 2022
@bx80
Copy link
Contributor Author

bx80 commented Dec 21, 2022

@sgiehl Thanks for reviewing this one, here is the spreadsheet I created to check the test results: m-2030-Goals-per-page-test-calcs.ods
It's just a set of calculations based on the visitor actions in the fixture to show the expected values for each page. I've done a comparison of the test results against the spreadsheet and from what I can see they match. It can be tricky to compare when the test results are for a single day but the spreadsheet is for the entire test period, which is why I changed some of the tests to cover a whole week period.

Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look at the test calculations. For me that looks valid. Still not 100% sure why some other tests changed their results slightly, but maybe there was a problem before, that is now fixed?

@bx80
Copy link
Contributor Author

bx80 commented Dec 27, 2022

Still not 100% sure why some other tests changed their results slightly, but maybe there was a problem before, that is now fixed?

I think that was probably the case. 👍 The improved tests should hopefully make it easier to check that the expected results are correct.

@bx80 bx80 merged commit 8dbd2f5 into 4.x-dev Jan 4, 2023
@bx80 bx80 deleted the l3-313-goals-actions-query branch January 4, 2023 00:29
@bx80 bx80 mentioned this pull request Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants