@RMastop opened this Issue on April 24th 2017 Contributor

In an environment, having around 90k pageviews per month, the flattened entry page report and exit page report take some time to be generated:

selecting the top 10 takes around 10 seconds,
selecting the top 50 takes around 19 seconds,
selecting the default top 100 takes around 22 seconds.

If one compares this to the flattened top 100 pages report, that one loads in around 4 seconds. In the past the flattened pages reports were slow too, but their speed was improved at some point (I don't recall which commit caused the improvements).

Could the Entry page and Exit page reports be candidates for the same load-time improvements?

@tassoman commented on April 13th 2018 Contributor

We happily updated to Matomo 3.2.0 in a PHP 7.0 FPM environment on CentOS 7 with OpCache enabled.
We run a dual-head application installation on a single MySQL 5.7 DBMS instance,
counting 6.2 million actions in total last month.

Sometimes our web editors need to report on their website activity, including flattened entry and exit page actions.

We ran into the same issue. What's more, our Matomo installation hangs and returns errors because php-fpm gets killed by SIGSEGV. This also happens when asking for daily flattened entry/exit page reports.

So I've raised the execution parameters to 1 GB of RAM and 120 seconds, but PHP still hits a fatal error because the allowed memory size is exhausted at line 1333 of DataTable.php (the line number refers to the file as of the v2.3.0 tag).

I'm asking for your help because our SysOps feel we're already throwing a lot of resources at this.
Can it be improved? The other reports are fast and work flawlessly.

Thank you for caring, bye!

@tassoman commented on June 1st 2018 Contributor

Ciao @mattab, I'm wondering if there is any news on this topic.
Do you think this issue could gain some traction with the development team in the near future?

@mattab commented on June 1st 2018 Member

Hi @tassoman Hard to say, probably not, unless we experience the problem ourselves or one of our customers...

@tassoman commented on June 4th 2018 Contributor

Thank you for the fast response; I understand your point of view. Anyway, I hope some company might be interested. :1234:

@tsteur commented on June 4th 2018 Member

We can reproduce this for the month period as well. It quickly takes 20 seconds even when viewing just 50 rows, as @RMastop mentioned.

I only had a quick look at the code and couldn't find anything obvious, except maybe https://github.com/matomo-org/matomo/blob/3.5.1/plugins/Actions/DataTable/Filter/Actions.php#L65-L67. Could you try removing those 3 lines to see if it works better?

@tassoman commented on July 9th 2018 Contributor

Hello crew,
I did some testing after commenting out the lines of code @tsteur mentioned.
We tried rendering the monthly entry-page report with a 50-row limit and it ran without any memory issue.

Then we tried running the yearly entry-page report with 50 rows, but that one still ran until it hit the max execution time.

For us this is a fairly good solution; empirically, memory usage has clearly decreased.
(I'm watching htop in the console and Zabbix monitoring.)

Thank you again for caring, best regards :)

@tsteur commented on July 9th 2018 Member

I think we could update core and remove those 3 lines https://github.com/matomo-org/matomo/blame/3.5.1/plugins/Actions/DataTable/Filter/Actions.php#L65-L67 . They were added 3 years ago when we fixed a bug to track URLs correctly: previously some URLs were URL-encoded and some were not. To make sure the same URLs were still aggregated correctly, whether URL-encoded or not, we added this group by. But I think it is not really needed anymore, unless you have aggregations spanning the last couple of years, or updated from 2.12 to 3.X less than a year ago, in which case URLs may still be mixed up in yearly archives, for example...

So in the worst case we could apply this group by only for yearly archives, if that were a concern. However, users see this problem especially in yearly archives, where there are many different URLs, so that alone wouldn't help much. For monthly and yearly archives we could instead run a simple check: iterate over each URL and test whether any URL-encoded and URL-decoded variants would collapse into duplicates, and only run the group by if they would. This should be fairly fast, improve speed quite a bit, and be quick to develop.
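The duplicate check proposed above could be sketched as follows. This is an illustration in Python, not Matomo's PHP code; the function name and structure are my own assumptions about how such a pre-check might look:

```python
from urllib.parse import unquote

def needs_group_by(labels):
    """Return True only if two row labels would collapse to the same
    URL after percent-decoding, i.e. the table mixes URL-encoded and
    non-encoded variants of the same URL. If this returns False, the
    expensive group-by aggregation can be skipped entirely."""
    seen = set()
    for label in labels:
        decoded = unquote(label)
        if decoded in seen:
            # An encoded and a decoded variant of the same URL exist.
            return True
        seen.add(decoded)
    return False
```

A single pass with a set lookup per row is O(n), so for the common case where no mixed encodings exist, the report would only pay this linear scan instead of the full group-by.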

cc @mattab this may be a good performance improvement, as it currently consumes a lot of CPU

This Issue was closed on November 20th 2018