@tsteur opened this Pull Request on December 17th 2022 Member

Description:

See this profile run where we fetch data over 9-10 months:

image

It issues 281 SQL queries in this part of the code. As this wasn't expected, I set a breakpoint, and it turns out it's actually issuing one query per period (e.g. per day or per week) when it only needs to issue one query per month.

See the screenshot below. Before, it would execute 10 queries (one per day period). Now it executes only 2 queries, and the second query fetches 9 of them at once.

Screenshot 2022-12-18 at 7 42 10 AM

It will now basically group all entries from the same archive data into one query. Looking at the code, I believe it was always meant to work like that but never actually did.
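The grouping described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the table naming (`archive_blob_YYYY_MM`), the column names, and the helper functions are assumptions made for the example.

```python
from collections import defaultdict


def group_archive_ids_by_month(archives):
    """Group (id_archive, 'YYYY-MM-DD') pairs by month.

    The input shape is hypothetical; it stands in for the archive IDs
    found for each day period.
    """
    by_month = defaultdict(list)
    for id_archive, day in archives:
        by_month[day[:7]].append(id_archive)  # 'YYYY-MM'
    return dict(by_month)


def build_queries(archives):
    """Build one parameterized query per month instead of one per period."""
    queries = []
    for month, ids in sorted(group_archive_ids_by_month(archives).items()):
        placeholders = ", ".join("?" for _ in ids)
        # Assumed naming scheme: one archive table per month.
        table = "archive_blob_" + month.replace("-", "_")
        sql = (
            f"SELECT idarchive, name, value FROM {table} "
            f"WHERE idarchive IN ({placeholders})"
        )
        queries.append((sql, ids))
    return queries


# Ten day periods spanning two months: one day in November, nine in December.
archives = [(1, "2022-11-30")] + [(i, f"2022-12-{i - 1:02d}") for i in range(2, 11)]
queries = build_queries(archives)
```

With these ten day periods, `queries` holds two entries rather than ten, and the second one carries nine archive IDs, which mirrors the before/after numbers in the screenshot.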

Review

@sgiehl commented on December 20th 2022 Member

@tsteur The code changes are looking fine.
But are you able to explain why the test results for those specific tests changed?

@tsteur commented on December 20th 2022 Member

@sgiehl yes, I am.

First of all, there are two identical entries in the log file for different domains, so /blog/category/meta/index is available for both domains.

Screenshot 2022-12-21 at 7 38 37 AM

Previously (see below) we would have fetched one archive after another for each day: first idArchive 13, then idArchive 5, then idArchive 8. Now we group them and fetch 13, 5, 8 in one query.

Screenshot 2022-12-21 at 7 41 42 AM

Because idArchive is in an index, the query returns the archive rows sorted: first 5, then 8, then 13. Before, the $rows would have been in the order 13, 5, 8.

Screenshot 2022-12-21 at 7 42 58 AM

That means the original URL from a different domain now gets preference. Neither is generally more right than the other, though, and the result can always vary depending on what was tracked first on which day. We can't even order by idArchive or ts_archived, as things might be archived completely out of order. Both results are "correct", so ordering shouldn't be needed anyway.
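A small sketch of why the row order matters. It assumes, purely for illustration, that when several rows share a label the first URL seen is kept; the example domains, the in-memory `archives` dict, and the helper names are all hypothetical and not taken from the actual code.

```python
# Hypothetical archive rows keyed by idArchive. The same label exists
# for two different domains.
archives = {
    13: {"label": "/blog/category/meta/index",
         "url": "http://a.example/blog/category/meta/index"},
    5: {"label": "/blog/category/meta/index",
        "url": "http://b.example/blog/category/meta/index"},
    8: {"label": "/blog/category/meta/index",
        "url": "http://b.example/blog/category/meta/index"},
}


def fetch_one_by_one(ids):
    """Old behaviour: one query per archive, rows arrive in request order."""
    return [archives[i] for i in ids]


def fetch_batched(ids):
    """New behaviour: one IN (...) query; rows are assumed to come back
    in index order, i.e. sorted by idArchive."""
    return [archives[i] for i in sorted(ids)]


def merge_first_wins(rows):
    """Assumed merge rule: keep the first URL seen for each label."""
    merged = {}
    for row in rows:
        merged.setdefault(row["label"], row["url"])
    return merged


ids = [13, 5, 8]
before = merge_first_wins(fetch_one_by_one(ids))
after = merge_first_wins(fetch_batched(ids))
```

Under these assumptions `before` keeps the URL from archive 13 and `after` keeps the URL from archive 5. Both are valid URLs for the label, which is why the changed test output is expected rather than a regression.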

@sgiehl commented on December 21st 2022 Member

Ok. That makes sense :+1:

This Pull Request was closed on December 21st 2022