Make Visitor Log live query more performant #14700
Conversation
Maybe not the best idea actually.
Implemented something like this but haven't pushed yet. It would be a compromise between the approaches, maybe, and possibly still be somewhat efficient for high-traffic websites.
plugins/Live/Model.php
```php
    }
} else {
    $queries[] = array($dateStart, $dateEnd);
}
```
Here's an idea that might make it a bit faster (assuming I'm reading this correctly). What the code is doing now is this:
- check dateStart => dateStart + 1 day
- still no visits? check dateStart => dateStart + 7 days
- still no visits? check dateStart => dateStart + 30 days
- ... etc.
In each query we're re-checking the same rows as the previous one. We could change it to something like:
- check dateStart => dateStart + 1 day
- still no visits? check dateStart + 1 day => dateStart + 7 days
- still no visits? check dateStart + 7 days => dateStart + 30 days
- ... etc.
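The suggested non-overlapping progression could be sketched like this (Python for illustration only; the names are hypothetical and the real code is PHP in plugins/Live/Model.php):

```python
from datetime import datetime, timedelta

def expanding_windows(date_start, date_end, day_steps=(1, 7, 30, 365)):
    """Yield (start, end) query windows that never overlap:
    each window begins exactly where the previous one stopped,
    so no row is scanned twice."""
    prev_end = date_start
    for days in day_steps:
        next_end = min(date_start + timedelta(days=days), date_end)
        if next_end > prev_end:
            yield (prev_end, next_end)
            prev_end = next_end
    if prev_end < date_end:
        # cover whatever is left of the requested range in one go
        yield (prev_end, date_end)

windows = list(expanding_windows(datetime(2019, 1, 1), datetime(2019, 3, 1)))
```

Each window's start equals the previous window's end, which is the property being discussed above.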
@diosmosis it should already do what you suggested. Unless I'm not understanding it right. But basically the code does not fetch the same day or week twice for example.
Ah ok, `$virtualEndDate` changes, I didn't notice that.
I wonder if this is similar to …
It's quite similar @diosmosis just that …
@tsteur I was actually more just wondering if it would save db resources to go month by month, instead of querying for a whole year (or the whole range). I haven't done any sort of testing, so I have no idea if it would help or not.
@diosmosis we were thinking that if we go month by month it might end up being too many queries and taking too long. The thought was more about improving performance for high-traffic Matomo instances, which will likely find e.g. 500 visitors within the first day, week, or month. And if there weren't 500 visitors in the first month, they likely also won't be found in the next months, so it might be better to query the remaining date range at once. We could be a bit "smart" maybe and add some logic: if we find e.g. 25% of the results in one month, we could assume all rows will be found if we query 3-4 more months instead of, say, the remaining 2 years. Is this maybe something you mean?
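That "smart" extrapolation could look roughly like this (a hypothetical heuristic sketched in Python, not actual Matomo code; the 25% threshold and return convention are assumptions for illustration):

```python
import math

def months_to_query(found_in_first_month, limit, months_remaining):
    """If the first month already yielded a sizeable share (>= 25%)
    of the requested rows, extrapolate how many more months should
    be enough to fill the limit. Otherwise fall back to querying the
    whole remaining range at once (signalled here by returning
    months_remaining)."""
    if found_in_first_month >= 0.25 * limit:
        needed = math.ceil((limit - found_in_first_month) / found_in_first_month)
        return min(needed, months_remaining)
    return months_remaining
```

For example, finding 150 of 500 requested visits in the first month suggests querying about 3 more months rather than the remaining 2 years.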
I just figured the high load was due to the number of rows being counted overall, and if so, going over smaller chunks might be less of an issue. However, if that isn't what's causing the high load, then it wouldn't be a good idea. Here's another crazy idea that's too difficult to implement: call VisitsSummary.get for multiple days over the last couple of months or so, and use the number of visits in those rows to figure out where an offset starts. (I just wanted to write this idea down since it seems strange yet workable, I don't think you should work on it :) ) Anyway, code looks good to me. Don't know if @mattab needs to look at it.
@diosmosis that might not help when a segment is used and the segment is not preprocessed, for example.
Sometimes the live query can be really slow (taking many minutes). It can even end up using all database resources when there is a lot of data in the log tables, especially when segments are used on top. Often the system ends up doing queries like these:
Where it fetches data over several months. This can obviously take a very long time, and it might create big temporary tables, full table scans, etc. Since we only want the most recent results, it would be smarter to first look only at the last day; if that query already returns enough results, we can return them directly, making it a LOT faster.
In this PR I'm executing up to 5 queries to get the result instead of doing it in one query:
If only 1 week is being requested, it would execute only 2 queries. If 2 years are requested, it would execute up to 5 queries, but stop as soon as the requested data has been found. So we might actually only be looking at one day instead of 2 years!
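The stop-early behaviour can be sketched like this (Python for brevity; `run_query` and the window list are hypothetical stand-ins for the actual per-range SQL against the log tables):

```python
def fetch_recent_visits(run_query, windows, limit):
    """Execute the per-window queries in order (most recent window
    first), stopping as soon as `limit` rows have been collected.
    run_query(start, end, n) stands in for the real log_visit SELECT
    that returns at most n rows for the given date range."""
    rows = []
    for start, end in windows:
        rows += run_query(start, end, limit - len(rows))
        if len(rows) >= limit:
            break  # enough results: skip the remaining, larger windows
    return rows[:limit]
```

With 5 windows configured, this executes between 1 and 5 queries depending on where the requested number of visits is found.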
This logic can only be applied when there is a row limit. Otherwise we need to return all results anyway, and this can still be a problem, fyi @mattab
It cannot be executed when there is a limit plus an offset @mattab, so it possibly won't solve too many problems. Unless we executed each query twice, see this logic:
Another workaround for offset that slightly improves things: when
Offset <= 3 * LIMIT
we apply the offset in PHP and not in MySQL. That means we fetch more rows but then discard the first `offset` of them in PHP. It would make this logic work for a few more pages. And another workaround would be: if there's an offset, we simply check whether the first day gives results and, if not, fetch all other days.