Make archiving faster when thousands of websites with low or no traffic #5922
Comments
If anyone has an archive.log for this use case of thousands of websites with low traffic, please attach it to this ticket, thanks!
The following URL has the log file from archiving about 8000 small websites (I can only attach images here, so I have to link to an external URL): http://stats.exto.nl/5922/archive.log Hope this helps. Since archiving was moved to a CLI process, it has become too slow to archive every day.
Right now I'm archiving some empty websites and it takes a long time:
Note that this was the first time those websites were archived, which explains some of the slowness, but still: something should be done so that archiving an empty website takes less than 22 seconds.
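To put the numbers in this thread side by side, here is a rough back-of-envelope sketch. It assumes, purely for illustration, that each of the roughly 8000 sites mentioned above takes the 22 seconds quoted here and that sites are split evenly across workers. The two figures come from different installs and the 22 seconds was a first-time archive, so treat the result as an upper-bound illustration, not a measurement. The function name and the `workers` parameter are hypothetical.

```python
def total_archiving_hours(num_sites, seconds_per_site, workers=1):
    """Estimate wall-clock hours to archive every site once.

    Assumption: every site costs the same and the work is split
    evenly across `workers` parallel processes.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total_seconds = num_sites * seconds_per_site
    return total_seconds / workers / 3600


# Figures taken from this thread: about 8000 sites, 22 seconds each.
print(f"{total_archiving_hours(8000, 22):.1f} hours")  # prints "48.9 hours"
```

Under these assumptions a single sequential run would not finish within a day, which matches the complaint that daily archiving has become impractical.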
As @czolnowski suggested, something like a flag for each site should help, I reckon. Basically, we want to know whether there was at least one tracking request since the last archiving. To put it differently: we only want to trigger archiving for sites that had at least one tracking request since the last archiving run. This will only help if one has many sites with 0 visits. It does not help for many sites with low traffic. At first I thought we could just query the It is probably not worth storing a flag in the
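The selection rule described above could be sketched roughly as follows. This is not Piwik code: the function name and the two timestamp mappings are hypothetical stand-ins for wherever "last tracking request" and "last archiving run" would actually be stored, and the dates are made up. It also deliberately ignores what happens to reports for the days that get skipped.

```python
from datetime import datetime


def sites_needing_archive(site_ids, last_tracked, last_archived):
    """Return the ids of sites that had a tracking request since their last archive.

    last_tracked / last_archived map a site id to a datetime; a missing key
    means "never tracked" or "never archived" respectively (an assumption
    of this sketch).
    """
    due = []
    for site_id in site_ids:
        tracked_at = last_tracked.get(site_id)
        if tracked_at is None:
            continue  # no tracking request at all: nothing new to archive
        archived_at = last_archived.get(site_id)
        if archived_at is None or tracked_at > archived_at:
            due.append(site_id)
    return due


last_tracked = {1: datetime(2014, 8, 2, 10, 0), 2: datetime(2014, 7, 30, 9, 0)}
last_archived = {1: datetime(2014, 8, 2, 0, 0), 2: datetime(2014, 8, 1, 0, 0)}
print(sites_needing_archive([1, 2, 3], last_tracked, last_archived))  # prints [1]
```

Only site 1 is selected: site 2 has had no request since its last archive and site 3 has never been tracked. As noted, this only saves work for sites with zero new visits.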
If you don't trigger archiving when there are no new visits, then we would have missing daily archives and missing week/month/year archives, leading to "no data" in some reports. We'd need to change more code (e.g. the code that deletes out-of-date archives). Maybe there is instead some room to decrease the CPU wall time of archiving requests on "no traffic" websites to make them very fast?
When there is already an archive and there were no visits, we don't need to re-archive, do we? Of course we might need to change some code, but that's normal, isn't it?
Yes, we'd need to change code (the archive selector would need to allow reading old archives, and we must not purge outdated archives, as we may read them if we don't re-archive every day, e.g. 5-day-old archives)... It's possible but maybe error-prone. I was hoping we could make the use case "pre-archiving a site when there is no visit" so fast that we wouldn't need to be clever about reading old archives, etc. Not sure if archiving those "low / no traffic" days very fast is really possible, though?
FYI: a quick profile of archiving one day for one site shows that 60-80% of the time is spent outside archiving: bootstrapping Piwik, loading all reports, all segments, ... It might be possible to make a faster version for the CLI that doesn't bootstrap a lot and directly calls something like
Would it be possible to publish these profiles? It would be interesting to be able to examine them.
running
The problem gets N times worse when you add N segments... If we can improve this performance in future Piwik versions, it would for sure help a lot!
Closing this issue as we made heaps of progress recently and this issue scope is too wide. Well done to the team for all improvements done in last few months!
Let's discuss ideas to make archiving faster when there are thousands of websites with low or no traffic.