Currently, the multi sites dashboard page render takes 6 API requests (ie. 6 mysql SELECTs) for each website displayed. If a Piwik install has 1000 registered websites, the page load will simply fail, because the php page would try and execute 6,000 queries which is not a good idea.
Instead, only one request should be made to select the 6 values necessary for each website. This query would therefore select 6,000 values at once which would be as fast as we can get with that many values.
This query would join the site, access and archive_* tables. See Piwik_Access::reloadAccess() and core/Archive/Array/IndexedBySite.php for the current behavior. I think the solution might be to have a new class in core/Archive/Array indexed by user?
Attachment: Patch for this issue.
issue1077.diff.tar.gz
Attachment: Patch for issue #1077. Optimizes the IndexedBySite archive type. Speeds up All Websites page by 12 seconds when Piwik tracks 800 websites.
issue1077.diff
Attachment: A quick benchmark used to test how long it takes to retrieve the All Websites HTML. Written in CoffeeScript & uses node.js.
test_all_sites_time.coffee
Attachment: Patch for issue #1077. Several smaller optimizations included. Made for rev 5492.
issue1077_assorted_speedups.diff
Hi, is there any update on this?
One of the main reasons I installed Piwik was to use the multisite feature (it worked great when I had 10 sites). However I now have 500+ sites and this feature is unusable. If I try to run it, it times out of after a few minutes. Cheers.
Other bug report:
I am tracking 22 sites with Piwik. Two have titles that start with A (we'll call them A1 and A2), and the other 20 have titles starting with C-Z.
The main MultiSites screen is a bit funky. The initial view shows sites C-Z, in descending alphabetical order. Clicking on Next shows A1 and A2, in alpha order. These should be shown at the top of the other screen, while the last 2 should be on this second screen.
Also, the text at the bottom reads "1-21 of 22", when in fact it should read "1-20 of 22"
Doable in one query, we can go from 1,000 websites= 6,000 SELECT to one SELECT which would greatly improve performance. This is one of the last performance blockers for 1.0 with #409
See also #1315
Also it would be useful to have a global.ini.php setting controlling how many websites to show in the UI by default (so Super Users can decide to show 1000 if they want to)
Feature requests from forum post
Hello all, I'm interested in fixing this bug. I think I've worked out a way to fix it, but I'm not completely familiar with the code base, so I wanted to run my idea by you guys before I created a patch.
I looked at the queries being made when the 'All Websites' page is loaded and found that the large number of queries was due to the Array Archive types looping over an array of other Archives. I thought a possible solution would be a new archive type (called Piwik_Archive_Cached) that cached the archive IDs that result when prepareArchive() is called in a BLOB archive record. This would reduce the number of queries made from two per site per API call to two queries per api call, but it wouldn't make the page any faster when archiving is triggered through the browser.
Does this sound like an acceptable approach?
capedfuzz, in your proposal, will you run a query such as WHERE idarchive IN (.........) ?
This could be a problem, when there are for example, 5,000 websites, the WHERE idarchive IN ... will be rather slow. But, it is worth testing anyway such patch, as it might still improve performance?
In any case, it is normal that the page will work nicely only when automatic archiving is setup (ie. browser archiving is disabled), so this limitation is definitely OK.
looking forward to seeing your patch or ideas about this!
I planned on using IN (...), but I might be able to optimize the query... I'll create a prototype and see what happens.
A quick update, wanted to let you know I'm still working on this. I had to go a different route than what I proposed: instead of storing archive IDs I select all the IDs in one go. From my simple tests (which didn't include loading sparklines), this sped up the page from ~46 secs to ~17 secs when loading data for 800 websites.
I still have a fair amount of work to do, but hopefully I'll have a patch posted soon.
Actually, sorry, that's ~24 to 17secs. Forgot to disable archiving when testing the normal version. So not quite as monumental a speed up :).
capedfuzz, any update on your patch? Any improvement is definitely welcome for the next release :) thx
Yes, I've got an update. I ended up adding some more improvements and got the page to load in < 4 secs for a sufficiently small amount of sites per page. However, to do that I had to make it possible to do the sorting/pagination logic server-side. So the patch is a bit big. I should be able to submit it some time tomorrow (EST), but in the mean time, here's a list of the big changes I made:
0) I've added a new class, Piwik_QueryOptions, which holds extra query info (like what to use in the LIMIT/OFFSET/ORDER BY parts of a query). I added a parameter to the VisitsSummary API's get & the Goals API's get methods, as well as each Archive type's getDataTableFromNumeric method. Currently, its only used by the IndexedBySite type.
1) IndexedBySite won't call prepareArchive unless archiving is enabled. It'll also select archive ids in one query.
2) I added a getTotal method to the VisitSummary API & the Goals API to compute the sum of a set of archived values, and I added a getSum method to every Archive type. Currently, its only implemented by the IndexedBySite archive type (the others throw an exception).
The page will still load slowly when looking for a custom range that initiates archiving.
My patch is attached. It will speed things up greatly when there aren't that many sites shown, and speed things up a bit when a lot of sites are shown.
Some notes:
I've created some optimizations that speed up the All Websites page by ~12 seconds when tracking 800 websites. I had to make changes to the Piwik_ArchiveProcessing class, but I don't have detailed knowledge how this class & its descendants work. Would anyone mind taking a quick look at my changes to make sure there are no potential issues w/ it (I've uploaded a patch)? Thanks in advance.
capedfuzz, looks like an excellent patch! I haven't tried applying it, but will definitely test performance before and after you have committed.
Feedback
Index: core/ArchiveProcessing.php
===================================================================
--- core/ArchiveProcessing.php (revision 5455)
+++ core/ArchiveProcessing.php (working copy)
@@ -900,15 +900,22 @@
return $isDisabled;
}
+ protected $archivingDisabled = false;
protected function isRequestAuthorizedToArchive()
{
- return self::isBrowserTriggerArchivingEnabled()
- || Piwik_Common::isPhpCliMode()
- || (Piwik::isUserIsSuperUser()
- && Piwik_Common::isArchivePhpTriggered())
- ;
+ return !$this->archivingDisabled
+ && (self::isBrowserTriggerArchivingEnabled()
+ || Piwik_Common::isPhpCliMode()
+ || (Piwik::isUserIsSuperUser()
+ && Piwik_Common::isArchivePhpTriggered())
+ );
}
+ static public function disableArchiving($disable = true)
+ {
+ $this->archivingDisabled = $disable;
+ }
+
/**
* Returns true when
* - there is no segment and period is not range
I don't see any potential issues otherwise, please commit when you manage to add the test. if you need any help let me know :)
How long does the page take before and after? what % improvement?
(In [5475]) Refs #1077. Improved speed of the IndexedBySite archive type by optimizing the case when launching the archiving process is disabled. Instead of selecting archive IDs one at a time, IndexedBySite will now select them all at once.
Notes:
I committed the change & there's a new integration test.
The times I record when I test aren't constant, but the most conservative view of the speed improvement is from 14s to 6s (~57%).
I've also uploaded the script I've used to test the times.
Very nice commit and ideas. New integration test good.
I see from 17s to 7s improvements on my 600 websites test setup... Very nice!!! Great work.
When I enable profiler and debug logging, I see that now there is this executed for each site:
Executed 565 times in 564.3ms (average = 1ms)
SELECT * FROM piwik_site WHERE idsite = ?
I guess this could be detected upstream and select * instead, which would probably bring another 10% improvement.
Executed 4 times in 40.2ms (average = 10.1ms)
SELECT idsite, MAX(idarchive) AS idarchive FROM piwik_archive_numeric_2011_11 WHERE date1 = ? AND date2 = ? AND period = ? AND (name = 'done' ) AND (value = '1' OR value = '3') AND idsite IN (1,3,4,5,6,30,2944,2945,[... hundreds of ids ...]05,3506) GROUP BY idsite
This one is also interesting. Executed 4 times, because of: Piwik_MultiSites_Controller->getSitesInfo() called at [D:\piwik\svn\trunk\plugins\MultiSites\Controller.php:37]
(In [5489]) Refs #1077 Pre-load all websites when user is super user only. Saves hundreds of select * from piwik_site where idsite = ?
(In [5490]) Refs #1077 Missing code preventing multiple executions
Replying to matt:
When I enable profiler and debug logging, I see that now there is this executed for each site:
Didn't know there was a profiler built into Piwik, thanks for pointing it out!
Executed 4 times in 40.2ms (average = 10.1ms) SELECT idsite, MAX(idarchive) AS idarchive FROM piwik_archive_numeric_2011_11 WHERE date1 = ? AND date2 = ? AND period = ? AND (name = 'done' ) AND (value = '1' OR value = '3') AND idsite IN (1,3,4,5,6,30,2944,2945,[... hundreds of ids ...]05,3506) GROUP BY idsite
This one is also interesting. Executed 4 times, because of: Piwik_MultiSites_Controller->getSitesInfo() called at [D:\piwik\svn\trunk\plugins\MultiSites\Controller.php:37]
- we could group these 4 API calls into one (maybe via the new API.get helper?
- For example, maybe we create a custom API call for the ALl Websites dashboard that selects the few metrics needed. This is more a hack, but that would work well and would be quite easy to implement.
Downside: tied to naming of metrics in VisitsSummary,Goals plugin, but this is fine since these metrics won't change their name anytime soon.
I think you could get rid of these by allowing Piwik_Archive creation to be separate from the querying. I'll upload a patch to show you what I mean. (This might result in a speed up when archiving is enabled, too...)
Though from what I can tell the remaining SQL isn't taking too much time to execute. I'm going to try and whittle the time down a couple more seconds. If I can't I'll post a comment.
I uploaded another patch with a bunch of smaller fixes. This patch speeds All Sites up for me from 5.2s to 2.2s (w/ 800 websites stored). Here's a list of the changes:
I can do one more optimization that should reduce the time to ~1.5s. When the website count goes into the thousands, the speed up will be more apparent. But, it will require an added method to the VisitsSummary & Goals APIs.
I noticed that the only remaining thing that takes too much time is the creation of the IndexedBySite archive. The Archive::build method instantiates an Archive, Site, Period and Date instance and IndexedBySite will invoke it for every site. The first time its run it takes ~1s on my setup. I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this?
Also, I can rewrite part of the MultiSites controller and add an API class w/ a getSitesInfo method. This way it can have some integration tests.
Great!!
core/Archive/Array/IndexedBySite.php
36 $archive = Piwik_Archive::build($idSite, $strPeriod, $strDate, $segment->getString() );
37 $archive->setSite(new Piwik_Site($idSite));
38 $archive->setSegment($segment);
36 $archive = Piwik_Archive::build($idSite, $strPeriod, $strDate, $segment );
Here the call to setSite() is not done anymore - is this call to setSite() not used/useless?
Excellent changes in the Date class (makes the code much clearer!!) and good speed improvements :)
I removed part of it, but I could apply a similar optimization to other parts of Piwik if you let me know where they are. Saves ~2.25s (no idea why so much...)
I think the sql query is quite useful for the "super user", since getSitesWithViewAccess will issue a sql query ID IN(...) with thousands of IDs. For the Super User, we know that all websites are needed so this query would help, but leave your new code w/ this improvement as well for admin/view/anonymous users.
add an API class w/ a getSitesInfo method
WHat's your idea? I thought that this function could return the metrics needed, bypassing the VisitsSummary and Goals APIs. This would change from 4 sql queries to sql 2 queries.I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this?
After all other changes (and calling 1 API for today, 1 for yesterday), I'm not sure if this one would be required still.
Please feel free to commit! great stuff! in fact this ticket can be closed after this, since the page will render for most users now.
Replying to matt:
Here the call to setSite() is not done anymore - is this call to setSite() not used/useless?
Archive::build will set the site on a Single archive, so the call in IndexedBySite is unnecessary. It did have an effect on the IndexedByDate class, but I added a call to setSite to IndexedByDate's constructor.
- I support you've used the profiler to test the queries are called once as expected etc.
Excellent changes in the Date class (makes the code much clearer!!) and good speed improvements :)
I removed part of it, but I could apply a similar optimization to other parts of Piwik if you let me know where they are. Saves ~2.25s (no idea why so much...)
I think the sql query is quite useful for the "super user", since getSitesWithViewAccess will issue a sql query ID IN(...) with thousands of IDs. For the Super User, we know that all websites are needed so this query would help, but leave your new code w/ this improvement as well for admin/view/anonymous users.
The issue I see w/ the super user improvement is that its in the Archive::build method. So any API request that checks for 'all' sites may result in two queries (if the original API method optimizes for non-super user cases). If I move the optimization to the All Sites plugin, this wouldn't occur, but the optimization would have to be explicitly added to other parts of Piwik that get data for 'all' sites. I'm going to leave it in when I commit, though.
add an API class w/ a getSitesInfo method
WHat's your idea? I thought that this function could return the metrics needed, bypassing the VisitsSummary and Goals APIs. This would change from 4 sql queries to sql 2 queries.I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this?
After all other changes (and calling 1 API for today, 1 for yesterday), I'm not sure if this one would be required still.
No, it wouldn't. After I thought about it some more I realized you could just go through the data table and forget about the other APIs :).
Please feel free to commit! great stuff! in fact this ticket can be closed after this, since the page will render for most users now.
I'd still like to refactor the Multi Sites plugin, though. Do you mind if I create a new ticket: 'Refactor Multi Sites plugin to follow Piwik plugin conventions'?
If I move the optimization to the All Sites plugin, this
wouldn't occur, but the optimization would have to be explicitly added to
other parts of Piwik that get data for 'all' sites. I'm going to leave it
in when I commit, though.
I think the only other parts is the "Manage Websites" page in the Settings, for the Super User. Otherwise no other page should select all websites information I think.
I'd still like to refactor the Multi Sites plugin, though. Do you mind if I create a new ticket: 'Refactor Multi Sites plugin to follow Piwik plugin conventions'?
Creating a new ticket sounds good, since this one focuses on performance which is now mostly fixed :)
(In [5505]) Fixes #1077, Assorted optimizations for the Multi Sites plugin:
(In [5523]) Refs #1077 fix typo
note to self: we need UI / Widgets rendering tests :)