It would be nice to see some stats about Piwik DB usage, mostly for debug but also could be useful to the user for long term Mysql scaling and general info.
Here are the stats we could display:
This message should give a short overview of the Piwik DB, in terms of general usage data, and DB focused summary for capacity planning.
Attachment: Patch for this issue.
3004.diff.tar.gz
Attachment: Patch for this issue. (2nd)
3004.diff.tar.2.gz
I've uploaded a patch for this issue (a redesign of the DBStats plugin). It's a pretty big patch, so I think it should be reviewed. Let me know what you think.
Wow, very nice UI and useful reports!
Sorry for the delay in replying to the ticket. Here is my code review and feedback:
+ 'DBStats_ReportData' => 'Report Data',
+ 'DBStats_ReportTables' => 'Report Tables',
+ 'DBStats_ReportDataByYear' => 'Report Tables By Year',
+ 'DBStats_MetricTables' => 'Metric Tables',
+ 'DBStats_MetricData' => 'Metric Data',
+ 'DBStats_MetricDataByYear' => 'Metric Tables By Year',
I think can be simplified eg. "Report data size" or "Report tables, data size"
Then "'Metric Tables By Year'," becomes "%s By %s" and we already know "year" and then you can plugin the one title. So all these translations become 3 new strings.
The new string could be "General_ReportByDate" => "%1$s by %2$s"
+ public function setRelatedReports( $relatedReports )
+ {
+ $apiMethod = "$currentControllerName.$currentControllerAction";
+
+ $this->viewProperties['relatedReports'] = array();
Here $apiMethod will throw E_NOTICE ?
Some comments:
Replying to matt:
Wow, very nice UI and useful reports!
- RowCallbackAddColumn -> rename to AddColumn
- The new filter ColumnCallback -> is the same as existing ColumnCallbackReplace ?
- The new filter Piwik_DataTable_Filter_RowCallback -> is not useful because filters are already all "rows callback" since they just loop on rows and run the function, this filter does not refactor logic
RowCallback & ColumnCallback are just intermediate classes. They don't do anything except execute a callback on some data and are not meant to be used directly. GroupBy & RowCallbackAddColumn derive from them so the code is clearer. I could get rid of them, but then there'll be some code redundancy. I could also keep them and make them 'abstract'.
As for filters, they seem to operate not on rows, per say, but entire DataTables. So I guess, technically, they can do whatever they want to a DataTable. Perhaps some of the logic for my RowCallback should be in the Filter base type?
I was trying to make my filter code clear, w/o refactoring existing code until my changes went under review. Do you think it would be better not to use base types in creating new filters?
- Controller graph code: ideally the graph should be fixed in core to automatically work well with any Unit on any axis. There is in API/API.php a function to get the unit for a column name. Somehow the graphs may use this to handle all units and plot data without requiring the big "hacks" like you had to do here. Im not sure how hard it would be to fix core?
The problem wasn't that it wouldn't work w/ units, but that for memory values to display nicely, they'd need to use several different units (ie, B, KB, MB, ...). This would require modifying the jqplot related code, or possibly jqplot itself. Which would be hard, I'm guessing :)
RowCallback & ColumnCallback are just intermediate classes. They don't do anything except execute a callback on some data and are not meant to be used directly. GroupBy & RowCallbackAddColumn derive from them so the code is clearer. I could get rid of them, but then there'll be some code redundancy. I could also keep them and make them 'abstract'.
I think the code can be inlined it's pretty simple:
foreach($table->getRows() as $row) {
$function($row); // or call_user_func_array
}
It's a case that refactoring is not quite worth the trouble maybe, but also could be if you make them abstract.... up to you?
I tested the blob/metric queries:
BLOB TABLE RESULTS
For a table w/ 100,000 rows: 6.8s
For a table w/ 200,000 rows: 13.56s
NUMERIC TABLE RESULTS
For a table w/ 100,000 rows: 5.66s
For a table w/ 200,000 rows: 11.2s
Hopefully, this isn't too slow. I think we could maybe cache the results of each query (except the current month) in the piwik_option table. Since it would be one per archive table, I don't think it would be all that much space. Would be a bit of a hack, though.
Uploaded a new patch. Some notes:
Let me know what you think of my patch.
Sorry for the delay, here is the review:
feel free to commmit directly
Replying to matt:
- Times are OK, less than 20s is fine, thanks for testing
To be clear, that's ~6-13s per table. So if there is two years of data, it'd be 5.2 min, which I'm guessing isn't so great.
I think I could speed things up w/ a scheduled task that gets run once a month. It would create the table for all existing archive tables except the current month and store them in the options table. Sound good?
A scheduled task sounds good!
In the code, I recommend to cache the result in API.php in the option table. The scheduled task should not do the caching but simply call the API to "pre-archive" the data.
There are cases where new data could be added to the past tables (eg. all "january" tables contain the data of the yearly archives), so maybe the task could be weekly to make it slightly more accurate? It will only be executed for users who enabled the DBstats plugin so I'm not worried about overhead for most users since most users dont enable the plugin.
Replying to matt:
A scheduled task sounds good!
In the code, I recommend to cache the result in API.php in the option table. The scheduled task should not do the caching but simply call the API to "pre-archive" the data.
There are cases where new data could be added to the past tables (eg. all "january" tables contain the data of the yearly archives), so maybe the task could be weekly to make it slightly more accurate? It will only be executed for users who enabled the DBstats plugin so I'm not worried about overhead for most users since most users dont enable the plugin.
I think I can get the last updated timestamp for a table and check if the table was modified since the last 'archive' (see http://stackoverflow.com/questions/307438/how-can-i-tell-when-a-mysql-table-was-last-updated ). This should make it accurate and fast :)
Replying to capedfuzz:
Replying to matt:
A scheduled task sounds good!
In the code, I recommend to cache the result in API.php in the option table. The scheduled task should not do the caching but simply call the API to "pre-archive" the data.
There are cases where new data could be added to the past tables (eg. all "january" tables contain the data of the yearly archives), so maybe the task could be weekly to make it slightly more accurate? It will only be executed for users who enabled the DBstats plugin so I'm not worried about overhead for most users since most users dont enable the plugin.
I think I can get the last updated timestamp for a table and check if the table was modified since the last 'archive' (see http://stackoverflow.com/questions/307438/how-can-i-tell-when-a-mysql-table-was-last-updated ). This should make it accurate and fast :)
Actually, this won't work for InnoDB tables... So for now, the best options seems like your suggestion for the weekly task. Or maybe we could patch InnoDB? :)
could we maybe do SELECT MAX(ts_archived) ?
if you still have your big tables, would be interesting how fast it is?
Otherwise weekly full refresh is fine too.
If so could you write in the UI in grey "inline help" style eg "Report last updated on 2012, Jan 23th" to make sure users know about the 1 week delay.
Replying to matt:
could we maybe do SELECT MAX(ts_archived) ?
if you still have your big tables, would be interesting how fast it is?Otherwise weekly full refresh is fine too.
If so could you write in the UI in grey "inline help" style eg "Report last updated on 2012, Jan 23th" to make sure users know about the 1 week delay.
For 200,000 rows the query takes about ~.4s, however MAX(idarchive) is instantaneous, so I'll use that.
(In [6324]) Fixes #3004, redesigned DBStats plugin, added several new reports including database space taken up by tracker tables, database space taken up by archive blob tables, database space taken up by archive metric tables, database space taken up by individual reports & database space taken up by individual metrics.
Notes:
(In [6329]) Refs #3004, fix PrivacyManager regression.
Wow big commit!! Nice work..
Reopening & Code Review:
maybe should be = true or = 1 instead of -1 ?
(In [6340]) Refs #3004, fix regression due to keep_summary_row JS check.
(In [6343]) Fixes #3004, tweaks.