New report listing all User ID, in Visitors > Users #9883
… unique user IDs and some aggregated data. Includes reindexing of raw visitors log into aggregated user ids table
Is there any reason for not having that as a separate plugin, not bundled with core but available on the marketplace instead?
@sgiehl The issue description looks like it's needed in core. If it's not, I will go with the marketplace and small changes in core.
Yes, it would be helpful to have in core I'd say. I'm excited about this one, I'll have a look :)
MAX(sub.visit_last_action_time) as last_visit_time,
SUM(sub.visitor_count_visits) as total_visits
FROM (
SELECT user_id, idsite, idvisit, idvisitor, visit_first_action_time, visit_last_action_time, visitor_count_visits FROM log_visit
Here we need to make sure the prefixed table name is used.
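To illustrate the point about prefixed table names, here is a minimal sketch in JavaScript. The helper names `prefixTable` and `buildVisitSelect` and the `piwik_` prefix are assumptions made for this example; they are not Piwik's actual API, which is written in PHP.

```javascript
// Hypothetical helper: prepend the configured table prefix to a bare table name.
function prefixTable(prefix, table) {
  return `${prefix}${table}`;
}

// Build the inner SELECT from the reviewed query against the prefixed table
// instead of the hard-coded name "log_visit".
function buildVisitSelect(prefix) {
  const columns = [
    'user_id', 'idsite', 'idvisit', 'idvisitor',
    'visit_first_action_time', 'visit_last_action_time', 'visitor_count_visits',
  ];
  return `SELECT ${columns.join(', ')} FROM ${prefixTable(prefix, 'log_visit')}`;
}
```

With an empty prefix the query is unchanged, so installations without a table prefix behave as before.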
First 👍 for working on this. It'll be a useful feature, and it'll be good to have all UserId-related features in a separate plugin. We will move some more UserId-specific things that are currently in other plugins there some day. For people not knowing what it does I took a screenshot:
Even the visitor profile row action works 👍
Re visualization:
Re indexer: I've been thinking for a while about what a good example Archiver would be, but first: You can create a new archiver for this plugin by executing
It's quite complex but an example could look like this:
This will also make your API much easier, as pretty much all you need to do is:
Afterwards you can remove pretty much everything related to the indexer, and it will have more features. I haven't tested anything but wanted to point in the right direction. Really excited to see this report in Piwik :)
…hes columns; format dates
@tsteur I've updated the PR with 3 more columns and date formatting.
They are also optimized for holding data for long periods, such as month, year, etc.
That's not really how Piwik works. All reports in Piwik are by period. This way users can decide whether they want to see all users (eg by selecting a year or a long data range) or whether they want to see only recent users (by selecting a short date range, day period, week period). This is IMHO quite important to give users that freedom as not everyone is interested in seeing all users but maybe more recent active users.
With archiving the search comes out of the box and actually we do only load one record from the database (the archived report record). It might fetch multiple archives from the database if a range date is selected or if the aggregated report is very big but it's still very fast. It's a very fast query and unserializing + searching is quite fast as well with the existing archives. Plus we save some time not having to aggregate any data.
It's true it could be faster by having stored aggregated reports differently but it's actually not too greedy as the select for an archive is very fast (instead of having to fetch many aggregated entries and aggregate them again or so). The problem is that we might do the unserialize (and some other work) for all rows within the datatable, even though we maybe need only one eg when
Reading the archives with 1b+ visits is still very fast I would say. Problem is only the archiving process itself: meaning querying the log tables and aggregating the data for the new archive. Even with 1b+ visits it's actually not a big deal to archive the reports, mainly the Action URLs report is becoming slow with 1b+ visits but that's a different topic :) There are a couple of problems with the indexer.
There are likely many more problems with it, as the archiver is a "beast" in terms of features and logic and stuff :) In general, yes, the archives could be a bit smarter and less resource intensive. On the other hand, reading the archives is always quite fast, no matter if there are 1b visits or 100 visits. Generating the aggregated archives is a bit of a problem and could be faster, but solving this properly would take many months of work. To have this report in Piwik we would kinda need to use the archiver here, as it's just "how Piwik works". Users want to apply different dates and segments to their reports etc.
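As a rough illustration of the per-period idea discussed above, the sketch below merges several per-day user reports into one report for a longer period. The row shape (`userId`, `totalVisits`, `lastVisitTime`) and the function name are assumptions made for this example. The merge rules (sum the visits, keep the latest visit time) mirror the `SUM` and `MAX` columns in the query reviewed earlier, not the archiver's real implementation.

```javascript
// Merge per-day report rows (assumed shape) into one row per user
// for a longer period such as a week or a date range.
function aggregateUserRows(dailyReports) {
  const byUser = new Map();
  for (const rows of dailyReports) {
    for (const row of rows) {
      const merged = byUser.get(row.userId);
      if (!merged) {
        // First time we see this user: copy the row so inputs stay untouched.
        byUser.set(row.userId, { ...row });
        continue;
      }
      // Visits add up across days.
      merged.totalVisits += row.totalVisits;
      // "YYYY-MM-DD HH:MM:SS" strings compare correctly as plain strings.
      if (row.lastVisitTime > merged.lastVisitTime) {
        merged.lastVisitTime = row.lastVisitTime;
      }
    }
  }
  return [...byUser.values()];
}
```

Selecting a short period then means merging few small reports, while a long range merges many, which is where the row counts discussed in this thread start to matter.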
@tsteur Thanks for the detailed answer 👍 I'll rework it to use the archiver and see how it works.
@tsteur BTW an example why I'm concerned about high resources consumption by archive reads. I took the Cities report as an example, because it's closest by numbers to users report on my local instance. I have around 300k users and there are 262587 cities. Total visits from 2013-06-06 till 2016-03-10 is around 1.7M. From the issue description I got that searching a certain user by user IDs between all users may be a common usage pattern for the Users report. So, I'm trying to load lifetime cities report and search through it:
Paginating it is the same pain. Also, it started working only after I changed the memory limit from 1 GB to 2 GB. Archiving is configured to happen by cron. With such numbers the report is unusable. I'm afraid of having the same performance with user IDs.
When requesting The loading time of
Do you maybe have the settings
[Debug]
always_archive_data_period = 1
always_archive_data_day = 1
always_archive_data_range = 1
always_archive_data_year = 1
always_archive_data_week = 1
always_archive_data_month = 1
enabled?
@tsteur I reworked it to use the core archiver. Concerning the performance questions: the [Debug] options you mentioned are not active on my local instance. On my test data set I have 298567 user IDs. If I load a lifetime report, it takes around 60s (PHP 5.5) and 20s (PHP 7.0). Profiling shows that the main bottlenecks are operations on DataTable, mainly deserializing, sorting etc. Disabling even one "Sort" filter makes the Users report twice as fast on PHP 5 and 5x faster on PHP 7. You may take a look at the xdebug profiles: On short periods or small data it works just like other reports. The problem is that Piwik's archiver-based reports work slowly and consume a lot of resources when they contain so many rows. Cities is one example, user IDs is another. I think having > 100000 users is not something extraordinary for an average business.
These are our known bottlenecks so far, where we cannot do too much about it as mentioned in previous posts. The problem is that we're unserializing etc all rows, even the ones that are not needed. The user report might have 300k rows and Piwik does not perform very well when having so many users. BTW: In Piwik 3 sorting will be faster as we're using native sort methods there instead of custom sort. A workaround could be to disable sort when there are eg > 300k users. This will let most Piwik users still use sort, but big sites would not be able to use it until it performs faster. Code would be like this:
if ($dataTable->getRowsCount() > 300000) {
    $dataTable->disableFilter('Sort');
}
Haven't tested it though. If it's 20 seconds with PHP 7 I think it would still be acceptable to leave Sort always enabled (for now). Have you profiled this on a fast server or more an average or local development server? Also be aware that XHProf adds quite a bit of overhead when one method is called many times, contrary to xdebug, see eg https://tideways.io/profiler/blog/profiling-overhead-and-php-7 . It would be good if you could test the actual speed of the API call without Xdebug and without xhprof enabled. I'd expect it to be at least twice as fast just by disabling xhprof. If xdebug is disabled as well it should be even faster.
@tsteur 19-20s is the time without xdebug or xhprof; I enable the profiler only when I need to debug. It's my local dev machine: MacBook 2014, 2GHz Core i7, 16GB DDR3, SSD. I haven't tried it on a dedicated server; I think it should be faster there. It would be nice if you could try it on a more or less real environment.
I can't try it on a more or less real environment soon, but my suggestion would be to just merge it (I need to review the PR) and see how it performs. Most times users rather view a small date range of a single day or week, and most users will have much less than 300k users. Nonetheless it should also be fast when having eg 300k users for a single day, but we'd improve this one afterwards. I'll try to review today or in the next days and do some tests.
*/
public function __construct(LoggerInterface $logger)
{
$this->logger = $logger;
I think the logger is not used anymore?
$columnsAggregationOperation = null;
$this->getProcessor()->aggregateDataTableRecords(
$dataTableRecords,
$maximumRowsInDataTableLevelZero = null,
Here we should set a limit of eg 50k rows by default. To do this I suggest creating a new config entry similar to custom variables, see https://github.com/piwik/piwik/blob/2.16.0/config/global.ini.php#L393-L395
eg datatable_archiving_maximum_rows_userid_users = 50000
This way it will be fast for all users by default, even when having more than 50k users, as the other users will be grouped into one summary row "others". This is the Piwik way of doing it currently. Users that want to see all users can simply set it to a high value like 10 million and it will make sure that all users are shown in the report.
Basically you will set similar to https://github.com/piwik/piwik/blob/2.16.0/plugins/CustomVariables/Archiver.php#L49
$maximumRowsInDataTableLevelZero = Config::getInstance()->General['datatable_archiving_maximum_rows_userid_users'];
It's not an ideal solution for sure but that's how it is supposed to work right now.
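The row limit described above can be sketched as follows. This is a simplified JavaScript illustration, not Piwik's implementation. The function name, the row shape, sorting by visits before truncating, and adding the summary row on top of the limit are all assumptions made for the example.

```javascript
// Keep the top `maxRows` rows by visits and fold everything else into a
// single summary row labelled "Others" (assumed behaviour for this sketch).
function truncateRows(rows, maxRows) {
  if (rows.length <= maxRows) {
    return rows.slice();
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...rows].sort((a, b) => b.totalVisits - a.totalVisits);
  const kept = sorted.slice(0, maxRows);
  const rest = sorted.slice(maxRows);
  const others = {
    label: 'Others',
    totalVisits: rest.reduce((sum, row) => sum + row.totalVisits, 0),
  };
  return [...kept, others];
}
```

A site with fewer users than the limit sees every user, while a very large site gets a bounded report whose totals still add up because the remainder is summed into the "Others" row.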
So far I had a look and the PR looks good apart from a few things. Really good work so far 👍
- remove logger injection
- change archive record name
- add low population filter
- add all columns visualization
- add datatable_archiving_maximum_rows_userid_users configuration, default value is 50000
Implemented the small changes you requested.
*/
- public function query($sql, $bind = array())
+ public function query($sql, $bind = array(), $usePreparedStatementsCache = true)
I just noticed this is likely also no longer needed / unused now
@tsteur I added the data-row-metadata attribute. It allowed me to remove a lot of code, so I like it more.
Awesome work 👍 We will merge for Piwik 2.16.2 as we just released the RC for 2.16.1 last Friday. Looking forward to having this feature in Piwik.
DataTable_RowActions_VisitorDetails.prototype.performAction = function (label, tr, e) {
var visitorId = this.getRowMetadata($(tr)).idvisitor || '';
if (visitorId.length > 0) {
note to myself: check if we need to call encodeURIComponent
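For context on that note, here is a minimal sketch of what encoding the visitor ID would look like if it ends up in a URL. The function name and the `visitorId` parameter name are hypothetical and chosen for the example; only `encodeURIComponent` is a standard JavaScript function.

```javascript
// Build a query-string fragment for a visitor ID, escaping characters
// such as spaces, "&" or "=" that would otherwise break the URL.
function buildVisitorProfileQuery(visitorId) {
  if (typeof visitorId !== 'string' || visitorId.length === 0) {
    return null; // nothing to link to, same guard as in the snippet above
  }
  return `visitorId=${encodeURIComponent(visitorId)}`;
}
```

If the ID is always a hexadecimal string, encoding changes nothing, so the call would be harmless either way.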
Thanks for the PR @alexkuk, I will merge now 👍 👍
Implements #9389 - UserId module, that adds a new Users report showing unique user IDs and some aggregated data. Includes reindexing of raw visitors log into aggregated user ids table.