Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track seconds instead of days for "days since" dimensions #15774

Merged
merged 65 commits into from Jun 26, 2020

Conversation

diosmosis
Copy link
Member

Fix #14094

@diosmosis diosmosis added the Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. label Apr 3, 2020
@diosmosis diosmosis added this to the 4.0.0 milestone Apr 3, 2020
@@ -109,6 +111,18 @@ public function getMigrations(Updater $updater)
// remove old options
$migrations[] = $this->migration->db->sql('DELETE FROM `' . Common::prefixTable('option') . '` WHERE option_name IN ("geoip.updater_period", "geoip.loc_db_url", "geoip.isp_db_url", "geoip.org_db_url")');

// replace days to ... dimensions w/ segments
foreach (['log_visit', 'log_conversion'] as $table) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw we should do this all in one query and not in multiple queries so we group all log_visit and all log_conversion update queries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one UPDATE? sure, makes sense

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis are these ones actually needed? I would assume Matomo installs the columns automatically (which is also why you didn't need to add it for new installs to the mysql schema)?
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait I see it is needed for the update... statement, makes sense... Could technically also record that the component was updated so Matomo won't try to update the columns

image

like insert into option values option_value = $columnType, option_name = version_log_visit_ visitor_seconds_since_first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the desired approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think the dimensions end up getting added after the main update, so we wouldn't be able to initialize the new dimension values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's what I figured too. It's needed to add them here already. Be just good to additionally record the component version so once the DB update completed it won't try to change the type of these columns but it might not be too important since it would get the same type anyway and therefore mysql wouldn't do anything.

@mattab
Copy link
Member

mattab commented Apr 8, 2020

Test case:

  • If custom reports are using one of these dimensions we now remove, can you test it and check what will happen? will the core:archive fail, or will the "bogus" custom report be skipped and disabled automatically or so?

@diosmosis
Copy link
Member Author

@mattab 👍 if possible will try to make it work in that case

@diosmosis
Copy link
Member Author

@mattab it looks like if one of those dimensions is used, it is ignored and effectively removed from the list. I was hoping to be able to add the dimension back as a view on the seconds one, but it doesn't seem like it's possible to have a dimension that is a "view" of another. I also noticed w/o it, the days dimension disappears and only the seconds one is there. This seems like it would be a problem for most users since seconds is harder to use when they want days. @tsteur can you think of a way around this? Is there a way to add a dimension class w/o it being added to a table but still having an associated column so customreports can aggregate on it?

@tsteur
Copy link
Member

tsteur commented Apr 8, 2020

@diosmosis Custom reports uses as unique ID the dimension id from Dimension::getId() which is by default the PluginName.DimensionClassName. Eg Goals.DaysToConversion.

So it could help to recreate the old dimension classes. The SQL segment in that class would be something like

class DaysSince...Dimension {
    protected $type = self::TYPE_NUMBER;
    protected $sqlSegment = 'ROUND(log_visit.days_since_last_visit / 86400)';
    protected $nameSingular = 'General_DaysSinceFirstVisit';
}

you could use the same class to add back the previous segments which you are currently adding in addSegments manually.

class VisitorDaysSinceFirst {
    protected $columnName = 'visitor_days_since_first';
    protected $segmentName = 'daysSinceFirstVisit';
    protected $type = self::TYPE_NUMBER;
    protected $sqlSegment = 'ROUND(log_visit.days_since_last_visit / 86400)';
    protected $nameSingular = 'General_DaysSinceFirstVisit';
}

I think this should the trick. The seconds since first XYZ we will later need to disable in Custom Reports since they would be generating too many rows and isn't really useful for Custom Reports and could cause significant performance problems

@mattab
Copy link
Member

mattab commented Apr 8, 2020

we will later need to disable in Custom Reports

Btw if we could deal with this in general that'd be great. For example a couple of times before we had the same issue as here: a user/customer used to have a plugin (for example Provider) and some custom reports used the dimension in that plugin (eg. Provider name) and then the plugin is deactivated and the archiving then fails/errors. The fix then is to delete custom reports in the DB that contain the dimension (which is not best user experience and requires support assistance)

@diosmosis diosmosis added Needs Review PRs that need a code review and removed Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. labels Apr 18, 2020
@diosmosis
Copy link
Member Author

@tsteur applied review comments. There's still a failing test for the 1.5 update change. Let me know if you think that's a mistake on the tests or not.

Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now what the problem was re the 1.5 updates @diosmosis . I reckon we could change the update script to simply not specify the third parameter in the addColumn/addColumns methods where it currently sets visitor_days_since_last/visitor_days_since_first. The order shouldn't matter so we can simply remove the parameter.

Alternatively, I was thinking we could make the AddColumns smarter to automatically add it to the end if that specified column no longer exists because the order should never really matter but shouldn't be needed and maybe there are cases where for some reason the order does matter (although I can't imagine that). So for now could just remove the third parameter in those two places.

@@ -128,14 +127,13 @@ public function findKnownVisitor($configId, VisitProperties $visitProperties, Re
}
}

public function removeUnchangedValues(VisitProperties $visitProperties, $visit)
public function removeUnchangedValues(VisitProperties $visitProperties, $visit, VisitProperties $originalVisit = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems $visitProperties is no longer used.

}

// remove old days_to_... columns
$migrations[] = $this->migration->db->dropColumn('log_visit', 'visitor_seconds_since_first');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should visitor_seconds_since_first really be removed?

@@ -3364,10 +3356,7 @@ if (typeof window.Piwik !== 'object') {

var cookieValue = visitorIdCookieValues.uuid + '.' +
visitorIdCookieValues.createTs + '.' +
visitorIdCookieValues.visitCount + '.' +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis just checking whether you checked regarding https://github.com/matomo-org/matomo/pull/15774/files#r423417650 because now nowTs is in third position vs before existing cookies would have stored the visitCount there? Not sure if we need to check in getValuesFromVisitorIdCookie whether the value looks more like a timestamp?

Also I reckon in getValuesFromVisitorIdCookie() the array indexes 4 and 6 might not always be defined?
image

The currentVisitTs seems no longer used in general?

@diosmosis
Copy link
Member Author

Not sure if we need to check in getValuesFromVisitorIdCookie whether the value looks more like a timestamp?

Forgot about this, added placeholder values back in, seems like the simplest approach.

@diosmosis
Copy link
Member Author

The currentVisitTs seems no longer used in general?

No it's not, looks like we can simplify things even more here. Don't need the old cookie values now since they are all unused.

@tsteur
Copy link
Member

tsteur commented Jun 24, 2020

@diosmosis is there much left to do in this PR?

@diosmosis
Copy link
Member Author

@tsteur this was ready for another review. I must've forgotten to ping you.

@tsteur
Copy link
Member

tsteur commented Jun 24, 2020

Looks good @diosmosis

@diosmosis
Copy link
Member Author

build js

@diosmosis
Copy link
Member Author

@tsteur / @sgiehl I updated OmniFixture.sql to have a lower version so the update would run on it fully. This, strangely, causes the ip addresses displayed in the visitor log to change to 0.0.0.0. I looked in the SQL file, and that's the correct value. When I look in the db for the branch and in 4.x-dev, the ip address for the visits shown are 0.0.0.0. But for some reason on 4.x-dev, a completely different ip address is shown... because of the update being run I guess... not sure what the issue is, but I'm merging this. Can fix the test if anyone knows what's going on.

@diosmosis diosmosis merged commit de5ae85 into 4.x-dev Jun 26, 2020
@diosmosis diosmosis deleted the 14094-time-dimensions-counter branch June 26, 2020 13:17
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change "days since last ..." dimensions to save second duration instead of days and compute in PHP
3 participants