Currently segments are basically triple url encoded:
browserName==abc) is encoded once (so the value is double encoded at this point)
$_GET/$_POST(so the value is triple encoded at this point)
All three urldecodes happen in PHP when processing a Segment. The system works, but is overly complicated and in some cases, can introduce subtle bugs. For example, based on whether the correct amount of encoding is used or not in a segment string, the hash will change, which means it's possible to use equivalent segments but get no data from the API.
We could fix this by using a different encoding scheme to encode segment condition values, and then only urlencoding the segment in the URL. For example, we could urlencode values then replace
% characters w/
$ as is done in datatable row action parameters. Then in PHP we would use the segment value found in
$_GET/$_POST w/o issue.
To do this w/o breaking BC, we'd have to:
@diosmosis what be your time estimate for making this work? I suppose we can't migrate existing segments in the DB because we don't know how often they were encoded? As we still have to support old triple urlencoded segments, what would be the plan to get rid of this so we can benefit from a simpler solution? AKA would we basically simply say in 3-4 years time we no longer support the old way of such definitions? Because people delete their raw data but keep historical data we might lose reports eventually.
@diosmosis what be your time estimate for making this work?
I'd say 1-2 days including automated tests. It changes should only be in a couple places.
I suppose we can't migrate existing segments in the DB because we don't know how often they were encoded?
I think the biggest issue w/ archive table segment hashes is that we don't always know what the segments are. Not every segment used will be in the
segment table if browser archiving of segments is enabled.
As we still have to support old triple urlencoded segments, what would be the plan to get rid of this so we can benefit from a simpler solution?
Only thing I can think of would be to migrate on demand until matomo 5 then say if it's not migrated already it won't be. Or... we could have an update that does the migration for segments in the
segment table, then provide a command to do it for custom segments, where the segment would be specified via a CLI parameter. Then we can have a notification that pops up if using a triple encoded segment + there's no data saying they might need to migrate.
What do you think?