@diosmosis opened this Issue on January 5th 2021 Member

Currently segments are basically triple url encoded:

  • each value in a condition is encoded once
  • each condition (ie, browserName==abc) is encoded once (so the value is double encoded at this point)
  • and the entire segment is encoded in the URL and matomo uses this value, not the urldecoded one in $_GET/$_POST (so the value is triple encoded at this point)

All three urldecodes happen in PHP when processing a Segment. The system works, but is overly complicated and in some cases, can introduce subtle bugs. For example, based on whether the correct amount of encoding is used or not in a segment string, the hash will change, which means it's possible to use equivalent segments but get no data from the API.

We could fix this by using a different encoding scheme to encode segment condition values, and then only urlencoding the segment in the URL. For example, we could urlencode values then replace % characters w/ $ as is done in datatable row action parameters. Then in PHP we would use the segment value found in $_GET/$_POST w/o issue.

Backwards Compatibility

To do this w/o breaking BC, we'd have to:

  • still support old triple urlencoded segment values in the URL (preferrably by converting these segment parameters to the proper encoding)
  • and still support looking for triple urlencoded segment archives (where the hash will be different than w/ the new encoding) so we don't have to re-archive everything in the past
@diosmosis commented on January 5th 2021 Member

Refs #17029

@tsteur commented on January 10th 2021 Member

@diosmosis what be your time estimate for making this work? I suppose we can't migrate existing segments in the DB because we don't know how often they were encoded? As we still have to support old triple urlencoded segments, what would be the plan to get rid of this so we can benefit from a simpler solution? AKA would we basically simply say in 3-4 years time we no longer support the old way of such definitions? Because people delete their raw data but keep historical data we might lose reports eventually.

@diosmosis commented on January 11th 2021 Member

@tsteur

@diosmosis what be your time estimate for making this work?

I'd say 1-2 days including automated tests. It changes should only be in a couple places.

I suppose we can't migrate existing segments in the DB because we don't know how often they were encoded?

I think the biggest issue w/ archive table segment hashes is that we don't always know what the segments are. Not every segment used will be in the segment table if browser archiving of segments is enabled.

As we still have to support old triple urlencoded segments, what would be the plan to get rid of this so we can benefit from a simpler solution?

Only thing I can think of would be to migrate on demand until matomo 5 then say if it's not migrated already it won't be. Or... we could have an update that does the migration for segments in the segment table, then provide a command to do it for custom segments, where the segment would be specified via a CLI parameter. Then we can have a notification that pops up if using a triple encoded segment + there's no data saying they might need to migrate.

What do you think?

Powered by GitHub Issue Mirror