Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store segment hash in DB #17408

Merged
merged 77 commits into from Apr 18, 2021
Merged

store segment hash in DB #17408

merged 77 commits into from Apr 18, 2021

Conversation

flamisz
Copy link
Contributor

@flamisz flamisz commented Mar 30, 2021

Description:

fixes #17301

Review

  • Functional review done
  • Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed

@flamisz flamisz self-assigned this Mar 30, 2021
@flamisz flamisz added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Mar 30, 2021
@flamisz flamisz added this to the 4.3.0 milestone Mar 30, 2021
@flamisz
Copy link
Contributor Author

flamisz commented Mar 30, 2021

Hi @tsteur and @diosmosis,

this is very WIP, but I'd like to know if I go in the right direction.

A couple of questions I have:

  • should I check if the hash column exists or the updater will solve it anyway (I mean it can not happen that the code updated but DB not?)
  • when we create the update file? whenever we need a migration? so this PR should contain a new update file?
  • how we fill the hash column for the existing segments in the db? does that go into the update file?
  • @tsteur mentioned in the issue, ideally we can remove the urlencodes from this PR: https://github.com/matomo-org/matomo/pull/17029/files. can we @diosmosis?

sorry I have too many questions, I just'd like to understand more and do properly. thanks

@diosmosis
Copy link
Member

diosmosis commented Mar 30, 2021

should I check if the hash column exists or the updater will solve it anyway (I mean it can not happen that the code updated but DB not?)

It shouldn't happen that the code is updated but DB is not updated. If it does happen, then it's because of a bug in the update process, or because of user error. So you shouldn't have to check if the added column exists. (There are exceptions in this because sometimes during the one click update, some new code will execute after it is written to the hard disk, while the old database hasn't been updated, but this shouldn't be one of those cases.)

when we create the update file? whenever we need a migration? so this PR should contain a new update file?

Yes 👍 and updates are tied to a version, so first you'd bump the version in Version.php (setting to a new beta or rc based on what's in there already).

how we fill the hash column for the existing segments in the db? does that go into the update file?

Yes 👍 it should be initialized to urlencode($segment['definition']) in the update file.

@tsteur mentioned in the issue, ideally we can remove the urlencodes from this PR: https://github.com/matomo-org/matomo/pull/17029/files. can we @diosmosis?

I think it depends on how this is implemented. To get rid of those urlencodes, in the Segment constructor you'd have to search through every stored segment in the db for a definition that matches, then if found, use the hash in the database.

sorry I have too many questions, I just'd like to understand more and do properly. thanks

from what matt & thomas said, questions are to be encouraged 👍

@diosmosis
Copy link
Member

I guess we don't have docs for writing updates/migrations, I'll put it on my list to try and write them (unless someone gets to them first).

@flamisz flamisz changed the title get segment hash store segment hash in DB Mar 30, 2021
@flamisz flamisz marked this pull request as ready for review March 31, 2021 02:13
@flamisz flamisz added the Needs Review PRs that need a code review label Mar 31, 2021
@flamisz
Copy link
Contributor Author

flamisz commented Mar 31, 2021

@tsteur and @diosmosis I'm wondering what else we need in this PR?
Can we get rid of those urlencode calls from the other PR? I checked that part of the code, and most of the time it gets the hash in the end of the process from the Segment, and now the getSegmentHash method does its best to find it (without anything, with urlencode and with urldecode as well).

$segmentTable = Common::prefixTable('segment');
$segments = Db::fetchAll('SELECT idsegment, definition from ' . $segmentTable);
foreach ($segments as $segment) {
$hash = md5(urlencode($segment['definition']));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be md5($segment['definition']). Apologies, I know I said it needs to be urlencoded, I was referring to Segment construction. The Segment::getSegmentHash() function decodes the definition before md5-ing, so:

$segment = new Segment(urlencode($segment['definition']));
$hash = $segment->getHash();

is the same as

md5(urldecode(urlencode($segment['definition'])))

So if not going through Segment, calling md5 directly works (but only for definitions in the segment table).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis and @tsteur there is an issue with this one.
If we have a segment currently in a db like this actions%3E%3D2, current Matomo version will use the hash of the urlencoded version of it, so when we create the hash here:

$hash = md5($segment['definition']);

it won't be the same as we have in the archive_numeric tables.
One solution would be using $hash = md5(urldecode($segment['definition'])); in the update file.
We don't support double encoded segments in the db, do we? Only in URL/API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The segment condition value is encoded multiple times, not the segment condition operator.

The hash is computed overall with the fully urlencoded segment, because that is what is sent in the segment parameter. The stored segments are created by sending a definition= parameter, which does not have special treatment and gets urldecoded once automatically. So there is a discrepancy with what is in the stored segment table and what is sent as a query parameter (segment=urlencode(pageUrl==urlencode(urlencode(segmentValue))). The hash in the table has to match the hash computed w/ the fully encoded segment. The tests should cover this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash that gets put in the table needs to be the same hash that is created in this test: https://github.com/matomo-org/matomo/blob/4.x-dev/plugins/SegmentEditor/tests/System/ApiTest.php#L21 when run on 4.x-dev.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, @diosmosis, much more clear now.

@flamisz flamisz removed the Needs Review PRs that need a code review label Apr 5, 2021
@flamisz flamisz added the Needs Review PRs that need a code review label Apr 6, 2021
Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flamisz looks like that should work 👍 left some minor comments

core/Segment.php Outdated Show resolved Hide resolved
core/Segment.php Outdated Show resolved Hide resolved
core/Segment.php Outdated Show resolved Hide resolved
@flamisz flamisz added Needs Review PRs that need a code review and removed Needs Review PRs that need a code review labels Apr 9, 2021
@flamisz
Copy link
Contributor Author

flamisz commented Apr 11, 2021

Hi @diosmosis, can you please review it again (hopefully last time) and we can merge it then?

@diosmosis
Copy link
Member

When the tests pass, I'll take another look 👍

@flamisz flamisz force-pushed the 17301-store-segment-hash-in-db branch 2 times, most recently from a63912e to 2a1db98 Compare April 13, 2021 02:52
@tsteur
Copy link
Member

tsteur commented Apr 18, 2021

@flamisz @diosmosis was wondering if there's much left in this issue to do? Asking as we'd like to have this issue included in the next release

@diosmosis diosmosis merged commit 7b1b36c into 4.x-dev Apr 18, 2021
@diosmosis diosmosis deleted the 17301-store-segment-hash-in-db branch April 18, 2021 22:18
@diosmosis
Copy link
Member

@tsteur @flamisz merged it

@tsteur tsteur restored the 17301-store-segment-hash-in-db branch April 19, 2021 20:59
@tsteur tsteur deleted the 17301-store-segment-hash-in-db branch April 19, 2021 20:59
@tsteur tsteur restored the 17301-store-segment-hash-in-db branch April 19, 2021 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store the segment hash in the DB table
3 participants