New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store the segment hash in the DB table #17301
Comments
We would still need to do the |
I don't know if there's any issue with that but would assume these segments should be encoded the same way maybe as usual but because it's not sent along a URL the outer URL encode would not be needed. Generally, if it's a real time segment or a segment from the config then I suppose we'd need to fallback to the regular hash calculation since there be no other choice pretty much.
You mean during the update/ migration? I assume we'd need to do the urlencode when adding this column from what I understand to make sure we end up having the correct segment hash value. We could write a little test script that checks if this works for some production DBs. Like we could calculate the hash for each segment in memory, and then check if there are any archive tables with that "done" flag etc to make sure we calculate the hash correctly. Not sure if that would help. Then I was hoping something like #17029 (comment) would work |
If it's a custom segment it won't be in the table. If it's real time, and it's in the table then we'd use the hash (to my understanding). The config ones are weird since no one specified exactly how to specify them. I guess we can check if there's an operator in it that isn't encoded (like,
Yes and when adding new segments to the table. This solution seems a bit like taking on tech debt, but it's also very low risk, so it's probably the best short term solution 👍 |
I'm thinking it's actually the best long term solution since you want to be able to change the algorithm for how to calculate the hash at any time (eg different segment string or do it based on parsed segment expression). There is of course still the urlencode/urldecode thing but that's a different problem which we eventually will hopefully tackle with #17050 . |
Well changing the algorithm still means you have to deal w/ old hashes and new hashes, so I'm not sure it will really be aided by a hash table column, where we can put only one value. |
another good benefit is that when the hash is stored in the DB, and we see a query in the processlist, then we can map it more easily to a specific segment https://user-images.githubusercontent.com/273120/110702749-cee7de80-8257-11eb-9ec9-11a7d096fa06.png |
Just to make sure, there is no way today to edit a segment definition and keep historical data in the same report? Gonna follow this issue, because it's a highly requested feature from our users. Br, Jorge |
also another great benefit is that it would allow us to join |
That would still require a like query, correct? Or some set of function calls since the name is in the done%.Plugin format. Or you mean going backwards from hash to definition? Yes that's definitely super useful 👍 |
I haven't thought this through but was hoping that something like |
ideally when this PR is finished we can remove all of the |
@flamisz was about to create a new issue for below comment but then realised it may be better to do it as part of this issue as it helps us validate/check if we fully achieved the wanted outcome. We could still close the issue later and create a separate issue for it if there's trouble. Now that we moved the segment hash into the DB table in #17408 / #17301 it would be great to try and remove all the URL encoded segment constructor calls for the definition that were added in https://github.com/matomo-org/matomo/pull/17029/files#diff-f41c275e883996b8b69cb765b1818e12e82cbe034e648bc3f7d03610cc7abad1R888 There are multiple |
refs #17050 which we're currently still discussing and looking into.
In any case I'm thinking to fix the segment hash problem hopefully once and for all I think we should introduce a new column to the segment DB table that stores the segment hash that will be used when fetching archives etc. This will also make it future proof if we ever change the segment definition and/or if we change the calculation for hashes etc without needing to update any archives etc.
We would basically need an update script that sets the correct hashes for any existing segment. And then adjust https://github.com/matomo-org/matomo/blob/4.2.1/core/Segment.php#L464 to try and find a matching hash from the segment DB table. We might need to encode / decode a segment definition to see if we find a match in the table.
@diosmosis do you think that would be useful? If so I reckon we could start working on this already as it's independent of a decision for #17050 which will probably still need a while (@mattab and I already quickly talked about it but need more time)
The text was updated successfully, but these errors were encountered: