New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ensure case insensitivity of the the Page URL segment type #8584
Conversation
Hi @halabuda - thanks for the PR. Fyi, we can't merge this because of performance regression. there is no index on the field I'm not sure how we can fix #8579 edit: maybe we could create the hash based of the lowercase string? |
ah yes i see. i thought about fixing it that way initially but my concern was with preserving the hash field entropy for all the use cases when a case sensitive lookup was desirable. but forcing the hash to lower on insert/select makes sense to me if you don't think my aforementioned concern is a problem. let me know and i would be glad to spend some time on this today. |
It would be nice if we did not have to create the hash based on lowercase as there will be no way back. I mean once we change the URL to lowercase we won't be possible to get the originally used URL at some point. Eg in the UI we would display the URL lowercase as well but this might not be what everyone wants. Also I think we would possibly create many duplicated actions and would have to modify some reports maybe to merge duplicated urls to one url. Otherwise in the report there might appear eg a Also not sure how this would affect |
@tsteur i agree with you. i don't like the "no way back" aspect either. i am confused regarding your comment about duplicated actions. with case sensitive tracking, as it works currently, a new log_action entry is created for each instance of |
Yes, I mean in the |
right, so given my example, 3 unique log_action entries would be created for the same effective URL of what about the possibility of creating a secondary hash/index (ie. log_action.hash_ci)? actions would still be stored against the original hash along with their unmodified URL string, so case sensitivity would be preserved where desired, but then the hash_ci field could be used in certain instances where consolidation is necessary. |
I guess to have only one is kinda desired functionality but it might be problematic as old data is not stored lowercase. This would lead in having duplicated actions which can be bad for performance and we might display wrong data. This needs to be tested. Nonetheless we currently might have already different versions of the same URL in the log_action table. I haven't had a chance to have a look at the code yet. Maybe the hash could be based on lowercase but we store the actually sent URL (how it was sent the first time) to keep having upper case letters in the UI. I wouldn't really feel confident about storing all urls lowercase. In theory one could even serve a different page depending on lower/upper case characters.
|
just to confirm I had a look at the code and tested sending eg 3 different versions of the same URL creates 3 actions in the |
Thanks for the PR but we can't merge it as is. We continue discussion in #8579 |
hello hello! the case sensitivity of the the Page URL segment type was not an issue with the field itself, which actually is of type utf8_general_ci, but rather with the CRC32 hashing of the field. upper/lower case variations of the same fundamental URL will produce unique hashes. this is a simple conditional fix to ignore the hash field when retrieving records for the Page URL segment type.