hello hello! the case sensitivity of the the Page URL segment type was not an issue with the field itself, which actually is of type utf8_general_ci, but rather with the CRC32 hashing of the field. upper/lower case variations of the same fundamental URL will produce unique hashes. this is a simple conditional fix to ignore the hash field when retrieving records for the Page URL segment type.
Hi @halabuda - thanks for the PR.
Fyi, we can't merge this because of performance regression. there is no index on the field
log_action so this change would result in performance slowdown. Only index on the log_action table is:
INDEX index_type_hash (type, hashs)
I'm not sure how we can fix #8579
edit: maybe we could create the hash based of the lowercase string?
ah yes i see. i thought about fixing it that way initially but my concern was with preserving the hash field entropy for all the use cases when a case sensitive lookup was desirable. but forcing the hash to lower on insert/select makes sense to me if you don't think my aforementioned concern is a problem. let me know and i would be glad to spend some time on this today.
It would be nice if we did not have to create the hash based on lowercase as there will be no way back. I mean once we change the URL to lowercase we won't be possible to get the originally used URL at some point. Eg in the UI we would display the URL lowercase as well but this might not be what everyone wants.
Also I think we would possibly create many duplicated actions and would have to modify some reports maybe to merge duplicated urls to one url. Otherwise in the report there might appear eg a
/Test and a
/test when requesting eg week, month, year, range and on the day of the update.
Also not sure how this would affect
label feature etc. Would be nice to find a solution that only modifies the segment query but that's probably not possible. Not sure how to fix
@tsteur i agree with you. i don't like the "no way back" aspect either.
i am confused regarding your comment about duplicated actions. with case sensitive tracking, as it works currently, a new log_action entry is created for each instance of
/FOO. is this what you mean by "duplicated actions"? forcing the URL to lower before insertion would actually consolidate log_action records. whether or not this is desirable from a product design perspective i do not know.
a new log_action entry is created for each instance of /foo /Foo and /FOO. is this what you mean by "duplicated actions"
Yes, I mean in the
log_action table that contains the name of the URL and the hash. I think we usually check whether a given hash already exists and if not, insert a new action
right, so given my example, 3 unique log_action entries would be created for the same effective URL of
/foo because each case sensitive hash is unique, whereas a forced lower hash would consolidate all 3 URLs into one record because they would share a hash value. but which is the desired functionality?
what about the possibility of creating a secondary hash/index (ie. log_action.hash_ci)? actions would still be stored against the original hash along with their unmodified URL string, so case sensitivity would be preserved where desired, but then the hash_ci field could be used in certain instances where consolidation is necessary.
I guess to have only one is kinda desired functionality but it might be problematic as old data is not stored lowercase. This would lead in having duplicated actions which can be bad for performance and we might display wrong data. This needs to be tested. Nonetheless we currently might have already different versions of the same URL in the log_action table. I haven't had a chance to have a look at the code yet.
Maybe the hash could be based on lowercase but we store the actually sent URL (how it was sent the first time) to keep having upper case letters in the UI. I wouldn't really feel confident about storing all urls lowercase. In theory one could even serve a different page depending on lower/upper case characters.
hash_ci could be a solution but would need a lot of time to regenerate the index for all actions, the index size of the DB would get bigger etc. It's probably not realistic to do it
just to confirm I had a look at the code and tested sending eg 3 different versions of the same URL creates 3 actions in the
log_action table which is also kinda right as eg case sensitivity in URL parameters etc could be important I think
Thanks for the PR but we can't merge it as is. We continue discussion in #8579