Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation: make "Contains" operator case insensitive #8579

Closed
mattab opened this issue Aug 14, 2015 · 16 comments
Closed

Segmentation: make "Contains" operator case insensitive #8579

mattab opened this issue Aug 14, 2015 · 16 comments
Labels
c: Usability For issues that let users achieve a defined goal more effectively or efficiently. worksforme The issue cannot be reproduced and things work as intended.
Milestone

Comments

@mattab
Copy link
Member

mattab commented Aug 14, 2015

Currently, when creating a Page URL segment, the URL is match case sensitive. This means that if a user inputs any capital letter in the URL, it will not match the URL and no data will be shown. This is confusing to non technical users who are not aware that URL are case sensitive in Piwik segments.

Suggestions:

  • Could we make the segment match case insensitive?
    • For example `WHERE log_action.name COLLATE UTF8_GENERAL_CI LIKE '%UrlHereWithRandomCase%' may do the trick (source)
  • (if case insensitive matching is not possible, we should otherwise display some kind of notification, in the segment editor, when we detect that URLs contain some capital letters)
@mattab mattab added the c: Usability For issues that let users achieve a defined goal more effectively or efficiently. label Aug 14, 2015
@mattab mattab added this to the 2.15.0 milestone Aug 14, 2015
@mattab
Copy link
Member Author

mattab commented Aug 14, 2015

adding to 2.15.0 as hopefully this is a quick fix (and this missing feature causing much confusion for some users)

@halabuda
Copy link

hello matt. interesting problem presented here. i spent some time this afternoon looking into it. i just submitted a pull request with a simple fix addressing this issue. actually turned out to be a hash issue and not a problem with the log_action.name field itself which was already UTF8_GENERAL_CI.

@tsteur
Copy link
Member

tsteur commented Aug 18, 2015

As kinda mentioned in the PR I wouldn't say it is a bug. I'd maybe rather recommend to write a plugin that transforms all URL's into lowercase during tracking if someone wants this behaviour.

@mattab
Copy link
Member Author

mattab commented Aug 19, 2015

maybe we could assume that "segment contains" implies "case insensitive match" ? ie. when we have "Segment CONTAINS xyz", maybe it's acceptable to include all variations of the xYz xyZ etc.

"Contains" uses the LIKE '%xyz% which maybe could be changed to COLLATE UTF8_GENERAL_CI LIKE to have case insensitivity matching?

@tsteur
Copy link
Member

tsteur commented Aug 19, 2015

Yeah but wasn't the problem that there is no index on that column?

@mattab
Copy link
Member Author

mattab commented Aug 19, 2015

segment CONTAINS uses LIKE and so does not use the index. therefore adding the case insensitivity match, would not change the fact that the "Segment contains" already do not use the index.

@tsteur
Copy link
Member

tsteur commented Aug 19, 2015

And what about other cases? Like "equals"? I'm not sure if it will be intuitive for users that we sometimes do case sensitive and sometimes not. Reckon docs need to be updated and maybe UI improved to explain it. Maybe there could be "Segment contains case insensitive". In some cases someone might still want to use case sensitive but not sure how often...

@tsteur tsteur closed this as completed Aug 19, 2015
@tsteur tsteur reopened this Aug 19, 2015
@mattab
Copy link
Member Author

mattab commented Aug 20, 2015

And what about other cases? Like "equals"?

fyi: for "equals" we cannot implement a "case insensitive" because when the operator is "equals", then we use the "hash" match.

Reckon docs need to be updated and maybe UI improved to explain it.

👍 doc is at: http://developer.piwik.org/api-reference/reporting-api-segmentation

Not sure how we could make it look nice in the UI?

Also, we should change "Does not contain" to be similarly case insensitive.

In some cases someone might still want to use case sensitive but not sure how often...

I suggest we change "Contains" to mean "Contains (case insensitive)" and if some users complain in the future, we could consider adding a new "Contains (case sensitive)" (we would need a new segment operator too...)

@tsteur
Copy link
Member

tsteur commented Aug 21, 2015

fyi: for "equals" we cannot implement a "case insensitive" because when the operator is "equals", then we use the "hash" match.

I know, that why I meant it will be confusing to apply it in some cases and others not.

I suggest we change "Contains" to mean "Contains (case insensitive)" and if some users complain in the future, we could consider adding a new "Contains (case sensitive)" (we would need a new segment operator too...)

For the UI sounds good to me, docs as mentioned need to be updated clearly as well. This is kinda a "breaking change". Need to mention it there as clearly as well which should be easily doable. And we also need to mention that before 2.15 it used to be not case insensitive.

Do I get this right that we will now make all contains case insensitive?

@mattab mattab changed the title Make Segment URL matching case insensitive Make "Contains case insensitive Aug 21, 2015
@mattab mattab changed the title Make "Contains case insensitive Segmentation: make "Contains" operator case insensitive Aug 21, 2015
@mattab
Copy link
Member Author

mattab commented Aug 21, 2015

Do I get this right that we will now make all contains case insensitive?

👍

This is kinda a "breaking change".

We can document this:

  • in the developer changelog,
  • in the developer guide,
  • In the user guide
  • and it will be listed in the changelog via this issue title.

@tsteur tsteur self-assigned this Aug 21, 2015
@tsteur
Copy link
Member

tsteur commented Aug 21, 2015

FYI: At least in my local Piwik it looks like it always was case insensitive already as collation=UTF8_GENERAL_CI is default.n I hope it won't lead to any errors in case database uses different charset eg latin1 etc see https://lalitvc.wordpress.com/2014/10/17/error-collation-utf8_general_ci-is-not-valid-for-character-set-latin1/

Eg it results either in error:

COLLATION 'UTF8_GENERAL_CI' is not valid for CHARACTER SET 'latin1'

or even

COLLATION 'UTF8_GENERAL_CI' is not valid for CHARACTER SET 'binary'

when column is binary (eg config_id etc)

@tsteur
Copy link
Member

tsteur commented Aug 21, 2015

In the past we ran some alter table statements to set UTF8_GENERAL_CI on same columns specifically but reckon this is not an option here as it would take maybe too long...

@tsteur
Copy link
Member

tsteur commented Aug 21, 2015

FYI: When we're setting the charset to UTF8 via set names utf8 we could also set the collation like SET NAMES 'utf8' COLLATE 'utf8_general_ci' (in case charset==utf8)... This would mean it would behave similar for all databases and behaviour would be more predictable similar to #8589 (comment)

@tsteur
Copy link
Member

tsteur commented Aug 21, 2015

I made a change (in https://github.com/piwik/piwik/compare/8579) but I think there's actually nothing to do for contains... I think it always worked case insensitive and we would rather get more errors by merging that change. I think the issue was only about equal comparison etc see #8584 (comment)

@tsteur
Copy link
Member

tsteur commented Aug 21, 2015

As discussed moving issue to 2.15.1 as we cannot reproduce the issue. It seems to be case insensitive already

@mattab
Copy link
Member Author

mattab commented Sep 20, 2015

Closing for now as it seems to work

@mattab mattab closed this as completed Sep 20, 2015
@mattab mattab added the worksforme The issue cannot be reproduced and things work as intended. label Sep 20, 2015
@mattab mattab modified the milestones: 2.15.1, 2.15.0 Oct 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Usability For issues that let users achieve a defined goal more effectively or efficiently. worksforme The issue cannot be reproduced and things work as intended.
Projects
None yet
Development

No branches or pull requests

3 participants