Add possibility to configure simple wildcards in site URLs #16484

Open
anthosz opened this issue Sep 28, 2020 · 11 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

@anthosz

anthosz commented Sep 28, 2020

Hello,

I have an issue when I use import_logs.py with "Only track visits and actions when the action URL starts with one of the above URLs." checked and a site URL like example.com/! (for a URL shortener). My goal is to create a website with a report for all URLs/pages starting with "/!*".

Example:
URL (also tried with https and with a * at the end): http://example.com/!

Scenario 1 (doesn't work):
Enabled: Only track visits and actions when the action URL starts with one of the above URLs.
Log:
example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200
./import_logs.py --idsite=1 --url='http://example.com/piwik/' --recorders=3 --log-format-regex="(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?)\" (?P<status>\S+)" access.log
-> Nothing new in log_link_visit_action table

Scenario 2 (works):
Disabled: Only track visits and actions when the action URL starts with one of the above URLs.
Log:
example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200
./import_logs.py --idsite=1 --url='http://example.com/piwik/' --recorders=3 --log-format-regex="(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?)\" (?P<status>\S+)" access.log
-> New entry in log_link_visit_action table

Scenario 3 (works):
Disabled: Only track visits and actions when the action URL starts with one of the above URLs.
Log:
example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200
./import_logs.py --idsite=1 --url='http://example.com/piwik/' --recorders=3 --log-format-regex="(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?)\" (?P<status>\S+)" --hostname=example.com --include-path='/!*' access.log
-> New entry in log_link_visit_action table (so it works if I force the path in import_logs but not in Matomo -> in this case I need to launch import_logs several times)
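
For readers following along, here is a minimal Python sketch of what Scenario 3 effectively does: parse the log line with the regex passed to --log-format-regex, then keep the hit only if its path matches the glob given to --include-path. It assumes glob-style matching via the standard fnmatch module. The helper name keep_hit is hypothetical, and this is not the importer's actual code.

```python
import re
from fnmatch import fnmatchcase

# Same regex as passed to --log-format-regex in the scenarios above.
LOG_FORMAT = re.compile(
    r'(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?)" (?P<status>\S+)'
)


def keep_hit(line, include_path="/!*"):
    """Hypothetical helper: True if the line parses and its path matches the glob."""
    match = LOG_FORMAT.match(line)
    if match is None:
        return False  # line does not fit the expected log format
    return fnmatchcase(match.group("path"), include_path)


sample = 'example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200'
print(keep_hit(sample))
```

In a glob, "!" is only special inside brackets, so '/!*' simply means "any path that begins with /!".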

In this case, my goal is to use a path separated not by a slash (/) but by an exclamation mark "!".

If you need more information, don't hesitate to ask.

Thank you!

@anthosz anthosz changed the title from "import_logs.py + url (matomo) including exclamation mark" to "import_logs.py + url (matomo) including exclamation mark = NOK" Sep 28, 2020
@anthosz
Author

anthosz commented Sep 28, 2020

It seems that, indeed, the slash is the only separator handled in https://github.com/matomo-org/matomo/blob/3.14.1/plugins/SitesManager/SiteUrls.php

Do you have something like a patch to allow other separators?

@tsteur
Member

tsteur commented Sep 28, 2020

@anthosz If I understand correctly, you want to match only paths that start with /!*, whereas currently Matomo would only support excluding URLs where the path is /!/*? Do I understand this right?

This is kind of on purpose currently, if I understand things correctly, since Matomo has no way to tell which behaviour someone expects.
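
To make the two behaviours concrete, here is a small Python sketch. It is only an illustration of the distinction discussed above, not the logic in SiteUrls.php, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def matches_segment_prefix(action_url, site_url):
    """Slash-delimited matching: the site path must end on a '/' boundary."""
    action, site = urlsplit(action_url), urlsplit(site_url)
    if action.hostname != site.hostname:
        return False
    site_path = site.path.rstrip("/")
    return action.path == site_path or action.path.startswith(site_path + "/")


def matches_raw_prefix(action_url, site_url):
    """Plain prefix matching: the behaviour requested in this issue (url*)."""
    action, site = urlsplit(action_url), urlsplit(site_url)
    return action.hostname == site.hostname and action.path.startswith(site.path)


site = "http://example.com/!"
for url in ("http://example.com/!abcd", "http://example.com/!/abcd"):
    print(url, matches_segment_prefix(url, site), matches_raw_prefix(url, site))
```

With the site URL http://example.com/!, only the raw-prefix variant accepts /!abcd, while both accept /!/abcd. That is the ambiguity: the same configured URL can reasonably mean either.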

@tsteur tsteur added the Enhancement label Sep 28, 2020
@anthosz
Author

anthosz commented Sep 28, 2020

@anthosz Yes, that's what I would like: the possibility to also take "/!*" into account.

@anthosz
Author

anthosz commented Sep 28, 2020

A simple way could be to check whether the imported URL (the URL in the log or request) matches url* (instead of forcing url/*) and, if so, use this website.
Also add an option that keeps this behavior disabled by default (so there is no impact on existing instances) and allows enabling it on demand.

A bonus could be to allow regex in the URL (not related to this issue, but it could be useful if someone wants to use another separator, like "/(!|&)") ^^
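
As a rough illustration of the proposal above, the following Python sketch builds a matcher from a site URL plus an optional separator regex. The default separator "/" stands in for the current slash-only behaviour, so nothing changes unless someone opts in. The function name and the separator_regex parameter are hypothetical and do not correspond to any existing Matomo setting.

```python
import re


def build_site_url_pattern(site_url, separator_regex="/"):
    """Compile a pattern: the site URL followed by a separator or end of string."""
    base = re.escape(site_url.rstrip("/"))
    return re.compile(rf"^{base}(?:{separator_regex}|$)")


default = build_site_url_pattern("http://example.com")
custom = build_site_url_pattern("http://example.com", separator_regex="/(!|&)")

print(bool(default.match("http://example.com/page")))
print(bool(custom.match("http://example.com/!abcd")))
print(bool(custom.match("http://example.com/page")))
```

Escaping the base URL keeps the "." in the host literal, so only the separator part is interpreted as a regex.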

@tsteur tsteur changed the title from "import_logs.py + url (matomo) including exclamation mark = NOK" to "Add possibility to configure simple wildcards in site URLs" Sep 28, 2020
@tsteur
Member

tsteur commented Sep 28, 2020

Thanks @anthosz, I've updated the title to make it a bit clearer for us. Generally we would likely only be able to support some simple wildcards like * (if that's even possible), as I think we might sometimes be using the site URLs for other purposes too. To be checked.

Do I see this right that it might already help in your case if the include-path parameter in the log importer supported this (e.g. include-path='/!*')?

@tsteur tsteur added this to the Backlog (Help wanted) milestone Sep 28, 2020
@anthosz
Author

anthosz commented Sep 29, 2020

@tsteur yes and no. Currently it seems to work if we also specify the site ID, but the issue is that in this case we need to run import_logs multiple times, and it is slow (especially when we have more than 10 million log lines to parse and multiple websites).

@Starker3
Contributor

We got another request for this feature today.

The user would like to be able to use a wildcard for subdomains, for example:
https://*.example.org instead of having to specify every subdomain individually.
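
A subdomain wildcard of this kind could be sketched in Python as follows, assuming simple glob semantics on the host name via the standard fnmatch module. The function name is hypothetical and this is not how Matomo currently resolves site URLs.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_matches_wildcard(action_url, site_url_pattern):
    """True if scheme is equal and the host matches the glob in the pattern."""
    action = urlsplit(action_url)
    pattern = urlsplit(site_url_pattern)
    if action.scheme != pattern.scheme:
        return False
    # hostname is already lowercased by urlsplit; guard against missing hosts.
    return fnmatchcase(action.hostname or "", pattern.hostname or "")


print(host_matches_wildcard("https://shop.example.org/cart", "https://*.example.org"))
print(host_matches_wildcard("https://example.org/", "https://*.example.org"))
```

Note that under these semantics *.example.org does not cover the bare domain example.org, and * also spans dots, so a.b.example.org would match. Whether that is the desired behaviour would need to be decided.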

@mhh515

mhh515 commented Aug 24, 2022

Maybe even with the ability to use regular expressions, similar to the field "Excluded User Agents".

@anthosz
Author

anthosz commented Aug 24, 2022

be patient :)

@Starker3
Contributor

We have another request from a Matomo user for this feature today.

@ptemmer

ptemmer commented Oct 17, 2023

Hey.
Was support for regular expressions for website URLs added?
My colleague assures me this used to work; however, I can't seem to get it going myself, so it would be nice if you could confirm.

Thanks


5 participants