Piwiks tracks visited https URLs as http #5312

Closed
anonymous-matomo-user opened this issue Jun 6, 2014 · 19 comments
Labels: Bug, c: Documentation, Major, not-in-changelog

Comments

@anonymous-matomo-user

Hi! I'm running Piwik 2.3.0 on Apache HTTP behind an Nginx SSL Proxy. This is working fine and without the issues reported in the forums etc.

I use Piwik to track visits on a few websites that are also secured via SSL. All non-secured requests are redirected by Varnish, which means that no http:// addresses should reach Piwik or the tracked website.

Nevertheless, if an SSL-secured website is visited, Piwik shows the URL as normal http:// in the visitors log and I can't find the cause for this.

I already enabled $GLOBALS['PIWIK_TRACKER_DEBUG'] to have a look at the sent data:

action_name:vanished-site-title
idsite:3
rec:1
r:167255
h:16
m:5
s:28
url:https://www.vanished.site/page/
urlref:https://www.vanished.site/
_id:fa1926e74503fbed
_idts:1402063420
_idvc:1
_idn:0
_refts:0
_viewts:1402063420
pdf:1
qt:1
realp:0
wma:1
dir:0
fla:1
java:1
gears:0
ag:0
cookie:1
res:1920x1080
gt_ms:1336

This clearly shows that https:// was used to visit the page, but the visitor log presents the URL as http://www.vanished.site/page/.
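For anyone who wants to check the same thing on their own debug output, here is a minimal sketch that parses a few of the key:value lines above and reads the scheme from the url parameter. The helper name parse_debug is made up for illustration; it is not part of Piwik.

```python
from urllib.parse import urlsplit

# A few of the "key:value" lines from the tracker debug output above.
debug_lines = """\
idsite:3
rec:1
url:https://www.vanished.site/page/
urlref:https://www.vanished.site/
""".splitlines()


def parse_debug(lines):
    """Hypothetical helper: split each line on the first colon only,
    so that URL values keep their own colons."""
    params = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


params = parse_debug(debug_lines)
scheme = urlsplit(params["url"]).scheme
print(scheme)  # the scheme the tracker actually received
```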

Is this possibly a bug, or am I missing something here?
Do you need further details?

Thank you!
Keywords: 2.3.0

@mattab
Member

mattab commented Jun 7, 2014

I think this is by design, because most users want both variants to be tracked under the same canonical URL.

If you want to know for a pageview whether it was loaded under SSL or not, you could use a Custom Variable of scope "page" (for example track "HTTPS" = "Yes" or "Protocol" = "http/https"). See user guide Custom Variables.
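A rough sketch of the custom-variable idea, assuming the HTTP Tracking API accepts page-scope custom variables as a JSON-encoded cvar parameter (that is my understanding of the API; please check the tracking API reference). The endpoint URL, the slot number and the function name are placeholders, not values from this thread.

```python
import json
from urllib.parse import urlencode, urlsplit

# Hypothetical tracker endpoint; replace with your own installation.
MATOMO_ENDPOINT = "https://matomo.example/piwik.php"


def build_tracking_url(page_url, idsite, slot=1):
    """Build a tracking request that records the page's protocol
    in a page-scope custom variable named "Protocol"."""
    scheme = urlsplit(page_url).scheme  # "http" or "https"
    cvar = {str(slot): ["Protocol", scheme]}
    query = urlencode({
        "idsite": idsite,
        "rec": 1,
        "url": page_url,
        "cvar": json.dumps(cvar),  # assumed: page-scope custom variables
    })
    return f"{MATOMO_ENDPOINT}?{query}"


request_url = build_tracking_url("https://www.vanished.site/page/", idsite=3)
print(request_url)
```

With the JavaScript tracker the equivalent would be a setCustomVariable call with scope "page" before the page view is tracked.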

Maybe we could change the Visitor Log and display when HTTPS was used?

@anonymous-matomo-user
Author

Thank you, Matt! I'm going to have a look at the plugin and maybe I'll try to adapt it for an HTTP/HTTPS view.

But nevertheless, the site(s) I was talking about have https canonical URLs. I'm wondering why Piwik still thinks that they should be tracked as normal http. They're tracked as https "sometimes", like 1 out of 20. As said, these sites (WordPress) have an https base URL and my Varnish ensures that external requests are redirected to the SSL site.

@anonymous-matomo-user anonymous-matomo-user added this to the 2.x - The Great Piwik 2.x Backlog milestone Jul 8, 2014
@mattab mattab removed the P: normal label Aug 3, 2014
@mattab
Member

mattab commented Sep 30, 2014

They're tracked as https "sometimes", like 1 out of 20.

That's probably why they are shown as HTTP. If 100% of requests are tracked as https, the links should be shown as HTTPS. Cheers

@mattab mattab modified the milestones: Mid term, Long term Oct 11, 2014
@amq

amq commented Nov 3, 2014

I am running a site where anonymous users are served HTTP and authenticated HTTPS. Canonical URL is set for every page and it is always HTTP, and I can confirm the '1 out of 20' issue where some seemingly random HTTPS links pop up in Piwik.

@mattab mattab modified the milestones: Mid term, Long term Apr 25, 2015
@mattab
Member

mattab commented Apr 25, 2015

This issue was also reported in the forums: http://forum.piwik.org/read.php?2,126198

Piwik incorrectly displays http instead of https

I was stumped about this for a few weeks.

We have re-direct enabled for our site. Anybody that comes into an http page will be sent to https.
Piwik was displaying visitors visiting http versions of pages. It listed http about 50% of the time along with https. We checked our code and our htaccess files and everything seemed fine.

Today, it so happened that an employee was browsing our site and we could see them being fed http versions of the site. Going to their actual browser and checking their browsing history showed NO http pages in their browser only https. Piwik incorrectly displayed http in the Live Visitors view and on the dashboard.

@mattab mattab added Bug For errors / faults / flaws / inconsistencies etc. and removed Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. labels Apr 25, 2015
@mattab mattab modified the milestones: Short term, Mid term Apr 25, 2015
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Apr 25, 2015
@mattab mattab modified the milestones: Mid term, Short term Apr 25, 2015
@Harest

Harest commented Feb 22, 2016

Any update on this issue? Until now I was using a self-signed SSL cert and couldn't force https. I've switched to a CA delivering free SSL certs and I now force https on 1 site.
I'm using nginx to permanently redirect (301) http visitors, along with an HSTS header.

server {
        listen   80;
        server_name  mysite.com www.mysite.com;
        rewrite ^/(.*) https://www.mysite.com/$1 permanent;
}

As far as I can tell, any visit (http or https) at the root of the site (mysite.com/) will be displayed as http in Piwik, but visits (http or https) on a subdirectory (mysite.com/test/) will be displayed correctly as https.

I'm currently using the latest stable version of Piwik: 2.16.0.

@codifex

codifex commented Aug 26, 2016

I am wondering what a solution to this problem could look like.

As far as I see, for each visited page (not per visit, just per page) an entry in the log_action table is created. The value of the url_prefix column indicates if the prefix of the first visit (?) of the page was http://, http://www., https://, or https://www.. More information: https://developer.piwik.org/guides/persistence-and-the-mysql-backend

Visiting the same page again using another "prefix" does not seem to update this field. I assume that a HTTPS prefix is only stored if the page does not yet have an entry in this table.
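To make the suspected behaviour concrete, here is a toy model (not Matomo code) of a table where the first prefix seen for a page is stored and never updated. The numeric codes are what I believe Matomo uses for url_prefix; treat them as an assumption. The class ToyLogAction and its methods are invented for this sketch.

```python
# Assumed url_prefix codes; verify against the Matomo source before relying on them.
PREFIXES = {"http://": 0, "http://www.": 1, "https://": 2, "https://www.": 3}
CODE_TO_PREFIX = {code: prefix for prefix, code in PREFIXES.items()}


def split_prefix(url):
    """Return (prefix code, URL without prefix). Longest prefixes are
    tried first so "https://www." is not mistaken for "https://"."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if url.startswith(prefix):
            return PREFIXES[prefix], url[len(prefix):]
    return None, url


class ToyLogAction:
    """Toy stand-in for the log_action table: one row per page."""

    def __init__(self):
        self.rows = {}  # URL without prefix -> stored url_prefix code

    def record(self, url):
        code, name = split_prefix(url)
        # First visit wins: an existing row is left untouched.
        self.rows.setdefault(name, code)
        return name

    def display(self, name):
        return CODE_TO_PREFIX[self.rows[name]] + name


table = ToyLogAction()
page = table.record("http://example.org/page/")   # first ever visit, over HTTP
table.record("https://example.org/page/")         # later visit, over HTTPS
print(table.display(page))                        # still shown with the first prefix
```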

Is the information whether or not a certain page has been visited using HTTP or HTTPS stored somewhere else in the database?

The current situation is confusing because the URLs of static pages will never change on many of the websites I maintain, and thus the visits to these pages will keep being shown as HTTP instead of HTTPS. For some weeks now, we have served our websites over HTTPS only.

@mattab
Member

mattab commented Sep 27, 2016

Visiting the same page again using another "prefix" does not seem to update this field. I assume that a HTTPS prefix is only stored if the page does not yet have an entry in this table.

Correct

I am wondering what a solution to this problem could look like.

Me too. One solution could be to have a new Website setting such as This website is served over SSL and when checked/activated, then we'd show all URLs as https in the UI and API.

Maybe there is another better solution as well?

@mattab mattab modified the milestones: Mid term, Long term Sep 27, 2016
@PotcFdk

PotcFdk commented Jan 19, 2018

Wouldn't a better solution to this be to have an option to not consider http and https the same thing and instead just show http://example.org/... or https://example.org/... in the visitor log, exactly as it was in the browser?
Saving either http or https in some kind of global place and then applying that to every subsequent visit on that domain sounds very wrong to me, at least in most contexts. You want the visitor log to reflect what page the browser has been on, right?

If there are use cases where you want to group http and https and represent them using a canonical http or https url in the visitor log, you could make it configurable.
What I mean is having three options:

  1. Group http/https into http
  2. Group http/https into https
  3. Leave as-is, show http as http and https as https in the log
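The three options could be sketched roughly like this. It is only an illustration of the proposed setting, with invented names (SchemeMode, display_url); no such option exists in Matomo as far as this thread shows.

```python
from enum import Enum
from urllib.parse import urlsplit


class SchemeMode(Enum):
    GROUP_AS_HTTP = "http"    # option 1
    GROUP_AS_HTTPS = "https"  # option 2
    AS_IS = "as-is"           # option 3


def display_url(tracked_url, mode):
    """Return the URL as the visitor log would show it under the given mode."""
    parts = urlsplit(tracked_url)
    if mode is SchemeMode.AS_IS or parts.scheme not in ("http", "https"):
        return tracked_url
    # Rewrite only the scheme; host, path and query stay as tracked.
    return parts._replace(scheme=mode.value).geturl()


for mode in SchemeMode:
    print(mode.name, display_url("https://example.org/page/?x=1", mode))
```

Option 3 would require the tracked scheme to be stored per page view rather than once per page, which is the part that does not seem to exist today.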

@16th-earl

Related to mattab's proposed solution… If you've chosen an HTTP or HTTPS URL in the site settings, shouldn't the statistics be presented using that protocol? We had a site that moved from http://example.com to https://example.com about 2 months ago. We have the https://example.com URL in the site settings.

However from the Pages report if we looked at the Segmented Visitor Log it was still showing a mixture of HTTP and HTTPS URLs, even though all URLs are accessed over HTTPS. I tried to fix this by updating the log_action table and changing url_prefix from 0 to 2.

Now all the URLs are consistently HTTPS. However in the heading of the report for the home page it still reads "Visit log showing visits where page URL is http://example.com". I don't know where it's getting this from!

In the page overlay report for the home page it tries to use the HTTP URL as well. This results in an error logged from JavaScript: "Found invalid iframe origin in hash URL: http://example.com".

So it looks like the HTTP/HTTPS confusion is breaking the page overlay report as well.

@sgiehl
Member

sgiehl commented Jan 26, 2021

@mattab @tsteur what would be the preferred solution for this one? Based on the discussion above I would see the following possibilities:

  • Automatically prefer https over http. That means we update the URL prefix option in the database once https was sent with a url, but http is set in the database
  • Extend the measurable settings, so it's possible to define whether https should be preferred over http -> let the user decide which should be used for links in UI (and maybe reports). That would require automatically checking all urls in reports to see if they need to be changed based on that setting. That would only work for urls that are defined for a measurable. Urls that are not defined, but tracked nevertheless, would be untouched and remain as (first) tracked.
  • Give the user the possibility to configure whether Matomo should distinguish between http and https. That means they would be treated as different urls. Guess that would be a bit more work, as afaik we would need to adjust all queries to take the url_prefix into account (or remove the usage of url_prefix entirely)

Guess we could also think about combining option 1 + 2.
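As a sketch of what the first option might mean in practice: the stored prefix is only ever upgraded from http to https, never downgraded. The codes are my assumption of the url_prefix mapping (0 = http://, 1 = http://www., 2 = https://, 3 = https://www.) and the function name is hypothetical.

```python
HTTPS_CODES = {2, 3}  # assumed codes for "https://" and "https://www."


def upgraded_prefix(stored, incoming):
    """Return the prefix code to keep after a tracking request.

    The stored value changes only when it is an http prefix and the
    incoming request used https; every other combination keeps it.
    """
    if stored not in HTTPS_CODES and incoming in HTTPS_CODES:
        return incoming
    return stored


print(upgraded_prefix(0, 2))  # http stored, https seen: upgrade
print(upgraded_prefix(2, 0))  # https stored, http seen: unchanged
```

Because the change is one-way, each page would be updated at most once, even on sites that still receive a mix of http and https requests.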

@tsteur
Member

tsteur commented Jan 26, 2021

Automatically prefer https over http. That means we update the URL prefix option in the database once https was sent with a url, but http is set in the database

I suppose this could result in a LOT of updates and then when viewing the visitor log it could often change from http to https and back? That's if both http and https are used (as mentioned in comments above), but it would work if all traffic is changed from http to https.

Extend the measurable settings, so it's possible to define whether https should be preferred over http -> let the user decide which should be used for links in UI (and maybe reports). That would require automatically checking all urls in reports to see if they need to be changed based on that setting. That would only work for urls that are defined for a measurable. Urls that are not defined, but tracked nevertheless, would be untouched and remain as (first) tracked.

I think that looks a bit complicated for users, and I would want to avoid adding more settings there if at all possible.

Give the user the possibility to configure whether Matomo should distinguish between http and https. That means they would be treated as different urls. Guess that would be a bit more work, as afaik we would need to adjust all queries to take the url_prefix into account (or remove the usage of url_prefix entirely)

Not 100% sure. I think this is again a slightly different issue/feature. Overall there are two different problems in this issue and I'm not sure we want to fix both:

Feature A: treat http and https URLs as different and track them separately
Bug B: http used to be used, and now https is used. The links in visitor log and reports still go to http.

For Bug B: A workaround could be to check whether a matching site URL is defined in the site settings and, for a matching domain, which protocol is defined. If HTTPS is defined, we prefer using HTTPS. It could be a bit slow though, as we have to parse every URL when preparing a report or the visits log. It's also not super user friendly, but at least we don't show a setting that won't be needed by most people, and it could be explained in an FAQ. I wouldn't really update existing log_action entries when the protocol is changed, because existing reports might already include the HTTP URL anyway. I also wouldn't mess around with log_action entries, as that is hard to revert and there might be duplicates afterwards. It could be done though.
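A minimal sketch of that workaround, assuming the configured site URLs are available as a plain list of strings. The function name is invented, and this is not necessarily how the eventual fix is implemented.

```python
from urllib.parse import urlsplit


def prefer_configured_scheme(url, site_urls):
    """Show an http URL as https when the same host is configured
    with an https site URL; leave everything else as tracked."""
    https_hosts = {
        urlsplit(site_url).hostname
        for site_url in site_urls
        if urlsplit(site_url).scheme == "https"
    }
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in https_hosts:
        return parts._replace(scheme="https").geturl()
    return url


site_urls = ["https://example.com"]
print(prefer_configured_scheme("http://example.com/page/", site_urls))
print(prefer_configured_scheme("http://other.org/page/", site_urls))
```

Hosts are compared exactly here, so a configured https://example.com would not cover www.example.com; a real implementation would have to decide how to treat the www. variants.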

For Feature A: I would create an FAQ explaining how to track this as a custom dimension, as it's very rarely needed. We could add a setting, but it would only add more complexity to the code when there might already be workarounds possible with custom dimensions, depending on the use case. I know it's not crazy difficult to add a setting here, but lots of settings over time just make things harder everywhere and would be good to avoid. If there are then still use cases where custom dimensions aren't good enough, we could still look into it.

@sgiehl
Member

sgiehl commented Jan 27, 2021

@tsteur I've created #17151 so at least the urls in visitor log & profile should be automatically changed to https if the https url is configured as site url.
For all action reports that might be a bit more effort. Currently the main site url will be used to build action urls while archiving. So when changing the main url to https new reports should automatically use that for building action urls. Not sure if it's worth to apply a datatable filter to all those reports to adjust the urls if that is not the case...

@mattab mattab modified the milestones: 4.2.0, 4.3.0 Feb 22, 2021
@sgiehl
Member

sgiehl commented Apr 15, 2021

@tsteur Shall we apply any other changes for the next release or are the changes in #17151 good enough for now?

@tsteur
Member

tsteur commented Apr 15, 2021

@sgiehl I would say it's good enough. For new users it's becoming less and less of an issue anyway, since most sites don't use HTTP anymore, and for existing users it should be mostly fixed by this. The mentioned FAQs would need to be created though, and then we could close the issue for now 👍

@mattab mattab modified the milestones: 4.3.0, 4.4.0 May 26, 2021
@tsteur
Member

tsteur commented Jun 16, 2021

Here we still need to create the FAQ

@tsteur tsteur added the c: Documentation For issues related to in-app product help messages, or to the Matomo knowledge base. label Jul 26, 2021
@mattab mattab modified the milestones: 4.4.0, 4.5.0 Jul 28, 2021
@justinvelluppillai justinvelluppillai modified the milestones: 4.5.0, 4.6.0 Oct 7, 2021
@tsteur
Member

tsteur commented Oct 7, 2021

The FAQ title could be something like "How do I get Matomo to use HTTPS for links to my site instead of HTTP?", where we would then explain how to configure an HTTPS URL in the measurable's site URLs.

To be double checked / confirmed: not sure if e.g. the HTTP URL would need to be removed or not. Also to be confirmed whether this only works for the visits log / profile.

@justinvelluppillai justinvelluppillai self-assigned this Oct 12, 2021
@justinvelluppillai
Contributor

The new FAQ explaining how to set an HTTPS URL in Measurables is here: https://matomo.org/faq/how-to/how-do-i-get-matomo-to-use-https-for-links-to-my-site-instead-of-http/

@tsteur is it needed to create a second FAQ for tracking a custom dimension to treat HTTPS and HTTP URLs as different or can this be closed now?

@tsteur
Member

tsteur commented Nov 2, 2021

I guess these days that second FAQ is not really needed anymore.

By the way, in the FAQ you linked to, the second part doesn't make it clear how to actually do it. It would be great to mention each step needed, like:

  • Go to admin
  • go to Measurables -> Manage (Sometimes UI might be Websites -> Manage)
  • click on edit...
  • In the field ""... add a new line with the https...

@justinvelluppillai justinvelluppillai added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Nov 29, 2021

10 participants