@anonymous-piwik-user opened this Issue on June 6th 2014

Hi! I'm running Piwik 2.3.0 on Apache HTTP behind an Nginx SSL Proxy. This is working fine and without the issues reported in the forums etc.

I use Piwik to track visits on a few websites that are also secured via SSL. All non-secured requests are redirected by Varnish, which means that no http:// addresses should reach Piwik or the tracked website.

Nevertheless, if an SSL-secured website is visited, Piwik shows the URL as normal http:// in the visitors log and I can't find the cause for this.

I already enabled $GLOBALS['PIWIK_TRACKER_DEBUG'] to have a look at the sent data:

action_name:vanished-site-title
idsite:3
rec:1
r:167255
h:16
m:5
s:28
url:https://www.vanished.site/page/
urlref:https://www.vanished.site/
_id:fa1926e74503fbed
_idts:1402063420
_idvc:1
_idn:0
_refts:0
_viewts:1402063420
pdf:1
qt:1
realp:0
wma:1
dir:0
fla:1
java:1
gears:0
ag:0
cookie:1
res:1920x1080
gt_ms:1336

This clearly shows that https:// was used to visit the page, but the visitor log presents the URL as http://www.vanished.site/page/.

Is this a bug, or am I missing something here?
Do you need further details?

Thank you!
Keywords: 2.3.0

@mattab commented on June 7th 2014 Member

I think this is by design, because most users want http and https pageviews tracked under the same canonical URL.

If you want to know for a pageview whether it was loaded under SSL or not, you could use a Custom Variable of scope "page" (for example track "HTTPS" = "Yes" or "Protocol" = "http/https"). See the Custom Variables user guide.
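
A minimal sketch of what such a tracking request could look like, assuming the Tracking HTTP API's `cvar` parameter (page-scope custom variables as a JSON object keyed by slot index). The helper name `build_tracking_query`, the slot index 1 and the variable name "Protocol" are illustrative choices, not something fixed by Piwik.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tracking_query(page_url: str, idsite: int) -> str:
    """Build a tracking query string that records the page's protocol.

    The protocol is stored as a page-scope custom variable, so it survives
    even if the visitor log later shows the URL under another scheme.
    """
    scheme = urlsplit(page_url).scheme.lower()
    # Slot 1 and the name "Protocol" are arbitrary choices for this sketch.
    page_cvars = {"1": ["Protocol", scheme]}
    params = {
        "idsite": idsite,
        "rec": 1,
        "url": page_url,
        "cvar": json.dumps(page_cvars, separators=(",", ":")),
    }
    return urlencode(params)


query = build_tracking_query("https://www.vanished.site/page/", idsite=3)
decoded = parse_qs(query)
print(decoded["cvar"][0])
```

With the JavaScript tracker the equivalent would presumably be a `setCustomVariable` call with scope "page" issued before the pageview is tracked.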

Maybe we could change the Visitor Log and display when HTTPS was used?

@anonymous-piwik-user commented on June 10th 2014

Thank you, Matt! I'm going to have a look at the plugin and maybe I'll try to adapt it for an HTTP/HTTPS view.

But nevertheless, the site(s) I was talking about have https canonical URLs. I'm wondering why Piwik still thinks that they should be tracked as normal http. They're tracked as https "sometimes", like 1 out of 20. As said, these sites (WordPress) have an https base URL and my Varnish ensures that external requests are redirected to the SSL site.

@mattab commented on September 30th 2014 Member

> They're tracked as https "sometimes", like 1 out of 20.

That's probably why they are shown as HTTP. If 100% of requests are tracked as https, the links should show HTTPS. Cheers

@amq commented on November 3rd 2014

I am running a site where anonymous users are served HTTP and authenticated users HTTPS. The canonical URL is set for every page and it is always HTTP, and I can confirm the '1 out of 20' issue where some seemingly random HTTPS links pop up in Piwik.

@mattab commented on April 25th 2015 Member

This issue was also reported in the forums: http://forum.piwik.org/read.php?2,126198

> Piwik incorrectly displays http instead of https
>
> I was stumped about this for a few weeks.
>
> We have re-direct enabled for our site. Anybody that comes into an http page will be sent to https.
> Piwik was displaying visitors visiting http versions of pages. It listed http about 50% of the time along with https. We checked our code and our htaccess files and everything seemed fine.
>
> Today, it so happened that an employee was browsing our site and we could see them being fed http versions of the site. Going to their actual browser and checking their browsing history showed NO http pages in their browser only https. Piwik incorrectly displayed http in the Live Visitors view and on the dashboard.

@Harest commented on February 22nd 2016

Any update on this issue? Until now I was using a self-signed SSL cert and I couldn't force https. I've switched to a CA delivering free SSL certs and I now force https on 1 site.
I'm using nginx to permanently redirect (301) http visitors, along with an HSTS header.

server {
        listen   80;
        server_name  mysite.com www.mysite.com;
        rewrite ^/(.*) https://www.mysite.com/$1 permanent;
}

As far as I can tell, any visit (http or https) at the root of the site (mysite.com/) will be displayed as http in Piwik, but visits (http or https) on a subdirectory (mysite.com/test/) will be displayed correctly as https.

I'm currently using the latest stable version of Piwik: 2.16.0.

@codifex commented on August 26th 2016

I am wondering what a solution to this problem could look like.

As far as I see, for each visited page (not per visit, just per page) an entry in the log_action table is created. The value of the url_prefix column indicates if the prefix of the first visit (?) of the page was http://, http://www., https://, or https://www.. More information: https://developer.piwik.org/guides/persistence-and-the-mysql-backend

Visiting the same page again using another "prefix" does not seem to update this field. I assume that a HTTPS prefix is only stored if the page does not yet have an entry in this table.
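
The behaviour described above can be modelled with a small sketch. This is a toy stand-in for `log_action`, not Piwik's code: the class `ActionTable` and the helper `split_prefix` are hypothetical, and the numeric prefix ids are an assumption about how the four prefixes are encoded.

```python
from typing import Dict, Optional, Tuple

# Assumed encoding of the url_prefix column (hypothetical for this sketch).
URL_PREFIXES: Dict[str, int] = {
    "http://www.": 1,
    "http://": 0,
    "https://www.": 3,
    "https://": 2,
}
PREFIX_BY_ID = {prefix_id: prefix for prefix, prefix_id in URL_PREFIXES.items()}


def split_prefix(url: str) -> Tuple[Optional[int], str]:
    """Split a URL into (prefix id, remainder); id is None if no prefix matches."""
    # Try longer prefixes first so "http://www." is preferred over "http://".
    for prefix in sorted(URL_PREFIXES, key=len, reverse=True):
        if url.startswith(prefix):
            return URL_PREFIXES[prefix], url[len(prefix):]
    return None, url


class ActionTable:
    """Toy model: one row per page, keyed by the URL without its prefix."""

    def __init__(self) -> None:
        self.rows: Dict[str, Optional[int]] = {}

    def record(self, url: str) -> Optional[int]:
        prefix_id, name = split_prefix(url)
        # Insert only when the page is new; an existing row keeps its prefix.
        return self.rows.setdefault(name, prefix_id)

    def display_url(self, name: str) -> str:
        return PREFIX_BY_ID.get(self.rows[name], "") + name


table = ActionTable()
table.record("http://example.org/page/")   # first visit creates the row
table.record("https://example.org/page/")  # later https visit changes nothing
print(table.display_url("example.org/page/"))
```

In this model the page keeps showing as http:// however many https visits follow, which matches the symptom reported for pages first tracked before a switch to HTTPS.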

Is the information whether or not a certain page has been visited using HTTP or HTTPS stored somewhere else in the database?

The current situation is confusing because the URLs of static pages will never change on many of the websites I maintain and thus the visits to these pages will be shown as HTTP instead of HTTPS. We have been serving our websites over HTTPS only for some weeks now.

@mattab commented on September 27th 2016 Member

> Visiting the same page again using another "prefix" does not seem to update this field. I assume that a HTTPS prefix is only stored if the page does not yet have an entry in this table.

Correct

> I am wondering what a solution to this problem could look like.

Me too. One solution could be to add a new Website setting such as "This website is served over SSL"; when it is checked/activated, we'd show all URLs as https in the UI and API.

Maybe there is another better solution as well?

@PotcFdk commented on January 19th 2018

Wouldn't a better solution to this be to have an option to not consider http and https the same thing and instead just show http://example.org/... or https://example.org/... in the visitor log, exactly as it was in the browser?
Saving either http or https in some kind of global place and then applying that to every subsequent visit on that domain sounds very wrong to me, at least in most contexts. You want the visitor log to reflect what page the browser has been on, right?

If there are use cases where you want to group http and https and represent them using a canonical http or https url in the visitor log, you could make it configurable.
What I mean is having three options:

  1. Group http/https into http
  2. Group http/https into https
  3. Leave as-is, show http as http and https as https in the log
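
The three options listed above could be sketched roughly as follows. The enum `SchemeMode` and the function `display_url` are hypothetical names for illustration; no such setting exists in the thread.

```python
from enum import Enum
from urllib.parse import urlsplit, urlunsplit


class SchemeMode(Enum):
    GROUP_AS_HTTP = "http"    # option 1: show everything as http
    GROUP_AS_HTTPS = "https"  # option 2: show everything as https
    AS_TRACKED = "as-is"      # option 3: show what the browser sent


def display_url(tracked_url: str, mode: SchemeMode) -> str:
    """Return the URL to show in the visitor log under the chosen mode."""
    parts = urlsplit(tracked_url)
    # Leave the URL alone for option 3, or if it is not http(s) at all.
    if mode is SchemeMode.AS_TRACKED or parts.scheme not in ("http", "https"):
        return tracked_url
    return urlunsplit(parts._replace(scheme=mode.value))


for mode in SchemeMode:
    print(mode.name, display_url("http://example.org/page?x=1", mode))
```

Note that option 3 only works if the scheme of every pageview is stored, not just the scheme of the first visit to a page.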

@16th-earl commented on August 14th 2019

Related to mattab's proposed solution… If you've chosen an HTTP or HTTPS URL in the site settings, shouldn't the statistics be presented using that protocol? We had a site that moved from http://example.com to https://example.com about 2 months ago. We have the https://example.com URL in the site settings.

However from the Pages report if we looked at the Segmented Visitor Log it was still showing a mixture of HTTP and HTTPS URLs, even though all URLs are accessed over HTTPS. I tried to fix this by updating the log_action table and changing url_prefix from 0 to 2.

Now all the URLs are consistently HTTPS. However in the heading of the report for the home page it still reads "Visit log showing visits where page URL is http://example.com". I don't know where it's getting this from!

In the page overlay report for the home page it tries to use the HTTP URL as well. This results in an error logged from JavaScript: "Found invalid iframe origin in hash URL: http://example.com".

So it looks like the HTTP/HTTPS confusion is breaking the page overlay report as well.

@sgiehl commented on January 26th 2021 Member

@mattab @tsteur what would be the preferred solution for this one? Based on the discussion above I would see the following possibilities:

  • Automatically prefer https over http. That means we update the URL prefix option in the database once https was sent with a url, but http is set in the database
  • Extend the measurable settings, so it's possible to define whether https should be preferred over http -> let the user decide which should be used for links in UI (and maybe reports). That would require automatically checking all URLs in reports to see if they need to be changed based on that setting. That would for sure only work for URLs that are defined for a measurable. URLs that are not defined, but tracked nevertheless, would be untouched and remain as tracked (first).
  • Give the user the possibility to configure if Matomo should distinguish between http and https. That means they would be treated as different URLs. Guess that would be a bit more work, as afaik we would need to adjust all queries to take the url_prefix into account (or remove the usage of url_prefix altogether)

Guess we could also think about combining option 1 + 2.
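
A rough sketch of option 1 as a one-way upgrade, under the assumption that the stored scheme is only ever changed from http to https and never back. `UpgradeOnlyStore` is a hypothetical in-memory model, not Matomo's storage layer.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit


class UpgradeOnlyStore:
    """Toy model: remembers one scheme per page and only upgrades to https."""

    def __init__(self) -> None:
        self.scheme_by_page: Dict[str, str] = {}
        self.updates = 0  # how many stored rows had to be rewritten

    def record(self, url: str) -> str:
        parts = urlsplit(url)
        # Scheme-less form of the URL, e.g. "//example.org/page".
        page = urlunsplit(parts._replace(scheme=""))
        stored = self.scheme_by_page.get(page)
        if stored is None:
            self.scheme_by_page[page] = parts.scheme
        elif stored == "http" and parts.scheme == "https":
            # Upgrade once; later http hits never downgrade the row again.
            self.scheme_by_page[page] = "https"
            self.updates += 1
        return self.scheme_by_page[page]


store = UpgradeOnlyStore()
for url in ("http://example.org/", "https://example.org/", "http://example.org/"):
    print(url, "->", store.record(url))
print("updates:", store.updates)
```

Under this one-way rule each page is rewritten at most once. A rule that instead follows the most recent request would rewrite the row on every protocol change when traffic is mixed.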

@tsteur commented on January 26th 2021 Member

> Automatically prefer https over http. That means we update the URL prefix option in the database once https was sent with a url, but http is set in the database

I suppose this could result in a LOT of updates and then when viewing the visitor log it could often change from http to https and back? That's if both http and https are used (like mentioned in comments above) but it would work if all traffic is changed from http to https.

> Extend the measurable settings, so it's possible to define whether https should be preferred over http -> let the user decide which should be used for links in UI (and maybe reports). That would require automatically checking all URLs in reports to see if they need to be changed based on that setting. That would for sure only work for URLs that are defined for a measurable. URLs that are not defined, but tracked nevertheless, would be untouched and remain as tracked (first).

I think that looks a bit complicated for users, and I would want to avoid adding more settings there if at all possible.

> Give the user the possibility to configure if Matomo should distinguish between http and https. That means they would be treated as different URLs. Guess that would be a bit more work, as afaik we would need to adjust all queries to take the url_prefix into account (or remove the usage of url_prefix altogether)

Not 100% sure. I think this would again be a slightly different issue/feature. Overall there are two different problems in this issue and I'm not sure if we want to fix both:

Feature A: treat http and https URLs as different and track them separately
Bug B: http used to be used, and now https is used. The links in visitor log and reports still go to http.

For Bug B: One workaround could be checking if a matching site URL is defined in site settings, and for a matching domain checking what protocol is defined. If HTTPS is defined, we prefer using HTTPS. It could be a bit slow though as we have to parse every URL etc when preparing the report / visits log. It's also not super user friendly but at least we don't show a setting that won't be needed by most people and it could be explained in an FAQ. I wouldn't really update existing log_action entries when the protocol is changed because existing reports might already include the HTTP anyway etc and I wouldn't mess around with log_action entries as it's hard to revert and there might be duplicates afterwards somehow etc. Could be done though.
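
The Bug B workaround could look roughly like the sketch below, applied at display time without touching stored data. The function `prefer_configured_scheme` is hypothetical, and treating "www." and bare hosts as the same domain is an assumption of this sketch.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_host(host: Optional[str]) -> str:
    """Lower-case the host and drop a leading 'www.' (assumed equivalence)."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def prefer_configured_scheme(tracked_url: str, site_urls: Iterable[str]) -> str:
    """Show an http URL as https if a matching site URL is configured as https."""
    tracked = urlsplit(tracked_url)
    if tracked.scheme != "http":
        return tracked_url  # already https, or not a web URL at all
    host = normalize_host(tracked.hostname)
    for site_url in site_urls:
        site = urlsplit(site_url)
        if site.scheme == "https" and normalize_host(site.hostname) == host:
            return urlunsplit(tracked._replace(scheme="https"))
    return tracked_url  # no https site URL configured for this domain


configured = ["https://example.com"]
print(prefer_configured_scheme("http://example.com/page", configured))
print(prefer_configured_scheme("http://other.example/page", configured))
```

Because every URL is parsed when a report or the visits log is prepared, the cost grows with the number of rows shown, which is the slowdown mentioned above.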

For Feature A: I would create an FAQ explaining how to track this as a custom dimension, as it's very rarely needed. We could add a setting but it would only add more complexity to the code etc when there might already be workarounds possible with custom dimensions depending on the use case. I know it's not crazy difficult to add a setting here, but lots of settings over time just make things harder everywhere, and it would be good to avoid that. If there are then still use cases where custom dimensions aren't good enough, we could still see.

@sgiehl commented on January 27th 2021 Member

@tsteur I've created #17151 so at least the urls in visitor log & profile should be automatically changed to https if the https url is configured as site url.
For all action reports that might be a bit more effort. Currently the main site url will be used to build action urls while archiving. So when changing the main url to https new reports should automatically use that for building action urls. Not sure if it's worth applying a datatable filter to all those reports to adjust the urls if that is not the case...

@sgiehl commented on April 15th 2021 Member

@tsteur Shall we apply any other changes for the next release or are the changes in #17151 good enough for now?

@tsteur commented on April 15th 2021 Member

@sgiehl I would say it's good enough. For new users it's becoming less and less of an issue anyway since most sites don't use HTTP anymore etc and for existing users it should be mostly fixed by this. The mentioned FAQs would need to be created though, and then we could close the issue for now 👍

@tsteur commented on June 16th 2021 Member

Here we still need to create the FAQ
