@claytondaley opened this Pull Request on October 2nd 2013 Contributor

This pull request adds variables, functions, and logic so PiwikTracker can update client cookies and may suffice to meet the request in ticket #3239 (http://dev.piwik.org/trac/ticket/3239). The code was originally derived from the patch submitted (and ultimately not-accepted) in ticket #2699 (http://dev.piwik.org/trac/ticket/2699). Deeper research showed that the logic inside this patch was incomplete and differed greatly from piwik.js. In response, functions in Piwik.js were copied over and translated into PHP, refactoring existing code to leverage these functions.

@claytondaley commented on October 3rd 2013 Contributor

After submitting the pull, I realized that the logical architecture of the original patch (from #2699) and piwik.js were extremely dissimilar. Consequently, I pulled over a bunch of functions from piwik.js and integrated them into PiwikTracking. The pull is much larger as a result, but also more in-line with the rest of the codebase (particularly piwik.js since it provides similar functionality).

The only issue I couldn't immediately resolve is that PHP HTML-encodes the asterisk when setcookie is called. As a result, I cannot exactly reproduce the session cookie used in Java.

@mattab commented on October 13th 2013 Member

Thanks for the PR, very nice!

The only issue I couldn't immediately resolve is that PHP HTML-encodes the asterisk when setcookie is called. As a result, I cannot exactly reproduce the session cookie used in Java.

you can use setrawcookie() for this purpose.

Code Review:

  • nice port of Piwik JS logic to PHP
  • CookieVisitorId should be cookieVisitorId
  • Various values read from 1st party cookie (lastVisitTs, createTs, visitCount, currentTs) are not sent to piwik.php but should be sent if they are found in cookie
  • note to self: we need some tests to make sure this works as expected and will not be broken in the future
    • Test new visit without a cookie: cookies 'ses' and 'id' should be created,
    • Test new page view: the 'ses' cookie should be deleted and 'id' cookie updated
    • Test new visit where the visitor has a cookie created with Piwik.JS: PiwikTracker should read the cookie as expected
  • Code in loadVisitorIdCookie() after // Initialize default values-- should instead be in the constructor
  • it would be awesome to also port the setCookie for the "Custom Variables" visit scope cookies at the same time as setting the 'ses' and 'id' cookie
    • However: the "ref" cookie is out of scope (since the logic to set this cookie is rather complicated and not needed for PHP client)

Cheers

@claytondaley commented on October 14th 2013 Contributor

I should have time to make some of the smaller changes (e.g. 2nd, 3rd bullets). What's the easiest way to contribute them without disrupting your process? Should I add commits here? Leave this version as-is and submit another patch once it's accepted?

I'd also like to raise some issues around your 5th bullet:

  • While I plan to use PiwikTracker as a direct substitute for piwik.js (in a specialized case), the PHP code could be run without a client (e.g. by a CRON job). In a scenario like this, reading and creating Cookies may not necessary or appropriate. I can't imagine a similar situation with piwik.js. This may justify slightly different logic around cookies, particularly whether they're hard-coded in a constructor.
  • PiwikTracker (pre-pull) did not use the Visitor ID from the cookie by default. The constructor creates a new, random cookie using setNewVisitorID(). A user had to run setVisitorID(getVisitorID()) to load the cookie. I attempted to maximize backwards compatibility and am unclear on your attitude in this regard.
  • There is a very weird issue (I think bug) with VisitorID priority. getVisitorId() has always returned the first of (1) forcedVistorId, set by setVisitorID, (2) the Visitor ID from the cookie, and (3) a new VisitorID created in the constructor and stored in visitorId. However, setNewVisitorID() only changes the visitorID in (3) without resetting (1) or (2). I didn't mess with it (again backwards compatibility concenrs), but it literally doesn't do anything if you've already run setVisitorID (pre-pull) or the cookie-creation code (post-pull).
@mattab commented on October 14th 2013 Member

Should I add commits here?

this would be the best

In a scenario like this, reading and creating Cookies may not necessary or appropriate.

it is OK to set the cookies even when it does not necessarily make sense (cron etc.). It is OK because it cannot cause bugs.

PiwikTracker (pre-pull) did not use the Visitor ID from the cookie by default. The constructor creates a new, random cookie using setNewVisitorID(). A user had to run setVisitorID(getVisitorID()) to load the cookie. I attempted to maximize backwards compatibility and am unclear on your attitude in this regard.

Good point! I fixed it separately in: https://github.com/piwik/piwik/commit/def6da2ee5deaa600bbaae004c3e3f5db348b61e

There is a very weird issue

See previous commit + this one:
https://github.com/piwik/piwik/commit/fbe7b19b0841add1d2f16971a1df7bf6acbc119e

Does it fixes the weirdness?

@claytondaley commented on October 14th 2013 Contributor

I have two observations that interact with one another in this part of the code so I have to bring them up together -- please bear with me.

  1. Pre-pull and pre-aforementioned commits, the cid= and _id= distinction were largely spurious. A programmer could just as easily write $instance->visitorId = <manual id> and it would show up as an _id. Does the distinction carry any real weight?
  2. Separately, I see that a CRON job will not provide cookies by default. Are we confident there aren't other scenario (e.g. a process initiated by a 3rd party web server or an admin user) where a cookie could exist on that 3rd party system and be mistakenly injected into the PiwikTracker process?

These two observations come together in the code (post your commits), which makes it very difficult to set the ID to a random (valid) ID if a user has a cookie. Specifically,

  • getRequest() is now configured to use getVisitorId()
  • getVisitorId() always prioritizes the cookie over the newly generated visitorId
  • setNewVisitorId() doesn't prevent the cookie from being used

With respect to my first question, this helps enforce an _id vs. cid distinction. The only way to override a cookie is to call $instance->setVisitorId(), which results in a forcedVisitorId and a cid. To get a new ID, the call is actually $instance->setVisitorId($instance->visitorId). I would clean this syntax up (e.g. by changing how setNewVisitorId() was implemented), but it definitely forces the programmer to set a cid if a cookie exists.

With respect to my second question, this could be undesirable. If the cookie is ever wrongly-injected, we're forced to send a cid when it'd be more accurate to send an id_. Without a better understanding of the difference on the back-end, however, this seems fine since a process initiated by anyone besides the user will probably need (and have) an auth_token anyway (for example, to set the IP).

An alternative is to:

  • Adjust loadVisitorIdCookie() (in the pull request) to use $this->visitorId instead of $this->CookieVisitorId
  • Put loadVisitorIdCookie() in the constructor
  • Expand setNewVisitorId() (or add a function) to clear all values that could have been loaded by the cookies and then set $this->visitorId to a new ID. This way a programmer can always call this function to make sure they're using a "clean" (no cookie data) instance of PiwikTracker.
  • Revert getRequest() to reference $this->visitorId so it automatically uses the cookie or setNewVisitorId() -- whichever was called most recently -- as an _id.

I don't know enough about the cid vs _id distinction to have a preference, but I've spent a bunch of time in the code recently so I wanted to draw your attention to the potential consequences of these changes.

@mattab commented on October 14th 2013 Member

Pre-pull and pre-aforementioned commits, the cid= and _id= distinction were largely spurious. A programmer could just as easily write $instance->visitorId = and it would show up as an _id. Does the distinction carry any real weight?

Yes cid forces the visitor to use set ID and requires token_auth
See doc http://piwik.org/docs/tracking-api/reference/

cid — (requires token_auth to be set) defines the visitor ID for this request. You must set this value to exactly a 16 character hexadecimal string (containing only characters 01234567890abcdefABCDEF). When specified, the Visitor ID will be “enforced”. This means that if there is no recent visit with this visitor ID, a new one will be created. If a visit is found in the last 30 minutes with your specified Visitor Id, then the new action will be recorded to this existing visit.

whereas _id is an indication of the ID as read by cookies.

Are we confident there aren't other scenario (e.g. a process initiated by a 3rd party web server or an admin user) where a cookie could exist on that 3rd party system and be mistakenly injected into the PiwikTracker process?

yes

Btw I agree the visitor ID situation could be a bit clearer. I think this ticket will help: http://dev.piwik.org/trac/ticket/3490

I don't know enough about the cid vs _id distinction to have a preference, but I've spent a bunch of time in the code recently so I wanted to draw your attention to the potential consequences of these changes.

cid requires token_auth so is something that few people may use. ID read from cookies are passed with _id

@claytondaley commented on October 15th 2013 Contributor

Rebased the pull so I have access to your commits. Added commits to address the session cookie issue (setrawcookie) as well as bullets 2 (cookieVisitorId), 3 (values read from 1st party cookie), and 5 (default values... in the constructor).

I also added code to set the custom variable cookie, but don't have a specific use-case for Custom Variables to do any real testing. I did note some differences between the PiwikTracker implementation and the piwik.js implementation. Specifically, the JS setCustomVariable() calls loadCustomVariables() to get the cookie. The PHP setCustomVariable() does not (although getCustomVariable() does). I haven't used or studied Custom Variables so I'm not comfortable making changes to this part of the code, but it might warrant a second look.

@mattab commented on October 24th 2013 Member

Thanks for PR! I improved/tidied and merged as part of new ticket http://dev.piwik.org/trac/ticket/4239#ticket

This Pull Request was closed on October 24th 2013
Powered by GitHub Issue Mirror