Offline tracking in JavaScript API #9939

Closed
PCSun1987 opened this issue Mar 17, 2016 · 55 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.

Comments

@PCSun1987

Currently it's not possible to send tracking events offline.

So one idea would be to extend the Piwik tracking API, especially the JavaScript one, with an additional optional parameter for the event timestamp. Inside JS, you could then keep events locally and, once connected to the internet, send all of them with the time each event actually happened.
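A minimal sketch of this idea, assuming the stored events are later sent through the Tracking HTTP API, whose `cdt` parameter overrides the request time and accepts a Unix timestamp (UTC). `recordEvent` and `toTrackingQuery` are hypothetical helpers, and the in-memory array stands in for whatever local storage an app would use; `e_c`/`e_a` are the Tracking API's event category and action parameters. Note that the server may only accept back-dated requests within a limited time window unless they are authenticated.

```javascript
// Hypothetical helper: remember a tracking call together with the time it happened.
function recordEvent(queue, params, now = Date.now()) {
  queue.push({ params: { ...params }, timestamp: now });
  return queue;
}

// Hypothetical helper: turn a queued entry into a Tracking API query string,
// passing the original event time as `cdt` (Unix timestamp in seconds).
function toTrackingQuery(entry) {
  const query = new URLSearchParams(entry.params);
  query.set('cdt', String(Math.floor(entry.timestamp / 1000)));
  return query.toString();
}

// Usage: record while offline, build the query strings once back online.
const pending = recordEvent([], { idsite: '1', rec: '1', e_c: 'Video', e_a: 'Play' }, 1458216000000);
console.log(toTrackingQuery(pending[0])); // idsite=1&rec=1&e_c=Video&e_a=Play&cdt=1458216000
```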

Basic tracking could probably be done in a similar way.

https://forum.piwik.org/t/does-piwik-work-even-your-offline/7295/9

@hpvd

hpvd commented Mar 17, 2016

+1
This would be a good step on the way to a universal tracker (Piwik 3.0).
It's not only interesting for apps which temporarily have no connection to the web,
but it would also make it possible to enter "real world" events manually after they have happened and give them the correct place in the timeline.

@tsteur tsteur added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Mar 28, 2016
@tsteur
Member

tsteur commented Mar 28, 2016

This would be a nice feature indeed

@PCSun1987
Author

So... when are we planning to have this functionality? E.g. when would 3.0 be released?

@tsteur
Member

tsteur commented Mar 29, 2016

Piwik 3.0 would be in about a year, but this feature is not planned yet. Pull requests or suggestions on how to implement it are always welcome 👍

@mattab mattab added this to the Long term milestone Mar 31, 2016
@ghost

ghost commented Nov 3, 2016

Just looking for such a solution. This is tremendously important for mobile apps.

In the JS tracking client, I have noticed a method called retryMissedPluginCalls() and the array missedPluginTrackerCalls. It could be interesting to hook into them in some way in order to intercept the calls to the server while offline. Then, upon the "online" event, we would retry the missed calls.

Your thoughts, @tsteur and @mattab ?

@tsteur
Member

tsteur commented Nov 3, 2016

retryMissedPluginCalls() is actually something different. Plugins can extend the Piwik JS tracker, and either Piwik or the plugin may be loaded first. If Piwik is loaded first and applies all _paq.push calls, it cannot call the plugins' methods yet because the plugins are not loaded. Therefore, once a plugin is loaded, all missed plugin calls are retried.

Offline tracking is super important nowadays for mobile apps, progressive web apps, etc. If someone wanted to work on it, I'm happy to give some support. I think it needs to be worked out what the best place is to save requests that couldn't be sent because the user is offline (e.g. localStorage) and then we need to detect whether user is offline and when user goes online again. Some browsers have an API for that.
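Browsers do expose this: `navigator.onLine` gives the current state, and `window` fires `online` and `offline` events when it changes. A minimal sketch with a hypothetical `watchConnectivity` helper that accepts any `EventTarget`, so it can also be exercised outside a browser. Keep in mind that `navigator.onLine` being `true` only means the device has some network connection, not that the Piwik server is reachable.

```javascript
// Hypothetical helper: report connectivity changes fired on `target` (normally `window`).
// Returns a function that removes the listeners again.
function watchConnectivity(target, onChange) {
  const handleOnline = () => onChange(true);
  const handleOffline = () => onChange(false);
  target.addEventListener('online', handleOnline);
  target.addEventListener('offline', handleOffline);
  return () => {
    target.removeEventListener('online', handleOnline);
    target.removeEventListener('offline', handleOffline);
  };
}

// In a browser: start from the current state, then follow the events.
if (typeof window !== 'undefined' && typeof navigator !== 'undefined') {
  console.log('initially online:', navigator.onLine);
  watchConnectivity(window, (isOnline) => console.log(isOnline ? 'back online' : 'offline'));
}
```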

There might be one problem when tracking the requests later. I think by default Piwik lets you only track requests up to 4 hours in the past without needing an authentication token. The limit is customizable AFAIK, but we need to take into account that requests older than 4 hours may have to be invalidated.

@ghost

ghost commented Nov 4, 2016

Thank you for your prompt reply.

I need to find some solution for this rather promptly, whether using Piwik or something else. Ideally the former though, due to its ability to easily deal with hybrid applications.

retryMissedPluginCalls()

OK, I see. No problem.

I think it needs to be worked out what the best place is to save requests that couldn't be sent

Well, I don't think that this is so important. It should be left open to the developer to decide what to plug into the system. Everyone might have different preferences.

then we need to detect whether user is offline and when user goes online again

Ditto. The developer should supply this, which makes it easy for Piwik.
Piwik should only supply the mechanism for giving up the "send" and for having the ability to submit it later. This is what I need to find out now.

What I mean by this is that there should be methods to call in Piwik that let it know that "now it is necessary to stop sending tracking data and save it instead" and "now resume sending and send what has been stored".
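Those two operations could look roughly like this, where a hypothetical `createPausableSender` wraps whatever function actually sends a request: while paused it stores each request together with the time it was made instead of sending it, and on resume it flushes the stored requests oldest first. This only illustrates the idea; it is not an existing Piwik API.

```javascript
// Hypothetical wrapper: `send(request, time)` is whatever actually transmits a tracking request.
function createPausableSender(send, now = () => Date.now()) {
  let paused = false;
  const stored = [];
  return {
    // "Now it is necessary to stop sending tracking data and save it instead."
    pause() { paused = true; },
    // "Now resume sending and send what has been stored", oldest first.
    resume() {
      paused = false;
      while (stored.length > 0) {
        const { request, time } = stored.shift();
        send(request, time);
      }
    },
    // Send immediately, or keep the request with its original time while paused.
    track(request) {
      if (paused) {
        stored.push({ request, time: now() });
      } else {
        send(request, now());
      }
    },
    pendingCount() { return stored.length; },
  };
}
```

Calling `pause()` from an `offline` handler and `resume()` from an `online` handler would give the behaviour described above.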

By the way, the "stop sending" and "resume sending" functionality is already working now.

  • Upon getting offline, I stop calling trackPageView(), trackEvent(), etc.
  • And upon getting online, I start calling them.

So that part is simple.

Even calling the JS tracking client (or loading it locally) is easy to solve upon detection of online/offline events. I already have this part solved.

Thus the only remaining thing is "how to store the data, so that Piwik would know when those tracking events happened, so that it would be possible to reconstruct the past sequence correctly upon delayed sending".

I need your or Matthieu's input on this, as you understand the existing code base and its functionality. (Thank you in advance.)

I think by default Piwik lets you only track requests up to 4 hours in the past without needing an authentication token.

I don't understand this one. What kind of a token?
When you are submitting old data, you are still submitting it with the session that is currently active, aren't you?

The point is that the device might be offline for one week or a month. This means that we just need to keep storing the tracking data with original timestamps and simply submit it in a sequence, when the device comes online.

This means that we won't have such data available for analysis immediately, but only eventually – yet it is still important and better than not having it at all.
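Submitting "in a sequence" could look like the sketch below. It assumes each stored entry carries its original `timestamp` and that a hypothetical async `send` function rejects when delivery fails: entries are replayed oldest first, one at a time, and replay stops at the first failure so that nothing is submitted out of order.

```javascript
// Hypothetical replay loop: `queue` holds { request, timestamp } entries and
// `send(entry)` returns a Promise that rejects if the request could not be delivered.
async function replayInOrder(queue, send) {
  const ordered = [...queue].sort((a, b) => a.timestamp - b.timestamp);
  const delivered = [];
  for (const entry of ordered) {
    try {
      await send(entry);
    } catch {
      // Stop here; this entry and everything after it stay queued for the next attempt.
      return { delivered, remaining: ordered.slice(delivered.length) };
    }
    delivered.push(entry);
  }
  return { delivered, remaining: [] };
}
```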

The question is to what extent the current system could support this scenario without major code changes.
Is it possible to hook some existing sub-systems?

@ghost

ghost commented Nov 4, 2016

Updated the previous comment. --

@ghost

ghost commented Nov 4, 2016

@tsteur
This is actually an interesting notion that you mentioned... about the plugins for JS tracking client.

Is it possible to write such an "offline tracking" solution as a plugin?

If yes, I could look into this ASAP, if I am given guidance on how such plugins are written. Thanks.

@tsteur
Member

tsteur commented Nov 4, 2016

By token I mean that you need an authentication token in this case. It is actually hard-coded that tracking a request older than 4 hours requires authentication (see https://github.com/piwik/piwik/blob/2.17.0/core/Tracker/Request.php#L467-L474). This is for security reasons: otherwise anyone could, for example, track data in the past into any Piwik instance. This requirement can be disabled, though, here: https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L696

I feel like we could maybe add an API to the tracker such as setUserOffline(), which overrides the internal sendRequest method so that such requests are added to an array instead. A developer could e.g. pass some kind of storage object for us to add the requests to, like {addRequest: function (request) {}}, and this way decide what to do with them.

When the user comes back online, the developer could call e.g. tracker.setUserOnline(storedRequests) and we (the Piwik tracker) would try to re-send these requests in bulk. However, there is still the 4-hour problem described above.
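The Tracking HTTP API already supports bulk requests: a single POST whose JSON body contains a `requests` array of query strings. A sketch of building that body from stored query strings with a hypothetical `buildBulkPayload` helper; the tracker URL in the usage comment is a placeholder. Back-dated requests beyond the server's time limit would additionally need `token_auth` in the body, which should not be shipped in client-side code.

```javascript
// Hypothetical helper: build the JSON body for a Tracking API bulk request.
function buildBulkPayload(queryStrings) {
  return JSON.stringify({
    // Each entry is a query string starting with "?", e.g. "?idsite=1&rec=1&cdt=...".
    requests: queryStrings.map((q) => (q.startsWith('?') ? q : '?' + q)),
  });
}

// Usage in a browser (tracker URL is a placeholder):
// fetch('https://analytics.example.com/piwik.php', {
//   method: 'POST',
//   body: buildBulkPayload(storedRequests),
// });
```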

It could probably be written as a plugin, but this API is not official yet and is undocumented, and we would certainly need to add some methods to the tracker. Adding those methods could be done quickly, though. I'll show you a rough idea in a bit without thinking too much about it.

@tsteur
Member

tsteur commented Nov 4, 2016

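This could be a rough idea:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1

A developer would do:

tracker.setUserOffline({push: function (request) {
    // eg append the request to a queue kept in localStorage
    var queue = JSON.parse(localStorage.getItem('offlineRequests') || '[]');
    queue.push(request);
    localStorage.setItem('offlineRequests', JSON.stringify(queue));
}});
tracker.setUserOnline(JSON.parse(localStorage.getItem('offlineRequests') || '[]'));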

@tsteur
Member

tsteur commented Nov 4, 2016

Eventually Piwik would ideally detect offline status itself and store it somewhere.

The biggest problem remains the Piwik backend re the 4 hours in past only

@ghost

ghost commented Nov 4, 2016

Thank you.
The code looks reasonable to me.
I would just change the initialisation of the configOfflineStorage to an object instead of an array: configOfflineStorage = {}; on the following line:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1#diff-1279d666063b65e6d6777f902d11574fR3085

@tsteur
Member

tsteur commented Nov 4, 2016

I made it an array because an array has a push method out of the box. This way it will be easy for us to add tests for it. For now, a developer would set a custom offline storage that is an object with a push method.
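A custom offline storage of that shape could be a thin wrapper around the Web Storage interface. A sketch assuming a hypothetical key name `piwikOfflineRequests` and a hypothetical `drain()` method for handing the stored requests back to the tracker; `storage` can be `window.localStorage` or anything with `getItem`/`setItem`/`removeItem`.

```javascript
// Hypothetical storage object: exposes `push` (what the tracker calls while offline)
// plus `drain` (read everything back and clear it when the visitor is online again).
function createOfflineStorage(storage, key = 'piwikOfflineRequests') {
  const read = () => JSON.parse(storage.getItem(key) || '[]');
  return {
    push(request) {
      const requests = read();
      requests.push(request);
      storage.setItem(key, JSON.stringify(requests));
      return requests.length; // mirrors the return value of Array.prototype.push
    },
    drain() {
      const requests = read();
      storage.removeItem(key);
      return requests;
    },
  };
}

// In a browser, e.g. pass createOfflineStorage(window.localStorage) as the storage
// object in the rough offlinetracker branch.
```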

I renamed the user term to visitor as Piwik usually uses the term. Do you think you could work with something like this? Maybe some background would be good as well. Are you developing a mobile app eg via phonegap? mobile web app?

@mattab do you have any thoughts on this?

@ghost

ghost commented Nov 4, 2016

array, push method

OK, I understand your point.

I renamed the user term to visitor as Piwik usually uses the term.

Makes sense.

Do you think you could work with something like this?

Absolutely. Looks very good to me. Simple and effective.

Maybe some background would be good as well. Are you developing a mobile app eg via phonegap? mobile web app?

Cordova/Tizen + plain JS + HTML + CSS.
Multiplatform (Android, Amazon, AmigoOS, Blackberry10, iOS, Tizen, Windows).

The biggest problem remains the Piwik backend re the 4 hours in past only

I have just cloned the Piwik repository and I am going to look into the reasoning for this limitation...

@ghost

ghost commented Nov 4, 2016

@tsteur

I have just found this:
https://github.com/piwik/piwik/blob/3.x-dev/core/Tracker.php#L256-L260

It looks like the bulk submission automatically bypasses the authentication.

Am I right in the assumption that it applies to our case?

If yes, the whole problem will have been solved tonight. ;-)

@ghost

ghost commented Nov 4, 2016

If not, what is the "bulk request" then?

@ghost

ghost commented Nov 4, 2016

Oops!
I have noticed only now that those lines are within the setTestEnvironment() function.

@ghost

ghost commented Nov 4, 2016

However, I have found this:

; Whether Bulk tracking requests to the Tracking API requires the token_auth to be set.
bulk_requests_require_authentication = 0

https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L684-L685

Could this fit our needs?

@ghost

ghost commented Nov 4, 2016

@tsteur
Could it be that you have wrong time on your computer? Or something like that?
Have a look here:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1
The commits you made today during our conversation on Nov 4, 2016 appear to have been made on Nov 2, 2016, i.e. two days ago(!). That's odd.

@ghost
Copy link

ghost commented Nov 4, 2016

On top of that, you are changing a version of the JS Tracking Client that lacks some code related to configIdPageView, which is present in Piwik 2.17.0.

Thus you have effectively overwritten its declaration on line 3084.

@ghost

ghost commented Nov 4, 2016

Also, the semicolon at the end of line 3084 should be a comma, as the declaration of variables continues on the next line:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1#diff-1279d666063b65e6d6777f902d11574fR3084

@tsteur
Member

tsteur commented Nov 4, 2016

Yeah, the time in my virtual machine sometimes goes wrong :)

The config you mentioned only applies to bulk requests in general, not to recording requests in the past. For this, tracking_requests_require_authentication would need to be set to "1", see https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L696

Regarding idpageview I think you are currently looking at Piwik 3 (not released yet), the change was made on Piwik 2 (branch 2.x-dev).

@ghost

ghost commented Nov 4, 2016

Regarding idpageview I think you are currently looking at Piwik 3 (not released yet), the change was made on Piwik 2 (branch 2.x-dev).

Yes, I realised this. So it looks like 3.0-dev wasn't updated to the latest 2.17 or 2.x-dev.

The config you mentioned only applies to bulk requests in general, not to recording requests in the past.

Oh, that's a pity... :-\

@mattab
Member

mattab commented Nov 12, 2016

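This could be a rough idea: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1

A developer would do:

tracker.setUserOffline({push: function (request) {
    // eg append the request to a queue kept in localStorage
    var queue = JSON.parse(localStorage.getItem('offlineRequests') || '[]');
    queue.push(request);
    localStorage.setItem('offlineRequests', JSON.stringify(queue));
}});
tracker.setUserOnline(JSON.parse(localStorage.getItem('offlineRequests') || '[]'));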

This looks great @tsteur ! I'd vote for inclusion in Piwik 3 as a rather powerful new feature once tested & documented. Will help tons of people and make Piwik more resilient!

@mattab mattab modified the milestones: 3.0.0-b4, Long term Nov 12, 2016
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Nov 12, 2016
@BaronBonet

Would also like to bump this up. What work needs to be done on the JS end? I could take a crack at it.

@tsteur
Member

tsteur commented Jun 18, 2019

I suppose you would need to store the requests, e.g. in localStorage, and replay them later once the internet connection is back. Not sure if much else needs to be done.

@bdurrer

bdurrer commented Jun 27, 2019

bugsnag.com (a bug/log tracking solution) does exactly this to report problems which occurred while offline.
This is a must-have feature for PWAs, but I guess we could easily implement it ourselves using localStorage and the Tracking API. It would be nice if the JS client already had these capabilities.

@tsteur
Member

tsteur commented Jun 27, 2019

Totally agreed @bdurrer, it's a must-have nowadays. @mattab, could we maybe schedule it for Matomo 4?

@bdurrer, or anyone else: if you could contribute to this, we're happy to help.

@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Jun 27, 2019
@mattab mattab changed the title PIWIK JavaScript API enhancement - Offline tracking Offline tracking in JavaScript API Jan 10, 2020
@saifeer

saifeer commented Feb 12, 2020

@tsteur, @mattab, IMHO offline storage should be added using service workers (SW). This is the most logical way forward for adding any offline capability. What should be done is:

  • An optional SW is added to the JS tracking client that can be included by PWA developers if they would like offline capability.

  • The SW does 2 things:
      • It caches the Matomo JS itself so it can be served on subsequent requests when there is no internet connection.
      • It listens for all calls to the analytics URL. If a call succeeds, nothing more is done; otherwise it saves the request in the cache and tries again later (the retry logic can easily be worked out).

Also, the SW should forward all requests to the server (in the correct sequence, of course), regardless of how much time has passed since the event was cached. The server can then decide whether to accept and process a request or reject it and discard the data. This way, if the server code is changed in the future to allow events older than 4h, the change would be transparent to the client.

Lastly, the SW approach means that the JS client itself is completely oblivious to the SW and works regardless of connectivity. The SW is API agnostic as it only saves request without manipulation and this way the offline capability is completely decoupled from the JS client and server.
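A rough sketch of such a worker, for illustration only (it is not the implementation that was later merged into Matomo). Assumptions, all hypothetical: tracking requests are GET requests to a path ending in `/matomo.php`, the tracker script lives at `/matomo.js`, and the original event time is attached as the Tracking API's `cdt` parameter. The queue is kept in memory to keep the sketch short; a real worker would persist it in IndexedDB, because a service worker can be stopped at any time and cannot use `localStorage`. Instead of a separate retry timer, every new tracking request triggers a flush of the queue, oldest first.

```javascript
// Hypothetical offline-tracking service worker (sketch only, not Matomo's implementation).
const TRACKER_CACHE = 'tracker-v1';   // hypothetical cache name
const TRACKER_SCRIPT = '/matomo.js';  // hypothetical path of the tracker script
const queue = [];                     // in memory only: a real worker would use IndexedDB
let flushing = null;                  // the flush currently in progress, if any

// Hypothetical rule for recognising tracking requests.
function isTrackingRequest(url) {
  return new URL(url).pathname.endsWith('/matomo.php');
}

// Attach the original event time as `cdt` (Unix seconds) unless the request already has one.
function withTimestamp(url, nowMs) {
  const u = new URL(url);
  if (!u.searchParams.has('cdt')) {
    u.searchParams.set('cdt', String(Math.floor(nowMs / 1000)));
  }
  return u.toString();
}

// Send queued URLs oldest first; a rejected send (network failure) stops the loop
// and leaves the failed URL and everything after it in the queue.
async function drain(send) {
  while (queue.length > 0) {
    await send(queue[0]);
    queue.shift();
  }
}

// Ensure only one drain runs at a time; concurrent callers share the same promise.
function flushQueue(send) {
  if (!flushing) {
    flushing = drain(send).finally(() => { flushing = null; });
  }
  return flushing;
}

async function handleTracking(url, send, nowMs = Date.now()) {
  queue.push(withTimestamp(url, nowMs)); // enqueue first so the original order is kept
  try {
    await flushQueue(send);
  } catch {
    // Still offline: the entries stay queued until the next tracking request.
  }
  return new Response(null, { status: 204 });
}

// Wire everything up only when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('install', (event) => {
    event.waitUntil(caches.open(TRACKER_CACHE).then((cache) => cache.add(TRACKER_SCRIPT)));
  });
  self.addEventListener('fetch', (event) => {
    const { request } = event;
    if (request.method === 'GET' && isTrackingRequest(request.url)) {
      // 'no-cors': an opaque response is enough, we only need to know the request went out.
      event.respondWith(handleTracking(request.url, (u) => fetch(u, { mode: 'no-cors' })));
    } else if (new URL(request.url).pathname === TRACKER_SCRIPT) {
      // Prefer the network, fall back to the cached tracker script when offline.
      event.respondWith(fetch(request).catch(() => caches.match(TRACKER_SCRIPT)));
    }
  });
}
```

Because every tracking request goes through the queue, the original order is preserved even if the connection drops halfway through a flush; whether an old request is still accepted is left to the server.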

It's worth noting that GA also uses a similar mechanism to add offline capability to their framework.

@tsteur
Copy link
Member

tsteur commented Feb 12, 2020

Thanks @saifeer very much appreciated 👍

@Maqsyo

Maqsyo commented Mar 11, 2020

Hope this feature will exist someday.

@mattab
Member

mattab commented Apr 23, 2020

When we implement this, how will we deal with the fact that currently Tracking API requests are only allowed up to 24 hours in the past?
Refs the setting in config.ini.php under [General] (default is 1 day):

[General]
tracking_requests_require_authentication_when_custom_timestamp_newer_than = 86400

Will it be "acceptable" to drop requests made more than 1 day ago by default, and explain to users that they can increase the setting in the config? Or should we make this a general or per-site setting in the UI, or find some other solution?
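If dropping is the default, a client could avoid sending requests that the server will reject anyway. A sketch assuming each queued entry has a `timestamp` in milliseconds and that the client is configured with the same limit as the server's `tracking_requests_require_authentication_when_custom_timestamp_newer_than` setting (86400 s, i.e. 24 h, by default as quoted above); the helper name `partitionByAge` is hypothetical.

```javascript
// Hypothetical helper: split queued entries into ones still inside the server's window
// and ones older than `maxAgeSeconds` that would be rejected without authentication.
function partitionByAge(entries, maxAgeSeconds = 86400, nowMs = Date.now()) {
  const cutoff = nowMs - maxAgeSeconds * 1000;
  const sendable = [];
  const expired = [];
  for (const entry of entries) {
    (entry.timestamp >= cutoff ? sendable : expired).push(entry);
  }
  return { sendable, expired };
}
```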

@tsteur
Member

tsteur commented Apr 23, 2020

Yes, the idea is that this is acceptable and it's always possible to change it.

@bdurrer

bdurrer commented Apr 23, 2020

Learning from others, you'll want to introduce a maximum number of held-back messages so that they do not flood local storage or produce big payloads when posted.
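One simple way to do both, assuming a hypothetical cap of 100 held-back requests and a hypothetical batch size of 20: drop the oldest entries once the cap is reached (they are also the most likely to fall outside the server's time window), and split the queue into batches before posting it.

```javascript
// Hypothetical bounded queue: keeps at most `maxSize` requests, discarding the oldest.
function pushBounded(queue, request, maxSize = 100) {
  queue.push(request);
  const overflow = queue.length - maxSize;
  if (overflow > 0) {
    queue.splice(0, overflow); // drop the oldest entries first
  }
  return queue;
}

// Split a queue into batches so that no single bulk POST gets too large.
function toBatches(queue, batchSize = 20) {
  const batches = [];
  for (let i = 0; i < queue.length; i += batchSize) {
    batches.push(queue.slice(i, i + batchSize));
  }
  return batches;
}
```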

@tsteur
Member

tsteur commented May 20, 2020

BTW, do we maybe also want a new dimension indicating whether a request was executed online or offline? I reckon that might be a separate feature, but we'll see. I might create an issue for it later.

@tsteur tsteur self-assigned this May 21, 2020
@tsteur
Member

tsteur commented May 22, 2020

FYI started some early concept using service worker in https://github.com/matomo-org/matomo/pull/15970/files

It looks like this could work even for bulk requests and when sendBeacon is used.

It's not tested much yet: I haven't tweaked the code much and haven't tested it in any browser except Chromium. Development is at quite an early stage.

If someone's proficient in service workers and IndexedDB feel free to leave some comments.

The goal will be to cache the actual JS tracker file, and put all tracking requests in a queue should the user be offline.

@tsteur
Member

tsteur commented May 27, 2020

@PCSun1987 @juliusstoerrle @bdurrer @saifeer @Maqsyo

Is anyone able to have a look at #15970 or even give it a test? I don't use any service workers myself, so I'm not sure whether something would cause issues with other service workers, etc. I've only done some basic testing so far, so it would be great if someone could test it.

@tsteur tsteur modified the milestones: 4.0.0, 4.0.0 RC Jul 1, 2020
@sgiehl
Member

sgiehl commented Oct 8, 2020

@tsteur Anything left here for Matomo 4?

@tsteur
Member

tsteur commented Oct 8, 2020

@sgiehl it's not finished (maybe not even really working yet). We'll try to get some feedback and try to get people to test it so we can see if/how it works and what adjustments it needs, etc.

We'll hopefully have another beta soon and then get some feedback.

For anyone seeing this issue: the feature should be included in Matomo 4 beta 3 and newer. It would be great if you could give it a try and comment here whether it worked, what issues you ran into, or what was unclear.

@tsteur
Member

tsteur commented Oct 15, 2020

FYI as this feature is available in latest Matomo 4 beta here is an FAQ on how to use it: https://matomo.org/faq/how-to/how-do-i-set-up-matomo-offline-tracking/

If you have any feedback, please let us know.

@mackuba

mackuba commented Oct 27, 2020

To be honest, as someone who is first hearing about this feature, to me it sounds kind of… suspicious, let's say. My feeling is that it breaks some kind of implicit user expectation - I would never guess that a website might be tracking my every click even if I'm flying across the Atlantic and browsing a saved copy (if that's how it works, I'm not sure). It doesn't sound like a feature I would expect to find in a privacy-focused software. Just IMHO.

@mackuba

mackuba commented Oct 27, 2020

If I may add, a famous quote by Jeff Goldblum from Jurassic Park about scientists comes to mind ;)

@bdurrer

bdurrer commented Oct 29, 2020

To be honest, as someone who is first hearing about this feature, to me it sounds kind of… suspicious, let's say. My feeling is that it breaks some kind of implicit user expectation - I would never guess that a website might be tracking my every click even if I'm flying across the Atlantic and browsing a saved copy (if that's how it works, I'm not sure). It doesn't sound like a feature I would expect to find in a privacy-focused software. Just IMHO.

Wait, there are people who think they are not constantly tracked everywhere? :)

PWAs and Electron-based web applications are quite a thing. Funny you mentioned flying, because I deploy these things to a few thousand devices used by airline cabin crew. A lot of web applications are expected to work just fine while connectivity is bad, and "offline" does not have to mean a long timespan.

@tsteur
Member

tsteur commented Nov 8, 2020

Closing this issue now. Documented in https://matomo.org/faq/how-to/how-do-i-set-up-matomo-offline-tracking/

If you run into any issues with this feature, simply create an issue here in our GitHub repository.

Are you a service worker or IndexedDB pro? We’d love it if you could help us review our implementation. You can check out the code in the offline tracking pull request. We appreciate any feedback to help us improve this feature! You can comment directly in the PR or contact us.

@tsteur tsteur closed this as completed Nov 8, 2020