Hybrid mode tracking: Javascript+Server Side tracking and matching data via IP #9963

Closed
masteranalyze opened this issue Mar 24, 2016 · 14 comments
Labels
duplicate For issues that already existed in our issue tracker and were reported previously. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

@masteranalyze

Hi all,

The best thing about Piwik is that it can track in three ways: JavaScript, JavaScript + image, and server side.

At least on Joomla 3, with the Eorisis plugin, you can do that easily.

JavaScript tracking depends on the user's setup, while server-side tracking depends entirely on the server.

1) JavaScript tracking has a lot of drawbacks, for example:

- It depends on the user accepting the cookie. No cookie means messed-up data, no matter whether you use Piwik, GA or any other beacon-based system.
- It depends on the script being loaded on all your web pages. If you have a website with 10 pages this is easy to check; if you have 100,000 pages it is much harder.
- If the JavaScript does not execute, there is no tracking, so again inaccurate data.
- If the user blocks JavaScript with an antivirus, firewall, AdBlock, Ghostery, uBlock or any other add-on, again inaccurate data.
- If the user views your site via a proxy that blocks JavaScript, nothing is tracked.
- If the user is using some kind of app on Android or iOS that does not load the JavaScript inside the app, they can still read your blog without being tracked.

Some time ago Google said that about 5% of users have JavaScript disabled. But back then people were not so freaked out about cookies and things that follow them around. I guess you know the new EU law that requires all websites to show that cookie message, which the user can usually reject. Many do reject it because they don't want to be tracked: most of them are not very technical and think cookies = viruses and all kinds of malware. Not all of them, but the majority will not want to be tracked for all kinds of reasons, and you have no control over their choice.

Google did not take into account the 200 million people who are using AdBlock, which is now mainstream. Some sites are reporting that even 50% of their total visitors have AdBlock installed.

Maybe 5 or 10 years ago, when GA started, JavaScript was the best method, but it is not anymore, that's for sure.

Basically, JavaScript tracking depends on the user's setup, and from my point of view this is totally out of any webmaster's control.

2) JavaScript + image: maybe this option is a little more accurate. I have not tested it, so people who are using this method, feel free to comment.

3) Server side: very good, but it has a few drawbacks as well. I think they can be fixed, and maybe we can come up with some solutions.
As I said, I'm using Joomla 3 with the Eorisis plugin for Piwik. In that plugin you have an option to track bots, which is very important, as you mostly want bots out of your stats. Humans are humans and bots are bots. On the Bots tab I set the IGNORE option for all of them.

I don't know, and maybe somebody can clarify this for me: if I set the IGNORE option for all bots in the Eorisis plugin for Joomla 3, will Piwik's real-time Visitors view track humans only, or bots + humans?

Or are bots tracked separately?

I ask because after I activated server-side tracking, BOT TRACKER appeared in my dashboard and data started to flow in:

Bingbot 47462 2015-08-09 19:32:02
Yandex 11573 2015-08-09 19:32:01
GoogleBot 3754 2015-08-09 19:30:13
Yahoo! Slurp 1891 2015-08-09 19:31:21
Baiduspider 289 2015-08-09 19:27:25
Media Partners GoogleBot 51 2015-08-09 19:02:29
Magpie Crawler 32 2015-08-09 06:02:10
Wget 30 2015-08-09 15:40:57
XoviBot 4 2015-08-09 19:26:13

- With server-side tracking you don't know the user's screen resolution.
- With server-side tracking you don't know which plugins the user's browser has.
- With server-side tracking you don't see how long the page took to load for the user (1s, 4s, 8s, etc.).
- Feel free to add more.

This is a problem, because this kind of data is important.

In AWStats, for example, the first two problems are solved by this script:

MiscTrackerUrl
Version : 5.6+
MiscTrackerUrl can be used to make AWStats able to detect some miscellaneous
things that cannot be tracked in any other way, like:

  • Screen size
  • Screen color depth
  • Java enabled
  • Macromedia Director plugin
  • Macromedia Shockwave plugin
  • Realplayer G2 plugin
  • QuickTime plugin
  • Mediaplayer plugin
  • Acrobat PDF plugin

To enable all these features, you must copy the awstats_misc_tracker.js file
into a /js/ directory stored in your web document root and add the following
HTML code at the end of your index page (before ) :
If code is not added in index page, all these detection capabilities will be
disabled. You must also check that ShowScreenSizeStats and ShowMiscStats
parameters are set to 1 to make results appear in report page.
If you want to use another directory than /js/, you must also change the
awstatsmisctrackerurl variable in the awstats_misc_tracker.js file.
Change : Effective for new updates only.
Possible value: Full URL of javascript tracker file added in HTML code
Default: "/js/awstats_misc_tracker.js"
MiscTrackerUrl="/js/awstats_misc_tracker.js"

Maybe something like this can be implemented in Piwik as well, so we can see users' screen sizes and which plugins they are using in their browsers.

As for the page load time, there should be a solution too. I don't have one right now, but surely it can be solved.
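
To make the idea concrete, here is a minimal Python sketch of how a server-side tracker could pass along screen size and plugin flags once a small script has collected them in the browser. It assumes the Piwik HTTP Tracking API parameters `idsite`, `rec`, `url`, `res` and plugin flags such as `pdf` or `java`, as I recall them from the tracking API documentation, so please verify them there. The host name and the function name `build_tracking_url` are hypothetical.

```python
from urllib.parse import urlencode

# Plugin flag names as recalled from the Piwik tracking API docs (assumption).
KNOWN_PLUGIN_FLAGS = ("pdf", "qt", "realp", "wma", "java", "fla")


def build_tracking_url(base, site_id, page_url, resolution=None, plugins=()):
    """Build a tracking request URL for one pageview.

    base       -- tracker endpoint, e.g. https://stats.example.org/piwik.php (hypothetical)
    resolution -- optional (width, height) tuple reported by the browser
    plugins    -- iterable of plugin flag names detected in the browser
    """
    params = {"idsite": site_id, "rec": 1, "url": page_url}
    if resolution:
        params["res"] = f"{resolution[0]}x{resolution[1]}"
    for flag in plugins:
        if flag in KNOWN_PLUGIN_FLAGS:  # silently skip anything unknown
            params[flag] = 1
    return f"{base}?{urlencode(params)}"
```

The resolution and plugin list still have to come from the browser somehow, which is what the AWStats misc tracker does, so this only covers visitors who run JavaScript.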

On my side the numbers are very different on one of my websites:

Piwik JavaScript tracking, 6 August 2015: 95 unique visitors, 119 visits, 206 pageviews, 125 unique pageviews

Piwik server-side tracking, 8 August 2015: 2527 unique visitors, 2555 visits, 7434 pageviews, 5757 unique pageviews

According to AWStats, on 8 August 2015 there were 1,747 visits (I don't know how many unique visitors).

And on 6 August 2015 AWStats reports 3,406 visits (again, I don't know how many unique visitors).

I mention this because AWStats is server-side tracking too.

With server-side tracking, Piwik reports 1437 visits with an unknown browser language for 8 August 2015. I mention this because, from what I know, bots mostly have no browser language set. Maybe somebody can tell me whether these visits with an unknown browser language are from humans or robots.

As you can see, there is a big difference between JavaScript tracking and server-side tracking.

I don't trust the numbers from JavaScript tracking, and you will probably ask me why.

The answer is very simple: I do a lot of social network marketing and I use bit.ly for my links. I think everybody knows bit.ly: it is used to shorten your links, but it also provides data about the people who clicked on your shortened links.

On 6 August there were 525 clicks, which means roughly 525 visitors reached the website, while the JavaScript tracking says 119 visits, so it is totally wrong.

Let's say JavaScript tracking showed 100 unique visitors and server-side tracking showed 2500. That is a 25x difference between JavaScript tracking and server-side tracking. I'm asking you guys, do you think this is normal?!
There should be some difference between JavaScript and server-side tracking, but I think I'm the first on the web to notice such a huge 25x difference. Others I saw posting on WebmasterWorld see a 5-6x difference with Google.
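
For reference, the exact ratios from the figures quoted above can be worked out with a few lines of Python. Keep in mind that the JavaScript numbers are from 6 August and the server-side numbers from 8 August, so this is only a rough comparison; the variable names are just for illustration.

```python
# Figures quoted earlier in this post.
js_tracking = {"unique": 95, "visits": 119, "pageviews": 206}          # 6 August 2015
server_tracking = {"unique": 2527, "visits": 2555, "pageviews": 7434}  # 8 August 2015
bitly_clicks = 525                                                     # 6 August 2015

# How many times larger each server-side figure is than the JavaScript one.
ratios = {
    metric: round(server_tracking[metric] / js_tracking[metric], 1)
    for metric in js_tracking
}

# bit.ly clicks versus JavaScript-tracked visits on the same day.
bitly_ratio = round(bitly_clicks / js_tracking["visits"], 1)

print(ratios, bitly_ratio)
```

That works out to about 26.6x for unique visitors, 21.5x for visits and 36.1x for pageviews, and about 4.4x between bit.ly clicks and JavaScript-tracked visits.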

What do you think about this? And which do you trust: your server logs, or front-end tracking that may or may not record the user?!

I also searched around, and I think it would be nice if we could implement some kind of hybrid tracking, SERVER SIDE + JAVASCRIPT, to get the benefits of both.

It would be a feature that Google Analytics will never have, and it would give more accurate data.
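
A rough sketch of what the matching via IP from the title could look like, in Python. This is only an illustration under my own assumptions, not how Piwik works: the `Hit` record, the function name `split_by_js_match` and the 30-second window are all hypothetical. A server-side hit counts as confirmed when a JavaScript hit from the same IP for the same URL arrives within the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Hit:
    """One recorded pageview (hypothetical minimal record)."""
    ip: str
    url: str
    time: datetime


def split_by_js_match(server_hits, js_hits, window=timedelta(seconds=30)):
    """Split server-side hits into (matched, unmatched).

    matched   -- a JavaScript hit with the same IP and URL exists within `window`
    unmatched -- no such JavaScript hit: a bot, or a human who blocks the script
    """
    # Index JavaScript hit times by (ip, url) for quick lookup.
    js_index = {}
    for hit in js_hits:
        js_index.setdefault((hit.ip, hit.url), []).append(hit.time)

    matched, unmatched = [], []
    for hit in server_hits:
        times = js_index.get((hit.ip, hit.url), [])
        if any(abs(hit.time - t) <= window for t in times):
            matched.append(hit)
        else:
            unmatched.append(hit)
    return matched, unmatched
```

The unmatched group is where the bot filtering would have to happen. Matching on IP alone is also imperfect, since several people behind one NAT or proxy share the same address.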

@hpvd

hpvd commented Mar 26, 2016

Hi @masteranalyze

great to read we are thinking somehow the same way :-)

There is a ticket: "combine data from different sources (for better image of reality)" #9665

It would be great to see your comments there as well, or a copy of this good description of the advantages and possibilities :-)

@hpvd

hpvd commented Mar 26, 2016

There is another one, supporting the same target:
Make usage of log analytics easier / acessible by more users
#9711

@dev-101

dev-101 commented Mar 29, 2016

The reason why you see up to 25x difference (sometimes even more, 100x, 1000x) is that - most probably - your visits are coming from all kinds of bots, scrapers, seo probes and similar junk. Piwik filters out all that bad stuff and tries to report real users as much as possible. You will not get any filtering in AWStats or bit.ly or whatever; they will report every visit, regardless of whether it comes from a JavaScript-enabled browser/emulator or not. If a 'user' has no js enabled, you can instantly assume that it is strange and 'fishy'. Modern websites today almost exclusively require JS to run properly. So we can really debate whether the JS method is in fact an undesirable or unreliable method.

Just my opinion

@masteranalyze
Author

The main idea of this, as @hpvd wrote ("There is a ticket: "combine data from different sources (for better image of reality)" #9665"), is to get the most accurate picture of reality via Piwik.

@dev-101 I partially agree with you on this: "The reason why you see up to 25x difference (sometimes even more, 100x, 1000x) is that - most probably - your visits are coming from all kinds of bots, scrapers, seo probes and similar junk". I say partially because, besides what you said, there can also be real humans in that traffic, with an AdBlock plugin installed in Firefox that can easily block the JavaScript from Piwik, GA, it does not matter which.

I tested not only AdBlock but others too, and I am a REAL HUMAN, yet I don't appear with the JavaScript tracking method. I only appear in the server logs.

Server logs have some problems. As you mentioned, a lot of the data can be bots, scrapers and junk, and there are other technical limits:
- With server-side tracking you don't know the user's screen resolution.
- With server-side tracking you don't know which plugins the user's browser has.
- With server-side tracking you don't see how long the page took to load for the user (1s, 4s, 8s, etc.).

I noted that in AWStats this problem is more or less solved, for real users who have JavaScript enabled, by the AWStats misc tracker. A similar misc tracker could be made available for Piwik in order to get that data.

@dev-101, did you know that robots nowadays from Google, Bing and Yahoo can execute your JavaScript?
Your stats will be messed up if that happens, of course.

Robots nowadays are not like they used to be, when it was simple: no JavaScript meant we were dealing with a robot.

@dev-101: "If a 'user' has no js enabled, you can instantly assume that it is strange and 'fishy'." Why do you think that? So if I use AdBlock or uBlock in my browser, I am strange and fishy?!?

And what if I am strange and fishy?! I don't see any problem with that. The main point is to get a better picture of reality, not whether someone is strange or fishy.

You cannot assume that a "fishy" user is not a user (a real human) just because he does not use JavaScript. That is absolutely wrong, because I'm neither strange nor fishy. I may just be one of the 200 million who use AdBlock, let's say. So is that not important to you because it is strange?!

It is not strange at all, it is the new world. If you consider it strange, maybe we are all strange to you.

If somebody blocks your JavaScript with a browser plugin, it is not strange at all. It is very common nowadays and the user can do it easily. And with stupid laws like the EU cookie law, which only freaks out non-technical people who cannot tell the difference between a COOKIE and malware or a virus, they will block the tracking for safety reasons, of course.

Once they do that, instead of getting a better picture of reality you get a distorted one, because of some stupid laws made by people who don't even know how to use their own smartphones very well. They are just bureaucrats who have nothing to do with tech!
Otherwise they wouldn't pass stupid laws that just ruin the user experience and get kids worked up into thinking that because of a tracking cookie they will not be free anymore and will be watched anywhere they go. So what do they do? Block the damn cookie... block the tracking... This is how people react when they have no know-how about how things work.

Maybe that user does not want to see ads, maybe that user does not want to be tracked, maybe that user has an improperly configured antivirus that blocks the Piwik tracker. With the JavaScript method you won't have a real picture of what is going on.

For me as a webmaster it is important to know the real data as much as possible, and I guess every one of us wants this.

To track accurately we need this:
Server logs, and filtering of the server logs into GOOD BOTS, BAD BOTS and REAL HUMANS.

Only by doing that can we have a real picture of reality, since on the JavaScript side the webmaster has no control.

And Piwik is the only analytics product on the market that is practically able to do that, but we must implement this, at least for Piwik 3.

@dev-101

dev-101 commented Mar 29, 2016

Hi, I get your points and yes, of course I know bots execute & understand js nowadays, but good ones are well-known and verifiable via reverse dns and filtered/excluded by default in Piwik.
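
For anyone unfamiliar with the reverse DNS check, here is a minimal Python sketch of the general technique (forward-confirmed reverse DNS). I am not claiming this is how Piwik implements it. The suffix list is illustrative only, and the function name and the injectable lookup parameters are my own, added so it can be tried without network access.

```python
import socket

# Illustrative host suffixes for well-known crawlers; a real list must be
# taken from each search engine's own documentation (assumption).
TRUSTED_SUFFIXES = (".googlebot.com", ".search.msn.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a trusted host name and that
    host name resolves back to the same IP."""
    # Default to real DNS lookups; tests can inject fake resolvers instead.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).lower()
        if not host.endswith(TRUSTED_SUFFIXES):
            return False  # PTR record does not belong to a known crawler domain
        return ip in forward_lookup(host)  # forward lookup must confirm the IP
    except OSError:
        return False  # lookup failed, so the claim cannot be verified
```

The forward lookup matters because anyone who controls the reverse zone for their own IP range can set a PTR record that merely looks like a crawler host name.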

About AdBlock, I am afraid there is no reliable solution to that problem right now, given the risk of counting other 'fake' visits. Also, I think that AdBlock [at least in Chrome] does not block Piwik tracking script by default, maybe I am wrong here, but from my limited experience and few tests, that was not the case, until I particularly excluded those IPs from tracking.

Again, this might need detailed testing.

Not sure about Firefox and its privacy features. In my brief test they did not affect tracking, but maybe some specific setups (AdBlock configurations) are required to achieve that. To be clear, I did not test it that much.

Server tracking is great, but to get it in a more reliable way, I would personally limit it to logged-in visitors only, to minimize false reporting. And, not all websites have that requirement / functionality.

@masteranalyze
Author

Test uBlock, test Ghostery, on Firefox or wherever you want. Visit your website with the Piwik JavaScript included and you will be a GHOST: you won't be tracked in your stats.

About the defaults: most users who use an ad blocker are tech-savvy, which means that if they installed it, they mostly know how to use it and set it up to block Piwik, GA, it does not matter which.

Look at this for example: #5094

Server tracking is not just great, it is the only solution for a better picture of reality, because only with the server logs do you have control, and only the server logs have all the visits.

The problem is that in the logs you have GOOD BOTS, BAD BOTS (scrapers, crap, junk, etc.) and REAL HUMANS.

The main point is to find a solution. Filtering good bots is easy because, as you said, they are verifiable, and they mostly use a word like "bot", "robot" or "crawler" in their name, so it's easy.
The hard part is the bad bots, which do not follow the rules and can have any name, without any word like "bot" or "crawler" in it.
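
The easy half can be sketched in a few lines of Python. The keyword list here is a hypothetical example based on the bot names I listed earlier (Bingbot, Baiduspider, Yahoo! Slurp, Magpie Crawler, Wget), not a complete or maintained list, and the function name is my own.

```python
import re

# Hypothetical keyword list; bots that announce themselves usually match one of these.
DECLARED_BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp|wget", re.IGNORECASE)


def is_declared_bot(user_agent):
    """Return True if the User-Agent string announces itself as a bot.

    Bad bots that send a browser-like User-Agent pass straight through,
    so a False result does NOT prove the visitor is human.
    """
    return bool(DECLARED_BOT_PATTERN.search(user_agent or ""))
```

A bad bot only has to copy a normal browser User-Agent to get past this, which is exactly why keyword filtering alone is not enough.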

If we can find a way to do this, we will have accurate stats, no matter whether the users are logged in or not.
As for people who are logged in to a website, you already know about them: once they have logged in, it is clear they are real humans.

The problem is on the side of guest visitors and of BAD BOTS.

And check bit.ly: their click tracking is very accurate and counts only real humans. Their service starts from $1000 per month, but it is free for the first 15,000 clicks, if I remember correctly.

The thing is that a user who blocks your JavaScript can click through to your website and it won't be recorded by JavaScript tracking. It will only be recorded by server-log tracking.

That does not apply only to Piwik. It applies to Google Analytics, StatCounter or any other beacon-based tracking system.

@dev-101

dev-101 commented Mar 29, 2016

Yes, some very valid points there.

About uBlock (I assume Ghostery works on similar principles?) there could be a workaround solution (a cat & mouse game, really), and not sure if it would be effective, though: obscuring piwik.php and piwik.js with random-generated strings instead (admin configurable), and also, using random length string instead of generic /piwik path, for example. That could prevent their regex from, at least, default/easy blocking of Piwik.
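
A minimal Python sketch of what generating such names could look like. Everything here is hypothetical (the function names, the length range, the shared random directory), and the web server would still need rewrite rules mapping the generated paths back to the real piwik.js and piwik.php, which this does not show.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_name(min_len=8, max_len=16):
    """Return a random lowercase alphanumeric string of random length."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def obscured_tracker_paths():
    """Map the well-known tracker file names to randomly named public paths."""
    directory = random_name()  # replaces the generic /piwik directory
    return {
        "piwik.js": f"/{directory}/{random_name()}.js",
        "piwik.php": f"/{directory}/{random_name()}.php",
    }
```

This only defeats filters that match on the file or path name. A blocker that recognises the request parameters or the script's contents would still catch it, hence the cat and mouse game.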

But, uBlock is blocking a lot of stuff, even here, on GitHub I cannot submit a comment with it.

@masteranalyze
Author

Like I said, JavaScript is not a real solution if you want to get the correct data and the real picture, which is the main idea of this topic.

Yes, Ghostery basically works on the same principles, but these are just a few of the plugins that can do this job. There are a lot more problems that can happen with JavaScript, and a lot of plugins that can block your tracking, in which case you will get a distorted picture in your stats.

The only solution is the filtering I described, plus server-side tracking.

Not to mention that server-side tracking creates a better user experience: if no JavaScript is loaded, the web page loads faster.

@tsteur
Member

tsteur commented Mar 29, 2016

I wonder if we can mark this issue as a duplicate of #9665 and continue commenting on the other issue or is this one a bit different? I feel like some different topics were covered here and it might not be good to have them all in one issue (eg avoid being blocked by AdBlocker etc).

@masteranalyze
Author

However you wish. I think merging is better!

@dev-101

dev-101 commented Mar 29, 2016

Personally, I would leave this issue open, since both the title & the discussion here describe the core problem in more detail. The other issue is too broad.

@tsteur tsteur added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Mar 29, 2016
@tsteur
Member

tsteur commented Mar 29, 2016

@dev-101 good point. I will leave both open

@mattab
Member

mattab commented Mar 31, 2016

Marking this as duplicate of #9665 (i've renamed that other issue to hopefully clarify the scope). please comment there if anything is missing or if you have suggestions/ideas!

@mattab mattab closed this as completed Mar 31, 2016
@mattab mattab added the duplicate For issues that already existed in our issue tracker and were reported previously. label Mar 31, 2016
@mattab
Member

mattab commented Mar 25, 2020

This issue might be an even better idea / easier to use: #13023
