@mattab opened this Issue on August 31st 2014 Member

The goal of this ticket is to create an easy-to-use tool that lets anyone import their Google Analytics data into Piwik.

Features expected in this tool

  • Import one or several websites
  • Import data for all periods, or only import for a custom date range
  • For each website, we import the report data: Top pages, top events, top browsers, etc. (for the 50 reports available in Piwik or so)
  • Note: will not import the raw log visitor data since it is not available via Google Analytics API
  • Create a simple user interface, which uses a console command

Tasks

  • Proposed name: Google Analytics Importer
  • Create new project at piwik/importer-google-analytics
  • Publish the plugin on the Marketplace... or put it in Matomo core instead?
  • Let the user connect to the Google APIs via OAuth and automatically fetch the available profiles and websites.
  • Run the import via the UI, or let the user execute the console command directly
  • Test the importer on a few websites,
    • including a website with medium traffic to check that we work around Google API limits
  • Release the tool to the public

This issue requires #6094 (Create an API to let users import historical report data in Piwik)

@mcollopy commented on January 12th 2015

Will importing history from GA cause duplication in Piwik? e.g. For a site which has been using GA for a year, but only 3 months in Piwik, we might want to import the previous 9 months to get 1 year of history.
But not the overlapping recent 3 months.
Do you plan on deduping or providing a date range filter?
Thanks

@liebana commented on March 11th 2015

What's the overall status of this? It seems that there's no activity at all in both issues. Do you have this in your short term roadmap?

@mattab commented on March 11th 2015 Member

There is no update. While this is currently filed under Short term, it may not be done in the short term due to the complexity of the task.

@dbrgn commented on March 25th 2015

This feature would be great!

@erotavlas-turbo commented on April 10th 2015

This is a great feature!

@JagerBomb commented on April 20th 2015

We are really looking forward to this feature in our company. We cannot ask our clients to use our central Piwik installation until this has been released, since they have been relying on Google Analytics for too long and wish to keep their historical data.

@baptistedonaux commented on May 17th 2015

+1

@jdreesen commented on May 19th 2015

+1

@paul-cross commented on July 14th 2015

+1

@Maestro2k5 commented on July 29th 2015

+1

@phpdave11 commented on July 30th 2015

What about importing raw google analytics data? I have a copy of every hit saved, can these be imported into piwik?

@nickwright80 commented on August 1st 2015

+1

@berniezhao commented on August 5th 2015

+1

@Tomcraft1980 commented on August 15th 2015

+1

@ants-house commented on August 19th 2015

+1

@VelNZ commented on September 2nd 2015

+1

@gelldur commented on September 8th 2015

+1

@movak commented on September 9th 2015

+1

@ghost commented on September 18th 2015

+1

@MatthieuScarset commented on October 20th 2015

I'm in! What task needs to be done first? :)

@L0garithmic commented on November 18th 2015

Come on guys, we need this!!!

@unknown23person commented on January 28th 2016

+1

@mgonera commented on February 17th 2016

+1

@Reamer commented on April 11th 2016

+1

@anacona16 commented on May 11th 2016

Hi, can you tell me the status for this issue?

@sgiehl commented on May 11th 2016 Member

Not sure if anyone is working on that at all :disappointed:

@Draky50110 commented on September 9th 2016

+1

@mattab commented on September 27th 2016 Member

We won't be able to work on this, unless someone can sponsor us to work on it. Please contact us via https://piwik.org/development/ if you're able to sponsor the team!

@erotavlas-turbo commented on September 27th 2016

Why don't we/you create a crowdfunding campaign? I think it would be a great success...

@dbrgn commented on September 27th 2016

I'd certainly give a few dollars :)

@josephernest commented on October 23rd 2016

+10 (+10$ at least in a crowdfunding)

Possible already-available solution:

Is there a tool that allows us to export from Google Analytics to Apache-log similar access.log files?

Then it would be easy to re-import to Piwik, with import_logs.py...

What do you think?
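
As a rough illustration of the access.log idea above, here is a minimal sketch that formats one saved hit as an Apache combined-format log line. It assumes you already hold per-hit records yourself (the ticket notes that raw visitor data is not available via the Google Analytics API), and the function name and field names are hypothetical. Whether import_logs.py accepts the result as-is depends on its format detection and options, which are not covered here.

```python
from datetime import datetime, timezone


def to_combined_log_line(ip, when, path, referrer, user_agent, status=200, size=0):
    """Format one hit as an Apache combined log line (hypothetical helper).

    `when` must be a timezone-aware datetime so that %z yields an offset.
    %b is locale dependent; the default C locale gives English month names.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return (
        f'{ip} - - [{timestamp}] "GET {path} HTTP/1.1" {status} {size} '
        f'"{referrer}" "{user_agent}"'
    )


# Example hit with made-up values (203.0.113.0/24 is a documentation range).
line = to_combined_log_line(
    ip="203.0.113.7",
    when=datetime(2016, 10, 23, 14, 5, 9, tzinfo=timezone.utc),
    path="/pricing",
    referrer="https://example.org/",
    user_agent="Mozilla/5.0",
)
print(line)
```

The request method, status code and response size are placeholders here, since an analytics hit does not normally carry them.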

@rosa2 commented on November 27th 2016

+1

@rosa2 commented on November 27th 2016

@mattab how much money do you need to make the programming? I am very interested to get it.

@cisarpavel commented on December 21st 2016

Do you have some estimate price/hours to develop this feature?

@zody0210 commented on December 25th 2016

+1
I am absolutely willing to pay for such a plugin/script.

@ryanhamilton commented on December 31st 2016

+1 would pay.

@viezel commented on January 13th 2017

+1 I would pay. @mattab can you create a kickstarter or similar?

@shaun-of-the-dead commented on March 5th 2017

Same for me. I'd be happy to contribute financially to help this feature / tool occur!

I have years of Google Analytics data logged before I learnt about Piwik, and I'm sure I'm not the only one. There'd be great value in implementing a tool like this if you guys can pull it off.

@machampagne commented on March 28th 2017

We are also willing to chip in for this feature.

@DJLunacy commented on April 26th 2017

Bueller?

@nopynoop commented on July 21st 2017

+1

@erotavlas-turbo commented on July 21st 2017

We could use patreon.
Take a look at this or this.

@SzymonDukla commented on September 26th 2017

I would like to offer a development of such functionalities (that includes the API and the plugin to import GA data). A monthly financial contribution is crucial. Do you want me to set up a Patreon (or similar)?
In case of questions, please ask on sd@szymondukla.co.uk

@mattab commented on September 26th 2017 Member

We'd also like to work on this, but so far the problem is how would it technically work? @SzymonDukla do you have some thoughts on this? we haven't found a GA API that we can use to import all the data safely and accurately yet...

@SzymonDukla commented on September 27th 2017

I have a project behind me now that involves Analytics data gathering through GA API. It's nearly (as close as we can really get) to raw data. I have a nice and flexible API wrapper written in PHP I'm going to use.
I've set up a Patreon: https://www.patreon.com/ga_for_piwik
I'll be answering all the questions on there.

@mattab commented on September 29th 2017 Member

Hi @SzymonDukla what's your idea of getting as close as you can get raw data? would you call the GA API with many dimensions to get raw data or so?

@fabrice-regnier commented on October 11th 2017

+1, I'd pay too. Please, please help us opt out of GA. I guess it would be a great functionality of Piwik!

I can see chats are going on on this subject, nice.

regards.
f.

@SzymonDukla commented on October 13th 2017

Sorry for the delay, last two weeks have been crazy...
Getting back on this project, that's correct @mattab - I have a side project that involves getting and storing all the possible data from GA accounts and then generating monthly/weekly/yearly SEO reports based on that so I think I will have the same approach which is getting as many metrics in as many configurations (as some metrics are now working with other ones) exported into a JSON or similar file that then will be imported via Piwik plugin, processed and merged into Piwik itself.

@fabrice-regnier, thank you very much for your interest! I do that because I have the same problem - years worth of data on Google I have no way to export.

As mentioned, I've set up a Patreon (https://www.patreon.com/ga_for_piwik), let's move all further discussion on this project there for those interested :)

@mattab commented on October 15th 2017 Member

will have the same approach which is getting as many metrics in as many configurations (as some metrics are now working with other ones) exported into a JSON or similar file that then will be imported via Piwik plugin, processed and merged into Piwik itself.

Did you mean 'metrics' or rather 'dimensions'? Our previous version of the GA importer (since deprecated) used this trick. The problem was that back in the day there were not enough precise dimensions, so we couldn't get useful "Per visitor" information and import all visitors.
Could you show us an example of how you'll export the GA data to then import it in Piwik?

@geekbrat commented on April 15th 2018

@SzymonDukla

As mentioned, I've set up a Patreon (https://www.patreon.com/ga_for_piwik), let's move all further discussion on this project there for those interested :)

I was really interested in this and went to back your patreon and then I noticed that you are asking for over $2000/month to create the plugin, and then expect most of us to purchase it on top of that. I have a great appreciation for the amount of work involved and needing to be able to divert your attention from other projects, but I really prefer a game to have either free to play, pay to upgrade or pay to play, free to upgrade models of pricing, and REALLY hate when they stick it to me on both ends, which is exactly what this feels like.

@voarsh commented on August 13th 2018

It looks like @SzymonDukla is no longer supporting this issue and the patreon is not running. Can someone update the issue, outline who is responsible for creating or outsourcing this piece of work?

@suzigrishpul commented on October 24th 2018

Hey @mattab, any updates on this?

@mattab commented on October 24th 2018 Member

No update @suzigrishpul

we want to build this feature as open source, it's really something we would like to develop.

But so far we are not sure of the technical feasibility of a GA->Matomo migration. If anyone is an expert in the GA API and has some idea how we could migrate data over to Matomo, we'd love to hear. We are pretty keen to implement it in the next few months if it's technically possible.

(We likely will spend a few days investigating the feasibility, later this year, if nobody comments here with points.)

@suzigrishpul commented on October 24th 2018

@mattab cool, thanks! I wish I could help, will just watch for an update for now.

@henrikhelsingborg commented on November 6th 2018

Really interested in this.

@alexgogan commented on January 30th 2019

@mattab I can look at getting some of my guys involved in this later in February, no problem. We are in the process of developing an additional implementation with our CMS engine and Matomo. We did this previously with the Google API. So when we are finished, we could dig much deeper into this.

@suzigrishpul commented on January 30th 2019

that's really exciting news @alexgogan !

@tsteur commented on February 11th 2019 Member

@alexgogan be good to keep us up to date and let us know if we can be of any help. We're keen on investigating this as well.

Also in general: It's obviously not the same but may be sometimes better than nothing to use Log Analytics to import old data if the logs still exist.

@diosmosis commented on March 20th 2019 Member

I've done some research into the feasibility of importing log data from Google Analytics into Matomo. I've come up with two possible solutions, one which probably isn't workable, and another whose feasibility won't become apparent until a PoC is built. Both solutions allow the visitor log to function w/ the imported data. If we just import reports, this won't be the case.

First, here are the limitations with Google Analytics API:

  • Clients are limited to a total of 50,000 API requests per day.
  • There are only two APIs, one to get aggregated reports and one to get an individual user's activity.
    • The first API imposes a limit on the number of dimensions you can use, at most 7. So we can't aggregate by every possible dimension and determine the properties for individual actions (at least not w/ ease).
    • The second API only provides the activity for a single user, so it doesn't provide a lot of data, and the data will be out of order w/ the rest of the user data.
    • There is no way to get a list of user IDs (or client IDs which are like Matomo config IDs). You can see them in the UI, but there is explicitly no API (and no user id/client id dimension).

Here is the solution that probably isn't workable, but would be easier to understand:

  1. For each day in the time range, use the userActivity.search API method (https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/userActivity/search?authuser=2) to query actions for all users in a site.
  2. Re-order them by action time.
  3. Map them to Matomo tracker requests and track them like normal requests.

Pros

  • It's simple to understand and straightforward to implement.

Cons

  • There's no way to get a list of user IDs/client IDs through the API (AFAIK). Which makes this solution unworkable.
  • The API only allows querying actions for a single user, which means the number of requests depends on the number of users. A site with 50,000 users will require 50,000 API requests per day, which means we'll hit the API limit immediately.
  • Since we are getting actions for a user and not for a day, we'll have to incorporate different user's actions before tracking because we cannot track them out of order. This means we'll have to get a potentially large amount of actions, re-order them and then track them.
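
To make the three steps of this first approach concrete, here is a minimal sketch under stated assumptions. The per-user activity is taken as already fetched (which, as noted in the cons, is the part that is not actually possible without a list of client IDs), and the data shape, the function names and the MD5-based visitor ID are all hypothetical. The query parameters `idsite`, `rec`, `url`, `cid` and `cdt` are Matomo Tracking HTTP API parameters; backdated `cdt` values generally also need a `token_auth`, which is left out here.

```python
import hashlib
import heapq
from datetime import datetime
from urllib.parse import urlencode


def merge_by_time(per_user_activity):
    """Merge per-user hit lists (each already sorted by time) into one timeline.

    per_user_activity: dict of GA client ID -> list of (datetime, url).
    """
    streams = (
        [(when, client_id, url) for when, url in hits]
        for client_id, hits in per_user_activity.items()
    )
    return heapq.merge(*streams)


def visitor_id(client_id):
    """Derive a 16-character hex visitor ID from a GA client ID (an assumption)."""
    return hashlib.md5(client_id.encode("utf-8")).hexdigest()[:16]


def to_tracker_query(site_id, when, client_id, url):
    """Build the query string of one tracking request for a past page view."""
    return urlencode({
        "idsite": site_id,
        "rec": 1,
        "url": url,
        "cid": visitor_id(client_id),
        "cdt": when.strftime("%Y-%m-%d %H:%M:%S"),  # backdating needs token_auth
    })


# Made-up activity for two users on the same day.
activity = {
    "111.222": [
        (datetime(2019, 3, 1, 10, 0), "https://example.org/"),
        (datetime(2019, 3, 1, 10, 5), "https://example.org/b"),
    ],
    "333.444": [
        (datetime(2019, 3, 1, 10, 2), "https://example.org/a"),
    ],
}
timeline = list(merge_by_time(activity))
queries = [to_tracker_query(1, when, cid, url) for when, cid, url in timeline]
```

`heapq.merge` keeps memory use low when the inputs are lazy streams, but the third con still applies: every user's actions for the period must be available before tracking can start.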

The other solution uses the reporting API to make multiple requests that are combined into a result we can use:

  1. For each day in the time range, make multiple report queries, one for every possible pair of dimensions we're interested in. We query just for hits.
  2. The report queries will return a list of values for the respective dimensions. We combine these to get the hits for every possible dimension value. The combination is done progressively and is explained below.
  3. When we have the number of hits (ie, actions) for every dimension value combination, we can recreate actions and re-track them.

Explanation for combining reports to get a report for every dimension:

Ideally, we want to get the number of hits for every dimension value (eg, browser name, action time, referrer name, etc.). Since there's a limit of 7 dimensions, we can't do this. Instead we have to find some way to combine queries for fewer dimensions. To illustrate how we do this, let's limit the number of dimensions and dimension values. We'll say there are 3 dimensions total (DA, DB, DC), 1 dimension value for each (A, B, C), and GA only allows selecting 2 dimensions.

So w/ these limitations, we have to find out what the report for dimensions = [DA, DB, DC] is when we can only query two at a time. We'll do this by first querying every possible combination of two dimensions:

| -  | DA      | DB      | DC      |
|----|---------|---------|---------|
| DA | -       | N(A, B) | N(A, C) |
| DB | N(A, B) | -       | N(B, C) |
| DC | N(A, C) | N(B, C) | -       |

(where N(...) is just a placeholder for the hits)

The thing to understand here is that the hits for dimensions = [DA, DB] is the cardinality of the subset of all actions where DA = A & DB = B. We want to find the hits for dimensions = [DA, DB, DC], which is the cardinality of the subset of all actions where DA = A, DB = B & DC = C.

This is the intersection of sets (DA + DB), (DA + DC), (DB + DC), which means we can calculate N(A, B, C) by getting the minimum value of the cardinality of all these sets, ie: |DA + DB + DC| = min(|DA + DB|, |DA + DC|, |DB + DC|).

The idea is to use this property and apply to all the dimensions we're interested in and all their queried values. This will give us essentially the properties for every action.
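
The combination step can be sketched as follows. This is a toy illustration, not the importer's code: the pairwise hit counts are computed from an in-memory list of actions rather than queried from GA, and all names are hypothetical. One caveat worth stating: the actions matching all three values are a subset of each pairwise set, so in general the minimum is an upper bound on the true count. It is exact in the single-value example above, and the second data set below shows a case where it overestimates.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("DA", "DB", "DC")


def pairwise_hits(actions):
    """Count hits for every pair of (dimension, value), as two-dimension queries would."""
    counts = Counter()
    for action in actions:
        labelled = list(zip(DIMENSIONS, action))
        for pair in combinations(labelled, 2):
            counts[frozenset(pair)] += 1
    return counts


def estimate_hits(pair_hits, action):
    """Estimate hits for a full value combination as the minimum of its pairwise counts."""
    labelled = list(zip(DIMENSIONS, action))
    return min(pair_hits[frozenset(pair)] for pair in combinations(labelled, 2))


# One value per dimension, as in the explanation above: the estimate is exact.
toy = [("A", "B", "C")] * 5
toy_estimate = estimate_hits(pairwise_hits(toy), ("A", "B", "C"))

# Several values per dimension: no action is (A, B, C), yet every pair occurs once.
mixed = [("A", "B", "C2"), ("A", "B2", "C"), ("A2", "B", "C")]
mixed_estimate = estimate_hits(pairwise_hits(mixed), ("A", "B", "C"))

print(toy_estimate, mixed_estimate)
```

Cases like the second one are presumably where a PoC would have to show how accurate the reconstruction is on real traffic.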

Pros

  • It is very friendly to the GA API rate limit. The number of queries made will be N * (N - 1) * D where N is the number of dimensions we need & D is the number of days. It will take some time to hit the query limit, no matter what the scale of the site is.
  • While computing every possible combination will still be memory intensive, it will likely be far less memory intensive than the previous solution on a high traffic site.
  • It might actually work.
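
For a feel of the numbers, here is a small sketch of the query budget using the N * (N - 1) * D estimate from the first point and the 50,000 requests per day limit listed earlier. The choice of 20 dimensions and one year of data is only an example, and the sketch assumes one API request per query (no pagination).

```python
import math

DAILY_REQUEST_LIMIT = 50_000  # GA API limit quoted in this comment


def total_queries(num_dimensions, num_days):
    """Number of pairwise report queries: N * (N - 1) * D."""
    return num_dimensions * (num_dimensions - 1) * num_days


def days_to_finish(num_dimensions, num_days, daily_limit=DAILY_REQUEST_LIMIT):
    """Days of API quota needed, assuming one request per query."""
    return math.ceil(total_queries(num_dimensions, num_days) / daily_limit)


queries_needed = total_queries(20, 365)   # 20 * 19 * 365 = 138,700
quota_days = days_to_finish(20, 365)      # 3
print(queries_needed, quota_days)
```

The count does not depend on the site's traffic, only on the number of dimensions and days. If each unordered pair of dimensions is queried only once, the total is half of this.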

Cons

  • It may be hard to understand and hard to code.
  • It's possible I'm looking at the solution incorrectly, and a PoC that is time consuming to create will turn out to not actually work.

EDIT: One way to get user IDs/client IDs would be to issue requests like those that are made in the UI. I'm not sure how well that would work, and I'm guessing the page size will be smaller so it would take more requests. Also I am not sure if there are any limits to doing that, since that's outside of the API.

@tsteur commented on March 21st 2019 Member

I would say raw data may not be as important? Of course be great to have it for segmentation, but maybe not needed eg if there's a problem with the 50K req/day limit?

@mattab commented on March 26th 2019 Member

@diosmosis thanks for the report. Sounds like importing RAW data is tricky and not easily do-able at this time.

So we probably want to import Aggregated reports only. And forget about RAW data.

-> As a next step, would be interesting to understand what it would take to build a Google Analytics Report data importer?

@diosmosis commented on March 26th 2019 Member

@mattab We would map Matomo dimensions to GA dimensions, query for report data, then map GA values to Matomo values (ie, dimensions). It's pretty straightforward (though value mapping might be difficult). We could also allow users to supply segments to make available (we can add segments to the report fetching API method).
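
A minimal sketch of that mapping, under stated assumptions: the two lookup tables, the `unknown` fallback label and the function name are illustrative only, and the GA rows are passed in as plain (value, visits) pairs rather than fetched from the API.

```python
# Illustrative mapping from a Matomo report to the GA dimension to query.
GA_DIMENSION_FOR_REPORT = {
    "DevicesDetection.getBrowsers": "ga:browser",
    "UserCountry.getCountry": "ga:countryIsoCode",
}

# Illustrative mapping from GA values to Matomo labels, per GA dimension.
VALUE_MAP = {
    "ga:browser": {"Chrome": "CH", "Firefox": "FF"},
    "ga:countryIsoCode": {"US": "us", "DE": "de"},
}

UNMAPPED = "unknown"  # hypothetical bucket for values without a mapping


def map_ga_rows(report, ga_rows):
    """Turn GA (value, visits) rows into Matomo label -> visits for one report."""
    dimension = GA_DIMENSION_FOR_REPORT[report]
    values = VALUE_MAP[dimension]
    totals = {}
    for ga_value, visits in ga_rows:
        label = values.get(ga_value, UNMAPPED)
        totals[label] = totals.get(label, 0) + visits
    return totals


browsers = map_ga_rows(
    "DevicesDetection.getBrowsers",
    [("Chrome", 120), ("Firefox", 30), ("Obscure Browser", 2)],
)
print(browsers)
```

The unmapped bucket is where the value-mapping difficulty shows up: every GA value without a counterpart either needs a new entry or gets lumped together.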

@mattab commented on March 27th 2019 Member

In V1 we would just want to keep it simple and import "All visits" segment (ie. not import any segment and segment data from GA)

@diosmosis commented on June 16th 2019 Member

Status update:

  • created proof of concept importer that successfully imports entities including site, goals & custom dimensions and successfully imports some reports including referrers reports, actions reports & custom dimension reports (though there are issues w/ each)
  • created automated testing setup for the importer that uses GA

Next things todo:

  • [ ] finish code to import the remaining reports and entities
  • [ ] to our test GA site add missing data so we can test every part of the import
  • [ ] manually test w/ a larger GA site
  • [ ] fix any import accuracy issues

@mattab commented on August 11th 2019 Member

Hi Everyone! good news, we are almost done with the Google Analytics to Matomo Importer implementation. We'll start the beta release in a few days. Get ready for testing to import your Google Analytics data in Matomo :sunglasses:

@suzigrishpul commented on August 12th 2019

OMG THIS IS THE BEST NEWS!!!! thank you so much!

@kimoudev commented on August 15th 2019

@mattab Thanks for this great work. I have upgraded from 3.11 to 3.12, but I don't see the Google Analytics import section.
Thank you,

@diosmosis commented on August 15th 2019 Member

Hi @kimoudev the plugin hasn't been released yet, there are still a couple import accuracy bugs to work through before it's ready for a public beta. Should be soon though.

@kimoudev commented on August 15th 2019

Hi @diosmosis . Ok, Thank you so much.

@mattab commented on August 22nd 2019 Member

Hi Everyone,

The Google Analytics Importer for Matomo is now released and available to all! :tada:

We're looking forward to hearing how it goes for you to import your GA data into Matomo.
Kudos @diosmosis for building this tool :muscle:
We're happy to help you migrate your GA data into Matomo and stay in control! :rocket:
Thanks to everyone who will help test this tool and report their feedback.

@mattab commented on August 22nd 2019 Member

To use the Google Analytics Importer, you first need to upgrade to the latest Matomo 3.12.0 beta version. This can be done in 3 clicks, see instructions here

@rosa2 commented on August 23rd 2019

Hello

Thanks thanks a lot. You have made a big collaboration that makes internet more free and less controlled by Google, the Big Brother nowadays.

Thanks again

@mattab commented on August 27th 2019 Member

Hope you enjoy the Google Analytics importer tool. You can post your feedback in the issue tracker: https://github.com/matomo-org/plugin-GoogleAnalyticsImporter/issues

Will close it now as we consider it done. Exciting! :tada:

This Issue was closed on August 27th 2019