Let users import Google Analytics data in Matomo #6095

Closed
mattab opened this issue Aug 31, 2014 · 73 comments
Assignees
Labels
Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Milestone

Comments

@mattab
Member

mattab commented Aug 31, 2014

The goal of this ticket is to create an easy-to-use tool that lets anyone import their Google Analytics data into Piwik.

Features expected in this tool

  • Import one or several websites
  • Import data for all periods, or only import for a custom date range
  • For each website, we import the report data: Top pages, top events, top browsers, etc. (for the 50 reports available in Piwik or so)
  • Note: will not import the raw log visitor data since it is not available via Google Analytics API
  • Create a simple user interface, which uses a console command

Tasks

  • Proposed name: Google Analytics Importer
  • Create new project at piwik/importer-google-analytics
  • plugin published on the Marketplace.... or we put it in Matomo core instead?
  • Let user connect via oAuth to the Google APIs, fetch automatically the profiles, websites available.
  • Run import via the UI or directly let user execute the console command
  • Test the importer on a few websites,
    • including a website with medium traffic to check that we work around Google API limits
  • Release the tool to public:
    • update this blog post, this FAQ
    • write a new blog post about the tool

This issue requires #6094 (Create an API to let users import historical report data in Piwik)

@mcollopy

Will importing history from GA cause duplication in Piwik? E.g., for a site which has been using GA for a year but Piwik for only 3 months, we might want to import the previous 9 months to get 1 year of history, but not the overlapping recent 3 months.
Do you plan on deduping or providing a date range filter?
Thanks

@liebana

liebana commented Mar 11, 2015

What's the overall status of this? It seems that there's no activity at all in both issues. Do you have this in your short term roadmap?

@mattab
Member Author

mattab commented Mar 11, 2015

There is no update. Although this is currently in the Short term milestone, it may not be done in the short term due to the complexity of the task.

@dbrgn

dbrgn commented Mar 25, 2015

This feature would be great!

@mattab mattab modified the milestones: Short term, Mid term Apr 7, 2015
@erotavlasme

This is a great feature!

@JagerBomb

We are really looking forward to this feature in our company. We cannot ask our clients to use our central Piwik installation until this has been released, since they have been relying on Google Analytics for too long and wish to keep their historical data.

@baptistedonaux

+1

@jdreesen

+1

@paul-cross

+1

@Maestro2k5

+1

@phpdave11

What about importing raw Google Analytics data? I have a copy of every hit saved. Can these be imported into Piwik?

@nickwright80

+1

@berniezhao

+1

@Tomcraft1980

+1

@ants-house

+1

@VelNZ

VelNZ commented Sep 2, 2015

+1

@gelldur

gelldur commented Sep 8, 2015

+1

@movak

movak commented Sep 9, 2015

+1

@suzigrishpul

@mattab cool, thanks! I wish I could help, will just watch for an update for now.

@henrikhelsingborg

Really interested in this.

@alexgogan

@mattab I can look at getting some of my guys involved in this later in February, no problem. We are in the process of developing an additional implementation with our CMS engine and Matomo. We did this previously with the Google API, so when we are finished we could dig much deeper into this.

@suzigrishpul

that's really exciting news @alexgogan !

@tsteur
Member

tsteur commented Feb 11, 2019

@alexgogan be good to keep us up to date and let us know if we can be of any help. We're keen on investigating this as well.

Also in general: it's obviously not the same, but using Log Analytics to import old data may sometimes be better than nothing, if the logs still exist.

@mattab mattab changed the title Let users import Google Analytics data in Piwik Let users import Google Analytics data in Matomo Feb 14, 2019
@diosmosis diosmosis self-assigned this Mar 4, 2019
@mattab mattab modified the milestones: 3.9.0, 3.10.0 Mar 18, 2019
@diosmosis
Member

diosmosis commented Mar 20, 2019

I've done some research into the feasibility of importing log data from Google Analytics into Matomo. I've come up with two possible solutions, one which probably isn't workable, and another whose feasibility won't become apparent until a PoC is built. Both solutions allow the visitor log to function w/ the imported data. If we just import reports, this won't be the case.

First, here are the limitations with Google Analytics API:

  • Clients are limited to a total of 50,000 API requests per day.
  • There are only two APIs, one to get aggregated reports and one to get an individual user's activity.
    • The first API imposes a limit on the number of dimensions you can use, at most 7. So we can't aggregate by every possible dimension and determine the properties for individual actions (at least not w/ ease).
    • The second API only provides the activity for a single user, so it doesn't provide a lot of data, and the data will be out of order w/ the rest of the user data.
    • There is no way to get a list of user IDs (or client IDs which are like Matomo config IDs). You can see them in the UI, but there is explicitly no API (and no user id/client id dimension).

Here is the solution that probably isn't workable, but would be easier to understand:

  1. For each day in the time range, use the userActivity.search API method (https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/userActivity/search?authuser=2) to query actions for all users in a site.
  2. Re-order them by action time.
  3. Map them to Matomo tracker requests and track them like normal requests.
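
The re-ordering in step 2 can be sketched as a k-way merge, assuming each user's activity arrives already sorted by time. The record shape used here (timestamp, client ID, URL) is hypothetical and is not the actual userActivity.search response format.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# Hypothetical record shape: (unix_timestamp, client_id, url)
Action = Tuple[int, str, str]

def merge_user_actions(per_user: Dict[str, List[Action]]) -> Iterator[Action]:
    """Merge per-user action lists (each already sorted by time)
    into a single stream ordered by timestamp."""
    return heapq.merge(*per_user.values(), key=lambda action: action[0])

# Made-up sample data for two users.
per_user = {
    "u1": [(100, "u1", "/home"), (160, "u1", "/pricing")],
    "u2": [(120, "u2", "/home"), (130, "u2", "/docs")],
}
ordered = list(merge_user_actions(per_user))
```

The merge is lazy, but every user's actions for the period still have to be fetched first, which is the memory concern raised in the cons below.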

Pros

  • It's simple to understand and straightforward to implement.

Cons

  • There's no way to get a list of user IDs/client IDs through the API (AFAIK). Which makes this solution unworkable.
  • The API only allows querying actions for a single user, which means the number of requests depends on the number of users. A site with 50,000 users will require 50,000 API requests per day, which means we'll hit the API limit immediately.
  • Since we are getting actions for a user and not for a day, we'll have to interleave different users' actions before tracking because we cannot track them out of order. This means we'll have to fetch a potentially large number of actions, re-order them and then track them.

The other solution uses the reporting API to make multiple requests that are combined into a result we can use:

  1. For each day in the time range, make multiple report queries, one for every possible pair of dimensions we're interested in. We query just for hits.
  2. The report queries will return a list of values for the respective dimensions. We combine these to get the hits for every possible dimension value. The combination is done progressively and is explained below.
  3. When we have the number of hits (ie, actions) for every dimension value combination, we can recreate actions and re-track them.

Explanation for combining reports to get a report for every dimension:

Ideally, we want to get the number of hits for every dimension value (eg, browser name, action time, referrer name, etc.). Since there's a limit of 7 dimensions, we can't do this. Instead we have to find some way to combine queries for fewer dimensions. To illustrate how we do this, let's limit the number of dimensions and dimension values. We'll say there are 3 dimensions total (DA, DB, DC), 1 dimension value for each (A, B, C), and GA only allows selecting 2 dimensions.

So w/ these limitations, we have to find out what the report for dimensions = [DA, DB, DC] is when we can only query two at a time. We'll do this by first querying every possible combination of two dimensions:

      | DA      | DB      | DC
DA    | -       | N(A, B) | N(A, C)
DB    | N(A, B) | -       | N(B, C)
DC    | N(A, C) | N(B, C) | -

(where N(...) is just a placeholder for the hits)

The thing to understand here is that the hits for dimensions = [DA, DB] is the cardinality of the subset of all actions where DA = A & DB = B. We want to find the hits for dimensions = [DA, DB, DC], which is the cardinality of the subset of all actions where DA = A, DB = B & DC = C.

This is the intersection of sets (DA + DB), (DA + DC), (DB + DC), which means we can calculate N(A, B, C) by getting the minimum value of the cardinality of all these sets, ie: |DA + DB + DC| = min(|DA + DB|, |DA + DC|, |DB + DC|).

The idea is to use this property and apply to all the dimensions we're interested in and all their queried values. This will give us essentially the properties for every action.
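
A minimal sketch of this combination step, using made-up hit counts for the three-dimension example. One caveat: in general the minimum of the pairwise counts is only an upper bound on the size of the intersection, so the result should be treated as an estimate unless the data happens to make the bound tight. The function and variable names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

# A pairwise count is keyed by the two (dimension, value) entries it was queried for.
Pair = FrozenSet[Tuple[str, str]]

def estimate_hits(combo: Dict[str, str], pair_hits: Dict[Pair, int]) -> int:
    """Estimate hits for a full dimension/value combination as the
    minimum over all of its pairwise counts (an upper bound in general).
    Expects at least two dimensions in `combo`."""
    items = sorted(combo.items())
    return min(pair_hits.get(frozenset(pair), 0) for pair in combinations(items, 2))

# Made-up values standing in for N(A, B), N(A, C) and N(B, C).
pair_hits = {
    frozenset({("DA", "A"), ("DB", "B")}): 12,
    frozenset({("DA", "A"), ("DC", "C")}): 7,
    frozenset({("DB", "B"), ("DC", "C")}): 9,
}
estimate = estimate_hits({"DA": "A", "DB": "B", "DC": "C"}, pair_hits)
```

A pair that was never returned by GA is treated as zero hits, which forces the estimate for any combination containing it to zero.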

Pros

  • It is very friendly to the GA API rate limit. The number of queries made will be N * (N - 1) * D where N is the number of dimensions we need & D is the number of days. It will take some time to hit the query limit, no matter what the scale of the site is.
  • While computing every possible combination will still be memory intensive, it will likely be far less memory intensive than the previous solution on a high traffic site.
  • It might actually work.
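
For a rough sense of scale, the N * (N - 1) * D formula can be evaluated against the 50,000 requests-per-day limit listed earlier. N = 20 is an arbitrary example value, not a figure from this thread, and the sketch assumes one API request per query (no pagination).

```python
DAILY_API_LIMIT = 50_000  # daily request limit quoted in the limitations list

def total_queries(num_dimensions: int, num_days: int) -> int:
    """Number of report queries, N * (N - 1) * D."""
    return num_dimensions * (num_dimensions - 1) * num_days

def days_importable_per_quota(num_dimensions: int) -> int:
    """How many days of history fit within one day's API quota."""
    queries_per_day_of_history = num_dimensions * (num_dimensions - 1)
    return DAILY_API_LIMIT // queries_per_day_of_history

queries_for_year = total_queries(20, 365)            # 138,700 queries
history_days_per_quota = days_importable_per_quota(20)  # 131 days
```

Under these assumptions a year of history would take about three days of quota, independent of how much traffic the site has.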

Cons

  • It may be hard to understand and hard to code.
  • It's possible I'm looking at the solution incorrectly, and a PoC that is time consuming to create will turn out to not actually work.

EDIT: One way to get user IDs/client IDs would be to issue requests like those that are made in the UI. I'm not sure how well that would work, and I'm guessing the page size will be smaller so it would take more requests. Also I am not sure if there are any limits to doing that, since that's outside of the API.

@tsteur
Member

tsteur commented Mar 21, 2019

I would say raw data may not be as important? Of course it would be great to have it for segmentation, but maybe it's not needed, e.g. if there's a problem with the 50K req/day limit?

@mattab
Member Author

mattab commented Mar 26, 2019

@diosmosis thanks for the report. Sounds like importing RAW data is tricky and not easily do-able at this time.

So we probably want to import Aggregated reports only. And forget about RAW data.

-> As a next step, would be interesting to understand what it would take to build a Google Analytics Report data importer?

@diosmosis
Member

@mattab We would map Matomo dimensions to GA dimensions, query for report data, then map GA values to Matomo values (ie, dimension values). It's pretty straightforward (though value mapping might be difficult). We could also allow users to supply segments to make available (we can add segments to the report fetching API method).
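
A rough sketch of that mapping idea, not the importer's actual code. The report keys, the Matomo-side labels, and the row format are all placeholders chosen for illustration; only the GA names `ga:browser`, `ga:country` and `ga:hits` are taken from the GA reporting API.

```python
from typing import Callable, Dict, List, NamedTuple

class ReportMapping(NamedTuple):
    ga_dimension: str                # GA dimension to query
    map_value: Callable[[str], str]  # GA value -> Matomo label (placeholder logic)

# Hypothetical mapping table: report key -> how to query and translate it.
REPORT_MAPPINGS: Dict[str, ReportMapping] = {
    "browsers": ReportMapping("ga:browser", lambda value: value.strip().lower()),
    "countries": ReportMapping("ga:country", lambda value: value.strip().lower()),
}

def build_report(report: str, ga_rows: List[Dict[str, object]]) -> Dict[str, int]:
    """Translate GA rows into {matomo_label: hits}, summing rows that map
    to the same label."""
    mapping = REPORT_MAPPINGS[report]
    totals: Dict[str, int] = {}
    for row in ga_rows:
        label = mapping.map_value(str(row[mapping.ga_dimension]))
        totals[label] = totals.get(label, 0) + int(row["ga:hits"])
    return totals

# Made-up rows in an assumed flat format.
rows = [
    {"ga:browser": "Chrome", "ga:hits": 10},
    {"ga:browser": "Firefox", "ga:hits": 4},
    {"ga:browser": "chrome ", "ga:hits": 1},
]
browsers = build_report("browsers", rows)
```

The value mapping is where the difficulty mentioned above would show up: a real mapping has to reconcile GA's value vocabulary with Matomo's, which a simple normalisation like this does not do.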

@mattab
Member Author

mattab commented Mar 27, 2019

In V1 we would just want to keep it simple and import "All visits" segment (ie. not import any segment and segment data from GA)

@mattab mattab modified the milestones: 3.10.0, 3.11.0 May 28, 2019
@diosmosis
Member

Status update:

  • created proof of concept importer that successfully imports entities including site, goals & custom dimensions and successfully imports some reports including referrers reports, actions reports & custom dimension reports (though there are issues w/ each)
  • created automated testing setup for the importer that uses GA

Next things to do:

  • finish code to import the remaining reports and entities
  • add missing data to our test GA site so we can test every part of the import
  • manually test w/ a larger GA site
  • fix any import accuracy issues

@mattab
Member Author

mattab commented Aug 11, 2019

Hi Everyone! Good news, we are almost done with the Google Analytics to Matomo Importer implementation. We'll start the beta release in a few days. Get ready to test importing your Google Analytics data into Matomo 😎

@suzigrishpul

suzigrishpul commented Aug 12, 2019 via email

@kimoudev

kimoudev commented Aug 15, 2019

@mattab Thanks for this great work. I have migrated from 3.11 to 3.12, but I don't see the Google Analytics import section.
Thank you,

@diosmosis
Member

Hi @kimoudev the plugin hasn't been released yet, there are still a couple import accuracy bugs to work through before it's ready for a public beta. Should be soon though.

@kimoudev

Hi @diosmosis . Ok, Thank you so much.

@mattab
Member Author

mattab commented Aug 22, 2019

Hi Everyone,

The Google Analytics Importer for Matomo is now released and available to all! 🎉

We're looking forward to hearing how it goes for you to import your GA data into Matomo.
Kudos @diosmosis for building this tool 💪
We're happy to help you migrate your GA data into Matomo and stay in control! 🚀
Thanks to everyone who will help test this tool and report their feedback.

@mattab
Member Author

mattab commented Aug 22, 2019

To use the Google Analytics Importer, you first need to upgrade to the latest Matomo 3.12.0 beta version. This can be done in 3 clicks, see instructions here

@rosa2

rosa2 commented Aug 23, 2019 via email

@mattab
Member Author

mattab commented Aug 27, 2019

Hope you enjoy the Google Analytics importer tool. You can post your feedback in the issue tracker: https://github.com/matomo-org/plugin-GoogleAnalyticsImporter/issues

Will close it now as we consider it done. Exciting! 🎉

@mattab mattab closed this as completed Aug 27, 2019