Custom Data Import (dimension widening) #6613

Open
mattab opened this issue Nov 7, 2014 · 4 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

@mattab
Member

mattab commented Nov 7, 2014

The goal of this issue is to implement a new feature in Piwik that lets you import a custom data file (e.g. in CSV format) containing extra information about your users, your content, or your products. When importing this data, you tell Piwik how to match it to your people and content; Piwik then automatically lets you use the Custom Data in Segments, display it in the UI (e.g. Visitor Log), and more!

This feature is similar to Google Analytics' Custom Data Import feature (which used to be called Dimension Widening): https://support.google.com/analytics/answer/3191417

Types of data you can import

  1. User data (log_visit) — import user metadata, such as a loyalty rating or lifetime customer value, and use these values with segmentation.
  2. Content data (log_action) — group content by importing content metadata, such as author, date published, and article category.
  3. Product Data (log_conversion_item) — gain better merchandising insights by importing product metadata, such as size, color, style, or other product-related dimensions.
  4. Campaign Data — use campaign tracking IDs and import ad campaign-related dimensions, such as source, medium.

Inspired by GA, because their approach just makes sense.

Workflow

I prepare a data file in CSV format.

  • The file contains one dimension known to Piwik (e.g. userId) plus up to N custom data columns.
  • The first row of the file contains the column names.
  • One of the columns must be the dimension known to Piwik, e.g. userId.
    • The Piwik dimension must be one of the dimensions listed in the Segmentation reference.
    • For example, when the CSV file uses a userId column, its values must match the User ID values set in Piwik.

Here is a file example:

userId, LifeTimeValue, Industry
clientid1, 500, IT
clientid2, 5, NGO
clientid5, 100, Farming
client765, 222, IT
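
To make the file format concrete, here is a minimal Python sketch of reading such a file. It is only an illustration, not part of the proposal: the function name `read_custom_data` is hypothetical, and it assumes the header row and the comma-plus-space layout of the example above.

```python
import csv
import io

# The example file from this issue, inlined so the sketch is self-contained.
EXAMPLE_CSV = """userId, LifeTimeValue, Industry
clientid1, 500, IT
clientid2, 5, NGO
clientid5, 100, Farming
client765, 222, IT
"""


def read_custom_data(text):
    """Return (column_names, rows) for a custom data file with a header row.

    Hypothetical helper: the name and return shape are assumptions.
    """
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = list(reader)
    return reader.fieldnames, rows


columns, rows = read_custom_data(EXAMPLE_CSV)
print(columns)
print(rows[0])
```

All values are read as strings here; whether a column such as LifeTimeValue should be treated as numeric is left open.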

Then as a Piwik Admin user or Super User:

  • I go to Settings > Custom Data
  • I select a website to import the custom data for
  • I select User as "Type of data to import" (I could also instead import Content or Product or Campaign custom data)
  • I give a name to my custom import, e.g. "Customer LTV & Industry"
  • I select the file and upload it (e.g. to tmp/customdata)
  • After the upload, Piwik detects the columns
    • If no known Piwik dimension is found among the columns, Piwik displays an error message saying that a Piwik dimension is required
  • Piwik detects the Custom Data dimensions from the file
  • Piwik asks the user whether the custom data dimension(s) should be imported
  • Piwik schedules the import as a background task
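
The column detection step in this workflow could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: `KNOWN_PIWIK_DIMENSIONS` is a tiny stand-in for the Segmentation reference, `detect_columns` and `MissingPiwikDimensionError` are hypothetical names, and picking the first known column as the matching key is just one possible rule.

```python
# Hypothetical stand-in for the dimensions listed in the Segmentation
# reference; a real implementation would load the full list from Piwik.
KNOWN_PIWIK_DIMENSIONS = {"userId", "visitorId", "pageUrl"}


class MissingPiwikDimensionError(ValueError):
    """Raised when no column of the file is a known Piwik dimension."""


def detect_columns(header):
    """Split a header row into (piwik_dimension, custom_dimensions)."""
    names = [name.strip() for name in header]
    known = [name for name in names if name in KNOWN_PIWIK_DIMENSIONS]
    if not known:
        raise MissingPiwikDimensionError(
            "A Piwik dimension column is required, e.g. userId"
        )
    # Assumption: the first known column is the key used for matching.
    key = known[0]
    custom = [name for name in names if name != key]
    return key, custom


key, custom = detect_columns(["userId", " LifeTimeValue", " Industry"])
print(key, custom)
```

The detected custom columns are what the user would then be asked to confirm before the import is scheduled.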

Proposed steps / TBD

DB Schema:

  • new table piwik_custom_data keeps track of the custom data as a simple lookup table
    • idimport, piwik_dimension_value, custom_dimension_value
    • e.g. 1, client765, 222
    • storing the idimport lets us easily clean up a given Custom Data Import
  • new table piwik_custom_data_dimensions
    • idimport, idsite, piwik_dimension_name, custom_dimension_name
    • e.g. 1, 5, userId, LifeTimeValue
    • if the imported file contains several custom dimensions, then several rows are created in the table, one for each dimension
  • new table piwik_custom_data_import
    • idimport, idsite, ts_imported, name, rows, login, status, deleted, filename
    • example entry for our test custom data import of 9 million rows (idsite and filename not shown):
    • 1, 2014-12-01 00:11:22, "Customer LTV & Industry", 9000000, matt, "pending", 0
    • Status could be: pending import, importing, imported
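
As a rough sketch of how these three tables fit together, the Python snippet below creates them in an in-memory SQLite database and loads the LifeTimeValue column of the example file. The column types, the use of SQLite, the row count of 4, the NULL filename, and the `delete_import` helper are all assumptions for illustration; the proposal above only names the tables and columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the proposal only lists column names.
conn.executescript("""
CREATE TABLE piwik_custom_data_import (
    idimport INTEGER PRIMARY KEY, idsite INTEGER, ts_imported TEXT,
    name TEXT, "rows" INTEGER, login TEXT, status TEXT,
    deleted INTEGER DEFAULT 0, filename TEXT
);
CREATE TABLE piwik_custom_data_dimensions (
    idimport INTEGER, idsite INTEGER,
    piwik_dimension_name TEXT, custom_dimension_name TEXT
);
CREATE TABLE piwik_custom_data (
    idimport INTEGER, piwik_dimension_value TEXT, custom_dimension_value TEXT
);
""")

# One import of the 4-row example file for site 5; filename left unset.
conn.execute(
    "INSERT INTO piwik_custom_data_import VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 5, "2014-12-01 00:11:22", "Customer LTV & Industry", 4,
     "matt", "pending", 0, None),
)
conn.execute(
    "INSERT INTO piwik_custom_data_dimensions VALUES (?, ?, ?, ?)",
    (1, 5, "userId", "LifeTimeValue"),
)
conn.executemany(
    "INSERT INTO piwik_custom_data VALUES (?, ?, ?)",
    [(1, "clientid1", "500"), (1, "clientid2", "5"),
     (1, "clientid5", "100"), (1, "client765", "222")],
)
conn.commit()


def delete_import(conn, idimport):
    """Hypothetical cleanup: remove every trace of one import by idimport."""
    for table in ("piwik_custom_data", "piwik_custom_data_dimensions",
                  "piwik_custom_data_import"):
        conn.execute(f"DELETE FROM {table} WHERE idimport = ?", (idimport,))
    conn.commit()
```

Calling `delete_import(conn, 1)` removes the lookup rows, the dimension definitions, and the import record in one go, which is the cleanup benefit of storing idimport in every table.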

Notes

  • Build this feature in the API as well, so that we can do the import via the UI or the API (or later even a console command)
  • New UI to upload custom data sets, and new UI to view the existing data sets and their status
  • Make the custom data dimensions available as Segments
  • To make visualising the custom dimensions easy, display all custom data dimensions in the Visitor Log and Visitor Profile.
  • The custom data could also be used by plugins and/or displayed in new original ways!
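
To illustrate what "available as Segments" could mean in practice, here is a small Python sketch that filters visits by an imported custom value. It is a toy model under stated assumptions: the visit records, the in-memory `lifetime_value` lookup, and the `segment_visits` helper are hypothetical and say nothing about how Piwik would actually evaluate segments.

```python
# Imported custom data: userId -> LifeTimeValue (from the example file).
lifetime_value = {
    "clientid1": 500,
    "clientid2": 5,
    "clientid5": 100,
    "client765": 222,
}

# Hypothetical tracked visits; only the fields needed for the sketch.
visits = [
    {"idvisit": 1, "userId": "clientid1"},
    {"idvisit": 2, "userId": "clientid2"},
    {"idvisit": 3, "userId": "unknown42"},
    {"idvisit": 4, "userId": "client765"},
]


def segment_visits(visits, lookup, predicate):
    """Keep visits whose user has an imported value that passes predicate."""
    matched = []
    for visit in visits:
        value = lookup.get(visit["userId"])
        # Visits without imported data never match the segment.
        if value is not None and predicate(value):
            matched.append(visit)
    return matched


high_value = segment_visits(visits, lifetime_value, lambda v: v >= 100)
print([visit["idvisit"] for visit in high_value])
```

Visits from users that are missing from the imported file are simply excluded here; a real implementation would have to decide how to treat them.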

Feel free to post your feedback in a comment!

@mattab mattab added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Nov 7, 2014
@mattab mattab added this to the Mid term milestone Nov 7, 2014
@mattab
Member Author

mattab commented Nov 6, 2015

Related to the powerful new feature Custom Dimensions: #9129

@mattab mattab modified the milestones: Long term, Mid term Dec 5, 2016
@dgregory94

I was looking into this feature as a way to have a lookup for data that comes in from my websites. For example, I know that code XYX is really "Bob's Burgers", and I want to create reports that show Bob's Burgers instead of XYX. Is this feature something I could use, or is there any other functionality that could solve this problem?

@mattab
Member Author

mattab commented Nov 14, 2017

Yes, it sounds like you need this feature, but unfortunately it has not been developed yet. It would be great to have. In the meantime, you can store the real value you need as a "Custom Dimension" and then view it in the Custom Dimensions reports: https://piwik.org/docs/custom-dimensions/
You may also find custom reports interesting: https://piwik.org/docs/custom-reports/

@peterbo
Contributor

peterbo commented Oct 17, 2018

Now that CustomDimensions exists, implementing this feature would be even more useful.

For an initial version, it may make sense to restrict data import to the following types, because in my opinion these are the most valuable ones for CRM/cross-platform use cases:

  1. Import CustomDimension(s) (visit(or) scope) for userID or visitorID
  2. Import Offline Conversion(s) (or conversions from another platform, caller conversions, etc.) for userID or visitorID

In many cases it would make sense to introduce another scope for customDimensions: visitor scope (proposal here: https://github.com/matomo-org/plugin-CustomDimensions/issues/38)
