Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve DB Schema for performance #1139

Closed
anonymous-matomo-user opened this issue Feb 2, 2010 · 2 comments
Closed

Improve DB Schema for performance #1139

anonymous-matomo-user opened this issue Feb 2, 2010 · 2 comments
Labels
duplicate For issues that already existed in our issue tracker and were reported previously. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Milestone

Comments

@anonymous-matomo-user
Copy link

Just pushing an issue your well aware of.

The visit_log table schema needs to be corrected to avoid unnecessarily large data and index sizes.

http://dev.piwik.org/trac/wiki/DatabaseSchema
#5457

I get nowhere near 20,000 visits per day, but the index alone on my log_visits table is ~2MB w/ 14MB of data. Although it is not a bottle neck at this point I am rapidly expanding the number of sites and volume of visits tracked by Piwik.

I thought that was the intention of archives to reduce the log size?

I don't care that piwik us using ~70 MB altogether, as the archives are not commonly used during active tracking. But since an insert and join is needed on the archive and actions table for each visit, it seems like a troublesome path.
Keywords: db

@robocoder
Copy link
Contributor

Thanks for the reminder. If you have patch to improve the schema, please open a new ticket. Discussions are better suited to the forum or piwik-hackers mailing list.

Archives, by their nature, are lossy summaries. Until we reach 1.0, it's advantageous to retain visit information (for debugging, recovery, regeneration, etc).

The key length in the indexes is quite short (relative to the row size), so the overhead has to be attributed to MySQL internals (e.g., BTREE and/or optimizer). For example, looking at the dev trunk, log_visit has these indexes explicitly defined:

  • PRIMARY KEY(idvisit) - 4 bytes
  • KEY index_idsite_date_config (idsite, visit_server_date, config_md5config(8)) - 16 bytes

You can also look at [1807], where we changed an index for the archive tables. EXPLAIN on the ArchiveProcessing SELECT query shows the key length went from 708 (using index_all) to 15 (using index_idsite_dates_period). Note: for 0.5.5, the update script won't retroactively change the indexes on existing archive tables.

@mattab
Copy link
Member

mattab commented Feb 11, 2010

Anthon, why "Note: for 0.5.5, the update script won't retroactively change the indexes on existing archive tables."?

I would expect the index to be updated during upgrade to keep the DB schema consistent across all installations.

I reopened #1129

@anonymous-matomo-user anonymous-matomo-user added this to the Piwik 0.5.5 milestone Jul 8, 2014
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate For issues that already existed in our issue tracker and were reported previously. Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Projects
None yet
Development

No branches or pull requests

3 participants