Just pushing an issue your well aware of.
The visit_log table schema needs to be corrected to avoid unnecessarily large data and index sizes.
I get nowhere near 20,000 visits per day, but the index alone on my log_visits table is ~2MB w/ 14MB of data. Although it is not a bottle neck at this point I am rapidly expanding the number of sites and volume of visits tracked by Piwik.
I thought that was the intention of archives to reduce the log size?
I don't care that piwik us using ~70 MB altogether, as the archives are not commonly used during active tracking. But since an insert and join is needed on the archive and actions table for each visit, it seems like a troublesome path.
Thanks for the reminder. If you have patch to improve the schema, please open a new ticket. Discussions are better suited to the forum or piwik-hackers mailing list.
Archives, by their nature, are lossy summaries. Until we reach 1.0, it's advantageous to retain visit information (for debugging, recovery, regeneration, etc).
The key length in the indexes is quite short (relative to the row size), so the overhead has to be attributed to MySQL internals (e.g., BTREE and/or optimizer). For example, looking at the dev trunk, log_visit has these indexes explicitly defined:
You can also look at , where we changed an index for the archive tables. EXPLAIN on the ArchiveProcessing SELECT query shows the key length went from 708 (using index_all) to 15 (using index_idsite_dates_period). Note: for 0.5.5, the update script won't retroactively change the indexes on existing archive tables.
Anthon, why "Note: for 0.5.5, the update script won't retroactively change the indexes on existing archive tables."?
I would expect the index to be updated during upgrade to keep the DB schema consistent across all installations.
I reopened #1129