Importing logs very slow - 2 days for 1 Gb #10248
Comments
The archiving phase took about 2 hours.
If the "Provider" plugin is enabled, try disabling it. That can make the import much faster.
fyi @tsteur the
That's a decent number, but to improve the performance of the log import you can try to increase the number of imported requests per second. We have previously reached 600 or even 800 requests per second with Log Analytics. Using the QueuedTracking plugin will also help performance: http://plugins.piwik.org/QueuedTracking
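For reference, the parallelism knob is on the importer itself. A sketch of how you might pick a recorder count (the URL, paths, and log file below are placeholders, not values from this thread; the snippet only prints the command so you can review it before running):

```shell
#!/bin/sh
# import_logs.py's --recorders flag runs N parallel tracking workers.
# A common starting point (assumption, tune for your box) is one per CPU core.
RECORDERS=$(nproc 2>/dev/null || echo 4)
CMD="python misc/log-analytics/import_logs.py \
  --url=https://piwik.example.com \
  --recorders=$RECORDERS \
  /var/log/nginx/access.log"
echo "$CMD"   # review, then run the printed command
```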
Since this import I have set up Piwik to do an hourly import/analyse run as per the docs. What I'm seeing is spectacularly bad performance that interferes with normal server operation.

It's clear from my activity monitoring that Piwik restarts its analysis from the beginning of the log file every time it's run, and it hits the database for every single record, every time. So as the current log file gets bigger, Piwik's processing time gets longer and longer, until the log file is rotated and it drops back down again. You can see this happening in this graph: the little peaks get bigger through the day. Over a week it looks like this:

This doesn't look like much, but it's actually saturating disk access. Here's the database point of view: this is all due to Piwik activity, because the app on this server only uses MyISAM and this graph only shows InnoDB, which Piwik uses. You can see here the InnoDB activity since installing Piwik: the initial import (that I originally opened this ticket for), followed by 2 weeks of hourly imports.

What I gather from this is that Piwik does not know how to skip records that it has seen before. Rather than looking up the latest timestamp in the DB and then scanning the log file and ignoring all lines prior to that, or storing a log file inode and byte offset so that it can jump straight to where it left off, it starts from the beginning every time and absolutely hammers the database and iowait.

I'm not using reverse DNS lookups; I don't need them for this case. Is this normal?
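The skip-ahead behaviour I'm describing could be as simple as persisting an (inode, byte offset) pair between runs. A rough sketch of the idea (this is not Piwik's code; the state file name and format are made up for illustration):

```python
import os

STATE_FILE = "import_state.txt"  # assumed location for the saved position


def read_new_lines(log_path):
    """Yield only the lines appended since the last run.

    Saves (inode, byte offset) after each run; if the inode changes,
    the log was rotated, so we start from the top of the new file.
    """
    inode = os.stat(log_path).st_ino
    saved_inode, offset = None, 0
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            saved_inode, offset = (int(x) for x in f.read().split())
    if saved_inode != inode:
        offset = 0  # new inode: the file was rotated, read from the start
    with open(log_path, "rb") as f:
        f.seek(offset)
        for line in f:
            yield line.decode(errors="replace")
        offset = f.tell()
    with open(STATE_FILE, "w") as f:
        f.write(f"{inode} {offset}")
```

With something like this, each hourly run would only touch the new tail of the file instead of rescanning (and re-querying the database for) everything from the beginning.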
@Synchro Thanks for the feedback. It is "normal" that Piwik works this way, but we are aware this is not good enough and we hope to do better in the future 👍 I've created a new issue to discuss this particular problem: Log Analytics could detect log lines that were already imported and skip them automatically #144. I hope the description helps. For now you may have to work around this issue by importing each log file only once, e.g. once a day.
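The once-a-day workaround can be wired up with cron against the rotated file, so each line is only ever imported once. A sketch (the paths, URL, and site ID are placeholders, not values from this thread):

```shell
# Crontab entry: shortly after the daily log rotation, import yesterday's
# rotated file (access.log.1) exactly once.
5 0 * * * python /var/www/piwik/misc/log-analytics/import_logs.py \
  --url=https://piwik.example.com --idsite=1 \
  /var/log/nginx/access.log.1 >> /var/log/piwik-import.log 2>&1
```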
I've just started using Piwik to analyse some web server logs (vanilla nginx access logs) for 2 sites on the same server, one with around 9 million log lines, the other with 24 million, with a combined compressed log size of around 1 GB, covering 1 year of traffic. I'm only interested in file-based analysis, so I'm not using the JavaScript tracker at all.
The import worked, but it was very, very slow, taking over 2 days. The server has 5 CPUs and 8 GB of RAM, runs PHP 5.4 and MySQL 5.6, and is generally lightly loaded. Here are the results of the second site's import using 3 importer processes:
I don't have the stats from the first import, but it was also very long.
The resulting MySQL database is 6.4 GB on disk. I have started the Piwik archiving process to process these logs, but there is no indication of how long it might take - is there any way to find out? So far I've got feedback like this:
So pre-processing all daily stats for 1 site took about 50 minutes. I'm concerned that the amount of CPU and database activity it's using may affect the site.
Is this kind of performance normal for Piwik? I've previously used AWStats to process similar-sized logs and had it take only a few minutes.