./import_logs.py fatal error processing Jetty request logs (log files with multiple spaces) #7228

Closed
degenaro opened this issue Feb 16, 2015 · 16 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc.

@degenaro

See http://forum.piwik.org/read.php?2,124212

10.199.199.10 - - [16/Feb/2015:10:43:45 +0000] "GET /jobs.jsp HTTP/1.1" 200 0 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /ducc-servlet/cluster-name HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /ducc-servlet/version HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /ducc-servlet/login-link HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /ducc-servlet/logout-link HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /ducc-servlet/classic-jobs-data HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /js/ducc.local.js?=1424083427017 HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /js/ducc.local.js?=1424083427018 HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
10.199.199.10 - - [16/Feb/2015:10:43:47 +0000] "GET /ducc-servlet/authenticator-version HTTP/1.1" 200 0 "http://192.168.6.67:42133/jobs.jsp" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"

@diosmosis
Member

If I use the latest log importer script, I can successfully import these visits (log importer output is below).

Can you try importing w/ the latest script: https://raw.githubusercontent.com/piwik/piwik/master/misc/log-analytics/import_logs.py

Log importer output:

0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log ./test.log...
9 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)

Logs import summary
-------------------

    7 requests imported successfully
    0 requests were downloads
    2 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        0 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        2 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    7 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 1 seconds
    Requests imported per second: 4.54 requests per second

@degenaro
Author

Still fails.

bash-4.1$ ./import_logs.py --url=http://localhost/piwik --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --token-auth=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 2015_02_16.request.log
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log 2015_02_16.request.log...
Fatal error: cannot automatically determine the log format using the first 100000 lines of the log file.
Maybe try specifying the format with the --log-format-name command line argument.

I would attach the entire failing log file, but when I try, I get "Unfortunately, we don't support that file type."
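The fatal error indicates that the importer tries to recognize the log format from the first 100000 lines before importing anything. The sketch below illustrates that general idea in Python, the language of import_logs.py. It is a simplified illustration, not the importer's actual code: the `FORMATS` table, its two regular expressions, and the `detect_format` helper are hypothetical, and each pattern assumes exactly one space between fields.

```python
import re
from itertools import islice

# Hypothetical, simplified format table: one regex per format name.
# Each pattern expects exactly ONE literal space between fields.
# These are illustrative patterns, not the ones import_logs.py ships with.
FORMATS = {
    "common": re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] '
        r'"(?P<request>.*?)" (?P<status>\d+) (?P<length>\S+)$'
    ),
    "combined": re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] '
        r'"(?P<request>.*?)" (?P<status>\d+) (?P<length>\S+) '
        r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"$'
    ),
}


def detect_format(lines, max_lines=100000):
    """Try each format on each line, in order, and return the name of the
    first format that matches; return None if no inspected line matches.

    Only the first `max_lines` lines are inspected, mirroring the limit
    mentioned in the error message.
    """
    for line in islice(lines, max_lines):
        line = line.rstrip("\n")
        for name, pattern in FORMATS.items():
            if pattern.match(line):
                return name
    return None
```

With these strict patterns, a combined-format line like the ones posted at the top of this issue is detected as `combined`, while the same line with two spaces between the status code and the response size matches neither pattern, so a sample made only of such lines yields `None`.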

@degenaro degenaro reopened this Feb 18, 2015
@degenaro
Author

Sorry, closed by accident.

@diosmosis
Member

Does it work w/ the subset of logs you posted in the ticket? I.e., if you copy-paste them into a file and run the importer w/ just those logs, does it run successfully?

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Feb 19, 2015
@mattab mattab added this to the Piwik 2.12.0 milestone Feb 19, 2015
@degenaro
Author

Yes, that works. The entire log is much larger, and that fails.

@mattab
Member

mattab commented Feb 19, 2015

@degenaro, then can you paste a log that fails so we can reproduce the issue?

@degenaro
Author

I tried to paste it here (which was tedious, perhaps because of the clipboard buffer size), but when I tried to close and comment I got an error, apparently because the amount of data was too large.

The file comprising the log is 1448 lines long.

@mattab
Member

mattab commented Feb 20, 2015

@degenaro, could you post the logs on http://pastebin.com/ or a similar site?

@mattab mattab reopened this Feb 20, 2015
@degenaro
Author

[log files redacted]

@degenaro
Author

The above log is 412 lines. It fails for me.

@degenaro degenaro reopened this Feb 23, 2015
@diosmosis
Member

There are only 305 lines in your comment (it is cut off at the end), and they are parsed correctly by the latest log importer. Please use an external service (like Dropbox or pastebin) to post the log, or email an archive to hello@piwik.org.

@degenaro
Author

Failing log sent via e-mail.

@diosmosis
Member

I received the logs and can reproduce the error. I will post here when I find the cause.

@diosmosis
Copy link
Member

OK, the error is caused by extra spaces in the log lines. Some fields are separated by two spaces instead of one, and the log importer can't handle that for log formats other than W3C extended. Working on a fix.
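One way to make a parser tolerate doubled separators is to treat any run of whitespace between fields as a single separator. The following Python sketch applies that idea to the NCSA combined layout used by the sample lines in this issue. The `COMBINED_TOLERANT` pattern and the `parse_line` helper are hypothetical illustrations, not the code from the actual fix.

```python
import re

# Hypothetical combined-format pattern in which every field separator is
# `\s+` (one or more whitespace characters) instead of a literal single
# space. Quoted fields are still captured lazily up to their closing quote.
# This illustrates the idea only; it is not the patch that was merged.
COMBINED_TOLERANT = re.compile(
    r'(?P<ip>\S+)\s+\S+\s+\S+\s+\[(?P<date>.*?)\]\s+'
    r'"(?P<request>.*?)"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+'
    r'"(?P<referrer>.*?)"\s+"(?P<user_agent>.*?)"\s*$'
)


def parse_line(line):
    """Return a dict of fields for a combined-format line, or None."""
    match = COMBINED_TOLERANT.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Because `\s+` accepts one or more spaces, a line with doubled separators yields the same fields as its single-spaced equivalent. The trade-off is a slightly more permissive pattern.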

diosmosis pushed a commit that referenced this issue Feb 24, 2015
diosmosis pushed a commit that referenced this issue Feb 25, 2015
@mattab mattab changed the title ./import_logs.py fatal error processing Jetty request logs? ./import_logs.py fatal error processing Jetty request logs (log files with multiple spaces) Mar 2, 2015
@mattab mattab modified the milestones: Piwik 2.11.2, Piwik 2.12.0 Mar 2, 2015
@degenaro
Author

degenaro commented Mar 7, 2015

When will the fix appear in a release, and how do I get the fix in the meantime?

Thanks.

Lou.

On Sun, Mar 1, 2015 at 11:29 PM, Matthieu Aubry notifications@github.com wrote:

Closed #7228 via #7294.

@diosmosis
Member
