1.11 - import_logs.py broken parser #3805

anonymous-matomo-user · 2013-03-08T20:59:35Z

The log parser has been broken since 1.11.

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combined

1.2.3.4 - - [07/Mar/2013:08:46:03 +0100] "GET /19.pdf HTTP/1.0" 200 4324023 "-" "Apache-HttpClient/4.2.1 (java 1.5)" 276 4324023
5.6.7.8 - - [07/Mar/2013:05:12:44 +0100] "GET /index.htm HTTP/1.1" 206 33106 "http://refer" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22" 437 33106

Result with 1.10.1:

Direct Entry http://somesite/19.pdf
Website: refer http://somesite/index.htm

Result with 1.11:

Direct Entry http://somesite/19.pdf%20HTTP/1.0%22%20200%204324023%20%22-%22%20%22Apache-HttpClient/4.2.1%20(java
Direct Entry http://somesite/


mattab · 2013-03-11T09:23:50Z

In 2debefd: Refs #3805 Adding test and it's working OK so cannot replicate problem...

mattab · 2013-03-11T09:25:57Z

Please submit a failing test or more info, as it seems to work: 2debefd#L0R14

anonymous-matomo-user · 2013-03-11T19:04:03Z

Attachment:
gentoo.log

anonymous-matomo-user · 2013-03-11T19:07:55Z

Can you try importing the attached gentoo.log directly?

It fails for me with the default import_logs.py. When I change

        self.regex = re.compile(regex + '\s*$') # make sure regex includes end of line

back to what was there in 1.10.1, it works:

        self.regex = re.compile(regex)

so I guess the problem is somewhere there, but I cannot figure out what it is.

Thanks
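To illustrate the difference between the two lines above, here is a minimal sketch. The `COMMON` pattern is a simplified, hypothetical stand-in for a common-log-format regex and is not copied from import_logs.py; the log line is the first sample from the report.

```python
import re

# Hypothetical, simplified "common" log format pattern (an assumption for
# illustration; not the actual pattern from import_logs.py).
COMMON = (
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+)'
)

# First sample line from the report: combined format plus two extra fields.
LINE = (
    '1.2.3.4 - - [07/Mar/2013:08:46:03 +0100] "GET /19.pdf HTTP/1.0" '
    '200 4324023 "-" "Apache-HttpClient/4.2.1 (java 1.5)" 276 4324023'
)

# 1.10.1 behaviour: the pattern may stop matching before the end of the line.
unanchored = re.compile(COMMON).match(LINE)

# 1.11 behaviour: the pattern is forced to consume the line up to its end.
anchored = re.compile(COMMON + r'\s*$').match(LINE)

print(unanchored.group('path'))
print(anchored.group('path'))
```

In this sketch the unanchored pattern yields `/19.pdf`, while the anchored one can only reach the end of the line by letting the lazy `path` group expand across the status, referrer, and most of the user agent, stopping at `(java`. That is consistent with the URL reported for 1.11.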

diosmosis · 2013-03-16T22:19:12Z

@matt, I found the issue w/ this bug and uploaded a fix in master...3805, can you review? Here's an explanation of the fix:

It seems that the change of order to the formats exposed a couple of bugs. First, the change I committed before isn't flexible enough for log files with extra fields (like gentoo.log, which is ncsa_extended w/ two extra fields). Also, regex quantifiers are greedy, so now that common gets tested before ncsa_extended, 'common' matches w/ incorrect groups and ncsa_extended is never tried.

I've reverted my original change (the '\s*$' one) and modified the format autodetection logic to use the format that returns the most groups (i.e., the one that matches and returns the most information). I've also modified the regexes to account for regex greediness. There are some extra tests, and I fixed a bug w/ the S3 regex.
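The "most groups wins" idea could be sketched roughly as follows. This is an illustration under assumptions, not the code from the fix: `FORMATS`, `detect_format`, and both patterns are hypothetical, simplified names and regexes.

```python
import re

# Hypothetical, simplified patterns (assumptions for illustration only).
COMMON = (
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+)'
)
NCSA_EXTENDED = COMMON + r' "(?P<referrer>.*?)" "(?P<user_agent>.*?)"'

# 'common' is listed first on purpose, mirroring the order described above.
FORMATS = {
    'common': COMMON,
    'ncsa_extended': NCSA_EXTENDED,
}


def detect_format(line):
    """Return the name of the matching format with the most groups, or None."""
    best_name, best_count = None, -1
    for name, pattern in FORMATS.items():
        match = re.match(pattern, line)  # no end-of-line anchor
        if match is None:
            continue
        group_count = len(match.groups())
        if group_count > best_count:
            best_name, best_count = name, group_count
    return best_name


extended_with_extras = (
    '1.2.3.4 - - [07/Mar/2013:08:46:03 +0100] "GET /19.pdf HTTP/1.0" '
    '200 4324023 "-" "Apache-HttpClient/4.2.1 (java 1.5)" 276 4324023'
)
plain_common = (
    '1.2.3.4 - - [07/Mar/2013:08:46:03 +0100] "GET /index.htm HTTP/1.1" 200 512'
)

print(detect_format(extended_with_extras))
print(detect_format(plain_common))
```

Because every candidate is tried and the richest match is kept, the order of the formats no longer decides the outcome, and trailing fields after the user agent are simply ignored.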

mattab · 2013-03-16T23:02:38Z

all looks good to me!

diosmosis · 2013-03-23T05:30:38Z

@matt Ok, greediness was only an issue w/ my change, so I've reverted that and added some more tests. I've pushed it again so you can take a look if you want. Will commit tomorrow.

diosmosis · 2013-03-23T22:33:27Z

In 62b43d8: Fixes #3805, reverted change in log importer that looked for end-of-line after format regex match and modified format autodetection logic to pick the format based on whether the format matches and the number of groups returned in the match.

Notes:

Added several more tests to log importer tests.py. Added tests for checking format of log files w/ extra junk info on log lines. Added individual tests for parsing regex format.
Modified log files used in ImportLogs test, added extra junk info to end of some lines.
Fixed failing test in tests.py for the S3 log file format.

anonymous-matomo-user added this to the 1.12 - The Great 1.x Backlog milestone Jul 8, 2014

anonymous-matomo-user added T: Bug labels Jul 8, 2014

anonymous-matomo-user assigned diosmosis Jul 8, 2014

diosmosis mentioned this issue Mar 25, 2013

Error with import_logs.py when importing IIS logs #3834

Closed

mattab mentioned this issue Mar 29, 2013

Import of logfiles fails #3862

Closed

This issue was closed.
