Importing w3c extended logs #5418

Closed
anonymous-matomo-user opened this issue Jul 7, 2014 · 2 comments
Labels
Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Milestone

Comments

@anonymous-matomo-user

Hi,

Is there any documentation as to how we can import custom formats using the import_logs.py script?

We have some IIS logs that fail using any of the log-format-name options.

I'm a little confused about how to use the log regex. Our IIS logs are currently set up as follows:

#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status sc-bytes 

This produces log entries such as:

2014-06-03 05:14:44 W3SVC726003028 10.0.1.3 GET /index.html - 80 - 10.62.32.123 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/35.0.1916.114+Safari/537.36 - 304 0 0 344

Can you point me in the right direction with this?

Thanks,

Dan
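
For readers following along, here is a minimal sketch of how the `#Fields:` directive lines up with the sample entry above. It is plain Python, not code from import_logs.py; the helper names `parse_fields_directive` and `parse_entry` are made up for illustration. It relies on the sample entry being space-separated, with `+` standing in for spaces inside values such as the user agent.

```python
# Illustrative only: pairs the names from the "#Fields:" directive with
# the whitespace-separated values of one log entry.
FIELDS_LINE = (
    "#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) "
    "sc-status sc-substatus sc-win32-status sc-bytes "
)
ENTRY = (
    "2014-06-03 05:14:44 W3SVC726003028 10.0.1.3 GET /index.html - 80 - "
    "10.62.32.123 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+"
    "AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/35.0.1916.114+"
    "Safari/537.36 - 304 0 0 344"
)


def parse_fields_directive(line):
    """Return the list of field names declared in a '#Fields:' line."""
    prefix = "#Fields:"
    if not line.startswith(prefix):
        raise ValueError("not a #Fields directive: %r" % line)
    return line[len(prefix):].split()


def parse_entry(field_names, entry):
    """Map each field name to the value found at the same position."""
    values = entry.split()
    if len(values) != len(field_names):
        raise ValueError(
            "expected %d values, got %d" % (len(field_names), len(values))
        )
    return dict(zip(field_names, values))


record = parse_entry(parse_fields_directive(FIELDS_LINE), ENTRY)
print(record["c-ip"], record["cs-method"], record["cs-uri-stem"], record["sc-status"])
```

Every value in the sample line has a matching field name, so a positional mapping like this is enough to see which column holds the client IP, the path, and the status code.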

@d-lumb

d-lumb commented Jul 8, 2014

I've got as far as this with my regex:

--log-format-regex="(?P^\d+[-\d+]+\s[\d+:]+) \S+ \S+ (?P[A-Z]+) (?P\S+) \S+ \d+ \S+ (?P[.\d]+) \S+ (?P<user_agent>\S+) (?P\S+) (?P\d{3}) \d+ \d+ (?P\d+)"

But it's giving me an invalid date error on import.

2014-07-08 16:19:31,340: [DEBUG] Invalid line detected (invalid date): 2014-05-01 22:49:08 W3SVC726003028 10.0.1.3 GET /index.html - 80 - 119.63.193.131 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.0) - 200 0 64 0

Any ideas?

Thanks
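
A note for anyone debugging something similar: most of the group names in the command above did not survive on this page (only `user_agent` is still visible). The sketch below keeps the same pattern structure but fills in group names that are assumptions for illustration only (`date`, `method`, `path`, `ip`, `referrer`, `status`, `length`); they are not confirmed by this thread. It uses only Python's `re` and `datetime` to check two things: that the pattern matches the failing line, and which `strptime` format the captured date accepts. One possible cause of an "invalid date" message, again an assumption, is that the captured string is parsed with a format other than the IIS one.

```python
import re
from datetime import datetime

LINE = (
    "2014-05-01 22:49:08 W3SVC726003028 10.0.1.3 GET /index.html - 80 - "
    "119.63.193.131 HTTP/1.1 "
    "Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.0) - 200 0 64 0"
)

# Same shape as the regex in the comment above; group names other than
# user_agent are hypothetical.
PATTERN = re.compile(
    r"(?P<date>^\d+[-\d+]+\s[\d+:]+) \S+ \S+ (?P<method>[A-Z]+) "
    r"(?P<path>\S+) \S+ \d+ \S+ (?P<ip>[.\d]+) \S+ "
    r"(?P<user_agent>\S+) (?P<referrer>\S+) (?P<status>\d{3}) "
    r"\d+ \d+ (?P<length>\d+)"
)

IIS_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"     # matches "2014-05-01 22:49:08"
APACHE_DATE_FORMAT = "%d/%b/%Y:%H:%M:%S"  # e.g. "01/May/2014:22:49:08"


def try_parse(date_string, date_format):
    """Return a datetime, or None if the string does not fit the format."""
    try:
        return datetime.strptime(date_string, date_format)
    except ValueError:
        return None


match = PATTERN.match(LINE)
fields = match.groupdict() if match else {}

print(fields.get("date"))
print(try_parse(fields["date"], IIS_DATE_FORMAT))
print(try_parse(fields["date"], APACHE_DATE_FORMAT))
```

If the pattern matches but the date only parses with the first format, the regex itself is probably fine and the mismatch is in how the date string is interpreted afterwards.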

@mattab mattab removed the P: normal label Aug 3, 2014
@mattab mattab added this to the Mid term milestone Aug 3, 2014
@diosmosis
Member

The original poster's log will be parsed without any special options (the format will be auto-detected). For custom IIS fields, the --iis-map-field option can be used.

@d-lumb The regex won't work because the log importer relies on specific named capture groups. You can see them in the IisFormat class in the log importer script.
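
To illustrate the idea of a format driven by named capture groups, here is a rough sketch that builds a regex from a `#Fields:` line. It is not the importer's actual code: the `FIELD_PATTERNS` table, the group names in it, and `build_regex` are all assumptions made up for this example, so check the IisFormat class for the names the importer really uses.

```python
import re

# Hypothetical mapping from W3C field names to named capture groups.
# Fields that are not listed are matched but not captured.
FIELD_PATTERNS = {
    "date": r"(?P<date>\d{4}-\d{2}-\d{2})",
    "time": r"(?P<time>\d{2}:\d{2}:\d{2})",
    "cs-method": r"(?P<method>\S+)",
    "cs-uri-stem": r"(?P<path>\S+)",
    "cs-uri-query": r"(?P<query_string>\S+)",
    "c-ip": r"(?P<ip>\S+)",
    "cs(User-Agent)": r"(?P<user_agent>\S+)",
    "cs(Referer)": r"(?P<referrer>\S+)",
    "sc-status": r"(?P<status>\d{3})",
    "sc-bytes": r"(?P<length>\d+)",
}


def build_regex(fields_line):
    """Compile a regex whose groups follow the order of the #Fields line."""
    prefix = "#Fields:"
    if not fields_line.startswith(prefix):
        raise ValueError("not a #Fields directive: %r" % fields_line)
    names = fields_line[len(prefix):].split()
    parts = [FIELD_PATTERNS.get(name, r"\S+") for name in names]
    return re.compile(r"\s+".join(parts) + r"\s*$")


FIELDS_LINE = (
    "#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) "
    "sc-status sc-substatus sc-win32-status sc-bytes"
)
ENTRY = (
    "2014-06-03 05:14:44 W3SVC726003028 10.0.1.3 GET /index.html - 80 - "
    "10.62.32.123 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+"
    "AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/35.0.1916.114+"
    "Safari/537.36 - 304 0 0 344"
)

m = build_regex(FIELDS_LINE).match(ENTRY)
if m:
    print(m.group("date"), m.group("time"), m.group("ip"), m.group("status"))
```

The point of the design is that code reading the match only ever asks for groups by name, so a custom regex has to provide the same names for the rest of the pipeline to find its data.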
