import_logs.py fails to detect log type with multiple IPs in first line #4230

Closed
anonymous-matomo-user opened this issue Oct 18, 2013 · 5 comments
Labels: duplicate, Enhancement

@anonymous-matomo-user

When the X-Forwarded-For header is logged for load-balanced sites or proxied traffic, the web server can record multiple IPs on a single line. This appears to break the log format detection in import_logs.py.

Broken example log:

218.108.232.188, 10.183.250.139 - - [17/Oct/2013:00:33:34 -0400] "GET /blog/ HTTP/1.0" 200 11714 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"
108.128.162.178 - - [17/Oct/2013:00:33:47 -0400] "GET / HTTP/1.1" 200 8040 "http://www.referringsite.com/news/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.26 Safari/537.36"

Stack trace:

Traceback (most recent call last):
  File "/var/www/piwik/misc/log-analytics/import_logs.py", line 1575, in <module>
    main()
  File "/var/www/piwik/misc/log-analytics/import_logs.py", line 1539, in main
    parser.parse(filename)
  File "/var/www/piwik/misc/log-analytics/import_logs.py", line 1390, in parse
    format = self.detect_format(file)
  File "/var/www/piwik/misc/log-analytics/import_logs.py", line 1349, in detect_format
    logging.debug('Format %s is the best match', format.name)
AttributeError: 'NoneType' object has no attribute 'name'

While a quick fix is to move any offending lines beyond a "good" line, this is not easily automated.

Reordering the log above so that the script works:

108.128.162.178 - - [17/Oct/2013:00:33:47 -0400] "GET / HTTP/1.1" 200 8040 "http://www.referringsite.com/news/" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.26 Safari/537.36"
218.108.232.188, 10.183.250.139 - - [17/Oct/2013:00:33:34 -0400] "GET /blog/ HTTP/1.0" 200 11714 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"

I admit my Python-fu is not excellent, but I may look at this over the weekend and try to patch the code. I believe the best option is to catch the detection failure and try the next line.
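The "try the next line" idea could look roughly like the sketch below. This is not the actual import_logs.py code: `CANDIDATE_FORMATS`, `detect_format`, `max_lines`, and the simplified regexes are hypothetical names made up for illustration. It only shows detection falling through to later lines, and returning None explicitly, instead of failing on the first line.

```python
import re

# Hypothetical, heavily simplified stand-ins for the importer's format table.
CANDIDATE_FORMATS = {
    "common": re.compile(
        r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
        r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d+) (?P<length>\S+)$'
    ),
    "combined": re.compile(
        r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
        r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d+) (?P<length>\S+) '
        r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
    ),
}

# The two lines from the broken example log above.
MULTI_IP_LINE = (
    '218.108.232.188, 10.183.250.139 - - [17/Oct/2013:00:33:34 -0400] '
    '"GET /blog/ HTTP/1.0" 200 11714 "-" '
    '"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"'
)
SINGLE_IP_LINE = (
    '108.128.162.178 - - [17/Oct/2013:00:33:47 -0400] '
    '"GET / HTTP/1.1" 200 8040 "http://www.referringsite.com/news/" '
    '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/31.0.1650.26 Safari/537.36"'
)


def detect_format(lines, max_lines=100):
    """Return the name of the first format that matches any of the first
    max_lines lines, or None if nothing matches."""
    for index, line in enumerate(lines):
        if index >= max_lines:
            break
        line = line.rstrip("\n")
        for name, pattern in CANDIDATE_FORMATS.items():
            if pattern.match(line):
                return name
        # No format matched this line: fall through and try the next one.
    return None


detected = detect_format([MULTI_IP_LINE, SINGLE_IP_LINE])
if detected is None:
    # Guarding here avoids the AttributeError from the traceback above.
    print("Could not detect the log format")
else:
    print("Format %s is the best match" % detected)
```

With a guard like this, a file whose first line has multiple IPs would still be detected from a later line, and a file with no matching lines would produce a readable message rather than a traceback.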

@anonymous-matomo-user
Author

For anyone looking for a workaround, I tested specifying the log format with --log-format-name and that appears to work.

@mattab
Member

mattab commented Oct 23, 2013

Is this format of specifying IPs, "218.108.232.188, 10.183.250.139", common and widely accepted? Can you point to official docs from Apache and/or Nginx showing this? If so, we'd like to fix it. A pull request is very welcome.

@anonymous-matomo-user
Author

According to Wikipedia (and some years of experience), it "is the de-facto standard for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or load balancer." [1] Personally, I do not know of another way to get the originating IP of a client machine on a load-balanced site.

The format is officially defined here: http://tools.ietf.org/html/draft-ietf-appsawg-http-forwarded-10#section-5.2

I believe the fix is simpler than I originally thought: just add an optional ", <ip>" group to the existing regex. It should be able to accept a (theoretically) unlimited number of IPs. I've seen as many as 4, but I am not aware of a hard limit.

[1] http://en.wikipedia.org/wiki/X-Forwarded-For
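A rough sketch of that regex change, written against a made-up pattern rather than the real one in import_logs.py. `IP_LIST`, `LINE_START`, and `client_ip` are hypothetical names. Treating the first address in the list as the client is an assumption based on how X-Forwarded-For is conventionally ordered (client first, then each proxy); the header can be spoofed, so a real fix may want a different policy.

```python
import re

# One address-like token, followed by zero or more ", <ip>" repetitions.
# The character class is deliberately loose so IPv6 and hostnames also pass.
IP_LIST = r'(?P<ips>[\w.:-]+(?:,\s*[\w.:-]+)*)'

# Hypothetical start-of-line pattern: address list, two ident fields, date.
LINE_START = re.compile(r'^' + IP_LIST + r' \S+ \S+ \[(?P<date>[^\]]+)\]')


def client_ip(line):
    """Return the first address on the line, or None if the line start
    does not match the pattern."""
    match = LINE_START.match(line)
    if match is None:
        return None
    # Assumption: the left-most entry is the originating client.
    return re.split(r',\s*', match.group('ips'))[0]


print(client_ip('218.108.232.188, 10.183.250.139 - - [17/Oct/2013:00:33:34 -0400] "GET /blog/ HTTP/1.0" 200 11714'))
print(client_ip('108.128.162.178 - - [17/Oct/2013:00:33:47 -0400] "GET / HTTP/1.1" 200 8040'))
```

Because the repeated group uses `*`, lines with a single IP keep matching exactly as before, and there is no fixed upper bound on the number of addresses.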

@anonymous-matomo-user
Author

#137 has been submitted for this issue. Please review and let me know if there is more to do.

@mattab
Member

mattab commented Mar 12, 2015

Issue was moved to the new repository for Piwik Log Analytics: https://github.com/piwik/piwik-log-analytics/issues

refs #7163

mattab closed this as completed Mar 12, 2015
mattab added the duplicate label Mar 12, 2015