@anthosz opened this Issue on October 7th 2021

Hello,

Expected Behavior

When I check a custom segment (set directly in the URL, not created via the GUI), I expect to see only results that can be found in the log_* tables.

Current Behavior

Depending on the custom segment, this is only sometimes the case (only one website is configured). These 2 examples have had several requests per day for several years.
Example:

For the first point, I have the feeling that the segment is related to this path, but the issue is that there is no segment defined in the database:

  • 0 segments (0 pre-processed, 0 processed in real-time)
  • 0 tracking failures
  • 1 websites
  • 18 activated plugins

Steps to Reproduce (for Bugs)

It works for some URLs and not for others. It depends on the URL, but I don't know why.

Context

I am trying to get statistics for a filter over a long period.

As far as I can see, if no segment is specified, it's done "live" (so based on the log_* tables):
https://developer.matomo.org/guides/archiving
https://developer.matomo.org/guides/database-schema#log-data-persistence

But in fact, a few URLs have their own statistics in the archive tables (without any existing segment) for several periods. For example, for the first point mentioned under Current Behavior, we have visits every month, but I can only see visits for May to October of this year and for 2019. Nothing shows up for 2020 or for January to April of this year, and I don't know why.

Your Environment

  • Matomo version: 4.4.1
  • MySQL version: 5.7
  • PHP version: 7.4
  • Data is provided by access logs

@tsteur commented on October 8th 2021 Member

As far as I can see, if no segment is specified, it's done "live" (so based on the log_* tables):

That's the case unless browser archiving is disabled and the option browser_archiving_disabled_enforce = 1 is set in config/config.ini.php. Since it seems to work for some other segments, that doesn't seem to be the case for you.
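The setting mentioned above can be checked without opening the file by hand. This is a minimal sketch, assuming the option lives in the `[General]` section of the INI file; `read_general_option` is a hypothetical helper and the sample text stands in for a real config/config.ini.php:

```python
import configparser


def read_general_option(ini_text, option, default=None):
    """Return the value of `option` from the [General] section, or `default`."""
    # strict=False tolerates repeated keys such as `Plugins[]`;
    # interpolation=None keeps literal '%' characters in values intact.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    if parser.has_option("General", option):
        return parser.get("General", option).strip('"')
    return default


# Sample content only; a real file would be read from config/config.ini.php.
sample = """; <?php exit; ?> DO NOT REMOVE THIS LINE
[General]
browser_archiving_disabled_enforce = 0
"""

value = read_general_option(sample, "browser_archiving_disabled_enforce", "0")
enforced = value == "1"
print("enforced:", enforced)
```

If the option is missing from config.ini.php, the value from the global config applies, so a missing key is treated here as "0".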

In the first example you are viewing a week in 2017:
image

Is that expected, or did you want to view a range over several years? In that case you would also need to adjust period and date in the URL after the # hash part.
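As an illustration of adjusting period and date after the # hash part, here is a minimal sketch that rewrites only the hash part of a report URL. The host `matomo.example`, the dates and the segment value are placeholders, and `set_report_range` is a hypothetical helper:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_report_range(url, start, end):
    """Return `url` with period/date in the hash part set to a date range."""
    parts = urlsplit(url)
    # The hash part looks like "?idSite=1&period=week&date=...".
    params = dict(parse_qsl(parts.fragment.lstrip("?"), keep_blank_values=True))
    params["period"] = "range"
    params["date"] = f"{start},{end}"
    # Keep the comma between the two dates readable.
    new_fragment = "?" + urlencode(params, safe=",")
    return urlunsplit(parts._replace(fragment=new_fragment))


example = (
    "https://matomo.example/index.php?module=CoreHome&action=index"
    "&idSite=1&period=day&date=yesterday"
    "#?idSite=1&period=week&date=2017-05-01&segment=pageUrl%3D%40contact.html"
)
print(set_report_range(example, "2017-01-01", "2021-10-08"))
```

The part before the # is left untouched; only the parameters after it are changed.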

@anthosz commented on October 8th 2021

@tsteur thank you for your answer. I confirm that browser_archiving_disabled_enforce = 0 is set (from global config).

image

So I don't understand why some data is missing between the end of 2019 and May 2021 (the setting was never modified in config.ini.php).

Concerning another 'custom segment' (set via the URL; contact.html in the previous example):
image

I cannot see anything before July (180 days of retention), but it's not the first time I have used this "custom segment" (live, via the URL).

Best regards,

@tsteur commented on October 11th 2021 Member

@anthosz I'm assuming there are actually periods matching that segment where it doesn't show any data?

Does it show up if you invalidate the reporting data for a period where it currently shows 0? See https://matomo.org/faq/how-to/faq_155/

This might cause the data to show up if there should be any. It won't answer why it's not there in the first place. It's possible that there was a bug in some older version of Matomo that caused it not to appear, or there could be a few other reasons, but these are hard to track down.

@anthosz commented on October 11th 2021

@tsteur Yes, I think it matches this segment.

Invalidation works, but I cannot re-run the archiving (I only have 3 months of retention for the log_* tables).

Yes, it's possible that this is related to an older version of Matomo.

So to summarize:
Once a URL is tracked -> someone "generates" a new segment in Matomo (directly via the URL) -> the reports generated for this specific URL/segment are supposed to be kept for the whole reporting data retention period?

Do you have a way to find the list of segments? Because they are created directly via the URL, they don't appear in the segment table.

Or if it's in archive_blob, is there a simple way to extract the list? I cannot find a way to show the data.

@tsteur commented on October 13th 2021 Member

You might also want to check whether there's any special configuration for the option process_new_segments_from (shown below) in your /config/config.ini.php:

image

If you only have 3 months of log data retention, then invalidating archives is indeed not an option. If you create these segments dynamically via the URL, there's not really any way to find them. Is it possible that the API was not called for that segment during the other months, and thus the reports weren't created? In these cases you might want to create fixed segments so that they are in the segment table and reports are always created using "auto archiving", see https://matomo.org/docs/setup-auto-archiving/ (unless your segments are very dynamic; in that case you would need to make sure to request all the reports before the logs get deleted).
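A fixed segment can also be created through the HTTP API rather than the GUI. The sketch below only builds the request URL and sends nothing. It assumes the `SegmentEditor.add` method accepts `name`, `definition`, `idSite` and `autoArchive` parameters, which should be checked against the API reference of your own instance. The host `matomo.example`, the token and the segment definition are placeholders, and `segment_add_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def segment_add_url(base_url, token_auth, id_site, name, definition):
    """Build (but do not send) an API URL that would store a fixed segment."""
    params = {
        "module": "API",
        "method": "SegmentEditor.add",  # assumed method name, verify locally
        "idSite": id_site,
        "name": name,
        "definition": definition,
        "autoArchive": 1,  # assumed flag: pre-process during auto archiving
        "format": "JSON",
        "token_auth": token_auth,
    }
    return f"{base_url.rstrip('/')}/index.php?{urlencode(params)}"


url = segment_add_url(
    "https://matomo.example",
    "PLACEHOLDER_TOKEN",
    1,
    "Contact page",
    "pageUrl=@contact.html",
)
print(url)
```

Once a segment is stored this way, it should appear in the segment table and can be picked up by the archiving cron job.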

@anthosz commented on October 14th 2021

@tsteur Thank you very much!

Indeed, segments are very dynamic so we will check how to deal with it. At least, all is clear in my mind now :)

I have just one last thing: I checked the log_action table and something seems strange:

2 | localhost/test.html | 2674612 | 1 | 0
3 | 301/URL = http%3A%2F%localhost%2Ftest | 1204892756 | 4 | NULL

How is it possible to see category 4 when we use access logs?

According to the docs, category 4 is supposed to be page names only, but page names are not in the access log.
Moreover, the format is always the same.

Is it possible to tell Matomo to disable certain categories?

@tsteur commented on October 14th 2021 Member

@anthosz it's not possible to disable certain categories.

I'm assuming you're looking in the access logs for the website and not Matomo?

It seems to me like someone is tracking 301 redirects there similar to what we do in our guide for 404 error pages: https://matomo.org/faq/how-to/faq_60/ . Meaning someone is setting a custom page title there.
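For context, the fourth column in the rows quoted above is the `type` column of `log_action`, where 1 is a page URL and 4 is a page title. This is a minimal sketch for decoding such rows, assuming the column order idaction | name | hash | type | url_prefix; `describe_action_row` is a hypothetical helper:

```python
# Action types relevant here; other values are reported as "other".
ACTION_TYPES = {1: "page URL", 2: "outlink", 3: "download", 4: "page title"}


def describe_action_row(row):
    """Parse one pipe-separated log_action row into a small dict."""
    fields = [field.strip() for field in row.split("|")]
    idaction, name, _hash, type_id, _url_prefix = fields
    return {
        "idaction": int(idaction),
        "name": name,
        "type": ACTION_TYPES.get(int(type_id), f"other ({type_id})"),
    }


rows = [
    "2 | localhost/test.html | 2674612 | 1 | 0",
    "3 | 301/URL = http%3A%2F%localhost%2Ftest | 1204892756 | 4 | NULL",
]
for row in rows:
    print(describe_action_row(row))
```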

Maybe someone is tracking into your production analytics by running the site or app locally or so?

I'll close this for now as I think the original issue is clarified. Happy to still follow up in comments to clarify this too.

@anthosz commented on October 15th 2021

@tsteur Indeed, all visits are generated via access log and the py script. All the logs have the same format/syntax (contain only logs from specific path for specific domain).

During the import, I use --enable-http-errors & --enable-http-redirects so probably related?

I also have the feeling that it moves to category 4 only when a referrer is found. If there is no referrer, the 301 seems to be category 1.

I don't really understand why, because the referrer is supposed to be in the log_visit table.

@tsteur commented on October 18th 2021 Member

During the import, I use --enable-http-errors & --enable-http-redirects so probably related?

Yes, that would be causing this 👍

I also have the feeling that it moves to category 4 only when a referrer is found. If there is no referrer, the 301 seems to be category 1.

I don't think that's the case. This should happen only when the response code is 4XX or 5XX (done in https://github.com/matomo-org/matomo-log-analytics/blob/4.x-dev/import_logs.py#L2197-L2218 )
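The page title in the `log_action` row quoted earlier in the thread has the shape `<status>/URL = <url-encoded URL>`. Here is a minimal sketch of how a title of that shape could be composed, for illustration only. `build_status_title` is a hypothetical helper and not the importer's code; the linked lines of import_logs.py contain the real logic and its conditions:

```python
from urllib.parse import quote


def build_status_title(status, url, delimiter="/"):
    """Compose a title like '301/URL = http%3A%2F%2F...' from a status and URL."""
    # safe="" also percent-encodes '/' and ':' so the URL becomes a single token.
    return f"{status}{delimiter}URL = {quote(url, safe='')}"


print(build_status_title("301", "http://localhost/test"))
```

A title built this way is stored as a page title (type 4), which would be consistent with the row reported above.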

@anthosz commented on October 18th 2021

@tsteur Thank you very much! All is clear now :)

This Issue was closed on October 14th 2021