Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API call Contents.getContentNames randomly buggy #15628

Open
feikede opened this issue Feb 27, 2020 · 16 comments
Open

API call Contents.getContentNames randomly buggy #15628

feikede opened this issue Feb 27, 2020 · 16 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc.

Comments

@feikede
Copy link

feikede commented Feb 27, 2020

This call randomly returns an empty [] json. I.e. currently:

/?method=Contents.getContentNames&label=anwalt%3A6122&date=2020-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e...384

returns [] with headers


Cache-Control | must-revalidate
-- | --
Connection | Keep-Alive
Content-Length | 2
Content-Type | application/json; charset=utf-8
Date | Thu, 27 Feb 2020 08:55:27 GMT
Keep-Alive | timeout=5, max=200
Server | Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.3.10
X-Matomo-Request-Id | 85f8e
X-Powered-By | PHP/7.3.10

While the exact same call with segmentation by label returns a (the correct) result:
/?method=Contents.getContentNames&segment=contentName==anwalt%3A6122&label=anwalt%3A6122&date=2020-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e...


Cache-Control | must-revalidate
-- | --
Connection | Keep-Alive
Content-Length | 201
Content-Type | application/json; charset=utf-8
Date | Thu, 27 Feb 2020 08:46:51 GMT
Keep-Alive | timeout=5, max=200
Server | Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.3.10
X-Matomo-Request-Id | 5fd41
X-Powered-By | PHP/7.3.10

[{"label":"anwalt:6122","nb_visits":6,"nb_impressions":10,"nb_interactions":10,"sum_daily_nb_uniq_visitors":5,"interaction_rate":"100\u00a0%","segment":"contentName==anwalt%3A6122","idsubdatatable":1}]

Think this is a bug, or am I missing something?

BTW: This seems to be related to any kind of caching. Sometimes the call WITH segmentation delivers wrong results - sometimes without - and sometimes it's all gone one day later (have the cron running every hour, have about 10K req/day). Have latest V 3.13.3 installed.

@tsteur
Copy link
Member

tsteur commented Feb 27, 2020

Random thought here.

  • Data is archived, eg shortly after midnight
  • First tracking requests with content comes in
  • Previous data is invalidated
  • Data is archived
  • Previous data is invalidated
  • API is fetched before the next archive finished, it's maybe not using the latest invalidated archive for some reason but the first one for example.
  • Data is archived

I haven't tried to reproduce this. Looking at the code we would use the most recent invalidated archive though because of https://github.com/matomo-org/matomo/blob/3.13.3/core/DataAccess/Model.php#L243 and https://github.com/matomo-org/matomo/blob/3.13.3/core/DataAccess/ArchiveSelector.php#L128-L136 where we should use the first idArchive which is the most recent. Quickly looked into it and would say it almost can't be that.

Could be some other race condition as I've never heard of it before. Like not sure if at some point maybe a done flag is written to early for example.

@feikede do you know if you have browser archiving enabled? You can see this in Admin -> General Settings. Also for how long is the empty result returned if you know? Like are we talking few seconds here or few minutes?
Thanks for mentioning the issue and your help 👍

@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label Feb 27, 2020
@feikede
Copy link
Author

feikede commented Feb 28, 2020

@tsteur Thanks for your analysis. As for your questions:

  • we are talking about some hours here (not minutes or seconds)
  • browser archiving is not enabled

Is it normal to have about 2500 tables in the database?

Maybe the matomo-archive.log helps?

INFO [2020-02-28 12:05:01] 26904  ---------------------------
INFO [2020-02-28 12:05:01] 26904  INIT
INFO [2020-02-28 12:05:01] 26904  Running Matomo 3.13.3 as Super User
INFO [2020-02-28 12:05:01] 26904  ---------------------------
INFO [2020-02-28 12:05:01] 26904  NOTES
INFO [2020-02-28 12:05:01] 26904  - Async process archiving supported, using CliMulti.
INFO [2020-02-28 12:05:01] 26904  - Reports for today will be processed at most every 900 seconds. You can change this value in Matomo UI > Settings > General Settings.
INFO [2020-02-28 12:05:01] 26904  - Reports for the current week/month/year will be requested at most every 3600 seconds.
INFO [2020-02-28 12:05:01] 26904  - Archiving was last executed without error 59 min 54s ago
INFO [2020-02-28 12:05:01] 26904  - Will invalidate archived reports for 2020-02-28 for following websites ids: 1
INFO [2020-02-28 12:05:01] 26904  ---------------------------
INFO [2020-02-28 12:05:01] 26904  START
INFO [2020-02-28 12:05:01] 26904  Starting Matomo reports archiving...
INFO [2020-02-28 12:05:01] 26904  - tracking data found for website id 1 since 2020-02-28 11:05:07 UTC (since the last successful archiving)
INFO [2020-02-28 12:05:01] 26904  Will pre-process for website id = 1, period = day, date = last1
INFO [2020-02-28 12:05:01] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:02] 26904  Archived website id = 1, period = day, 0 segments, 472 visits in last 2 days, 113 visits today, Time elapsed: 1.200s
INFO [2020-02-28 12:05:03] 26904  Will pre-process for website id = 1, period = week, date = last1
INFO [2020-02-28 12:05:03] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:04] 26904  Archived website id = 1, period = week, 0 segments, 0 visits in last 2 weeks, 0 visits this week, Time elapsed: 1.172s
INFO [2020-02-28 12:05:04] 26904  Will pre-process for website id = 1, period = month, date = last1
INFO [2020-02-28 12:05:04] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:05] 26904  Archived website id = 1, period = month, 0 segments, 0 visits in last 2 months, 0 visits this month, Time elapsed: 1.231s
INFO [2020-02-28 12:05:05] 26904  Will pre-process for website id = 1, period = year, date = last1
INFO [2020-02-28 12:05:05] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:06] 26904  Archived website id = 1, period = year, 0 segments, 0 visits in last 2 years, 0 visits this year, Time elapsed: 1.354s
INFO [2020-02-28 12:05:06] 26904  Archived website id = 1, 4 API requests, Time elapsed: 4.972s [1/1 done]
INFO [2020-02-28 12:05:06] 26904  Done archiving!
INFO [2020-02-28 12:05:06] 26904  ---------------------------
INFO [2020-02-28 12:05:06] 26904  SUMMARY
INFO [2020-02-28 12:05:06] 26904  Total visits for today across archived websites: 113
INFO [2020-02-28 12:05:06] 26904  Archived today's reports for 1 websites
INFO [2020-02-28 12:05:06] 26904  Archived week/month/year for 1 websites
INFO [2020-02-28 12:05:06] 26904  Skipped 0 websites
INFO [2020-02-28 12:05:06] 26904  - 0 skipped because no new visit since the last script execution
INFO [2020-02-28 12:05:06] 26904  - 0 skipped because existing daily reports are less than 900 seconds old
INFO [2020-02-28 12:05:06] 26904  - 0 skipped because existing week/month/year periods reports are less than 3600 seconds old
INFO [2020-02-28 12:05:06] 26904  Total API requests: 4
INFO [2020-02-28 12:05:06] 26904  done: 1/1 100%, 113 vtoday, 1 wtoday, 1 wperiods, 4 req, 5060 ms, no error
INFO [2020-02-28 12:05:06] 26904  Time elapsed: 5.060s
INFO [2020-02-28 12:05:06] 26904  ---------------------------
INFO [2020-02-28 12:05:06] 26904  SCHEDULED TASKS
INFO [2020-02-28 12:05:06] 26904  Starting Scheduled tasks... 
INFO [2020-02-28 12:05:06] 26904  Scheduler: executing task Piwik\Plugins\CustomPiwikJs\Tasks.updateTracker...
INFO [2020-02-28 12:05:06] 26904  Scheduler: finished. Time elapsed: 0.001s
INFO [2020-02-28 12:05:06] 26904  Scheduler: executing task Piwik\Plugins\PrivacyManager\Tasks.deleteLogData...
INFO [2020-02-28 12:05:06] 26904  Scheduler: finished. Time elapsed: 0.001s
INFO [2020-02-28 12:05:06] 26904  Scheduler: executing task Piwik\Plugins\PrivacyManager\Tasks.anonymizePastData...
INFO [2020-02-28 12:05:06] 26904  Scheduler: finished. Time elapsed: 0.000s
INFO [2020-02-28 12:05:06] 26904  done
INFO [2020-02-28 12:05:06] 26904  ---------------------------

I'tt try to track down the next case in detail.

@tsteur
Copy link
Member

tsteur commented Feb 29, 2020

2500 tables is not quite normal, is it maybe a rather old installation? Possible there was a bug a while ago.

Not available for few hours doesn't sound normal. Is in the same time the result not available in the UI either?

@feikede
Copy link
Author

feikede commented Feb 29, 2020

Ah, sorry, it's only 107 tables, I looked at the wrong number.
And the result is available in the UI while it's not correct in the reporting API at the same time. I'll try to find another case today to track that down.

@tsteur
Copy link
Member

tsteur commented Mar 1, 2020

Thanks @feikede . Is there maybe any caching active in your server set up or so? I'm asking because the UI is also just using the API and I can't think of a reason why the UI should be showing data while the API doesn't. Not sure what your server setup looks like or if your webserver sends maybe some caching headers.

@feikede
Copy link
Author

feikede commented Mar 4, 2020

No, there's no caching, it's a plain matomo LAMP install on CentOS 7.7. I have another case today (exactly now):
/?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3

returns empty array [], while

/?method=Contents.getContentNames&segment=contentName==anwalt:2695&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3

returns

[{"label":"anwalt:2695","nb_visits":10,"nb_impressions":12,"nb_interactions":2,"sum_daily_nb_uniq_visitors":10,"interaction_rate":"16,67\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":1}]

and

/?method=Contents.getContentNames&segment=contentName==anwalt:2695&label=anwalt:2695&date=last30&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3

returns

{"2020-02-04":[],"2020-02-05":[],"2020-02-06":[],"2020-02-07":[],"2020-02-08":[],"2020-02-09":[],"2020-02-10":[],"2020-02-11":[],"2020-02-12":[],"2020-02-13":[],"2020-02-14":[],"2020-02-15":[],"2020-02-16":[],"2020-02-17":[],"2020-02-18":[],"2020-02-19":[],"2020-02-20":[],"2020-02-21":[],"2020-02-22":[],"2020-02-23":[],"2020-02-24":[],"2020-02-25":[],"2020-02-26":[],"2020-02-27":[],"2020-02-28":[],"2020-02-29":[],"2020-03-01":[],"2020-03-02":[],"2020-03-03":[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":1}],"2020-03-04":[{"label":"anwalt:2695","nb_uniq_visitors":1,"nb_visits":1,"nb_impressions":3,"nb_interactions":2,"interaction_rate":"66,67\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":1}]}

while

/?method=Contents.getContentNames&label=anwalt:2695&date=last30&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3
returns

{"2020-02-04":[],"2020-02-05":[],"2020-02-06":[],"2020-02-07":[],"2020-02-08":[],"2020-02-09":[],"2020-02-10":[],"2020-02-11":[],"2020-02-12":[],"2020-02-13":[],"2020-02-14":[],"2020-02-15":[],"2020-02-16":[],"2020-02-17":[],"2020-02-18":[],"2020-02-19":[],"2020-02-20":[],"2020-02-21":[],"2020-02-22":[],"2020-02-23":[],"2020-02-24":[],"2020-02-25":[],"2020-02-26":[],"2020-02-27":[],"2020-02-28":[],"2020-02-29":[],"2020-03-01":[],"2020-03-02":[],"2020-03-03":[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":39}],"2020-03-04":[]}

The UI shows 2 Impressions on 3.3.20 (4.3 is missing here, and no impr. before march) - and it's totally different from above's "label":"anwalt:2695","nb_visits":10,"nb_impressions":12

So in summary, this seems pretty useless. Do you know what's the suggested way to count content impressions and API-report them in matomo?

@tsteur
Copy link
Member

tsteur commented Mar 4, 2020

@sgiehl @diosmosis maybe any idea right away how UI and API might report different results?

@diosmosis
Copy link
Member

@feikede Does this only happen with dates that include today? Or does it happen w/ earlier dates? Does this affect the export link provided by the UI as well? Does it only affect ranges?

@tsteur I don't know why the UI would and API would report different results, seems very strange, but seems likely related to range archiving in some way.

@feikede
Copy link
Author

feikede commented Mar 5, 2020

Today
/?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e3..
still returns [], matching access_log entry is
2.x.y.180 - - [05/Mar/2020:08:41:45 +0100] "GET /?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=yy HTTP/1.1" 200 2 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0"

@diosmosis
/?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,2020-03-04&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..
also returns []

While
/?method=Contents.getContentNames&label=anwalt:2695&date=last5&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e3..
returns
{"2020-03-01":[],"2020-03-02":[],"2020-03-03":[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":39}],"2020-03-04":[{"label":"anwalt:2695","nb_uniq_visitors":1,"nb_visits":1,"nb_impressions":3,"nb_interactions":2,"interaction_rate":"66,67\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":145}],"2020-03-05":[]}
now.

@diosmosis How should I test the "export link" issue?

@feikede
Copy link
Author

feikede commented Mar 5, 2020

btw:
/?method=Contents.getContentNames&label=anwalt:2695&date=2020-03-03&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e returns
[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":39}]
maybe it's a period=range issue?

@diosmosis
Copy link
Member

@feikede at the bottom of each graph there is an icon that can be used to get an API URL for a report:

image

clicking on it would show:

image

The link it generates should be directly to the API, I'm wondering if this feature is broken for you as well for content reports.

maybe it's a period=range issue?

It very well could be, though it's still extremely strange, and I'm not immediately sure what could be causing it.

@feikede
Copy link
Author

feikede commented Mar 31, 2020

Today, on 2020-03-31, I have the following buggy behavior:

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,today&period=range&format=JSON
empty result

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,2020-03-30&period=range&format=JSON
empty result

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,2020-03-29&period=range&format=JSON
correct result ([{"label":"anwalt:6122","nb_visits":7,"nb_impressions":11,"nb_interactions":10,"sum_daily_nb_uniq_visitors":6,"interaction_rate":"90,91\u00a0%","segment":"contentName==anwalt%3A6122","idsubdatatable":844}])

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,2020-03-28&period=range&format=JSON
correct result ([{"label":"anwalt:6122","nb_visits":7,"nb_impressions":11,"nb_interactions":10,"sum_daily_nb_uniq_visitors":6,"interaction_rate":"90,91\u00a0%","segment":"contentName==anwalt%3A6122","idsubdatatable":844}])

etc.

I give up using this now, it's not reliable. Thanks for your help

@tsteur
Copy link
Member

tsteur commented Mar 31, 2020

Sorry to hear @feikede . We would be maybe able to have a look should we have full server access (incl db) but otherwise this be hard to identify the issue. We can't possibly imagine how this is happening since Matomo uses the same API's in the reporting UI as well. It does somehow suggest there is some issue somewhere else in the server setup but of course there could be also just as well be some edge case issue in Matomo.

@twi123
Copy link

twi123 commented Oct 1, 2020

Think i have similar issue with 3.14.1:
/index.php?module=Contents&action=getContentNames&idSite=6&period=day&date=2020-09-30&segment=&showtitle=1&random=4231
and I got this:

Content Name Visits Unique visitors Actions Actions per Visit Avg. Time on Website Bounce Rate Conversion Rate
Вася опубликовал новость от 13.13.2013 50% 3 50% 3 - - 0s 0% 0%
Публикация 50% 3 50% 3 - - 0s 0% 0%
Totals 100% 6 100% 6 - - - - -

Note on empty "actions" column

But with this /index.php?module=API&method=Contents.getContentNames&idSite=6&period=day&date=2020-09-30&segment=&format=HTML&token_auth=...
i got this:

label nb_uniq_visitors nb_visits nb_impressions nb_interactions interaction_rate _metadata _idSubtable
Вася опубликовал новость от 13.13.2013 3 3 6 11 183.33% 'segment' => 'contentName==%D0%92%D0%B0%D1%81%D1%8F+%D0%BE%D0%BF%D1%83%D0%B1%D0%BB%D0%B8%D0%BA%D0%BE%D0%B2%D0%B0%D0%BB+%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D1%8C+%D0%BE%D1%82+13.13.2013' 1
Публикация 3 3 5 6 120% 'segment' => 'contentName==%D0%9F%D1%83%D0%B1%D0%BB%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D1%8F' 2

Note on nb_interactions column (nevermind of too much interactions -- result of manual _paq.push('trackContentInteractions'...) calls via js console for testing purposes)

As I can see, a call method getContentNames via module Contents returns different results with calling method Contents.getContentNames from module API. Is it normal? Or I miss something?

@twi123
Copy link

twi123 commented Oct 1, 2020

Or it's just something like a mess in column names in report UI (i.e. nb_actions instead of nb_interactions)?

@tsteur
Copy link
Member

tsteur commented Oct 1, 2020

fyi replied in #16505

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

5 participants