@tsteur opened this Issue on August 25th 2021 Member

Seeing this in the tests


and an error in the UI


Seems this regressed.

@sgiehl commented on August 25th 2021 Member

That actually seems to fail randomly

@Findus23 commented on August 25th 2021 Member

This is more or less expected as it is crawling websites that randomly block or rate limit the requests.

@tsteur commented on August 25th 2021 Member

@Findus23 so maybe it's now also blocked on Travis because we run tests more often? In that case we might need to write the tests slightly differently: ideally detect the rate limit and ignore it (skip the test or so), but still fail when the data or URL changes (and not fail for any other error).

To fix it on Cloud, maybe we would need to request the data using JavaScript so it goes directly to the search engines? But I assume it might then be a bit harder to test that things work?

@Findus23 commented on August 26th 2021 Member

Then again, I thought we had changed the tests to run against a local copy of the HTTP response, but I might be wrong here.

@sgiehl commented on August 26th 2021 Member

I guess there are still some tests that try to fetch the data from Bing, Alexa and Google to ensure the response doesn't change.

@tsteur commented on August 26th 2021 Member

Created an internal issue for Cloud to fix the rate limiting there, where we might simply try to get the data using JS directly from the search engine or so.

Meanwhile, as part of this issue it would be great to look into making the test stable. For example, we could force it to run into the rate limit and then check whether the response code or response message includes rate-limit information; then we can catch this error specifically and ignore it in the tests.
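A minimal sketch of that idea (hypothetical; the function and thresholds are made up for illustration, not the actual SEOTest code): classify a rate-limited response as "skip" rather than "fail", so only real changes in the data or URL fail the test.

```php
<?php
// Sketch of the idea (helper names are made up): treat a rate-limited
// response as "skip", not "fail", and only fail on real changes.
function classifySeoResponse(int $status, string $body): string
{
    // HTTP 429 (Too Many Requests) or a rate-limit hint in the body means
    // the search engine throttled us, not that the integration is broken.
    if ($status === 429 || stripos($body, 'rate limit') !== false) {
        return 'skip';   // in PHPUnit: $this->markTestSkipped(...)
    }
    if ($status !== 200 || $body === '') {
        return 'fail';   // unexpected response or changed URL/data format
    }
    return 'pass';
}

var_dump(classifySeoResponse(429, ''));                       // string(4) "skip"
var_dump(classifySeoResponse(200, 'Throttled: rate limit'));  // string(4) "skip"
var_dump(classifySeoResponse(200, 'About 1,000 results'));    // string(4) "pass"
```

In PHPUnit, the "skip" branch would call `$this->markTestSkipped()`, which reports the test as skipped instead of failed.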

@geekdenz commented on September 14th 2021 Contributor

"That actually seems to fail randomly"

You are right, @sgiehl! That was the clue.

See https://github.com/matomo-org/matomo/blob/41b8240fcf79291dbabb2e30a38f321d44b64cf5/plugins/SEO/tests/Integration/SEOTest.php#L42

I debugged it and it seems it randomly fails on the User Agent that is passed.

So, I wrote this script:

Ran it and found that it outputs an empty search result on offset $i % 3 === 2:


So, it seems to be this UA that it does not like:


I'll change it to a different one, as Bing doesn't seem to work with that one. Also, I think it should maybe not be completely random and should instead be different on every subsequent call. But I realise that might be hard to do. Could it be $build_no % count($user_agents)?
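A hypothetical reconstruction of such a debug loop (the original script and its output weren't mirrored; the first two user agents here are stand-ins, only the Chrome one is the UA quoted in the next comment): with three user agents, every offset where $i % 3 === 2 maps to the same, problematic, user agent.

```php
<?php
// Hypothetical debug loop: cycle through the candidate user agents to
// see which offsets come back empty. With three UAs, the offsets where
// $i % 3 === 2 all hit the same (problematic) user agent.
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
];

for ($i = 0; $i < 9; $i++) {
    $ua = $userAgents[$i % count($userAgents)];
    // ... request Bing here with $ua set as the User-Agent header
    // and print whether the result set comes back empty ...
    echo $i, ': ', substr($ua, 0, 40), '...', PHP_EOL;
}

// The deterministic alternative mentioned above: derive the index from
// the CI build number instead of mt_rand(), so each build gets a
// different but reproducible user agent.
$buildNo  = (int) (getenv('TRAVIS_BUILD_NUMBER') ?: '0');
$chosenUa = $userAgents[$buildNo % count($userAgents)];
```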

@geekdenz commented on September 14th 2021 Contributor

FYI I was able to reproduce this in Safari changing the UA string to Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36

@justinvelluppillai commented on September 14th 2021 Member

@geekdenz nice find. I don't know if mt_rand is great in a test either; I wonder if it could either use each of the user agents one after the other, or do what @tsteur suggested and keep trying a different user agent when one fails (depending on whether there's a reason to test different user agents or not).

@geekdenz commented on September 14th 2021 Contributor

I think using the same user agent every time would suffice actually.

@geekdenz commented on September 14th 2021 Contributor

@sgiehl @tsteur

Why are we using an array of user agents here? I feel like there would have been an original reason/issue with just using one. Or was it just added to avoid a problem that was merely anticipated?

@sgiehl commented on September 15th 2021 Member

@geekdenz The code to query the metrics uses $_SERVER['HTTP_USER_AGENT'], i.e. the user agent of the current user. I wonder if it might make sense to check whether the results are empty and retry the query with another user agent, so the results are fetched correctly even if the user agent is not supported 🤔
Using random user agents in tests sounds valid; otherwise we might not have discovered this. But it would be good to comment somewhere in the code that Bing might not return results for some user agents.
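A sketch of that retry idea, assuming illustrative names (this is not Matomo's actual API; the fetcher is passed in as a callable so the behaviour can be simulated): if the query comes back empty for the visitor's user agent, retry once with a user agent known to work.

```php
<?php
// Sketch of the fallback idea (names are illustrative, not Matomo's API):
// if Bing returns no results for the visitor's user agent, retry once
// with a fallback user agent before giving up.
const FALLBACK_USER_AGENT =
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0';

function fetchRankWithFallback(callable $fetchRank, string $url, string $userAgent): ?int
{
    $rank = $fetchRank($url, $userAgent);
    if ($rank === null && $userAgent !== FALLBACK_USER_AGENT) {
        // Bing might not return results for some user agents (this issue),
        // so retry with the fallback before giving up.
        $rank = $fetchRank($url, FALLBACK_USER_AGENT);
    }
    return $rank;
}

// Simulated fetcher: pretend Bing only answers for the fallback UA.
$fetch = fn (string $url, string $ua): ?int => $ua === FALLBACK_USER_AGENT ? 42 : null;
var_dump(fetchRankWithFallback($fetch, 'https://matomo.org', 'SomeExoticBrowser/1.0')); // int(42)
```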

@justinvelluppillai commented on September 15th 2021 Member

@sgiehl @geekdenz I think the test should either test a few different UAs or just one, rather than a random one. If we test one, we could use that same one with confidence as the backup user agent to retry with on empty results. Maybe worth making that a separate issue; I've merged this one for now to prevent the failing test in the meantime.

@tsteur commented on September 17th 2021 Member

@geekdenz this still seems to be failing randomly see https://app.travis-ci.com/github/matomo-org/matomo/jobs/537951969#L1130-L1130

or am I looking wrong? The test ran after the other PR was merged AFAIK

@geekdenz commented on September 17th 2021 Contributor

Yeah, thanks for pointing it out. I saw this earlier as well but thought it might be an artifact of a merge not having happened somewhere. I think maybe what I found just looked that way in Safari. I'll have another look in Chrome. And then I might reinstate the array and make it retry with another UA if the test fails on the first, etc., and only have it throw a failure if all of them fail.

@justinvelluppillai commented on October 5th 2021 Member

I think it's possible for this to fail if the actual search results on Bing contain the word "results". It's a rare case; it maybe happens occasionally when microsoft.com publishes new company "results" for investors, or at random. Perhaps a solution would be to make the regex that searches for the number of search results more restrictive. The change from @geekdenz at least prevents the test failing 1 in 3 times and may also be sufficient to close this.
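For example, instead of matching the word "results" anywhere in the page, the pattern could require a count directly in front of it. A sketch under that assumption (the function name and the exact Bing markup are hypothetical):

```php
<?php
// Sketch: only match "results" when it is preceded by a count,
// e.g. "12,300,000 results", instead of matching the bare word
// anywhere in the page (which could appear inside a search snippet).
function extractResultCount(string $html): ?int
{
    if (preg_match('/([\d,.]+)\s+results/i', $html, $matches)) {
        return (int) str_replace([',', '.'], '', $matches[1]);
    }
    return null;
}

// The word "results" inside a search snippet no longer causes a match:
var_dump(extractResultCount('Microsoft publishes quarterly results')); // NULL
var_dump(extractResultCount('About 12,300,000 results'));              // int(12300000)
```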

@peterhashair commented on October 11th 2021 Contributor

I had a discussion with @justinvelluppillai about this one; we suggest printing the IP when tests fail on Travis. If the same IP keeps failing, it's probably because that IP is already blocked. Let me know if there is a security risk in printing Travis's IP.
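A minimal sketch of how that could look (hypothetical; api.ipify.org is one public "what is my IP" service, any equivalent endpoint would do):

```php
<?php
// Sketch: include the CI runner's public IP in the failure output so a
// repeatedly failing (i.e. possibly blocked) IP becomes visible in the
// Travis log. Falls back to 'unknown' if the lookup itself fails.
function describeRunnerIp(): string
{
    $ip = @file_get_contents('https://api.ipify.org');
    return $ip !== false ? trim($ip) : 'unknown';
}

// e.g. inside the test, on failure:
// $this->fail('SEO request failed from IP ' . describeRunnerIp());
echo 'Runner IP: ', describeRunnerIp(), PHP_EOL;
```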

This Issue was closed on October 11th 2021