
SEO Ranking plugin not working - problem with ping? - Fix the random test failures #17919

Closed
tsteur opened this issue Aug 25, 2021 · 17 comments · Fixed by #17993, #18109, #18125 or #18142
Labels: Bug (for errors / faults / flaws / inconsistencies etc.), not-in-changelog (for issues or pull requests that should not be included in our release changelog on matomo.org), Regression (indicates a feature used to work in a certain way but it no longer does even though it should)

Comments

@tsteur (Member) commented Aug 25, 2021

Seeing this in the tests:

[screenshot]

and an error in the UI:

[screenshot]

This seems to have regressed.

@tsteur added the Bug and Regression labels Aug 25, 2021
@tsteur tsteur added this to the 4.5.0 milestone Aug 25, 2021
@sgiehl (Member) commented Aug 25, 2021

That actually seems to fail randomly

@Findus23 (Member) commented
This is more or less expected as it is crawling websites that randomly block or rate limit the requests.

@tsteur (Member, Author) commented Aug 25, 2021

@Findus23 so maybe it's now also being blocked on Travis because we run the tests more often? In that case we may need to write the tests slightly differently: ideally detect the rate limit and ignore it (skip the test or similar), but still fail when the data or URL changes (i.e. fail for any other error).

To fix it on cloud, maybe we would need to request the data using JavaScript and go directly to the search engines? But I assume that might make it a bit harder to test that things work.

@Findus23 (Member) commented
Then again, I thought we had changed the tests to run against a local copy of the HTTP response, but I might be wrong here.

@sgiehl (Member) commented Aug 26, 2021

I guess there are still some tests that try to fetch the data from Bing, Alexa and Google to ensure the response doesn't change.

@tsteur (Member, Author) commented Aug 26, 2021

I created an internal issue for cloud to fix the rate limiting there, where we might simply try to get the data using JS directly from the search engine.

Meanwhile, as part of this issue it would be great to look into making the test stable. For example, we could force the test to run into the rate limit and then check whether the response code or response message includes rate-limit information; we could then catch that error specifically and ignore it in the tests.
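The "detect the rate limit and skip" idea could look roughly like the sketch below. `looksRateLimited()` is a hypothetical helper, not existing Matomo code, and the status codes and phrases it checks are illustrative assumptions:

```php
<?php
// Hypothetical helper: decide whether an HTTP response looks like a rate
// limit, so the test can be skipped instead of failing.
function looksRateLimited(int $httpCode, string $body): bool
{
    // Typical "blocked / throttled" status codes.
    if (in_array($httpCode, [403, 429, 503], true)) {
        return true;
    }
    // Some search engines return 200 with a block page instead of an error code.
    return (bool) preg_match('/rate limit|too many requests|unusual traffic/i', $body);
}

// In a PHPUnit test this could then be used roughly like:
//   if (looksRateLimited($httpCode, $response)) {
//       $this->markTestSkipped('Search engine rate-limited the request.');
//   }
```

Any other error (changed data, changed URL) would fall through and still fail the test as before.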

@geekdenz geekdenz self-assigned this Sep 14, 2021
@geekdenz (Contributor) commented
> That actually seems to fail randomly

You are right, @sgiehl! That was the clue.

See:

```php
$_SERVER['HTTP_USER_AGENT'] = $user_agents[mt_rand(0, count($user_agents) - 1)];
```

I debugged it, and it seems it randomly fails depending on the user agent that is passed.

So I wrote this script:
https://gist.github.com/geekdenz/2496eaaf7c437ba49bc389e75a10b880

I ran it and found that it outputs an empty search result at every offset where `$i % 3 === 2`:
[screenshot]
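The debugging loop can be sketched as follows (the real script is in the gist linked above; `findFailingOffsets()` and the `$fetch` callable are hypothetical stand-ins for the actual Bing request):

```php
<?php
// Cycle through the user agents and record the offsets at which the
// fetch came back empty.
function findFailingOffsets(array $userAgents, callable $fetch, int $iterations): array
{
    $failing = [];
    for ($i = 0; $i < $iterations; $i++) {
        $ua = $userAgents[$i % count($userAgents)];
        if ($fetch($ua) === '') {
            $failing[] = $i;
        }
    }
    return $failing;
}

// With three UAs where the third one is blocked, every offset with
// $i % 3 === 2 fails, matching the observation above.
$stubFetch = fn(string $ua): string => $ua === 'blocked-ua' ? '' : 'some results';
$offsets = findFailingOffsets(['ua-1', 'ua-2', 'blocked-ua'], $stubFetch, 9);
// $offsets is [2, 5, 8]
```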

So it seems to be this UA that it does not like:

`Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36`

I'll change it to a different one, as Bing seems not to work with that one. Also, I think it should maybe not be completely random, and should instead be different on every subsequent call. But I realise that might be hard to do. Could it be `$build_no % count($user_agents)`?
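The `$build_no % count($user_agents)` idea would make the choice deterministic. A minimal sketch, assuming the build number is available (e.g. via Travis' `TRAVIS_BUILD_NUMBER` environment variable; `pickUserAgent()` is a hypothetical helper, not current code):

```php
<?php
// Deterministic selection: the same build always gets the same UA, and
// consecutive builds walk through the whole list, unlike mt_rand().
function pickUserAgent(array $userAgents, int $buildNo): string
{
    return $userAgents[$buildNo % count($userAgents)];
}

// Usage sketch: getenv() returns false when the variable is unset,
// which (int) turns into 0, i.e. the first UA in the list.
$agents = ['ua-1', 'ua-2', 'ua-3'];
$ua = pickUserAgent($agents, (int) getenv('TRAVIS_BUILD_NUMBER'));
```

This keeps the coverage benefit of rotating through all UAs across builds while making any failure reproducible from the build number.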

@geekdenz (Contributor) commented
FYI, I was able to reproduce this in Safari by changing the UA string to `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36`:

[screenshot]

geekdenz pushed a commit that referenced this issue Sep 14, 2021
@justinvelluppillai (Contributor) commented
@geekdenz nice find. I don't know if `mt_rand` is great in a test either. I wonder if it could either use each of the user agents one after the other, or do what @tsteur suggested and keep trying a different user agent when one fails (depending on whether there is a reason to test different user agents or not).

@geekdenz (Contributor) commented
I think using the same user agent every time would suffice, actually.

@geekdenz (Contributor) commented
@sgiehl @tsteur

Why are we using an array of user agents here? I feel like there must have been an original reason/issue with using just one. Or was it added to avoid a problem that was merely anticipated?

@sgiehl (Member) commented Sep 15, 2021

@geekdenz The code that queries the metrics uses `$_SERVER['HTTP_USER_AGENT']`, i.e. the user agent of the current user. I wonder if it might make sense to check whether the results are empty and retry the query with another user agent, so the results are fetched correctly even if the user agent is not supported 🤔
Using random user agents in tests sounds valid; otherwise we might not have discovered this. But it would be good to comment somewhere in the code that Bing might not return results for some user agents.
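The retry-on-empty idea could be sketched like this; `fetchMetricsWithFallback()` and the `$fetch` callable are hypothetical stand-ins for the actual metrics-fetching code, not Matomo internals:

```php
<?php
// Try each user agent in order until one yields a non-empty result.
// $fetch would perform the actual HTTP request with the given UA.
function fetchMetricsWithFallback(callable $fetch, array $userAgents): string
{
    foreach ($userAgents as $ua) {
        $result = $fetch($ua);
        if ($result !== '') {
            return $result; // first user agent that yields results wins
        }
    }
    return ''; // every user agent came back empty
}

// Usage sketch: try the visitor's own UA first, then fall back to a
// known-good one (whichever UA the tests end up pinning down).
```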

justinvelluppillai pushed a commit that referenced this issue Sep 15, 2021
* change bogus user agent that fails on bing.com

fixes #17919

* only use one user agent for seo test
@justinvelluppillai (Contributor) commented
@sgiehl @geekdenz I think the test should either exercise a few different UAs or just one, rather than a random one. If we test one, we could use that same one with confidence as the backup user agent to retry with on empty results. Maybe worth raising that as a separate issue; I've merged this one for now to stop the failing test in the meantime.

@tsteur tsteur reopened this Sep 17, 2021
@tsteur (Member, Author) commented Sep 17, 2021

@geekdenz this still seems to be failing randomly, see https://app.travis-ci.com/github/matomo-org/matomo/jobs/537951969#L1130-L1130

[screenshot]

Or am I looking at this wrong? The test ran after the other PR was merged, AFAIK.

@geekdenz (Contributor) commented
Yeah, thanks for pointing it out. I saw this earlier as well, but thought it might be an artifact of a merge not having happened somewhere. Maybe what I found only looked like the cause in Safari; I'll have another look in Chrome. Then I might reinstate the array and make it retry with another UA when the first one fails, and only report a failure if all of them fail.

@justinvelluppillai (Contributor) commented
I think it's possible for this to fail if the actual search results on Bing contain the word "results". It's a rare case; it might happen occasionally when microsoft.com publishes new company "results" for investors, or at random. Perhaps a solution would be to make the regex that searches for the number of search results more restrictive. The change from @geekdenz at least prevents the test failing 1 in 3 times and may also be sufficient to close this.
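A more restrictive regex could, for example, require a number (optionally with thousands separators) to sit directly before the word "results". This is only an illustrative sketch, not the actual Matomo implementation, and a snippet like "2021 results" could still slip through:

```php
<?php
// Hypothetical extractor: only match when digits immediately precede
// the word "results", so a bare "results" elsewhere on the page
// (e.g. company "results" in a snippet) cannot match on its own.
function extractBingResultCount(string $html): ?int
{
    if (preg_match('/(\d[\d.,]*)\s+results/i', $html, $m)) {
        // Strip both "," and "." so "1,230,000" and "1.230.000" both parse.
        return (int) str_replace([',', '.'], '', $m[1]);
    }
    return null; // no recognisable result count on the page
}
```

Anchoring the match to the specific HTML element Bing renders the count in would be more robust still, but that depends on page markup that changes over time.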

@justinvelluppillai justinvelluppillai modified the milestones: 4.5.0, 4.6.0 Oct 7, 2021
@tsteur tsteur changed the title SEO Ranking plugin not working - problem with ping? SEO Ranking plugin not working - problem with ping? - Fix the random test failures Oct 7, 2021
@peterhashair peterhashair self-assigned this Oct 7, 2021
@tsteur tsteur reopened this Oct 8, 2021
@peterhashair (Contributor) commented
I had a discussion with @justinvelluppillai about this one. We suggest printing the IP when tests fail on Travis; if the same IP keeps failing, it's probably because that IP is already blocked. Let me know if there is a security risk in printing the IP of the Travis worker.

@peterhashair peterhashair mentioned this issue Oct 12, 2021
@justinvelluppillai added the not-in-changelog label Nov 29, 2021