
Added Fallback Method for Alexa in SEO Plugin #13552

Merged: 4 commits merged into matomo-org:3.x-dev on Dec 4, 2018

ozdemirburak (Contributor) commented Oct 7, 2018

fixes #13427

Greetings,

I've added an alternative method for fetching the global ranking from Alexa.

That said, calling http://data.alexa.com/data?cli=10&url=DOMAIN currently works for me, but I ran into this issue last month, so I don't know whether they are blacklisting or rate-limiting certain IP ranges.

Finally, anyone who calls this new method too frequently will probably end up blacklisted.
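A minimal sketch of how such a fallback can be wired, in Python rather than the plugin's PHP. It assumes the primary source is the data.alexa.com XML endpoint and the alternative is a scrape of the public page, and that the fallback is used only when the primary raises. The names `get_alexa_rank`, `RankFetcher` and `unavailable` are hypothetical; the fetchers are injected so the control flow can be shown without network access.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes a domain, returns a rank (or None),
# and raises when its source is unavailable or returns something unparsable.
RankFetcher = Callable[[str], Optional[int]]


def get_alexa_rank(domain: str, primary: RankFetcher, fallback: RankFetcher) -> Optional[int]:
    """Try the primary source first; use the fallback only if the primary raises."""
    try:
        return primary(domain)
    except Exception:
        pass  # e.g. the XML endpoint answered "Okay" instead of XML
    try:
        return fallback(domain)
    except Exception:
        return None  # both sources failed: report "no rank" rather than crash


def unavailable(domain: str) -> Optional[int]:
    """Stand-in for a source that is currently failing (used in the example)."""
    raise RuntimeError(f"no data for {domain}")


# Usage: the XML source fails, so the scraped value is used.
print(get_alexa_rank("example.com", unavailable, lambda d: 19460))  # prints 19460
```

Because the fallback only runs after a failure, a healthy primary endpoint never triggers the page scrape, which limits how often the public page is requested.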

Findus23 added the Help wanted and Needs Review labels on Oct 9, 2018
Findus23 (Member) commented Oct 9, 2018

Hi,

Many thanks for your contribution. Can you explain what the fallback does?

I still only get Okay when I request http://data.alexa.com/data?cli=10&url=example.com

ozdemirburak (Contributor Author) commented Oct 9, 2018

Hello,

First, if an exception occurs, as in your case, it sends an HTTP request to the website's public Alexa ranking page, for instance https://www.alexa.com/siteinfo/example.com, and then matches the global ranking and local ranking rows.

Since that HTML is rather messy, it first replaces runs of whitespace with a single space, which gives something like the following:

<strong class="metrics-data align-vmiddle">19,460</strong>
<strong class="metrics-data align-vmiddle">6,517</strong>

Finally, it matches the text between the strong tags with the class metrics-data align-vmiddle, which is 19,460 here, and returns it as the integer 19460.

If nothing matches, for instance because the Amazon/Alexa developers change the strong element's class attribute, it returns null.
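A minimal Python sketch of the regex approach described above (the plugin itself is PHP, and the function names here are hypothetical): collapse whitespace, match the strong elements with the class metrics-data align-vmiddle, strip the thousands separators, and return the first value, or None when nothing matches.

```python
import re
from typing import List, Optional

# Class value taken from the snippet in this comment; Alexa may change it at any time.
METRIC_RE = re.compile(
    r'<strong class="metrics-data align-vmiddle">\s*(\d[\d,]*)\s*</strong>'
)


def parse_ranks(html: str) -> List[int]:
    """Return all matched metrics as ints, in page order (global, then local)."""
    # Collapse runs of whitespace so attributes and text split across lines still match.
    normalized = re.sub(r"\s+", " ", html)
    return [int(value.replace(",", "")) for value in METRIC_RE.findall(normalized)]


def parse_global_rank(html: str) -> Optional[int]:
    """First metric on the page, or None if nothing matched (PHP's null)."""
    ranks = parse_ranks(html)
    return ranks[0] if ranks else None


page = """<strong class="metrics-data
          align-vmiddle">
   19,460 </strong>
<strong class="metrics-data align-vmiddle">6,517</strong>"""
print(parse_ranks(page))        # [19460, 6517]
print(parse_global_rank(page))  # 19460
```

The whitespace normalization is what lets the first element match even though its class attribute is split across lines.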

BTW, this is my output right now, queried from a Turkish IP address. Is the "Okay" message you get also in XML format? I can't remember.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Need more Alexa data?  Find our APIs here: https://aws.amazon.com/alexa/ -->
<ALEXA VER="0.9" URL="example.com/" HOME="0" AID="=" IDN="example.com/">
<SD><POPULARITY URL="example.com/" TEXT="19460" SOURCE="panel"/><REACH RANK="16581"/><RANK DELTA="-3629"/><COUNTRY CODE="IN" NAME="India" RANK="6517"/></SD></ALEXA>
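For comparison, the primary data.alexa.com response above can be read with an XML parser rather than a regex. This Python sketch (the function name `parse_alexa_xml` is hypothetical, not the plugin's PHP) takes the global rank from the TEXT attribute of POPULARITY and returns None when the body is not XML, such as the plain "Okay" reported earlier in the thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The response quoted above, used as sample input.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- Need more Alexa data?  Find our APIs here: https://aws.amazon.com/alexa/ -->
<ALEXA VER="0.9" URL="example.com/" HOME="0" AID="=" IDN="example.com/">
<SD><POPULARITY URL="example.com/" TEXT="19460" SOURCE="panel"/><REACH RANK="16581"/><RANK DELTA="-3629"/><COUNTRY CODE="IN" NAME="India" RANK="6517"/></SD></ALEXA>"""


def parse_alexa_xml(payload: bytes) -> Optional[int]:
    """Global rank from a data.alexa.com/data?cli=10 response, or None."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return None  # e.g. a plain-text "Okay" body instead of XML
    popularity = root.find("./SD/POPULARITY")
    text = popularity.get("TEXT") if popularity is not None else None
    return int(text) if text and text.isdecimal() else None


print(parse_alexa_xml(SAMPLE))   # 19460
print(parse_alexa_xml(b"Okay"))  # None
```

In this sample, the COUNTRY element's RANK (6517) is the local rank, which corresponds to the second strong element in the scraped HTML.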

mattab added this to the 3.7.0 milestone on Oct 9, 2018
Findus23 (Member) commented

Ah, I misunderstood your code. Now it makes sense, and at least according to https://stackoverflow.com/questions/50279057/alexa-site-rank-api your solution is the only approach that still works (you could maybe try https://www.alexa.com/minisiteinfo/stackoverflow.com, as it is probably simpler).

I have now tried from multiple networks, including a university network and IPv6, and it never worked for me.

ozdemirburak (Contributor Author) commented

Updated the URL, and I'm now using DOMXPath (if that's OK) to extract the node value, since a regex for that value would need to be complex and hard to read.
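A rough Python analogue of the DOM-based approach, not the merged PHP code: the thread does not show the minisiteinfo markup or the XPath expression that was committed, so the tag and class names below are reused from the earlier siteinfo snippet purely as an assumption. Python's standard library has no XPath for HTML, so this sketch walks the elements with `html.parser.HTMLParser` instead; `ClassTextCollector` and `extract_global_rank` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Optional


class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> whose class list contains all wanted tokens."""

    def __init__(self, tag: str, class_tokens: Iterable[str]):
        super().__init__()
        self.tag = tag
        self.wanted = frozenset(class_tokens)
        self.depth = 0  # > 0 while inside a matching element
        self.buffer: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:  # same tag nested inside a match
            self.depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:  # token match: order and spacing don't matter
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_global_rank(html: str) -> Optional[int]:
    """First matching element whose text is a (comma-grouped) number, else None."""
    collector = ClassTextCollector("strong", ["metrics-data", "align-vmiddle"])
    collector.feed(html)
    collector.close()
    for text in collector.results:
        digits = text.replace(",", "")
        if digits.isdecimal():
            return int(digits)
    return None


sample_html = ('<div><strong class="align-vmiddle  metrics-data">\n 19,460 \n</strong>'
               '<strong class="metrics-data align-vmiddle">6,517</strong></div>')
print(extract_global_rank(sample_html))  # 19460
```

Matching class tokens instead of the literal attribute string is the main gain over the regex: extra whitespace or a reordered class list no longer breaks the lookup.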

diosmosis merged commit 12b522d into matomo-org:3.x-dev on Dec 4, 2018
diosmosis (Member) commented

Works for me, thanks for the great contribution @ozdemirburak !

ozdemirburak deleted the alexa-fallback-method branch on December 4, 2018 23:21
sgiehl pushed a commit that referenced this pull request Dec 6, 2018
* added fallback method for Alexa, fixes issue #13427

* do not use short array syntax for consistency with other methods

* use mini link for Alexa, use DomXPath to filter out the global ranking instead of regex
diosmosis pushed a commit that referenced this pull request Dec 8, 2018
* Add reports dimensions to metadata of report and rows

* translate dimension columns

* updates test files

* fix possible error when no report is available

* update tests

* Improve subdimension detection

* Adjust tests for labelX logic

* Makes flattener compatible with 3 dimensions

* Adds new method getThirdLeveltableDimension to report class

* Do not ask for 2fa authentication code when CoreUpdater is being requested (#13796)

Could fix an edge case where user is logged in, but hasn't confirmed the auth code (so the user is not actually logged in), and then an update appears.

* Added Fallback Method for Alexa in SEO Plugin (#13552)

* added fallback method for Alexa, fixes issue #13427

* do not use short array syntax for consistency with other methods

* use mini link for Alexa, use DomXPath to filter out the global ranking instead of regex

* Use db sessions by default, deprecate file session handler (#13540)

* use db sessions by default, deprecate file session handler

* trying to fix tests

* Prevent trigger errors on demand for instances that are opened to anonymous (#13535)

fix #13513

* Remove the previous exception in base validator so the same error is not printed twice (#13801)

* Fixing build  (#13802)

* update submodule

* Update screenshots and try to get test to pass.

* Get SingleMetricView to pass. (#13803)

* Quickform2 throws warnings with PHP7.2 (#13463)

fixes #13272

Haven't actually tested it, but it should fix the issue. If tests pass, the logic is still the same. I don't have PHP 7.2 running here at the moment.

* Send bulk requests in chunks when needed (#13444)

* send bulk requests in chunks

* send requests correctly

* Make log and report data screen less technical (#13464)

* When you are logged out, the URL gets lost when you log in (#13441)

It won't remember any hash as the hash won't be visible in the referrer etc but it would work for most other pages.

To make it work for the hash, it would likely get much more complicated: we would need to persist it through JS, temporarily store it somewhere and redirect accordingly. It fixes the case mentioned in the issue.

fix #13328

* show full information of URL only on extra click (#13585)

* Add option to opt in to use send beacon (#13451)

* Add option to opt in to use send beacon

* Fix JS tracker test.

* do not overwrite existing subrow metadata

* update test files
Labels: Help wanted, Needs Review
Development: successfully merging this pull request may close the issue "SEO plugin Alexa broken".
4 participants