I've added an alternative method for fetching the global ranking from Alexa.
On the other hand, calling http://data.alexa.com/data?cli=10&url=DOMAIN currently works for me. I did hit this issue last month, though, so I do not know whether they are blacklisting or rate-limiting certain IP ranges.
Finally, if someone inadvertently calls this new method too often, they will probably end up being blacklisted.
Many thanks for your contribution. Can you explain what the fallback does?
I still only get "Okay" when I request http://data.alexa.com/data?cli=10&url=example.com
First, if an exception occurs, as in your case, it sends an HTTP request to the website's public Alexa ranking page, for instance https://www.alexa.com/siteinfo/example.com, and then matches the global and local ranking rows.
Since that HTML is kinda dirty, it first replaces each run of whitespace with a single space, so what we get is something like the following.
<strong class="metrics-data align-vmiddle">19,460</strong> <strong class="metrics-data align-vmiddle">6,517</strong>
Finally, it matches the content of the strong element with the class metrics-data align-vmiddle, which is 19,460 here, and returns it as an integer, 19460.
If it can't match anything, for instance if the Amazon/Alexa developers decide to change the class attribute of the strong element, it returns null.
BTW, this is my output right now, queried from a Turkish IP address. Is the OK message also in XML format? I can't remember.
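To make the steps above concrete, here is a minimal Python sketch of the same idea (the actual change is not written in Python). The function name `parse_global_rank` and the pattern are illustrative assumptions. The class string is taken from the snippet quoted above and may no longer match the live page.

```python
import re
from typing import Optional

# Assumption: the class string is the one quoted in this thread.
METRIC_RE = re.compile(
    r'<strong class="metrics-data align-vmiddle">\s*(\d[\d,]*)\s*</strong>'
)


def parse_global_rank(html: str) -> Optional[int]:
    """Return the first metrics value as an int, or None if nothing matches."""
    # Collapse every run of whitespace so the pattern never spans newlines.
    flat = re.sub(r"\s+", " ", html)
    match = METRIC_RE.search(flat)
    if match is None:
        # Markup changed (or the page was not served): signal "no rank".
        return None
    # Drop the thousands separators before converting: "19,460" becomes 19460.
    return int(match.group(1).replace(",", ""))
```

Returning `None` rather than raising keeps a markup change on Alexa's side from breaking the caller, which mirrors the null return described above.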
<?xml version="1.0" encoding="UTF-8"?> <!-- Need more Alexa data? Find our APIs here: https://aws.amazon.com/alexa/ --> <ALEXA VER="0.9" URL="example.com/" HOME="0" AID="=" IDN="example.com/"> <SD><POPULARITY URL="example.com/" TEXT="19460" SOURCE="panel"/><REACH RANK="16581"/><RANK DELTA="-3629"/><COUNTRY CODE="IN" NAME="India" RANK="6517"/></SD></ALEXA>
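For comparison, a response like the one above could be read with Python's standard library roughly as follows. This is only a sketch: `parse_alexa_xml` is a hypothetical helper, and it assumes the element and attribute names shown in that output (`POPULARITY TEXT` for the global rank, `COUNTRY RANK` for the local one). A plain-text body such as "Okay" is not XML, so it is treated as "no data".

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a digits-only attribute value to int, else None."""
    return int(value) if value and value.isdigit() else None


def parse_alexa_xml(body: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (global_rank, country_rank); either is None when absent."""
    try:
        # Bytes input lets the parser honour the encoding declaration.
        root = ET.fromstring(body.encode("utf-8"))
    except ET.ParseError:
        # Non-XML bodies (for example a bare "Okay") end up here.
        return None, None
    popularity = next(root.iter("POPULARITY"), None)
    country = next(root.iter("COUNTRY"), None)
    global_rank = _to_int(popularity.get("TEXT")) if popularity is not None else None
    country_rank = _to_int(country.get("RANK")) if country is not None else None
    return global_rank, country_rank
```

A caller could use a `(None, None)` result as the trigger for the HTML fallback.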
Ah, I misunderstood your code. Now it makes sense, and at least according to https://stackoverflow.com/questions/50279057/alexa-site-rank-api your solution is the only way that still works (you could maybe try https://www.alexa.com/minisiteinfo/stackoverflow.com, as it is probably simpler).
I have now tried from multiple networks, including a university network and IPv6, and it never worked for me.
Updated the URL. It now uses DOMXPath to pick out the node value, if that is OK, since a regex for that value would be complex and ugly.
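As an illustration of why a parser-based lookup is easier to read than a regex, here is a rough Python stand-in that uses the standard library's `html.parser` instead of DOMXPath. The names `MetricsCollector` and `first_rank` are hypothetical. It assumes the value still sits in a `strong` element carrying the `metrics-data` and `align-vmiddle` classes, as in the earlier snippet; the markup of the minisiteinfo page may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional


class MetricsCollector(HTMLParser):
    """Collect the text of every <strong> that carries both target classes."""

    # Assumption: class names copied from the snippet quoted earlier.
    TARGET_CLASSES = {"metrics-data", "align-vmiddle"}

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.values: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "strong" and self.TARGET_CLASSES <= classes:
            self._inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "strong":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.values[-1] += data


def first_rank(html: str) -> Optional[int]:
    """Return the first collected value as an int, or None if none parses."""
    collector = MetricsCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.values:
        digits = raw.replace(",", "").strip()
        if digits.isdigit():
            return int(digits)
    return None
```

Matching on the set of classes rather than the literal attribute string means extra classes or a different class order do not break the lookup, and untidy whitespace needs no pre-processing.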