@brandtmark opened this Issue on January 14th 2022

Our websites are accessed externally and internally (with an IP from the local network range).
GeoIp is installed and enabled.

Despite enable_language_to_country_guess=0 being set, the geolocation information for users from the local network is derived from their browser language and is therefore completely wrong.

This seems to happen in Common::extractCountryCodeFromBrowserLanguage and I’ve got the feeling that this function does not respect the config value correctly.

public static function extractCountryCodeFromBrowserLanguage($browserLanguage, $validCountries, $enableLanguageToCountryGuess)
{
    /** @var LanguageDataProvider $dataProvider */
    $dataProvider = StaticContainer::get('Piwik\Intl\Data\Provider\LanguageDataProvider');

    $langToCountry = $dataProvider->getLanguageToCountryList();

    if ($enableLanguageToCountryGuess) {
        if (preg_match('/^([a-z]{2,3})(?:,|;|$)/', $browserLanguage, $matches)) {
            // match language (without region) to infer the country of origin
            if (array_key_exists($matches[1], $langToCountry)) {
                return $langToCountry[$matches[1]];
            }
        }
    }

    // the following code should run only if $enableLanguageToCountryGuess is true

    if (!empty($validCountries) && preg_match_all('/[-]([a-z]{2})/', $browserLanguage, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $parts) {
            // match location; we don't make any inferences from the language
            if (array_key_exists($parts[1], $validCountries)) {
                return $parts[1];
            }
        }
    }
    return self::LANGUAGE_CODE_INVALID;
}

Expected Behavior

When setting "enable_language_to_country_guess=0" in config, the country code should never be guessed from the browser language. It should be "unknown".

Current Behavior

Despite enable_language_to_country_guess=0 being set in the config, the geolocation for users with local network IP addresses is extracted from the browser language, resulting in completely wrong countries.

Possible Solution

The function Common::extractCountryCodeFromBrowserLanguage should respect the setting correctly.

@sgiehl commented on January 17th 2022 Member

Hi @brandtmark
Thanks for creating this issue. The setting might be a bit misleading. Disabling the country guess by language only disables the language-to-country mapping defined here:
https://github.com/matomo-org/matomo/blob/115527353a9e75e01aa4d263408956ae45403bea/core/Intl/Data/Resources/languages-to-countries.php#L19-L60

But the country is still used if the language tag itself defines one: en-US indicates the US, en-GB the United Kingdom, de-CH Switzerland, and so on. A language-only tag such as fr would not be mapped to France without the language guess.
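The difference between the two code paths can be illustrated with a minimal standalone sketch. It is not Matomo's actual code: the function name guessCountry, the two-entry language map, the sample list of valid countries, and the 'xx' placeholder for an unknown country are all assumptions made for illustration. Only the two regular expressions are taken from the function quoted in the issue.

```php
<?php
// Hypothetical stand-in for Common::LANGUAGE_CODE_INVALID (value assumed).
const LANGUAGE_CODE_INVALID = 'xx';

function guessCountry(string $browserLanguage, array $validCountries, bool $enableGuess): string
{
    // Hypothetical two-entry subset of the language-to-country mapping.
    $langToCountry = ['fr' => 'fr', 'de' => 'de'];

    // Path 1: only taken when the guess is enabled and the first tag has no region.
    if ($enableGuess && preg_match('/^([a-z]{2,3})(?:,|;|$)/', $browserLanguage, $m)) {
        if (array_key_exists($m[1], $langToCountry)) {
            return $langToCountry[$m[1]];
        }
    }

    // Path 2: runs regardless of the setting and picks up an explicit region such as "-us".
    if (!empty($validCountries)
        && preg_match_all('/[-]([a-z]{2})/', $browserLanguage, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $parts) {
            if (array_key_exists($parts[1], $validCountries)) {
                return $parts[1];
            }
        }
    }

    return LANGUAGE_CODE_INVALID;
}

// Sample valid-country list; keys are country codes, values are ignored here.
$valid = ['us' => true, 'gb' => true, 'ch' => true, 'fr' => true];

var_dump(guessCountry('en-us', $valid, false)); // explicit region, guess disabled
var_dump(guessCountry('fr', $valid, false));    // language only, guess disabled
var_dump(guessCountry('fr', $valid, true));     // language only, guess enabled
```

With the guess disabled, 'en-us' still resolves to a country through the second path, while a bare 'fr' does not. That second path is the behavior the reporter is seeing for internal visitors.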

I guess this could be clarified a bit better.

Nevertheless, I guess you are actually looking for an option to disable falling back to the default provider if the GeoIP provider didn't have results, right?

@brandtmark commented on January 17th 2022

Hi @sgiehl

Thanks for the explanation.

Exactly. I would prefer that clients who access our websites from the internal network don't show up as visitors from the US, UK, etc. just because of their browser language.

@sgiehl commented on January 17th 2022 Member

It should be quite simple to introduce a new config setting that makes this possible. I'll set up a pull request to do that, but I can't promise when it will be included in Matomo. Likely not before 4.8.0, as 4.7.0-rc1 has already been released.

@brandtmark commented on January 17th 2022

Wow, I didn't expect this issue to be addressed that quickly.
Thanks a lot! Amazing!
