Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After updating to Matomo 4, UI for 2018 is empty but archive data is present #17160

Open
Littlericket opened this issue Jan 28, 2021 · 16 comments

Comments

@Littlericket
Copy link

Littlericket commented Jan 28, 2021

Hi,

we've recently updated to 4.1.1 (from 3.x) and we're unable to view archived data for 2018, but archive data is present (also calculated nb_visits from archive_numeric). It seems like archived data is still there but somehow not visible on the UI, and we can't really explain why this happen. 2019 is working fine.

We also can't just invalidate the archive tables because raw logs are being removed after 365 days.

archive_numeric (2018_12, Query is not correct but only to show that data should be there):
image

UI:
image

Any clue?

@Littlericket Littlericket changed the title After updating to Matomo 4, UI for 2018, 2017 is empty but archive data is present After updating to Matomo 4, UI for 2018 is empty but archive data is present Jan 28, 2021
@tsteur
Copy link
Member

tsteur commented Jan 28, 2021

@Littlericket it's hard to say without access to the database. Have you configured report deletions as well besides raw data deletion after 365 days? The data would then be removed though AFAIK so I'm assuming that's not the case. I can't think of a way how this would be related to the Matomo 4 upgrade so it might be caused by something else but you never know. (Maybe someone invalidated reports etc).

We could maybe have a quick look if there's a chance for us to get access to the database.

For these archives that you're showing there, can you select the values for the related "done" flags? Basically, what you'd want to check for each idarchive

select * from matomo_archive_numeric_2020_09 where idarchive IN (33,40,59) and name like "%done%"

You'd need to replace the table name and the value for the idarchive and replace it with the idArchives you selected above.

If you're familiar with mysql you could also check if there are any duplicates for the same dates. It would look like this:

set sql_mode='';

select count(*), idsite, date1, period, value, `name`, ts_archived from matomo_archive_numeric_2020_09 where period = 3 and date1 = '2020-09-01' and name like "%done%" group by idsite, name, date1, period, value

You would need to replace the table name and the value for date1. If you can post otherwise the output for this that be great.

@Littlericket
Copy link
Author

Hi @tsteur,

thanks for the quick reply! Unfortunatly I cannot give access to the database, but I'll try providing the result of the queries.
We haven't setup archive deletion, only the raw logs are getting removed. We've enabled this because the database of raw logs reached a few hundred gigs and we don't have that much RAM to keep all the data sets in memory.

Result of your first query (large result set): https://s.littlericket.me/2b8oaeqhly
Result of the second duplicates query: https://s.littlericket.me/syrfo5qhcv

As far as I can tell there are no rows with count > 1, so no duplicates?

@tsteur
Copy link
Member

tsteur commented Jan 31, 2021

Thanks @literalplus Are you seeing the problem for "all visits" or only for some specific segments maybe?

@Littlericket
Copy link
Author

@tsteur this is the case for "all visits" and also all defined segments.

@tsteur
Copy link
Member

tsteur commented Feb 1, 2021

@Littlericket thanks for this. Can you execute the query

select count(*), idsite, date1, period, value, `name`, ts_archived
from piwik_archive_numeric_2018_12
where period = 3 and date1 = '2018-12-01' and name like "%done%"
group by idsite, name, date1, period, VALUE
ORDER BY 1 desc

also only for a specific idsite where you experience this? This might help better understand things. eg add AND idsite=1 to the WHERE clause.

@Littlericket
Copy link
Author

@tsteur sure! Limited to idsite=1
Result set is at https://s.littlericket.me/lu8tryt918

@tsteur
Copy link
Member

tsteur commented Feb 2, 2021

Thanks @Littlericket This looks good to me. Eg I see a done archive for that idSite & month meaning all visits should show data. I'm assuming you're seeing data for December 2018?

Could you also query the data for the year like this?

select count(*), idsite, date1, period, value, `name`, ts_archived
from piwik_archive_numeric_2018_01
where period = 4 and date1 = '2018-01-01' and name like "%done%" AND idsite=1
group by idsite, name, date1, period, VALUE
ORDER BY 1 desc

@Littlericket
Copy link
Author

Littlericket commented Feb 3, 2021

Hi @tsteur, again, thanks for the support!
There is no visible data in the UI when viewing december:
image

Heres the result set of your given query: https://s.littlericket.me/syminfdbdr
Also "almost" the same query (without the value column because of binary) for the blob archive: https://s.littlericket.me/ml80njslta

@tsteur
Copy link
Member

tsteur commented Feb 4, 2021

@diosmosis do you maybe have any idea?

@Littlericket I know you already said it's not possible to give us access to the DB. If there's otherwise the chance you could send us a dump of an entire archive table that be great. We would likely not need any other data.

@diosmosis
Copy link
Member

diosmosis commented Feb 4, 2021

@Littlericket some ideas to help diagnose:

  • could you try loading an API request for the data like: index.php?module=API&method=VisitsSummary.get&period=day&date=2018-01-01&idSite=1&token_auth=...

  • if that doesn't return data, could you add the following script to the root of your matomo install and run it from the command line (then remove it)? (you'll have to set the username/password in the script below, preferably to a view user)

<?php
use Piwik\ErrorHandler;
use Piwik\ExceptionHandler;
use Piwik\FrontController;

if (!defined('PIWIK_DOCUMENT_ROOT')) {
    define('PIWIK_DOCUMENT_ROOT', dirname(__FILE__) == '/' ? '' : dirname(__FILE__));
}
if (file_exists(PIWIK_DOCUMENT_ROOT . '/bootstrap.php')) {
    require_once PIWIK_DOCUMENT_ROOT . '/bootstrap.php';
}
if (!defined('PIWIK_INCLUDE_PATH')) {
    define('PIWIK_INCLUDE_PATH', PIWIK_DOCUMENT_ROOT);
}

require_once PIWIK_INCLUDE_PATH . '/core/bootstrap.php';

if (!defined('PIWIK_PRINT_ERROR_BACKTRACE')) {
    define('PIWIK_PRINT_ERROR_BACKTRACE', false);
}

if (!defined('PIWIK_ENABLE_ERROR_HANDLER') || PIWIK_ENABLE_ERROR_HANDLER) {
    ErrorHandler::registerErrorHandler();
    ExceptionHandler::setUp();
}

FrontController::setUpSafeMode();

if (!defined('PIWIK_ENABLE_DISPATCH')) {
    define('PIWIK_ENABLE_DISPATCH', true);
}

$environment = new \Piwik\Application\Environment(null);
$environment->init();

if (!\Piwik\Common::isPhpCliMode()) {
    exit;
}

$controller = FrontController::getInstance();
$controller->init();

$idSite = 1;
$date = '2018-12-01';
$username = 'USERNAME';
$password = 'PASSWORD';

$auth = new \Piwik\Plugins\Login\Auth();
$auth->setLogin($username);
$auth->setPassword($password);
\Piwik\Access::getInstance()->reloadAccess($auth);

$data = \Piwik\Plugins\VisitsSummary\API::getInstance()->get($idSite, 'day', $date);
print_r($data->getRows());

$params = new \Piwik\ArchiveProcessor\Parameters(new \Piwik\Site($idSite), \Piwik\Period\Factory::build('day', $date), new \Piwik\Segment('', [$idSite]));
$result = \Piwik\DataAccess\ArchiveSelector::getArchiveIdAndVisits($params);

print "idarchives: " . print_r($result[0], true);
print "visits: " . $result[1] . "\n";
print "usable: " . ($result[3] ? 'true' : 'false') . "\n";
  • also is browser triggered archiving enabled for your system?

@Littlericket
Copy link
Author

@tsteur can give you the archive table as a download, but I need to contact you privately instead of posting the dump here..

@diosmosis

  1. API request returns zeros for all xml keys:
<result>
<nb_uniq_visitors>0</nb_uniq_visitors>
<nb_users>0</nb_users>
<nb_visits>0</nb_visits>
<nb_actions>0</nb_actions>
<nb_visits_converted>0</nb_visits_converted>
<bounce_count>0</bounce_count>
<sum_visit_length>0</sum_visit_length>
<max_actions>0</max_actions>
<bounce_rate>0 %</bounce_rate>
<nb_actions_per_visit>0</nb_actions_per_visit>
<avg_time_on_site>0</avg_time_on_site>
</result>
$ php 17160.php
Array
(
    [0] => Piwik\DataTable\Row Object
        (
            [maxVisitsSummed] => 0
            [metadata:Piwik\DataTable\Row:private] => Array
                (
                )

            [isSubtableLoaded:Piwik\DataTable\Row:private] =>
            [subtableId] =>
            [storage:ArrayObject:private] => Array
                (
                    [nb_uniq_visitors] => 0
                    [nb_users] => 0
                    [nb_visits] => 0
                    [nb_actions] => 0
                    [nb_visits_converted] => 0
                    [bounce_count] => 0
                    [sum_visit_length] => 0
                    [max_actions] => 0
                )

        )

)
idarchives: Array
(
    [0] => 836463
)
visits: 0
usable: true
  1. browser triggered archiving is disabled
browser_archiving_disabled_enforce = 1
enable_create_realtime_segments = 0
enable_browser_archiving_triggering = 0

@diosmosis
Copy link
Member

@Littlericket can you run the following SQL:

SELECT * FROM piwik_archive_numeric_2018_01 WHERE idarchive = 836463;

If you'd like to email the data, you can email it to dizzy@innocraft.com.

@diosmosis
Copy link
Member

diosmosis commented Feb 10, 2021

@Littlericket looking at the data you sent it looks like there is just no data in the archives for those dates. The only thing I can think of that might have caused this is deleting old log data, then having archives of old data be re-archived causing new archives with zero visits to be created. Do you have backups with the old archive data?

EDIT: actually that doesn't seem like the case if the data was present before updating, since the ts_archived is 2019...

@tsteur perhaps we should prevent invalidating or re-archiving archives that have no log visits and are older than the delete_log_data_older_than value? though this might still be a problem if the value is changed.

@Littlericket
Copy link
Author

Littlericket commented Feb 10, 2021

@diosmosis thanks for the investigation! No, we dont have any backup of old archive data. If there are visits (visible in the upper screenshots) can't we just somehow adjust the archive parameters to display the archives which have data in the UI?

If I understand correctly, there has been data somewhen but the data got refreshed by a archive cron run with zero calculated visits when raw data is getting removed?

@tsteur
Copy link
Member

tsteur commented Feb 10, 2021

@diosmosis I've been looking through the data and indeed there is only a done flag but no other report data. I've been looking through the code (eg DataAccess model) and I can't explain or find anything where we would delete archives but keep the done flag.

perhaps we should prevent invalidating or re-archiving archives that have no log visits and are older than the delete_log_data_older_than value? though this might still be a problem if the value is changed.

I'm now thinking we might not be able to add this because a user might want to invalidate/remove data manually? It could be done though maybe and we could throw an exception when someone tries to invalidate data for such a range and require a force parameter or so. Generally I don't think that's the problem here though since Matomo would still keep and show invalidated data until the data has been rearchived which is not the case here.

The done flag is from 2019-09-23 18:07:52 so it couldn't have been created recently.

@Littlericket I suppose no data was deleted there manually in the database somehow? Do you know if the data was still there before updating Matomo 4? Like maybe it appeared to have gone lost after Matomo 4 update and maybe for a while nobody looked at the data and it maybe got missing before that?

Generally I see a lot of done flags in that table without any data. Also for various segments that were archived over 2020. Are you using any custom or third party Matomo plugins @Littlericket ?

I'm trying to understand what could delete archive data in Matomo but keep the done flag. I can't see anything yet in the code where this happens.

@Littlericket
Copy link
Author

@tsteur very sorry for the late response!
We are not using any third-party plugins besides:

  • InvalidateReports
  • MarketingCampaignsReporting
  • QueuedTracking

, which never had any problems at all.
I am not sure if the data was "gone" or not visible before the upgrade, but after the upgrade we checked several things. And of course we have not deleted any data manually.

What we had a year ago was a invalidation for all time ranges which had crashed due to low memory, but that should not delete any data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants