Hi,
we've recently updated to 4.1.1 (from 3.x) and we're unable to view archived data for 2018, but archive data is present (also calculated nb_visits from archive_numeric). It seems like archived data is still there but somehow not visible on the UI, and we can't really explain why this happen. 2019 is working fine.
We also can't just invalidate the archive tables because raw logs are being removed after 365 days.
archive_numeric (2018_12, Query is not correct but only to show that data should be there):
UI:
Any clue?
@Littlericket it's hard to say without access to the database. Have you configured report deletions as well besides raw data deletion after 365 days? The data would then be removed though AFAIK so I'm assuming that's not the case. I can't think of a way how this would be related to the Matomo 4 upgrade so it might be caused by something else but you never know. (Maybe someone invalidated reports etc).
We could maybe have a quick look if there's a chance for us to get access to the database.
For these archives that you're showing there, can you select the values for the related "done" flags? Basically, what you'd want to check for each idarchive
select * from matomo_archive_numeric_2020_09 where idarchive IN (33,40,59) and name like "%done%"
You'd need to replace the table name and the value for the idarchive and replace it with the idArchives you selected above.
If you're familiar with mysql you could also check if there are any duplicates for the same dates. It would look like this:
set sql_mode='';
select count(*), idsite, date1, period, value, `name`, ts_archived from matomo_archive_numeric_2020_09 where period = 3 and date1 = '2020-09-01' and name like "%done%" group by idsite, name, date1, period, value
You would need to replace the table name and the value for date1. If you can post otherwise the output for this that be great.
Hi @tsteur,
thanks for the quick reply! Unfortunatly I cannot give access to the database, but I'll try providing the result of the queries.
We haven't setup archive deletion, only the raw logs are getting removed. We've enabled this because the database of raw logs reached a few hundred gigs and we don't have that much RAM to keep all the data sets in memory.
Result of your first query (large result set): https://s.littlericket.me/2b8oaeqhly
Result of the second duplicates query: https://s.littlericket.me/syrfo5qhcv
As far as I can tell there are no rows with count > 1, so no duplicates?
Thanks @literalplus Are you seeing the problem for "all visits" or only for some specific segments maybe?
@tsteur this is the case for "all visits" and also all defined segments.
@Littlericket thanks for this. Can you execute the query
select count(*), idsite, date1, period, value, `name`, ts_archived
from piwik_archive_numeric_2018_12
where period = 3 and date1 = '2018-12-01' and name like "%done%"
group by idsite, name, date1, period, VALUE
ORDER BY 1 desc
also only for a specific idsite where you experience this? This might help better understand things. eg add AND idsite=1
to the WHERE
clause.
@tsteur sure! Limited to idsite=1
Result set is at https://s.littlericket.me/lu8tryt918
Thanks @Littlericket This looks good to me. Eg I see a done
archive for that idSite & month meaning all visits
should show data. I'm assuming you're seeing data for December 2018?
Could you also query the data for the year like this?
select count(*), idsite, date1, period, value, `name`, ts_archived
from piwik_archive_numeric_2018_01
where period = 4 and date1 = '2018-01-01' and name like "%done%" AND idsite=1
group by idsite, name, date1, period, VALUE
ORDER BY 1 desc
Hi @tsteur, again, thanks for the support!
There is no visible data in the UI when viewing december:
Heres the result set of your given query: https://s.littlericket.me/syminfdbdr
Also "almost" the same query (without the value column because of binary) for the blob archive: https://s.littlericket.me/ml80njslta
@diosmosis do you maybe have any idea?
@Littlericket I know you already said it's not possible to give us access to the DB. If there's otherwise the chance you could send us a dump of an entire archive table that be great. We would likely not need any other data.
@Littlericket some ideas to help diagnose:
could you try loading an API request for the data like: index.php?module=API&method=VisitsSummary.get&period=day&date=2018-01-01&idSite=1&token_auth=...
<?php
use Piwik\ErrorHandler;
use Piwik\ExceptionHandler;
use Piwik\FrontController;
if (!defined('PIWIK_DOCUMENT_ROOT')) {
define('PIWIK_DOCUMENT_ROOT', dirname(__FILE__) == '/' ? '' : dirname(__FILE__));
}
if (file_exists(PIWIK_DOCUMENT_ROOT . '/bootstrap.php')) {
require_once PIWIK_DOCUMENT_ROOT . '/bootstrap.php';
}
if (!defined('PIWIK_INCLUDE_PATH')) {
define('PIWIK_INCLUDE_PATH', PIWIK_DOCUMENT_ROOT);
}
require_once PIWIK_INCLUDE_PATH . '/core/bootstrap.php';
if (!defined('PIWIK_PRINT_ERROR_BACKTRACE')) {
define('PIWIK_PRINT_ERROR_BACKTRACE', false);
}
if (!defined('PIWIK_ENABLE_ERROR_HANDLER') || PIWIK_ENABLE_ERROR_HANDLER) {
ErrorHandler::registerErrorHandler();
ExceptionHandler::setUp();
}
FrontController::setUpSafeMode();
if (!defined('PIWIK_ENABLE_DISPATCH')) {
define('PIWIK_ENABLE_DISPATCH', true);
}
$environment = new \Piwik\Application\Environment(null);
$environment->init();
if (!\Piwik\Common::isPhpCliMode()) {
exit;
}
$controller = FrontController::getInstance();
$controller->init();
$idSite = 1;
$date = '2018-12-01';
$username = 'USERNAME';
$password = 'PASSWORD';
$auth = new \Piwik\Plugins\Login\Auth();
$auth->setLogin($username);
$auth->setPassword($password);
\Piwik\Access::getInstance()->reloadAccess($auth);
$data = \Piwik\Plugins\VisitsSummary\API::getInstance()->get($idSite, 'day', $date);
print_r($data->getRows());
$params = new \Piwik\ArchiveProcessor\Parameters(new \Piwik\Site($idSite), \Piwik\Period\Factory::build('day', $date), new \Piwik\Segment('', [$idSite]));
$result = \Piwik\DataAccess\ArchiveSelector::getArchiveIdAndVisits($params);
print "idarchives: " . print_r($result[0], true);
print "visits: " . $result[1] . "\n";
print "usable: " . ($result[3] ? 'true' : 'false') . "\n";
@tsteur can give you the archive table as a download, but I need to contact you privately instead of posting the dump here..
API request returns zeros for all xml keys:
<result>
<nb_uniq_visitors>0</nb_uniq_visitors>
<nb_users>0</nb_users>
<nb_visits>0</nb_visits>
<nb_actions>0</nb_actions>
<nb_visits_converted>0</nb_visits_converted>
<bounce_count>0</bounce_count>
<sum_visit_length>0</sum_visit_length>
<max_actions>0</max_actions>
<bounce_rate>0 %</bounce_rate>
<nb_actions_per_visit>0</nb_actions_per_visit>
<avg_time_on_site>0</avg_time_on_site>
</result>
$ php 17160.php
Array
(
[0] => Piwik\DataTable\Row Object
(
[maxVisitsSummed] => 0
[metadata:Piwik\DataTable\Row:private] => Array
(
)
[isSubtableLoaded:Piwik\DataTable\Row:private] =>
[subtableId] =>
[storage:ArrayObject:private] => Array
(
[nb_uniq_visitors] => 0
[nb_users] => 0
[nb_visits] => 0
[nb_actions] => 0
[nb_visits_converted] => 0
[bounce_count] => 0
[sum_visit_length] => 0
[max_actions] => 0
)
)
)
idarchives: Array
(
[0] => 836463
)
visits: 0
usable: true
3. browser triggered archiving is disabled
browser_archiving_disabled_enforce = 1
enable_create_realtime_segments = 0
enable_browser_archiving_triggering = 0
@Littlericket can you run the following SQL:
SELECT * FROM piwik_archive_numeric_2018_01 WHERE idarchive = 836463;
If you'd like to email the data, you can email it to dizzy@innocraft.com.
@Littlericket looking at the data you sent it looks like there is just no data in the archives for those dates. The only thing I can think of that might have caused this is deleting old log data, then having archives of old data be re-archived causing new archives with zero visits to be created. Do you have backups with the old archive data?
EDIT: actually that doesn't seem like the case if the data was present before updating, since the ts_archived is 2019...
@tsteur perhaps we should prevent invalidating or re-archiving archives that have no log visits and are older than the delete_log_data_older_than value? though this might still be a problem if the value is changed.
@diosmosis thanks for the investigation! No, we dont have any backup of old archive data. If there are visits (visible in the upper screenshots) can't we just somehow adjust the archive parameters to display the archives which have data in the UI?
If I understand correctly, there has been data somewhen but the data got refreshed by a archive cron run with zero calculated visits when raw data is getting removed?
@diosmosis I've been looking through the data and indeed there is only a done flag but no other report data. I've been looking through the code (eg DataAccess model) and I can't explain or find anything where we would delete archives but keep the done
flag.
perhaps we should prevent invalidating or re-archiving archives that have no log visits and are older than the delete_log_data_older_than value? though this might still be a problem if the value is changed.
I'm now thinking we might not be able to add this because a user might want to invalidate/remove data manually? It could be done though maybe and we could throw an exception when someone tries to invalidate data for such a range and require a force parameter or so. Generally I don't think that's the problem here though since Matomo would still keep and show invalidated data until the data has been rearchived which is not the case here.
The done flag is from 2019-09-23 18:07:52
so it couldn't have been created recently.
@Littlericket I suppose no data was deleted there manually in the database somehow? Do you know if the data was still there before updating Matomo 4? Like maybe it appeared to have gone lost after Matomo 4 update and maybe for a while nobody looked at the data and it maybe got missing before that?
Generally I see a lot of done flags in that table without any data. Also for various segments that were archived over 2020. Are you using any custom or third party Matomo plugins @Littlericket ?
I'm trying to understand what could delete archive data in Matomo but keep the done flag. I can't see anything yet in the code where this happens.