Disappearance of segment archive data during 3.0 => 4.0 update #17138

Closed
OnlineArts opened this issue Jan 24, 2021 · 45 comments
Labels: wontfix (If you can reproduce this issue, please reopen the issue or create a new one describing it.)

Comments

@OnlineArts

OnlineArts commented Jan 24, 2021

As mentioned in the Matomo forum, some users are not able to display historical segment data after the Matomo 4 upgrade.

We observed that after the Matomo 4 upgrade, historical data older than one month disappeared from the segment reports. Opening a segment's overview shows historical data going back exactly one month. For example, opening segment #1 on January 2 showed historical data only back to December 2; opening segment #2 on January 15 showed data only back to December 15.

Archive invalidation and manual re-processing do not bring back the historical data. The issue appears in at least Matomo 4.1.0 and 4.1.1. Changing the obviously related config parameters did not change this behavior.

Could this be related to the options from #17129? Segment processing for that site ID is set to SEGMENTED REPORTS ARE PROCESSED IN REAL TIME (DEFAULT).

Edit:
OK, changing that option to "SEGMENTED REPORTS ARE PRE-PROCESSED (FASTER, REQUIRES CRON)" only results in the segments now being listed during the ./console core:archive run. It still did not add any historical data. As far as I can see, the data is still missing when the incomplete month of December is processed.

Could this somehow be related to the utf8 to utf8mb4 conversion that I performed after the upgrade?

@tsteur
Member

tsteur commented Jan 24, 2021

Maybe this will be fixed by #17029? What do you think, @diosmosis?

@OnlineArts, any chance you could post the output of your system check, which you can find under "Administration -> System Check"? There should be an option to copy the system report (which is anonymised). This would help us learn, e.g., about your report generation (archiving) settings.

@diosmosis
Member

@tsteur it could be; it depends on the segment definition. @OnlineArts, would you be able to provide some of the segments that won't archive?

@OnlineArts
Author

OnlineArts commented Jan 25, 2021

@diosmosis
The segments are all structured like: site-url starts with https://ourdomain.com.

In the database (table x_segment) I can see that the definitions are URL-encoded twice, e.g.: pageUrl=^https%253A%252F%252Fourdomain.com%252F

I assumed it should be https%3A%2F%2Fourdomain.com%2F, i.e. every % has been encoded a second time as %25.
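To see how a double-encoded definition like the one above arises, the following Python sketch applies standard percent-encoding (`urllib.parse.quote` with no safe characters) once and then a second time. It only reproduces the string transformation; it is not a claim about where in Matomo the second encoding happens. `ourdomain.com` is the reporter's placeholder.

```python
from urllib.parse import quote, unquote

# Placeholder URL used in this issue (the real domain was redacted)
url = "https://ourdomain.com/"

# First pass: ':' -> %3A and '/' -> %2F; '.' is unreserved and kept
once = quote(url, safe="")
# Second pass: every '%' produced by the first pass becomes '%25'
twice = quote(once, safe="")

print("pageUrl=^" + once)   # singly encoded form
print("pageUrl=^" + twice)  # form seen in the segment table

# One decoding step turns the double-encoded value back into the singly encoded one
assert unquote(twice) == once
```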

I overwrote the definition in the database with singly URL-encoded values, but that did not change anything. During archiving, for example, the monthly archives for December and November are still missing for this segment.

sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-10-31,2021-01-25
sudo -u www-data ./console core:archive --force-idsites=67 --force-date-range=2020-10-31,2021-01-24
INFO [2021-01-25 14:57:44] 53793  ---------------------------
INFO [2021-01-25 14:57:44] 53793  INIT
INFO [2021-01-25 14:57:44] 53793  Running Matomo 4.1.1 as Super User
INFO [2021-01-25 14:57:44] 53793  ---------------------------
INFO [2021-01-25 14:57:44] 53793  NOTES
INFO [2021-01-25 14:57:44] 53793  - Async process archiving supported, using CliMulti.
INFO [2021-01-25 14:57:44] 53793  - Reports for today will be processed at most every 1800 seconds. You can change this value in Matomo UI > Settings > General Settings.
INFO [2021-01-25 14:57:44] 53793  - Archiving was last executed without error 1 min 49s ago
INFO [2021-01-25 14:57:44] 53793  - Will process 1 websites (--force-idsites)
INFO [2021-01-25 14:57:44] 53793  - Will process specified sites: 67
INFO [2021-01-25 14:57:44] 53793  ---------------------------
INFO [2021-01-25 14:57:44] 53793  START
INFO [2021-01-25 14:57:44] 53793  Starting Matomo reports archiving...
INFO [2021-01-25 14:57:44] 53793  Start processing archives for site 67.
INFO [2021-01-25 14:57:44] 53793    Will invalidate archived reports for today in site ID = 67's timezone (2021-01-25 00:00:00).
INFO [2021-01-25 14:57:44] 53793    Will invalidate archived reports for yesterday in site ID = 67's timezone (2021-01-24 00:00:00).
INFO [2021-01-25 14:57:45] 53793  Archived website id 67, period = day, date = 2021-01-24, segment = '', 374 visits found. Time elapsed: 1.078s
INFO [2021-01-25 14:57:45] 53793  Archived website id 67, period = day, date = 2021-01-23, segment = '', 311 visits found. Time elapsed: 1.078s
INFO [2021-01-25 14:57:45] 53793  Archived website id 67, period = day, date = 2021-01-22, segment = '', 405 visits found. Time elapsed: 1.078s
INFO [2021-01-25 14:57:46] 53793  Archived website id 67, period = day, date = 2021-01-24, segment = 'pageUrl=^https%3A%2F%2ourdomain.com%2F', 16 visits found. Time elapsed: 0.565s
INFO [2021-01-25 14:57:46] 53793  Archived website id 67, period = day, date = 2021-01-21, segment = '', 434 visits found. Time elapsed: 1.081s
INFO [2021-01-25 14:57:46] 53793  Archived website id 67, period = day, date = 2021-01-20, segment = '', 431 visits found. Time elapsed: 1.081s
INFO [2021-01-25 14:57:48] 53793  Archived website id 67, period = day, date = 2021-01-19, segment = '', 460 visits found. Time elapsed: 1.092s
INFO [2021-01-25 14:57:48] 53793  Archived website id 67, period = day, date = 2021-01-18, segment = '', 416 visits found. Time elapsed: 1.092s
INFO [2021-01-25 14:57:48] 53793  Archived website id 67, period = day, date = 2021-01-17, segment = '', 276 visits found. Time elapsed: 1.092s
INFO [2021-01-25 14:57:49] 53793  Archived website id 67, period = week, date = 2021-01-18, segment = '', 2831 visits found. Time elapsed: 1.085s
INFO [2021-01-25 14:57:49] 53793  Archived website id 67, period = day, date = 2021-01-16, segment = '', 244 visits found. Time elapsed: 1.085s
INFO [2021-01-25 14:57:49] 53793  Archived website id 67, period = day, date = 2021-01-15, segment = '', 376 visits found. Time elapsed: 1.085s
INFO [2021-01-25 14:57:51] 53793  Archived website id 67, period = week, date = 2021-01-18, segment = 'pageUrl=^https%3A%2F%2ourdomain.com%2F', 88 visits found. Time elapsed: 2.379s
INFO [2021-01-25 14:57:51] 53793  Archived website id 67, period = day, date = 2021-01-14, segment = '', 430 visits found. Time elapsed: 2.379s
INFO [2021-01-25 14:57:51] 53793  Archived website id 67, period = day, date = 2021-01-13, segment = '', 417 visits found. Time elapsed: 2.379s
INFO [2021-01-25 14:57:52] 53793  Archived website id 67, period = day, date = 2021-01-12, segment = '', 379 visits found. Time elapsed: 0.824s
INFO [2021-01-25 14:57:52] 53793  Archived website id 67, period = day, date = 2021-01-11, segment = '', 375 visits found. Time elapsed: 1.082s
INFO [2021-01-25 14:57:52] 53793  Archived website id 67, period = day, date = 2021-01-10, segment = '', 257 visits found. Time elapsed: 1.082s
INFO [2021-01-25 14:57:53] 53793  Archived website id 67, period = week, date = 2021-01-11, segment = '', 2497 visits found. Time elapsed: 1.089s
INFO [2021-01-25 14:57:53] 53793  Archived website id 67, period = day, date = 2021-01-09, segment = '', 242 visits found. Time elapsed: 1.089s
INFO [2021-01-25 14:57:53] 53793  Archived website id 67, period = day, date = 2021-01-08, segment = '', 326 visits found. Time elapsed: 1.089s
INFO [2021-01-25 14:57:55] 53793  Archived website id 67, period = day, date = 2021-01-07, segment = '', 401 visits found. Time elapsed: 1.086s
INFO [2021-01-25 14:57:55] 53793  Archived website id 67, period = day, date = 2021-01-06, segment = '', 352 visits found. Time elapsed: 1.086s
INFO [2021-01-25 14:57:55] 53793  Archived website id 67, period = day, date = 2021-01-05, segment = '', 366 visits found. Time elapsed: 1.086s
INFO [2021-01-25 14:57:56] 53793  Archived website id 67, period = day, date = 2021-01-04, segment = '', 424 visits found. Time elapsed: 1.091s
INFO [2021-01-25 14:57:56] 53793  Archived website id 67, period = day, date = 2021-01-03, segment = '', 296 visits found. Time elapsed: 1.091s
INFO [2021-01-25 14:57:56] 53793  Archived website id 67, period = day, date = 2021-01-02, segment = '', 240 visits found. Time elapsed: 1.091s
INFO [2021-01-25 14:57:57] 53793  Archived website id 67, period = week, date = 2021-01-04, segment = '', 2368 visits found. Time elapsed: 1.081s
INFO [2021-01-25 14:57:57] 53793  Archived website id 67, period = day, date = 2021-01-01, segment = '', 158 visits found. Time elapsed: 1.081s
INFO [2021-01-25 14:57:57] 53793  Archived website id 67, period = day, date = 2020-12-31, segment = '', 168 visits found. Time elapsed: 1.081s
INFO [2021-01-25 14:57:59] 53793  Archived website id 67, period = month, date = 2021-01-01, segment = '', 8476 visits found. Time elapsed: 1.594s
INFO [2021-01-25 14:57:59] 53793  Archived website id 67, period = day, date = 2020-12-30, segment = '', 233 visits found. Time elapsed: 1.594s
INFO [2021-01-25 14:57:59] 53793  Archived website id 67, period = day, date = 2020-12-29, segment = '', 264 visits found. Time elapsed: 1.594s
INFO [2021-01-25 14:58:05] 53793  Archived website id 67, period = month, date = 2021-01-01, segment = 'pageUrl=^https%3A%2F%2ourdomain.com%2F', 278 visits found. Time elapsed: 6.251s
INFO [2021-01-25 14:58:05] 53793  Archived website id 67, period = day, date = 2020-12-28, segment = '', 268 visits found. Time elapsed: 6.251s
INFO [2021-01-25 14:58:05] 53793  Archived website id 67, period = day, date = 2020-12-27, segment = '', 199 visits found. Time elapsed: 6.251s
INFO [2021-01-25 14:58:06] 53793  Archived website id 67, period = year, date = 2021-01-01, segment = '', 8476 visits found. Time elapsed: 1.103s
INFO [2021-01-25 14:58:06] 53793  Archived website id 67, period = day, date = 2020-12-26, segment = '', 148 visits found. Time elapsed: 1.103s
INFO [2021-01-25 14:58:07] 53793  Archived website id 67, period = year, date = 2021-01-01, segment = 'pageUrl=^https%3A%2F%2ourdomain.com%2F', 278 visits found. Time elapsed: 0.545s
INFO [2021-01-25 14:58:07] 53793  Archived website id 67, period = week, date = 2020-12-21, segment = '', 1422 visits found. Time elapsed: 0.803s
INFO [2021-01-25 14:58:08] 53793  Archived website id 67, period = week, date = 2020-12-28, segment = '', 1627 visits found. Time elapsed: 1.042s
INFO [2021-01-25 14:58:10] 53793  Archived website id 67, period = month, date = 2020-12-01, segment = '', 9791 visits found. Time elapsed: 1.558s
INFO [2021-01-25 14:58:11] 53793  Archived website id 67, period = year, date = 2020-01-01, segment = '', 28723 visits found. Time elapsed: 1.046s
INFO [2021-01-25 14:58:11] 53793  Finished archiving for site 67, 43 API requests, Time elapsed: 26.719s [1 / 1 done]
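Which periods received a segment archive can be checked mechanically rather than by eye. The following Python sketch parses the "Archived website id ..." lines (the pattern is taken from the output above; the function names are hypothetical) and lists the (period, date) pairs that were archived for all visits but not for a given segment. Applied to the log above with the segment definition shown there, it would report, among others, the month archive for 2020-12-01.

```python
import re
from collections import defaultdict

# Pattern for the "Archived website id ..." lines printed by ./console core:archive
ARCHIVED = re.compile(
    r"Archived website id (?P<site>\d+), period = (?P<period>\w+), "
    r"date = (?P<date>\d{4}-\d{2}-\d{2}), segment = '(?P<segment>[^']*)', "
    r"(?P<visits>\d+) visits found"
)


def archives_by_segment(log_text):
    """Map each segment definition ('' means all visits) to its archived (period, date) pairs."""
    grouped = defaultdict(set)
    for match in ARCHIVED.finditer(log_text):
        grouped[match["segment"]].add((match["period"], match["date"]))
    return grouped


def missing_for_segment(log_text, segment):
    """Return (period, date) pairs archived for all visits but not for `segment`."""
    grouped = archives_by_segment(log_text)
    return grouped[""] - grouped[segment]
```

Usage, assuming the console output was saved to a file: `missing_for_segment(open("archive.log").read(), "<segment definition as printed in the log>")`.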

@tsteur
Here is the system check (the original report was in German):

# Mandatory checks

## PHP version >= 7.2.5: ✔ 7.3.19-1~deb10u1


## PDO extension: ✔


## PDO\MYSQL extension: ✔


## MYSQLI extension: ✔


## Other required extensions: ✔ zlib ✔ SPL ✔ iconv ✔ json ✔ mbstring ✔ Reflection


## Required functions: ✔ debug_backtrace ✔ eval ✔ hash ✔ gzcompress ✔ gzuncompress ✔ pack


## Required PHP configuration (php.ini): ✔ session.auto_start = 0 ✔ max_execution_time = 0 OR >= 30


## Directories with write access: ✔ /var/cache/matomo ✔ /var/cache/matomo/assets ✔ /var/cache/matomo/cache ✔ /var/cache/matomo/climulti ✔ /var/cache/matomo/latest ✔ /var/cache/matomo/logs ✔ /var/cache/matomo/sessions ✔ /var/cache/matomo/tcpdf ✔ /var/cache/matomo/templates_c



# Optional checks

## File integrity: ✔


## Tracker status: ✔


## Memory limit: ✔ 2048M


## Time zone: ✔


## Open URL: ✔ curl


## PageSpeed disabled: ✔


## GD > 2.x + Freetype (graphics): ✔


## Other extensions: ✔ json ✔ libxml ✔ dom ✔ SimpleXML


## Other functions: ✔ shell_exec ✔ set_time_limit ✔ mail ✔ parse_ini_file ✔ glob ✔ gzopen ✔ md5_file


## Filesystem: ✔


## Setup cron - managing processes via CLI: ✔ OK


## Last successful archiving completion: ✔ The archiving process completed successfully 00:34:48 ago.


## Database abilities: ✔ UTF8mb4 charset ✔ LOAD DATA INFILE ✔ CREATE TEMPORARY TABLES ✔ Changing transaction isolation level


## Max packet size: ✔


## Force SSL connection: ✔


## Geolocation: ⚠ Warning: The default location provider guesses the visitor's country of origin from the language they use. This is not very accurate, so installing and using a geolocation database is recommended (https://matomo.org/docs/geo-locate/).


## Update over HTTPS: ✔


## Writable JavaScript Tracker ("/matomo.js" & "/piwik.js"): ✔



# Informational results

## Matomo Version: 4.1.1


## Matomo Update History: 3.14.1,


## Matomo Install Version: Unknown - pre 3.8.


## PHP_OS: Linux


## PHP_BINARY: /usr/sbin/php-fpm7.3


## PHP SAPI: fpm-fcgi


## Timezone Version: 0.system


## PHP Timezone: UTC


## PHP Time: 1611585784


## PHP Datetime: 2021-01-25 14:43:04


## PHP Disabled functions: pcntl_alarm, pcntl_fork, pcntl_waitpid, pcntl_wait, pcntl_wifexited, pcntl_wifstopped, pcntl_wifsignaled, pcntl_wifcontinued, pcntl_wexitstatus, pcntl_wtermsig, pcntl_wstopsig, pcntl_signal, pcntl_signal_get_handler, pcntl_signal_dispatch, pcntl_get_last_error, pcntl_strerror, pcntl_sigprocmask, pcntl_sigwaitinfo, pcntl_sigtimedwait, pcntl_exec, pcntl_getpriority, pcntl_setpriority, pcntl_async_signals, 


## PHP INI max_execution_time: 30


## PHP INI post_max_size: 2M


## PHP INI max_input_vars: 1000


## PHP INI zlib.output_compression: 


## Curl Version: 7.64.0, OpenSSL/1.1.1d


## Suhosin Installed: 0


## DB Prefix: p_


## DB Charset: utf8mb4


## DB Adapter: PDO\MYSQL


## MySQL Version: 10.3.27-MariaDB-0+deb10u1


## Num Tables: 428


## Browser Segment Archiving Enabled: 1


## Development Mode Enabled: 0


## Internet Enabled: 1


## Multi Server Environment: 0


## Custom User Path: 0


## Custom Include Path: 0


## Plugins Activated: API, Actions, Annotations, BulkTracking, Contents, CoreAdminHome, CoreConsole, CoreHome, CorePluginsAdmin, CoreUpdater, CoreVisualizations, CustomDimensions, CustomJsTracker, CustomVariables, DBStats, Dashboard, DevicePlugins, DevicesDetection, Diagnostics, Goals, Heartbeat, ImageGraph, Installation, Intl, IntranetMeasurable, LanguagesManager, Live, LoginLdap 4.3.0, Marketplace, Monolog, Morpheus, MultiSites, Overlay, PagePerformance, PrivacyManager, ProfessionalServices, Provider, Proxy, Referrers, Resolution, SEO, ScheduledReports, SegmentEditor, SitesManager, Tour, Transitions, TwoFactorAuth, UserCountry, UserCountryMap, UserId, UserLanguage, UsersManager, VisitFrequency, VisitTime, VisitorInterest, VisitsSummary, WebsiteMeasurable


## Plugins Deactivated: Ecommerce, Events, ExampleRssWidget 1.0, Feedback, GeoIp2, Insights, Login, MobileAppMeasurable, MobileMessaging, PiwikPro 4.1.1, RssWidget, TagManager, Widgetize


## Plugins Invalid: 


## Server Info: nginx/1.18.0


## Had visits in last 1 day: 1


## Had visits in last 3 days: 1


## Had visits in last 5 days: 1


## Archive Time Last Started: 1611583501


## Archive Time Last Finished: 1611583696


## Num invalidations: 0 queued, 0 in progress


## User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0


## Browser Language: de-de,de,en-gb,en


## Anonymize Referrer: 


## Do Not Track enabled: 1

@diosmosis
Member

@OnlineArts this is possibly the same issue as #17029. Would you be able to apply that change, invalidate the segment data and run core:archive? This should archive the segment data.

@OnlineArts
Author

Hey, it doesn't look like it. After changing the files I get an error instead of a graph, and during the later archiving run I could not see the segments. However, I observed the missing segments both before changing the files and now, after reverting the changes.

@diosmosis
Member

@OnlineArts that's unfortunate, could you show me the error?

And could you try invalidating a segment (preferably via the plugin), then running the following SQL query and posting the results:

SELECT DISTINCT name from archive_invalidations;

Then could you load the segment in the UI, check the &segment= parameter, and post it here?

@OnlineArts
Author

Hi, it looks like the segments are not considered during processing. This site ID has 5 segments defined. I ran several tests; they are separated by "###" below.

sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-11-30,2021-12-31

SELECT DISTINCT name from p_archive_invalidations;
+------+
| name |
+------+
| done |
+------+
1 row in set (0.000 sec)

sudo -u www-data ./console core:archive --force-idsites=67 --force-date-range=2020-11-30,2021-12-31
INFO [2021-01-31 23:11:09] 75856  ---------------------------
INFO [2021-01-31 23:11:09] 75856  INIT
INFO [2021-01-31 23:11:09] 75856  Running Matomo 4.1.1 as Super User
INFO [2021-01-31 23:11:09] 75856  ---------------------------
INFO [2021-01-31 23:11:09] 75856  NOTES
INFO [2021-01-31 23:11:09] 75856  - Async process archiving supported, using CliMulti.
INFO [2021-01-31 23:11:09] 75856  - Reports for today will be processed at most every 1800 seconds. You can change this value in Matomo UI > Settings > General Settings.
INFO [2021-01-31 23:11:09] 75856  - Archiving was last executed without error 2 min 53s ago
INFO [2021-01-31 23:11:09] 75856  - Will process 1 websites (--force-idsites)
INFO [2021-01-31 23:11:09] 75856  - Will process specified sites: 67
INFO [2021-01-31 23:11:09] 75856  ---------------------------
INFO [2021-01-31 23:11:09] 75856  START
INFO [2021-01-31 23:11:09] 75856  Starting Matomo reports archiving...
INFO [2021-01-31 23:11:09] 75856  Start processing archives for site 67.
INFO [2021-01-31 23:11:12] 75856  Archived website id 67, period = day, date = 2021-01-29, segment = '', 4 visits found. Time elapsed: 0.308s
INFO [2021-01-31 23:11:12] 75856  Archived website id 67, period = day, date = 2021-01-27, segment = '', 4 visits found. Time elapsed: 0.308s
INFO [2021-01-31 23:11:12] 75856  Archived website id 67, period = day, date = 2021-01-26, segment = '', 4 visits found. Time elapsed: 0.566s
INFO [2021-01-31 23:11:14] 75856  Archived website id 67, period = day, date = 2021-01-25, segment = '', 90 visits found. Time elapsed: 0.819s
INFO [2021-01-31 23:11:14] 75856  Archived website id 67, period = day, date = 2021-01-24, segment = '', 374 visits found. Time elapsed: 1.592s
INFO [2021-01-31 23:11:14] 75856  Archived website id 67, period = day, date = 2021-01-23, segment = '', 311 visits found. Time elapsed: 1.592s
INFO [2021-01-31 23:11:15] 75856  Archived website id 67, period = week, date = 2021-01-25, segment = '', 102 visits found. Time elapsed: 0.568s
INFO [2021-01-31 23:11:15] 75856  Archived website id 67, period = day, date = 2021-01-22, segment = '', 405 visits found. Time elapsed: 1.083s
INFO [2021-01-31 23:11:15] 75856  Archived website id 67, period = day, date = 2021-01-21, segment = '', 434 visits found. Time elapsed: 1.083s
INFO [2021-01-31 23:11:17] 75856  Archived website id 67, period = day, date = 2021-01-20, segment = '', 431 visits found. Time elapsed: 1.098s
INFO [2021-01-31 23:11:17] 75856  Archived website id 67, period = day, date = 2021-01-19, segment = '', 460 visits found. Time elapsed: 1.356s
INFO [2021-01-31 23:11:17] 75856  Archived website id 67, period = day, date = 2021-01-18, segment = '', 416 visits found. Time elapsed: 1.356s
INFO [2021-01-31 23:11:18] 75856  Archived website id 67, period = week, date = 2021-01-18, segment = '', 2831 visits found. Time elapsed: 1.077s
INFO [2021-01-31 23:11:18] 75856  Archived website id 67, period = day, date = 2021-01-17, segment = '', 276 visits found. Time elapsed: 1.077s
INFO [2021-01-31 23:11:18] 75856  Archived website id 67, period = day, date = 2021-01-16, segment = '', 244 visits found. Time elapsed: 1.077s
INFO [2021-01-31 23:11:19] 75856  Archived website id 67, period = day, date = 2021-01-15, segment = '', 376 visits found. Time elapsed: 0.828s
INFO [2021-01-31 23:11:19] 75856  Archived website id 67, period = day, date = 2021-01-14, segment = '', 430 visits found. Time elapsed: 0.829s
INFO [2021-01-31 23:11:19] 75856  Archived website id 67, period = day, date = 2021-01-13, segment = '', 417 visits found. Time elapsed: 1.087s
INFO [2021-01-31 23:11:20] 75856  Archived website id 67, period = day, date = 2021-01-12, segment = '', 379 visits found. Time elapsed: 0.839s
INFO [2021-01-31 23:11:20] 75856  Archived website id 67, period = day, date = 2021-01-11, segment = '', 375 visits found. Time elapsed: 0.839s
INFO [2021-01-31 23:11:20] 75856  Archived website id 67, period = day, date = 2021-01-10, segment = '', 257 visits found. Time elapsed: 1.097s
INFO [2021-01-31 23:11:21] 75856  Archived website id 67, period = week, date = 2021-01-11, segment = '', 2497 visits found. Time elapsed: 1.095s
INFO [2021-01-31 23:11:21] 75856  Archived website id 67, period = day, date = 2021-01-09, segment = '', 242 visits found. Time elapsed: 1.095s
INFO [2021-01-31 23:11:21] 75856  Archived website id 67, period = day, date = 2021-01-08, segment = '', 326 visits found. Time elapsed: 1.095s
INFO [2021-01-31 23:11:22] 75856  Archived website id 67, period = day, date = 2021-01-07, segment = '', 401 visits found. Time elapsed: 1.082s
INFO [2021-01-31 23:11:22] 75856  Archived website id 67, period = day, date = 2021-01-06, segment = '', 352 visits found. Time elapsed: 1.082s
INFO [2021-01-31 23:11:22] 75856  Archived website id 67, period = day, date = 2021-01-05, segment = '', 366 visits found. Time elapsed: 1.083s
INFO [2021-01-31 23:11:24] 75856  Archived website id 67, period = day, date = 2021-01-04, segment = '', 424 visits found. Time elapsed: 1.079s
INFO [2021-01-31 23:11:24] 75856  Archived website id 67, period = day, date = 2021-01-03, segment = '', 296 visits found. Time elapsed: 1.079s
INFO [2021-01-31 23:11:24] 75856  Archived website id 67, period = day, date = 2021-01-02, segment = '', 240 visits found. Time elapsed: 1.080s
INFO [2021-01-31 23:11:25] 75856  Archived website id 67, period = week, date = 2021-01-04, segment = '', 2368 visits found. Time elapsed: 1.092s
INFO [2021-01-31 23:11:25] 75856  Archived website id 67, period = day, date = 2021-01-01, segment = '', 158 visits found. Time elapsed: 1.092s
INFO [2021-01-31 23:11:25] 75856  Archived website id 67, period = day, date = 2020-12-31, segment = '', 168 visits found. Time elapsed: 1.092s
INFO [2021-01-31 23:11:26] 75856  Archived website id 67, period = month, date = 2021-01-01, segment = '', 8492 visits found. Time elapsed: 1.094s
INFO [2021-01-31 23:11:26] 75856  Archived website id 67, period = day, date = 2020-12-30, segment = '', 233 visits found. Time elapsed: 1.094s
INFO [2021-01-31 23:11:26] 75856  Archived website id 67, period = day, date = 2020-12-29, segment = '', 264 visits found. Time elapsed: 1.094s
INFO [2021-01-31 23:11:27] 75856  Archived website id 67, period = year, date = 2021-01-01, segment = '', 8492 visits found. Time elapsed: 0.568s
INFO [2021-01-31 23:11:27] 75856  Archived website id 67, period = day, date = 2020-12-28, segment = '', 268 visits found. Time elapsed: 0.826s
INFO [2021-01-31 23:11:27] 75856  Archived website id 67, period = day, date = 2020-12-27, segment = '', 195 visits found. Time elapsed: 0.826s
INFO [2021-01-31 23:11:28] 75856  Archived website id 67, period = week, date = 2020-12-28, segment = '', 1627 visits found. Time elapsed: 0.825s
INFO [2021-01-31 23:11:28] 75856  Archived website id 67, period = week, date = 2020-12-21, segment = '', 195 visits found. Time elapsed: 0.825s
INFO [2021-01-31 23:11:28] 75856  Archived website id 67, period = month, date = 2020-11-01, segment = '', 11227 visits found. Time elapsed: 0.825s
INFO [2021-01-31 23:11:29] 75856  Archived website id 67, period = month, date = 2020-12-01, segment = '', 1128 visits found. Time elapsed: 1.041s
INFO [2021-01-31 23:11:30] 75856  Archived website id 67, period = year, date = 2020-01-01, segment = '', 19600 visits found. Time elapsed: 1.040s
INFO [2021-01-31 23:11:30] 75856  Finished archiving for site 67, 44 API requests, Time elapsed: 21.344s [1 / 1 done]
INFO [2021-01-31 23:11:30] 75856  Done archiving!
INFO [2021-01-31 23:11:30] 75856  ---------------------------
INFO [2021-01-31 23:11:30] 75856  SUMMARY
INFO [2021-01-31 23:11:30] 75856  Processed 44 archives.
INFO [2021-01-31 23:11:30] 75856  Total API requests: 44
INFO [2021-01-31 23:11:30] 75856  done: 44 req, 21363 ms, no error
INFO [2021-01-31 23:11:30] 75856  Time elapsed: 21.363s
INFO [2021-01-31 23:11:30] 75856  ---------------------------
INFO [2021-01-31 23:11:30] 75856  SCHEDULED TASKS
INFO [2021-01-31 23:11:30] 75856  Starting Scheduled tasks... 
INFO [2021-01-31 23:11:30] 75856  done
INFO [2021-01-31 23:11:30] 75856  ---------------------------


SELECT DISTINCT name from p_archive_invalidations;
Empty set (0.000 sec)

###

/var/www/matomo# sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-11-30,2021-12-31 --segment=pageUrl=^https%253A%252F%252Fourdomain.com%252F
Invalidating day periods in 2020-11-30,2021-12-31 [segment = pageUrl=^https%3A%2F%2Fourdomain.com%2F]...
Invalidating week periods in 2020-11-30,2021-12-31 [segment = pageUrl=^https%3A%2F%2Fourdomain.com%2F]...
Invalidating month periods in 2020-11-30,2021-12-31 [segment = pageUrl=^https%3A%2F%2Fourdomain.com%2F]...
Invalidating year periods in 2020-11-30,2021-12-31 [segment = pageUrl=^https%3A%2F%2Fourdomain.com%2F]...

SELECT DISTINCT name from p_archive_invalidations;
+--------------------------------------+
| name                                 |
+--------------------------------------+
| done98a5fa77cd2881919feb73c5fc5daa2d |
+--------------------------------------+

sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-11-30,2021-12-31

SELECT DISTINCT name from p_archive_invalidations;
+--------------------------------------+
| name                                 |
+--------------------------------------+
| done98a5fa77cd2881919feb73c5fc5daa2d |
+--------------------------------------+

###

sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-11-30,2021-12-31 --segment=pageUrl=^https://ourdomain.com/

SELECT DISTINCT name from p_archive_invalidations;
+--------------------------------------+
| name                                 |
+--------------------------------------+
| done98a5fa77cd2881919feb73c5fc5daa2d |
+--------------------------------------+

sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-11-30,2021-12-31 --force-idsegment=80

SELECT DISTINCT name from p_archive_invalidations;
+--------------------------------------+
| name                                 |
+--------------------------------------+
| done98a5fa77cd2881919feb73c5fc5daa2d |
+--------------------------------------+
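To tell which definition the `done98a5...` entry belongs to, one could hash each candidate encoding and compare. The following Python sketch is built on an ASSUMPTION, not confirmed in this thread: that the name is "done" followed by the MD5 of the URL-decoded segment definition (`unquote_plus` is used to mirror PHP's `urldecode`, which also turns '+' into a space). The helper name is hypothetical, and because `ourdomain.com` is a placeholder, the real definition must be substituted before any match can be expected.

```python
import hashlib
from urllib.parse import unquote_plus


def done_flag(definition):
    """Candidate archive name for a segment definition.

    ASSUMPTION: name = "done" + md5(urldecode(definition)); verify against your Matomo version.
    """
    return "done" + hashlib.md5(unquote_plus(definition).encode("utf-8")).hexdigest()


# Definitions tried in this thread; replace the placeholder domain with the real one
candidates = [
    "pageUrl=^https%253A%252F%252Fourdomain.com%252F",  # as stored in the segment table
    "pageUrl=^https%3A%2F%2Fourdomain.com%2F",          # encoded once
    "pageUrl=^https://ourdomain.com/",                  # not encoded
]
observed = "done98a5fa77cd2881919feb73c5fc5daa2d"  # from p_archive_invalidations above

for definition in candidates:
    flag = done_flag(definition)
    print(flag, "MATCH" if flag == observed else "-", definition)
```

Under this assumption, a definition and its once-encoded form hash to the same name, while the double-encoded form does not.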

After a while (2-5 min), the SELECT statement returns an empty result.

The requested parameter is &segment=pageUrl%3D^https%25253A%25252F%25252Fourdomain.com%25252F
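How many encoding layers the `&segment=` value carries can be checked by decoding it repeatedly until nothing changes. This Python sketch (hypothetical helper name, standard `urllib.parse.unquote`) shows that decoding the request parameter once yields exactly the double-encoded definition stored in the segment table, and that two more passes reach the plain URL.

```python
from urllib.parse import unquote


def decode_layers(value):
    """Return [value, decoded once, decoded twice, ...] until decoding changes nothing.

    Terminates because every effective decoding step shortens the string.
    """
    layers = [value]
    while True:
        decoded = unquote(layers[-1])
        if decoded == layers[-1]:
            return layers
        layers.append(decoded)


# &segment= value from the UI request (domain redacted by the reporter)
param = "pageUrl%3D^https%25253A%25252F%25252Fourdomain.com%25252F"
for depth, layer in enumerate(decode_layers(param)):
    print(depth, layer)
```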

Instead of the visitor graph, the following error message appears (originally in German):
Oops… a problem occurred during the request. The server may have been temporarily overloaded, or you may have requested a report with too much data. Please try again. If this error occurs repeatedly, please contact your Matomo administrator for assistance.

Unfortunately, I cannot find any related entries in our nginx error log.

Could this issue be related to the fact that I'm using a specific site ID and date, or to our setting that deletes old data after 30 days? I assume Matomo should still be able to read the pageUrl values from our archive blob and numeric tables, since I could do that manually. Still, the data disappeared after the Matomo 4 upgrade.

(Please keep in mind that I replaced our domain name.)

@diosmosis
Copy link
Member

@OnlineArts can you try again with the following command:

./console core:invalidate-report-data --sites=67 --dates=2020-11-30,2021-12-31 --segment=pageUrl%3D^https%25253A%25252F%25252Fourdomain.com%25252F

(where the segment value is the same as in the query parameter), with the changes from the PR I linked applied?

> Could that issue relate to the fact that I'm using a specific site id and date or our setting, that we are deleting old data after 30 days?

If the data for the segment exists in the Visits Log, then this would not be the cause. If there is nothing in the visits log for the specific period, then there wouldn't be any data to aggregate again (which you can get back by replaying webserver logs if needed).
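This point can be made concrete with a little date arithmetic. Assuming raw visit data is deleted after a fixed number of days (30, per the reporter's setting above) and that segment archives can only be built from raw data still present, the earliest re-archivable day is today minus the retention window. The Python sketch below uses hypothetical function names, and the inclusive boundary is an assumption, since the exact cut-off depends on how the deletion job counts days. For a segment first opened on 2021-01-11 it gives 2020-12-12, close to the roughly one-month cut-off reported in this issue.

```python
from datetime import date, timedelta


def earliest_raw_day(today, retention_days):
    """Oldest day whose raw visit data should still exist (boundary treated as inclusive)."""
    return today - timedelta(days=retention_days)


def can_rearchive(day, today, retention_days):
    """Re-archiving can only aggregate days that still have raw log_* data."""
    return day >= earliest_raw_day(today, retention_days)


# Segment first opened on 2021-01-11, raw data deleted after 30 days
opened = date(2021, 1, 11)
print(earliest_raw_day(opened, 30))
```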

@OnlineArts
Copy link
Author

OnlineArts commented Jan 31, 2021

@diosmosis
Edit: I started the command, but it is still running.

Thank you for your fast response. I think we are getting closer to the actual issue. The visit logs are empty for those dates, but our archives go back to before 2010. I was able to read these compressed archives, but the arrays are not easy to read. Is there a non-destructive way to recover this data into Matomo or the visit logs?

Normally, the deletion should not be the issue, since the segment graphs still contained the relevant numbers.

Several months ago I was able to invalidate and re-process download logs from the past 6 months. Would this no longer work with Matomo 4?

By the way, I think I have a Matomo database dump from before the upgrade to version 4.

@diosmosis
Copy link
Member

> Is there a non-destructive way to recover these data into Matomo or the visit logs?

The archives are aggregated from the log data; you can't go backwards from them to log data. Unfortunately, the only way to get the data back into Matomo is by replaying webserver logs. You can replay logs of HTTP requests to your website (i.e. Apache/nginx logs), or logs of HTTP requests to your Matomo instance (i.e. requests that go to matomo.php/piwik.php). The latter is the better choice, as it includes more data. Both require github.com/matomo-org/log-analytics. Is this something you can do?

> Several months ago I was able to invalidate and re-processing download logs from the past 6 months. Would this not work again with Matomo 4?

If you mean invalidating and re-archiving based on data in the log_* tables, that only works if the data is still in the log_* tables. If it's not there because it was deleted, then unfortunately there's nothing to aggregate.

> By the way, I think I have the Matomo database dump before the upgrade to version 4.

If it contains visit data for the dates you need segment data for, then you could use it. But since this problem occurred after the upgrade to version 4, I'm guessing it won't have the data you need.

@OnlineArts
Copy link
Author

@diosmosis
Thanks again for your answer.
Unfortunately, due to the GDPR we have to delete our web server logs after seven days.

Does the Matomo 4 upgrade change the behavior or the default privacy settings? Or does Matomo "re-archive" everything during the upgrade and thereby lose the aggregates of the old, already deleted visit logs/segment logs?

The only possibility I see to restore these data would be to write fake weblogs that contain the full URL from the archive_blob_* tables. This should probably work, right?

@diosmosis
Member

Does the Matomo 4 upgrade change the behavior or the default privacy settings?

There shouldn't be anything that would result in such a change in settings.

Or does Matomo "re-archive" everything during the upgrade and lose the aggregates of the old, already deleted visit logs/segment logs?

Nothing in the upgrade would have deleted old log data. Only the related PrivacyManager feature should do this. What date did you update your Matomo? And what is the earliest date you are missing segment data? Can you check what the earliest visit_last_action_time value is in the log_visit table? You shouldn't be missing any segment data from dates where you were using Matomo 3.

The only possibility I see to restore these data would be to write fake weblogs that contain the full URL from the archive_blob_* tables. This should probably work, right?

That's aggregated data, so that wouldn't work. You'd have to guess all the other information, like referrer URL, etc.

@OnlineArts
Author

OnlineArts commented Feb 1, 2021

I updated Matomo on the 1st of January. The first Matomo web page request for one segment was on 11 January 2021, and thus for this segment we could only see data back to 11 December 2020. Later we opened another segment page on 23 January and got data only back to 23 December.

You shouldn't be missing any segment data from dates where you were using Matomo 3.

Right, but the segments are older and contained old data before the upgrade ;-)

SELECT idsegment, deleted, ts_created, ts_last_edit FROM p_segment WHERE enable_only_idsite = 67;
+-----------+---------+---------------------+---------------------+
| idsegment | deleted | ts_created          | ts_last_edit        |
+-----------+---------+---------------------+---------------------+
|        77 |       0 | 2020-06-10 12:21:04 | 2020-07-09 12:29:42 |
|        78 |       0 | 2020-06-10 12:21:20 | 2020-07-09 12:30:36 |
|        79 |       0 | 2020-06-10 12:21:46 | 2020-07-09 12:30:56 |
|        80 |       0 | 2020-06-10 12:22:05 | 2021-01-31 23:01:06 |
|        82 |       0 | 2020-07-09 12:23:21 | 2020-07-09 12:30:11 |
|        84 |       0 | 2020-10-01 02:49:08 | NULL                |
|        88 |       0 | 2020-11-24 08:31:29 | 2020-11-24 08:31:42 |
+-----------+---------+---------------------+---------------------+
SELECT visit_last_action_time FROM p_log_visit ORDER BY visit_last_action_time ASC LIMIT 1;
+------------------------+
| visit_last_action_time |
+------------------------+
| 2020-12-27 00:00:17    |
+------------------------+
1 row in set (2.654 sec)

Just for illustration, our segment data have been lost since the first web page request of the segmentation page in mid-January, after the Matomo 4 upgrade on the 1st of January:
[screenshot: matomo-segments]

(the drop a few days ago is because the user created a separate Matomo instance)

@diosmosis
Member

diosmosis commented Feb 1, 2021

@OnlineArts ok, I understand what's happening now. AFAIK nothing should be deleting old segment reports automatically. You should be able to restore the old blob data from your backup, but if there's something strange going on then you'll keep losing data. Can you post your settings for the "Regularly delete old raw data" and "Delete old aggregated report data" sections? In the meantime I would recommend regularly creating backups (if you're not doing so already).

EDIT: and just to be clear, you're missing old data for EVERY segment, correct?

@OnlineArts
Author

OnlineArts commented Feb 1, 2021

@OnlineArts ok, I understand what's happening now. AFAIK nothing should be deleting old segment reports automatically. You should be able to restore the old blob data from your backup, but if there's something strange going on then you'll keep losing data.

How could I trace that? By observing the blob data table sizes?

Can you post your settings for the "Regularly delete old raw data" and

It was set to 30. I disabled it today. I don't think the loss of data older than 30 days is a coincidence.

"Delete old aggregated report data" sections?

This was and is disabled.

In the meantime I would recommend regularly creating backups (if you're not doing so already).

Unfortunately, only for seven days, and we realized it one week too late. I will set up weekly tape backups.

EDIT: and just to be clear, you're missing old data for EVERY segment, correct?

Yes, on all segments of all our site IDs.

I cannot swear that this is related to the Matomo update, but given the time frame, I assume it is.

@OnlineArts
Author

Is there a way to identify the Matomo version from a database dump? I would like to set up a Matomo instance from the old dump and see what data I could get out of it.

@diosmosis
Member

Is there a way to identify the Matomo version from a database dump?

It would be in the option table where option_name = 'version_core';
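If the backup only exists as a large (gzipped) dump file, the version can also be read without restoring it first. A minimal sketch, assuming a mysqldump-style dump in which the option row appears as `('version_core','<version>',...)` inside an INSERT statement; the function name `find_core_version` is hypothetical and the pattern may need adjusting for other dump tools or options.

```python
import gzip
import re

# Assumption: mysqldump writes the option row inside an INSERT such as
#   INSERT INTO `p_option` VALUES (...),('version_core','3.14.1',1),(...);
# The pattern ignores the table prefix, so it works for any prefix.
VERSION_RE = re.compile(r"\('version_core','([^']+)'")


def find_core_version(dump_path):
    """Stream a plain or gzipped SQL dump and return the first version_core value, or None."""
    opener = gzip.open if dump_path.endswith(".gz") else open
    # Read line by line so a multi-GiB dump never has to fit in memory at once.
    with opener(dump_path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            match = VERSION_RE.search(line)
            if match:
                return match.group(1)
    return None
```

Calling `find_core_version` on the January backup discussed here should return the version string stored in the option table at the time the dump was taken.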

@diosmosis
Member

How could I trace that? By observing the blob data table sizes?

You could monitor it by checking whether segment archives were suddenly going missing from the tables. For example, with queries like SELECT COUNT(DISTINCT name) FROM archive_numeric_2021_01 WHERE name LIKE 'done%' and SELECT COUNT(DISTINCT name) FROM archive_blob_2021_01 WHERE name LIKE 'done%'.

I would check the counts before and after each core:archive run. If something is deleting the data automatically, then it would be in a scheduled task, which are run just before core:archive exits. Knowing when the delete happens and looking at the scheduled tasks that ran might help narrow down the cause.
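A small sketch of that before/after check, assuming a DB-API connection (for example from a MySQL driver) and prefixed table names such as `p_archive_numeric_2021_01` as used in this installation. The helper names `count_done_archives`, `snapshot` and `report_drops` are hypothetical; the query is the one suggested above.

```python
def count_done_archives(conn, table):
    """Count distinct done* flag names in one archive table.

    `table` is formatted into the SQL, so only pass trusted names such as
    'p_archive_numeric_2021_01'.
    """
    cur = conn.cursor()
    # No parameters are passed, so drivers like PyMySQL leave the literal % untouched.
    cur.execute(f"SELECT COUNT(DISTINCT name) FROM {table} WHERE name LIKE 'done%'")
    return cur.fetchone()[0]


def snapshot(conn, tables):
    """Done-flag counts for several tables, e.g. taken right before core:archive."""
    return {table: count_done_archives(conn, table) for table in tables}


def report_drops(before, after):
    """Tables whose count went down between two snapshots, as {table: (before, after)}."""
    return {
        table: (count, after.get(table, 0))
        for table, count in before.items()
        if after.get(table, 0) < count
    }
```

Take one snapshot right before core:archive and one right after; any entry in `report_drops(before, after)` points at a run (and the scheduled tasks it executed) worth looking at.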

@OnlineArts
Author

OnlineArts commented Feb 1, 2021

@diosmosis
Thank you. I could find a backup from the 1st of January and will check that out. I will take action for future updates and archiving procedures.

Since I'm not the only one with this issue, it would be great to find a solution. Maybe #11394 is related too, which could indicate that something is going on during the update process.

Edit: By the way, I converted the MariaDB database from utf8 to utf8mb4 after the update.

@diosmosis
Member

@OnlineArts the forum issue appears to be a different issue (new segments not triggering archiving of past data, rather than pre-existing past segment data disappearing). I'm definitely not sure why this would happen to your matomo.

Edit: By the way, I converted the MariaDB database from utf8 to utf8mb4 after the update.

One thing that might be worth checking: load the backup into a new database for a new matomo, then update it w/o doing anything else and see if the segments disappear. (Also if you're willing to share the database dump (or a modified version of it w/ less data), you could email it to me at dizzy@innocraft.com and I could try to reproduce locally).

@OnlineArts OnlineArts changed the title Segments (re)processing of historical data in Matomo 4 Disappearance of segment archive data Feb 2, 2021
@OnlineArts
Author

OnlineArts commented Feb 2, 2021

@diosmosis
You're right, that is something different.

Now it's getting weird. I restored our database (Matomo 3.14.1) from the 1st of January 2021 and can see our segment data only back to the 1st of December 2020.

I am now restoring the database again, but this time I set the option "delete_logs_enable" to NULL before opening the web page. Let's see. Could it be that the segment data are wrongly affected by this option?

Unfortunately, I cannot send you the data due to government regulations. Our gzipped database dump is 20 GiB.

@OnlineArts
Author

Gotcha!
Performing the following SQL updates immediately after the database import, before opening the web page, lets me see all our lost segment data! But after running core:archive once, I lost everything before the 1st of December again.

UPDATE p_option SET option_value = "" WHERE option_name = "delete_logs_enable";
UPDATE p_option SET option_value = 365 WHERE option_name = "delete_logs_older_than";

The most tricky question: is it possible to merge those two databases? the old backup with the current tracking database? They have diverged over the last 32 days and through the 3.14 to 4.1.1 upgrade.

So the data loss could have happened during the upgrade or when opening the web page before the upgrade.

@diosmosis
Member

@OnlineArts

But after running core:archive once, I lost everything before the 1st of December again.

Do you have the output for this core:archive run? If you can replicate the incident and run core:archive w/ -vvv and post or send me the output, that would help me diagnose the issue.

The most tricky question: is it possible to merge those two databases? the old backup with the current tracking database?

Yes, it's possible, you'd just need to use the archive data for the old backup until the data loss ends. So inserting archive_numeric & archive_blob data in the old backup to the new database if the new database is missing data for the specific idsite/date1/date2/period/name combination.

You could do something like:

1. for every archive_numeric row where name like 'done%' and name <> 'done' and ts_archived is the latest for the idsite/date1/date2/period/name combination in the old backup:
  2. check if the same idsite/date1/date2/period/name combination exists in the new database
  3. if not, get the idarchive, then insert all archive_numeric/archive_blob data w/ the same idarchive into the appropriate table in the new database

This will essentially make the reports available in the new database. But I wouldn't do this until we know for sure they won't just get deleted again.
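A rough sketch of the three steps above, assuming both the old backup and the new database are reachable through DB-API connections; the tests run it with sqlite3, and with a MySQL driver the `?` placeholders would become `%s`. The names `merge_segment_archives`, `month` and `prefix` are hypothetical. One deliberate deviation from the steps: the copied archive gets a fresh idarchive instead of keeping the old one, because the old id may already be in use in the new table. Matomo may also track the next idarchive elsewhere (for example in a sequence table), so check how your version allocates ids and take a backup before running anything like this on real data.

```python
COLS = "name, idsite, date1, date2, period, ts_archived, value"

# Step 1: latest archive (by ts_archived) per idsite/date1/date2/period/name for done* flags other than plain 'done'.
LATEST_DONE_SQL = """
SELECT n.idarchive, n.idsite, n.date1, n.date2, n.period, n.name
FROM {num} AS n
JOIN (
    SELECT idsite, date1, date2, period, name, MAX(ts_archived) AS ts
    FROM {num}
    WHERE name LIKE 'done%' AND name <> 'done'
    GROUP BY idsite, date1, date2, period, name
) AS latest
  ON n.idsite = latest.idsite AND n.date1 = latest.date1
 AND n.date2 = latest.date2 AND n.period = latest.period
 AND n.name = latest.name AND n.ts_archived = latest.ts
"""


def _fetch(conn, sql, params=()):
    cur = conn.cursor()
    cur.execute(sql, params)
    return cur.fetchall()


def merge_segment_archives(old, new, month, prefix=""):
    """Copy segment archives for one month (e.g. '2020_11') that the new DB lacks. Returns how many were copied."""
    num = f"{prefix}archive_numeric_{month}"
    blob = f"{prefix}archive_blob_{month}"
    # Hypothetical id allocation: continue after the highest idarchive already in the new tables.
    next_id = 1 + max(
        _fetch(new, f"SELECT COALESCE(MAX(idarchive), 0) FROM {t}")[0][0] for t in (num, blob)
    )
    copied, seen = 0, set()
    for idarchive, *combo in _fetch(old, LATEST_DONE_SQL.format(num=num)):
        combo = tuple(combo)  # (idsite, date1, date2, period, name)
        if combo in seen:  # several archives tied on ts_archived: keep only the first
            continue
        seen.add(combo)
        # Step 2: skip combinations the new database already has.
        if _fetch(new, f"SELECT 1 FROM {num} WHERE idsite = ? AND date1 = ? AND date2 = ? "
                       "AND period = ? AND name = ? LIMIT 1", combo):
            continue
        # Step 3: copy every numeric and blob row of that archive under the fresh id.
        for table in (num, blob):
            rows = _fetch(old, f"SELECT {COLS} FROM {table} WHERE idarchive = ?", (idarchive,))
            new.cursor().executemany(
                f"INSERT INTO {table} (idarchive, {COLS}) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                [(next_id, *row) for row in rows],
            )
        next_id += 1
        copied += 1
    new.commit()
    return copied
```

Because step 2 skips anything already present, running it a second time for the same month should copy nothing.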

@OnlineArts
Author

@diosmosis

Do you have the output for this core:archive run? If you can replicate the incident and run core:archive w/ -vvv and post or send me the output, that would help me diagnose the issue.

Yes, I will do that. It just takes some hours to load the dump again.

@OnlineArts
Author

As far as I can tell, the archiver of Matomo 3.14.1 does not wipe out the segment data. I looked at the wrong page. On the combined segment page view, it looks like the segment data are missing, but on the single segment page view they are still present.

Today I created a fresh instance from the database dump (1 January), re-enabled the option to delete raw data after 30 days, and performed a manual Matomo 4.1.1 update. I did not run the archiver beforehand. That is probably why I got the last 30 days of visits, from 1 December to 1 January. After the update, the segment data were lost again.

[attachment: Matomo update.log]
The next step will be to disable the option to delete old raw data and redo the process and update.

@diosmosis
Copy link
Member

@OnlineArts thanks for collecting all this useful information!

When you say you updated manually, you mean you ran the SQL and commands yourself instead of using the updater? From the update log, the only SQL statements that touch the archive tables are the:

ALTER TABLE `p_archive_blob_2018_11` CHANGE `name` `name` VARCHAR(190)

SQL statements. The name column is what identifies an archive as being for "all visits" or a segment, so it's possible these statements are doing something strange. Though this wouldn't explain why the segments disappear again after an archive.
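One way to test the idea that these `VARCHAR(190)` changes do something strange is to check, before running them, whether any archive name is longer than 190 characters (and would have to be cut) or whether two names would become identical once cut. A minimal sketch; the helper names are hypothetical, and whether MySQL truncates with a warning or rejects the ALTER depends on the SQL mode, which this sketch does not model.

```python
from collections import defaultdict


def distinct_names(conn, table):
    """All distinct values of the name column in one archive table (DB-API connection)."""
    cur = conn.cursor()
    cur.execute(f"SELECT DISTINCT name FROM {table}")
    return [row[0] for row in cur.fetchall()]


def truncation_report(names, limit=190):
    """Return (names longer than `limit`, groups of names that collide when cut to `limit` chars)."""
    too_long = sorted(name for name in names if len(name) > limit)
    groups = defaultdict(set)
    for name in names:
        groups[name[:limit]].add(name)
    collisions = {prefix: sorted(group) for prefix, group in groups.items() if len(group) > 1}
    return too_long, collisions
```

If `truncation_report(distinct_names(conn, "p_archive_blob_2018_11"))` comes back as `([], {})` for every archive table, the column change itself is unlikely to be what removes segment archives.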

If you are going to manually update again, could you run a couple commands/sql statements at a time, then check whether the count of archives in a table changes?

@OnlineArts
Author

OnlineArts commented Feb 8, 2021

When you say you updated manually, you mean you ran the SQL and commands yourself instead of using the updater? From the update log, the only SQL statements that touch the archive tables are the:

@diosmosis
Manually means I ran the updater via the CLI and not the web updater.
I did not explicitly run the archiver after the updater. Cronjobs are not enabled on the recovery instances.

If you are going to manually update again, could you run a couple commands/sql statements at a time, then check whether the count of archives in a table changes?

Yes, if you could provide me with all the statements. Thankfully, the MySQL console accepts several statements at once.

Edit:
I have to clarify that the archiver of 3.14.1 does not remove the segment data. I thought that because I looked at the combined "all visits + segment visits" page. As you see in my image, the segments are truncated only in that view. So the issue could be related to the updater itself.

@OnlineArts
Author

OK, I reproduced the recovery, set delete_logs_enable to "" and delete_logs_older_than to 365, and performed the update. The segment data are lost again. These options had no effect on that.

This time I would like to count the archive_numeric rows LIKE 'done%' before the upgrade, after the upgrade, and after opening the page for the first time after the update.

@diosmosis
Member

diosmosis commented Feb 9, 2021

@OnlineArts

Yes, if you could provide me with all the statements. Thankfully, the MySQL console accepts several statements at once.

These are available via the web updater and via the core:update command. If you run core:update w/o --yes it will display the list of sql and commands to run in order (it's the first half of the update log you posted). You'll have to run core:update again after though since it won't update the version values in the option table (and it may not display automated dimension changes). But you can run them and see what effect they have. I would measure done% counts since you may not be able to load the UI while having done only a partial upgrade.
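Once the list of statements printed by core:update (without --yes) is saved, the measuring can be automated. A rough sketch, assuming a DB-API connection, one statement per string, and a `count_fn(conn)` you supply that returns the done% count you care about (for instance a `COUNT(DISTINCT name) ... WHERE name LIKE 'done%'` query on one archive table). `run_and_watch` and `first_drop` are hypothetical helper names. Failing statements are recorded and skipped rather than aborting the run; note that MySQL commits DDL implicitly, so nothing is rolled back between steps.

```python
def run_and_watch(conn, statements, count_fn):
    """Execute statements one at a time, recording the done% count after each.

    Returns (baseline, history) where history is a list of
    (statement, count_after, error_message_or_None).
    """
    baseline = count_fn(conn)
    history = []
    for sql in statements:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            conn.commit()
            error = None
        except Exception as exc:  # e.g. duplicate column or missing table: note it and continue
            error = str(exc)
        history.append((sql, count_fn(conn), error))
    return baseline, history


def first_drop(baseline, history):
    """Return the first statement after which the count decreased, or None."""
    previous = baseline
    for sql, count, _error in history:
        if count < previous:
            return sql
        previous = count
    return None
```

Keeping the count after every statement (not only the first drop) also shows whether a count that drops later comes back, which would point at caching rather than deletion.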

@diosmosis
Member

@OnlineArts just one quick note, in case you're not already doing it for this test, it would be helpful to make sure the core:archive cron job is not enabled during the upgrade, then check for the segment data while it is still disabled.

@OnlineArts
Author

@diosmosis
Sorry for the late response, I was quite busy with other things. Today I finished the upgrade (with the cronjob disabled, as well as the Apache web server). The segment data are in fact lost again...

Notably, the last statements failed:

ALTER TABLE `p_custom_dimensions` ADD COLUMN `case_sensitive` TINYINT UNSIGNED NOT NULL DEFAULT 1 AFTER `extractions`;
ERROR 1146 (42S02): Table 'matomo.p_custom_dimensions' doesn't exist

ALTER TABLE `p_custom_dimensions` DROP INDEX `idcustomdimension_idsite`;
ERROR 1146 (42S02): Table 'matomo.p_custom_dimensions' doesn't exist

ALTER TABLE `p_custom_dimensions` ADD PRIMARY KEY(`idcustomdimension`, `idsite`);
ERROR 1146 (42S02): Table 'matomo.p_custom_dimensions' doesn't exist

ALTER TABLE `p_log_visit` ADD COLUMN `profilable` TINYINT(1) NULL, ADD COLUMN `visitor_seconds_since_first` INT(11) UNSIGNED NULL, ADD COLUMN `visitor_seconds_since_order` INT(11) UNSIGNED NULL, MODIFY COLUMN `visitor_count_visits` INT(11) UNSIGNED NOT NULL DEFAULT 0, MODIFY COLUMN `visit_total_interactions` MEDIUMINT UNSIGNED DEFAULT 0, MODIFY COLUMN `referer_name` VARCHAR(255) NULL, MODIFY COLUMN `referer_url` VARCHAR(1500) NULL, MODIFY COLUMN `config_browser_name` VARCHAR(40) NULL, ADD COLUMN `config_client_type` TINYINT( 1 ) NULL DEFAULT NULL, ADD COLUMN `visitor_seconds_since_last` INT(11) UNSIGNED NULL, MODIFY COLUMN `location_region` char(3) DEFAULT NULL;
ERROR 1060 (42S21): Duplicate column name 'profilable'

ALTER TABLE `p_log_conversion` ADD COLUMN `visitor_seconds_since_first` INT(11) UNSIGNED NULL, ADD COLUMN `visitor_seconds_since_order` INT(11) UNSIGNED NULL, MODIFY COLUMN `visitor_count_visits` INT(11) UNSIGNED NOT NULL DEFAULT 0, MODIFY COLUMN `referer_name` VARCHAR(255) NULL, ADD COLUMN `config_browser_name` VARCHAR(40) NULL, ADD COLUMN `config_client_type` TINYINT( 1 ) NULL DEFAULT NULL, MODIFY COLUMN `location_region` char(3) DEFAULT NULL;
ERROR 1060 (42S21): Duplicate column name 'visitor_seconds_since_first'

ALTER TABLE `p_log_link_visit_action` ADD COLUMN `search_cat` VARCHAR(200) NULL, ADD COLUMN `search_count` INTEGER(10) UNSIGNED NULL, ADD COLUMN `time_dom_completion` MEDIUMINT(10) UNSIGNED NULL, ADD COLUMN `time_dom_processing` MEDIUMINT(10) UNSIGNED NULL, ADD COLUMN `time_network` MEDIUMINT(10) UNSIGNED NULL, ADD COLUMN `time_on_load` MEDIUMINT(10) UNSIGNED NULL, ADD COLUMN `time_server` MEDIUMINT(10) UNSIGNED NULL, ADD COLUMN `time_transfer` MEDIUMINT(10) UNSIGNED NULL;
ERROR 1060 (42S21): Duplicate column name 'search_cat'

I'm not sure where these columns came from, but I'm pretty sure they were added automatically or by following a guide or an official recommendation.

[attachment: matomo_sql.log]

@diosmosis
Member

Hi @OnlineArts, did you manage to find out which sql statement in the update caused the data to disappear?

@OnlineArts
Author

OnlineArts commented Apr 11, 2021

@diosmosis
Unfortunately not.

Today I was able to apply all the SQL update commands (3.14.1->4.1.1) without executing the console script. It looks like the segment data are still present, but for some segments I received:
SQLSTATE[42S22]: Column not found: 1054 Unknown column 'log_conversion.visitor_days_since_first' in 'field list' - in plugin Goals

I'll retry applying all SQL changes at once without checking the graphs in between; maybe looking them up influences the procedure. I upgraded the database drive, which should speed up execution.

Edit: Apparently the graph was cached/preprocessed somewhere. Running all SQL commands now leads directly to the aforementioned error message.
[attachment: mysql_upgrade.log]

How to avoid this?

@diosmosis
Copy link
Member

Hi @OnlineArts, visitor_days_since_first should no longer be in use in 4.x, I'm guessing the code base is still pre-4.x? Or maybe the webserver is using a previous file cached w/ opcache or something.

@dieisraels

I have the same problem: the historical data of segments are not reprocessed in Matomo 4.2.1. The segment definition is:

pageUrl=@%252FNG%25C3%259C%252F,pageUrl=@.NG%25C3%259C%252F,pageUrl=@%252FNG%25C3%259C.,pageUrl=@.NG%25C3%259C.,pageUrl=@%252FNG%252525C3%2525259C%252F,pageUrl=@.NG%252525C3%2525259C%252F,pageUrl=@%252FNG%252525C3%2525259C.,pageUrl=@.NG%252525C3%2525259C.

Or a bit simpler:

pageURL=@%252Fechte-helden%252F
console core:invalidate-report-data --sites=1 --segment="pageUrl=@%252FNG%25C3%259C%252F,pageUrl=@.NG%25C3%259C%252F,pageUrl=@%252FNG%25C3%259C.,pageUrl=@.NG%25C3%259C.,pageUrl=@%252FNG%252525C3%2525259C%252F,pageUrl=@.NG%252525C3%2525259C%252F,pageUrl=@%252FNG%252525C3%2525259C.,pageUrl=@.NG%252525C3%2525259C." --dates=2021-01-01,2021-02-28
console core:archive --force-all-websites --url=xxx

We are only shown data after the segment was created.

@diosmosis
Member

Hi @dieisraels, what version of Matomo are you using? And what precisely do you mean by after the segment was created? Do you mean if a segment is created today on May 9th, 2021, you will only see data for May 9th onwards?

Also note that the parameter supplied to core:invalidate-report-data has to be encoded exactly the same way it appears in the URL when viewing the segment data in the UI. The value you're using is missing a urlencode. The easiest way to get it is to load the segment in the UI and copy the segment parameter value from the URL.
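To see how many encoding layers a value already carries, or to add one more, Python's `urllib.parse.quote`/`unquote` are enough. A minimal sketch with hypothetical helper names; it only counts and produces layers, so the number of layers core:invalidate-report-data actually needs should still be taken from the segment parameter in the UI URL, as described above. `decode_fully` assumes the plain value contains no literal `%XX` sequences of its own.

```python
from urllib.parse import quote, unquote


def encode_layers(value, layers):
    """Apply `layers` rounds of percent-encoding; safe='' so '/' becomes %2F as well."""
    for _ in range(layers):
        value = quote(value, safe="")
    return value


def decode_fully(value):
    """Undo percent-encoding until the string stops changing; returns (plain_value, layers_removed)."""
    layers = 0
    while True:
        decoded = unquote(value)
        if decoded == value:
            return value, layers
        value, layers = decoded, layers + 1
```

For example, the plain value `/NGÜ/` becomes `%252FNG%25C3%259C%252F` after two layers, which is exactly the first condition in the segment definition quoted above; one more `encode_layers(..., 1)` adds the layer that was missing.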

@dieisraels

Thank you for pointing this out. The missing URL encoding was a problem. And the archiving process was partially aborted because the DB was restarted!?! I need to find out why this happened. Should I assume that the reports are incomplete if the archiving process was aborted? Or will incorrect reports be recreated during the next archiving process?

@dieisraels

One more note (it may not be the appropriate issue): I found that the invalidation of the segments via the plugin in the interface does not work as expected:
[screenshot: matomo-donation]

If I start the archiving process afterwards, no new data is processed:
[screenshot: matomo-cron-1]

If I invalidate the data via CLI, then the segment is reprocessed by the archiving process:
[screenshot: matomo-cron-2]

@diosmosis
Member

diosmosis commented May 10, 2021

@dieisraels

I need to find out why this happened. Should I assume that the reports are incomplete if the archiving process was aborted? Or will incorrect reports be recreated during the next archiving process?

When archiving finishes for a period, it's marked as done, if it doesn't finish this doesn't happen, and it should be processed again.

One more note (it may not be the appropriate issue): I found that the invalidation of the segments via the plugin in the interface does not work as expected:

Can you tell me if you specified a website to invalidate or was the first selector empty when you invalidated?

@dieisraels

@diosmosis

When archiving finishes for a period, it's marked as done, if it doesn't finish this doesn't happen, and it should be processed again.

Then I wonder why the historical data was never prepared after creating the segments? Only after invalidating via the console this seems to work.

Can you tell me if you specified a website to invalidate or was the first selector empty when you invalidated?

Yes, the website was selected in the dropdown. I had only removed it because I did not want to display the domain.

@diosmosis
Member

@dieisraels

Then I wonder why the historical data was never prepared after creating the segments?

We added the automatic invalidation of historical data for new/updated segments in 4.0, so it should've had an effect. I'm not sure why it wouldn't, but if you create a new segment or edit an existing one, new entries should be added to the archive_invalidations table to schedule archiving of historical data. It's possible there was something causing the data to be masked in some way. If this happens again after 4.3.0 is released, it might be worth looking into again (there were several fixes around this area).

Yes, the website was selected in the dropdown. I had only removed it because I did not want to display the domain.

Thanks, I'll see if I can reproduce this.

@dieisraels

@diosmosis After updating to Matomo 4.3, the invalidation of segments works correctly again. Also with the plugin via the UI.

@utrautmann

@diosmosis @dieisraels

After updating to Matomo 4.3, the invalidation of segments works correctly again.

What does that mean? Does every new segment have to be invalidated manually?

Here are some examples of the current behaviour:

If I create a new segment with Matomo 4.2.1 or 4.3 and the segment criteria contains any encoded string like

  • segment = 'browserName==115%20Browser'
  • segment = 'pageUrl=@newsportal%2Fpressemeldungen'

then the archiving process is only working for the segment for today and yesterday.

[screenshot]

All days are only archived if I manually invalidate this segment (with the plugin). But that's just a workaround, as described here, isn't it?
https://forum.matomo.org/t/archiving-of-newly-created-segments-since-matomo-4/40352

If the new segment does not contain any encoded string, like these:

  • segment = 'pageUrl=@pressemeldungen'
  • segment = 'deviceBrand==Apple'

then it works.
[screenshot]
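Based on the observation above that only definitions containing an encoded string were affected, a quick way to list which existing segments might need manual invalidation. A minimal sketch; the `definition` column and the `p_segment` table name (this installation's prefix) are assumptions to check against your schema, and the helper names are hypothetical.

```python
import re

# Any %XX escape (e.g. %20, %2F) in a stored segment definition.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def needs_manual_invalidation(definitions):
    """Return the definitions that contain at least one percent-encoded character."""
    return [d for d in definitions if PERCENT_ESCAPE.search(d)]


def flagged_segments(conn, table="p_segment"):
    """List (idsegment, definition) for non-deleted segments with encoded characters.

    Assumes a `definition` column next to the idsegment/deleted columns
    shown earlier in this thread.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT idsegment, definition FROM {table} WHERE deleted = 0 ORDER BY idsegment")
    return [(idseg, d) for idseg, d in cur.fetchall() if PERCENT_ESCAPE.search(d)]
```

Applied to the four example definitions above, `needs_manual_invalidation` keeps only the first two (`browserName==115%20Browser` and `pageUrl=@newsportal%2Fpressemeldungen`).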

@diosmosis diosmosis changed the title Disappearance of segment archive data Disappearance of segment archive data during 3.0 => 4.0 update May 18, 2021
@diosmosis
Member

Hi @utrautmann, I'm able to reproduce this, it's another bug. We were hoping #17408 would fix these, but I guess not. I've created a new issue here: #17583

@mattab
Member

mattab commented May 15, 2023

Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!

@mattab mattab closed this as not planned Won't fix, can't repro, duplicate, stale May 15, 2023
@sgiehl sgiehl added the wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it. label May 16, 2023