Disappearance of segment archive data during 3.0 => 4.0 update #17138
Comments
Maybe it will be fixed by #17029, I wonder, @diosmosis? @OnlineArts any chance you can post the output of your system check, which you can find in "Administration -> System Check"? There should be an option to copy the system report (which is then anonymised). This would help us know e.g. about your report generation (archiving) settings etc. |
@tsteur it could be, depends on the segment definition. @OnlineArts would you be able to provide some of the segments that won't archive? |
@diosmosis I can see in the database entries (x_segment) that the definitions are URL-encoded twice, like: pageUrl=^https%253A%252F%252Fourdomain.com%252F I assumed they should be encoded only once, so I overwrote the definition in the database with once-URL-encoded values, but it did not change anything. During the archiving process I am missing, for example, the monthly entries for December and November for this segment.
sudo -u www-data ./console core:invalidate-report-data --sites=67 --dates=2020-10-31,2021-01-25
sudo -u www-data ./console core:archive --force-idsites=67 --force-date-range=2020-10-31,2021-01-24
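A quick way to see how many times a stored value has been percent-encoded is to decode it repeatedly until it stops changing. This is only a minimal sketch: `encoding_depth` is a hypothetical helper, not part of Matomo, and it will miscount values that legitimately contain a literal `%`.

```python
from urllib.parse import quote, unquote


def encoding_depth(value, max_depth=5):
    """Count how many rounds of percent-decoding still change the value."""
    depth = 0
    while depth < max_depth:
        decoded = unquote(value)
        if decoded == value:
            break  # nothing left to decode
        value, depth = decoded, depth + 1
    return depth


url = "https://ourdomain.com/"
once = quote(url, safe="")    # https%3A%2F%2Fourdomain.com%2F
twice = quote(once, safe="")  # https%253A%252F%252Fourdomain.com%252F
print(encoding_depth(once), encoding_depth(twice))  # prints "1 2"
```

The twice-encoded form matches the value quoted from the x_segment table above.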
@tsteur
|
@OnlineArts this is possibly the same issue as #17029. Would you be able to apply that change, invalidate the segment data and run core:archive? This should archive the segment data. |
Hey, it does not look like that. After changing the files I get an error instead of a graph. Later, during the archive process, I could not see the segments. But I could observe that even before changing the files, and also right now, after reverting the changes. |
@OnlineArts that's unfortunate, could you show me the error? And could you try invalidating a segment (via the plugin preferably) and running the following SQL query and posting the results:
Then can you load the segment in the UI and check the |
Hi, it looks like the segments are not considered during the processing. This site id has 5 segments defined. I did several tests and they are separated by "###" in the log.
After a while (2-5 min) the SELECT statement returns an empty result.
The missing visitor graph says (in German): roughly translated: Unfortunately, I cannot find any log entries in our Nginx error log file. Could this issue be related to the fact that I'm using a specific site id and date, or to our setting that deletes old data after 30 days? I assume that Matomo should still be able to read the pageURL from our SQL archive blobs and numerics, since I could do that manually. But still, the data have disappeared after the Matomo 4 upgrade. (Please keep in mind that I've replaced our TLD.) |
@OnlineArts can you try again w/ the following command:
(where the segment value is the same as in the query parameter) and w/ the changes in the PR I linked applied?
If the data for the segment exists in the Visits Log, then this would not be the cause. If there is nothing in the visits log for the specific period, then there wouldn't be any data to aggregate again (which you can get back by replaying webserver logs if needed). |
@diosmosis Thank you for your fast response. I think we are getting closer to the actual issue. The visit logs are empty for those dates, but our archives go back to before 2010. I was able to read out these compressed archives, but the arrays are not comfortable to read. Is there a non-destructive way to recover this data into Matomo or the visit logs? Usually, the deletion should not be the issue, since the segment graphs still contained the relevant numbers. Several months ago I was able to invalidate and re-process download logs from the past 6 months. Would this not work again with Matomo 4? By the way, I think I have the Matomo database dump from before the upgrade to version 4. |
The archives are aggregated using the log data, you can't go backwards from them to log data. Unfortunately, the only way to get them back into Matomo is by replaying webserver logs. You can replay logs of HTTP requests to your website (i.e. apache/nginx logs) or you can replay logs to your matomo instance (i.e. logs of HTTP requests that go to your matomo.php/piwik.php); the latter is the better choice as it will include more data. Both require using github.com/matomo-org/log-analytics. Is this something you can do?
If you mean invalidate and re-archive based on data in the log_* tables, then it would only work if the data was still in the log_* data tables. If it's not there because it was deleted, then there's unfortunately nothing to aggregate.
If it contains visit data for the dates you need segment data for, then you could use it. But since this problem occurred after the upgrade to version 4, I'm guessing it won't have the data you need. |
@diosmosis Does the Matomo 4 upgrade change the behavior or default privacy settings? Or did Matomo "re-archive" everything during the upgrade and lose the aggregates of the old, deleted visit logs/segment logs? The only possibility I see to restore this data would be to write fake weblogs that contain the full URL from the archive_blob_* tables. This should probably work, right? |
There shouldn't be anything that would result in such a change in settings.
Nothing in the upgrade would have deleted old log data. Only the related PrivacyManager feature should do this. What date did you update your Matomo? And what is the earliest date you are missing segment data? Can you check what the earliest visit_last_action_time value is in the log_visit table? You shouldn't be missing any segment data from dates where you were using Matomo 3.
That's aggregated data, so that wouldn't work. You'd have to guess all the other information, like referrer URL, etc. |
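The check for the earliest raw visit, and whether any raw visits remain for a given date range, could be sketched roughly as follows. This is an illustration only: sqlite3 stands in for the real MySQL/MariaDB connection, the helper names are hypothetical, and the table name has to include whatever prefix your installation uses.

```python
import sqlite3


def earliest_visit(conn, table="log_visit"):
    """Return the smallest visit_last_action_time, or None if the table is empty."""
    row = conn.execute(f"SELECT MIN(visit_last_action_time) FROM {table}").fetchone()
    return row[0]


def has_raw_visits(conn, idsite, start, end, table="log_visit"):
    """True if at least one raw visit for idsite falls into [start, end)."""
    row = conn.execute(
        f"SELECT COUNT(*) FROM {table} "
        "WHERE idsite = ? AND visit_last_action_time >= ? AND visit_last_action_time < ?",
        (idsite, start, end),
    ).fetchone()
    return row[0] > 0


# Tiny stand-in table with made-up rows, for demonstration only.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE log_visit (idsite INTEGER, visit_last_action_time TEXT)")
demo.executemany(
    "INSERT INTO log_visit VALUES (?, ?)",
    [(67, "2020-12-26 08:15:00"), (67, "2021-01-20 17:30:00")],
)
print(earliest_visit(demo))
print(has_raw_visits(demo, 67, "2020-11-01 00:00:00", "2020-12-01 00:00:00"))
```

If the second check is false for a period, there is no raw data left to re-archive for that period, which is the situation described above.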
@OnlineArts ok, I understand what's happening now. AFAIK nothing should be deleting old segment reports automatically. You should be able to restore the old blob data from your backup, but if there's something strange going on then you'll keep losing data. Can you post your settings for the "Regularly delete old raw data" and "Delete old aggregated report data" sections? In the meantime I would recommend regularly creating backups (if you're not doing so already). EDIT: and just to be clear, you're missing old data for EVERY segment, correct? |
How could I trace that? By observing the blob data table sizes?
It was set to 30. I disabled that today. I don't think that this loss of data older than 30 days was by chance.
This was and is disabled.
Unfortunately, only for seven days and we realized that one week too late. I will initialize tape backups on a weekly basis.
Yes, on all segments of all our site IDs. I can not swear that this is related to the Matomo update but due to the time frame, I assume it is related to the update. |
Is there a way to identify the Matomo version from a database dump? I would like to set up a Matomo instance from the old dump and see what data I could get out of it. |
It would be in the |
You could monitor it by checking whether segment data from tables were suddenly going missing. For example, queries like I would check the counts before and after each core:archive run. If something is deleting the data automatically, then it would be in a scheduled task, which are run just before core:archive exits. Knowing when the delete happens and looking at the scheduled tasks that ran might help narrow down the cause. |
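A rough sketch of that before/after comparison, counting the "done" rows in one archive_numeric table and reporting any that shrink between two snapshots. The assumptions here: sqlite3 stands in for MySQL/MariaDB, the table name archive_numeric_2020_12 is just an example (add your own prefix), and both helper names are hypothetical.

```python
import sqlite3


def count_done_flags(conn, table):
    """Count rows per done-flag name (name LIKE 'done%') in one archive table."""
    query = f"SELECT name, COUNT(*) FROM {table} WHERE name LIKE 'done%' GROUP BY name"
    return dict(conn.execute(query).fetchall())


def lost_flags(before, after):
    """Return {name: (count_before, count_after)} for flags whose count dropped."""
    return {
        name: (n, after.get(name, 0))
        for name, n in before.items()
        if after.get(name, 0) < n
    }


# Demonstration with made-up rows; a real run would snapshot before and after core:archive.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE archive_numeric_2020_12 (idarchive INTEGER, name TEXT, idsite INTEGER)")
demo.executemany(
    "INSERT INTO archive_numeric_2020_12 VALUES (?, ?, ?)",
    [(1, "done", 67), (2, "doneabc123", 67), (3, "doneabc123", 67), (4, "nb_visits", 67)],
)
before = count_done_flags(demo, "archive_numeric_2020_12")
demo.execute("DELETE FROM archive_numeric_2020_12 WHERE name = 'doneabc123'")
after = count_done_flags(demo, "archive_numeric_2020_12")
print(lost_flags(before, after))
```

Storing one snapshot per core:archive run with a timestamp would make it easier to line up a drop with the scheduled tasks that ran at that time.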
@diosmosis Since I'm not the only one with this issue, it would be great to find a solution. Maybe #11394 is related to that too. That could indicate that something is going on during the update process. Edit: By the way, I converted the MariaDB database from utf8 to utf8mb4 after the update. |
@OnlineArts the forum issue appears to be a different issue (new segments not triggering archiving of past data, rather than pre-existing past segment data disappearing). I'm definitely not sure why this would happen to your matomo.
One thing that might be worth checking: load the backup into a new database for a new matomo, then update it w/o doing anything else and see if the segments disappear. (Also if you're willing to share the database dump (or a modified version of it w/ less data), you could email it to me at dizzy@innocraft.com and I could try to reproduce locally). |
@diosmosis Now it's getting weird. I could restore our database with Matomo 3.14.1 from the first of January 2021 and see our segment data until the 1st of December 2020. I am now restoring the database again, but this time I will overwrite the option "delete_logs_enable" with NULL before opening the web page. Let's see. Could it be that the segment data are wrongly affected by this option? Unfortunately, I cannot send you the data, due to governmental laws. Our gzipped database dump is 20 GiB. |
Gotcha!
The trickiest question: is it possible to merge those two databases, the old backup with the current tracking database? They have diverged for 32 days and by the version 3.14 to 4.1.1 upgrade. So the data loss could have happened during the upgrade or by opening the web page before the upgrade. |
Do you have the output for this core:archive run? If you can replicate the incident and run core:archive w/ -vvv and post or send me the output, that would help me diagnose the issue.
Yes, it's possible, you'd just need to use the archive data from the old backup for the period up to where the data loss ends. So you'd insert archive_numeric & archive_blob data from the old backup into the new database if the new database is missing data for the specific idsite/date1/date2/period/name combination. You could do something like:
This will essentially make the reports available in the new database. But I wouldn't do this until we know for sure they won't just get deleted again. |
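The merge rule described above (copy a row only when the new database has nothing for that idsite/date1/date2/period/name combination) can be sketched like this. It is not the snippet originally posted here, just an illustration under assumptions: sqlite3 stands in for MySQL/MariaDB, `copy_missing_rows` and `id_offset` are hypothetical, and the offset is only one possible way to keep copied idarchive values from colliding with existing ones. The same offset would have to be used for the matching archive_blob table.

```python
import sqlite3

KEY_COLUMNS = ("idsite", "date1", "date2", "period", "name")


def copy_missing_rows(old_db, new_db, table, id_offset=0):
    """Copy rows from old_db whose key combination is absent in new_db; return the count."""
    existing = set(new_db.execute(f"SELECT {', '.join(KEY_COLUMNS)} FROM {table}"))
    rows = old_db.execute(
        f"SELECT idarchive, idsite, date1, date2, period, name, value FROM {table}"
    ).fetchall()
    copied = 0
    for idarchive, *key, value in rows:
        if tuple(key) in existing:
            continue  # the new database already has data for this combination
        new_db.execute(
            f"INSERT INTO {table} (idarchive, idsite, date1, date2, period, name, value) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (idarchive + id_offset, *key, value),
        )
        copied += 1
    new_db.commit()
    return copied


SCHEMA = (
    "CREATE TABLE archive_numeric_2020_11 (idarchive INTEGER, idsite INTEGER, "
    "date1 TEXT, date2 TEXT, period INTEGER, name TEXT, value REAL)"
)
old_db, new_db = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (old_db, new_db):
    db.execute(SCHEMA)
# Made-up rows: the old backup has two metrics, the new database only one of them.
old_db.executemany(
    "INSERT INTO archive_numeric_2020_11 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(5, 67, "2020-11-01", "2020-11-30", 3, "nb_visits", 120.0),
     (5, 67, "2020-11-01", "2020-11-30", 3, "nb_actions", 480.0)],
)
new_db.execute(
    "INSERT INTO archive_numeric_2020_11 VALUES (9, 67, '2020-11-01', '2020-11-30', 3, 'nb_visits', 95.0)"
)
print(copy_missing_rows(old_db, new_db, "archive_numeric_2020_11", id_offset=1000))
```

Running this on a copy of the new database first, rather than on the live one, keeps the merge itself non-destructive.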
Yes, I will do that. It just takes some hours to load the dump again. |
As far as I know, the archiver of Matomo 3.14.1 does not wipe out the segment data. I looked at the wrong page. On the combined segment page view, it looks like the segment data are missing. But on the single segment page view, they are still present. Today I created a fresh instance from the database dump (1. January), re-enabled the option to delete raw data after 30 days, and performed a manual Matomo 4.1.1 update. I did not run the archiver before. That is probably why I got the last 30 days' visits, from 1. December to 1. January. After the update, the segment data were lost again.
|
@OnlineArts thanks for collecting all this useful information! When you say you updated manually, you mean you ran the SQL and commands yourself instead of using the updater? From the update log, the only SQL that touches the archive tables are the:
SQL statements. The If you are going to manually update again, could you run a couple commands/sql statements at a time, then check whether the count of archives in a table changes? |
@diosmosis
Yes, if you could provide me all statements. Thankfully, the mysql console accepts several inputs at once. Edit: |
Ok, I repeated the recovery, set delete_logs_enable to "" and delete_logs_older_than to 365, and performed the update. The segment data are lost again. These options did not have any effect on that. This time I would like to count the archive_numeric rows like %done% before the upgrade, after the upgrade, and after opening the page for the first time after the update. |
These are available via the web updater and via the |
@OnlineArts just one quick note, in case you're not already doing it for this test, it would be helpful to make sure the core:archive cron job is not enabled during the upgrade, then check for the segment data while it is still disabled. |
@diosmosis Notably, the last statements failed:
I'm not sure where these indexes come from, but I'm pretty sure that they were added automatically or through a guide or an official recommendation. |
Hi @OnlineArts, did you manage to find out which sql statement in the update caused the data to disappear? |
@diosmosis Today I was able to apply all the SQL update commands (3.14.1->4.1.1) without executing the console script. It looks like the segment data are still present, but for some segments I received: I'll retry applying all SQL changes at once without checking the graphs in between; maybe the look-up influences the procedure. I upgraded the database drive, which should speed up the executions. Edit: Obviously the graph was cached / preprocessed somewhere. Running all SQL commands now leads directly to the aforementioned error message. How to avoid this? |
Hi @OnlineArts, |
I have the same problem that the historical data of segments are not rehashed in Matomo 4.2.1. The segment definition is:
Or a bit simpler:
We are only shown data after the segment was created. |
Hi @dieisraels, what version of Matomo are you using? And what precisely do you mean by Also note, that the parameter supplied in core:invalidate-report-data has to be encoded the exact same way it appears in the URL when viewing the segment data from the UI. The value you're using is missing a urlencode. The easiest way to get it would be to load the segment in the UI and copy the segment parameter value from the URL. |
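Copying the segment value out of the URL without decoding it can also be done with a small script. A minimal sketch, assuming the segment appears as a `segment=` parameter in the query string or in the hash part of the UI URL; `raw_segment_from_url` is a hypothetical helper and the URL below is made up.

```python
import re


def raw_segment_from_url(url):
    """Return the segment parameter exactly as it appears in the URL (no decoding)."""
    matches = re.findall(r"[?&#]segment=([^&#]*)", url)
    # If the parameter occurs more than once, take the last one (assumed to be the hash part).
    return matches[-1] if matches else None


ui_url = (
    "https://matomo.example/index.php?module=CoreHome&action=index&idSite=67"
    "#?idSite=67&period=month&date=2020-12-01"
    "&segment=pageUrl%3D%5Ehttps%25253A%25252F%25252Fourdomain.com%25252F"
    "&category=General_Visitors"
)
print(raw_segment_from_url(ui_url))
```

The returned string is the one to pass on unchanged; when it goes onto a shell command line it should be quoted so the shell does not interpret `%` or `&`.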
Thank you for pointing this out. The missing URL encoding was a problem. And the archiving process was partially aborted because the DB was restarted!?! I need to get it checked why this is the case. Should I assume that the reports are incomplete if the archiving process was aborted? Or will incorrect reports be recreated during the next archiving process? |
When archiving finishes for a period, it's marked as done, if it doesn't finish this doesn't happen, and it should be processed again.
Can you tell me if you specified a website to invalidate or was the first selector empty when you invalidated? |
Then I wonder why the historical data was never prepared after creating the segments? Only after invalidating via the console this seems to work.
Yes, the website was selected in the dropdown. I had only removed it because I did not want to display the domain. |
We added the automatic invalidation of historical data for new/updated segments in 4.0, so it should've had an effect. I'm not sure why it wouldn't, but if you create a new segment or edit an existing one, new entries should be added to the archive_invalidations table to schedule archiving of historical data. It's possible there was something causing the data to be masked in some way. If this happens again after 4.3.0 is released, it might be worth looking into again (there were several fixes around this area).
Thanks, I'll see if I can reproduce this. |
@diosmosis After updating to Matomo 4.3, the invalidation of segments works correctly again. Also with the plugin via the UI. |
What does that mean? Does every new segment have to be invalidated manually? Here are some examples of the current behaviour: If I create a new segment with Matomo 4.2.1 or 4.3 and the segment criteria contain any encoded string like
then the archiving process only works for this segment for today and yesterday. All days are archived only if I manually invalidate this segment (with the plugin). But that's just a workaround, as described here, isn't it? If the new segment does not contain any encoded string, like this:
|
Hi @utrautmann, I'm able to reproduce this, it's another bug. We were hoping #17408 would fix these, but I guess not. I've created a new issue here: #17583 |
Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again! |
As mentioned in the Matomo forum, some users are not able to display historical segment data after the Matomo 4 upgrade.
We could observe that after the Matomo 4 upgrade, historical data (older than one month) disappeared from the segment pages. When calling a segment overview, we get the historical data from exactly one month. For example, opening the segment #1 page on the 2. January shows historical data back to the 2. December only. Opening the segment #2 page on the 15. January -> 15. December.
Archive invalidation and manual re-processing do not bring back the historical data. This issue appears in at least Matomo 4.1.0 and 4.1.1. No obviously related config parameter changed that behavior.
Could that be related to the options from #17129? The segmentation for that site id is set to SEGMENTED REPORTS ARE PROCESSED IN REAL TIME (DEFAULT)
Edit:
Ok, changing the mentioned option to "SEGMENTED REPORTS ARE PRE-PROCESSED (FASTER, REQUIRES CRON)" only results in the segments now being listed during the ./console core:archive execution. But it still did not add historical data. As far as I can see, it is still missing during the processing for the incomplete month of December.
Could that be somehow related to the utf8 to utf8mb4 conversion that I performed after the upgrade?