@Starker3 opened this Issue on December 20th 2021 Contributor

While troubleshooting an issue with segment archiving, I noticed that "Year" periods for newly created or recently edited segments take extremely long to process when compared to the time it takes the "Month" periods to process.
(A few seconds for "Month" periods vs almost 30 minutes for "Year" period)

In this example I created a simple test segment where I set the condition to Action Type is pageviews with process_new_segments_from = "editLast93" in the config file.
In the core:archive output it shows that the correct dates are being processed and everything seems fine for the month periods:

INFO [2021-12-19 22:16:16] 618228  Archived website id 1, period = month, date = 2021-12-01, segment = 'actionType==pageviews', 16 visits found. Time elapsed: 7.455s
INFO [2021-12-19 22:16:29] 618228  Archived website id 1, period = month, date = 2021-11-01, segment = 'actionType==pageviews', 25 visits found. Time elapsed: 1.291s
INFO [2021-12-19 22:16:41] 618228  Archived website id 1, period = month, date = 2021-10-01, segment = 'actionType==pageviews', 22 visits found. Time elapsed: 1.547s

But when it comes to the "Year" period it takes an exceptionally long time when compared to the "Month" periods which all took only a few seconds:

INFO [2021-12-19 22:24:20] 618228  Archived website id 1, period = year, date = 2021-01-01, segment = 'actionType==pageviews', 63 visits found. Time elapsed: 417.794s

While trying to reproduce the issue I I noticed that it didn't happen when creating a segment with the same conditions as previously processed (I.e. a duplicate segment), but creating a new segment with a different definition results in the same issue when using editLastN or lastN in the config file.

Here are two archive logs, first with process_new_segments_from = "editLast93" in the config.ini.php
editLast93-archive.log

INFO [2021-12-19 23:31:16] 684152  Archived website id 1, period = year, date = 2021-01-01, segment = 'actionType!=contents', 88 visits found. Time elapsed: 1826.489s

Second with process_new_segments_from = "last93" in the config.ini.php
last93-archive.log

INFO [2021-12-20 00:05:28] 777211  Archived website id 1, period = year, date = 2021-01-01, segment = 'actionType!=events', 75 visits found. Time elapsed: 1799.830s

Both of the above archive runs took only a few seconds to process the "Month" periods for these newly created segments

  • Matomo Version: 4.6.2
  • PHP Version: 7.3.22
@Starker3 commented on December 20th 2021 Contributor

For more info:
browser archiving is disabled and enforced, segments are pre-processed only:

browser_archiving_disabled_enforce = 1
enable_create_realtime_segments = 0
enable_browser_archiving_triggering = 0
process_new_segments_from = "last93"
@bx80 commented on December 20th 2021 Contributor

@Starker3 I've done a few tests, but haven't managed to recreate this. Any chance of seeing the console command you used for archiving? Did you have development mode enabled for the tests?

@Starker3 commented on December 20th 2021 Contributor

@bx80 I just used ./console core:archive for testing initially.

The archive logs attached were from ./console core:archive --force-idsites=Y --force-idsegments=XX

I did some testing with a different Site ID that has generated visit data and did not see the issue, so I was thinking that it might be specific to sites that have visits over a long period of time, eg. at least one visit for each day over the last year?

Otherwise I can setup an account for you on the Matomo site and setup SSH access if that might help.

@Starker3 commented on December 21st 2021 Contributor

I ran the archiving queries again to get debug information, all of the previous queries were run as follows:
./console core:archive --force-idsites=1 --force-idsegments=24 -vvv

This is what shows in the running processes when its processing the year period:

3395370 innocra+  20   0  916816 736680  20376 R  89.7   2.3   1:38.09 /usr/php/7.3.22/bin/php -q /home/matomo4/console climulti:request -q --matomo-domain= --superuser module=API&method=CoreAdminHome.archiveReports&idSite=1&period=year&date=2021-01-01&format=json&segment=actionType%21%3Ddownloads&trigger=archivephp                                                

3395369 innocra+  20   0    2392    752    684 S   0.0   0.0   0:00.00 sh -c /usr/php/7.3.22/bin/php -q  /home/matomo4/console climulti:request -q --matomo-domain='' --superuser 'module=API&method=CoreAdminHome.archiveReports&idSite=1&period=year&date=2021-01-01&format=json&segment=actionType%21%3Ddownloads&trigger=archivephp' 2>&1       

last93-archive-debug.log
The debug log only shows the year archiving query and doesn't indicate any subqueries being run from what I can see in the logs.

I enabled development mode like you mentioned to see if that would get us more info.

This is the archive run with debug output but it seems the output file is too large to upload to GitHub in a comment.
It also just shows the last query run as the year period, so it's probably not much help but also shows the Avoiding segment subquery due to missing start date and/or an end date. Please ensure a start date and/or end date is set when initializing a segment if it's used to build a query.

I managed to grab the following query close to the end of the archiving with development mode enabled:

| 46228983 | 217102_matomo4 | 2a00:b6e0:1:200:177::1:58656 | innocraft-staff-richard_matomo4 | Query   |    0 | Sending data | SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as nb_visits, arc3.value as nb_visits_converted
                     FROM matomo_archive_numeric_2021_03 arc1
                     LEFT JOIN matomo_archive_numeric_2021_03 arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = 'nb_visits')
                     LEFT JOIN matomo_archive_numeric_2021_03 arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = 'nb_visits_converted')
                     WHERE arc1.idsite = '1'
                         AND arc1.date1 = '2021-03-23'
                         AND arc1.date2 = '2021-03-23'
                         AND arc1.period = '1'
                         AND (((arc1.name IN ('done7257a261717bbab525cefd921ebc025e','done7257a261717bbab525cefd921ebc025e.VisitsSummary'))))

                         AND arc1.ts_archived IS NOT NULL
                     ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC |    0.000 |

So I performed the archiving tests again to see if I could grab more of the queries that were being executed sort of in the middle of the year archive:

SELECT idarchive FROM `matomo_archive_numeric_2021_04` WHERE idsite = '1' AND date1 = '2021-04-25 00:00:00' AND date2 = '2021-04-25 00:00:00' AND period = '1' AND name = 'done5a0a1c3642cbe2656666e798c12d6e8e.VisitsSummary' AND ts_archived < '2021-12-21 01:33:08' AND idarchive < '472701'
SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as nb_visits, arc3.value as nb_visits_converted
                     FROM matomo_archive_numeric_2021_03 arc1
                     LEFT JOIN matomo_archive_numeric_2021_03 arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = 'nb_visits')
                     LEFT JOIN matomo_archive_numeric_2021_03 arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = 'nb_visits_converted')
                     WHERE arc1.idsite = '1'
                         AND arc1.date1 = '2021-03-28'
                         AND arc1.date2 = '2021-03-28'
                         AND arc1.period = '1'
                         AND (((arc1.name IN ('done5a0a1c3642cbe2656666e798c12d6e8e','done5a0a1c3642cbe2656666e798c12d6e8e.VisitsSummary','done5a0a1c3642cbe2656666e798c12d6e8e.Goals'))))

                         AND arc1.ts_archived IS NOT NULL
                     ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC
SELECT  /*+ MAX_EXECUTION_TIME(7200000) */  /* segmenthash 4d2c67b1324c54f984faa692d330a988 */ /* sites 1 */ /* 2021-12-09,2021-12-09 */ /* Core */ /* trigger = CronArchive */
                count(distinct log_visit.idvisitor) AS `1`, 
            count(*) AS `2`, 
            sum(log_visit.visit_total_actions) AS `3`, 
            max(log_visit.visit_total_actions) AS `4`, 
            sum(log_visit.visit_total_time) AS `5`, 
            sum(case log_visit.visit_total_actions when 1 then 1 when 0 then 1 else 0 end) AS `6`, 
            sum(case log_visit.visit_goal_converted when 1 then 1 else 0 end) AS `7`, 
            count(distinct log_visit.user_id) AS `39`
            FROM
                matomo_logtmpsegmentf2f33469ee951a0648dc8cd9cbbc13fe AS logtmpsegmentf2f33469ee951a0648dc8cd9cbbc13fe INNER JOIN matomo_log_visit AS log_visit ON log_visit.idvisit = logtmpsegmentf2f33469ee951a0648dc8cd9cbbc13fe.idvisit
SELECT idarchive FROM `matomo_archive_numeric_2021_03` WHERE idsite = '1' AND date1 = '2021-03-01 00:00:00' AND date2 = '2021-03-01 00:00:00' AND period = '1' AND name = 'donea4f39d10edb6487cdfe714b3b5ce6586.VisitsSummary' AND ts_archived < '2021-12-21 01:35:15' AND idarchive < '2634003'
SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as nb_visits, arc3.value as nb_visits_converted
                     FROM matomo_archive_numeric_2021_03 arc1
                     LEFT JOIN matomo_archive_numeric_2021_03 arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = 'nb_visits')
                     LEFT JOIN matomo_archive_numeric_2021_03 arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = 'nb_visits_converted')
                     WHERE arc1.idsite = '1'
                         AND arc1.date1 = '2021-03-16'
                         AND arc1.date2 = '2021-03-16'
                         AND arc1.period = '1'
                         AND (((arc1.name IN ('donea4f39d10edb6487cdfe714b3b5ce6586','donea4f39d10edb6487cdfe714b3b5ce6586.VisitsSummary','donea4f39d10edb6487cdfe714b3b5ce6586.Goals'))))

                         AND arc1.ts_archived IS NOT NULL
                     ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC
SELECT  /*+ MAX_EXECUTION_TIME(7200000) */  /* WP IGNORE ERROR */ 1 FROM matomo_logtmpsegmentcfdf25885284b3d1f16c08f1ed96383e LIMIT 1
SELECT arc1.idarchive, arc1.value, arc1.name, arc1.ts_archived, arc1.date1 as startDate, arc2.value as nb_visits, arc3.value as nb_visits_converted
                     FROM matomo_archive_numeric_2021_05 arc1
                     LEFT JOIN matomo_archive_numeric_2021_05 arc2 on arc2.idarchive = arc1.idarchive and (arc2.name = 'nb_visits')
                     LEFT JOIN matomo_archive_numeric_2021_05 arc3 on arc3.idarchive = arc1.idarchive and (arc3.name = 'nb_visits_converted')
                     WHERE arc1.idsite = '1'
                         AND arc1.date1 = '2021-05-17'
                         AND arc1.date2 = '2021-05-17'
                         AND arc1.period = '1'
                         AND (((arc1.name IN ('donea4f39d10edb6487cdfe714b3b5ce6586','donea4f39d10edb6487cdfe714b3b5ce6586.VisitsSummary','donea4f39d10edb6487cdfe714b3b5ce6586.Goals'))))

                         AND arc1.ts_archived IS NOT NULL
                     ORDER BY arc1.ts_archived DESC, arc1.idarchive DESC |    0.000 |
| 46247939 | 217102_matomo4 | 2a00:b6e0:1:200:177::1:60972 | innocraft-staff-richard_matomo4
SELECT value, name, idsite, date1, date2, ts_archived
                                FROM matomo_archive_numeric_2021_06
                                WHERE idarchive IN (126295,126299,126299,126299,126299,126299,126299)
                                  AND name IN ('Goal_1_nb_conversions','Goal_1_nb_visits_converted','Goal_1_revenue','Goal_2_nb_conversions','Goal_2_nb_visits_converted','Goal_2_revenue','Goal_3_nb_conversions','Goal_3_nb_visits_converted','Goal_3_revenue','Goal_4_nb_conversions','Goal_4_nb_visits_converted','Goal_4_revenue','Goal_5_nb_conversions','Goal_5_nb_visits_converted','Goal_5_revenue','Goal_6_nb_conversions','Goal_6_nb_visits_converted','Goal_6_revenue','Goal_0_nb_conversions','Goal_0_nb_visits_converted','Goal_0_revenue','Goal_0_revenue_subtotal','Goal_0_revenue_tax','Goal_0_revenue_shipping','Goal_0_revenue_discount','Goal_0_items','Goal_-1_nb_conversions','Goal_-1_nb_visits_converted','Goal_-1_revenue','Goal_-1_items','Goal_nb_conversions','Goal_nb_visits_converted','Goal_revenue')
                             ORDER BY ts_archived ASC
INSERT IGNORE INTO matomo_archive_numeric_2021_08 (idarchive, idsite, date1, date2, period, ts_archived, name, value)
                  VALUES ('160946','1','2021-08-24','2021-08-24','1','2021-12-21 01:37:18','donea884fac1e18fd49f7ed61a0377add362.VisitsSummary','2') ON DUPLICATE KEY UPDATE value = '2'
SELECT idarchive FROM `matomo_archive_numeric_2021_04` WHERE idsite = '1' AND date1 = '2021-04-12 00:00:00' AND date2 = '2021-04-12 00:00:00' AND period = '1' AND name = 'doneed03f388d2845497ed599b584f9c55c0.VisitsSummary' AND ts_archived < '2021-12-21 01:38:44' AND idarchive < '473019'
INSERT IGNORE INTO matomo_archive_numeric_2021_08 (idarchive, idsite, date1, date2, period, ts_archived, name, value)
                  VALUES ('161066','1','2021-08-16','2021-08-22','2','2021-12-21 01:39:02','doneed03f388d2845497ed599b584f9c55c0.VisitsSummary','2') ON DUPLICATE KEY UPDATE value = '2'
@tsteur commented on December 21st 2021 Member

btw you will also want to paste any other config ini setting in here, especially from the General section. If there's anything in Debug section that be good to know too

@Starker3 commented on December 21st 2021 Contributor

Sure @tsteur Here is the [General] section:

[General]
release_channel = "latest_beta"
action_title_category_delimiter = "/"
browser_archiving_disabled_enforce = 1
enable_create_realtime_segments = 0
process_new_segments_from = "last93"
enable_browser_archiving_triggering = 0
archiving_custom_ranges[] = "last90"
force_ssl = 1
login_allow_logme = 1
enable_plugin_upload = 1
rearchive_reports_in_past_last_n_months = "last0"
salt = "..."
trusted_hosts[] = "..."
instance_id = "..."

I don't have a [Debug] section in my config.ini.php

@Starker3 commented on February 9th 2022 Contributor

FYI this is still happening on 4.7.1 for newly created segments
image

Powered by GitHub Issue Mirror