@tpimpao opened this Issue on February 18th 2020

Hi there,
I have struggled with a problem that occurs at the peak of the day's highest load, causing a high CPU load on the database server.

My database server:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Stepping:              7
CPU MHz:               2200.000
CPU max MHz:           2200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4389.13
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31
MemTotal:       65764440 kB
MemFree:         4493980 kB
MemAvailable:    4999644 kB
Buffers:              16 kB
Cached:           539544 kB
SwapCached:      2550824 kB
Active:         52374964 kB
Inactive:        7595748 kB
Active(anon):   51917680 kB
Inactive(anon):  7528420 kB
Active(file):     457284 kB
Inactive(file):    67328 kB
Unevictable:        8648 kB
Mlocked:            8648 kB
SwapTotal:      16777212 kB
SwapFree:       12554256 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:      58823428 kB
Mapped:            30216 kB
Shmem:              8600 kB
Slab:             630112 kB
SReclaimable:     474008 kB
SUnreclaim:       156104 kB
KernelStack:       13600 kB
PageTables:       128872 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    49659432 kB
Committed_AS:   58100260 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      425568 kB
VmallocChunk:   34325395452 kB
HardwareCorrupted:     0 kB
AnonHugePages:  13486080 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      226988 kB
DirectMap2M:     8126464 kB
DirectMap1G:    58720256 kB

I have about 20 million page views per month, so, my dedicated database server are complaint with Matomo requirements.

I'm using MariaDB 10.4.10 and this is my.cnf file

[mysqld]
datadir=/DB/mysql
socket=/DB/mysql/mysql.sock
tmpdir=/TEMPDB
bind-address=0.0.0.0
general_log_file=/var/log/mariadb/mariadb_acess.log
long_query_time = 10
slow_query_log
slow_query_log_file =/var/log/mariadb/slow-query.log
log_error=/var/log/mariadb/mariadb.err
innodb_flush_method=O_DIRECT
innodb_file_per_table=ON
innodb_flush_log_at_trx_commit = 2
innodb_log_group_home_dir = /TEMPDB

innodb_buffer_pool_size=51G
innodb_buffer_pool_instances=51
innodb_print_all_deadlocks=1
performance_schema=ON
max_allowed_packet=268435456
max_connections = 400
wait_timeout = 14400
interactive_timeout = 14400

[client]
port=3306
socket=/DB/mysql/mysql.sock

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

In the highest peak of the day i see that MySQL get allot slows querys similar to below:

# User<a class='mention' href='https://github.com/Host'>@Host</a>: matomo[matomo] @ server1 [xxx.xxx.xxx.xxx]
# Thread_id: 2001714  Schema: piwik  QC_hit: No
# Query_time: 16.680956  Lock_time: 0.000157  Rows_sent: 1  Rows_examined: 206832
# Rows_affected: 0  Bytes_sent: 5354
SET timestamp=1582018281;
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, idsite, visit_entry_idaction_url, visit_total_actions, visit_total_interactions, visit_total_searches, referer_url, config_device_brand, config_device_model, config_device_type, visit_total_events, visit_total_time, location_ip, location_browser_lang, campaign_content, campaign_id, campaign_keyword, campaign_medium, campaign_name, campaign_source, last_idlink_va, custom_dimension_1, custom_dimension_2, custom_dimension_3, custom_dimension_4, custom_dimension_5, custom_var_k1, custom_var_v1, custom_var_k2, custom_var_v2, custom_var_k3, custom_var_v3, custom_var_k4, custom_var_v4, custom_var_k5, custom_var_v5   FROM piwik_log_visit WHERE idsite = '23' AND visit_last_action_time <= '2020-02-18 10:01:04' AND idvisitor = 'S><FF><DF>J\n-<93>'
                ORDER BY visit_last
_action_time DESC
                LIMIT 1;

I don't have any problems running this query's directly, but i have a lot of them in the state "Sending Data" when i run the SHOW FULL PROCESSLIST;

High Load TOP

top - 09:46:47 up 331 days, 6:56, 1 user, load average: 42.14, 43.14, 42.75
Tasks: 485 total, 3 running, 482 sleeping, 0 stopped, 0 zombie
%Cpu(s): 90.6 us, 0.9 sy, 0.0 ni, 8.5 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 65764440 total, 4386816 free, 60103412 used, 1274212 buff/cache
KiB Swap: 16777212 total, 12578576 free, 4198636 used. 4996896 avail Mem

Hi attach the "innodb status" file.

What i'm missing?

innodb-status.txt

Thanks!
TP

@tsteur commented on February 18th 2020 Member

image

Make sure this index exists in the log_visit table.

This should be fast and sending data should be fast. This sounds more like an issue with your DB server and maybe something for the forum: https://forum.matomo.org/

Maybe there are some visitors with heaps of visits? But then the query should be still fast. Maybe try something like

select idsite, idvisitor, count(*) from odd_log_visit group by idsite, idvisitor having count(*) > 50 order by count(*) desc limit 10. If you have a lot of data this query might take a bit of time.

@tpimpao commented on February 19th 2020

image

Make sure this index exists in the log_visit table.

This should be fast and sending data should be fast. This sounds more like an issue with your DB server and maybe something for the forum: https://forum.matomo.org/

Maybe there are some visitors with heaps of visits? But then the query should be still fast. Maybe try something like

select idsite, idvisitor, count(*) from odd_log_visit group by idsite, idvisitor having count(*) > 50 order by count(*) desc limit 10. If you have a lot of data this query might take a bit of time.

Hi @tsteur, thanks for your time.

Below the indexes that are in log_visit table.
image

Below is the output of the query (i change it to a limit 20).

image

It take 122.844 sec to end.

Thanks!
TP

@tsteur commented on February 19th 2020 Member

@tpimpao This query should not take that long even if it had to look at 1000 visitor rows. I wonder if MySQL maybe picks the wrong index and uses the idsite,visit_last_action_time index for some of your visitors. How often does this generally happen and how many do you see there typically in the process list?

Are you familiar with PHP? Could you try adding a new line here https://github.com/matomo-org/matomo/blob/3.13.2/core/Tracker/Model.php#L421

like this:

image

$from .= ' USE INDEX (index_idsite_idvisitor) ';"

@tpimpao commented on February 20th 2020

Hi @tsteur :1st_place_medal: , very good news!! :+1:
After forcing the index (as you suggested) the CPU load average decrease to 10 and i'm getting about
500 selects /s, 300 updates /s and 100 inserts /s.

But, why this happen? There are any good reason for MySQL picking the wrong index?

Thanks,
TP

@tpimpao commented on February 20th 2020

Hi again @tsteur ,
MySQL appears to use the right index when (USE INDEX) is not forced in the query.

EXPLAIN
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, idsite, visit_entry_idaction_url, visit_total_actions, visit_total_interactions, visit_total_searches, referer_url, config_device_brand, config_device_model, config_device_type, visit_total_events, visit_total_time, location_ip, location_browser_lang, campaign_content, campaign_id, campaign_keyword, campaign_medium, campaign_name, campaign_source, last_idlink_va, custom_dimension_1, custom_dimension_2, custom_dimension_3, custom_dimension_4, custom_dimension_5, custom_var_k1, custom_var_v1, custom_var_k2, custom_var_v2, custom_var_k3, custom_var_v3, custom_var_k4, custom_var_v4, custom_var_k5, custom_var_v5   FROM piwik_log_visit WHERE idsite = '23' AND visit_last_action_time <= '2020-02-18 09:17:38' AND idvisitor = '??\r?8?m'
                ORDER BY visit_last_action_time DESC
                LIMIT 1

image

Forcing the index in the query:

EXPLAIN
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, idsite, visit_entry_idaction_url, visit_total_actions, visit_total_interactions, visit_total_searches, referer_url, config_device_brand, config_device_model, config_device_type, visit_total_events, visit_total_time, location_ip, location_browser_lang, campaign_content, campaign_id, campaign_keyword, campaign_medium, campaign_name, campaign_source, last_idlink_va, custom_dimension_1, custom_dimension_2, custom_dimension_3, custom_dimension_4, custom_dimension_5, custom_var_k1, custom_var_v1, custom_var_k2, custom_var_v2, custom_var_k3, custom_var_v3, custom_var_k4, custom_var_v4, custom_var_k5, custom_var_v5   FROM piwik_log_visit USE INDEX (index_idsite_idvisitor)  WHERE idsite = '23' AND visit_last_action_time <= '2020-02-20 15:22:54' AND idvisitor = 'M?9v??'
                ORDER BY visit_last_action_time DESC
                LIMIT 1

image

Thanks,
TP

@tsteur commented on February 20th 2020 Member

Awesome, that's some performance improvement we will be interested in ourselves.

The problem is that when an index doesn't exist, we will get an error Key 'index_irrrdsite_idvisitor' doesn't exist in table 'log_visit'.

So we will need to check if the index exists and only force it then. We would cache this eg in tracker cache general to not needing to query the DB all the time. We could specifically check if the index with name index_idsite_idvisitor exists and if so, force the usage.

Cheers @tpimpao

fyi @mattab

@tpimpao commented on February 21st 2020

Great! At this time i'm very happy with the performance.
Just for retain below the DB load in the last week and after forcing the index (20/02/2020 09:30PM).

image

image

Cheers @tsteur

TP

This Issue was closed on March 15th 2020
Powered by GitHub Issue Mirror