MySQL crashes when doing backups

Mar 7, 2016
20
1
3
Bristol
cPanel Access Level
Root Administrator
Hi,

My server has around 15 sites on it, 8gb of memory and only a handful of users at any one time. Yet every time it does backups, it seems to kill MySQL.

I can never restart mysql, I have to go into terminal and stop it dead entirely before it will come back up, as well as kill all backup processes 'pkgacct'.

The backups are not huge, and there is plenty of space on the disk.

This happens more and more often and now it is happening every time backups run.

This is the 'failed' error cPanel sends when I kill mysql:

Code:
Cpanel::Exception::Timeout/(XID uhdf6d) The system failed to lock the file “/var/run/restartsrv_mysql” after 197 seconds.
at /usr/local/cpanel/Cpanel/Exception/CORE.pm line 336.
Cpanel::Exception::create("Timeout", "The system failed to lock the file \x{e2}\x{80}\x{9c}[_1]\x{e2}\x{80}\x{9d} after [quant,_2"..., ARRAY(0x247b700)) called at /usr/local/cpanel/Cpanel/Exception.pm line 61
Cpanel::Exception::__ANON__(__CPANEL_HIDDEN__, __CPANEL_HIDDEN__..., ARRAY(0x247b700)) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 729
Cpanel::SafeFile::_timeout_exception("/var/run/restartsrv_mysql", 197) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 761
Cpanel::SafeFile::_die_if_file_is_flocked_cuz_already_waited_a_while("/var/run/restartsrv_mysql", 197) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 829
Cpanel::SafeFile::_lock_wait("/var/run/restartsrv_mysql") called at /usr/local/cpanel/Cpanel/SafeFile.pm line 355
Cpanel::SafeFile::_safelock("/var/run/restartsrv_mysql") called at /usr/local/cpanel/Cpanel/SafeFile.pm line 558
Cpanel::SafeFile::_safe_open(undef, ">>", "/var/run/restartsrv_mysql", CODE(0xb8ee48), "safeopen") called at /usr/local/cpanel/Cpanel/SafeFile.pm line 115
Cpanel::SafeFile::safeopen(undef, ">>", "/var/run/restartsrv_mysql") called at /usr/local/cpanel/Cpanel/RestartSrv/Lock.pm line 25
Cpanel::RestartSrv::Lock::new("Cpanel::RestartSrv::Lock", "mysql") called at /usr/local/cpanel/Cpanel/ServiceManager/Base.pm line 183
Cpanel::ServiceManager::Base::lock(Cpanel::ServiceManager::Services::Mysql=HASH(0x2137ae8)) called at /usr/local/cpanel/Cpanel/ServiceManager/Base.pm line 459
Cpanel::ServiceManager::Base::BUILD(Cpanel::ServiceManager::Services::Mysql=HASH(0x2137ae8), HASH(0xbac398)) called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/x86_64-linux-64int/Cpanel/Class/Object.pm line 49
Cpanel::Class::Object::BUILDALL(Cpanel::ServiceManager::Services::Mysql=HASH(0x2137ae8), HASH(0xbac398)) called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/x86_64-linux-64int/Cpanel/Class/Object.pm line 41
Cpanel::Class::Object::new("Cpanel::ServiceManager::Services::Mysql", "service", "mysql") called at /usr/local/cpanel/Cpanel/ServiceManager.pm line 99
Cpanel::ServiceManager::new("Cpanel::ServiceManager", "service", "mysql") called at bin/restartsrv_base.pl line 79
scripts::restartsrv_base::__ANON__() called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/Try/Tiny.pm line 99
eval {...} called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/Try/Tiny.pm line 90
Try::Tiny::try(CODE(0xb6f018), Try::Tiny::Catch=REF(0xb43fb0)) called at bin/restartsrv_base.pl line 125
scripts::restartsrv_base::run("--check", "--notconfigured-ok") called at bin/restartsrv_base.pl line 33

The subprocess “/usr/local/cpanel/scripts/restartsrv_mysql” reported error number 255 when it ended.
I have had to disable backups as it seems to be the cause based on the times and the fact they seem to be running whenever this happens.

When MySQL hits this issue, it just locks up and no sites are able to connect to it, sending all of my clients sites down.

There are no other heavy use processes going on, MySQL is at the top using 5% CPU and 3.4gb of memory - nothing the server cannot handle.

Why do the backups keep killing my databases?

When it restarts, almost every database the backups touch have tables which need repairing - not uncommon, but when mysql has been up for only 2 days since it last crashed, I wouldn't expect that.

My clients are very concerned and I need to know why backups are doing this - or why something else in cPanel is causing this.

Datacenter checked hardware and it is all ok.
 
Last edited by a moderator:
Mar 7, 2016
20
1
3
Bristol
cPanel Access Level
Root Administrator
Can't edit, but here is the 'back up process' failures reports:
Code:
pixelhero (XID upvwwc) “/usr/local/cpanel/bin/pkgacct” reported error code “1” when it ended:
northantsdecorat (XID feaqud) “/usr/local/cpanel/bin/pkgacct” reported error code “1” when it ended:
thatlittleagency (XID 9crkwa) “/usr/local/cpanel/bin/pkgacct” reported error code “1” when it ended:
recruitmentevent (XID reb9bx) “/usr/local/cpanel/bin/pkgacct” reported error code “1” when it ended:
I killed the process after this last one as it was when my clients noticed their sites were down.
Code:
[2018-07-16 16:56:05 +0100] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
warn [pkgacct] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
[2018-07-16 16:56:05 +0100] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
warn [pkgacct] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
[2018-07-16 16:56:05 +0100] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
warn [pkgacct] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
That database exists, I assume by this point MySQL had crashed
 
Last edited by a moderator:

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
13,266
1,309
363
Houston
Hi @Richard Edwards

Is there anything notable logged to the mysql error logs? The location of this is noted in /etc/my.cnf under log-error for example mine:
Code:
[root@server mysql]# grep log-error /etc/my.cnf
log-error=/var/log/mysqld.log
It may also be helpful to check
Code:
/var/log/messages
for any relevant output during the time of the backups as well.

Thanks!
 
Mar 7, 2016
20
1
3
Bristol
cPanel Access Level
Root Administrator
Hi,

So with backups turned off, this is still happening. Still usually in the morning when I used to run backups but must just be a co-incidence.

It died again at 7:40am, here are the logs. The site 'carousel' is a wordpress site, the user who viewed it was the client so nothing unusual.


Code:
2018-07-18  7:04:33 140575962253056 [Warning] Aborted connection 44677 to db: 'carousellights_wp' user: 'carousellights_wp' host: 'localhost' (Got an error reading communication packets)
180718  7:16:05 [ERROR] mysqld got signal 7 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.16-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=130
max_threads=153
thread_count=124
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467246 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fda3401bd68
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fd96bfb3d30 thread_stack 0x49000
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Uses event mutexes
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Compressed tables use zlib 1.2.7
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Using Linux native AIO
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Number of pools: 1
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Using SSE2 crc32 instructions
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Completed initialization of buffer pool
2018-07-18  7:35:08 139627076548352 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Highest supported file format is Barracuda.
2018-07-18  7:35:08 139627844749504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=217525724982
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the filefull; Please wait ...
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: Waiting for purge to start
2018-07-18  7:40:03 139627844749504 [Note] InnoDB: 5.7.22 started; log sequence number 217525724991
2018-07-18  7:40:03 139626434000640 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-07-18  7:40:03 139627844749504 [Note] Plugin 'FEEDBACK' is disabled.
2018-07-18  7:40:03 139627844749504 [Note] Recovering after a crash using tc.log
2018-07-18  7:40:03 139627844749504 [Note] Starting crash recovery...
2018-07-18  7:40:03 139627844749504 [Note] Crash recovery finished.
2018-07-18  7:40:03 139627844749504 [Note] Server socket created on IP: '::'.
2018-07-18  7:40:03 139627710367488 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
2018-07-18  7:40:06 139627844749504 [Note] Reading of all Master_info entries succeded
2018-07-18  7:40:06 139627844749504 [Note] Added new Master_info '' to hash table
2018-07-18  7:40:06 139627844749504 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.2.16-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2018-07-18  7:40:06 139627710367488 [Note] Event Scheduler: Purging the queue. 0 events
2018-07-18  7:40:06 139626954086144 [Note] InnoDB: FTS optimize thread exiting.
2018-07-18  7:40:06 139627710367488 [Note] InnoDB: Starting shutdown...
2018-07-18  7:40:06 139626434000640 [Note] InnoDB: Buffer pool(s) load completed at 180718  7:40:06
2018-07-18  7:40:06 139626434000640 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
2018-07-18  7:40:06 139626434000640 [Note] InnoDB: Instance 0, restricted to 2048 pages due to innodb_buf_pool_dump_pct=25
2018-07-18  7:40:06 139626434000640 [Note] InnoDB: Buffer pool(s) dump completed at 180718  7:40:06
2018-07-18  7:40:07 139627710367488 [Note] InnoDB: Shutdown completed; log sequence number 217525725028
2018-07-18  7:40:07 139627710367488 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2018-07-18  7:40:07 139627710367488 [Note] /usr/sbin/mysqld: Shutdown complete

2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Uses event mutexes
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Compressed tables use zlib 1.2.7
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Using Linux native AIO
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Number of pools: 1
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Using SSE2 crc32 instructions
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Completed initialization of buffer pool
2018-07-18  7:40:08 140499609212672 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-07-18  7:40:08 140500380170432 [Note] InnoDB: Highest supported file format is Barracuda.
2018-07-18  7:40:09 140500380170432 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-07-18  7:40:09 140500380170432 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-07-18  7:40:09 140500380170432 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the filefull; Please wait ...
2018-07-18  7:40:09 140500380170432 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-07-18  7:40:09 140500380170432 [Note] InnoDB: Waiting for purge to start
2018-07-18  7:40:09 140500380170432 [Note] InnoDB: 5.7.22 started; log sequence number 217525725028
2018-07-18  7:40:09 140499302209280 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-07-18  7:40:09 140500380170432 [Note] Plugin 'FEEDBACK' is disabled.
2018-07-18  7:40:09 140499302209280 [Note] InnoDB: Buffer pool(s) load completed at 180718  7:40:09
2018-07-18  7:40:09 140500380170432 [Note] Server socket created on IP: '::'.
2018-07-18  7:40:09 140500380170432 [Note] Reading of all Master_info entries succeded
2018-07-18  7:40:09 140500380170432 [Note] Added new Master_info '' to hash table
2018-07-18  7:40:09 140500380170432 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.2.16-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2018-07-18  7:40:15 140499926767360 [Note] Detected table cache mutex contention at instance 1: 54% waits. Additionaltable cache instance activated. Number of instances after activation: 2.

This is the messages log around the time mysql got signal 7 at 7:16:
Code:
Jul 18 07:15:32 server kernel: INFO: task php-cgi:21973 blocked for more than 120 seconds.
Jul 18 07:15:32 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 18 07:15:32 server kernel: php-cgi         D ffff88020e6efe40     0 21973  16287 0x00000080
Jul 18 07:15:32 server kernel: ffff880157727b20 0000000000000082 ffff88001b1bdc00 ffff880157727fd8
Jul 18 07:15:33 server kernel: ffff880157727fd8 ffff880157727fd8 ffff88001b1bdc00 ffff880157727c48
Jul 18 07:15:34 server kernel: ffff880157727c50 7fffffffffffffff ffff88001b1bdc00 ffff88020e6efe40
Jul 18 07:15:34 server kernel: Call Trace:
Jul 18 07:15:34 server kernel: [<ffffffff8163ae49>] schedule+0x29/0x70
Jul 18 07:15:34 server kernel: [<ffffffff81638b39>] schedule_timeout+0x209/0x2d0
Jul 18 07:15:34 server kernel: [<ffffffff810b8bf3>] ? wake_up_process+0x23/0x40
Jul 18 07:15:35 server kernel: [<ffffffff8109a744>] ? wake_up_worker+0x24/0x30
Jul 18 07:15:35 server kernel: [<ffffffff8163b216>] wait_for_completion+0x116/0x170
Jul 18 07:15:35 server kernel: [<ffffffff810b8c30>] ? wake_up_state+0x20/0x20
Jul 18 07:15:36 server kernel: [<ffffffff81099729>] call_usermodehelper_exec+0x169/0x1a0
Jul 18 07:15:37 server kernel: [<ffffffff8123e935>] do_coredump+0x5e5/0xb10
Jul 18 07:15:37 server kernel: [<ffffffff8108f593>] ? __sigqueue_free.part.11+0x33/0x40
Jul 18 07:15:44 server kernel: [<ffffffff8108fc7c>] ? __dequeue_signal+0x13c/0x220
Jul 18 07:15:46 server kernel: [<ffffffff81092c07>] get_signal_to_deliver+0x1c7/0x6d0
Jul 18 07:15:46 server kernel: [<ffffffff81014417>] do_signal+0x57/0x6c0
Jul 18 07:15:46 server kernel: [<ffffffff8162e487>] ? is_prefetch.isra.17+0xe4/0x215

...

Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] Sense Key : Medium Error [current]
Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] Add. Sense: Unrecovered read error
Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] CDB: Read(10) 28 00 00 09 ae 80 00 00 40 00
Jul 18 07:16:10 server kernel: blk_update_request: critical medium error, dev sda, sector 634496
Jul 18 07:16:10 server kernel: Read-error on swap-device (8:0:634504)
Jul 18 07:16:10 server kernel: Read-error on swap-device (8:0:634512)
Jul 18 07:16:10 server kernel: Read-error on swap-device (8:0:634520)
Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634528)
Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634536)
Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634544)
Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634552)
Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634560)
Jul 18 07:16:12 server kernel: type=1701 audit(1531894565.339:104808): auid=4294967295 uid=1042 gid=1040 ses=4294967295 pid=21855 comm="php-cgi" reason="memory violation" sig=7
Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] Sense Key : Medium Error [current]
Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] Add. Sense: Unrecovered read error
Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] CDB: Read(10) 28 00 00 7e 36 c0 00 00 40 00
Jul 18 07:16:19 server kernel: blk_update_request: critical medium error, dev sda, sector 8271552
Jul 18 07:16:20 server kernel: Read-error on swap-device (8:0:8271560)
Jul 18 07:16:20 server kernel: Read-error on swap-device (8:0:8271568)
Jul 18 07:16:21 server kernel: Read-error on swap-device (8:0:8271576)
Jul 18 07:16:21 server kernel: Read-error on swap-device (8:0:8271584)
Jul 18 07:16:21 server kernel: Read-error on swap-device (8:0:8271592)
Jul 18 07:16:22 server kernel: Read-error on swap-device (8:0:8271600)
Jul 18 07:16:22 server kernel: Read-error on swap-device (8:0:8271608)
Jul 18 07:16:22 server kernel: Read-error on swap-device (8:0:8271616)
 
Last edited:

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
13,266
1,309
363
Houston
Hi @Richard Edwards

Based on those errors I'm curious to know when is the last time you ran an fsck on the device? This sounds indicative of a failure or a bad block that may need to be mapped out. I would suggest rebooting and running an fsck at the very least on the swap device. Ultimately though this sounds like something that would best be suited for a system administrator. If you don't have one you might find one here: System Administration Services | cPanel Forums .

Thanks!
 
Mar 7, 2016
20
1
3
Bristol
cPanel Access Level
Root Administrator
They have advised that fsck can also do more damage than it fixes - you should say the same if you are suggesting it to your users! They will consider it, but are investigating the server before resorting to doing that. If they do, hopefully it does not kill the drive as it looks in a bad state to me.

The other option is they will set up a new box for me, but I will need to migrate all my stuff - is there a way to clone everything from 1 cpanel to another? I want to avoid doing it account by account and manually updating configs as that would be a huge downtime while I do it.
 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
13,266
1,309
363
Houston
They have advised that fsck can also do more damage than it fixes - you should say the same if you are suggesting it to your users!They will consider it, but are investigating the server before resorting to doing that. If they do, hopefully it does not kill the drive as it looks in a bad state to me.
I'm not sure where that came from as that's not entirely true the only instance in which an fsck would actually be dangerous is if the underlying hardware is damaged, none the less this is why a suggestion to contact a system administrator was made.

The other option is they will set up a new box for me, but I will need to migrate all my stuff - is there a way to clone everything from 1 cpanel to another? I want to avoid doing it account by account and manually updating configs as that would be a huge downtime while I do it.
There is no native option to clone a cPanel server, you can use the transfer tool in the event the drive is functioning. Information on the Transfer Tool can be found in the documentation here: Transfer Tool - Version 72 Documentation - cPanel Documentation

Thanks!