Hello.
Something weird happened today at 3 am when my backups ran. Ever since installing cPanel I've had full daily backups enabled, transported to Amazon S3 (no local backups; S3 is configured via the official options, not mounted on the server), and cPanel handled them flawlessly. But today it crashed mid-backup and took the whole server down. I woke up to lots of notifications from Freshping and one customer asking why their site was down.
I restarted the server and everything is fine now. Then I checked the logs, and the last line was:
[2019-03-03 03:52:23 -0300] Creating Archive .................................................................................... ..............................cpuwatch (Sun Mar 3 03:58:15 2019): System load is currently 3.75; waiting for it to go down below 3.50 to continue …
After that everything was down, all websites and the panel, until I restarted the server.
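The cpuwatch line suggests the archiving step was paused until the load average fell below 3.50. Below is a minimal sketch of that kind of wait loop, assuming the 1-minute load average is what gets compared. It is not cPanel's actual cpuwatch code, and `wait_for_load` and its `max_polls` cap are hypothetical. Without such a cap, a load that never drops keeps the job waiting indefinitely.

```python
import os
import time
from typing import Callable, Optional


def wait_for_load(threshold: float,
                  get_load: Callable[[], float] = lambda: os.getloadavg()[0],
                  poll_seconds: float = 1.0,
                  max_polls: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until the 1-minute load average drops below `threshold`.

    Hypothetical illustration of a cpuwatch-style pause. Returns True once
    load is below the threshold, or False if `max_polls` checks were made
    without that happening.
    """
    polls = 0
    while True:
        load = get_load()
        if load < threshold:
            return True  # load is low enough: let the job continue
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False  # give up instead of waiting forever
        sleep(poll_seconds)
```

Injecting `get_load` and `sleep` keeps the loop testable without a real busy server.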
This was related to an account with around 9 GB of files, but other accounts just as large were backed up successfully in the same run. Based on what was saved to S3, the server had backed up about one fifth of the accounts.
The server has 4 cores and plenty of free disk space. Vultr's graphs show that CPU usage didn't go over 200% at that moment, but that graph (last 24 hours) doesn't have much resolution, so I can't confirm it never went over 375%.
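One possible reason CPU usage (200%) and load (3.75) disagree: on Linux the load average counts tasks that are runnable (state R) and tasks in uninterruptible sleep (state D, usually blocked on disk or network I/O). A disk-heavy backup can therefore push load up without using much CPU, and load is not capped at the core count. The sketch below, with hypothetical helper names, counts R and D processes from /proc. It looks at processes only, not individual threads, so it approximates rather than reproduces the kernel's figure.

```python
import os
from collections import Counter
from typing import Dict, Iterable, List


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the last ')' rather than on whitespace.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_load_states(stat_lines: Iterable[str]) -> Dict[str, int]:
    """Count processes in the two states that feed the load average.

    R = runnable (on or waiting for a CPU), D = uninterruptible sleep.
    Only R tasks show up as CPU usage; both raise the load.
    """
    states = Counter(task_state(line) for line in stat_lines)
    return {s: states.get(s, 0) for s in ("R", "D")}


def read_proc_stats() -> List[str]:
    """Read /proc/<pid>/stat for every current process (Linux only)."""
    lines = []
    for pid in os.listdir("/proc"):
        if pid.isdigit():
            try:
                with open(f"/proc/{pid}/stat") as f:
                    lines.append(f.read())
            except OSError:
                continue  # process exited between listdir and open
    return lines
```

Running `count_load_states(read_proc_stats())` on the server during the backup window would show whether D-state tasks (for example, the archiver waiting on disk) dominate the load.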
So my questions are:
1. Does anyone have any idea what happened? Plenty of full backups completed just fine on previous days.
2. Is there a way to set up an automatic server restart if the server becomes unresponsive like this, or if the backup never finishes?
3. I know cPanel can't restore backups stored on S3, but why does the restore option in WHM show only some accounts, even though I can see all of them in the S3 interface? I'm just curious about this one, though.
4. There is an option, "Extra CPUs for server load", currently set to zero. What does it do? I also learned that server load can't go over 4 on a 4-core server. Since the backup stopped at a load of 3.75, it seems to have correctly identified the number of CPUs/cores.
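For question 2, one option is a small watchdog that restarts the server after several consecutive failed health checks. This is a minimal sketch, assuming a local HTTP check and a caller-supplied restart action; `site_is_up`, `watchdog`, the URL and the thresholds are all hypothetical. Because it runs on the same machine, it only helps if the server can still run processes; a complete hang needs something outside the box.

```python
import time
import urllib.request
from typing import Callable, Optional


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx/3xx within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        return False  # connection refused, timeout, HTTP error, etc.


def watchdog(check: Callable[[], bool],
             restart: Callable[[], None],
             max_failures: int = 3,
             interval_seconds: float = 60.0,
             sleep: Callable[[float], None] = time.sleep,
             max_checks: Optional[int] = None) -> bool:
    """Call `restart` after `max_failures` consecutive failed checks.

    Returns True if a restart was triggered, False if `max_checks`
    checks ran without that happening (None means run forever).
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        failures = 0 if check() else failures + 1  # reset on any success
        if failures >= max_failures:
            restart()
            return True
        sleep(interval_seconds)
    return False
```

For example, running `watchdog(lambda: site_is_up("http://localhost/"), lambda: subprocess.run(["systemctl", "reboot"], check=True))` from a long-lived service would reboot after three failed checks a minute apart. Requiring consecutive failures avoids rebooting on a single slow response.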
Additionally, if there's a better way to configure these backups, I'd like to hear it.