High server load leading to 503 error

pqpier

Active Member
Aug 12, 2022
26
1
3
Brazil
cPanel Access Level
Root Administrator
For the past 3 days, every morning around 7am BRT, one of my servers has been hitting high server load and returning 503 Service Unavailable errors.

It hosts 2600 cPanel accounts: around 2100 active and 500 suspended.

Something is generating massive log files in /var/log that quickly take up all the free disk space (around 400 GB).
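
To see which files are actually growing, a short script can list the largest files under the log directory. This is only a minimal sketch: the function name `largest_files` and the default limit are made up for illustration, and the sizes are the apparent file sizes reported by the filesystem.

```python
import heapq
import os


def largest_files(root="/var/log", limit=10):
    """Return up to `limit` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are measured, not followed
                size = os.lstat(path).st_size
            except OSError:
                # the file was rotated or removed while we were scanning
                continue
            sizes.append((size, path))
    return heapq.nlargest(limit, sizes)


if __name__ == "__main__":
    for size, path in largest_files():
        print(f"{size / 1024**3:8.2f} GiB  {path}")
```

Running it a few minutes apart during the morning spike would show which log is growing fastest.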

The error seen in the logs is:
2022-12-11 13:25:33.999958 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:80] can't accept:Too many open files in system!
2022-12-11 13:25:33.999967 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:443] can't accept:Too many open files in system!
2022-12-11 13:25:33.999974 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:80] can't accept:Too many open files in system!
2022-12-11 13:25:33.999981 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:443] can't accept:Too many open files in system!
2022-12-11 13:25:33.999987 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:80] can't accept:Too many open files in system!
2022-12-11 13:25:33.999996 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:443] can't accept:Too many open files in system!
2022-12-11 13:25:34.000004 [ERROR] [2632281] [T0] SocketListener::handleEvents(): [*:80] can't accept:Too many open files in system!
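
"Too many open files in system" appears to be the standard text for the ENFILE error, which refers to the system-wide limit on open file handles rather than a per-process limit. On Linux, /proc/sys/fs/file-nr reports the number of allocated handles, the number of unused handles, and the maximum. A minimal sketch for reading it, assuming a Linux host; the function names are hypothetical:

```python
def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr.

    Returns a tuple (allocated, unused, maximum).
    """
    allocated, unused, maximum = (int(value) for value in text.split())
    return allocated, unused, maximum


def usage_ratio(text):
    """Fraction of the system-wide file handle limit currently allocated."""
    allocated, _unused, maximum = parse_file_nr(text)
    return allocated / maximum


if __name__ == "__main__":
    # Assumes Linux; this path does not exist on other systems.
    with open("/proc/sys/fs/file-nr") as fh:
        raw = fh.read()
    allocated, _unused, maximum = parse_file_nr(raw)
    print(f"{allocated} of {maximum} file handles allocated "
          f"({usage_ratio(raw):.1%})")
```

Logging this value every minute or so before 7am would show whether the handle count climbs steadily toward the limit ahead of the errors.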

Another weird thing: terminating suspended accounts takes quite a long time (20-30 seconds), while on my other server it takes 3-5 seconds.
And while the accounts are being terminated, the load average climbs from 1.5 to around 4-5.

My server is with Hivelocity, they can't find the cause. I hired Bobcares, they also can't find the cause...

I'm freaking out. We had 3 hours of downtime today. "Luckily" this only happens in the morning while not many users are online.

Without knowing the cause, the only fix we have found is to wait a bit, then reboot the server and delete all the massive logs.
 

cPRex

Jurassic Moderator
Staff member
Oct 19, 2014
17,470
2,843
363
cPanel Access Level
Root Administrator
Hey there! I'm sorry to hear about these issues with the system. It seems odd to me that two separate admins weren't able to find the cause of the issue. If that's the case, I'm not sure how much help the Forums will be, as we're just guessing instead of having direct access to the server.

It sounds like something that wouldn't be related to cPanel, but possibly a user process or bad script that is being called.

I would think the true error happens before the "Too many open files in system" message, as that is just a symptom of the disk being full. By the time that happens, the root cause has already happened.

You're always welcome to submit a ticket to cPanel directly and have us take a look, but since this isn't related to the cPanel software we may not find anything on our end either. It doesn't hurt to ask, though!
 

pqpier

Active Member
Aug 12, 2022
26
1
3
Brazil
cPanel Access Level
Root Administrator
I know this might be just guesswork, but...

I find it really strange that the load average goes from 0.6 to 5 (almost 10x) just from terminating a list of 20 accounts via WHM over a span of 2-3 minutes (which in my view is too long compared to my experience).

This is the process manager while terminating the accounts. I know it's a guess, but just by looking at it, does it look "normal" to you?
Screenshot from 2022-12-13 18-16-52.png

 

ServerHealers

Well-Known Member
Sep 21, 2015
100
58
78
India
cPanel Access Level
Root Administrator
From the top output above, the IO usage seems normal. Regarding the account termination taking time, it could be related to the size of those accounts. If those accounts are consuming a lot of disk space, the termination will be slow, which is normal. However, I cannot confirm anything further without looking at the server in detail.

Regarding the high load every morning, is there any backup process running at that time? That is what usually causes such daily issues. If no backup runs at that time, I'd check the crons and see if anything suspicious is configured to run then. Also, look at the "top -c" output during the morning load spike to see which process is consuming the most resources, and check the IO usage at that time to break it down further.
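
As a rough way to do the cron check, the sketch below filters crontab text for entries whose hour field includes a given hour. It is only an illustration: the function names are made up, the default hour of 7 assumes the server clock is set to local time (BRT), it only understands the common numeric forms of the hour field (`*`, `7`, `6-8`, `*/2`, `1,7`), and it skips `@daily`-style shortcuts.

```python
import sys


def hour_field_matches(field, hour):
    """Return True if a crontab hour field includes `hour` (0-23)."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = 0, 23
        elif "-" in part:
            low, high = part.split("-", 1)
            start, end = int(low), int(high)
        else:
            start = end = int(part)
        if hour in range(start, end + 1, step):
            return True
    return False


def jobs_at_hour(crontab_text, hour):
    """Return the crontab lines scheduled to run during `hour`."""
    matches = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        # Skip blanks, comments and @reboot/@daily style shortcuts.
        if not stripped or stripped.startswith(("#", "@")):
            continue
        fields = stripped.split()
        if len(fields) < 6:
            continue  # environment assignment or malformed entry
        try:
            if hour_field_matches(fields[1], hour):
                matches.append(stripped)
        except ValueError:
            continue  # hour field uses a form this sketch does not parse
    return matches


if __name__ == "__main__":
    # Example: crontab -l -u someuser | python3 this_script.py
    for job in jobs_at_hour(sys.stdin.read(), 7):
        print(job)
```

Entries that run every hour or every few minutes will also show up, so the output still needs a manual look to pick out what is specific to the morning.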
 