This may be long, since it's a problem I've been fighting for a long time, but I desperately need help and I hope someone has some ideas for me. My server technicians haven't offered much beyond a few suggestions about changing software.
We have a dedicated machine with around 150 active websites (all WordPress) and email accounts for some of them. Every day around 4 pm, all the websites begin to slow down and mail is not delivered in a timely manner (my tests show messages arriving an hour late). When I SSH into the server and run top, it looks as if nearly every single website is hitting the server at the same time (even development sites where there should be no real traffic). We have been dealing with this for quite some time and have tried several things that we thought could be to blame.
Each site has the same set of plugins: Wordfence for security and BackupBuddy for backups. In our first round of changes, we discovered that Wordfence was running scans at this time of day, so we went into each website and disabled Wordfence's scan function (not ideal). We thought this had helped, but it may have been a coincidence, because the problem started again.
Then we looked more closely at top and discovered that when the server bogged down each day, all the sites were running cron jobs, and some of them seemed to be related to the backup plugin. So we went into each website and deactivated the backup plugin. That seemed to help for a day, but again, it may have been a coincidence: the problem keeps happening, and it has gotten even worse in recent days.
Now my server techs keep saying that it's just random processes and the server is getting overloaded, but why at the same time every day? You can see the number of running processes jump from around 14 to 50, 60, 80 and so on.
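To confirm that the spike really lines up with the same time every day, and to see which kind of process is piling up, it may help to log process counts once a minute instead of watching top by hand. Below is a minimal, Linux-only sketch that reads /proc/<pid>/comm; the default process names (httpd, php-cgi, lsphp, php-fpm, mysqld) are assumptions about what a cPanel server might run and should be replaced with whatever appears in top's COMMAND column.

```python
import os
import sys
import time
from datetime import datetime

# Assumed process names; replace with the names shown in top's COMMAND column.
DEFAULT_NAMES = ("httpd", "php-cgi", "lsphp", "php-fpm", "mysqld")


def count_processes(names, proc="/proc"):
    """Return {name: count} for running processes whose /proc/<pid>/comm matches."""
    counts = dict.fromkeys(names, 0)
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # only numeric entries in /proc are processes
        try:
            with open(os.path.join(proc, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            continue  # process exited between listdir() and open()
        if name in counts:
            counts[name] += 1
    return counts


def log_forever(names, interval=60):
    """Print one timestamped line of per-name counts every `interval` seconds."""
    while True:
        counts = count_processes(names)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        fields = " ".join(f"{n}={c}" for n, c in counts.items())
        print(f"{stamp} {fields}", flush=True)
        time.sleep(interval)


if __name__ == "__main__":
    log_forever(sys.argv[1:] or DEFAULT_NAMES)
```

Left running for a few days (for example `nohup python3 proccount.py httpd mysqld > proc.log &`, where the file name is hypothetical), the output should show whether the jump happens at the same minute each day and whether it is the web server, PHP, or MySQL that grows.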
Now, when I look at top while it's happening, it doesn't seem to be cron jobs; it just shows random requests like /index.php and /wp-login.php. Those are the normal things you would expect to see, but they only get really bad when this happens, and on sites that I know are not receiving visitors.
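One way to test whether the 4 pm burst of /index.php and /wp-login.php requests is real traffic is to count hits on those scripts per hour in each site's access log and see which client addresses they come from. Below is a minimal sketch, assuming the logs use Apache's common/combined format; the watched script names (including wp-cron.php, which WordPress requests to run its scheduled tasks) and the output layout are my own choices for illustration.

```python
import re
import sys
from collections import Counter

# Assumes Apache common/combined log format, e.g.
# 1.2.3.4 - - [12/Mar/2019:16:02:11 -0500] "POST /wp-login.php HTTP/1.1" 200 ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^:]+:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[A-Z]+ (?P<path>\S+)[^"]*"'
)

# Script names to watch (wp-cron.php is an added assumption).
WATCHED = {"wp-login.php", "index.php", "wp-cron.php"}


def scan(lines, watched=WATCHED):
    """Return (hits per (hour, script), hits per client IP) for watched scripts."""
    by_hour = Counter()
    by_ip = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not in the expected format; ignore
        # Drop the query string and keep only the file name, so that
        # /blog/wp-login.php and /wp-login.php?action=x both count.
        script = m.group("path").split("?", 1)[0].rsplit("/", 1)[-1]
        if script in watched:
            by_hour[(m.group("hour"), script)] += 1
            by_ip[m.group("ip")] += 1
    return by_hour, by_ip


def main(paths):
    by_hour, by_ip = Counter(), Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            hours, ips = scan(fh)
        by_hour.update(hours)  # Counter.update adds counts together
        by_ip.update(ips)
    for (hour, script), n in sorted(by_hour.items()):
        print(f"{hour}:00  {script:<14} {n}")
    print("Busiest client IPs:")
    for ip, n in by_ip.most_common(10):
        print(f"  {ip:<40} {n}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Passing it every site's access log at once (for example `python3 hits.py /path/to/logs/*`, where the script name is hypothetical) gives a server-wide picture. If most of the 16:00 hits on wp-login.php come from a handful of IPs, that would suggest automated traffic rather than real visitors.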
My server techs are suggesting moving to CloudLinux, which I don't mind doing, but I wanted to get some advice here first, since nobody seems to have a clue and everyone is just shooting in the dark.
Has anyone seen anything like this before? My knowledge of all this is limited and self-taught, but any help is greatly appreciated.
Restarting the HTTP and SQL services used to solve the problem, but often it simply comes back within a few seconds.
I'll try to give more details about our specific situation.
150 (ish) sites, all running WordPress, not all up to date (some have custom plugins or outdated themes, and updating them would require work the client isn't willing to take on). Most run PHP 5.6 (another thing I need to fix, but it takes time we don't have).
The drive is 90% full, which I know is not ideal, but fixing it is expensive.
The server specs are:
16 GB of RAM
800 GB SSD
Backup mirror drive
cPanel / WHM / MySQL
I can provide any other information needed; I just need to get to the bottom of this, since my clients are not happy.
Should I expect this kind of problem with these server specs and this many sites? Or is something else going on?
Is outdated PHP my problem, or the websites themselves, or disk space? I just don't know what would cause more than 100 sites to hit the server at once and slow it down.
Thanks in advance for any help you can provide.