Topic: Apache continually growing children (solved)
Author: chagrin (Joined: 11 Dec 2017, Posts: 1)
Posted: Mon 11 Dec '17 18:12
I have a site running Apache 2.4.28 (prefork MPM) with a WebLogic plugin and mod_wsgi on a pair of load-balanced Linux VMs, handling between 15 and 50 requests/s over the course of the day. The problem I'm seeing is that Apache keeps spawning new children over time, slowly at first, until it eventually reaches its MaxRequestWorkers limit (2000). At present I'm restarting it twice a day because of this.
Things I've tried:
* Running "strace" on the processes doesn't show any unusual behavior. Everything is chugging along fine, although as more children spawn, each child's share of the workload shrinks.
* No unusual netstat activity (nothing stuck in CLOSE_WAIT, etc.)
* KeepAlive is off (and didn't change behavior when it was on).
* I've already thrown an embarrassing amount of CPU and RAM at the servers; there's no problem there.
* Tweaking the Min/MaxSpareServers hasn't had any effect.
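For anyone wanting to check the same thing, here is a minimal sketch (not what I actually ran) that tallies Apache children by the kernel function they are sleeping in. It assumes the text comes from `ps -eo pid,wchan:32,comm` and that the processes are named `httpd` (on some distributions it is `apache2`); the function name `tally_wait_channels` is made up for illustration. Some kernels report `-` or `0` as the wait channel, in which case this tells you nothing.

```python
from collections import Counter


def tally_wait_channels(ps_output, command="httpd"):
    """Count processes named `command` by kernel wait channel.

    Expects the text produced by: ps -eo pid,wchan:32,comm
    (assumed column order: PID, WCHAN, COMM).
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed rows
        _pid, wchan, comm = fields
        if comm == command:
            counts[wchan] += 1
    return counts
```

Feed it the output of `subprocess.run(["ps", "-eo", "pid,wchan:32,comm"], capture_output=True, text=True).stdout`. A large and growing count in a futex wait is only a hint, since idle children can legitimately wait there too.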
In general I'd describe the problem as something wrong with the algorithm that Apache uses when it decides to create/close its processes. But I've been running and building Apache for about 20 years now and I've never seen anything like this.
Any suggestions on what I might check?
--- edit ---
In the great tradition of answering one's own question: the problem appears to be identical to the futex_wait() bug present in RHEL 6.6 / 7.1 kernels. Attaching a debugger to a stuck process, or strace-ing it, breaks it out of its lock, after which it appears to be running normally. That would explain why strace never showed anything unusual. Sending SIGSTOP followed by SIGCONT to the child process also breaks it out of its lock.
Neat kernel bug. Now to figure out why my recent OEL kernel still has it.
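The SIGSTOP/SIGCONT nudge can be sketched in a few lines of Python. This is an illustration only, assuming a POSIX system and a caller with permission to signal the target; the helper name `nudge` and the 0.1 s pause are hypothetical choices, not anything Apache provides. It works around the symptom for one process and does not fix the kernel.

```python
import os
import signal
import time


def nudge(pid, pause=0.1):
    """Stop and then resume `pid` with SIGSTOP followed by SIGCONT."""
    os.kill(pid, signal.SIGSTOP)   # suspend the process
    time.sleep(pause)              # give the kernel a moment to act on it
    os.kill(pid, signal.SIGCONT)   # resume; the process is not terminated
```

Call it as `nudge(12345)` with the PID of a stuck child. `os.kill` raises `ProcessLookupError` if the PID no longer exists and `PermissionError` if you are not allowed to signal it.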