It has been a while since my last update here, so I thought I would publish my latest win with a client's overloaded server.
Now, before we begin, I'll explain the situation. This client, who will remain nameless for privacy reasons, hosts a very popular site used by thousands of people daily, and one of the servers primarily serves files from an APT-like repository.
The host serving these files is by no means slow: it is an 8-core, 64-bit machine with 16GB of RAM running CentOS 5. What makes this more than mere file serving is that every request is rewritten to a PHP script that checks download authorization with a remote service and logs stats on each file download.
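In other words, a request never just reads the file off disk; it goes through a handler that does roughly the following. This is a simplified sketch of that flow, not the client's actual code; the auth service URL, database details and paths are placeholders I made up for illustration.

    <?php
    // download.php - simplified sketch of the kind of handler described above.
    // The auth service URL, database details and paths are placeholders.

    $file = basename($_GET['file']);

    // 1. Ask the remote service whether this client may download the file.
    $ok = file_get_contents('http://auth.example.com/check?file=' . urlencode($file)
            . '&ip=' . urlencode($_SERVER['REMOTE_ADDR']));
    if (trim($ok) !== 'OK') {
        header('HTTP/1.1 403 Forbidden');
        exit;
    }

    // 2. Record the download for the stats.
    mysql_connect('localhost', 'stats_user', 'secret');
    mysql_select_db('stats');
    mysql_query("INSERT INTO downloads (file, ts) VALUES ('"
            . mysql_real_escape_string($file) . "', NOW())");

    // 3. Send the file itself.
    header('Content-Type: application/octet-stream');
    readfile('/var/repo/' . $file);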
The server was configured with Litespeed, but over the new year break some new software was released that increased the load on this server more than 15-fold. In an attempt to reduce load I tried just about everything: tuning Litespeed, installing eAccelerator for PHP, and tuning the script that handles the downloads. These things helped; before these changes the server load was up around 150-160, and they brought it down to around 70-80.
This was still not good enough. I advised the client that there was nothing more we could do and that he would have to purchase more servers to deal with the load... but I was not at all happy with this solution.
So, over the next few days I tried a few more things, such as putting a reverse Nginx proxy in front of Litespeed. This further reduced the load to around 30-40, but again, the server was still badly overloaded.
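The proxy setup itself was nothing fancy. A minimal sketch, with placeholder names and ports rather than the client's real config: Nginx takes port 80 and hands everything straight through to Litespeed, which was moved to a local high port.

    server {
        listen 80;
        server_name downloads.example.com;          # placeholder

        location / {
            # Pass everything through to Litespeed, now on a local port.
            proxy_pass http://127.0.0.1:8088;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }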
Further profiling and examination identified some very poor code in the stats script, which was quickly corrected: it was fetching a file from the local host via the fopen URL wrapper instead of reading it from the hard disk directly, and I added Memcache to cache hits on files so the database is only updated every 30 seconds. This did not reduce the load, but it did stop Litespeed from running out of connections, which had been causing 500 errors.
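The Memcache change is worth spelling out, since it is what stopped the connection pile-up: instead of writing a row to the database on every download, the script just bumps a counter in Memcache, and roughly every 30 seconds a single request flushes the accumulated counts to the database in one batch. Something along these lines; this is a sketch only, and the key names, the lock and the flush helper are my own illustration, not the client's code.

    <?php
    // Sketch of batching stats writes through Memcache instead of hitting
    // the database on every request.

    $file = basename($_GET['file']);   // as in the handler sketched earlier

    $mc = new Memcache();
    $mc->connect('127.0.0.1', 11211);

    // Count this hit in memory only.
    $key = 'hits:' . $file;
    if ($mc->add($key, 1) === false) {
        $mc->increment($key);
    }

    // Roughly every 30 seconds one request grabs this lock and writes the
    // accumulated counters to the database in a single batch.
    if ($mc->add('hits:flush_lock', 1, 0, 30)) {
        flush_hits_to_db($mc);   // hypothetical helper: reads the counters and UPDATEs the stats table
    }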
After discussing options with the client, we decided to swap Litespeed out for Nginx completely, so over the next day I ported the rewrite rules to Nginx and set up PHP FastCGI workers ready to handle requests. The new rules were limited to my IP address so I could test the changes, while the public was still passed through to the Litespeed backend.
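The new setup boiled down to something like the following; again a sketch, with a placeholder test IP and paths, and the real rules are more involved. Requests from my address get rewritten to the download script and handed to the PHP FastCGI pool, while everyone else is still proxied through to Litespeed.

    server {
        listen 80;
        server_name downloads.example.com;              # placeholder

        location /repo/ {
            # While testing, only my own IP hits the new rules; everyone
            # else is still passed through to the Litespeed backend.
            if ($remote_addr = "203.0.113.10") {        # placeholder test IP
                rewrite ^/repo/(.*)$ /download.php?file=$1 last;
            }
            proxy_pass http://127.0.0.1:8088;
        }

        location = /download.php {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME /var/www/download.php;   # placeholder path
            fastcgi_pass 127.0.0.1:9000;                # PHP FastCGI workers
        }
    }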
Once everything was confirmed working, I stopped Litespeed and started Nginx as the standalone webserver, with full logging enabled to make sure things were running smoothly. The server load immediately dropped to 2-3. At first I thought something was broken, that PHP was not running or there was some other fault, but testing proved it was all running as it should.
After a few minor tweaks to the configuration, such as the number of PHP workers and Nginx workers, the server's load stayed below 3 permanently. I did not expect to see such a huge performance increase just from moving away from Litespeed; I was under the false impression that Litespeed was about as fast as Nginx, and faster with PHP... but this experience has without a doubt proved that Litespeed just can't keep up with Nginx + PHP FastCGI when configured properly.
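For the curious, the final tuning amounted to little more than matching worker counts to the hardware. The numbers below are roughly what I settled on after watching the box for a while, not anything scientific, and the paths and user are placeholders.

    # nginx.conf - one worker per core on the 8 core box
    worker_processes  8;
    events {
        worker_connections  1024;
    }

    # and the PHP FastCGI pool (numbers are illustrative)
    PHP_FCGI_CHILDREN=16 PHP_FCGI_MAX_REQUESTS=1000 \
        spawn-fcgi -a 127.0.0.1 -p 9000 -u nobody -f /usr/bin/php-cgi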