Something not often mentioned and tested is the impact of latency in the wild on the operation and scalability of a website. The vast majority of load tests conducted are ran from a local load source, jmeter in the same availability zone. In this case the latency is incredibly low, probably sub millisecond. In the real world your application will never see this kind of latency again, it will be anywhere from 50 to 500ms depending on the global mix of traffic you receive. This can kill the performance of your application in surprising ways.

The time Apache spends waiting for a response on low latency requests is going to be small, this allows your servers to handle a much larger volume of traffic spread over a much lower number of threads. This is further amplified if your application is handling a lot of small quick requests, say an web API. In the lab, a server might be able to handle thousands of requests per second with only 30-100 threads at any given time. Using such a small number of threads is stellar for performance, the box will require much less application concurrency. A change in latency from 1ms to 200ms will cause transactions overhead to take 200% longer by definition, if your application has a 1:1 ratio of thread to transaction this will also cause a 200% increase in concurrency. This could most obviously lead to the box running out of threads or memory in production before it reaches the performance levels seen during testing.

Latency issues could also highlight any bottlenecks in your code where the application blocks while waiting on other threads. You could see this in your performance graphs by comparing context switching and system CPU usage between QA and production, as waiting on other threads often shows up in the kernel level.

What to do

So finally, what can we do about this? Load test from over the internet! You should mimic production latency in your performance testing environment, this will ensure that you not only test the raw performance of your application but also stress production similar concurrency levels. To do this you should generate the load for your tests remotely in some form of cloud, like AWS. This raises a big question though, where do you generate the load from? If your average visitors are fairly geographically close by, you don't need to test from that far away. But if you have a truly global customer base you may want to generate load from the other side of the Atlantic. To decide where you really need a good average of your production latency, which is fairly hard to measure (I'm not about to ping every IP in my apache access log, haha). Luckily we can get this number a roundabout way through testing!

If using Apache HTTPD, the first step is to enable Apache server-status, if you want to see what this looks like httpd.apache.org has server-status enabled by default, kudos to them. Next test your app in QA, fire up enough threads to mirror the requests per second your production site sees; then measure the number of active threads ("requests currently being processed" in server-status). Using this you can compute the average latency you see on your production site like so:

production_latency = local_latency * production_threads / local_threads

To increase accuracy of your measurement increase the latency in your test environment, possibly generate the load from a near by AWS location. You will still know the latency but it then won't be so close to 0; the difference between .08ms and .07ms is pretty significant in the final number while being hard to measure accurately...

So then when you are armed with a production latency number, peruse different cloud providers and find one that has a latency to your test site that is near or somewhat larger that can be you see in production. Then when you run tests you can also test at similar application concurrency numbers to what is experienced in production!

Any comments, questions, concerns, or areas where I'm wrong that you'd like to troll?