As the recent spate of bug fix and patch releases shows, I'm not scared of talking about the bugs that I find in the code of The Server Framework and pushing fixes out quickly. It's my belief that the most important thing to get out of a bug report is an improved process which will help prevent similar bugs from occurring in future, and the only way to achieve that is to be open about the bugs you find and equally open about how you then address them and try to prevent similar issues. Every bug is an opportunity to improve. Sometimes I wish I had fewer opportunities...
I've been improving my pre-release testing system and now run a lock inversion detector as part of my build machine's build and test cycle for the socket server examples. This lock inversion detector can detect the potential to deadlock without the code ever needing to actually deadlock, so it's a pretty powerful tool. It has detected a lock inversion in the async connectors used by the OpenSSL, SChannel and SSPI Negotiate libraries. I've fixed these problems and there will be a 6.3.2 release next week.
At present I haven't finished changing the build scripts for all of the servers to include the lock inversion detection phase in the test runs, so there may be more issues yet to be discovered.
Note that these lock inversions have not, as far as I know, ever actually caused a deadlock, but the possibility is there if the right timing occurs between concurrent read and write operations.
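To make the idea concrete, here is a minimal sketch of how this kind of detection can work in general. It is an illustration only, not the detector used in the build, and the LockOrderTracker class and its method names are hypothetical. The sketch records the order in which each thread takes its locks and flags any acquisition that contradicts an order seen earlier, so a potential inversion is reported even if the two threads never actually collide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical illustration of lock-order tracking. Not thread-safe: a real
// detector would need to protect its own state.
class LockOrderTracker
{
public:
   // Call before 'threadId' acquires 'lock'. Returns false if this
   // acquisition inverts an order that has already been recorded.
   bool OnAcquire(int threadId, const std::string &lock)
   {
      bool consistent = true;
      std::vector<std::string> &held = m_held[threadId];
      for (const std::string &earlier : held)
      {
         if (earlier == lock)
         {
            continue;   // recursive acquisition of the same lock
         }
         if (Reaches(lock, earlier))
         {
            // 'lock' has already been taken before 'earlier' (directly or
            // via other locks), so taking them in this order could deadlock.
            consistent = false;
         }
         else
         {
            m_before[earlier].insert(lock);   // record "earlier before lock"
         }
      }
      held.push_back(lock);
      return consistent;
   }

   // Call after 'threadId' releases 'lock'.
   void OnRelease(int threadId, const std::string &lock)
   {
      std::vector<std::string> &held = m_held[threadId];
      for (std::size_t i = held.size(); i-- > 0;)
      {
         if (held[i] == lock)
         {
            held.erase(held.begin() + static_cast<std::ptrdiff_t>(i));
            return;
         }
      }
      assert(false && "released a lock that was not held");
   }

private:
   // Depth-first search: is there a recorded path from 'from' to 'to'?
   bool Reaches(const std::string &from, const std::string &to) const
   {
      std::set<std::string> visited;
      std::vector<std::string> pending{from};
      while (!pending.empty())
      {
         const std::string node = pending.back();
         pending.pop_back();
         if (node == to)
         {
            return true;
         }
         if (!visited.insert(node).second)
         {
            continue;   // already explored
         }
         const auto it = m_before.find(node);
         if (it != m_before.end())
         {
            pending.insert(pending.end(), it->second.begin(), it->second.end());
         }
      }
      return false;
   }

   std::map<int, std::vector<std::string>> m_held;           // per-thread held locks
   std::map<std::string, std::set<std::string>> m_before;    // lock-order graph
};
```

If one thread takes a "read" lock and then a "write" lock, and another thread later takes "write" and then "read", the second thread's OnAcquire() call for "read" returns false, even though the two threads ran at different times and nothing ever blocked.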
As I mentioned last time, supporting a large number of concurrent connections on a modern Windows operating system is reasonably straightforward if you get your initial design right; use an I/O Completion Port based design, minimise context switches, data copies and memory allocation and avoid lock contention... The Server Framework gives you this as a starting point and you can often use one of the many complete, fully functional, real world example servers to provide you with a whole server shell, complete with easy performance monitoring and SSL security, where you simply have to fill in your business logic.
Unfortunately it can be all too easy to squander the scalability and performance that you have at the start with poor design choices along the way. These design choices are often quite sensible when viewed from a single threaded or reasonably complex desktop development viewpoint but can be performance killers when writing scalable servers. It's also easy to over-engineer your solution because of irrational performance fears; the over-engineering takes time and delays your delivery, and can often add complexity which then causes maintenance issues for years after. There's a fine line to walk between the two and I firmly believe that the only way to walk this line is to establish realistic performance tests from day 0.
Step one on the road to a high performance server is obviously to select The Server Framework to do your network I/O ;)
Step two is to get a shell of an application up and running so that you can measure its performance.
Step three is, of course, to measure.
There is no excuse for getting to your acceptance testing phase only to find that your code doesn't perform adequately under the desired load. What's more, at that point it's often too late to do anything about the problem without refactoring reams of code. Even if you have decent unit test coverage, refactoring towards performance is often a painful process. The various inappropriate design decisions that you can make tend to build on each other and the result can be difficult to unpick. The performance of the whole is likely to continue to suffer even as you replace individual poor performing components.
So the first thing you should do once you have your basic server operating, even if all it does is echo data back, is to build a client that can stress it and which you can develop in tandem with the server to ensure that real world usage scenarios can scale. The Server Framework comes with an example client that provides the basic shell of a high performance multiple client simulator. This allows you to set up tests to prove that your server can handle the loads that you need it to handle. Starting with a basic echo server you can first baseline the number of connections that it can handle on a given hardware platform. Then as you add functionality to the server you can performance test real world scenarios by sending and receiving appropriate sequences of messages. As your server grows in complexity you can ensure that your design decisions don't adversely affect performance to the point where you no longer meet your performance targets. For example, you might find that adding a collection of data which connections need to access on every message causes an unnecessary synchronisation point across all connections which reduces the maximum number of active connections that you can handle from vastly above your performance target to somewhere very close to it... Knowing this as soon as the offending code is added to the code base means that the redesign (if deemed required) is less painful. Tracking this performance issue down later on and then fixing it might be considerably harder once the whole server workflow has come to depend on it.
I'm a big fan of unit testing and agile development and so I don't find this kind of 'incremental' acceptance testing to be anything but sensible and, in the world of high performance servers, essential.
You can download a compiled copy of the Echo Server test program from here, where I talk about using it to test servers developed using WASP.
Of course the key to this kind of testing is using realistic scenarios. When your test tools are written with as much scalability and performance as the server under test it's easy to create unrealistic test scenarios. One of the first problems that clients using the echo server test program to evaluate the performance of example servers have is that of simulating too many concurrent connection attempts. Whilst it's easy to generate 5000 concurrent connections and watch most servers fail to deal with them effectively it's not usually especially realistic. A far more realistic version of this scenario might be to handle a peak of 1000 connections per second for 5 seconds, perhaps whilst the server is already dealing with 25,000 concurrent connections that had arrived at a more modest connection rate. Likewise it's easy to send messages as fast as possible but that's often not how the server will actually be used. The Echo Server test program can be configured to establish connections and send data at predetermined rates which helps you build more realistic tests.
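As a rough illustration of what establishing connections at a predetermined rate involves, here is a minimal sketch of a pacing calculation. The ConnectionPacer class is hypothetical and is not taken from the Echo Server test program; it simply works out how many new connection attempts are due at a given point in the test so that a client starts them at a steady rate rather than all at once.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: paces connection attempts at a fixed rate until a
// total has been reached.
class ConnectionPacer
{
public:
   ConnectionPacer(std::uint64_t connectionsPerSecond, std::uint64_t totalConnections)
      : m_rate(connectionsPerSecond), m_total(totalConnections), m_started(0)
   {
   }

   // Given the time elapsed since the test began, returns how many new
   // connection attempts should be started now.
   std::uint64_t Due(std::chrono::milliseconds elapsed)
   {
      assert(elapsed.count() >= 0);
      const std::uint64_t elapsedMs = static_cast<std::uint64_t>(elapsed.count());
      // Number of connections that should exist by now, capped at the total.
      const std::uint64_t target = std::min(m_total, m_rate * elapsedMs / 1000);
      const std::uint64_t due = target > m_started ? target - m_started : 0;
      m_started += due;
      return due;
   }

   std::uint64_t Started() const { return m_started; }

private:
   const std::uint64_t m_rate;    // connections per second
   const std::uint64_t m_total;   // stop once this many have been started
   std::uint64_t m_started;       // attempts started so far
};
```

A test client would call Due() from a timer and start that many connections each time. With a rate of 1000 per second and a total of 5000, this gives the five second peak described above instead of 5000 simultaneous attempts.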
You should also be careful to make sure that you're not, in fact, simply testing the capabilities of the machines being used to run the test clients, or the network bandwidth between them and the server. With the easy availability of cloud computing resources such as Amazon's EC2 it's pretty easy to put together a network of machines to use to load test your server.
Once you have a suitable set of clients, each running a reasonable number of connections, you can begin to stress your server with repeatable, preferably scripted, tests. You can then automate the gathering of results using perfmon and your server's performance counters mixed in with the standard system counters.
Personally I tend to do two kinds of load tests. The first is to prove that we can achieve the client's target performance for the desired number of connections on given hardware. The second is to see what happens when we drive the server to destruction. These destruction tests are useful to know what kind of gap there is between target performance and server meltdown and also to ensure that server meltdown is protected against, either by actively limiting the number of connections that a server is willing to accept or by ensuring that performance degrades gracefully rather than spectacularly.
Knowledge is power, and when aiming to build super scalable, high performance code you need to gather as much knowledge as you can by measuring and performance testing your whole system from the very start.
As I mentioned in the release notes for v6.3 here, I've added some code to prevent potential recursion issues if certain performance improvements are enabled.
In Windows Vista and later it's possible to set the FILE_SKIP_COMPLETION_PORT_ON_SUCCESS flag on a socket using SetFileCompletionNotificationModes(). When this flag is set an overlapped operation can complete "in-line" and the completion operation can be handled on the thread that issued the operation rather than on one of the threads that is servicing the IO completion port that is associated with the socket. This is great as it means that if, for example, data is already available when an overlapped read is issued then we avoid a potentially costly context switch to an I/O thread to handle this data. The downside of this is that the code for handling overlapped completions becomes potentially recursive. If we issue a read and it completes straight away and is handled on the thread that issued it then the code that handles the read completion is likely to issue another read which itself may complete "in-line", etc. With a suitable rate of supply of inbound data this can lead to stack overflows due to unconstrained recursion.
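One general way to constrain this kind of recursion is to count how deeply completions are nested and, once a limit is reached, hand the next completion off rather than handling it in-line. The sketch below is a portable simulation of that idea and makes no claim about how The Server Framework implements its protection. It uses no Windows APIs, the InlineCompletionSimulator class is hypothetical, and the "deferred" counter merely stands in for passing the completion to an I/O thread (in real code this might be done by posting it to the completion port, which is an assumption here).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simulation of in-line read completions with a depth limit.
class InlineCompletionSimulator
{
public:
   explicit InlineCompletionSimulator(std::size_t maxInlineDepth)
      : m_maxInlineDepth(maxInlineDepth)
   {
      assert(maxInlineDepth > 0);
   }

   // Simulates a connection where the next 'reads' reads all have data
   // available immediately, so every one of them could complete in-line.
   void Run(std::size_t reads)
   {
      m_remaining = reads;
      IssueRead();
      // Stands in for the I/O threads picking up deferred completions,
      // each starting again from a shallow stack.
      while (m_pendingDeferred > 0)
      {
         --m_pendingDeferred;
         HandleCompletion();
      }
   }

   std::size_t Handled() const { return m_handled; }
   std::size_t Deferred() const { return m_deferredTotal; }
   std::size_t MaxDepthSeen() const { return m_maxDepthSeen; }

private:
   void IssueRead()
   {
      if (m_remaining == 0)
      {
         return;   // no data available: the read would pend as normal
      }
      --m_remaining;
      if (m_depth < m_maxInlineDepth)
      {
         HandleCompletion();   // handled in-line on the issuing thread
      }
      else
      {
         ++m_pendingDeferred;  // too deep: hand the completion off instead
         ++m_deferredTotal;
      }
   }

   void HandleCompletion()
   {
      ++m_depth;
      if (m_depth > m_maxDepthSeen)
      {
         m_maxDepthSeen = m_depth;
      }
      ++m_handled;
      IssueRead();   // the handler issues the next read, which may recurse
      --m_depth;
   }

   const std::size_t m_maxInlineDepth;
   std::size_t m_remaining = 0;
   std::size_t m_depth = 0;
   std::size_t m_maxDepthSeen = 0;
   std::size_t m_handled = 0;
   std::size_t m_pendingDeferred = 0;
   std::size_t m_deferredTotal = 0;
};
```

However many reads complete back to back, the nesting never exceeds the limit passed to the constructor. The cost is an occasional hand-off, which gives up the saved context switch for that one completion in exchange for a bounded stack.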