October 2010 Archives
As I mentioned last time, supporting a large number of concurrent connections on a modern Windows operating system is reasonably straightforward if you get your initial design right: use an I/O Completion Port based design, minimise context switches, data copies and memory allocation, and avoid lock contention. The Server Framework gives you this as a starting point, and you can often use one of the many complete, fully functional, real world example servers to provide you with a whole server shell, complete with easy performance monitoring and SSL security, where you simply have to fill in your business logic.
Unfortunately it can be all too easy to squander the scalability and performance that you start with by making poor design choices along the way. These design choices often seem quite sensible when viewed from a single threaded, or reasonably complex desktop, development viewpoint but can be performance killers when writing scalable servers. It's also easy to over-engineer your solution because of irrational performance fears; the over-engineering takes time, delays your delivery, and can add complexity which then causes maintenance issues for years afterwards. There's a fine line to walk between the two and I firmly believe that the only way to walk this line is to establish realistic performance tests from day 0.
Step one on the road to a high performance server is obviously to select The Server Framework to do your network I/O ;)
Step two is to get a shell of an application up and running so that you can measure its performance.
Step three is, of course, to measure.
There is no excuse for getting to your acceptance testing phase only to find that your code doesn't perform adequately under the desired load. What's more, at that point it's often too late to do anything about the problem without refactoring reams of code. Even if you have decent unit test coverage, refactoring towards performance is often a painful process. The various inappropriate design decisions that you can make tend to build on each other and the result can be difficult to unpick. The performance of the whole is likely to continue to suffer even as you replace individual poorly performing components.
So the first thing you should do once you have your basic server operating, even if all it does is echo data back, is to build a client that can stress it and which you can develop in tandem with the server to ensure that real world usage scenarios can scale. The Server Framework comes with an example client that provides the basic shell of a high performance multiple client simulator. This allows you to set up tests to prove that your server can handle the loads that you need it to handle. Starting with a basic echo server you can first baseline the number of connections that it can handle on a given hardware platform. Then, as you add functionality to the server, you can performance test real world scenarios by sending and receiving appropriate sequences of messages. As your server grows in complexity you can ensure that your design decisions don't adversely affect performance to the point where you no longer meet your performance targets. For example, you might find that adding a collection of data which connections need to access on every message causes an unnecessary synchronisation point across all connections which reduces the maximum number of active connections that you can handle from vastly above your performance target to somewhere very close to it... Knowing this as soon as the offending code is added to the code base means that the redesign (if deemed required) is less painful. Tracking this performance issue down later on and then fixing it might be considerably harder once the whole server workflow has come to depend on it.
I'm a big fan of unit testing and agile development and so I don't find this kind of 'incremental' acceptance testing to be anything but sensible and, in the world of high performance servers, essential.
You can download a compiled copy of the Echo Server test program from here, where I talk about using it to test servers developed using WASP.
Of course the key to this kind of testing is using realistic scenarios. When your test tools are written with as much scalability and performance as the server under test it's easy to create unrealistic test scenarios. One of the first problems that clients using the echo server test program to evaluate the performance of example servers have is that of simulating too many concurrent connection attempts. Whilst it's easy to generate 5000 concurrent connections and watch most servers fail to deal with them effectively it's not usually especially realistic. A far more realistic version of this scenario might be to handle a peak of 1000 connections per second for 5 seconds, perhaps whilst the server is already dealing with 25,000 concurrent connections that had arrived at a more modest connection rate. Likewise it's easy to send messages as fast as possible but that's often not how the server will actually be used. The Echo Server test program can be configured to establish connections and send data at predetermined rates which helps you build more realistic tests.
You should also be careful to make sure that you're not, in fact, simply testing the capabilities of the machines being used to run the test clients, or the network bandwidth between them and the server. With the easy availability of cloud computing resources such as Amazon's EC2 it's pretty easy to put together a network of machines to use to load test your server.
Once you have a suitable set of clients, each running a reasonable number of connections, you can begin to stress your server with repeatable, preferably scripted, tests. You can then automate the gathering of results using perfmon and your server's performance counters mixed in with the standard system counters.
Personally I tend to do two kinds of load tests. The first is to prove that we can achieve the client's target performance for the desired number of connections on given hardware. The second is to see what happens when we drive the server to destruction. These destruction tests are useful to know what kind of gap there is between target performance and server meltdown and also to ensure that server meltdown is protected against, either by actively limiting the number of connections that a server is willing to accept or by ensuring that performance degrades gracefully rather than spectacularly.
Knowledge is power, and when aiming to build super scalable, high performance code you need to gather as much knowledge as you can by measuring and performance testing your whole system from the very start.
Using a modern Windows operating system it's pretty easy to build a server system that can support many thousands of connections if you design the system to use the correct Windows APIs. The key to server scalability is to always keep in mind the Four Horsemen of Poor Performance as described by Jeff Darcy in his document on High Performance Server Architecture. These are:
- Data copies
- Context switches
- Memory allocation
- Lock contention
I'll look at context switches first, as IMHO this is where outdated designs first rear their heads. On Windows systems you MUST be using I/O Completion Ports and overlapped I/O if you want the best scalability. Using the IOCP API correctly can mean that you can service thousands of concurrent connections with a handful of threads.
So, we'll assume we're using a modern overlapped I/O, I/O Completion Port based design; something similar to what The Server Framework provides, perhaps. Using an IOCP allows you to limit the number of threads that are eligible to run at the same time, and the IOCP uses a last in, first out mechanism to ensure that the thread that was most recently active and processing is always the next one to be given new work, thus avoiding the need to page in another thread's memory.
The latest versions of Windows allow you to reduce context switching even further by enabling various options on each socket. I've spoken in detail on my blog, here, about how FILE_SKIP_COMPLETION_PORT_ON_SUCCESS can help to reduce context switching. The gist is that with this option enabled you can process an overlapped operation's completion on the same thread that issued the operation, if the operation can complete at the time it's issued. This removes unnecessary context switching. The Server Framework supports this mode of operation on operating systems that support it.
Next on to data copies, as Jeff says in his document, one of the best ways to avoid data copies is to use reference counted buffers and to manage them in terms of the amount of data present in them, building buffer chains where necessary. The Server Framework has always worked in terms of flexible, reference counted buffers. Many server designs can benefit from accumulating inbound data into a single buffer with no buffer copying required simply by reissuing a read on a connection with the buffer that was passed to you when the previous read completed. In this way it's easy to accumulate 'complete messages' and process them without needing to copy data.
Since the buffers are reference counted you can easily pass them off to other threads for processing, or keep them hanging around until you're done with them. The CMultiBufferHandle class allows you to use scatter/gather I/O so that you can transmit common blocks of data without needing to copy the common data, and the CBufferHandle class allows you to broadcast data buffers to multiple connections without needing to copy the data for each connection.
Memory allocation during connection processing is minimal and custom buffer allocators can reduce this even further. The Server Framework ships with several different allocators and it's easy to implement your own if you need to and simply plug it in to the framework code. By default the buffer and socket allocators pool memory for reuse which helps reduce contention and improve performance.
Once you're processing your messages several tricks can be employed to optimise your memory allocation use, a favourite of mine is to use custom memory allocators that use scratch space in the buffer that the message was initially read into. This can then be used to provide all of the dynamic memory needed during message processing and avoids the need for traditional memory management and its potential lock contention.
Lock contention within the framework itself is limited. The design of the socket object is such that you can design a server where there's never any need to access a common collection of connection objects simply to obtain per connection data from it. Each connection can have as much user data as you like associated with it and this can all be accessed from the connection without the need for any locks. The locks in the buffer allocators are probably the most likely locks to result in contention but here you can select from several different strategies, including a lock free allocator.
Of course once you're out of the framework code and into your own code you still have to be careful not to fall foul of the Four Horsemen of Poor Performance, but you can rest assured that The Server Framework is designed very much with these issues in mind. I believe that the only way to make sure that you maintain scalability and performance is to test for it at all stages of development; it's important to set up scalability tests from day 1 and to run them regularly using real world usage scenarios. I'll talk about how to go about setting up these kinds of tests in a later blog post.
As you've seen from some of the earlier tutorials, WASP has quite a few command line parameters that can change how it runs. You can run WASP as a normal executable or install it as a Windows Service. The complete set of command line options are displayed if you run WASP with /help or with an option that it doesn't understand but I thought I'd list them all here for completeness and so that I can explain in a little more detail what each one does.
So far the tutorials have focused on a simple length prefixed message type. This is probably the easiest message in the world to process, the message framing is very simple and there's hardly anything to do in your message framing DLL. Unfortunately not all protocols are this simple to parse. Another common real-world protocol is a line based protocol that is delimited by a terminating character, or characters. One such protocol is the POP3 protocol which works in terms of commands which are delimited by the CR LF sequence.
In this tutorial we'll explore how we can write a message framing plugin for CR LF terminated messages and use per connection user data to track our parsing state.
As you have discovered if you've been following the tutorials, WASP is configured using an XML file.
This file can either live in the same directory as the WASP executable or, for when you're running WASP as a Windows Service, it can live in a place that is configured in the registry.
The file is pretty simple and we've covered most of the options in the various tutorials but there are some configuration options that we haven't touched on yet and it seems sensible to have one place to look for details of all of the options that you can configure in the config file. This blog post is the place!
By now you've probably taken a look inside of the WASP SDK header, WASPDLLEntryPoints.h and seen all of the various plugin entry points that you can export from your plugin. This tutorial will explain what each of them is for and how you use them and will present a simple plugin which uses all of the entry points and logs its actions to WASP's debug log.
A single WASP plugin can be loaded by multiple end points to provide the same server on multiple ports. A plugin could, for example, be configured on one end point to provide services to the internal network and on another end point to provide services to the internet. Alternatively, in later WASP releases, a single plugin may be used to provide services over an insecure link on one end point and via an SSL protected link on another.
To enable the plugin to distinguish between connections from a specific end point WASP supports the concept of server instances.
As I mentioned in the release notes for v6.3 here, I've added some code to prevent potential recursion issues if certain performance improvements are enabled.
In Windows Vista and later it's possible to set the FILE_SKIP_COMPLETION_PORT_ON_SUCCESS flag on a socket using SetFileCompletionNotificationModes(). When this flag is set an overlapped operation can complete "in-line" and the completion can be handled on the thread that issued the operation, rather than on one of the threads that is servicing the I/O completion port that is associated with the socket. This is great as it means that if, for example, data is already available when an overlapped read is issued then we avoid a potentially costly context switch to an I/O thread to handle this data. The downside is that the code for handling overlapped completions becomes potentially recursive. If we issue a read and it completes straight away and is handled on the thread that issued it, then the code that handles the read completion is likely to issue another read which may itself complete "in-line", etc. With a suitable rate of supply of inbound data this can lead to stack overflows due to unconstrained recursion.
So far our simple example WASP plugins have all used OnReadCompletedEx(), which gives you both an input and an output buffer and assumes that you generate a single response to each inbound message. It also assumes that you won't write more data than will fit in a single I/O buffer. Whilst this is suitable for some server designs it's quite restrictive. Most plugins will probably use a combination of OnReadCompleted() and the WASP callback function writeToConnection(). OnReadCompleted() only provides inbound data, and writeToConnection() can be called to send any amount of data to any active connection and can be called by your plugin at any time.
I'm currently re-reading "High Performance Server Architecture" by Jeff Darcy and he has a lot of sensible stuff to say about avoiding context switches, including how my multiple thread pool design, whilst conceptually good, is practically not so good. In general I agree with him, but often the design provides good enough performance and it's easy to compose from the various classes in The Server Framework.
Explicitly managing the threads that can run, using a semaphore that only allows a number of threads equal to or less than your number of cores to do work at once, is a nice idea, but one that adds complexity to the workflow as you need to explicitly acquire and release the semaphore around your blocking operations. This approach, coupled with a single thread pool containing more threads than you have processors, would likely result in fewer context switches and higher performance.
I'm currently accumulating ideas for the performance work that I have scheduled for the 6.4 release, I expect a single pool design with a running threads limiter will feature...
Way back in 2002, when I was developing ISO8583 servers for PayPoint, I put together a two thread pool server design that has worked so well that many of the servers I develop today still use variations on the original design. The main idea behind the design was that the threads that do the network I/O should never block, whilst the threads that do the user work can block if they like. Since this work was being done before Windows Vista came along with its changes to how overlapped I/O is handled when the thread that issued it exits (see here), the I/O threads were not allowed to exit. However, to handle peaks and troughs in demand, and operations that could block for various lengths of time (due to database access), it was useful to be able to expand and contract the thread pool that did the actual work. This led to a design where we had a fixed size pool of I/O threads and a variable size pool of "business logic" threads. Dispatch to the business logic pool was via a thread safe queue (built using an I/O completion port) and the dispatch was two stage so that the dispatcher could determine when it needed to expand the pool to deal with more work. Due to the way the intra-pool dispatch worked it was easy to instrument the server using performance counters so that support staff could easily visualise how heavily loaded the server was.
WASP uses a variation on this design.
OpenSSL is an open source implementation of the SSL and TLS protocols. Unfortunately it doesn't play well with windows style asynchronous sockets. This article - previously published in Windows Developer Magazine and now available on the Dr. Dobbs site - provides a simple connector that enables you to use OpenSSL asynchronously.
Integrating OpenSSL with asynchronous sockets is similar to integrating it with overlapped I/O and IO completion port based designs and so the ideas behind the code discussed in the article were then used as part of the original design for The Server Framework's OpenSSL option pack.
The simple echo server plugin that we developed in the earlier tutorial was easy to test using telnet as it simply echoed all data back to the client. The plugin which used simple message framing was less easy to test using telnet as you first needed to enter the correct bytes to specify a message length as an int in network byte order.
Neither plugin was easy to stress test using telnet as you'd need lots of monkeys and lots of machines to simulate lots of users.
The Server Framework ships with an example client that allows you to create thousands of concurrent connections and control how they send data to a server. This is an easy way to build a test system for your server as all of the complexity of managing and controlling the connections is done for you and you simply have to adjust the messages that are generated and how the response validation is done. The default message that is built is a network byte order, integer length prefixed message and so this program can be used to stress test WASP with either of the two example plugins that we've developed so far.
You can download the EchoServerTest program from here.
Most TCP servers deal with distinct messages whereas TCP itself deals in terms of a stream of bytes. By default a single read from a TCP stream can return any number of bytes from 1 to the size of the buffer that you supplied. TCP knows nothing about your message structure. This is where message framing comes in. If the protocol that you are supporting has a concept of what constitutes a "message" then your protocol requires message framing. The simplest message framing is a length prefixed message where all messages start with a series of one, or more, bytes that convey the length of the message to the receiver. Another common message framing style is to have a terminating character, or characters; perhaps each message is terminated with a CR LF combination. Whatever your framing requirements, WASP's message framing handlers can help.
WASP plugins are, at present, native DLLs that expose a set of known entry points. The minimal plugin exposes a single entry point, either OnReadCompleted() or OnReadCompletedEx(). The entry points and other details that you need to build a plugin for WASP are detailed in the WASPDLLEntryPoint.h header file that ships in the SDK directory with WASP.
As you will have noticed if you've been following along with the tutorials, WASP displays a message box when you successfully install an instance of the service or install the performance counters. This is less than ideal for automated installations and so you can add the /noMessages command line argument to these operations to give you an installation which does not require a user to acknowledge the message box.
If you need to automate the installation of one or more instances of WASP and/or the performance counters you can do so using /noMessages.
WASP can expose internal metrics using performance counters. These can then be viewed using the standard Windows performance monitoring tool, perfmon. WASP's performance counters allow you to see how your server is performing and the counters can be automatically monitored and integrated with the counters from other Windows services, such as SQL server, the base operating system or the CLR, to provide a fully integrated production monitoring system.
Installation of performance counters is done once per machine rather than once per WASP instance and performance counters can be used and viewed both when WASP is running as a Windows Service and from the command line. You install the counters by running WASP from the command line with the /installCounters command line argument. As with installing WASP as a Windows Service, you will be prompted to elevate your credentials if you do not have the appropriate access rights on your account.
In the last tutorial I showed you how to run WASP as a normal command line executable. This can be useful for testing and development and also for debugging but when you want to run WASP on a production machine you probably want to run it as a Windows Service.
Running as a Windows Service has the advantage that you don't need a user logged in to the machine to run WASP, you can also control it remotely using the Windows Service Control Manager application.
To run WASP as a service you first need to install it as a service.
I've got quite a few plans for expanding the functionality that WASP provides. Ideally it should showcase all of the major options available with The Server Framework; so pretty soon I'll be adding UDP support, hosting plugins written in managed code, providing secure TCP connections using SSL, etc. Plus I'm sure that some bugs will be shaken out as users push the system in ways that I haven't anticipated and so haven't tested. So, expect there to be updates coming along.
WASP can automatically check for updates by reading a web page at www.ServerFramework.com. By default it does this once a week. You can disable this by adding a DisableUpdateCheck value to the instance node of your config file and setting it to "true". This will prevent WASP from automatically checking for updates. Alternatively you can change how often WASP checks by adding a CheckForUpdatesEvery value; valid values for this are "Day", "Week" and "Month".
WASP will remember if it's not up to date and remind you each time it starts, you can stop it doing this by adding a RemindOnStartIfNotUpToDate value to the instance node and setting it to "false".
All of the details of what the update check is doing and the results of the check are logged to the log file.
If you disable automatic update checks then you can check for updates manually by running WASP with the /checkForUpdates command line parameter. This will log the results to the log and display a message box telling you how things went. If you want to automate this process and run the test from a script then you can run WASP with /noMessages /checkForUpdates which will not display the message box and will return an exit code of 1 if an update is available and 0 if WASP is up to date.
During the update check WASP requests a web page, http://wasp.ServerFramework.com/WASP-LatestVersion.html with a user agent string that identifies the version of WASP that is making the call and the version of the operating system that it is running on.
The development of WASP has been acting as a bit of an internal driver for new feature development in the 6.3 release of The Server Framework. Sitting down to develop a service that was easy to use for a mass market exposed some small holes in the 6.2 release; nothing too serious but pretty soon after putting together the first service shell of the WASP application I had a list of nice to have additions for the Service Tools Library.
The easiest way to get started with WASP is to download the latest version from the download page, here, unzip the contents somewhere and then run the WASP executable; simply double click the executable in Explorer or navigate to the directory that you have installed WASP into and type "wasp". The result should be that a log file is created in a subdirectory, called log, of the directory where the exe is located. The contents of the log file will look something like this:
Version 6.3 of The Server Framework was released today.
This release includes the following, see the release notes, here, for full details of all changes.
- Performance improvements on Vista and later operating systems. See here for more details.
- Performance improvements for some designs of datagram server by reusing sockets and buffers more aggressively and thus avoiding the allocators in some situations.
- A redesigned timer queue to improve timer dispatch performance. See here for more details.
- A new implementation of the timer queue interface, implemented as a timer wheel, which is optimised for timers where the maximum timeout is known and is relatively small (ideal for most situations where timeouts can be guaranteed to be less than 30 mins). See here for more details.
- Monitoring interfaces for timer queues and the new timer wheel.
- Buffer allocation contention monitoring. See here for more details.
- Added a "low contention" buffer allocator which relies on multiple buffer pools to avoid contention on any one pool.
- Added a "reusable id" manager which will manage a pool of ids which can be reused thus avoiding id duplication due to id wrap.
- Added a simple ring buffer class.
- Added a recursion limiter which will prevent some kinds of servers from experiencing potentially unbounded recursion during read and write completions when certain 6.2 performance optimisations are enabled. See here for more details.
- Lots of changes to the Service Tools library, see here for more details.
- Enabled hosting of the .Net 4.0 CLR and 'side by side' multiple CLR version hosting via the .Net 4.0 hosting API.
WASP is, at heart, simply another example server for The Server Framework, but it's also something a little new. Up until now potential clients have been able to look at the example servers source code and ask for pre-built executables for testing. The thing is, the examples are just that, examples and whilst release 6.2 of The Server Framework ships with over 70 examples (see here) they still only show small pieces of functionality in isolation. So, whilst it may be great for me to be able to test the number of concurrent connections that The Server Framework can maintain using a simple server example that doesn't do anything much else it's often not enough to convince people that they're right to choose The Server Framework for their networking layer.
WASP allows potential clients, and non-commercial operations who don't wish to license The Server Framework, to build a server that does what they need it to do rather than what I want to show them. WASP will grow into a showcase for the functionality that is available within The Server Framework. By writing your own DLLs you can plug your own business logic into our networking layer and see just what The Server Framework can do for you.
But everything has to start somewhere. Right now WASP is fairly simple. It's a Windows Service that exposes Performance Counters and which can be configured using an XML config file. You get to write a DLL which you configure WASP to load and once loaded your DLL is passed networking events as they happen. Right now this is purely an unmanaged TCP solution with no SSL but, going forward, I hope to add a host of features to WASP so that you can try out all of the key features of The Server Framework.
If your project is non-commercial then you can use WASP however you want to otherwise you need to buy a license to the pieces of The Server Framework that you need before you go live. WASP will give you the confidence that you need that The Server Framework can do all of your networking for you.
ServerFramework.com is a new website that we've put together to make it easier for users and potential users of the licensed version of our high performance, I/O completion port based client/server socket framework to find all of the information that they need. As many of you know, I've been working on the code that forms The Server Framework since 2001 and it's been used by lots of our clients to produce highly scalable, high performance, reliable servers that often run continuously 24/7, all year round. Up until now the code hasn't really had a consistent name; but that's changed. Although the documentation for the libraries themselves will still refer to them as "Socket Tools" and "Win32 Tools" etc. the whole package of what was formerly "the licensed code" is now The Server Framework.
With the launch of ServerFramework.com I'm also cleaning up how you get hold of the free version of the framework. Back in 2002 I first wrote about my server development efforts over at www.CodeProject.com and with those articles I gave away an early version of the framework. That code is still available, it's unsupported but it's still a great way to kick start your development efforts if you don't want to pay for a license for The Server Framework. You can now get hold of the latest free code, now known as The Free Framework, from the download page here as a single zip file.
Something else that's a little new is WASP. I've been building custom application servers with The Server Framework for years now and so have my clients but sometimes people don't need a fully customised server solution. Sometimes you just need a robust networking layer that you don't need to worry about. WASP is a pluggable application server that's free for non-commercial use. WASP is, in effect, just another example server that ships with The Server Framework and if you have a license to The Server Framework then you can customise it and learn from it as you do all the other examples. The big difference with WASP is that you can also download the compiled binary and, due to its pluggable design, you can write your own business logic and simply plug it in to WASP. Let WASP take care of the 'highly scalable, high performance' part of networking whilst you get on with writing your server. As I said, WASP is free for non-commercial use and the idea is that if you're considering purchasing a license for The Server Framework then you might be able to get a pretty decent prototype up and running using WASP to evaluate the performance that you get from The Server Framework. Of course WASP is limited, but simply buying a license to The Server Framework gives you all the code that you need to remove those limits and customise your application server as required. You can download WASP from here.
ServerFramework.com also includes some forums. Up until now I've dealt with all support and questions directly via email and that will continue. I thought it may be useful, especially for users of WASP or The Free Framework, to be able to discuss things publicly. So now you can.
This blog will be in addition to my technical blog over at Rambling Comments. The stuff that I post here will always be directly related to developments in The Server Framework whereas the stuff over at Rambling Comments will be more technically diverse.
So, on with the show...