One Million TCP Connections...

2010-12-10

It seems that C10K is old hat these days and that people are aiming a little higher. I’ve seen several questions on StackOverflow.com (and had an equal number of direct emails) asking how to achieve one million active TCP connections on a single Windows Server box. Or, in a more roundabout way, they ask what the theoretical maximum number of active TCP connections is that a Windows Server box can handle.

I always respond in the same way; basically I think the question is misguided and even if a definitive answer were possible it wouldn’t actually be that useful.

What the people asking these questions seem to ignore is that with a modern Windows Server operating system, a reasonable amount of RAM and a decent enough set of CPUs, it’s likely that it’s YOUR software that is the limiting factor in the equation.

Assuming it’s even possible for the chief networking wonks at Microsoft to determine the maximum number of ‘active’ TCP connections for a given hardware spec, I believe that a) that number would be specific to the particular service pack of the operating system AND the network drivers, and b) it would also depend heavily on how you designed the specific piece of server software that is going to service these connections. How much non-paged pool are YOU using? How many pages does your server software have locked in memory for I/O at any one time? If you don’t include your software in the calculation then the number is pretty meaningless.

One thing to be sure of is that you won’t get more than 16 million concurrent connections, as that seems to be the maximum value that can be set in the registry for configuring the TCP stack (see here). In reality you won’t get anywhere near that figure because of all of the other limits, most of which are less well documented and possibly implicit. On earlier Windows operating systems non-paged pool exhaustion was a real problem, as the amount of non-paged pool memory was highly constrained and increased very slowly in relation to the physical RAM present. Vista x64 and later fix that (see here), but there are still other limits. The I/O page lock limit, for example, restricts the number of memory pages that can be locked for I/O at any one time, and whilst this can be tuned to high values you still need to know how your application uses memory for I/O to know what you should set it to. Both of these are memory-related limitations, but then you have the CPU. We’ll assume for a moment that you have designed your application to use I/O completion ports and that you’re following the rules that I laid out in my earlier blog post about the C10K problem. Oh, have we just included your server software in our calculations?
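To make the point about memory limits concrete, here is a rough back-of-envelope sketch in Python. Every figure in it is a hypothetical placeholder that I have made up for illustration (the pool sizes, the per-connection costs, the names); only the 16 million registry ceiling comes from the discussion above, and even that is rounded. The only thing it shows is that the answer is set by whichever resource your own software exhausts first.

```python
# Back-of-envelope sketch. All budgets and per-connection costs are
# hypothetical placeholders; measure your own server to get real numbers.

GIB = 1024 ** 3
KIB = 1024


def connection_ceiling(pool_bytes, per_connection_bytes):
    """Connections that fit in a pool if each one costs per_connection_bytes."""
    if per_connection_bytes <= 0:
        raise ValueError("per_connection_bytes must be positive")
    return pool_bytes // per_connection_bytes


def tightest_limit(limits):
    """Return (name, value) of the smallest ceiling in the dict."""
    name = min(limits, key=limits.get)
    return name, limits[name]


limits = {
    # Rounded registry ceiling mentioned in the text.
    "registry_cap": 16_000_000,
    # Assumed: 8 GiB of non-paged pool, 2 KiB used per connection.
    "non_paged_pool": connection_ceiling(8 * GIB, 2 * KIB),
    # Assumed: 4 GiB lockable for I/O, 8 KiB locked per pending read.
    "locked_pages": connection_ceiling(4 * GIB, 8 * KIB),
}

name, value = tightest_limit(limits)
print(f"{name}: {value:,}")  # prints "locked_pages: 524,288"
```

Change the assumed 8 KiB read buffer to 64 KiB and the ceiling drops by a factor of eight, which is exactly why a number that ignores your software tells you so little.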

You see, the problem with asking these kinds of questions is that the answers aren’t meaningful. What will you do once you know that you can support 1 million active TCP connections on a single Windows Server 2008 R2 box? What does that fact allow you to do that you couldn’t do before? It doesn’t tell you what a server running your server software can support, because your software can be arbitrarily complex and use an unknown (and probably equally unmeasurable) amount of resources per connection. Are you looking for a big stick to hit your programmers with? “Microsoft says that this spec box will support 500,000 concurrent TCP connections, why are we only achieving 100,000?” I’m sorry, but the answer to that is the same as if you removed the “Microsoft says” part; we need to profile the server to see… So, since you need to profile the server to answer your questions, knowing the theoretical maximum doesn’t really help. I stand by my earlier blog post on concurrent connection testing. You need to do this kind of testing with your real server software from Day 0. There is no simple answer.
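As an illustration of what “testing from Day 0” might look like, here is a minimal sketch of a connection ramp test. It is written in Python with asyncio purely for brevity; it is not the test tool from my earlier post, and the function names and the batch size are my own assumptions. It opens connections in batches against whatever server you point it at, holds them open, and stops ramping as soon as a batch fails to connect in full.

```python
import asyncio


async def open_connections(host, port, target, batch=100):
    """Open up to `target` TCP connections in batches and keep them open.

    Returns the list of stream writers for the connections that succeeded.
    Ramping stops at the first batch in which any connection attempt fails.
    """
    writers = []
    for start in range(0, target, batch):
        count = min(batch, target - start)
        results = await asyncio.gather(
            *(asyncio.open_connection(host, port) for _ in range(count)),
            return_exceptions=True,
        )
        for result in results:
            if isinstance(result, Exception):
                continue  # a failed attempt; counted by its absence
            _reader, writer = result
            writers.append(writer)
        if len(writers) < start + count:
            break  # the server (or this client) has hit a limit
    return writers


async def close_all(writers):
    """Close every connection and wait for the closes to complete."""
    for writer in writers:
        writer.close()
    await asyncio.gather(
        *(writer.wait_closed() for writer in writers),
        return_exceptions=True,
    )


async def measure(host, port, target):
    """Return how many of `target` connections could be held open at once."""
    writers = await open_connections(host, port, target)
    established = len(writers)
    await close_all(writers)
    return established
```

You would run it with something like `asyncio.run(measure("127.0.0.1", 5050, 10_000))`, where the address and port are placeholders for your own server. The number it returns is only interesting alongside the server’s memory and CPU figures at that point, and idle connections are the easy case; a real test also needs to push realistic traffic over them.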

Given that you’re building scalability tests from the start, you’ll also learn about the other scalability limits as you grow your server. How are you going to route all of those connections to your server machine or machines? What is your strategy for downtime and server maintenance? And so on.

So, there is a definitive answer to the questions posed at the start of this posting:

Q - “How many active TCP connections can a Windows Server 2008 R2 box support?” A - More than the server software that you’re running on it.

Q - “Can a machine of a given spec running Windows Server 2003 support 1 million concurrent TCP connections?” A - Only if your server software can also support that number of connections…

Separating the hardware and OS part of the question from your custom-developed server solution is not meaningful.