March 2012 Archives

When I switched to looking at the performance of the more advanced RIO server designs that use IOCP it quickly became apparent that even multiple 1 Gigabit connections weren't enough of a challenge to give me any meaningful figures; my traditional IOCP datagram servers were easily able to keep up and increasing the workload per datagram required such high workloads that the tests became meaningless. So, we've brought forward the purchase of the hardware that we intended to use for our private cloud scalability testing and we now have 2 Intel 10 Gigabit AT2 cards. Switch prices are still prohibitive for lab use and so these two cards are directly connected, point to point.

The good news is that we now have a 10 Gigabit network link between two of our test servers. The bad news is that I now have to work out how to use it, the traditional datagram generation program that I was previously using to test simply doesn't scale to saturate the new link.

Windows 8 Registered I/O Performance

| 0 Comments | 1 TrackBack
I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance.

Of course these comparisons should be taken as preliminary since we're working with a beta version of the operating system. However, though I wouldn't put much weight in the exact numbers until we have a non-beta OS to test on, it's useful to see how things are coming along and familiarise ourselves with the designs that might be required to take advantage of RIO once it ships. The main thing to take away from these discussions on RIO's performance are the example server designs, the testing methods and a general understanding of why RIO performs better than the traditional Windows networking APIs. With this you can run your own tests, build your own servers and get value from using RIO where appropriate.

Do bear in mind that I'm learning as I go here, RIO is a new API and there is precious little in the way of documentation about how and why to use the API. Comments and suggestions are more than welcome, feel free to put me straight if I've made some mistakes, and submit code to help make these examples better.

Please note: the results of using RIO with a 10 Gigabit Ethernet link are considerably more impressive than those shown in this article - please see here for how we ran these tests again with improved hardware.
Now that we have five example servers, four RIO designs and a traditional polled UDP design, we can begin to look at how the RIO API performs compared to the traditional APIs. Of course these comparisons should be taken as preliminary since we're working with a beta version of the operating system. However, though I wouldn't put much weight in the exact numbers until we have a non-beta OS to test on, it's useful to see how things are coming along and familiarise ourselves with the designs that might be required to take advantage of RIO once it ships.

Sending a stream of datagrams

Before we can compare performance we need to be able to push the example servers hard. We do this by sending a stream of datagrams at them as fast as we can for a period of time. The servers start timing when they get the first datagram and then count the number of datagrams that they process. The test finishes by sending a series of smaller datagrams at the server. When the server sees one of these smaller datagrams it shuts down and reports on the time taken and the number of datagrams processed and the rate at which they were processed.

All we need to be able to do to stress the servers is to send datagrams at a rate that gets close to 100% utilisation of a 1Gb Ethernet link. This is fairly simple to achieve using the traditional blocking sockets API.
   for (size_t i = 0; i < DATAGRAMS_TO_SEND; ++i)
   {
      if (SOCKET_ERROR == ::WSASendTo(
         s,
         &buf,
         1,
         &bytesSent,
         flags,
         reinterpret_cast<sockaddr *>(&addr),
         sizeof(addr),
         0,
         0))
      {
         ErrorExit("WSASend");
      }
   }

There's not much more to it than that. We use similar code to setup and clean up, but if you've been following along with the other examples then there's nothing that needs to be explained about that.

The code for this example can be downloaded from here. This code requires Visual Studio 11, but would work with earlier compilers if you have a Windows SDK that supports RIO. Note that Shared.h and Constants.h contain helper functions and tuning constants for ALL of the examples and so there will be code in there that is not used by this example. You should be able to unzip each example into the same directory structure so that they all share the same shared headers. This allows you to tune all of the examples the same so that any performance comparisons make sense. This program can be run on versions of Windows prior to Windows 8, which is useful for testing as you only need one machine set up with the beta of Windows 8 server.
This article presents the fifth in my series of example servers for comparing the performance of the Windows 8 Registered I/O Networking extensions, RIO, and traditional Windows networking APIs. This example server is a traditional polled UDP design that we can use to compare to the RIO polled UDP example server. I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

A traditional polled UDP server

This server is probably the simplest UDP server you could have. It's pretty much just a tight loop around a blocking call to WSARecv(). There's none of the complexity required by RIO for registering memory buffers for I/O and so we use a single buffer that we create on the stack.
   do
   {
      workValue += DoWork(g_workIterations);

      if (SOCKET_ERROR == ::WSARecv(
         s,
         &buf,
         1,
         &bytesRecvd,
         &flags,
         0,
         0))
      {
         ErrorExit("WSARecv");
      }

      if (bytesRecvd == EXPECTED_DATA_SIZE)
      {
         g_packets++;
      }
      else
      {
         done = true;
      }
   }
   while (!done);
There is some added complexity to allow us to compare performance, and this is similar to the RIO server examples. We can add an arbitrary processing overhead to each datagram by setting g_workIterations to a non zero value and we count each datagram that arrives and stop the test when a datagram of an unexpected size is received.
This article presents the fourth in my series of example servers using the Windows 8 Registered I/O Networking extensions, RIO. This example server, like the last example, uses the I/O Completion Port notification method to handle RIO completions, but where the last example used only a single thread to service the IOCP this one uses multiple thread to scale the load . I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

Using an I/O Completion Port for RIO completions

As I mentioned back in October, there are three ways to receive completion notifications from RIO; polling, event driven and via an I/O Completion Port. Using an IOCP for RIO completions allows you to easily scale your completion handling across multiple threads as we do here and this is the first of my example servers that allows for more than one thread to be used to process completions.
This article presents the third in my series of example servers using the Windows 8 Registered I/O Networking extensions, RIO. This example server uses the I/O Completion Port notification method to handle RIO completions, but only uses a single thread to service the IOCP. I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

Using an I/O Completion Port for RIO completions

As I mentioned back in October, there are three ways to receive completion notifications from RIO; polling, event driven and via an I/O Completion Port. Using an IOCP for RIO completions allows you to easily scale your completion handling across multiple threads, though in this first IOCP example server we use a single thread so as to allow us to compare the performance against the polled and event driven servers. The next example server will adapt this server for multiple threads and allow us to scale our completion processing across more CPUs.
This article presents the second in my series of example servers using the Windows 8 Registered I/O Networking extensions, RIO. This example server uses the event driven notification method to handle RIO completions. I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

Using an event for RIO completions

As I mentioned back in October, there are three ways to receive completion notifications from RIO; polling, event driven and via an I/O Completion Port. Using the event driven approach is similar to using the polling approach that I described in the previous article except that the server doesn't burn CPU in a tight polling loop.
I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

Polling RIO for completions

As I mentioned back in October, there are three ways to receive completion notifications from RIO; polling, event driven and via an I/O Completion Port. The first is the simplest though it burns CPU time even when no datagrams are being received.

At its simplest a polled RIO server obtains datagrams to process like this:
   RIORESULT results[RIO_MAX_RESULTS];

   ULONG numResults = 0;

   do
   {
      numResults = g_rio.RIODequeueCompletion(
         g_queue,
         results,
         RIO_MAX_RESULTS);

      if (0 == numResults)
      {
         YieldProcessor();
      }
      else if (RIO_CORRUPT_CQ == numResults)
      {
         ErrorExit("RIODequeueCompletion");
      }
   }
   while (0 == numResults);
You then loop over the results array and process each result in turn before looping back to dequeue more completions.

Getting to the point where you can call RIODequeueCompletion() takes a bit of setting up though...

Windows 8 Registered I/O Example UDP Servers

| 0 Comments
I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance.

RIO API demonstration

The examples are simple in that they do the bare minimum to demonstrate the APIs in question but they are configurable so that you can tune them to the hardware on which you're running them. You can run them to compare the maximum speed at which you can pull UDP datagrams off of the wire using each API and then adjust the examples so that they do a specific amount of "work" with each datagram to simulate a slightly more realistic scenario.

Simplified error handling

Error handling is limited, we display an error and exit the program, but we don't skip error checking, all API calls are checked for errors. The examples are each stand alone but can share two common header files. The first, Constants.h, contains all constants that are used to tune the examples. The second, Shared.h, contains inline helper functions which hide some of the complexity and allow the individual example programs to focus on the area of the API that they're demonstrating.

This is the index page

I will be blogging about the construction of the various examples over the next few weeks and updating this entry as an index page for all of the examples. I've listed the examples that I'll be talking about and I'll link to each blog post as they go live. Once I've presented the RIO examples I'll present the more traditional examples and finally some performance comparisons.

RIO server examples

  • RIO Polled UDP - A server which uses a single thread and a tight loop to poll for RIO completions.
  • RIO Event Driven UDP - A server which uses a single thread and event driven notifications to handle RIO completions.
  • RIO IOCP UDP - A server which uses a single thread and I/O Completion Port notifications to handle RIO completions.
  • RIO IOCP MT UDP - A server which uses a configurable number of threads and I/O Completion Port notifications to handle RIO completions.

Traditional server examples

  • Simple Polled UDP - A server which uses a single thread and a tight loop to poll WSARecv() for datagrams.
  • IOCP UDP - A server which uses a single thread and I/O Completion Port notifications to handle overlapped WSARecv() completions.
  • IOCP MT UDP - A server which uses a configurable number of threads and I/O Completion Port notifications to handle overlapped WSARecv() completions.

A simple UDP datagram traffic generator

  • Simple UDP traffic generator - A client which uses a single thread and a tight loop send datagrams using WSASendTo(), this easily saturates a 1000BASE-T, 1Gb ethernet connection.

Test scripts

  • Test scripts - These simple scripts create performance counter logs and run the test servers.

Performance Test results

  • The first tests - Where we compare the simple polled traditional server with the polled RIO server.

Join in

Comments and suggestions are more than welcome. I'm learning as I go here and I'm quite likely to have made some mistakes or come up with some erroneous conclusions, feel free to put me straight and help make these examples better.

Follow us on Twitter: @ServerFramework

About this Archive

This page is an archive of entries from March 2012 listed from newest to oldest.

February 2012 is the previous archive.

May 2012 is the next archive.

I usually write about the development of The Server Framework, a super scalable, high performance, C++, I/O Completion Port based framework for writing servers and clients on Windows platforms.

Find recent content on the main index or look in the archives to find all content.