Windows 8 Registered I/O - Traditional Polled UDP Example Server

This article presents the fifth in my series of example servers for comparing the performance of the Windows 8 Registered I/O Networking Extensions (RIO) with traditional Windows networking APIs. This example server is a traditional polled UDP design that we can compare with the RIO polled UDP example server.

I’ve been looking at the Windows 8 Registered I/O Networking Extensions since October, when they first appeared as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the “traditional” APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

A traditional polled UDP server

This server is probably the simplest UDP server you could have. It’s pretty much just a tight loop around a blocking call to WSARecv(). There’s none of the complexity required by RIO for registering memory buffers for I/O and so we use a single buffer that we create on the stack.

   do
   {
      workValue += DoWork(g_workIterations);

      if (SOCKET_ERROR == ::WSARecv(
         s,
         &buf,
         1,
         &bytesRecvd,
         &flags,
         0,
         0))
      {
         ErrorExit("WSARecv");
      }

      if (bytesRecvd == EXPECTED_DATA_SIZE)
      {
         g_packets++;
      }
      else
      {
         done = true;
      }
   }
   while (!done);

There is some added complexity to allow us to compare performance, and this is similar to the RIO server examples. We can add an arbitrary processing overhead to each datagram by setting g_workIterations to a non-zero value. We count each datagram that arrives and stop the test when a datagram of an unexpected size is received.
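To make the counting and shutdown rules concrete without needing a socket, here is a minimal sketch that replays a list of datagram sizes through the same logic. DoWorkSketch(), LoopResult and RunCountingLoop() are hypothetical names invented for this illustration; the real DoWork() lives in Shared.h and may well do something different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DoWork(): burns CPU in proportion to
// 'iterations' and returns a value so the caller can keep the result live.
inline int DoWorkSketch(const int iterations)
{
   int result = 0;

   for (int i = 0; i < iterations; ++i)
   {
      result = (result + i) % 1000;
   }

   return result;
}

struct LoopResult
{
   std::size_t packets;   // datagrams of the expected size that were counted
   int workValue;         // accumulated result of the per-datagram work
};

// Replays 'datagramSizes' as if each entry were the byte count returned by
// one receive call. Work is done before each "receive", expected-size
// datagrams are counted, and the first unexpected size ends the run.
inline LoopResult RunCountingLoop(
   const std::vector<std::size_t> &datagramSizes,
   const std::size_t expectedSize,
   const int workIterations)
{
   assert(expectedSize > 0);

   LoopResult result = { 0, 0 };

   for (const std::size_t size : datagramSizes)
   {
      result.workValue += DoWorkSketch(workIterations);

      if (size != expectedSize)
      {
         break;   // unexpected size: the sender is telling us to stop
      }

      ++result.packets;
   }

   return result;
}
```

Calling RunCountingLoop({1024, 1024, 1024, 1}, 1024, 0) counts three datagrams and then stops on the one byte datagram. Note that the work is still done once for the datagram that ends the test, just as it is in the real loop.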

Setting up for the datagram processing loop

As with the RIO examples we do some setup before we can process datagrams. See the polled RIO example server for details of how and why we set up the timing system and initialise Winsock, and for details on our error handling policy.

int _tmain(int argc, _TCHAR* argv[])
{
   SetupTiming("Simple polled UDP");

   InitialiseWinsock();

   SOCKET s = CreateSocket();

   Bind(s, PORT);

   SetSocketRecvBufferToMaximum(s);

   bool done = false;

   CHAR buffer[RECV_BUFFER_SIZE];

   WSABUF buf;

   buf.buf = buffer;
   buf.len = RECV_BUFFER_SIZE;

   DWORD bytesRecvd = 0;

   DWORD flags = 0;

   if (SOCKET_ERROR == ::WSARecv(
      s,
      &buf,
      1,
      &bytesRecvd,
      &flags,
      0,
      0))
   {
      ErrorExit("WSARecv");
   }

   g_packets++;

   StartTiming();

   int workValue = 0;

After setting up the timing system and initialising Winsock, we create a traditional blocking UDP socket, bind it to a port, set its receive buffer size to the maximum, create our receive buffer on the stack, set up our WSABUF and call WSARecv() for the first time. We make this call outside of our processing loop so that we can start timing when we get the first datagram. The code then proceeds into the processing loop, shown above, and processes datagrams until a datagram of an unexpected size is received, which signals that the test is complete.

The code for CreateSocket(), Bind() and SetSocketRecvBufferToMaximum() can be found in Shared.h. Remember that the use of globals isn’t clever; it’s simply convenient for some of the other example servers that use the shared code.
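The implementation of SetSocketRecvBufferToMaximum() isn't shown in this article, so the following is only a sketch of one plausible approach: start with a very large size and halve it until the system accepts it. FindLargestAcceptedSize() and the trySet callback are hypothetical names for this illustration. In a real server the callback would wrap a call to setsockopt() with SOL_SOCKET and SO_RCVBUF and return true when that call succeeds.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Returns the largest size that 'trySet' accepts, starting at 'start' and
// halving after each rejection. Returns 0 if every candidate is rejected.
// 'trySet' is a hypothetical callback; it should attempt to apply the size
// and report whether the attempt succeeded.
inline int FindLargestAcceptedSize(
   const int start,
   const std::function<bool(int)> &trySet)
{
   assert(start > 0);

   for (int size = start; size > 0; size /= 2)
   {
      if (trySet(size))
      {
         return size;
      }
   }

   return 0;
}
```

For example, with a pretend limit of 8MB, calling FindLargestAcceptedSize(std::numeric_limits<int>::max(), [](int size) { return size <= 8 * 1024 * 1024; }) settles on a value just under that limit after a handful of attempts. Passing the setter in as a callback keeps the search logic separate from the socket call, which makes it easy to exercise without a network stack.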

After the processing loop

Once the performance test completes we stop our timing and report the results.

   StopTiming();

   PrintTimings();

   return workValue;
}
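The timing helpers themselves live in the shared headers and aren't reproduced here. As a rough, portable illustration of the idea, the sketch below uses std::chrono to measure the interval between the first datagram and the end of the test and turns a packet count into a rate. ThroughputTimer and DatagramsPerSecond() are hypothetical names, and the real StartTiming(), StopTiming() and PrintTimings() may report different figures.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical timer: Start() is called when the first datagram arrives and
// Stop() when the datagram that ends the test is received.
class ThroughputTimer
{
public:
   void Start() { m_start = Clock::now(); }

   void Stop() { m_stop = Clock::now(); }

   // Seconds between the most recent Start() and Stop() calls.
   double ElapsedSeconds() const
   {
      return std::chrono::duration<double>(m_stop - m_start).count();
   }

private:
   // steady_clock never goes backwards, so it is suitable for intervals.
   typedef std::chrono::steady_clock Clock;

   Clock::time_point m_start;
   Clock::time_point m_stop;
};

// Converts a packet count and an elapsed time into datagrams per second.
inline double DatagramsPerSecond(
   const std::size_t packets,
   const double elapsedSeconds)
{
   assert(elapsedSeconds > 0.0);

   return static_cast<double>(packets) / elapsedSeconds;
}
```

Starting the timer only once the first datagram has arrived means that the time spent waiting for the sender to start isn't counted against the server.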

The code for this example can be downloaded from here. This code requires Visual Studio 11, but should work with earlier compilers if you have a Windows SDK that supports RIO. Note that Shared.h and Constants.h contain helper functions and tuning constants for ALL of the examples, so there will be code in there that is not used by this example. You should be able to unzip each example into the same directory structure so that they all share the same headers. This allows you to tune all of the examples in the same way so that any performance comparisons make sense.

Join in

Comments and suggestions are more than welcome. I’m learning as I go here and I’m quite likely to have made some mistakes or come to some erroneous conclusions. Feel free to put me straight and help make these examples better.

Code is here

Code - updated 15th April 2023

Full source can be found here on GitHub.

This isn’t production code; error handling is simply “panic and run away”.

This code is licensed with the MIT license.