Windows 8 Registered I/O - Single threaded RIO Event Driven UDP Example Server

This article presents the second in my series of example servers using the Windows 8 Registered I/O Networking extensions, RIO. This example server uses the event driven notification method to handle RIO completions. I've been looking at the Windows 8 Registered I/O Networking Extensions since October when they first made an appearance as part of the Windows 8 Developer Preview. Whilst exploring and understanding the new API I spent some time putting together some simple UDP servers using the various notification styles that RIO provides. I then put together some equally simple UDP servers using the "traditional" APIs so that I could compare performance. This series of blog posts describes each of the example servers in turn. You can find an index to all of the articles about the Windows 8 Registered I/O example servers here.

Using an event for RIO completions

As I mentioned back in October, there are three ways to receive completion notifications from RIO: polling, event driven and via an I/O Completion Port. Using the event driven approach is similar to using the polling approach that I described in the previous article, except that the server doesn't burn CPU in a tight polling loop.
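For reference, the notification style is chosen when the completion queue is created. The following is a rough summary sketch rather than code from the example servers; the queueSize, hEvent, hIOCP and pOverlapped names are placeholders:

   // Polling - pass no notification structure at all and call
   // RIODequeueCompletion() in a loop.
   RIO_CQ polledQueue = g_rio.RIOCreateCompletionQueue(queueSize, NULL);

   // Event driven - the style used in this article; shown in full below.
   RIO_NOTIFICATION_COMPLETION eventCompletion;
   eventCompletion.Type = RIO_EVENT_COMPLETION;
   eventCompletion.Event.EventHandle = hEvent;
   eventCompletion.Event.NotifyReset = TRUE;
   RIO_CQ eventQueue = g_rio.RIOCreateCompletionQueue(queueSize, &eventCompletion);

   // I/O Completion Port - covered in the next article in the series.
   RIO_NOTIFICATION_COMPLETION iocpCompletion;
   iocpCompletion.Type = RIO_IOCP_COMPLETION;
   iocpCompletion.Iocp.IocpHandle = hIOCP;
   iocpCompletion.Iocp.CompletionKey = 0;
   iocpCompletion.Iocp.Overlapped = pOverlapped;
   RIO_CQ iocpQueue = g_rio.RIOCreateCompletionQueue(queueSize, &iocpCompletion);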

Creating an event driven RIO completion queue

We start by initialising things in the same way that we did with the earlier example RIO servers.
int _tmain(int argc, _TCHAR* argv[])
{
   SetupTiming("RIO Event Driven UDP");

   InitialiseWinsock();

   CreateRIOSocket();

   HANDLE hEvent = WSACreateEvent();

   if (hEvent == WSA_INVALID_EVENT)
   {
      ErrorExit("WSACreateEvent");
   }

   RIO_NOTIFICATION_COMPLETION completionType;

   completionType.Type = RIO_EVENT_COMPLETION;
   completionType.Event.EventHandle = hEvent;
   completionType.Event.NotifyReset = TRUE;

   g_queue = g_rio.RIOCreateCompletionQueue(
      RIO_PENDING_RECVS,
      &completionType);

   if (g_queue == RIO_INVALID_CQ)
   {
      ErrorExit("RIOCreateCompletionQueue");
   }
Once that is done we create an event and then create a RIO completion queue which uses the event for notification. The event is signalled when there are completions to process and reset when we call RIONotify().

Creating a RIO request queue

Creating the request queue and posting our receives is identical to the polled example. The only difference is how we handle the completions.
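For completeness, this is roughly what that code looks like. Treat it as a sketch rather than a verbatim copy of the download: the g_socket global, the EXTENDED_RIO_BUF structure and the RIO_PENDING_RECVS and RECV_BUFFER_SIZE constants are assumed to come from the shared headers, and the real examples use a helper to allocate the buffer space rather than a bare VirtualAlloc().

   g_requestQueue = g_rio.RIOCreateRequestQueue(
      g_socket,            // the socket created with WSA_FLAG_REGISTERED_IO
      RIO_PENDING_RECVS,   // max outstanding receives
      1,                   // max buffers per receive (RIO requires 1)
      0,                   // max outstanding sends (this example never sends)
      1,                   // max buffers per send
      g_queue,             // receive completion queue
      g_queue,             // send completion queue
      NULL);               // socket context

   if (g_requestQueue == RIO_INVALID_RQ)
   {
      ErrorExit("RIOCreateRequestQueue");
   }

   // Register one slab of memory with RIO and carve it into
   // RIO_PENDING_RECVS slices, posting a receive into each slice.
   char *pBufferSpace = reinterpret_cast<char *>(
      VirtualAlloc(
         NULL,
         RECV_BUFFER_SIZE * RIO_PENDING_RECVS,
         MEM_COMMIT | MEM_RESERVE,
         PAGE_READWRITE));

   RIO_BUFFERID bufferId = g_rio.RIORegisterBuffer(
      pBufferSpace,
      RECV_BUFFER_SIZE * RIO_PENDING_RECVS);

   if (bufferId == RIO_INVALID_BUFFERID)
   {
      ErrorExit("RIORegisterBuffer");
   }

   for (DWORD i = 0; i < RIO_PENDING_RECVS; ++i)
   {
      EXTENDED_RIO_BUF *pBuffer = new EXTENDED_RIO_BUF;

      pBuffer->BufferId = bufferId;
      pBuffer->Offset = i * RECV_BUFFER_SIZE;
      pBuffer->Length = RECV_BUFFER_SIZE;

      if (!g_rio.RIOReceive(g_requestQueue, pBuffer, 1, 0, pBuffer))
      {
         ErrorExit("RIOReceive");
      }
   }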

Calling RIODequeueCompletion() and processing results

The processing loop is, again, similar to the polled example. Unsurprisingly, rather than polling we wait on the event and dequeue the completions once the event is set. This reduces the amount of CPU used as there's no need to spin whilst waiting for new datagrams to process. The only complication is that we need to call RIONotify() to indicate that we're ready to process more completions. Note that in a real server you would probably want to wait on both your 'completions available' event and a 'we're ready to shut down' event so that you can shut the server down cleanly; there's a sketch of that below, after the main loop.
   bool done = false;

   DWORD recvFlags = 0;

   RIORESULT results[RIO_MAX_RESULTS];

   const INT notifyResult = g_rio.RIONotify(g_queue);

   if (notifyResult != ERROR_SUCCESS)
   {
      ErrorExit("RIONotify");
   }

   const DWORD waitResult = WaitForSingleObject(
      hEvent,
      INFINITE);

   if (waitResult != WAIT_OBJECT_0)
   {
      ErrorExit("WaitForSingleObject");
   }

   ULONG numResults = g_rio.RIODequeueCompletion(
      g_queue,
      results,
      RIO_MAX_RESULTS);

   if (0 == numResults ||
       RIO_CORRUPT_CQ == numResults)
   {
      ErrorExit("RIODequeueCompletion");
   }

   StartTiming();

   int workValue = 0;

   bool running = true;

   do
   {
      for (DWORD i = 0; i < numResults; ++i)
      {
         EXTENDED_RIO_BUF *pBuffer = reinterpret_cast<EXTENDED_RIO_BUF *>(results[i].RequestContext);

         if (results[i].BytesTransferred == EXPECTED_DATA_SIZE)
         {
            g_packets++;

            workValue += DoWork(g_workIterations);

            if (!g_rio.RIOReceive(
               g_requestQueue,
               pBuffer,
               1,
               recvFlags,
               pBuffer))
            {
               ErrorExit("RIOReceive");
            }

            done = false;
         }
         else
         {
            done = true;
         }
      }

      if (!done)
      {
         const INT notifyResult = g_rio.RIONotify(g_queue);

         if (notifyResult != ERROR_SUCCESS)
         {
            ErrorExit("RIONotify");
         }

         const DWORD waitResult = WaitForSingleObject(
            hEvent,
            INFINITE);

         if (waitResult != WAIT_OBJECT_0)
         {
            ErrorExit("WaitForSingleObject");
         }

         numResults = g_rio.RIODequeueCompletion(
            g_queue,
            results,
            RIO_MAX_RESULTS);

         if (0 == numResults ||
             RIO_CORRUPT_CQ == numResults)
         {
            ErrorExit("RIODequeueCompletion");
         }
      }
   }
   while (!done);

   StopTiming();

   PrintTimings();

   return workValue;
}
As before, the structure of the processing loop is complicated somewhat by the fact that we want to start and stop the timing for the performance testing, and the DoWork() function can be used to add 'processing overhead' to each datagram. This can be configured using the g_workIterations constant, which is defined in Constants.h. With this set to 0 there is no overhead and we can compare how quickly each API can receive datagrams. Setting larger values will affect how the various multi-threaded examples perform and can be useful if you're unable to saturate the test machine's network interfaces.
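As mentioned above, a real server would want a clean way to break out of this loop. A minimal sketch of that, assuming a separate, manually reset, shutdown event (hShutdownEvent is just a placeholder name):

   HANDLE handles[2] = { hEvent, hShutdownEvent };

   const DWORD waitResult = WaitForMultipleObjects(
      2,
      handles,
      FALSE,         // wake when either handle is signalled
      INFINITE);

   if (waitResult == WAIT_OBJECT_0)
   {
      // completions are available; dequeue and process them as above
   }
   else if (waitResult == WAIT_OBJECT_0 + 1)
   {
      // shutdown requested; leave the loop and clean up
   }
   else
   {
      ErrorExit("WaitForMultipleObjects");
   }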

This example can be optimised slightly so that we revert to straight polling for as long as RIODequeueCompletion() returns at least one result. We'll look at this variation after we've studied the performance of the example shown here.
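A rough outline of that variation (my sketch of the idea, not the code from the download) looks like this:

   for (;;)
   {
      ULONG numResults = g_rio.RIODequeueCompletion(
         g_queue,
         results,
         RIO_MAX_RESULTS);

      if (RIO_CORRUPT_CQ == numResults)
      {
         ErrorExit("RIODequeueCompletion");
      }

      if (numResults != 0)
      {
         // process the results and repost the receives, then go straight
         // back to RIODequeueCompletion() without waiting on the event
         continue;
      }

      // the queue is empty; request a notification and wait on the event
      // (exit condition omitted for brevity)
      if (ERROR_SUCCESS != g_rio.RIONotify(g_queue))
      {
         ErrorExit("RIONotify");
      }

      if (WAIT_OBJECT_0 != WaitForSingleObject(hEvent, INFINITE))
      {
         ErrorExit("WaitForSingleObject");
      }
   }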

The code for this example can be downloaded from here. This code requires Visual Studio 11, but would work with earlier compilers if you have a Windows SDK that supports RIO. Note that Shared.h and Constants.h contain helper functions and tuning constants for ALL of the examples and so there will be code in there that is not used by this example. You should be able to unzip each example into the same directory structure so that they all share the same headers. This allows you to tune all of the examples in the same way so that any performance comparisons make sense.

Join in

Comments and suggestions are more than welcome. I'm learning as I go here and I'm quite likely to have made some mistakes or come up with some erroneous conclusions, so feel free to put me straight and help make these examples better.

9 Comments

Great post! Thanks for sharing the code.
I have some questions:

playing around with your code, if I set 'RIO_PENDING_RECVS' to 1 and I have a program that sends 10,000 UDP datagrams immediately, I will receive only about 50 datagrams instead of 10,000. Why is that? With IOCPs, even if you have only 1 pending OVERLAPPED reading a UDP socket, you'll receive almost *all* (if not all, often) of the 10,000 packets!
And you only have to allocate 1 OVERLAPPED and the buffer for reading!

Instead, with RIO, it seems you must have a lot of pending receives to grab all of those UDP datagrams. Well, having only 1 pending receive and grabbing only 50 UDP packets of the 10,000 sent is a very poor result. Can't I improve that?

I have also tried with TCP connections, and of course I receive *all* of the 10,000 TCP packets sent, even with only 1 pending receive. Of course this is because of the different nature of TCP and UDP, but losing 9,950 UDP packets by having only 1 pending receive seems very bad.

I changed 'RIO_PENDING_RECVS' to 256 and received almost 300 datagrams; with 4096 I received almost 4,000 of the 10,000; with 4096*2 almost 8,000. Finally, I can receive *all* 10,000 datagrams by setting 'RIO_PENDING_RECVS' to 4096*4, which needs quite a big memory allocation for "only" 10,000 datagrams.

I cannot understand what kind of relation there is between those numbers; why does this happen? I guess this is because the consumer (the RIO server) can't consume data (UDP datagrams) as fast as the producer (my application, which writes 10,000 datagrams).
So, if you have a small request queue, the server just can't cope with the large number of incoming packets.

But, again, why would this happen, and why don't I have any of these troubles with IOCPs?

This is also bad because if I have a UDP server and I don't have any clue how many UDP datagrams I'll receive, how can I set those values? I risk allocating much more memory than necessary.

Firstly; Try running your "normal UDP" test after turning off recv buffering in the network stack...

RIO is fast, in part, because you don't have any additional buffer copies going on as the data rises up the networking stack. The inbound datagrams go straight into your buffer space (which is why it needs to be preallocated and registered so that it's "locked" in place and can be accessed directly by the kernel).

Secondly; how realistic is it to actually have 10,000 UDP datagrams arriving 'immediately'. If it's very realistic and that level of load will be maintained then you need to tune your server appropriately. Can your "normal" IOCP server handle a sustained load like that? Does it do any real work?

You can ignore any comparison with TCP. TCP is designed to deliver ALL of the data stream, in order. The peers have flow control between them in case the receiver's recv buffer fills up (and see above re the network stack's recv buffer).

To recv all datagrams with RIO UDP you need enough pending recvs to deal with the desired "burst load" of datagrams. You then need to issue new recvs FAST to make up for the ones that have completed and that you're processing. Ideally you do this by processing the datagram and reusing the buffer; if you can't do that fast enough, issue more recvs. If you're working on the kind of system where the performance gain given by RIO is required then I don't see that the cost of memory would be an issue...

Your tests where you increase the number of pending recvs are only relevant for a given workload. That is they will depend on how long it takes for you to reuse the buffer that has just completed. If you're reusing the buffers slowly then you either miss datagrams or need to issue more recvs to start with.

You could probably tune the system to handle peak loads by issuing more recvs at the time the peak load is identified BUT IMHO you should just decide that the max load that you want to handle is X and make sure you have X memory available and registered as RIO buffers (with pending recvs) from the start.

Anyway, I work through all of this in the examples:

a) look at the IOCP based RIO examples as they perform best.

b) look at the tunable workload.

c) look at your requirements for burst load datagram handling and spec a decent amount of memory and a fast enough CPU for the machine.

IMHO RIO designs are for special purpose, high performance, systems and generally I've found that people who want these kind of systems have appropriately deep pockets.

Yeah, I agree with you.
Usually you use RIO when your system can cope with the memory it requires, etc., and when the application needs that kind of performance gain. Otherwise you could use normal IOCPs in most cases.

What do you mean by "turning off recv buffering in the network stack"? Turning off the (UDP?) socket's buffering at a system level?
Is this why in your code in 'Shared.h' you have the functions 'SetSocketSendBufferToMaximum' and 'SetSocketRecvBufferToMaximum'? But as their names imply, they set the socket buffers to maximum instead of turning them off. So why do you have those functions in the code?

Yeah well, I just tried TCP to test whether that was some kind of problem with RIO itself.

So, basically, just to sum up our discussion: I need a lot of pending recvs in order to exploit the real RIO performance gains. That's because, unlike with IOCPs, with the RIO facility the system doesn't do much work itself; it just copies data straight into the buffers I gave it and notifies me. If I give the system only 1 buffer, it can't cope with all of those packets arriving so fast.
Right?

"turning off recv buffering", set SO_RCVBUF to 0 which disables the network stack's recv buffer... Yes, I do the opposite in the IOCP UDP example and set the network stack's buffer to maximum to enable it to 'help out'.

All the IOCP is used for with the RIO API is to notify you that one or more pending recvs have completed. You provide the buffer space for datagrams to be received and the RIO API can't do anything except throw away any datagrams that arrive when you don't have any recvs pending.

And... The best way to get performance from RIO is to avoid as many user mode to kernel mode transitions as possible. So use GetQueuedCompletionStatusEx() to retrieve LOTS of completions in one go, for one transition.

And also, having ONE pending recv for a UDP server can only lead to reduced performance with or without the RIO API.

I see. So your point is that even with the old IOCPs, one should allocate N OVERLAPPEDs and set them pending with the WSARecvFrom() API, instead of having just 1 of them pending, right? At least for UDP sockets.
Is that true even if one uses only 1 thread for the IOCP?

Of course this won't apply to TCP sockets or pipes, because they're a stream, so it doesn't make sense to have multiple OVERLAPPEDs pending for read data, and 1 is enough.
What about multiple pending OVERLAPPEDs for writing, instead? I guess that could provide some performance gain even with TCP sockets.

For a UDP system with a single 'well known port' and multiple clients then yes, you need multiple pending receives. In fact, any IOCP based UDP system would benefit from multiple pending recvs.

With TCP you may get better performance with multiple pending recvs on a single connection but you need to make sure you sequence the recvs correctly before processing them to ensure you maintain the stream's ordering.

There's never a reason to restrict the number of writes on either UDP or TCP, though it is wise to ensure you manage the number of outstanding writes pending and add flow control if necessary. A TCP connection is most efficient if you keep the recv window full and writes pending on the sender... Just not too many writes.

In summary, I have never seen any advantage in artificially restricting the number of reads or writes that you can have pending on ANY IOCP system.

I see. What about single-threaded IOCP queues?
If I have only 1 thread processing IOCP completions (which is basically what I have with RIO), would multiple pending requests still be a good thing?

My first thought is that if you use 'GetQueuedCompletionStatus' you don't get such benefits, because you can complete only 1 OVERLAPPED at a time.

But with 'GetQueuedCompletionStatusEx' you will see benefits, even with single threaded IOCP queues, because you in fact retrieve multiple OVERLAPPED packets.

What do you think about this?

The number of threads servicing the IOCP is not relevant, IMHO.

The advantage of using GetQueuedCompletionStatusEx() is that you do one transition from user mode to kernel mode and bring back a batch of completions, rather than one transition PER completion. This may help performance as it reduces the transitions.
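A minimal sketch of that batched retrieval (hIOCP and the batch size are placeholders):

   OVERLAPPED_ENTRY entries[64];
   ULONG numEntries = 0;

   if (GetQueuedCompletionStatusEx(
      hIOCP,         // the I/O completion port
      entries,
      64,            // maximum completions to bring back in one transition
      &numEntries,
      INFINITE,      // wait until at least one completion is available
      FALSE))        // not an alertable wait
   {
      for (ULONG i = 0; i < numEntries; ++i)
      {
         // entries[i].lpOverlapped and entries[i].dwNumberOfBytesTransferred
         // identify and describe each completed operation
      }
   }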

If you have a single UDP socket on a 'well known port' then you should have multiple recvs pending to a) take advantage of the potential GQCSEx() improvement and b) to help ensure you receive all datagrams in times of high burst load.

