UDP flow control and asynchronous writes

I don’t believe that UDP should require any flow control in the sending application. After all, it’s unreliable and it should be quite OK for any stage of the route from one peer to another to decide to drop a datagram for any reason. However, it seems that, on Windows at least, no datagrams will be dropped between the application and the network interface card (NIC) driver, no matter how heavily you load the system.

Unfortunately most NIC drivers also prefer not to drop datagrams, even when they’re overloaded (see here for details of how UDP checksum offloading can considerably increase a NIC driver’s non-paged pool usage). This can lead to situations where a user mode application can bring a box down through non-paged pool exhaustion simply by sending as many datagrams as it can, as fast as it can. It’s likely that poorly implemented device drivers are actually at fault here, as they fail to handle gracefully the situations where non-paged pool allocations fail, but it’s the application that puts these drivers into a situation where they can fail in such a catastrophic manner.

Since the NIC driver and the operating system will not drop datagrams, it’s down to the application itself to do so if it senses that it’s overloading the NIC. I’ve recently added code to The Server Framework that lets you configure this behaviour so that an application can prevent itself from exhausting non-paged pool due to pending datagram writes.

Some background

It may be useful for you to familiarise yourself with the reasons that we need flow control for TCP connections and the consequences of not having any when you’re using an asynchronous API.

With UDP there is no congestion control or windowing built into the protocol, so the networking stack has less to do: it simply sends datagrams to the NIC driver, which puts them out onto the network as quickly as it can. With an asynchronous API, such as the Windows I/O Completion Port API, an asynchronous datagram send operation is not complete until the NIC driver has finished with the datagram, at which point a completion is queued to the IOCP and the application can reuse or release the memory buffer holding the datagram’s data. During the time between issuing the datagram send operation and receiving the completion, both application allocated memory and operating system allocated non-paged pool are required.
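To make the lifetime of those resources concrete, here’s a minimal sketch of a single overlapped UDP send using an I/O completion port. It’s illustrative rather than production code (error handling is trimmed and the destination address is just an example); the point is that the buffer passed to WSASendTo() must stay valid, and non-paged pool stays in use, until the completion is dequeued.

```cpp
// Minimal sketch: one overlapped UDP send via an IOCP (Winsock 2).
#include <winsock2.h>
#include <ws2tcpip.h>
#include <cstdio>

#pragma comment(lib, "ws2_32.lib")

int main()
{
   WSADATA wsaData;
   WSAStartup(MAKEWORD(2, 2), &wsaData);

   SOCKET s = WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP, nullptr, 0, WSA_FLAG_OVERLAPPED);

   // Create an IOCP and associate the socket with it.
   HANDLE iocp = CreateIoCompletionPort(reinterpret_cast<HANDLE>(s), nullptr, 0, 0);

   sockaddr_in dest {};
   dest.sin_family = AF_INET;
   dest.sin_port = htons(5050);                       // example port
   InetPtonA(AF_INET, "192.0.2.1", &dest.sin_addr);   // example address

   char data[] = "datagram payload";
   WSABUF buf { static_cast<ULONG>(sizeof(data)), data };
   OVERLAPPED ov {};
   DWORD bytesSent = 0;

   // The buffer (and the OS resources backing the send) are in use
   // from here...
   int rc = WSASendTo(s, &buf, 1, &bytesSent, 0,
                      reinterpret_cast<sockaddr *>(&dest), sizeof(dest), &ov, nullptr);

   if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
   {
      printf("send failed: %d\n", WSAGetLastError());
      return 1;
   }

   DWORD bytes = 0;
   ULONG_PTR key = 0;
   OVERLAPPED *pOv = nullptr;

   // ...until the completion is dequeued here. Only now can the
   // application safely reuse or release the buffer.
   GetQueuedCompletionStatus(iocp, &bytes, &key, &pOv, INFINITE);

   printf("send completed: %lu bytes\n", bytes);

   closesocket(s);
   WSACleanup();
   return 0;
}
```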

If an application is sending datagrams faster than the NIC driver can put them onto the network then these completions will take longer to arrive, and the length of time for which both the memory and the non-paged pool are held will increase.

Implementation details

As with the TCP flow control system, a UDP system can be driven by a count of the number of outstanding datagram send operations: increment a counter when an operation is initiated and decrement it only when the operation completes or fails. The UDP flow control system that I’ve implemented is considerably simpler than the TCP system as we don’t need to queue datagrams and send them when we can. In fact, such queuing could cause more problems due to a “buffer bloat” effect, increasing latency and jitter. If we assume that whoever is using UDP is aware that they may have some datagrams discarded then we can simply discard datagrams when our counter gets too high.
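A minimal sketch of that counting approach might look like the following (the class and method names are illustrative, not The Server Framework’s actual API). The counter is incremented before a send is issued and decremented from the completion handler, so its value is always the number of sends currently in flight:

```cpp
#include <atomic>

class UdpSendGate
{
public:
   explicit UdpSendGate(long limit) : m_limit(limit), m_pending(0) {}

   // Called before issuing an asynchronous send. Returns false if the
   // datagram should be discarded rather than sent.
   bool BeginSend()
   {
      if (m_pending.fetch_add(1) >= m_limit)
      {
         m_pending.fetch_sub(1);

         return false;                 // too many sends in flight; discard
      }

      return true;
   }

   // Called from the completion handler, and also if the send call
   // itself fails immediately.
   void EndSend()
   {
      m_pending.fetch_sub(1);
   }

private:
   const long m_limit;

   std::atomic<long> m_pending;
};
```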

Rather than providing a single limit for the number of outstanding send operations I’ve implemented the system with an upper and a lower limit. When the number of outstanding operations reaches the upper limit we discard 100% of the datagrams that the application tries to send. When the number of operations is below the lower limit we discard no datagrams. When the count is between the two limits we discard in proportion to its position in the range: half way between the lower and upper limits we discard 50% of datagrams, and three-quarters of the way to the upper limit we discard 75%. You can, of course, set the lower limit equal to the upper limit, in which case there is no gentle scaling of the discard rate; we go straight from discarding nothing to discarding everything.
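The proportional policy is easy to express as a discard probability. This is a sketch of the idea rather than the framework’s actual implementation; note that the equal-limits case falls out naturally because the upper-limit check runs first:

```cpp
#include <random>

// Returns true if this datagram should be discarded, given the number
// of sends currently in flight and the configured limits.
bool ShouldDiscard(long pending, long lowerLimit, long upperLimit)
{
   if (pending < lowerLimit)
   {
      return false;                       // discard 0%
   }

   if (pending >= upperLimit)
   {
      return true;                        // discard 100%
   }

   // Scale linearly with the counter's position in the range, so half
   // way between the limits gives a probability of 0.5.
   const double probability =
      static_cast<double>(pending - lowerLimit) /
      static_cast<double>(upperLimit - lowerLimit);

   static thread_local std::mt19937 rng{ std::random_device{}() };

   std::uniform_real_distribution<double> dist(0.0, 1.0);

   return dist(rng) < probability;
}
```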

The UDP flow control filter can be configured per socket, though this is rarely useful if you are creating multiple UDP sockets. More usefully, the counter can be shared between all of the sockets on a particular connection manager, which helps if you have multiple NICs, or between all connections.
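In terms of the sketch above, the scope of the limit is simply a matter of which sockets share a gate object: one instance per socket limits each socket independently, one per connection manager limits that manager’s sockets collectively, and a single shared instance gives a process-wide limit. The names here are hypothetical, not the framework’s API:

```cpp
#include <memory>

// (Assumes the UdpSendGate class from the earlier sketch.)

// A hypothetical socket wrapper; each socket holds a reference to the
// gate it consults before sending.
struct UdpSocket
{
   explicit UdpSocket(std::shared_ptr<UdpSendGate> gate) : m_gate(std::move(gate)) {}

   std::shared_ptr<UdpSendGate> m_gate;
};

auto sharedGate = std::make_shared<UdpSendGate>(10000);

UdpSocket socketA(sharedGate);   // both sockets share one counter, so
UdpSocket socketB(sharedGate);   // the limit applies across them both
```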

This functionality will be available in Release 6.6.2 of The Server Framework which will be released shortly.