The WebSocket protocol

2010-12-09

I’ve spent the last few days implementing the WebSocket protocol (well, two versions of the draft standard, actually) and integrating it into an existing server for one of our clients. This has proved to be an interesting exercise. The protocol itself is pretty simple but, as ever, the devil is in the detail. I now have server side code that deals with both the Hixie 76 draft and the HyBi 03 draft of the protocol.

Once the initial handshake (which is pretty much just HTTP) is out of the way, both drafts deal in terms of frames of data rather than a simple byte stream. The library that I’ve developed accumulates these frames in an I/O buffer until they’re complete and then dispatches them to the layer of code above by using a callback interface. Thus your server simply sits and waits for frames to arrive and then sends out frames of its own.
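To make the accumulate-then-dispatch idea concrete, here’s a minimal sketch for Hixie 76 style text frames, which are a 0x00 byte, the UTF-8 payload and then a 0xFF byte. It is not the library’s code: the class name `Hixie76FrameAccumulator`, its `OnData()` method and the `std::function` callback are all assumptions made for illustration, and it copies bytes into a `std::string` for simplicity, ignoring the closing handshake.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical accumulator for Hixie 76 style text frames
// (0x00, UTF-8 payload, 0xFF). Names are illustrative only.
class Hixie76FrameAccumulator
{
   public :

      using FrameCallback = std::function<void(const std::string &payload)>;

      explicit Hixie76FrameAccumulator(FrameCallback onFrame)
         :  m_onFrame(std::move(onFrame)),
            m_inFrame(false)
      {
      }

      // Feed bytes exactly as they arrive from the socket. A single read
      // may contain part of a frame, a whole frame or several frames.
      // Returns false if the data does not start a text frame.
      bool OnData(const unsigned char *pData, const std::size_t length)
      {
         for (std::size_t i = 0; i < length; ++i)
         {
            const unsigned char byte = pData[i];

            if (!m_inFrame)
            {
               if (byte != 0x00)
               {
                  return false;        // only text frames are handled here
               }

               m_inFrame = true;
               m_payload.clear();
            }
            else if (byte == 0xFF)
            {
               m_inFrame = false;
               m_onFrame(m_payload);   // frame complete, dispatch it
            }
            else
            {
               m_payload.push_back(static_cast<char>(byte));
            }
         }

         return true;
      }

   private :

      FrameCallback m_onFrame;
      std::string m_payload;
      bool m_inFrame;
};
```

The caller simply passes every completed read to `OnData()` and the callback fires once per complete frame, however the bytes happened to be split across reads.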

The callback interface for the HyBi 03 draft looks something like this:

class IWebSocketServer
{
   public :

      virtual void OnConnectionEstablished(
         IWebSocket &socket,
         const std::string &uri,
         const bool secure,
         const CHeaders &requestHeaders,
         CHeaders &responseHeaders) = 0;

      virtual void OnRawTextFrame(
         IWebSocket &socket,
         JetByteTools::IO::IBuffer &buffer,
         const FrameStatus status) = 0;

      virtual void OnTextFrame(
         IWebSocket &socket,
         const std::wstring &text,
         const FrameStatus status) = 0;

      virtual void OnBinaryFrame(
         IWebSocket &socket,
         JetByteTools::IO::IBuffer &buffer,
         const FrameStatus status) = 0;

      virtual void OnPingResponse(
         IWebSocket &socket,
         const BYTE *pData,
         const BYTE length) = 0;

      virtual void OnClientClose(
         IWebSocket &socket,
         const BYTE *pData,
         const BYTE length) = 0;

   protected :

      ~IWebSocketServer() {}
};

The IWebSocket interface wraps the underlying IStreamSocket and provides the ability to write binary and text frames. It deals with the framing internally, so you don’t have to worry about it.

You can choose to have text frames delivered as raw UTF-8 encoded bytes or as a pre-decoded wide string. Fragmented frames can be automatically coalesced and presented as a single frame if required, and the various control frames are handled for you where possible.
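Coalescing fragments can be pictured with the small sketch below. It is only an illustration: `FragmentCoalescer`, `OnFragment()` and the `isFinal` flag are hypothetical names, not part of the library, and a real implementation would work on I/O buffers rather than copying into a `std::string`.

```cpp
#include <string>

// Hypothetical helper that joins the payloads of a fragmented message.
class FragmentCoalescer
{
   public :

      // Add one fragment. Returns true when this fragment completes a
      // message, in which case the whole message is placed in 'message'.
      bool OnFragment(
         const std::string &payload,
         const bool isFinal,
         std::string &message)
      {
         m_pending += payload;

         if (!isFinal)
         {
            return false;        // more fragments to come
         }

         message.swap(m_pending);
         m_pending.clear();      // ready for the next message

         return true;
      }

   private :

      std::string m_pending;
};
```

Code that wants each fragment as it arrives would skip this step and act on the frame status passed to the callbacks instead.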

The code is likely to be made available as an option pack with version 6.4. I have an example server running the Hixie 76 draft with both ws: and wss: endpoints and will be putting together a HyBi 03 example over the next few days. The library will, of course, track the ongoing standardisation work.

The main complexity in my implementation comes from the fact that it’s nice and user friendly for users of The Server Framework if the data frames are presented without their framing, and in an I/O buffer rather than as a raw byte pointer and a length. Providing the frames in an I/O buffer means that code built on top of this code can take advantage of the reference counted nature of the I/O buffer and can pass the data frames around for further processing without needing to allocate, copy and free memory. The fact that I’ve been working on performance critical code for another client recently has me in a performance mindset, and so dealing with the frame accumulation and header/trailer removal without unnecessary data copying or allocations was at the front of my mind.

Unfortunately the I/O buffer implementation that has served the framework well for over 10 years could benefit from a redesign to make it more flexible. Up until now it has been acceptable to me to occasionally have to copy memory blocks around when accumulating protocol frames. For example, if we have a frame which contains a 4 byte protocol header and 200 bytes of data, it’s sometimes nice to strip the header before presenting the actual data to the user. This can be achieved either by moving the 200 bytes back by 4 bytes to the physical front of the buffer or by moving the logical front of the buffer forward by 4 bytes… For my work with WebSockets I’ve opted for the latter, and adjusted the buffer interface to suit. This avoids a memory copy in favour of an offset update.

Unfortunately, however, the opposite operation, adding a 4 byte header to a buffer that already contains 200 bytes of data, is less easy to optimise with my current buffer design. We either have to accept a memory copy or allocate a whole new buffer for the header and then send that before the data buffer; neither is ideal. What would, I think, be better would be to take advantage of the underlying network API’s WSABUF design and build a logical buffer chain within a single physical buffer. I already have buffer implementations that build physical buffer chains from multiple discrete buffers, but what I’d like to be able to do is take a buffer which has space at the end of it and use that space for a header, by building a logical chain which has the header area specified before the currently used data portion…

I expect these designs, which are currently just a pile of scribbles, won’t see the light of day before version 7.0, partly because they will be quite serious breaking changes and partly because the existing buffer design is deeply embedded into the entire framework…
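The two buffer ideas above, stripping a header by moving the logical front and prepending a header by chaining segments within one physical buffer, might look something like the sketch below. Everything here is an assumption for illustration: `SketchBuffer` and its methods are hypothetical and are not the framework’s buffer interface, and `Segment` is a portable stand-in that mirrors the length-plus-pointer shape of Winsock’s WSABUF so that the example compiles anywhere.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Portable stand-in for WSABUF: a length and a pointer.
struct Segment
{
   std::size_t len;
   const char *buf;
};

// Hypothetical buffer, for illustration only.
class SketchBuffer
{
   public :

      explicit SketchBuffer(const std::size_t capacity)
         :  m_storage(capacity), m_front(0), m_used(0), m_headerLength(0)
      {
      }

      // Append data after whatever the buffer already holds.
      void AddData(const void *pData, const std::size_t length)
      {
         if (m_headerLength != 0) throw std::logic_error("header already added");
         if (m_front + m_used + length > m_storage.size()) throw std::length_error("buffer full");

         std::memcpy(m_storage.data() + m_front + m_used, pData, length);
         m_used += length;
      }

      // Strip a header by moving the logical front forward: an offset
      // update rather than a memory move.
      void ConsumeFromFront(const std::size_t length)
      {
         if (length > m_used) throw std::out_of_range("not enough data");

         m_front += length;
         m_used -= length;
      }

      // Write a header into the spare space after the data. It is
      // physically last but logically first, see GetSendChain().
      void AddLogicalHeader(const void *pHeader, const std::size_t length)
      {
         if (m_headerLength != 0) throw std::logic_error("header already added");
         if (m_front + m_used + length > m_storage.size()) throw std::length_error("buffer full");

         std::memcpy(m_storage.data() + m_front + m_used, pHeader, length);
         m_headerLength = length;
      }

      const char *Data() const { return m_storage.data() + m_front; }

      std::size_t Used() const { return m_used; }

      // Segments in the order they should be sent: header, then data.
      std::vector<Segment> GetSendChain() const
      {
         std::vector<Segment> chain;

         if (m_headerLength != 0)
         {
            chain.push_back(Segment{m_headerLength, Data() + m_used});
         }

         chain.push_back(Segment{m_used, Data()});

         return chain;
      }

   private :

      std::vector<char> m_storage;
      std::size_t m_front;
      std::size_t m_used;
      std::size_t m_headerLength;
};
```

On Windows the resulting chain would map onto the array of WSABUF structures that is passed to a single WSASend() call, so the header goes out before the data without either of them being moved in memory.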