March 2013 Archives

I've been working on a "big" new release for some time; too long, actually. It has steadily been accumulating new features for over a year, but the arrival of my second son in July last year and masses of client work have meant that it has repeatedly been pushed onto the back burner. Well, no more. Release 6.6 is now in the final stages of development and testing (so I won't be adding any more new features) and will hopefully see a release in Q2.

I'm planning a "what's new in 6.6" blog posting which will detail all of the major changes, but first I'd like to show you the results of some performance tuning that I've been doing. Most people are familiar with the quote from Donald Knuth, "premature optimization is the root of all evil", and it's often used as a stick to beat people with when they want to tweak low-level code to "make things faster". Yet the full quote is more interesting: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." I like to think that much of what I've done in 6.6 is in that 3%, but even if it isn't, it's still optimisation that comes for free to users of the framework. What's more, the first change also removes complexity and makes it much easier to write correct code using the framework.

Since explaining the changes is pretty heavy going, let's jump to the pictures of the results first.

Performance of a 6.5.9 OpenSSL Server

This is the "before" graph: a v6.5.9 OpenSSL server. (Graph: 6.5.9-OpenSSLServerPerf.png)

Performance of a 6.6 OpenSSL Server

And this is the "after" graph: a v6.6 server. (Graph: 6.6-OpenSSLServerPerf.png)

The most important things are the red and purple lines (bytes processed/sec), where higher values are better. Next in importance are the faint dotted lines (thread context switches), where lower values are better.

What the graphs above show is that, with the new changes in 6.6, a server can process more data in less time and with fewer thread context switches.

These tests were run on a dual quad-core Xeon E5320 and pushed the box to around 80% CPU usage; the 6.6 test used slightly more CPU but for much less time. The results have been far less dramatic, though still positive, on our Core i7-3930K (6 cores, 12 hardware threads), where it's hard to push CPU utilisation above 30%.