Sunday, July 25, 2010

WCF Gotcha: Binary and MTOM encodings not optimal for untyped scenarios

@YaronNaveh

WCF binary encoding efficiently writes XML in an optimized binary format. For example, this text message

<s:Envelope …>
  <s:Header> 
    <!-- ws-addressing stuff… -->
   </s:Header> 
  <s:Body>
    <MyContract>
      <arr>
        <b:int>99999</b:int>
        <b:int>99999</b:int>
        <b:int>77777</b:int>
        <b:int>99999</b:int>
      </arr>
    </MyContract>
  </s:Body>
</s:Envelope>

is 1KB in the default text encoding but only about half the size in binary (554 bytes). There are several reasons:
  • binary encoding knows to write int values as a single 4-byte integer rather than as five 1-byte characters.
  • binary does not repeat the “int” element name more than once.
  • binary does not need to write known element names (e.g. Envelope, Body) in full, but only their keys in an optimized dictionary.
The difference is clear if we look at the text and binary messages in Fiddler’s HexView:

Text

Binary
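
If you want to reproduce the size comparison without Fiddler, here is a minimal sketch (my own addition, not from the original measurements; names like EncodingSizeDemo and EncodedSize are mine) that runs each encoder directly against an in-memory stream. The data contract matches the one in the snippet below:

using System;
using System.IO;
using System.Runtime.Serialization;
using System.ServiceModel.Channels;
using System.Text;

[DataContract]
class MyContract
{
    [DataMember]
    public int[] arr;
}

class EncodingSizeDemo
{
    // Serializes a message through the given encoder and returns its size in bytes.
    static long EncodedSize(MessageEncodingBindingElement element, object body)
    {
        MessageEncoder encoder = element.CreateMessageEncoderFactory().Encoder;
        // A Message can only be written once, so build a fresh one per call.
        Message message = Message.CreateMessage(encoder.MessageVersion, "someAction", body);
        using (var stream = new MemoryStream())
        {
            encoder.WriteMessage(message, stream);
            return stream.Length;
        }
    }

    static void Main()
    {
        var obj = new MyContract { arr = new[] { 99999, 99999, 77777, 99999 } };
        var text = new TextMessageEncodingBindingElement(
            MessageVersion.Soap12WSAddressing10, Encoding.UTF8);
        var binary = new BinaryMessageEncodingBindingElement();

        Console.WriteLine("text:   {0} bytes", EncodedSize(text, obj));
        Console.WriteLine("binary: {0} bytes", EncodedSize(binary, obj));
    }
}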


This is the code snippet I used to send the above message in binary format:

using System;
using System.Runtime.Serialization;
using System.ServiceModel;
using System.ServiceModel.Channels;

[ServiceContract]
public interface IUniversalContract
{
    // Untyped "universal" contract: accepts any action and works with raw Messages.
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message Send(Message message);
}

[DataContract]
class MyContract
{
    [DataMember]
    public int[] arr;
}

class Program
{
    private static IUniversalContract GetChannel()
    {
        // Custom binding: binary message encoding over the HTTP transport.
        var binding = new CustomBinding();
        binding.Elements.Add(new BinaryMessageEncodingBindingElement());
        binding.Elements.Add(new HttpTransportBindingElement());
        var factory = new ChannelFactory<IUniversalContract>(binding);
        return factory.CreateChannel(new EndpointAddress("http://localhost:8888/"));
    }

    static void Main(string[] args)
    {
        var channel = GetChannel();
        var obj = new MyContract { arr = new[] { 99999, 99999, 77777, 99999 } };
        Message msg = Message.CreateMessage(MessageVersion.Soap12WSAddressing10, "someAction", obj);
        channel.Send(msg);
    }
}

As you can see, I built a custom binding that uses the binary encoding, then instantiated my data contract class and sent it.
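
For completeness, here is a minimal sketch of a matching service host (my own assumption of the setup; the original code above only shows the client side, and UniversalService and HostProgram are illustrative names). It reuses the IUniversalContract interface from the snippet above:

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

public class UniversalService : IUniversalContract
{
    public Message Send(Message message)
    {
        // Log the incoming action and reply with an empty message of the same version.
        Console.WriteLine("Received action: " + message.Headers.Action);
        return Message.CreateMessage(message.Version, "someReplyAction");
    }
}

class HostProgram
{
    static void Main()
    {
        // The same binary-over-HTTP binding the client uses.
        var binding = new CustomBinding(
            new BinaryMessageEncodingBindingElement(),
            new HttpTransportBindingElement());
        var host = new ServiceHost(typeof(UniversalService));
        host.AddServiceEndpoint(typeof(IUniversalContract), binding, "http://localhost:8888/");
        host.Open();
        Console.WriteLine("Listening... press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}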

But take a look at this code:

// Requires: using System.IO; and using System.Xml; (in addition to the usings above)
static void Main(string[] args)
{
    var channel = GetChannel(); // same method as above

    // The same payload as before, handcrafted as raw XML text:
    var str = new StringReader(
        @"<MyContract xmlns=""http://schemas.datacontract.org/2004/07/ConsoleApplication288""
                      xmlns:i=""http://www.w3.org/2001/XMLSchema-instance"">
            <arr xmlns:b=""http://schemas.microsoft.com/2003/10/Serialization/Arrays"">
              <b:int>99999</b:int>
              <b:int>99999</b:int>
              <b:int>77777</b:int>
              <b:int>99999</b:int>
            </arr>
          </MyContract>");

    XmlReader reader = XmlReader.Create(str);
    Message msg = Message.CreateMessage(MessageVersion.Soap12WSAddressing10, "someAction", reader);
    channel.Send(msg);
}

It sends the same message on the wire, but uses a raw XML string instead of a live object. This is common in routing scenarios, in cases where the message is handcrafted manually, and in places where the message is taken from an external source.

Let’s see how this message looks in Fiddler:



Its size is 577 bytes, which means it carries 23 extra bytes over the object-based binary message. While this seems negligible, had we sent more array elements, and had the integer values required more digits, the difference would be very noticeable, especially for an optimized format. For example, with just 50 items in the array the second message was already 2.5 times larger than the fully optimized one.

So where does this difference come from? When analyzing the raw messages we find two interesting differences:
  • In the less optimized message the text “int” repeats a few times, whereas in the optimized one it appears only once. As far as I understand, this text represents the array element name and not the fact that the type is an int.
  • In the less optimized message the array values are written as five 1-byte characters (= 5 bytes per value), whereas in the optimized message each is a single 4-byte integer (= 4 bytes per value). For larger integers the difference is bigger, and for long arrays the difference is huge.
The second difference is understandable: the binary encoder has no type information, since it only gets a big string as input, so it cannot optimize integers. I’m less sure about the first one. Maybe the reason is that for integers we do not need an array-item separator, since the values have a fixed size, whereas for strings we need to know where each string ends. But I would still expect a more optimized separator, so I’m not really sure about this one.
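
To see both effects in isolation, here is a minimal sketch (my own illustration, not part of the original measurements; ArrayRecordDemo and Size are names I made up). It writes 50 integers through the binary XML writer twice: once through the typed WriteArray path, which is, as far as I can tell, the kind of optimization the serializer can use when it knows the element type, and once element by element as text, which is roughly what the encoder receives from a plain XmlReader:

using System;
using System.IO;
using System.Xml;

class ArrayRecordDemo
{
    // Writes a body into a binary XML stream and returns its size in bytes.
    static long Size(Action<XmlDictionaryWriter> writeBody)
    {
        using (var stream = new MemoryStream())
        {
            XmlDictionaryWriter writer = XmlDictionaryWriter.CreateBinaryWriter(stream);
            writer.WriteStartElement("arr");
            writeBody(writer);
            writer.WriteEndElement();
            writer.Flush();
            return stream.Length;
        }
    }

    static void Main()
    {
        var values = new int[50];
        for (int i = 0; i < values.Length; i++) values[i] = 99999;

        // Typed path: a single array record -- the element name is written once
        // and each value is a packed 4-byte integer.
        long typed = Size(w => w.WriteArray(null, "int", "", values, 0, values.Length));

        // Untyped path: one element per value, digits written as characters.
        long untyped = Size(w =>
        {
            foreach (int v in values)
            {
                w.WriteStartElement("int");
                w.WriteString(v.ToString());
                w.WriteEndElement();
            }
        });

        Console.WriteLine("typed: {0} bytes, untyped: {1} bytes", typed, untyped);
    }
}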


Conclusion

WCF binary encoding is a popular optimization technique, especially since it became available to Silverlight applications. However, when untyped messages are used, binary-encoded messages are less optimal than in typed scenarios. They are still more optimal than regular text-encoded messages. The same conclusion applies to MTOM scenarios, as they use the same optimization technique. If you use such messages (usually in routing scenarios), you should be aware of this and consider the different trade-offs.
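
Since the same reasoning applies to MTOM, trying it out is just a matter of swapping the encoder in the custom binding shown earlier (a sketch, using the standard MtomMessageEncodingBindingElement):

var binding = new CustomBinding();
binding.Elements.Add(new MtomMessageEncodingBindingElement(
    MessageVersion.Soap12WSAddressing10, Encoding.UTF8)); // Encoding is in System.Text
binding.Elements.Add(new HttpTransportBindingElement());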

@YaronNaveh

What's next? Get this blog's RSS updates or register for mail updates!
