How will you make it if you never even try?

June 10, 2008

Speed up your XML code with OuterXmlCachingXmlDocument

Filed under: Performance | charlieflowers @ 7:09 pm

Transitioning from a string to XmlDocument and vice versa is very slow

Recently I worked on a project that involved some performance profiling. We used tools such as Red Gate’s profiler, Microsoft’s free .NET CLR Profiler, and the AutomatedQA profiler. All of them made one thing very clear: converting XML from an XmlDocument representation to a text representation, and vice versa, is very expensive.

However, in this particular project, we couldn’t avoid it. Our code was building credit reports, which means our original input was XML and our final output was XML (both in the MISMO XML format). At various points in the processing of a request, we had to call legacy components for specific tasks. Most of those legacy components wanted XML text (not a DOM) as input, and they also returned their output as XML text. But the rest of our code wanted to work with the XML as a DOM (i.e., an XmlDocument), so that we could navigate it, set properties, run XPath queries, etc.

So we had no choice but to convert from a large text string to an XmlDocument and back again, over and over. I said this is slow, but I want to be clear: I mean very slow, slower than you would expect. It is slow because a) it is a big job for the computer to do, and b) it generates tons of little objects, which adds garbage collection overhead.

Caching the OuterXml representation

After thinking about this for a while, I realized that it would help tremendously if we could just make XmlDocument “cache” its textual representation. In other words, when you load an XmlDocument from a string, I wanted the XmlDocument to “remember” that string. As long as no one had changed the XmlDocument in any way, it would simply return that string every time you called OuterXml. But the minute someone made a change to the XmlDocument, it would know that it no longer had a valid string representation. The next time you called OuterXml, the XmlDocument would go through the big, expensive process of creating the textual representation … but then it would “remember” it again, until the next change invalidated it.
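The caching rule described above fits in a few lines. This is a minimal Python sketch of the policy only, not the author’s C# class: the hypothetical `CachedSerialization` class keeps an ElementTree tree plus the string it was loaded from, callers report edits through a hypothetical `mark_changed()` method (the real class learns about changes from XmlDocument’s events instead), and `ET.tostring` stands in for the expensive OuterXml computation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


class CachedSerialization:
    """Hypothetical sketch of the OuterXml caching policy (not a real API).

    The parsed tree plays the role of the DOM; ET.tostring() stands in for
    the expensive XmlDocument.OuterXml serialization.
    """

    def __init__(self, xml_text: str) -> None:
        self.root = ET.fromstring(xml_text)
        self._cached: Optional[str] = xml_text  # the loaded text is valid as-is
        self.expensive_calls = 0  # counts real serializations, for illustration

    def mark_changed(self) -> None:
        # Hypothetical hook: any edit invalidates the cache. Re-serializing
        # here would defeat the purpose, so we only forget the string.
        self._cached = None

    def outer_xml(self) -> str:
        if self._cached is None:
            # Cache miss: pay for one full serialization, then remember it.
            self.expensive_calls += 1
            self._cached = ET.tostring(self.root, encoding="unicode")
        return self._cached


doc = CachedSerialization("<report><score>700</score></report>")
print(doc.outer_xml())               # returns the loaded string; no serialization
doc.root.find("score").text = "710"  # edit the tree...
doc.mark_changed()                   # ...and report the edit
print(doc.outer_xml())               # serializes once and caches the result
print(doc.outer_xml())               # cache hit again
print(doc.expensive_calls)           # 1
```

A cache hit returns exactly the string that was loaded, so it can differ in formatting from a fresh serialization: loading `<a x='1'/>` and reading it back from the cache gives `<a x='1'/>`, whereas `ET.tostring` would produce `<a x="1" />`.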

As it turns out, this was fairly simple to build.

Presenting the “OuterXmlCachingXmlDocument”

I called it the “OuterXmlCachingXmlDocument” because it is an XmlDocument that caches the “OuterXml”.

It inherits from XmlDocument and does the following:

  • Overrides Load() and LoadXml(): these methods load XML into an XmlDocument, Load() from a source such as a file or stream and LoadXml() from a string variable. Both have been overridden to obtain the XML text, store it in an instance variable, and then perform the load operation. They then register event listeners for XmlDocument’s three change events, NodeChanged, NodeInserted, and NodeRemoved. Those events tell us when the string representation we have cached becomes invalid.
  • Overrides OuterXml: this is the property that returns the string representation. In XmlDocument, its implementation performs the expensive process of walking the DOM’s tree of node objects and building a string from them. We have overridden it to first check whether we have a valid cached copy of the XML string. If so, we just return it! If not, we let the base class do the expensive conversion … BUT! The good news is that once that expensive process has been done, we have a valid string representation again. So we cache it in the same instance variable again, which also means the change listeners need to be registered again so that the next change invalidates it.
  • Handles the NodeChanged, NodeInserted, and NodeRemoved notifications: if any of these events fires, we need to discard our cached string. We don’t compute a new string at this point, because avoiding that is why we’re here in the first place! We simply “make a note” that we no longer have a cached string. Also, and this is very important, when any of these events fires, we unregister our listener from the NodeChanged, NodeInserted, and NodeRemoved events. Otherwise, every DOM operation that changes the XmlDocument would incur the overhead of calling us for no reason.
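The three pieces above can be mirrored outside .NET. What follows is a Python analog under stated assumptions, not the author’s C# code: `ObservableDocument` is a hypothetical stand-in for XmlDocument whose mutating methods (`set_text`, `append_child`) fire one change notification in place of NodeChanged, NodeInserted, and NodeRemoved, and `OuterXmlCachingDocument` overrides `load_xml` and `outer_xml` as the list describes. After a cache miss refills the cache, the sketch subscribes its listener again; without that step, a later change would leave a stale string in the cache.

```python
import io
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, TextIO

Listener = Callable[[], None]


class ObservableDocument:
    """Hypothetical stand-in for XmlDocument (not a real API): an ElementTree
    root whose mutating methods notify subscribers, playing the role of the
    NodeChanged/NodeInserted/NodeRemoved events."""

    def __init__(self) -> None:
        self.root: Optional[ET.Element] = None
        self._listeners: List[Listener] = []
        self.serializations = 0  # counts the expensive path, for illustration

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def unsubscribe(self, listener: Listener) -> None:
        self._listeners.remove(listener)

    @property
    def listener_count(self) -> int:
        return len(self._listeners)

    def _notify(self) -> None:
        for listener in list(self._listeners):  # copy: a listener may unsubscribe
            listener()

    def load_xml(self, text: str) -> None:
        self.root = ET.fromstring(text)

    def load(self, stream: TextIO) -> None:
        self.load_xml(stream.read())  # read the whole source into a string first

    def set_text(self, path: str, value: str) -> None:
        self.root.find(path).text = value
        self._notify()

    def append_child(self, path: str, tag: str) -> None:
        ET.SubElement(self.root.find(path), tag)
        self._notify()

    @property
    def outer_xml(self) -> str:
        self.serializations += 1  # the expensive walk over the whole tree
        return ET.tostring(self.root, encoding="unicode")


class OuterXmlCachingDocument(ObservableDocument):
    """Caches outer_xml until the first change notification arrives."""

    def __init__(self) -> None:
        super().__init__()
        self._cached: Optional[str] = None
        self._watching = False

    def load_xml(self, text: str) -> None:
        super().load_xml(text)  # parse as usual...
        self._cached = text     # ...and remember the exact input string
        self._watch()

    def _watch(self) -> None:
        if not self._watching:
            self.subscribe(self._on_changed)
            self._watching = True

    def _on_changed(self) -> None:
        self._cached = None                 # forget the string; don't rebuild it
        self.unsubscribe(self._on_changed)  # later edits no longer call us
        self._watching = False

    @property
    def outer_xml(self) -> str:
        if self._cached is None:
            self._cached = super().outer_xml  # pay for one serialization...
            self._watch()                     # ...then watch for the next change
        return self._cached


doc = OuterXmlCachingDocument()
doc.load(io.StringIO("<report><score>700</score></report>"))
print(doc.outer_xml)          # cached input string, no serialization
doc.set_text("score", "710")  # fires the change notification
print(doc.outer_xml)          # serialized once, then cached again
print(doc.serializations)     # 1
```

In this sketch the base `load()` delegates to `load_xml()`, so overriding `load_xml()` also covers stream input; the post overrides XmlDocument’s Load() and LoadXml() separately.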

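That’s really all there is to it. It is simple, and because it inherits from XmlDocument, it “silently” replaces your XmlDocument usages: you can use it anywhere you use XmlDocument today. One thing to be aware of is that until the first change, OuterXml returns exactly the text you loaded, which can differ in formatting (whitespace, attribute quote style, and so on) from the text XmlDocument would generate itself. It made a very noticeable improvement to our performance, and if you’re converting a lot between DOM and text, it will likely help you quite a bit as well.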
