How will you make it if you never even try?

June 10, 2008

Speed up your XML code with OuterXmlCachingXmlDocument

Filed under: Performance. Posted by charlieflowers @ 7:09 pm

Transitioning from a string to XmlDocument and vice versa is very slow

Recently I worked on a project which involved some performance profiling. We used tools like Red Gate’s profiler, the free .NET CLR profiler from Microsoft, and the AutomatedQA profiler. These profilers made one thing very clear: transitioning from an XmlDocument representation of XML to a text representation, and vice versa, is very expensive and slow.

However, in this particular project, we had no choice. Our code was building credit reports, which means our original input was XML and our final output was XML (both in the MISMO XML format). At various points scattered through the processing of a request, we had to call legacy components for specific tasks. Most of those legacy components wanted XML text (not a DOM) as input, and they also returned their output as XML text. But the rest of our code wanted to work with the XML as a DOM (i.e., an XmlDocument), so that we could navigate, set properties, use XPath, etc.

So, we had no choice but to transition from a large text string to an XmlDocument and then back again, over and over. I said before this is slow, but I want to make sure you understand I mean very slow! You’d be surprised. It is slow for two reasons: a) parsing a large document and serializing it again is a big job for the computer, and b) it generates tons of little objects and therefore causes garbage collection overhead.
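If you want to see the cost for yourself rather than take my word for it, here is a minimal sketch of the round trip described above, timed with Stopwatch. The class name RoundTripTiming, its methods, and the synthetic "report" document are all made up for illustration; real numbers depend entirely on your documents and machine, so I am not quoting any.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of the original project.
public static class RoundTripTiming
{
    // Builds a synthetic document with `count` small elements (made-up data).
    public static string BuildSampleXml(int count)
    {
        var sb = new StringBuilder("<report>");
        for (int i = 0; i < count; i++)
        {
            sb.Append("<item id=\"").Append(i).Append("\">value</item>");
        }
        return sb.Append("</report>").ToString();
    }

    // One round trip: text -> DOM -> text.
    public static string RoundTrip(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);      // parse the text into a tree of node objects
        return doc.OuterXml;   // walk the tree and serialize it back to text
    }

    // Times `iterations` consecutive round trips of the same document.
    public static TimeSpan Measure(string xml, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            xml = RoundTrip(xml);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling RoundTripTiming.Measure(RoundTripTiming.BuildSampleXml(50000), 20) and printing the result gives a rough feel for the cost. Comparing GC.CollectionCount(0) before and after the call is one simple way to see the garbage collection side of it.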

Caching the OuterXml representation

After thinking about this for a while, I realized that it would help tremendously if we could just make XmlDocument “cache” its textual representation. In other words, when you load an XmlDocument from a string, I wanted XmlDocument to “remember” that string. As long as no one has made any change to the XmlDocument, it merely returns that string every time you call OuterXml. But the minute someone makes a change, the XmlDocument knows it no longer has a valid string representation. The next time you call OuterXml, it goes through the big, expensive process of creating the textual representation … but then it “remembers” the result again, until the next change invalidates it.

And it turns out, this was fairly simple to build.

Presenting the “OuterXmlCachingXmlDocument”

I called it the “OuterXmlCachingXmlDocument” because it is an XmlDocument that caches the “OuterXml”.

It inherits from XmlDocument and does the following:

  • Overrides Load() and LoadXml(): these methods let you load XML into an XmlDocument. They both find a string of XML text from somewhere (a file or stream, a variable, etc.). They have been overridden to store that string in an instance variable before performing the load operation. They then register event listeners for XmlDocument’s three change events: NodeChanged, NodeInserted, and NodeRemoved. Those events tell us when the string representation that we have cached becomes invalid.
  • Overrides OuterXml: this is a property that returns the string representation. In XmlDocument, its implementation performs the expensive process of walking the tree of node objects in the DOM and creating a string representation. We have overridden it to first see if we have a valid cached version of the XML string. If so, we just return it! If not, then we have to let the base class do the expensive conversion … BUT! The good news is, once that expensive process has been done, we have a valid string representation again! So we cache it in the same instance variable, and we register our change listeners again so that the next change invalidates it.
  • Handles event notifications for NodeChanged, NodeInserted, and NodeRemoved: if any of these events fires, we need to dump our cached string representation. We don’t recalculate a new string representation at this time, because avoiding that is why we’re here in the first place! We simply “make a note” that we no longer have a cached string representation. Also, and this is very important, when any of these events fires, we de-register our listener from the NodeChanged, NodeInserted, and NodeRemoved events! Otherwise, every later DOM operation that changes the XmlDocument would incur the overhead of calling us for no reason.
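The original code was never published, so what follows is only a minimal sketch reconstructed from the three bullets above, not the class we actually shipped. The field names and the helpers Remember, Invalidate, and OnDomChanged are my own inventions for illustration. The Load(XmlReader) override is an extra safety net that simply drops the cache (a reader has no source string to keep), and the file and stream overloads of Load() are left out; they would need the same treatment.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical reconstruction from the description; not the original code.
public class OuterXmlCachingXmlDocument : XmlDocument
{
    private string cachedOuterXml = string.Empty;
    private bool cacheValid;   // true while cachedOuterXml matches the DOM
    private bool listening;    // true while our change handlers are attached

    public override void LoadXml(string xml)
    {
        Invalidate();          // detach first so load-time events are not routed to us
        base.LoadXml(xml);     // throws on malformed XML, leaving the cache empty
        Remember(xml);
    }

    public override void Load(TextReader txtReader)
    {
        // Read the text ourselves so that we have a string to remember.
        LoadXml(txtReader.ReadToEnd());
    }

    public override void Load(XmlReader reader)
    {
        // No source string available here: drop the cache and rebuild it lazily.
        Invalidate();
        base.Load(reader);
    }

    public override string OuterXml
    {
        get
        {
            if (!cacheValid)
            {
                // Expensive path: serialize once, then keep the result.
                Remember(base.OuterXml);
            }
            return cachedOuterXml;
        }
    }

    // Stores a valid string and starts watching for changes.
    private void Remember(string xml)
    {
        cachedOuterXml = xml;
        cacheValid = true;
        if (!listening)
        {
            NodeChanged += OnDomChanged;
            NodeInserted += OnDomChanged;
            NodeRemoved += OnDomChanged;
            listening = true;
        }
    }

    // Drops the cached string and stops watching, so later edits cost nothing extra.
    private void Invalidate()
    {
        cachedOuterXml = string.Empty;
        cacheValid = false;
        if (listening)
        {
            NodeChanged -= OnDomChanged;
            NodeInserted -= OnDomChanged;
            NodeRemoved -= OnDomChanged;
            listening = false;
        }
    }

    private void OnDomChanged(object sender, XmlNodeChangedEventArgs e)
    {
        Invalidate();
    }
}
```

To try it, load a string with LoadXml() and read OuterXml: you should get the very same string back without any serialization. Then change the DOM, for example with doc.DocumentElement.AppendChild(doc.CreateElement("c")), and read OuterXml twice: the first read pays for serialization, the second returns the cached result. One thing to keep in mind with this sketch: while the cache is valid, OuterXml returns the loaded text verbatim, which can differ in insignificant details (whitespace between elements, `<b/>` versus `<b />`) from what XmlDocument itself would serialize.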

That’s really all there is to it. It is simple, and it silently replaces your XmlDocument usages. Because it inherits from XmlDocument, you can use it anywhere an XmlDocument is expected. It made a very noticeable improvement to our performance, and if you’re transitioning a lot between DOM and text, it will likely help you quite a bit as well.

2 Comments »

  1. Hi Charlie,

    Where can I find the code for the OuterXmlCachingXmlDocument class?

    Regards,

    Jan

    Comment by Jan — November 27, 2013 @ 10:32 am

    • Jan,

      Unfortunately, I never actually posted the code — just the description of how to do it. The code was owned by my client at the time. And I just searched and it turns out I no longer have it.

      But the description in the blog post pretty much covers it. If you inherit XmlDocument, do the overrides I mentioned, and handle the events I mentioned, that should do it or at least get you 98% of the way there. If you give it a whirl and have any questions, I’d be happy to respond.

      Regards,
      Charlie

      Comment by charlieflowers — November 28, 2013 @ 8:52 am

