HackWeek V - Jonathan Pryor's web log

« Defending XML-based Build Systems | Main

HackWeek V

Last week was HackWeek V, during which I had small goals, yet had most of the time eaten by unexpected "roadblocks."

The week started with my mis-remembering OptionSet behavior. I had thought that there was a bug with passing options containing DOS paths, as I thought the path would be overly split:

string path = null;
var o = new OptionSet () {
	{ "path=", v => path = v },
};
o.Parse (new[]{"-path=C:\path"});

Fortunately, my memory was wrong: this works as expected. Yay.

What fails is if the option supports multiple values:

string key = null, value = null;
var o = new OptionSet () {
	{ "D=", (k, v) => {key = k; value = v;} },
};
o.Parse (new[]{"-DFOO=C:\path"});

The above fails with a OptionException, because the DOS path is split, so OptionSet attempts to send 3 arguments to an option expecting 2 arguments. This isn't allowed.

The patch to fix the above is trivial (most of that patch is for tests). However, the fix didn't work at first.

Enter roadblock #1: String.Split() can return too many substrings. Oops.

So I fixed it. That only killed a day...

Next up, I had been sent an email showing that OptionSet had some bugs when removing by index. I couldn't let that happen...and being in a TDD mood, I first wrote some unit tests to describe what the IList<T> semantics should be. Being in an over-engineering mood, I wrote a set of "contract" tests for IList<T> in Cadenza, fixed some Cadenza bugs so that Cadenza would pass the new ListContract, then merged ListContract with the OptionSet tests.

Then I hit roadblock #2 when KeyedCollection<TKey, TItem> wouldn't pass my ListContract tests, as it wasn't exception safe. Not willing to give up on ListContract, I fixed KeyedCollection so it would now pass my ListContract tests, improving compatibility with .NET in the process, which allowed me to finally fix the OptionSet bugs.

I was then able to fix a mdoc export-html bug in which index files wouldn't always be updated, before starting to investigate mdoc assemble wanting gobs of memory.

While pondering how to figure out why mdoc assemble wanted 400MB of memory, I asked the folks on ##csharp on freenode if there were any Mono bugs preventing their SpikeLite bot from working under Mono. They kindly directed me toward a bug in which AppDomain.ProcessExit was being fired at the wrong time. This proved easier than I feared (I feared it would be beyond me).

Which left me with pondering a memory "leak." It obviously couldn't be a leak with a GC and no unmanaged memory to speak of, but what was causing so much memory to be used? Thus proceeded lots of Console.WriteLine(GC.GetTotalMemory(false)) calls and reading the output to see where the memory use was jumping (as, alas I found Mono's memory profiler to be less than useful for me, and mono's profiler was far slower than a normal run). This eventually directed me to the problem:

I needed, at most, two XmlNode values from an XmlDocument. An XmlDocument loaded from a file that could be very small or large-ish (0.5MB). Thousands of such files. At once.

That's when it dawned on me that storing XmlNodes in a Dictionary loaded from thousands of XmlDocuments might not be such a good idea, as each XmlNode retains a reference to the XmlDocument it came from, so I was basically copying the entire documentation set into memory, when I only needed a fraction of it. Doh!

The fix was straightforward: keep a temporary XmlDocument around and call XmlDocument.ImportNode to preserve just the data I needed.

Memory use plummeted to less than one tenth what was previously required.

Along the way I ran across and reported an xbuild bug (since fixed), and filed a regression in gmcs which prevented Cadenza from building.

Overall, a productive week, but not at all what I had originally intended.

Posted on 15 Jun 2010 | Path: /development/mono/ | Permalink