You might be interested in the Parsoid project, which Wikimedia funded precisely to try to build a more maintainable pipeline for parsing wikitext into some kind of AST or document model, disentangled from the other cruft: http://www.mediawiki.org/wiki/Parsoid

I believe Parsoid is currently maintained separately from MediaWiki, so the parsing logic in MediaWiki itself is still the crufty version.