Writing a Simple Decompiler for .NET, Part 1 (codeandux.com)
62 points by zerratar on July 26, 2015 | 16 comments


If that's a subject you find interesting, you can also read the source for two OSS .NET decompilers:

ILSpy: https://github.com/icsharpcode/ilspy

JustDecompile: https://github.com/telerik/JustDecompileEngine/

Both are based on the same library that is used in the post: Mono.Cecil (https://github.com/jbevain/cecil).


Hi jbevain :-) Nice to see you here! We're in the middle of a small switch to a new blog engine, but the next part should come up soon enough. Have a nice day, and once more: you rock!


Thanks! Looking forward to reading the next posts in the series!


Stack-based, high-level VMs like the CLR, the JVM, and Flash's ActionScript are certainly quite easy to decompile, although I think this article unfortunately misses the point - it's full of (rather verbose) code, but has little explanation. From what I can see it's very fragile too - it attempts to match exact instruction sequences, so it won't work for anything even slightly different from what's presented. This is equivalent to the test() method given, but won't get decompiled correctly:

    ldc.i4.4
    ldarg.0
    call System.Int32 Test1.AwesomeClass::c()
    starg.s b
    starg.s a

The right way to decompile a stack-based language requires keeping track of what's on the stack, building expressions instead of evaluating values.

That InstructionHelper class also looks like it could be rewritten more clearly...
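
A minimal sketch of the "track the stack and build expressions" idea, in Python rather than C#. The `(opcode, operand)` tuples, the `decompile` function, and the argument names are all hypothetical and are not Mono.Cecil's API. It also assumes `c()` is a parameterless instance method, so `ldarg.0` is read as `this`:

```python
def decompile(instructions, arg_names):
    """Symbolically evaluate a tiny CIL subset, building expression strings."""
    stack = []        # holds expression strings, never runtime values
    statements = []
    for opcode, operand in instructions:
        if opcode == "ldc.i4":
            stack.append(str(operand))
        elif opcode == "ldarg":
            stack.append(arg_names[operand])
        elif opcode == "call":
            # Assumed operand shape: (method name, number of parameters).
            method_name, param_count = operand
            params = [stack.pop() for _ in range(param_count)][::-1]
            receiver = stack.pop()  # assumption: instance call, receiver below params
            stack.append(f"{receiver}.{method_name}({', '.join(params)})")
        elif opcode == "starg":
            statements.append(f"{operand} = {stack.pop()};")
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return statements


# The instruction sequence from the comment above, in the assumed tuple form.
snippet = [
    ("ldc.i4", 4),
    ("ldarg", 0),
    ("call", ("c", 0)),
    ("starg", "b"),
    ("starg", "a"),
]
print(decompile(snippet, ["this", "a", "b"]))
```

Because the stack holds expressions, the order in which values were pushed no longer matters to the matcher: each `starg` simply consumes whatever expression is on top.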


The standard way to resolve distinct variables is SSA-based decompilation. I've only worked with decompiling Java bytecode, so I don't know how the CLR works, but in Java, the compiler definitely reuses the local variable slots.

There's also no discussion of type inference for variables, or of parenthesizing expression DAGs properly. I suspect properly decompiling control flow would be in part 2, but I'd be surprised if that were anywhere near robust, based on the quality demonstrated so far. Which is sad because this sort of decompilation has been practically demonstrated and solved for, oh, 10-20 years.
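
To illustrate why slot reuse matters, here is a rough Python sketch that splits a reused local slot into distinct variables by giving each store a fresh version number. It only handles straight-line code; real SSA construction also needs phi functions at control-flow merges. The `(kind, slot)` tuples and the `split_slots` name are made up for the example:

```python
def split_slots(ops):
    """Rename each store to a local slot into a fresh versioned variable."""
    version = {}   # slot index -> latest version number
    renamed = []
    for kind, slot in ops:
        if kind == "store":
            version[slot] = version.get(slot, -1) + 1
        elif slot not in version:
            raise ValueError(f"slot {slot} is loaded before any store")
        renamed.append((kind, f"v{slot}_{version[slot]}"))
    return renamed


# Slot 0 is reused for two unrelated values.
ops = [("store", 0), ("load", 0), ("store", 0), ("load", 0)]
print(split_slots(ops))
```

After renaming, the two uses of slot 0 become `v0_0` and `v0_1`, so each can be given its own type and name.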


Are you and 'userbinator saying the same thing? I can't tell. I know how the simple symbolic stack->expression evaluation works, and it happens that in my code I generate something pretty close to SSA expressions, but does SSA do something else profound for decompilation?


SSA abstracts the stack away, and makes it much easier to reason about types.


I'm not sure I'm following. To get from stack operations to expressions, I just symbolically evaluate the stack, creating temporary variables as I go. It happens that the resulting IR is pretty much SSA form. But I'm not taking much else from SSA. I'm wondering if I'm missing opportunities.


It's easier to transform your expressions into a useful form from a guaranteed, proper SSA than from a simple tree representation. For example, induction variable extraction is totally trivial in SSA, and you really need to do it if you want to reconstruct nice-looking `for` loops.

It also pays well to have distinct basic blocks - loop analysis is much easier then.
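
For the basic-block point, a small Python sketch of the usual "leaders" approach: a block starts at the first instruction, at every branch target, and right after every branch or return. The `(opcode, target_index)` encoding is an assumption made for the example, with `None` meaning "not a branch":

```python
def basic_blocks(instructions):
    """Partition instruction indices into basic blocks using leaders."""
    if not instructions:
        return []
    leaders = {0}
    for index, (opcode, target) in enumerate(instructions):
        ends_block = target is not None or opcode == "ret"
        if target is not None:
            leaders.add(target)           # a branch target starts a block
        if ends_block and index + 1 < len(instructions):
            leaders.add(index + 1)        # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return [list(range(start, end)) for start, end in zip(starts, ends)]


# A simple counting loop: the condition check sits at index 7.
program = [
    ("ldc.i4.0", None),  # 0
    ("stloc.0", None),   # 1
    ("br", 7),           # 2  jump to the loop condition
    ("ldloc.0", None),   # 3  loop body
    ("ldc.i4.1", None),  # 4
    ("add", None),       # 5
    ("stloc.0", None),   # 6
    ("ldloc.0", None),   # 7  loop condition
    ("ldarg.0", None),   # 8
    ("blt", 3),          # 9  back edge to the body
    ("ret", None),       # 10
]
print(basic_blocks(program))
```

The back edge from index 9 to index 3 is what a loop analysis would then look for between blocks.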


This is helpful. But I read it and think, for instance, "distinct basic blocks aren't SSA"; compilers worked in terms of CFGs before SSA existed. :)

Again this is more about my lack of confidence about fully grokking the implications of SSA; I'm not nerd-sniping.


Of course, you can have basic blocks without an SSA. It's just another feature that was missing from the article that was worth mentioning.

Another thing you'll get for free from an SSA - nice ternary expressions reconstructed (even if the original code was using ifs).
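
A hedged Python sketch of the ternary case: if a two-way branch leads to a merge point whose phi takes one value from each arm, the phi can be printed as `cond ? a : b`. The `Branch` and `Phi` classes are invented for this example, and it assumes both arms do nothing except produce their value:

```python
from dataclasses import dataclass


@dataclass
class Branch:
    condition: str    # e.g. "a > b"
    then_block: str   # name of the block taken when the condition holds
    else_block: str


@dataclass
class Phi:
    target: str       # variable defined at the merge point
    incoming: dict    # predecessor block name -> value expression


def phi_to_ternary(branch, phi):
    """Return a ternary assignment if the phi merges exactly the two arms."""
    if branch.then_block == branch.else_block:
        return None
    if set(phi.incoming) != {branch.then_block, branch.else_block}:
        return None
    then_value = phi.incoming[branch.then_block]
    else_value = phi.incoming[branch.else_block]
    return f"{phi.target} = {branch.condition} ? {then_value} : {else_value};"


branch = Branch("a > b", "then", "else")
phi = Phi("max", {"then": "a", "else": "b"})
print(phi_to_ternary(branch, phi))
```

The same result comes out whether the source used an `if`/`else` or a ternary, since both compile to the same diamond shape.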


Thank you for your comment! :-)

Haha, yep, I do agree that I did not cover all the important aspects of writing a decompiler. Neither did I use a stack-based solution to tackle the problem.

A reminder though, the idea behind this post was to cover a little bit of everything, just trying to make it as simple as possible. This is not a fully fledged decompiler and will not decompile everything. It is to give an idea on how CIL works, how to use Mono.Cecil, and just hacking away!

The next part of the tutorial does actually manage the stack to try to create a more complex solution, together with code refactoring and more.

And I'm sorry for any information that I might have forgotten and/or for any poorly written code. I will try to do better next time. Yet I hope you still like the article.

/zerratar


If this is of interest to you, then you will most likely find the recent .NET Core Design API review on ILDASM interesting as well: https://www.youtube.com/watch?v=HuRc6CpiOVg


For something with "UX" in the name, it's a surprisingly bad layout. Massive waste of screen width, and code boxes forcing me to scroll sideways even as acres of empty space sit there unused.


Ah, I see it's been partially fixed. The source code sections no longer require scrolling sideways to see it all, at least.


I really apologize for that. We were totally taken by surprise by all the attention, and around 2 AM I saw your (very valid) reply and was like, oh snap! It's not a perfect fix, but I'll try to improve it as soon as I can. And thanks, Eli.



