
The standard way to resolve distinct variables is SSA-based decompilation. I've only worked with decompiling Java bytecode, so I don't know how the CLR works, but in Java, the compiler definitely reuses the local variable slots.

There's also no discussion of type inference for variables, or of parenthesizing expression DAGs properly. I suspect properly decompiling control flow will be in part 2, but I'd be surprised if that were anywhere near robust, based on the quality demonstrated so far. Which is sad, because this sort of decompilation has been practically demonstrated and solved for, oh, 10-20 years.
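
To make the parenthesization point concrete, here is a minimal sketch of precedence-aware printing. It assumes a hypothetical expression format (nested `(op, left, right)` tuples with string leaves) and a made-up four-operator precedence table; neither comes from the article.

```python
# Hypothetical precedence table; a real decompiler would cover the
# target language's full operator set.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def render(node, parent_prec=0, is_right=False):
    """Print an expression tree, adding parentheses only where needed."""
    if not isinstance(node, tuple):
        return str(node)  # leaf: variable name or constant
    op, left, right = node
    prec = PRECEDENCE[op]
    text = f"{render(left, prec)} {op} {render(right, prec, True)}"
    # Parenthesize when the parent binds tighter, or when this node is the
    # right operand of an operator with the same precedence.
    needs_parens = prec < parent_prec or (prec == parent_prec and is_right)
    return f"({text})" if needs_parens else text

print(render(("*", ("+", "a", "b"), "c")))   # (a + b) * c
print(render(("-", "a", ("-", "b", "c"))))   # a - (b - c)
print(render(("-", ("-", "a", "b"), "c")))   # a - b - c
```

The right-operand rule is deliberately conservative: it also prints `a + (b + c)`, which is redundant for integers but keeps the evaluation order of the original code.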



Are you and 'userbinator saying the same thing? I can't tell. I know how the simple symbolic stack->expression evaluation works, and it happens that in my code I generate something pretty close to SSA expressions, but does SSA do something else profound for decompilation?


SSA abstracts the stack away and makes it much easier to reason about types.


I'm not sure I'm following. To get from stack operations to expressions, I just symbolically evaluate the stack, creating temporary variables as I go. It happens that the resulting IR is pretty much SSA form. But I'm not taking much else from SSA. I'm wondering if I'm missing opportunities.
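
For readers who haven't seen it, the symbolic stack evaluation described above can be sketched roughly like this. The tiny instruction set (`push`, `load`, `store`, `add`, `sub`, `mul`) is hypothetical, not real CIL or JVM bytecode, and the sketch only handles straight-line code.

```python
from itertools import count

def stack_to_temps(instructions):
    """Symbolically evaluate a straight-line stack program.

    Each value-producing instruction gets a fresh temporary, so every
    temporary is assigned exactly once.
    """
    fresh = count()
    stack = []        # holds names of temporaries, not runtime values
    statements = []

    def emit(expr):
        name = f"t{next(fresh)}"
        statements.append(f"{name} = {expr}")
        return name

    binary = {"add": "+", "sub": "-", "mul": "*"}
    for op, *args in instructions:
        if op == "push":            # constant
            stack.append(emit(str(args[0])))
        elif op == "load":          # read a local variable
            stack.append(emit(args[0]))
        elif op in binary:
            right = stack.pop()     # top of stack is the right operand
            left = stack.pop()
            stack.append(emit(f"{left} {binary[op]} {right}"))
        elif op == "store":         # write a local variable
            statements.append(f"{args[0]} = {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return statements

# x = a * 2 + b
program = [("load", "a"), ("push", 2), ("mul",),
           ("load", "b"), ("add",), ("store", "x")]
result = stack_to_temps(program)
print("\n".join(result))
```

The temporaries are single-assignment, but the named locals (`x` here) are not renamed and there are no phi nodes at control-flow merges, which is where this stops short of proper SSA.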


It's easier to transform your expressions into a useful form from a guaranteed, proper SSA than from a simple tree representation. For example, induction variable extraction is totally trivial in SSA, and you really need to do it if you want to reconstruct nice-looking `for` loops.

It also pays well to have distinct basic blocks - loop analysis is much easier then.
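
As a rough illustration of the basic-block point, here is a sketch of the usual "leaders" approach to splitting a linear instruction list into blocks. The `(opcode, target)` encoding and the opcode names `jmp`, `br`, and `ret` are assumptions made up for the example.

```python
# Opcodes that end a basic block (hypothetical names).
TERMINATORS = {"jmp", "br", "ret"}

def split_basic_blocks(instructions):
    """Split a list of (opcode, target_index_or_None) into basic blocks.

    Returns a list of blocks, each a list of instruction indices.
    """
    leaders = {0} if instructions else set()
    for i, (op, target) in enumerate(instructions):
        if op in TERMINATORS:
            if target is not None:
                leaders.add(target)          # a branch target starts a block
            if i + 1 < len(instructions):
                leaders.add(i + 1)           # so does whatever follows a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]

# A simple loop: init, test, body, back edge, exit.
loop = [
    ("op", None),    # 0: i = 0
    ("br", 4),       # 1: if i >= n goto 4
    ("op", None),    # 2: body
    ("jmp", 1),      # 3: goto 1
    ("ret", None),   # 4: return
]
blocks = split_basic_blocks(loop)
print(blocks)  # [[0], [1], [2, 3], [4]]
```

With blocks in hand, the back edge from instruction 3 to block `[1]` is what a loop analysis would pick up as the loop header.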


This is helpful. But I read it and think, for instance, "distinct basic blocks aren't SSA"; compilers worked in terms of CFGs before SSA existed. :)

Again this is more about my lack of confidence about fully grokking the implications of SSA; I'm not nerd-sniping.


Of course, you can have basic blocks without SSA. It's just another feature that was missing from the article and was worth mentioning.

Another thing you'll get for free from SSA: nice ternary expressions reconstructed (even if the original code used ifs).
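
A minimal sketch of that ternary reconstruction, assuming the control flow has already been matched as an if/else "diamond" whose merge block holds a single phi, and that both arms are free of side effects. The `Phi` and `Diamond` classes are hypothetical, not from any particular decompiler.

```python
from dataclasses import dataclass

@dataclass
class Phi:
    dest: str        # SSA name defined at the merge block
    incoming: dict   # predecessor block name -> incoming value

@dataclass
class Diamond:
    cond: str        # branch condition in the header block
    then_block: str  # block taken when cond is true
    else_block: str  # block taken when cond is false
    phi: Phi         # the phi at the merge block

def phi_to_ternary(d):
    """Collapse a side-effect-free diamond into one ternary assignment."""
    then_val = d.phi.incoming[d.then_block]
    else_val = d.phi.incoming[d.else_block]
    return f"{d.phi.dest} = {d.cond} ? {then_val} : {else_val}"

# Source was: if (a > b) max = a; else max = b;
diamond = Diamond("a > b", "then", "else",
                  Phi("max1", {"then": "a", "else": "b"}))
print(phi_to_ternary(diamond))  # max1 = a > b ? a : b
```

The phi already records which value arrives from which arm, so the only real work is recognizing the diamond and checking that the arms compute nothing else.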


Thank you for your comment! :-)

Haha, yep, I do agree that I did not cover all the important aspects of writing a decompiler. Nor did I use a stack-based solution to tackle the problem.

A reminder though: the idea behind this post was to cover a little bit of everything while keeping it as simple as possible. This is not a fully fledged decompiler and will not decompile everything. It is meant to give an idea of how CIL works and how to use Mono.Cecil, and to just hack away!

The next part of the tutorial DOES actually manage the stack to try and create a more complex solution, together with code refactoring and more.

And I'm sorry for any information that I might have forgotten and/or for any poorly written code. I will try to do better next time. Still, I hope you like the article.

/zerratar



