pfez's comments

> I do like the appeal of a recompileable target language. But that language need not be C.

Hey! Thanks for the very interesting feedback!

I also strongly feel the appeal of having a decompiler emit a recompilable language. But I want to stress that it's not just appealing for its own sake. It opens up the possibility of consumption by other tools, which is a great opportunity.

Basically, as long as the decompiler only emits some half-baked pseudocode that looks like C and that humans can understand, that "language" is only an output format. It's the end of the journey from the binary. You can look at it, you can reason about it, you can even edit it, change types, and rename stuff, but its final purpose (and the only purpose of any adjustments you make to it) is human consumption and understanding.

Don't get me wrong, human understanding is great, but it has shortcomings, and it doesn't scale.

On the other hand, the very moment a decompiler starts emitting decompiled code in a language that is parsable by other tools, its output stops being the end of the journey. In a way, it becomes yet another intermediate language, at a different level of abstraction, that can be consumed by other tools. Think of any static analysis tool that usually requires access to the source code, except now you can throw the decompiled code at it and get useful information about your binary.

And I'm not speaking hypothetically. At rev.ng we have a PoC where we detect memory bugs like use-after-free in a binary, without access to the original source code, by running CodeQL or clang-static-analyzer on the decompiled C code. You get all the nice reports that usually come with these tools, telling you the conditions that must hold during execution for the bug to be triggered. So, it is entirely possible to use C-based source-level static analysis tools to automate at least some part of the grinding analysis job on a binary.

Take this with a grain of salt. It's a PoC. We haven't released it and it's not production grade yet, even if we're planning to show it around :) Also, I'm definitely not saying that's a silver bullet for every problem, or that it can solve stuff at every level of abstraction. But it's to make a point: decompiling to a recompilable language is a great opportunity to tap the potential of the analysis tools available for that language.
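To make the use-after-free case a bit more concrete, here is a made-up sketch of the kind of recompilable C such a tool could be pointed at. This is not output from our PoC: `process_packet`, the buffer size, and the bug itself are all invented for illustration. The point is only that, once the code is real C, a path-sensitive analyzer has something it can parse and reason about, and could in principle report that the read of `buf[0]` happens after `free` on the path where `n > 16`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function, invented for illustration only.
 * It deliberately contains a use-after-free on one path. */
int process_packet(const char *src, size_t n) {
    assert(src != NULL);

    char *buf = calloc(16, 1);      /* zero-initialised 16-byte buffer */
    if (buf == NULL)
        return -1;

    if (n > 16) {
        free(buf);                  /* error path releases the buffer ... */
    } else {
        memcpy(buf, src, n);
    }

    int first = buf[0];             /* ... but this read runs on BOTH paths:
                                       use-after-free whenever n > 16 */

    if (n <= 16)
        free(buf);                  /* normal path releases it here */
    return first;
}
```

Calling `process_packet("A", 1)` stays on the safe path and returns the first byte; any call with `n > 16` reaches the dangling read. That triggering condition is exactly the kind of information I'd want a report to spell out.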

And if that's a direction you want to go, it suddenly becomes very important that the language you decompile to has a large pool of powerful, robust, and battle-tested static analysis tools. That's definitely true for C, not so much for a custom language you roll on your own. Which is not to say your custom language isn't good, but AFAIU from your message you are designing it basically to be able to read LLVM IR better yourself without going crazy. So it seems to me to be something designed for your own eyes and mind, not for mass consumption by other analysis tools. And even if it turns out to be good for consumption by other tools, it's hard to beat the amount of engineering effort that has been put into static analysis tools for C, which are already available off the shelf.

So, all in all, I totally agree with you on the appeal of a recompilable target language. On that language being C or not, I really think it depends on what you're trying to do. If you're trying to improve human understanding of the code, in the right conditions, I can see your point. If the decompiled code is just a starting point for other tools, I still think nothing beats C (yet?).

> Ghidra's type system lacks function pointer types

Wow! I think this is really crippling, even without considering C++. I can think of many C codebases where people just do "C-with-classes" with a bunch of structs with function pointer fields.
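For anyone who hasn't run into the pattern, here's a minimal sketch of what I mean by "C-with-classes". All the names (`shape_t`, `make_rect`, `total_area`, ...) are invented for this example and don't come from any real codebase. The `area` field is a function pointer, and every call through it is an indirect call. A type system with no function pointer types can't describe that field as anything richer than an untyped pointer, so the argument and return types of the call are lost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "class" with one "virtual method"; names are illustrative. */
typedef struct shape shape_t;
struct shape {
    int (*area)(const shape_t *self);   /* function pointer field */
    int w;
    int h;
};

static int rect_area(const shape_t *self)     { return self->w * self->h; }
static int triangle_area(const shape_t *self) { return self->w * self->h / 2; }

/* "Constructors" that bind the right implementation to each object. */
shape_t make_rect(int w, int h) {
    shape_t s = { rect_area, w, h };
    return s;
}

shape_t make_triangle(int w, int h) {
    shape_t s = { triangle_area, w, h };
    return s;
}

/* Dispatches through the function pointer, like a virtual call. */
int total_area(const shape_t *shapes, size_t n) {
    assert(shapes != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += shapes[i].area(&shapes[i]);   /* indirect call */
    return sum;
}
```

With an array holding `make_rect(3, 4)` and `make_triangle(3, 4)`, `total_area` picks the right implementation for each element through the pointer.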

> the C type system is too powerful for decompilers to robustly lift to, and the resulting code is generally at best filled with distractions of wait-I-can-fix-this excessive casting and at worst just wrong.

> I've just seen far too many times where Ghidra starts with wrong types for something and the result becomes gibberish--even just plain dropping stuff altogether.

Besides the lack of function pointers (I can't stress enough how crippling I think that is), I'd be really interested in knowing more about the specifics of your complaints about plain-wrong type recovery. I second the invite to join our Discord server!

