RVT Decompiler

A new macOS aarch64 native decompiler


RVT is a command-line decompiler for AArch64 machine code contained in Mach-O binaries (macOS).

Write me an e-mail to become an alpha tester.

Features

RVT Decompiler has the following features:

Most of the ideas and foundations are in place; however, a large rewrite is currently in progress to pay off some technical debt.

Frequently Asked Questions

Why write a new decompiler?

The main idea of this project was to write a static analysis tool that could apply some notions from the program analysis course. I have always liked being able to say of any program "ah, maybe I could do it better", and after countless problems with Hopper Disassembler I decided to write a decompiler. Some projects out there are written mainly so you can learn: you figure out how certain features can be implemented and later improve on your ideas. The challenges posed by developing a decompiler are really tough, but they let you put into practice many theories that would otherwise remain on paper (e.g. abstract interpretation, fixed-point analysis, widening operators).
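To give a concrete idea of what abstract interpretation, fixed-point iteration and widening look like in practice, here is a minimal sketch in Rust. It is not taken from RVT: the `Interval` domain, the `analyze_loop` function and the toy loop `i = 0; while (i < n) i = i + 1;` are illustrative assumptions. The ascending iteration uses widening to jump the upper bound of `i` to +infinity (represented as `i64::MAX`) so that it terminates, and a single narrowing step then recovers a tighter bound.

```rust
// Sketch: interval abstract interpretation of `i = 0; while (i < n) { i = i + 1; }`
// with widening at the loop head. Not RVT code; all names are illustrative.

/// Interval abstract domain. i64::MIN / i64::MAX stand in for -inf / +inf.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Interval {
    Bottom,          // unreachable / no value yet
    Range(i64, i64), // lo..=hi
}

use Interval::{Bottom, Range};

/// Adds k to a bound, leaving the infinities untouched.
fn add_bound(x: i64, k: i64) -> i64 {
    if x == i64::MIN || x == i64::MAX { x } else { x.saturating_add(k) }
}

impl Interval {
    /// Least upper bound (join) of two intervals.
    fn join(self, other: Interval) -> Interval {
        match (self, other) {
            (Bottom, x) | (x, Bottom) => x,
            (Range(a, b), Range(c, d)) => Range(a.min(c), b.max(d)),
        }
    }

    /// Widening: a bound that is still growing jumps to infinity,
    /// so the ascending iteration stabilizes after finitely many steps.
    fn widen(self, next: Interval) -> Interval {
        match (self, next) {
            (Bottom, x) | (x, Bottom) => x,
            (Range(a, b), Range(c, d)) => Range(
                if c < a { i64::MIN } else { a },
                if d > b { i64::MAX } else { b },
            ),
        }
    }

    /// Abstract transfer function for `x + k`.
    fn add_const(self, k: i64) -> Interval {
        match self {
            Bottom => Bottom,
            Range(a, b) => Range(add_bound(a, k), add_bound(b, k)),
        }
    }

    /// Keeps only the values satisfying `x < n` (true branch of the guard).
    fn assume_lt(self, n: i64) -> Interval {
        match self {
            Range(a, b) if a <= b.min(n - 1) => Range(a, b.min(n - 1)),
            _ => Bottom,
        }
    }

    /// Keeps only the values satisfying `x >= n` (false branch of the guard).
    fn assume_ge(self, n: i64) -> Interval {
        match self {
            Range(a, b) if a.max(n) <= b => Range(a.max(n), b),
            _ => Bottom,
        }
    }
}

/// Returns (loop-head invariant after widening, after one narrowing step, value at exit).
fn analyze_loop(n: i64) -> (Interval, Interval, Interval) {
    let entry = Range(0, 0); // i = 0
    // One pass around the loop: guard, body (i + 1), then join with the entry edge.
    let step = |head: Interval| entry.join(head.assume_lt(n).add_const(1));

    // Ascending iteration with widening until nothing changes.
    let mut head = Bottom;
    loop {
        let next = head.widen(step(head));
        if next == head {
            break;
        }
        head = next;
    }
    let widened = head;

    // One descending (narrowing) step recovers precision lost to widening.
    let narrowed = step(widened);
    (widened, narrowed, narrowed.assume_ge(n))
}

fn main() {
    let (widened, narrowed, exit) = analyze_loop(100);
    println!("loop head after widening:  {:?}", widened);
    println!("loop head after narrowing: {:?}", narrowed);
    println!("i at loop exit:            {:?}", exit);
}
```

For n = 100 the loop head stabilizes at [0, +inf] after widening, the narrowing step tightens it to [0, 100], and at the loop exit `i` is exactly 100. A decompiler applies the same idea to registers and stack slots over a whole control-flow graph rather than to one hand-written loop.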

We already have IDA, Ghidra and Binary Ninja...

Yes, there are much more mature decompilers than my "proof of concept" (I would call it more of a weekend project), but that doesn't mean you have to abandon the idea. Only one of the three decompilers mentioned is open source (Ghidra), and while I respect the decision to keep a decompiler closed source, I struggled to find a good small project that could illustrate how to build static analysis software.

The one mature open-source project, which I value a lot, is the Ghidra decompiler. Written in C++, it is a module within the framework of the same name, and while it can be built and integrated independently, it requires some data structures that are internal to Ghidra and are very verbose at the type level. Designing a decompiler focused vertically on a single architecture lets the work concentrate on dataflow analysis, type inference, and stack analysis within a single program, without inheriting all the hacks needed for other architectures (MIPS, x86, PPC).
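As an example of the kind of architecture-specific stack analysis this focus allows, the sketch below tracks the AArch64 stack pointer through a single basic block and derives the frame size from a typical prologue/epilogue pair. The reduced `Insn` model and the `track_sp` function are hypothetical and not part of RVT; a real analysis would also propagate offsets across the control-flow graph and handle cases such as frame-pointer-relative addressing and dynamic stack allocation.

```rust
// Sketch of stack-pointer tracking over one AArch64 basic block.
// `Insn` is a hypothetical, heavily reduced instruction model, not RVT's.

#[derive(Debug)]
enum Insn {
    /// `stp x29, x30, [sp, #imm]!`: pre-indexed, sp += imm before the store.
    StpPre(i64),
    /// `ldp x29, x30, [sp], #imm`: post-indexed, sp += imm after the load.
    LdpPost(i64),
    /// `sub sp, sp, #imm`
    SubSp(i64),
    /// `add sp, sp, #imm`
    AddSp(i64),
    /// Anything that leaves sp unchanged (str, mov, bl, ret, ...).
    Other,
}

/// Returns the sp offset (relative to function entry) before each instruction,
/// the offset after the block, and the deepest offset reached (the frame size).
fn track_sp(block: &[Insn]) -> (Vec<i64>, i64, i64) {
    let mut sp = 0i64;
    let mut deepest = 0i64;
    let mut before = Vec::with_capacity(block.len());
    for insn in block {
        before.push(sp);
        sp += match insn {
            Insn::StpPre(imm) | Insn::LdpPost(imm) => *imm,
            Insn::SubSp(imm) => -*imm,
            Insn::AddSp(imm) => *imm,
            Insn::Other => 0,
        };
        deepest = deepest.min(sp);
    }
    (before, sp, -deepest)
}

fn main() {
    // stp x29, x30, [sp, #-16]! ; sub sp, sp, #32 ; str w0, [sp, #12]
    // add sp, sp, #32 ; ldp x29, x30, [sp], #16 ; ret
    let block = [
        Insn::StpPre(-16),
        Insn::SubSp(32),
        Insn::Other,
        Insn::AddSp(32),
        Insn::LdpPost(16),
        Insn::Other,
    ];
    let (before, after, frame) = track_sp(&block);
    for (insn, off) in block.iter().zip(&before) {
        println!("sp{:+}\t{:?}", off, insn);
    }
    println!("sp after block: {after}, frame size: {frame} bytes");
}
```

For the example block the offsets before each instruction are 0, -16, -48, -48, -16 and 0, the stack is balanced at the end of the block, and the frame size is 48 bytes.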

Why another closed source decompiler?

The decompiler source code is not available yet, because some work needs to be done by the lead developer (aka me) to ensure that all the code sticks to certain standards. I'm currently abusing some idioms that might not be the best in the long run; if you know Rust well, please send me an e-mail. Once I've released the first versions and I'm tired of improving the software, I'll release it on GitHub. The current milestone is June 2024.

Why write it in Rust and not XYZ?

The main decompilers (Binary Ninja, Ghidra, IDA) are currently written in C++: most of the time, this turns out to be one of the main causes of poor memory management and poor analysis results, with consequences ranging from deadlocks to exponential memory consumption. Rust's ownership model lets the compiler enforce memory safety, so the developer neither spends valuable time managing memory manually (C/C++) nor worries about the monstrous overhead of a garbage-collected runtime (Go/Java).

In projects like this, efficient and intelligent management of the objects involved can make the difference between a good analysis result and an unsound or incorrect one. With a heavily obfuscated program, for instance, the difference is substantial.

Another reason I chose Rust is the way the language handles the "translation" aspect: with pattern matching through the match construct, I can destructure an expression and pick out exactly the parts I need. This has led to robust implementations of components such as optimizers and lifters.
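The following minimal sketch illustrates the idea with hypothetical types (`Insn`, `Expr`, `lift`, `simplify`), not RVT's actual IR: a lifter that maps a few AArch64 instruction forms to expressions with one `match` arm each, and a simplifier that matches on the shape of already-simplified subexpressions to apply constant folding and identities such as x + 0 = x and x - x = 0.

```rust
// Sketch: a tiny lifter and peephole simplifier built on `match`.
// Insn and Expr are hypothetical types for illustration, not RVT's IR.

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Reg(u8), // general-purpose register xN
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Xor(Box<Expr>, Box<Expr>),
}

/// A small subset of AArch64 instructions (destination first, as in assembly).
enum Insn {
    AddImm(u8, u8, i64), // add xd, xn, #imm
    SubReg(u8, u8, u8),  // sub xd, xn, xm
    EorReg(u8, u8, u8),  // eor xd, xn, xm
}

/// Lifting: one match arm per instruction form, producing `dst = expr`.
fn lift(insn: &Insn) -> (u8, Expr) {
    use Expr::*;
    match *insn {
        Insn::AddImm(d, n, imm) => (d, Add(Box::new(Reg(n)), Box::new(Const(imm)))),
        Insn::SubReg(d, n, m) => (d, Sub(Box::new(Reg(n)), Box::new(Reg(m)))),
        Insn::EorReg(d, n, m) => (d, Xor(Box::new(Reg(n)), Box::new(Reg(m)))),
    }
}

/// Simplification: simplify the children first, then match on the shape of the result.
fn simplify(e: Expr) -> Expr {
    use Expr::*;
    match e {
        Add(a, b) => match (simplify(*a), simplify(*b)) {
            (Const(x), Const(y)) => Const(x.wrapping_add(y)), // constant folding
            (x, Const(0)) | (Const(0), x) => x,               // x + 0 => x
            (x, y) => Add(Box::new(x), Box::new(y)),
        },
        Sub(a, b) => match (simplify(*a), simplify(*b)) {
            (Const(x), Const(y)) => Const(x.wrapping_sub(y)),
            (x, y) if x == y => Const(0), // x - x => 0
            (x, Const(0)) => x,           // x - 0 => x
            (x, y) => Sub(Box::new(x), Box::new(y)),
        },
        Xor(a, b) => match (simplify(*a), simplify(*b)) {
            (Const(x), Const(y)) => Const(x ^ y),
            (x, y) if x == y => Const(0), // x ^ x => 0
            (x, y) => Xor(Box::new(x), Box::new(y)),
        },
        leaf => leaf, // registers and constants are already simple
    }
}

fn main() {
    let prog = [Insn::AddImm(0, 1, 0), Insn::SubReg(2, 3, 3), Insn::EorReg(4, 5, 6)];
    for insn in &prog {
        let (dst, rhs) = lift(insn);
        println!("x{dst} = {:?}", simplify(rhs));
    }
}
```

Lifting `add x0, x1, #0` and simplifying yields `Reg(1)`, and `sub x2, x3, x3` yields `Const(0)`. Because `lift` has no catch-all arm, the compiler rejects it as non-exhaustive as soon as a new instruction variant is added without a matching arm, which is part of what makes this style robust.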

The last reason I chose Rust is its ecosystem, which is mature enough to support technically demanding projects like this one; compared to other programming languages, development has been easy. Also, releasing an open-source project written in Rust gives an additional boost to the development of analysis tools in this language.

How do I use RVT decompiler to decompile an aarch64 binary for macOS?

The general process is as follows:

  1. Download the decompiler and move it to a folder.
  2. Run the decompiler from Terminal, passing the path of the AArch64 binary you want to decompile as the first argument.
  3. The decompiler will then attempt to reconstruct the original source code from the binary. RVT prints some helpful information if you run it with the environment variable RUST_LOG=info set.
  4. Once the decompiler has finished, you can view the reconstructed source code in the HTML file it generates under the "results" folder.

The decompiler does not translate this code well

The decompiler still has major problems, and many features are missing before it can be considered complete. Nevertheless, I would be happy if some users tried it for their analyses and checked whether the output is indeed correct. I will be grateful to anyone who opens issues in the issue tracker to request features, suggest improvements, or share solutions to problems.

I will not have much time to maintain this project, but if there is enough demand I can devote a few hours a week to making it more robust and a viable replacement for a decompiler like HopperApp.

I want to use the product commercially

The decompiler developed here is very far from being a commercial product. However, I have obtained some results that are very similar to those of more mature decompilers.

I appreciate the question, though, and urge you to contact me if you would like to explore integrating some of the static analysis within your software products (or if you would simply like to understand how to turn my decompiler into $$ - just kidding).

Why is it called RVT Decompiler?

RVT stands for "ReVerse Transcriptase", a particular enzyme that reverses the usual flow of genetic information. Normally, in biology, information flows from DNA to RNA, which is then used to synthesize proteins and other enzymes. With this enzyme, found especially in so-called retroviruses, the information flows backward: from RNA to DNA.

Transpile the metaphor into computer science: DNA (source code) is used to create strands of RNA (compiled code), which in turn are used to synthesize proteins. A relevant website in the field once said:

Reverse transcriptase performs a remarkable feat, reversing the normal flow of genetic information, but it is rather sloppy in its job. The polymerases used to make DNA and RNA in cells are very accurate and make very few mistakes. Reverse transcriptase, on the other hand, makes lots of mistakes, up to about one in every 2,000 bases that it copies.

Is there a better definition of a decompiler? Something that reverses the normal flow of information and can make mistakes. Yes, I know it may seem a little confusing, but if you have a better name, please send me an e-mail.

The word "decompiler" was a must, since most reverse engineering programs are named "disassembler" (IDA disassembler, Hopper Disassembler) or carry no word at all describing what the program actually does (Ghidra? What's that? A fish?).


Seekbytes – e-mail