Software Reverse Engineering and Mechanistic Interpretability

I had an interesting talk with Neel Nanda, a well-known mechanistic interpretability researcher who has worked at Anthropic and independently, and will soon join DeepMind. In our chat he interviewed me on the parallels between software reverse engineering and neural network reverse engineering. You may read it here.

In our talk I mentioned some of my practical tools for research in general, built around managing three modes of operation:

  1. Diverging - using mindmaps.
  2. Converging - focusing on a specific question.
  3. Meta - predicting and analyzing the research as it goes.

I emphasized the importance of reverse engineering (RE) as little as possible, trying to reach an understanding fast by gaining an intuition for the inner parts of the software. The RE process is hypothesis-driven and fuzzy. It is done by constantly updating a mental model of how the developer's high-level point of view maps to their implementation and design. I look for known motifs as hooks to get directly to the code of interest, and leverage bugs and exceptional behavior as guidance or a shortcut to the interesting part of the code.
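As a small illustration of using known motifs as hooks, the sketch below scans a binary blob for a few well-known 32-bit constants that tend to sit next to recognizable algorithms. The function name `find_motifs`, the `KNOWN_MOTIFS` table, and the choice of constants are assumptions made for this example, not a tool mentioned in the talk. It also assumes the constants are stored little-endian.

```python
import struct

# Well-known 32-bit constants; an illustrative selection, not an exhaustive one.
KNOWN_MOTIFS = {
    "CRC-32 polynomial (reflected)": 0xEDB88320,
    "MD5 / SHA-1 initial word": 0x67452301,
    "SHA-256 initial word H0": 0x6A09E667,
}


def find_motifs(blob: bytes) -> list:
    """Return sorted (offset, label) pairs for each little-endian hit in blob."""
    hits = []
    for label, value in KNOWN_MOTIFS.items():
        needle = struct.pack("<I", value)  # assumption: little-endian storage
        start = blob.find(needle)
        while start != -1:
            hits.append((start, label))
            start = blob.find(needle, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical blob: padding bytes followed by the CRC-32 polynomial.
    sample = b"\x90" * 16 + struct.pack("<I", 0xEDB88320) + b"\x00" * 8
    for offset, label in find_motifs(sample):
        print(f"{offset:#06x}: {label}")
```

A hit only suggests a hypothesis, for example "a checksum is probably computed near this offset", which then has to be confirmed by reading the surrounding code.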

Written on December 26, 2022