No, don’t get too excited, I’m not going to publish a post regarding some stripper. This post is about analyzing an executable file whose source is NOT open. More lucidly? Well, let us assume that there is a malicious program. Often, programmers “strip” their executables (also called binaries in *nix lingo), removing the symbol table and debugging information that would otherwise map the machine code back to function and variable names. Without that information, regular debuggers and analyzers offer little help in understanding, or “reversing”, such stripped binaries.
More lucidly? I know, right? It’s just that it gets more esoteric as it gets more lucid 🙂 *innocent smile*. Well, nonetheless, let’s dive into deeper oceans now. You see, when we write code and compile it (assuming compiler == gcc), we get an executable in a specific format. In *nix, this format is ELF (Executable and Linkable Format). The compiler “inserts” many symbols and stuff such that… yada yada. I’ll cut the stupid intro part and jump directly to the actual thing (I’m having a big huge super-massive headache). The reader, if novice, is advised to read the elf(3) documentation before even trying to read this post.
I’ll assume that the reader is a bit-more-than-novice Linux system programmer with a decent knowledge of C. This post will first explain, theoretically, how we can analyze a stripped binary/executable, and second, I give you my analyzer `xbug’, which can be used as a handy tool for reverse engineering. Since we are in the world of Linux, we assume the usual: compiler == gcc, architecture == i386 and the ELF executable format.
ELFs are really mysterious
ELF stands for Executable and Linkable Format. It is a format for use by compilers, linkers, loaders and other tools that manipulate object code. The ELF specification was released to the public in 1990 as an “open standard” by a group of vendors. Thanks to its ready availability, it has been widely adopted by industry and the open-source community. The ELF standard supports 32- and 64-bit architectures of both big- and little-endian kinds, and supports features like cross-compilation and dynamic shared libraries. ELF also supports the special compilation needs of the C++ language. Among the current set of open-source operating systems, FreeBSD switched to using ELF as its object format in FreeBSD 3.0 (October 1998).
The libelf library provides an API set (ELF(3) and GELF(3)) for application writers to read and write ELF objects with. The library eases the task of writing cross-tools that can run on one machine architecture and manipulate ELF objects for another.
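An ELF file has a “dual nature”: the linking view treats the file as a set of sections described by the section header table, while the execution view treats it as a set of segments described by the program header table. The figure below shows what this “dual nature” means: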
As reverse engineers we are more interested in the execution view. I won’t be didactic about what an ELF header, a program header or a section header is; you can download a well-written tutorial on the LibELF API for a better understanding. This post would become a whole new spawn of my blog if I started from scratch.
When we strip a binary, the symbol table (.symtab) and the debugging information are removed, and since the section header table is optional for executables, more aggressive strippers obliterate it as well. Without symbols (or even sections), a generic debugger such as gdb has no names to set breakpoints on, so it is of little use. We need to reverse engineer the binary and manually reconstruct the useful pieces that enable us to analyze it.
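To see which of the two views survived, the ELF header alone is enough: e_phoff/e_phnum locate the program header table (execution view) and e_shoff/e_shnum locate the section header table (linking view). Note that GNU strip, by default, removes .symtab and the debugging sections but keeps the section header table. Below is a minimal sketch using the glibc <elf.h> definitions, assuming a 32-bit image already loaded into memory; the function names are hypothetical helpers for illustration, not xbug’s code.

```c
#include <elf.h>
#include <string.h>

/* Returns 1 if the buffer starts with a 32-bit ELF identification. */
int is_elf32(const unsigned char *img, size_t size)
{
    return size >= sizeof(Elf32_Ehdr)
        && memcmp(img, ELFMAG, SELFMAG) == 0
        && img[EI_CLASS] == ELFCLASS32;
}

/* Linking view: the section header table, optional in an executable. */
int has_section_headers(const Elf32_Ehdr *eh)
{
    return eh->e_shoff != 0 && eh->e_shnum != 0;
}

/* Execution view: the program header table, required to run the file. */
int has_program_headers(const Elf32_Ehdr *eh)
{
    return eh->e_phoff != 0 && eh->e_phnum != 0;
}
```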
The figure above clearly shows that, from the execution perspective, we have segments. Each segment has a type, given by the p_type member of its program header. We are interested in the dynamic segment (PT_DYNAMIC), which covers the .dynamic section that the toolchain creates for dynamically linked executables. This section/segment holds a wealth of information that we can use. My `xbug’ uses it to partially rebuild the symbol table, which a debugger can then use to control the executable at run-time (xbug’s next version will also feature a debugger; I couldn’t code it right now… handling three projects at the same time becomes a headache).
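Finding the dynamic segment is a linear scan over the program header table. Here is a minimal sketch, assuming the whole file has been read into a buffer and that its byte order matches the host (i386 on i386, as assumed above); `find_dynamic` is a hypothetical name. memcpy is used instead of pointer casts so that an unaligned buffer causes no trouble.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Scans the program header table for the PT_DYNAMIC segment.
 * On success copies its header into *out and returns 0; returns -1 if
 * the image is truncated or has no dynamic segment (a static binary). */
int find_dynamic(const unsigned char *img, size_t size, Elf32_Phdr *out)
{
    Elf32_Ehdr eh;
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);          /* avoid unaligned access */

    for (Elf32_Half i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + (size_t)i * eh.e_phentsize;
        if (off + sizeof(Elf32_Phdr) > size)
            return -1;                    /* header runs past the buffer */
        Elf32_Phdr ph;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type == PT_DYNAMIC) {
            *out = ph;
            return 0;
        }
    }
    return -1;
}
```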
Each dynamic entry is an Elf32_Dyn structure whose d_tag member says what kind of entry it is. For example, if dyn->d_tag == DT_REL, then dyn->d_un.d_ptr holds the address of a relocation table; in a gcc-built binary this is typically the .rel.dyn section. Many bits and pieces of the section header table can be pieced back together this way: the locations of sections such as .hash (DT_HASH), .dynstr (DT_STRTAB), .dynsym (DT_SYMTAB), .rel.dyn (DT_REL) and .rel.plt (DT_JMPREL) can all be found through the DYNAMIC segment, while .interp has its own PT_INTERP program header.
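Walking the dynamic entries is simple: the array ends at a DT_NULL entry, and its length on disk is the dynamic segment’s p_filesz divided by sizeof(Elf32_Dyn). A sketch with a hypothetical `dyn_find` helper, assuming the entries have already been copied into an aligned array:

```c
#include <elf.h>
#include <stddef.h>

/* Returns the first entry whose d_tag matches `tag`, or NULL if the tag
 * is absent. Stops at DT_NULL or after `count` entries, whichever is first.
 * Address-like tags (DT_STRTAB, DT_SYMTAB, DT_HASH, DT_REL, DT_JMPREL) are
 * read through d_un.d_ptr; sizes (DT_STRSZ, DT_RELSZ) through d_un.d_val. */
const Elf32_Dyn *dyn_find(const Elf32_Dyn *dyn, size_t count, Elf32_Sword tag)
{
    for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++)
        if (dyn[i].d_tag == tag)
            return &dyn[i];
    return NULL;
}
```

Anything after the DT_NULL terminator is ignored, which is why a DT_SYMTAB placed after it is not found in the checks below.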
So our first step is to parse the program headers to find the dynamic segment. Once we have the offset of the dynamic segment, we search it for the string table entry (DT_STRTAB). String table sections hold null-terminated character sequences, commonly called strings; the object file uses these strings to represent symbol and section names. Note that DT_STRTAB, like the other pointer-type entries, holds a virtual address, which we translate into a file offset using the PT_LOAD program headers.
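Because DT_STRTAB (and DT_SYMTAB, DT_HASH and friends) are virtual addresses, we need the PT_LOAD segment that maps each one to know where the table sits in the file. A sketch of that translation under the same assumptions (32-bit image, program headers already copied into an array, hypothetical helper name):

```c
#include <elf.h>
#include <stddef.h>

/* Maps a virtual address to a file offset:
 *   offset = vaddr - p_vaddr + p_offset
 * using the PT_LOAD segment whose file-backed part contains vaddr.
 * Returns -1 if no segment covers it (e.g. the address lies in .bss,
 * past p_filesz, so it has no bytes in the file). */
long vaddr_to_offset(const Elf32_Phdr *ph, size_t phnum, Elf32_Addr vaddr)
{
    for (size_t i = 0; i < phnum; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        if (vaddr >= ph[i].p_vaddr && vaddr - ph[i].p_vaddr < ph[i].p_filesz)
            return (long)(vaddr - ph[i].p_vaddr + ph[i].p_offset);
    }
    return -1;
}
```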
Now that we have our string table, we search the dynamic segment for the symbol table (DT_SYMTAB). Next, we get the number of symbols by reading the nchain value from the hash table, which we find through the DT_HASH entry of the dynamic segment; the size of the string table itself is given by DT_STRSZ.
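Reading nchain is just a matter of taking the second 32-bit word of the hash table, whose layout is nbucket, nchain, bucket[nbucket], chain[nchain]. A sketch with a hypothetical `hash_nchain` helper, assuming `hash` already points at the file offset obtained from DT_HASH and that the file’s byte order matches the host. One caveat: binaries linked with --hash-style=gnu may carry only a DT_GNU_HASH table, which has a different layout and no nchain field, so this trick needs a DT_HASH entry to be present.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* SysV hash table header: Elf32_Word nbucket; Elf32_Word nchain;
 * nchain equals the number of entries in the dynamic symbol table.
 * Returns 0 if fewer than two words are available. */
Elf32_Word hash_nchain(const unsigned char *hash, size_t avail)
{
    Elf32_Word words[2];
    if (avail < sizeof words)
        return 0;
    memcpy(words, hash, sizeof words);   /* words[0] = nbucket */
    return words[1];                     /* words[1] = nchain  */
}
```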
Next, we loop through the symbol table, reading one Elf32_Sym structure at a time. For each entry, its st_name value is a byte offset into our string table; the string at that offset is the name of the symbol we are parsing.
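Putting it together, here is a sketch of the symbol loop. It assumes `syms` points at the translated DT_SYMTAB table, `nsyms` came from nchain, and `strtab`/`strsz` came from DT_STRTAB and DT_STRSZ; the helper names are hypothetical. Entry 0 of a symbol table is a reserved null symbol, so the loop starts at 1, and every st_name is bounds-checked before use since we are reading untrusted input.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns the symbol's name, or NULL if st_name points outside the string
 * table or the name is not NUL-terminated inside it. */
const char *sym_name(const Elf32_Sym *sym, const char *strtab, size_t strsz)
{
    if (sym->st_name >= strsz)
        return NULL;
    if (memchr(strtab + sym->st_name, '\0', strsz - sym->st_name) == NULL)
        return NULL;
    return strtab + sym->st_name;
}

/* Prints value, type and name of each symbol with a valid name.
 * Returns the number of symbols printed. */
size_t dump_symbols(const Elf32_Sym *syms, Elf32_Word nsyms,
                    const char *strtab, size_t strsz)
{
    size_t printed = 0;
    for (Elf32_Word i = 1; i < nsyms; i++) {
        const char *name = sym_name(&syms[i], strtab, strsz);
        if (name == NULL)
            continue;
        printf("%08x %s %s\n", (unsigned)syms[i].st_value,
               ELF32_ST_TYPE(syms[i].st_info) == STT_FUNC ? "FUNC" : "    ",
               name);
        printed++;
    }
    return printed;
}
```

Imported functions such as printf typically show up with st_value 0, since their addresses are only known after the dynamic linker resolves them.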
Reverse Engineering (?)
Ok, so now we have our partially built symbol table. Now we can use our own debugger(s) and, for example, insert breakpoints at the addresses found in the symbol table. Want more? We can even disassemble the code and see the actual instructions that the binary executes; in other words, we can read the machine-level logic of closed-source software without ever seeing its source code! An astute analyst can use this information in bad (I mean, good) ways and reverse engineer the executable.
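For the breakpoint part, the classic i386 trick is to overwrite the first byte of the target instruction with INT3 (opcode 0xCC). Below is a sketch of just the byte patching, with hypothetical helper names and the little-endian layout of i386 assumed; the surrounding ptrace(2) calls (PTRACE_PEEKTEXT to read the original word, PTRACE_POKETEXT to write the patched one) are only described in comments, not shown.

```c
#include <stdint.h>

/* Given the word read at the breakpoint address (e.g. via PTRACE_PEEKTEXT),
 * returns the word with its first byte replaced by INT3. On little-endian
 * i386 the byte at the address is the least-significant byte of the word. */
uint32_t insert_int3(uint32_t word)
{
    return (word & ~UINT32_C(0xff)) | UINT32_C(0xcc);
}

/* Puts the saved original first byte back (to be written with
 * PTRACE_POKETEXT once the breakpoint has been hit). */
uint32_t remove_int3(uint32_t patched, uint32_t original)
{
    return (patched & ~UINT32_C(0xff)) | (original & UINT32_C(0xff));
}
```

When the trap fires, the tracee’s EIP points one byte past the breakpoint, so a debugger has to restore the original byte and move EIP back by one before resuming.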
My `xbug’ follows the above algorithm and, as of now, constructs a partial symbol table. The next version (and I’ll upload only the new version, once it is done) will feature a basic debugger as well. It can be a handy tool for analyzing closed-source software.
Jeez, this was probably the hardest post I have ever written, and it’s so dry… nonetheless, I HOPE you enjoyed reading it. 🙂
May the force be with you.
PS: I’m just publishing this post now, I’ll upload my program soon (soon == a day or two maybe).