How To Verify Stolen Code
By Tim Daycounter
So a key employee has just left your company, and within a couple of weeks one of your main competitors has come out with a product that mimics all of your product's functionality. The question is: how can you know whether your employee has stolen your company's code base and given it to a competitor? This article attempts to give you some insight into what you can do to figure this out.
The easiest way to see if your code has been stolen is to get access to your competitor's source code and then do a quick visual comparison. You would likely need more proof than mere suspicion to get a subpoena for this code, however. Your competitor would also most likely want to stipulate that it be reviewed by a third party, and not by anyone related to your company, to protect their own trade secrets.
Thus, the only potential proof you have is their current executable file. If the file is very similar to yours, then you may have a case. So the question is: how do you compare the two executable files?
First, some background. A program is usually written in a high-level language such as C. When it is compiled, the compiler converts the source code into assembly language. An assembler converts the assembly language into object code, which is machine language (ones and zeros) in which the function calls are not yet connected to one another. A linker then takes this object code, links the function calls together, and gives you an executable file. On Windows, this is a .exe file in the Portable Executable (PE) format. High-level code is much more compact than assembly language. One line of C or C++ code can translate into 10 or more lines of assembly. So if you have a program in C that is 20 pages long, its corresponding assembly language could be 200 pages of code.
A disassembler takes machine code, such as an .exe, and converts the machine language back into assembly language, but not into the original source code.
This morning I downloaded win32Dasm (you can search for it on google.com), a common Windows disassembler, and looked at a simple program that I had written. The disassembler gave me about 250 pages of disassembled code, written in assembly language. The actual C code was probably only about 20 pages.
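Before comparing anything, it can help to confirm that a file you have pulled out really is a Windows PE executable. The following is a minimal Python sketch of that check, not a full parser. The function name looks_like_pe is my own invention. It relies only on the two signatures every PE file carries: "MZ" at the start of the file, and "PE" followed by two zero bytes at the offset stored at position 0x3C.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts like a Windows Portable Executable file."""
    # Every PE file begins with a small DOS header whose first two bytes are "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 4-byte little-endian field at offset 0x3C holds the offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header itself starts with the signature "PE\0\0".
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example looks_like_pe(open(path, "rb").read()), where path is whichever file you want to check.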
Error messages and other string constants are often stored in the program and are easily retrieved using a disassembler, so they can be used as a fingerprint for the program. I assume that your former employee wouldn't have changed any of the string constants. So it would be worthwhile, as a first approach, to look at each executable file's string constants using the disassembler. If the string constants are totally different, then chances are he didn't outright steal your code. It also means that the code is probably completely different, and that it will be difficult to make a case.
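If you don't have a disassembler handy, the string constants can also be pulled out with a few lines of script. Here is a rough Python sketch, assuming that a "string" means a run of at least four printable ASCII characters, stored either one byte per character or as UTF-16 (two bytes per character, which Windows programs often use). The function name and the four-character minimum are my own choices, and the approach will pick up some junk that merely looks like text.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> set:
    """Return the printable strings of at least min_len characters found in data."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE: each one followed by a zero byte.
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = {m.group().decode("ascii") for m in ascii_pat.finditer(data)}
    found |= {m.group().decode("utf-16-le") for m in wide_pat.finditer(data)}
    return found
```

Run it once on your executable and once on the competitor's, reading each file in binary mode, and you have two sets of strings to compare.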
If it turns out that the string constants are similar, then you can take it to the next phase. You have two options:
The first is to have a programmer look at the disassembled assembly code and look for correlations. There are going to be hundreds of pages, so this is going to be time-consuming. It will also give you only a qualitative result. In other words, your programmer will say, "Yes, the two programs look quite similar."
The second option is to write a program that looks for correlations in the code. This will also take some time, but it will give a more quantitative result, such as "95% of the executable is identical."
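To give a feel for what such a program might look like, here is one simple possibility in Python. It is only a sketch of one technique, not the only way to do it: it chops each file into overlapping 8-byte chunks and reports what fraction of the chunks in the first file also appear somewhere in the second. The function names and the chunk size of 8 are assumptions of mine. Note that comparing raw bytes is sensitive to compiler settings and to addresses that shift when code is rebuilt, so a low score does not prove the code is different, and a sturdier version would compare the disassembled instructions instead.

```python
def ngrams(data: bytes, n: int = 8) -> set:
    """Return the set of all overlapping n-byte chunks in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def shared_fraction(ours: bytes, theirs: bytes, n: int = 8) -> float:
    """Fraction of the distinct n-byte chunks in 'ours' that also occur in 'theirs'."""
    our_grams = ngrams(ours, n)
    if not our_grams:
        return 0.0  # nothing to compare (input shorter than n bytes)
    return len(our_grams & ngrams(theirs, n)) / len(our_grams)
```

A result near 1.0 means nearly every chunk of your file turns up in theirs. Holding every chunk in memory is fine for a small program but may be heavy for a very large executable.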
For a first pass, I'd look at the string constants in the executable files. This can be done quickly, and it should give you enough information to know whether you want to move forward.
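Once you have the string constants from each executable as two sets, a quick way to put a number on "similar" is to divide the count of strings the two programs share by the count of distinct strings across both. This is a sketch of that idea in Python (the measure is usually called Jaccard similarity). The function name is hypothetical, and where to draw the line between "similar" and "different" is a judgment call I am not going to make for you.

```python
def string_overlap(ours: set, theirs: set) -> float:
    """Shared strings divided by all distinct strings (Jaccard similarity)."""
    union = ours | theirs
    if not union:
        return 0.0  # both sets empty, so there is nothing to compare
    return len(ours & theirs) / len(union)

def shared_strings(ours: set, theirs: set) -> list:
    """The strings found in both programs, sorted for easy reading."""
    return sorted(ours & theirs)
```

The list of shared strings is worth reading by eye as well. Many unrelated programs contain the same compiler and library messages, so a distinctive error message of your own that shows up in their program says far more than a high score built from common boilerplate.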
The only challenge is extracting the executable from the install program. You can extract the executable by running the installation program and then looking for the files that it installs. If the files are not installed into a common directory but are placed all over the machine, such as in the Windows system directory, then it might be hard to actually find the executable.
Finally, some advice: don't let on to the programmer that you are thinking of going after him. This will make him want to cover his tracks.
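One way to track down scattered files is to record what is on the disk before running the installer, record it again afterwards, and list the differences. Here is a minimal Python sketch of that idea. The function names are my own, and it assumes that a file's size and modification time are enough to tell whether it is new or changed.

```python
import os

def snapshot(root: str) -> dict:
    """Map every file path under root to its (size, modification time)."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            files[path] = (info.st_size, info.st_mtime)
    return files

def new_or_changed(before: dict, after: dict) -> list:
    """Paths that are in 'after' but missing from, or different in, 'before'."""
    return sorted(p for p, meta in after.items() if before.get(p) != meta)
```

Take one snapshot of the drive, or of the directories you suspect, run the installer, take a second snapshot, and pass both to new_or_changed. Any .exe or .dll files in the result are your candidates.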