Assembly-style self-modifying code:
The kinds of self-modifying code used in assembly language can serve various purposes:
The second and third types are probably the kinds most often used in high-level languages as well, such as LISP.
Pseudo-code example of type 1:
repeat N times {
  if STATE is 1
    increase A by one
  else
    decrease A by one
  do something with A
}
Self-modifying code in this case would simply be a matter of rewriting the loop like this:
repeat N times {
  increase A by one
  do something with A
}
when STATE has to switch {
  replace the opcode "increase" above with the opcode to decrease
}
Note that 2-state replacement of the opcode can be easily written as 'xor var at address with the value "opcodeOf(Inc) xor opcodeOf(dec)"'.
Whether this solution pays off depends, of course, on the value of 'N' and on how frequently the state changes.
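The opcode replacement described above can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the opcode values INC, DEC and USE and the functions run and switch_state are hypothetical and do not belong to any real instruction set. The interpreter's dispatch stands in for the processor's instruction decoding; the point is that the loop body itself contains no test of STATE.

```python
# Hypothetical one-byte opcodes for a toy machine (not a real instruction set).
INC, DEC, USE = 0x40, 0x48, 0x90


def run(program, a, n):
    """Execute the loop body `program` n times, starting with accumulator a."""
    seen = []
    for _ in range(n):
        for opcode in program:
            if opcode == INC:
                a += 1
            elif opcode == DEC:
                a -= 1
            elif opcode == USE:
                seen.append(a)  # stands in for "do something with A"
            else:
                raise ValueError(f"unknown opcode {opcode:#x}")
    return a, seen


def switch_state(program, address=0):
    """Patch the opcode in place: XOR turns INC into DEC and DEC back into INC."""
    program[address] ^= INC ^ DEC


program = bytearray([INC, USE])   # loop body: increase A, do something with A
a, _ = run(program, 0, 3)         # A counts up
switch_state(program)             # STATE switches: the INC byte becomes DEC
a, _ = run(program, a, 2)         # the same loop now counts down
```

Because the patch is a single XOR with a constant, the same call switches the state in both directions.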
Some claim that use of self-modifying code is not recommended when a viable alternative exists, because such code can be difficult to understand and maintain.
Others simply view self-modifying code as something one would do while editing code (in the above example, replacing a line or a keyword), only done at run time.
In some cases self-modifying code executes slower on modern processors. This is because a modern processor will usually try to keep blocks of code in its cache memory. Each time the program rewrites a part of itself, the rewritten part must be loaded into the cache again, which results in a slight delay.
The cache invalidation issue on modern processors usually means that self-modifying code is still faster only when the modification occurs rarely, such as in the case of state switching in an inner loop. This consideration is not unique to processors with a code cache, since on any processor rewriting the code never comes for free.
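The trade-off can be made concrete with a rough break-even estimate. The sketch below is a simplification under stated assumptions: the per-iteration branch cost and the per-modification patch cost (including any cache reload) are hypothetical placeholder numbers, not measurements of any processor.

```python
def branching_cost(n, branch_cost):
    """Total cost of testing STATE on each of the n loop iterations."""
    return n * branch_cost


def patching_cost(switches, patch_cost):
    """Total cost of rewriting the opcode once per state switch."""
    return switches * patch_cost


def patching_wins(n, switches, branch_cost=1, patch_cost=200):
    """True if patching is cheaper overall. Default costs are hypothetical units."""
    return patching_cost(switches, patch_cost) < branching_cost(n, branch_cost)


# Many iterations, few switches: patching is cheaper under these assumed costs.
rare = patching_wins(n=1_000_000, switches=10)
# Few iterations, many switches: the per-iteration test is cheaper.
frequent = patching_wins(n=1_000, switches=100)
```

With real hardware the two cost parameters would have to be measured; the sketch only shows why rare modification favours patching.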
Self-modifying code was used in the early days of computers in order to save memory space, which was limited. It was also used to implement subroutine calls and returns when the instruction set only provided simple branching or skipping instructions to vary the flow of control (this is still relevant in certain ultra-RISC architectures, at least theoretically, e.g. one such system has a sole branching instruction with three operands: subtract-and-branch-if-negative).
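A subroutine call built from plain jumps can be sketched as follows. This is a toy machine invented for illustration, not a model of any historical computer: the instruction names PATCH, JMP, PRINT and HALT and the memory layout are assumptions. PATCH stands in for an ordinary store into code memory; before each call, the caller overwrites the operand of the jump that ends the subroutine, so that the jump leads back to the caller.

```python
SUB = 7       # entry point of the subroutine (hypothetical layout)
SUB_RET = 8   # address of the JMP that ends the subroutine


def make_program():
    return [
        ("PATCH", (SUB_RET, 2)),         # 0: "call": set the return jump to 2
        ("JMP", SUB),                    # 1
        ("PRINT", "after first call"),   # 2
        ("PATCH", (SUB_RET, 5)),         # 3: second call, different return
        ("JMP", SUB),                    # 4
        ("PRINT", "after second call"),  # 5
        ("HALT", None),                  # 6
        ("PRINT", "in subroutine"),      # 7: subroutine body
        ("JMP", 0),                      # 8: placeholder, rewritten per call
    ]


def run(memory):
    """Run the toy machine; its only control flow is an unconditional JMP."""
    pc, trace = 0, []
    while True:
        op, arg = memory[pc]
        if op == "HALT":
            return trace
        if op == "PRINT":
            trace.append(arg)
            pc += 1
        elif op == "PATCH":
            addr, target = arg
            memory[addr] = ("JMP", target)  # the program rewrites its own code
            pc += 1
        elif op == "JMP":
            pc = arg
        else:
            raise ValueError(f"unknown instruction {op!r}")


trace = run(make_program())
```

The same subroutine returns to two different places because its final jump is rewritten before each call.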
Self-modifying code was used to hide copy protection instructions in 1980s MS-DOS based games. The floppy disk drive access instruction 'int 0x13' would not appear in the executable program's image but it would be written into the executable's memory image after the program started executing. Self-modifying code is also sometimes used by programs that do not want to reveal their presence -- such as computer viruses and some shellcodes. Modifying a piece of running code is also used in certain attacks, such as buffer overflows.
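The idea of keeping an instruction out of the stored image can be sketched as below. The two bytes CD 13 are the x86 encoding of 'int 0x13'; everything else is an assumption made for illustration: the XOR mask, the image layout and the function names are hypothetical, and the actual techniques used by such games are not described here.

```python
INT_13H = bytes([0xCD, 0x13])  # x86 encoding of `int 0x13`
MASK = 0x5A                    # hypothetical obfuscation key


def build_image():
    """Return a toy 'executable image' that does not contain INT_13H in the clear."""
    hidden = bytes(b ^ MASK for b in INT_13H)
    # Hypothetical layout: two NOP bytes to be overwritten, then the masked bytes.
    return bytearray([0x90, 0x90]) + hidden


def unpack(image):
    """At 'run time', write the real instruction over the NOP padding."""
    image[0:2] = bytes(b ^ MASK for b in image[2:4])
    return image


image = build_image()
found_before = INT_13H in image   # scanning the stored image finds nothing
unpack(image)
found_after = INT_13H in image    # the instruction exists only in memory
```

A scan of the file on disk sees only the padding and the masked bytes; the instruction appears once the program has patched its own memory image.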
Because of the security implications of self-modifying code, some operating systems go to great lengths to rule it out. Recent versions of OpenBSD, for instance, have a feature known as W^X (for "write xor execute", meaning a memory page can be writable or executable, but not both) which inhibits alteration of memory pages that hold executable code. Programs which depend upon rewriting their own machine code cannot execute in such an environment.
Example algorithm (theoretical!)
(*A means "the location to which A points")
Start:
GOTO Decryption_Code
Encrypted:
...
lots of encrypted code!!!
...
Decryption_Code:
A = Encrypted
Loop:
B = *A
B = B XOR CryptoKey
*A = B
A = A + 1
GOTO Loop IF NOT A = Decryption_Code
GOTO Encrypted
CryptoKey:
some_random_number
This "program" will decrypt a part of itself and then jump to it.
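The same idea can be simulated in Python. This is a sketch under stated assumptions, not a real self-decrypting executable: a bytearray stands in for program memory, Python source text stands in for machine code, exec stands in for the final jump, and the key value and the function names are hypothetical.

```python
CRYPTO_KEY = 0x2A  # stands in for some_random_number


def encrypt(payload, key=CRYPTO_KEY):
    """Produce the encrypted region as it would be stored in the program."""
    return bytearray(b ^ key for b in payload)


def decrypt_in_place(memory, start, end, key=CRYPTO_KEY):
    """Walk pointer A over the encrypted region, XORing each cell with the key."""
    a = start              # point A at the start of the encrypted region
    while a != end:        # stop once A reaches the end of the region
        b = memory[a]      # B = *A
        b ^= key           # B = B XOR CryptoKey
        memory[a] = b      # *A = B
        a += 1             # A = A + 1


def run_demo():
    memory = encrypt(b"result = 6 * 7")        # unreadable until decrypted
    decrypt_in_place(memory, 0, len(memory))
    namespace = {}
    exec(memory.decode("ascii"), namespace)    # the jump into the decrypted code
    return namespace["result"]


value = run_demo()
```

Because XOR is its own inverse, the routine that decrypts the region could equally be used to encrypt it.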