From: “Mullen Patrick” <Mullen.Patrick@mail.ndhm.gtegsc.com>
To: “Neil Tunnicliffe” <intaus2@ois.net.au>
Message Hash: bcc733ff49fae1b52663c6f8b8402ae4486bb9630f09edd7c898542154ca64a3
Message ID: <n1359491961.71160@mail.ndhm.gtegsc.com>
Reply To: N/A
UTC Datetime: 1997-01-07 14:47:47 UTC
Raw Date: Tue, 7 Jan 1997 06:47:47 -0800 (PST)
From: "Mullen Patrick" <Mullen.Patrick@mail.ndhm.gtegsc.com>
Date: Tue, 7 Jan 1997 06:47:47 -0800 (PST)
To: "Neil Tunnicliffe" <intaus2@ois.net.au>
Subject: Polymorph engine (was:RE: PWL's how ?)
Message-ID: <n1359491961.71160@mail.ndhm.gtegsc.com>
MIME-Version: 1.0
Content-Type: text/plain
Where did you get the following text on polymorph engines? Is it available
somewhere? Where can I get other information on similar topics (Please resist
the urge to mention web search engines; I'm looking for specific
recommendations :-)
Thanks!
- Patrick
_______________________________________________________________________________
From: Neil Tunnicliffe on Tue, Jan 7, 1997 6:08
Read this for an understanding of a polymorhphic virus:
E#234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234#>>
#188# A GENERAL DESCRIPTION OF THE METHODS BEHIND A POLYMORPH ENGINE #188#
E#234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234##234#*
A small glossary of terms:
AAAAAAAAAAAAAAAAAAAAAAAAAA
ENCRYPT = Transform from it's original form to an altered form.
DECRYPT = Transform from it's altered form to it's original form.
KEY = The register or value used to encrypt/decrypt with.
SLIDING KEY = A KEY value that is INCREASED or DECREASED on each loop.
COUNT = The number of bytes in the encrypted code or data.
INDEX = A pointer to the encrypted code or data.
SIGNATURE = A unique group of bytes that can be used to check against
a programs content in the hope of detecting a particular
program.
HEURISTIC = A set of well defined rules to apply to a problem in the
hope of achieving a known result.
Question: What is a Polymorph?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Answer: Well, the Longman English Dictionary defines it as:
"POLYMORPHOUS also POLYMORPHIC adj fml or tech.
EXISTING IN VARIOUS DIFFERENT FORMS."
In other words, something that has the ability to change it's shape. Other
ways to describe such a thing might be; Mutable, Metamorphic, Etc...
Question: What is a Polymorph Engine?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Answer: A program with the abilities to encrypt (or jumble up) another
program or data and provide a unique decryptor for it, it must
do this in such a way that no two encryptions of the same program
or data will look alike.
Example: Take the following ultra-simple decryptor:
MOV SI,jumbled_data ;Point to the jumbled data
MOV CX,10 ;Ten bytes to decrypt
main_loop: XOR BYTE PTR [SI],55 ;XOR (un_scramble!) a byte
INC SI ;Next byte
LOOP main_loop ;Loop for the 9 remaining bytes
This small program will XOR the ten bytes at the location pointed to by SI
with the value 55. Providing the ten bytes were XORed with 55 prior to
running this decryptor the ten bytes will be restored to their original
state. If you are unsure as to why this is, brush up on your XOR logic!!
Ok, so you might say that if you change the KEY value on each generation it
will become Polymorphic? Well, yes and no! If you did that, the encrypted
portion would be Polymorphic, but the decryptor would still remain mostly the
same, the only change begin the KEY value! So, a signature scanner that
allows WILDCARDS (and most do!) would still be able to find your decryptor!
One way you could fool some signature scanners is to swap around some of the
instructions. So, with this in mind, the above decryptor might look like:
MOV CX,10
MOV SI,jumbled_data
main_loop: XOR BYTE PTR [SI],55
INC SI
LOOP main_loop
As you can see, still not much of a change, not really enough to fool some of
the better signature scanners.
"GET TO THE POINT! WHAT IS A TRUE POLYMORPH?", I hear you cry!
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Well, a "true" Polymorph would be a decryptor that looks completely different
on each generation! Take the following decryptor:
MOV CX,10
NOP
NOP
MOV SI,jumbled_data
NOP
main_loop: NOP
NOP
XOR BYTE PTR [SI],55
NOP
INC SI
NOP
NOP
NOP
NOP
LOOP main_loop
This decryptor is the same as the one before it, but it has has a few random
NOP instructions peppered throughout itself. On each generation you would
vary the amount of NOPs after each instruction. This is a Polymorph in it's
simplest form. Still, most of the good signature scanners would have no
problem with such a simple Polymorph. They would simply skip the NOPs, thus
having a clear view of the decryptor, to which they could apply a signature!
No, a "true" Polymorph has to be far far more complex then this! Instead of
peppering NOPs throughout the decryptor it would pepper totally random amounts
of totally random 8086 instructions, including JUMPS and CALLS. It would
also use a different main decryptor (possibly from a selection of pre-coded
ones) and would alter all the registers that the decryptor uses on each
generation, making sure that the JUNK code that it generates doesn't destroy
any of the registers used by the real decryptor! So, with these rules in
mind, here is our simple decryptor again:
MOV DX,10 ;Real part of the decryptor!
MOV SI,1234 ;junk
AND AX,[SI+1234] ;junk
CLD ;junk
MOV DI,jumbled_data ;Real part of the decryptor!
TEST [SI+1234],BL ;junk
OR AL,CL ;junk
main_loop: ADD SI,SI ;junk instruction, real loop!
XOR AX,1234 ;junk
XOR BYTE PTR [DI],55 ;Real part of the decryptor!
SUB SI,123 ;junk
INC DI ;Real part of the decryptor!
TEST DX,1234 ;junk
AND AL,[BP+1234] ;junk
DEC DX ;Real part of the decryptor!
NOP ;junk
XOR AX,DX ;junk
SBB AX,[SI+1234] ;junk
AND DX,DX ;Real part of the decryptor!
JNZ main_loop ;Real part of the decryptor!
As you should be able to see, quite a mess!! But, still executable code.
It is essential that any junk code generated by the Polymorph Engine is
executable, as it is going to be peppered throughout the decryptor. Note, in
this example, that some of the junk instructions use registers that we are
using in the decryptor! This is fine, providing the values in these
registers aren't destroyed. Also note, that now we have random registers and
random instructions on each generation it makes signature scanning (even for
the clever signature scanners) impossible! Instead, an HEURISTIC method must
be used, which can lead to false alarms.
So, a Polymorph Engine can be summed up into three major parts:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
1 .. The random number generator.
2 .. The junk code generator.
3 .. The decryptor generator.
There are other discrete parts but these three are the ones where most of the
work goes on!
How does it all work? Well, SMEG goes about generating random decryptors in
the following way:
1 .. Chooses a random selection of registers to use for the decryptor.
Leaving the remaining registers as "junk" registers for the junk code
generator.
2 .. Chooses one of the compressed pre-coded decryptors.
3 .. Goes into a loop generating the real decryptor, peppered with junk
code.
To understand how the selected registers are slotted into the decryptors and
the junk code you must look at the 8086 instructions from a binary level:
XOR AX,AX = 00110001 11000000
XOR AX,CX = 00110001 11001000
XOR AX,DX = 00110001 11010000
XOR AX,BX = 00110001 11011000
You should be able to see a pattern in the binary code for these four 8086
instructions? Well, all 8086 instructions follow logical patterns, and it is
these patterns that tell the 8086 processor which registers/addressing mode
to use for a particular instruction. The total amount of instruction formats
and the precise logic regarding the patterns is too complex to go into here.
However, all good 8086 tutorials/reference guides will explain in full.
SMEG exploits this pattern logic to generate junk code and decryptors with
random registers, as the patterns directly relate to the registers Etc.
SMEG generates junk code in the following way:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Inside SMEG there is a table of the basic binary patterns for all of the 8086
instruction set, but with one important difference, all the register/address
mode bits are zero. This is called the SKELETON INSTRUCTION TABLE. The
table also contains various other bytes used by SMEG to determine the
relevant bit positions to "plug in" the register bit patterns. These
patterns are plugged in via the logic processes OR and AND. Using this
method, SMEG can generate endless amounts of random 8086 instructions without
destroying any of the registers used by the decryptor proper.
SMEG also contains some discrete logic for producing false CALLS to dummy
subroutines and also false conditional JMPS around the junk code.
SMEG generates the decryptor proper in the following way:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Inside SMEG there is a table containing a selection of common 8086
instructions used in decryptors, such as XOR [index],reg Etc. These are,
again, stored in SKELETON FORM with some control bytes used by the decryptor
generator. Also, inside SMEG, there are several pre-coded decryptors stored
in a compressed form. On average, a complete decryptor can be described to
the decryptor generator in as few as 11 bytes and adding to the list of
pre-coded decryptors is both painless and economical with space!
SMEG generates the Polymorphed decryptor in the following way:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
First it chooses, at random, one of the pre-coded compressed decryptors.
Next it goes into a loop uncompressing each decryptor instruction, plugging
in the required registers, storing it and then generating (for each real
instruction) a random amount of random instructions. This loop repeats until
the complete decryptor has been constructed. The final result is a random
size, random register, random patterned decryptor!
It should also be noted that whenever SMEG generates an INDEXed instruction
it uses either SI, DI or BX at random, also it sometimes uses a random offset.
For example, say the encrypted code started at address 10h, the following
could be used to index this address:
MOV SI,10h ;Start address
MOV AL,[SI] ;Index from initial address
But sometimes SMEG will generate something like the following, again based on
the encrypted code starting at address 10h:
MOV DI,0BFAAh ;Indirect start address
MOV AL,[DI+4066h) ;4066h + 0BFAAh = 10010h (and FFFF = 10h)!!
These indexed and initial values are picked at complete random, and the
examples of 0BFAAh and 4066h are valid, but next time they will be completely
different!
The following are two decryptors that were generated with my SMEG Polymorph
Engine. It should be noted that I generated 4000 examples with no two alike!
Unfortunately I ran out of hard drive space! But it is fairly safe to say
that the total number of decryptor combinations would run into the BILLIONS!
All the lines marked with ";junk" in the following listings indicate random
junk instructions that were inserted throughout the actual decryptor, note
that SMEG has the ability to generate junk CALLS to false SUBROUTINES, as
well as general junk conditional jumps! All lines marked with a * indicate
an actual part of the decryptor proper. I chose the two generations shown
because their sizes were similar, 386 and 480 bytes. SMEG produces
decryptors ranging in size from as little as 288 to as much as 1536 bytes.
Even if two decryptors are generated that are the same size the chances of
them being the same are, literally, billions to one!
Return to January 1997
Return to ““Mullen Patrick” <Mullen.Patrick@mail.ndhm.gtegsc.com>”
1997-01-07 (Tue, 7 Jan 1997 06:47:47 -0800 (PST)) - Polymorph engine (was:RE: PWL’s how ?) - “Mullen Patrick” <Mullen.Patrick@mail.ndhm.gtegsc.com>