How to detect the types of executable files

Abstract

This article looks at how we examine a file to check if it is a DOS or Windows executable and, if so, whether it is a program file or a DLL.

We will develop a function, ExeType, that tests a file and returns a value describing what type of executable file it is, if any.

An install program may need this information, or you may need to check a given file is a program or DLL before attempting to run or load it.

Outline design

Before we start coding our function, let us look at how we're going to accomplish this task. Our approach will be to open the file of interest and scan through it looking for markers to indicate its file type.

All Windows executable files begin with an MS-DOS executable stub, so we first test for a valid MS-DOS executable using information from the MS-DOS program header that is present in every executable file. We then check for markers for a 16 bit or 32 bit Windows executable or for a virtual device driver (VXD). If we establish that the file is a Windows executable, we look for information that determines whether the file is an application or a DLL. A review of the MS-DOS, Windows NE (16 bit) and PE (32 bit) executable file formats leads us to note the following:

  • All DOS program files (and therefore Windows executables) begin with a "magic number"; the word value $5A4D ("MZ" in ASCII).
  • We use the DOS header to check that the file length exceeds or is equal to the minimum length of the DOS executable and that the offset of the DOS relocation table lies within the file.
  • Windows executables have a header record whose offset in the file is given by the long word at offset $3C.
  • The Windows header begins with a "magic number" word whose value indicates whether this is a 16 bit (NE format) or 32 bit (PE format) executable or a virtual device driver (LE format). The word is $454E ("NE" in ASCII), $4550 ("PE") or $454C ("LE").
  • 32 bit Windows executables have an "image header" that starts four bytes after the beginning of the Windows header. This header structure has a Characteristics field which is a bit mask. If the bit mask contains the flag IMAGE_FILE_DLL then the file is a DLL, otherwise it is a program file.
  • 16 bit Windows programs have a byte sized field at offset $0D from the start of the Windows header which is a bit mask providing information about the file. If this field contains the flag $80 then the file is a DLL, otherwise it is a program.
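
The magic numbers listed above look reversed because the files store words in little-endian order: the first character of the signature is the low byte of the word. The following is a minimal sketch of that relationship. The helper name SigToWord is hypothetical and is not part of the article's code.

```pascal
program MagicWords;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: builds the little-endian word that a two character
// signature occupies on disk. The first character is the low byte.
function SigToWord(const Sig: AnsiString): Word;
begin
  Result := Ord(Sig[1]) or (Ord(Sig[2]) shl 8);
end;

begin
  WriteLn(IntToHex(SigToWord('MZ'), 4)); // 5A4D
  WriteLn(IntToHex(SigToWord('NE'), 4)); // 454E
  WriteLn(IntToHex(SigToWord('PE'), 4)); // 4550
  WriteLn(IntToHex(SigToWord('LE'), 4)); // 454C
end.
```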

The following flowchart illustrates the tests we will use, based on the information above:

Figure 1: Flowchart

So, in pseudo-code, our function can be described as:

Open file (Return Error type if doesn't exist)
Read DOS header from file (Return Unknown file type if can't read)
If first word = 'MZ' and file size is valid then
  Get offset of Windows header record (Return DOS type if can't read)
  Get Windows header record data (Return DOS type if can't read)
  case first word of
    'PE':
      Read Windows exe header record (Return DOS type if can't read)
      if Characteristics field contains IMAGE_FILE_DLL flag then
        Return 32 bit Windows DLL type
      else
        Return 32 bit Windows application type
      end
    'NE':
      Read in byte at offset $0D (Return DOS type if can't read)
      if byte field contains flag $80 then
        Return 16 bit Windows DLL type
      else
        Return 16 bit Windows application type
      end
    'LE':
      Return VXD device driver type
    else:
      Return DOS type
  end
else
  Return Unknown file type
end
Listing 1

We now have an outline design from which to work. In the next section we'll begin coding the function.

Developing the function

Now we have an outline design, we can begin creating our function. Recall that it will analyse a given file and return a value to indicate the type of file found. Listing 2 shows the function prototype:

  function ExeType(const FileName: string): TExeFileKind;
Listing 2

The return value is an enumerated type, which is defined as follows.

  type
    // The kinds of files recognised.
    TExeFileKind = (
      fkUnknown,  // unknown file kind: not an executable
      fkError,    // error file kind: used for files that don't exist
      fkDOS,      // DOS executable
      fkExe32,    // 32 bit executable
      fkExe16,    // 16 bit executable
      fkDLL32,    // 32 bit DLL
      fkDLL16,    // 16 bit DLL
      fkVXD       // virtual device driver
    );
Listing 3
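
A caller will often want to display the result rather than just test it. The following sketch shows one way to map each TExeFileKind value to a description. The function name ExeKindToStr and the description strings are assumptions made for illustration and do not come from the article's demo.

```pascal
program KindNames;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Same enumeration as in Listing 3
  TExeFileKind = (
    fkUnknown, fkError, fkDOS, fkExe32, fkExe16, fkDLL32, fkDLL16, fkVXD
  );

const
  // Hypothetical descriptions, indexed by file kind
  cKindNames: array[TExeFileKind] of string = (
    'Not an executable',
    'Error: file could not be read',
    'DOS executable',
    '32 bit Windows application',
    '16 bit Windows application',
    '32 bit Windows DLL',
    '16 bit Windows DLL',
    'Virtual device driver'
  );

// Hypothetical helper: returns a description of a file kind
function ExeKindToStr(Kind: TExeFileKind): string;
begin
  Result := cKindNames[Kind];
end;

var
  K: TExeFileKind;
begin
  // List every kind with its ordinal value and description
  for K := Low(TExeFileKind) to High(TExeFileKind) do
    WriteLn(Ord(K), ': ', ExeKindToStr(K));
end.
```

Indexing the array by the enumerated type means the compiler rejects the table if a kind is added without a matching description.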

We will now examine the function's implementation. The code follows the logic presented in the first part of the article. First, here's the function prototype again:

  // Examines given file and returns a code that indicates the type of
  // executable file it is (or if it isn't an executable)
  function ExeType(const FileName: string): TExeFileKind;
Listing 4

Immediately following the prototype we declare some constants to hold the fixed offsets, flags and "magic numbers" we will need to use:

  const
    cDOSRelocOffset = $18;  // offset of "pointer" to DOS relocation table
    cWinHeaderOffset = $3C; // offset of "pointer" to windows header in file
    cNEAppTypeOffset = $0D; // offset in NE windows header of app type field
    cDOSMagic = $5A4D;      // magic number for a DOS executable
    cNEMagic = $454E;       // magic number for a NE executable (Win 16)
    cPEMagic = $4550;       // magic number for a PE executable (Win 32)
    cLEMagic = $454C;       // magic number for a Virtual Device Driver
    cNEDLLFlag = $80;       // flag in NE app type field indicating a DLL
Listing 5

The constants are followed by the function's local variables. First we declare a variable to reference the file stream object that we will use to read the file. We then have a variable to store Windows executable "magic numbers", along with another variable to store the offset of the Windows header record. We then declare variables to read information from the various header records: a byte in the case of the NE file format, an IMAGE_FILE_HEADER structure for PE format files and an IMAGE_DOS_HEADER structure for the DOS file header. Finally we have a variable to store the minimum expected size of the MS-DOS file. Listing 6 has the variable declarations:

  var
    FS: TFileStream;              // stream to executable file
    WinMagic: Word;               // word containing PE or NE magic numbers
    HdrOffset: LongInt;           // offset of windows header in exec file
    ImgHdrPE: IMAGE_FILE_HEADER;  // PE file header record
    DOSHeader: IMAGE_DOS_HEADER;  // DOS header
    AppFlagsNE: Byte;             // byte defining DLLs in NE format
    DOSFileSize: Integer;         // size of DOS file
Listing 6

The IMAGE_DOS_HEADER structure is not declared in the Windows unit of every Delphi version. If your version lacks it, define it yourself as shown in Listing 7.

  type
    IMAGE_DOS_HEADER = packed record
      e_magic   : Word;     // Magic number (must be $5A4D)
      e_cblp    : Word;     // No of used bytes in last page in file
      e_cp      : Word;     // No of 512 byte pages in file
      e_crlc    : Word;     // Relocations
      e_cparhdr : Word;     // Size of header in paragraphs
      e_minalloc: Word;     // Minimum extra paragraphs needed
      e_maxalloc: Word;     // Maximum extra paragraphs needed
      e_ss      : Word;     // Initial (relative) SS value
      e_sp      : Word;     // Initial SP value
      e_csum    : Word;     // Checksum
      e_ip      : Word;     // Initial IP value
      e_cs      : Word;     // Initial (relative) CS value
      e_lfarlc  : Word;     // Address of relocation table
      e_ovno    : Word;     // Overlay number
      e_res     : packed array [0..3] of Word;  // Reserved words
      e_oemid   : Word;     // OEM identifier (for e_oeminfo)
      e_oeminfo : Word;     // OEM info; e_oemid specific
      e_res2    : packed array [0..9] of Word;  // Reserved words
      e_lfanew  : Longint;  // File address of new exe header
    end;
Listing 7

We now start to process the file. Before we can perform any analysis we need to open the file for reading. A read only file stream is used to do this. We also need to handle any exceptions raised when reading the file. A skeletal outline of the body of the function is shown in Listing 8. This listing illustrates how we open and close the file stream and handle any exceptions raised.

  begin
    try
      // Open stream onto file: raises exception if can't be read
      FS := TFileStream.Create(FileName, fmOpenRead + fmShareDenyNone);
      try
        // ... process file here
      finally
        FS.Free;
      end;
    except
      // Exception raised in function => error result
      Result := fkError;
    end;
  end;
Listing 8

If the file doesn't exist then an exception will be raised by the stream constructor. This, and any other exceptions, are trapped and converted into an error result. We use an inner try..finally block to ensure the file stream gets closed.
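
The inner try..finally block also protects the stream when the routine leaves early through Exit, which the analysis code does frequently. The following is a small sketch of that behaviour, assuming a counter in place of a real file stream. The names Classify and OpenCount are hypothetical.

```pascal
program FinallyDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  // Hypothetical counter standing in for an open file stream
  OpenCount: Integer = 0;

function Classify(Size: Integer): string;
begin
  Result := 'unknown';
  Inc(OpenCount);          // stands in for TFileStream.Create
  try
    if Size < 64 then
      Exit;                // early exit: the finally block still runs
    Result := 'DOS';
  finally
    Dec(OpenCount);        // stands in for FS.Free
  end;
end;

var
  S: string;
begin
  S := Classify(10);
  WriteLn(S, ', resources still open: ', OpenCount); // unknown, ... 0
  S := Classify(100);
  WriteLn(S, ', resources still open: ', OpenCount); // DOS, ... 0
end.
```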

The file analysis code fits inside the inner try..finally block in the above code fragment. The remainder of this section is devoted to a discussion of how we perform the analysis. We begin by attempting to read the DOS header record. We then use the information in the header to perform various checks. The code is shown in Listing 9.

        // Assume unknown file
        Result := fkUnknown;
        // Any exec file is at least size of DOS header long
        if FS.Size < SizeOf(DOSHeader) then
          Exit;
        FS.ReadBuffer(DOSHeader, SizeOf(DOSHeader));
        // Check for DOS magic number at start of file
        if DOSHeader.e_magic <> cDOSMagic then
          Exit;
        // DOS files have length >= size indicated by DOS header's
        // e_cp and e_cblp fields. e_cp stores the number of 512 byte
        // pages in the file. e_cblp stores the number of bytes used in
        // the last page of the file.
        if (DOSHeader.e_cblp = 0) then
          DOSFileSize := DOSHeader.e_cp * 512
        else
          DOSFileSize := (DOSHeader.e_cp - 1) * 512 + DOSHeader.e_cblp;
        if FS.Size < DOSFileSize then
          Exit;
        // DOS file relocation offset must be within DOS file size.
        if DOSHeader.e_lfarlc > DOSFileSize then
          Exit;
        // We assume we have an executable file: assume it's a DOS program
        Result := fkDOS;
Listing 9

We first assume the file format is unknown. Then we check that the file is large enough to contain the DOS header and exit if this is not the case. (Remember that the finally block is always executed following an Exit, so our file stream will be freed). If the file is large enough we read the DOS header before performing these three checks on it:

  1. That the e_magic field stores the required magic number.
  2. That the file has the required length. The e_cp field stores the number of 512 byte pages in the file. The e_cblp field stores the number of bytes in the last 512 byte page that are used. The expected file length is calculated from these two values and this is checked against the actual size of the file. (Note that the file size can be greater than the expected size, since it is possible to append data to an executable file – see article #7 for further information).
  3. That the offset of the DOS relocation table per the e_lfarlc field falls within the file.

If all these tests are passed then it is safe to assume we have at least a DOS executable, so we set the function result accordingly.
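
The size calculation in the second check can be worked through with concrete numbers. The sketch below isolates the arithmetic in a stand-alone function. The name ExpectedDOSSize and the sample header values are assumptions chosen for illustration.

```pascal
program DOSSizeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: expected minimum file size from the page count
// (e_cp) and the number of bytes used in the last page (e_cblp).
// An e_cblp of 0 means the last page is completely used.
function ExpectedDOSSize(PageCount, LastPageBytes: Word): LongInt;
begin
  if LastPageBytes = 0 then
    Result := LongInt(PageCount) * 512
  else
    Result := (LongInt(PageCount) - 1) * 512 + LastPageBytes;
end;

begin
  // 3 pages with $90 (144) bytes used in the last one: 2 * 512 + 144
  WriteLn(ExpectedDOSSize(3, $90)); // 1168
  // 2 full pages: 2 * 512
  WriteLn(ExpectedDOSSize(2, 0));   // 1024
end.
```

A file shorter than the computed value cannot hold the DOS program its header describes, which is why the function rejects it.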

The next thing to do is to try to find a Windows file header and read the Windows executable magic number at the start of it. Listing 10 has the code to do this:

        // Try to find offset of Windows program header
        if FS.Size <= cWinHeaderOffset + SizeOf(LongInt) then
          // file is too small to hold the Windows header offset:
          // it's a DOS file
          Exit;
        // read the offset of the Windows header
        FS.Position := cWinHeaderOffset;
        FS.ReadBuffer(HdrOffset, SizeOf(LongInt));
        // Now try to read first word of Windows program header
        if FS.Size <= HdrOffset + SizeOf(Word) then
          // file too small to contain header: it's a DOS file
          Exit;
        FS.Position := HdrOffset;
        // This word should be NE, PE or LE per file type
        FS.ReadBuffer(WinMagic, SizeOf(Word));
Listing 10

We first check that the file is large enough to store the Windows header offset, and bail out if not. If we bail out we assume the file is a DOS executable (the return value was set earlier). We then read in the offset and once again check that the file is large enough to hold a magic number located at the file position given by the offset. If so we move the file pointer to the start of the header and read the magic number into the WinMagic variable.
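
The two-step read described above can be tried without a real executable by building a fake image in memory. The following is a sketch only: the function name TryReadWinMagic is hypothetical, and it differs slightly from Listing 10 in that it also rejects a negative offset and accepts a file that ends exactly at the last byte read.

```pascal
program ReadMagicDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  cWinHeaderOffset = $3C; // offset of "pointer" to Windows header
  cFakeSize = 128;        // size of the fake image built below

// Hypothetical helper: returns True and sets Magic if the stream can hold
// both the long word at $3C and a word at the offset that it names.
function TryReadWinMagic(Stream: TStream; out Magic: Word): Boolean;
var
  HdrOffset: LongInt;
begin
  Result := False;
  Magic := 0;
  if Stream.Size < cWinHeaderOffset + SizeOf(HdrOffset) then
    Exit;
  Stream.Position := cWinHeaderOffset;
  Stream.ReadBuffer(HdrOffset, SizeOf(HdrOffset));
  if (HdrOffset < 0)
    or (Stream.Size < Int64(HdrOffset) + SizeOf(Magic)) then
    Exit;
  Stream.Position := HdrOffset;
  Stream.ReadBuffer(Magic, SizeOf(Magic));
  Result := True;
end;

var
  MS: TMemoryStream;
  Offset: LongInt;
  Magic: Word;
begin
  MS := TMemoryStream.Create;
  try
    // Fake image: zero filled, header offset $40 stored at $3C, 'PE' at $40
    MS.Size := cFakeSize;
    FillChar(MS.Memory^, cFakeSize, 0);
    Offset := $40;
    MS.Position := cWinHeaderOffset;
    MS.WriteBuffer(Offset, SizeOf(Offset));
    Magic := $4550;
    MS.Position := Offset;
    MS.WriteBuffer(Magic, SizeOf(Magic));
    Magic := 0;
    if TryReadWinMagic(MS, Magic) then
      WriteLn('Magic: $', IntToHex(Magic, 4)) // Magic: $4550
    else
      WriteLn('No Windows header');
  finally
    MS.Free;
  end;
end.
```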

We now use the magic number to determine what kind of file we have. PE, NE & LE format files are then processed separately. In the case of NE & PE format files we also need to check whether the file is a DLL or application. A case statement is used to do the checking. Its outline is as follows:

        case WinMagic of
          cPEMagic:
            // ... PE format - check whether DLL or application
          cNEMagic:
            // ... NE format - check whether DLL or application
          cLEMagic:
            // ... LE format - return VXD type
          else
            // ... DOS file executable
        end;
Listing 11

We'll discuss each of the cases separately.

PE Format

For PE format files we attempt to read the Windows header record (of type IMAGE_FILE_HEADER) from the file. If the Characteristics field of the structure (a bit mask) contains the IMAGE_FILE_DLL flag then we have a DLL, otherwise we have an application.

Listing 12 has the code for the PE part of the above case statement:

            // 32 bit Windows application: now check whether app or DLL.
            // Windows Image Header starts 4 bytes from the start of the
            // Windows header offset.
            if FS.Size < HdrOffset + SizeOf(LongWord) + SizeOf(ImgHdrPE) then
              // file not large enough for image header: assume DOS
              Exit;
            // read Windows image header
            FS.Position := HdrOffset + SizeOf(LongWord);
            FS.ReadBuffer(ImgHdrPE, SizeOf(ImgHdrPE));
            if (ImgHdrPE.Characteristics and IMAGE_FILE_DLL)
              = IMAGE_FILE_DLL then
              // characteristics indicate a 32 bit DLL
              Result := fkDLL32
            else
              // characteristics indicate a 32 bit application
              Result := fkExe32;
Listing 12

Note that, once again, before attempting to read the header we check the file is large enough to contain it and bail out if not, assuming a DOS executable. If we successfully read the header we check the Characteristics field for the required flag.
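
The flag test itself is a plain bit mask operation. The sketch below applies it to two sample Characteristics values without needing the Windows unit. The constant cImageFileDLL mirrors the value of IMAGE_FILE_DLL ($2000); the function name and the sample values are assumptions for illustration.

```pascal
program DLLFlagDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // Same value as IMAGE_FILE_DLL in the Windows headers
  cImageFileDLL = $2000;

// Hypothetical helper: True if the DLL bit is set in a Characteristics value
function IsDLLCharacteristics(Characteristics: Word): Boolean;
begin
  Result := (Characteristics and cImageFileDLL) = cImageFileDLL;
end;

begin
  // $2102: executable image, 32 bit machine and DLL bits set
  WriteLn(IsDLLCharacteristics($2102)); // DLL flag present
  // $0102: executable image and 32 bit machine bits only
  WriteLn(IsDLLCharacteristics($0102)); // DLL flag absent
end.
```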

NE Format

For NE format files we read the byte at offset $0D from the start of the header and check to see if it contains a bit flag $80. If so, we have a DLL and if not we have an application.

The code for the NE part of the above case statement, which follows similar logic to that for the PE header, is:

            // We have 16 bit Windows executable: check whether app or DLL
            if FS.Size <= HdrOffset + cNEAppTypeOffset
              + SizeOf(AppFlagsNE) then
              // app flags field would be beyond EOF: assume DOS
              Exit;
            // read app flags byte
            FS.Position := HdrOffset + cNEAppTypeOffset;
            FS.ReadBuffer(AppFlagsNE, SizeOf(AppFlagsNE));
            if (AppFlagsNE and cNEDLLFlag) = cNEDLLFlag then
              // app flags indicate DLL
              Result := fkDLL16
            else
              // app flags indicate program
              Result := fkExe16;
Listing 13

LE Format

For LE Format files there is no further checking to be done. We simply return that we have found a virtual device driver:

            // We have a Virtual Device Driver
            Result := fkVXD;
Listing 14

DOS Format

This just leaves the trivial case of when none of the magic numbers are present. In this case we assume that the file is a DOS application. Since we have already set the function result to the DOS file type there is nothing to do. We simply place a comment to document this fact:

            // DOS application
            {Do nothing - DOS result already set};
Listing 15

Putting it all together

Our ExeType function is now complete. Listing 16 shows the complete function, along with the required type definitions.

  type
    // *** Whether this declaration is needed depends on whether
    //     the record is defined in the Windows unit.
    IMAGE_DOS_HEADER = packed record
      e_magic   : Word;     // Magic number (must be $5A4D)
      e_cblp    : Word;     // No of used bytes in last page in file
      e_cp      : Word;     // No of 512 byte pages in file
      e_crlc    : Word;     // Relocations
      e_cparhdr : Word;     // Size of header in paragraphs
      e_minalloc: Word;     // Minimum extra paragraphs needed
      e_maxalloc: Word;     // Maximum extra paragraphs needed
      e_ss      : Word;     // Initial (relative) SS value
      e_sp      : Word;     // Initial SP value
      e_csum    : Word;     // Checksum
      e_ip      : Word;     // Initial IP value
      e_cs      : Word;     // Initial (relative) CS value
      e_lfarlc  : Word;     // Address of relocation table
      e_ovno    : Word;     // Overlay number
      e_res     : packed array [0..3] of Word;  // Reserved words
      e_oemid   : Word;     // OEM identifier (for e_oeminfo)
      e_oeminfo : Word;     // OEM info; e_oemid specific
      e_res2    : packed array [0..9] of Word;  // Reserved words
      e_lfanew  : Longint;  // File address of new exe header
    end;

    // The kinds of files recognised.
    TExeFileKind = (
      fkUnknown,  // unknown file kind: not an executable
      fkError,    // error file kind: used for files that don't exist
      fkDOS,      // DOS executable
      fkExe32,    // 32 bit executable
      fkExe16,    // 16 bit executable
      fkDLL32,    // 32 bit DLL
      fkDLL16,    // 16 bit DLL
      fkVXD       // virtual device driver
    );

  function ExeType(const FileName: string): TExeFileKind;
    {Examines given file and returns a code that indicates the type of
    executable file it is (or if it isn't an executable)}
  const
    cDOSRelocOffset = $18;  // offset of "pointer" to DOS relocation table
    cWinHeaderOffset = $3C; // offset of "pointer" to windows header in file
    cNEAppTypeOffset = $0D; // offset in NE windows header of app type field
    cDOSMagic = $5A4D;      // magic number for a DOS executable
    cNEMagic = $454E;       // magic number for a NE executable (Win 16)
    cPEMagic = $4550;       // magic number for a PE executable (Win 32)
    cLEMagic = $454C;       // magic number for a Virtual Device Driver
    cNEDLLFlag = $80;       // flag in NE app type field indicating a DLL
  var
    FS: TFileStream;              // stream to executable file
    WinMagic: Word;               // word containing PE or NE magic numbers
    HdrOffset: LongInt;           // offset of windows header in exec file
    ImgHdrPE: IMAGE_FILE_HEADER;  // PE file header record
    DOSHeader: IMAGE_DOS_HEADER;  // DOS header
    AppFlagsNE: Byte;             // byte defining DLLs in NE format
    DOSFileSize: Integer;         // size of DOS file
  begin
    try
      // Open stream onto file: raises exception if can't be read
      FS := TFileStream.Create(FileName, fmOpenRead + fmShareDenyNone);
      try
        // Assume unknown file
        Result := fkUnknown;
        // Any exec file is at least size of DOS header long
        if FS.Size < SizeOf(DOSHeader) then
          Exit;
        FS.ReadBuffer(DOSHeader, SizeOf(DOSHeader));
        // Check for DOS magic number at start of file
        if DOSHeader.e_magic <> cDOSMagic then
          Exit;
        // DOS files have length >= size indicated by DOS header's
        // e_cp and e_cblp fields. e_cp stores the number of 512 byte
        // pages in the file. e_cblp stores the number of bytes used in
        // the last page of the file.
        if (DOSHeader.e_cblp = 0) then
          DOSFileSize := DOSHeader.e_cp * 512
        else
          DOSFileSize := (DOSHeader.e_cp - 1) * 512 + DOSHeader.e_cblp;
        if FS.Size < DOSFileSize then
          Exit;
        // DOS file relocation offset must be within DOS file size.
        if DOSHeader.e_lfarlc > DOSFileSize then
          Exit;
        // We assume we have an executable file: assume it's a DOS program
        Result := fkDOS;
        // Try to find offset of Windows program header
        if FS.Size <= cWinHeaderOffset + SizeOf(LongInt) then
          // file is too small to hold the Windows header offset:
          // it's a DOS file
          Exit;
        // read the offset of the Windows header
        FS.Position := cWinHeaderOffset;
        FS.ReadBuffer(HdrOffset, SizeOf(LongInt));
        // Now try to read first word of Windows program header
        if FS.Size <= HdrOffset + SizeOf(Word) then
          // file too small to contain header: it's a DOS file
          Exit;
        FS.Position := HdrOffset;
        // This word should be NE, PE or LE per file type: check which
        FS.ReadBuffer(WinMagic, SizeOf(Word));
        case WinMagic of
          cPEMagic:
          begin
            // 32 bit Windows application: now check whether app or DLL.
            // Windows Image Header starts 4 bytes from the start of the
            // Windows header offset.
            if FS.Size < HdrOffset + SizeOf(LongWord) + SizeOf(ImgHdrPE) then
              // file not large enough for image header: assume DOS
              Exit;
            // read Windows image header
            FS.Position := HdrOffset + SizeOf(LongWord);
            FS.ReadBuffer(ImgHdrPE, SizeOf(ImgHdrPE));
            if (ImgHdrPE.Characteristics and IMAGE_FILE_DLL)
              = IMAGE_FILE_DLL then
              // characteristics indicate a 32 bit DLL
              Result := fkDLL32
            else
              // characteristics indicate a 32 bit application
              Result := fkExe32;
          end;
          cNEMagic:
          begin
            // We have 16 bit Windows executable: check whether app or DLL
            if FS.Size <= HdrOffset + cNEAppTypeOffset
              + SizeOf(AppFlagsNE) then
              // app flags field would be beyond EOF: assume DOS
              Exit;
            // read app flags byte
            FS.Position := HdrOffset + cNEAppTypeOffset;
            FS.ReadBuffer(AppFlagsNE, SizeOf(AppFlagsNE));
            if (AppFlagsNE and cNEDLLFlag) = cNEDLLFlag then
              // app flags indicate DLL
              Result := fkDLL16
            else
              // app flags indicate program
              Result := fkExe16;
          end;
          cLEMagic:
            // We have a Virtual Device Driver
            Result := fkVXD;
          else
            // DOS application
            {Do nothing - DOS result already set};
        end;
      finally
        FS.Free;
      end;
    except
      // Exception raised in function => error result
      Result := fkError;
    end;
  end;
Listing 16

Demo program

A demo program to accompany this article can be found in the delphidabbler/article-demos Git repository on GitHub.

You can view the code in the article-08 sub-directory. Alternatively download a zip file containing all the demos by going to the repository's landing page and clicking the Clone or download button and selecting Download ZIP.

See the demo's README.md file for details.

This source code is merely a proof of concept and is intended only to illustrate this article. It is not designed for use in its current form in finished applications. The code is provided on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.

The demo is open source. See the demo's LICENSE.md file for licensing details.

Conclusion

In this article we reviewed the key attributes of various types of executable file then sketched out an algorithm for detecting them. We then developed a Delphi function that analysed files and returned a code describing the file type. A demo program to exercise the function was provided.

Credits

Thanks are due to Flurin Honegger for suggesting some of the "reasonableness" checks on the DOS header to verify a valid MS-DOS file.

Feedback

I hope you found this article useful.

If you have any observations, comments, or have found any errors there are two places you can report them.

  1. For anything to do with the article content, but not the downloadable demo code, please use this website's Issues page on GitHub. Make sure you mention that the issue relates to "article #8".
  2. For bugs in the demo code see the article-demo project's README.md file for details of how to report them.