How to detect the types of executable files
Abstract
This article looks at how we examine a file to check if it is a DOS or Windows executable and, if so, whether it is a program file or a DLL.
We will develop a function, ExeType, that tests a file and returns a value describing what type of executable file it is, if any.
An install program may need this information, or you may need to check a given file is a program or DLL before attempting to run or load it.
Outline design
Before we start coding our function, let us look at how we're going to accomplish this task. Our approach will be to open the file of interest and scan through it looking for markers to indicate its file type.
All Windows executable files begin with an MS-DOS executable stub, so we first test for a valid MS-DOS executable using information from the MS-DOS program header that is present in every executable file. We then check for markers for a 16 bit or 32 bit Windows executable or for a virtual device driver (VXD). If we establish that the file is a Windows executable, we look for information that determines whether the file is an application or a DLL. A review of the MS-DOS, Windows NE (16 bit) and PE (32 bit) executable file formats leads us to note the following:
- All DOS program files (and therefore Windows executables) begin with a "magic number"; the word value $5A4D ("MZ" in ASCII).
- We use the DOS header to check that the file length exceeds or is equal to the minimum length of the DOS executable and that the offset of the DOS relocation table lies within the file.
- Windows executables have a header record whose offset in the file is given by the long word at offset $3C.
- The Windows header begins with a "magic number" word whose value indicates whether this is a 16 bit (NE format) or 32 bit (PE format) executable or a virtual device driver (LE format). The word is $454E ("NE" in ASCII), $4550 ("PE") or $454C ("LE").
- 32 bit Windows executables have an "image header" that starts four bytes after the beginning of the Windows header. This header structure has a Characteristics field which is a bit mask. If the bit mask contains the flag IMAGE_FILE_DLL then the file is a DLL, otherwise it is a program file.
- 16 bit Windows programs have a byte sized field at offset $0D from the start of the Windows header which is a bit mask providing information about the file. If this field contains the flag $80 then the file is a DLL, otherwise it is a program.
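The magic numbers listed above look reversed when compared with their ASCII text because a word read from a file on Intel hardware has the first byte of the file as its low-order byte. The following minimal sketch, which is not part of the article's function, shows how each pair of characters maps to its word value. MagicWord is a hypothetical helper introduced only for illustration.

```pascal
program MagicWords;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not from the article): builds the word value that
// results from reading two consecutive ASCII bytes into a Word variable
function MagicWord(const First, Second: AnsiChar): Word;
begin
  // The first byte in the file becomes the low-order byte of the word
  Result := Ord(First) or (Ord(Second) shl 8);
end;

begin
  WriteLn('MZ = $', IntToHex(MagicWord('M', 'Z'), 4));  // MZ = $5A4D
  WriteLn('NE = $', IntToHex(MagicWord('N', 'E'), 4));  // NE = $454E
  WriteLn('PE = $', IntToHex(MagicWord('P', 'E'), 4));  // PE = $4550
  WriteLn('LE = $', IntToHex(MagicWord('L', 'E'), 4));  // LE = $454C
end.
```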
The following flowchart illustrates the tests we will use, based on the information above:
So, in pseudo-code, our function can be described as:
Open file (Return Error type if doesn't exist)
Read DOS header from file (Return Unknown file type if can't read)
If first word = 'MZ' and file size is valid then
  Get offset of Windows header record (Return DOS type if can't read)
  Get Windows header record data (Return DOS type if can't read)
  case first word of
    'PE':
      Read Windows exe header record (Return DOS type if can't read)
      if Characteristics field contains IMAGE_FILE_DLL flag then
        Return 32 bit Windows DLL type
      else
        Return 32 bit Windows application type
      end
    'NE':
      Read in byte at offset $0D (Return DOS type if can't read)
      if byte field contains flag $80 then
        Return 16 bit Windows DLL type
      else
        Return 16 bit Windows application type
      end
    'LE':
      Return VXD device driver type
    else:
      Return DOS type
  end
else
  Return Unknown file type
end
Listing 1
We now have an outline design from which to work. In the next section we'll begin coding the function.
Developing the function
Now we have an outline design, we can begin creating our function. Recall that it will analyse a given file and return a value to indicate the type of file found. Listing 2 shows the function prototype:
function ExeType(const FileName: string): TExeFileKind;
Listing 2
The return value is an enumerated type, which is defined as follows.
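type
  // The kinds of file recognised by ExeType
  TExeFileKind = (
    fkUnknown,  // not a recognised executable file
    fkError,    // error opening or reading the file
    fkDOS,      // DOS executable
    fkExe32,    // 32 bit Windows application
    fkExe16,    // 16 bit Windows application
    fkDLL32,    // 32 bit Windows DLL
    fkDLL16,    // 16 bit Windows DLL
    fkVXD       // virtual device driver
  );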
Listing 3
We will now examine the function's implementation. The code follows the logic presented in the first part of the article. First, here's the function prototype again:
function ExeType(const FileName: string): TExeFileKind;
Listing 4
Immediately following the prototype we declare some constants to hold the fixed offsets, flags and "magic numbers" we will need to use:
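const
  cDOSRelocOffset = $18;    // offset of relocation table pointer in DOS header
  cWinHeaderOffset = $3C;   // offset of Windows header pointer in file
  cNEAppTypeOffset = $0D;   // offset of application type flags in NE header
  cDOSMagic = $5A4D;        // "MZ": DOS executable
  cNEMagic = $454E;         // "NE": 16 bit Windows executable
  cPEMagic = $4550;         // "PE": 32 bit Windows executable
  cLEMagic = $454C;         // "LE": virtual device driver
  cNEDLLFlag = $80;         // flag in NE header that indicates a DLL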
Listing 5
The constants are followed by the function's local variables. First we declare a variable to reference a file stream object that we will use to read the file. We then have a variable to store Windows executable "magic numbers", along with another variable to store the offset of the Windows header record. We then declare variables to read information from the various header records: a byte in the case of the NE file format, an IMAGE_FILE_HEADER structure for PE format files and an IMAGE_DOS_HEADER structure for the DOS file header. Finally we have a variable to store the minimum expected size of the MS-DOS file. Listing 6 has the variable declarations:
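var
  FS: TFileStream;              // stream used to read the file
  WinMagic: Word;               // Windows executable magic number
  HdrOffset: LongInt;           // offset of Windows header record
  ImgHdrPE: IMAGE_FILE_HEADER;  // PE file header record
  DOSHeader: IMAGE_DOS_HEADER;  // DOS file header record
  AppFlagsNE: Byte;             // application type flags from NE header
  DOSFileSize: Integer;         // minimum expected size of DOS file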
Listing 6
The IMAGE_DOS_HEADER structure is not declared in the Windows unit, so we must define it ourselves as shown in Listing 7.
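type
  IMAGE_DOS_HEADER = packed record
    e_magic   : Word;   // magic number ("MZ")
    e_cblp    : Word;   // bytes used on last page of file
    e_cp      : Word;   // number of 512 byte pages in file
    e_crlc    : Word;   // number of relocations
    e_cparhdr : Word;   // size of header in paragraphs
    e_minalloc: Word;   // minimum extra paragraphs needed
    e_maxalloc: Word;   // maximum extra paragraphs needed
    e_ss      : Word;   // initial (relative) SS value
    e_sp      : Word;   // initial SP value
    e_csum    : Word;   // checksum
    e_ip      : Word;   // initial IP value
    e_cs      : Word;   // initial (relative) CS value
    e_lfarlc  : Word;   // file offset of relocation table
    e_ovno    : Word;   // overlay number
    e_res     : packed array [0..3] of Word;  // reserved
    e_oemid   : Word;   // OEM identifier
    e_oeminfo : Word;   // OEM information
    e_res2    : packed array [0..9] of Word;  // reserved
    e_lfanew  : Longint;  // file offset of Windows header
  end;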
Listing 7
We now start to process the file. Before we can perform any analysis we need to open the file for reading. A read only file stream is used to do this. We also need to handle any exceptions raised when reading the file. A skeletal outline of the body of the function is shown in Listing 8. This listing illustrates how we open and close the file stream and handle any exceptions raised.
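begin
  try
    // Open the file for reading: an exception is raised if this fails
    FS := TFileStream.Create(FileName, fmOpenRead + fmShareDenyNone);
    try
      // File analysis code goes here
    finally
      FS.Free;
    end;
  except
    // Any exception is converted into an error result
    Result := fkError;
  end;
end;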
Listing 8
If the file doesn't exist then an exception will be raised by the stream constructor. This, and any other exceptions, are trapped and converted into an error result. We use an inner try..finally block to ensure the file stream gets closed.
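The behaviour of the nested blocks can be checked in isolation. The sketch below is only an illustration under stated assumptions: Probe is a hypothetical stand-in for ExeType, the integer results stand in for the TExeFileKind values, and a counter takes the place of freeing the stream. It shows that the finally section runs after an early Exit, after an exception and after normal completion.

```pascal
program FinallyOnExit;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  CleanupCount: Integer = 0;  // counts how often the finally section ran

// Hypothetical stand-in for ExeType: returns 0 after an early exit, 1 on
// normal completion and -1 if an exception was trapped
function Probe(const ExitEarly, Fail: Boolean): Integer;
begin
  try
    Result := 0;
    try
      if ExitEarly then
        Exit;                     // the finally section still runs
      if Fail then
        raise Exception.Create('simulated read failure');
      Result := 1;
    finally
      Inc(CleanupCount);          // stands in for FS.Free
    end;
  except
    Result := -1;                 // stands in for fkError
  end;
end;

begin
  WriteLn(Probe(True, False));    // 0
  WriteLn(Probe(False, True));    // -1
  WriteLn(Probe(False, False));   // 1
  WriteLn(CleanupCount);          // 3
end.
```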
The file analysis code fits inside the inner try..finally block in the above code fragment. The remainder of this section is devoted to a discussion of how we perform the analysis. We begin by attempting to read the DOS header record. We then use the information in the header to perform various checks. The code is shown in Listing 9.
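      // Assume unknown file type until proven otherwise
      Result := fkUnknown;
      // Any executable file must be at least as large as the DOS header
      if FS.Size < SizeOf(DOSHeader) then
        Exit;
      FS.ReadBuffer(DOSHeader, SizeOf(DOSHeader));
      // DOS files begin with "MZ"
      if DOSHeader.e_magic <> cDOSMagic then
        Exit;
      // Calculate the expected file size from the header: e_cp is the
      // number of 512 byte pages and e_cblp is the number of bytes used in
      // the last page (0 means the last page is full)
      if (DOSHeader.e_cblp = 0) then
        DOSFileSize := DOSHeader.e_cp * 512
      else
        DOSFileSize := (DOSHeader.e_cp - 1) * 512 + DOSHeader.e_cblp;
      if FS.Size < DOSFileSize then
        Exit;
      // The DOS relocation table must lie within the DOS file
      if DOSHeader.e_lfarlc > DOSFileSize then
        Exit;
      // We have at least a DOS executable
      Result := fkDOS;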
Listing 9
We first assume the file format is unknown. Then we check that the file is large enough to contain the DOS header and exit if this is not the case. (Remember that the finally block is always executed following an Exit, so our file stream will be freed). If the file is large enough we read the DOS header before performing these three checks on it:
- That the e_magic field stores the required magic number.
- That the file has the required length. The e_cp field stores the number of 512 byte pages in the file. The e_cblp field stores the number of bytes in the last 512 byte page that are used. The expected file length is calculated from these two values and this is checked against the actual size of the file. (Note that the file size can be greater than the expected size, since it is possible to append data to an executable file – see article #7 for further information).
- That the offset of the DOS relocation table per the e_lfarlc field falls within the file.
If all these tests are passed then it is safe to assume we have at least a DOS executable, so we set the function result accordingly.
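To make the file length check concrete, here is a small worked example of the size calculation used in Listing 9. ExpectedDOSSize is a hypothetical helper introduced only for this sketch, and the page counts are made-up sample values rather than data from a real file.

```pascal
program DOSSizeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper that mirrors the calculation in Listing 9.
// PageCount corresponds to e_cp and LastPageBytes to e_cblp.
function ExpectedDOSSize(const PageCount, LastPageBytes: Word): Integer;
begin
  if LastPageBytes = 0 then
    // Last page is completely used
    Result := PageCount * 512
  else
    // All pages but the last are full, the last is partly used
    Result := (PageCount - 1) * 512 + LastPageBytes;
end;

begin
  WriteLn(ExpectedDOSSize(3, 0));    // 1536: three full pages
  WriteLn(ExpectedDOSSize(3, 100));  // 1124: two full pages + 100 bytes
end.
```

A file that reports three pages with 100 bytes used in the last page must therefore be at least 1124 bytes long to pass the test.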
The next thing to do is to try to find a Windows file header and read the Windows executable magic number at the start of it. Listing 10 has the code to do this:
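      // Try to find the offset of the Windows header. If the file is too
      // small to store the offset we assume a DOS executable
      if FS.Size <= cWinHeaderOffset + SizeOf(LongInt) then
        Exit;
      // Read the offset of the Windows header
      FS.Position := cWinHeaderOffset;
      FS.ReadBuffer(HdrOffset, SizeOf(LongInt));
      // The file must be large enough to hold a magic number at that offset
      if FS.Size <= HdrOffset + SizeOf(Word) then
        Exit;
      FS.Position := HdrOffset;
      // Read the magic number that identifies the type of Windows header
      FS.ReadBuffer(WinMagic, SizeOf(Word));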
Listing 10
We first check that the file is large enough to store the Windows header offset, and bail out if not. If we bail out we assume the file is a DOS executable (the return value was set earlier). We then read in the offset and once again check that the file is large enough to hold a magic number located at the file position given by the offset. If so we move the file pointer to the start of the header and read the magic number into the WinMagic variable.
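The same steps can be tried out on an in-memory image instead of a real file. The following is a minimal sketch, not the article's code: ReadWinMagic is a hypothetical helper, the 128 byte buffer is fabricated test data, and the extra check for a negative offset is an assumption added here that does not appear in Listing 10.

```pascal
program HeaderOffsetDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  cWinHeaderOffset = $3C;   // as in Listing 5

// Hypothetical helper: returns the magic word found at the Windows header
// offset, or 0 if the stream is too small to contain one
function ReadWinMagic(const Stm: TStream): Word;
var
  HdrOffset: LongInt;
begin
  Result := 0;
  if Stm.Size <= cWinHeaderOffset + SizeOf(LongInt) then
    Exit;
  Stm.Position := cWinHeaderOffset;
  Stm.ReadBuffer(HdrOffset, SizeOf(HdrOffset));
  // Assumption added for this sketch: reject negative offsets too
  if (HdrOffset < 0) or (Stm.Size <= HdrOffset + SizeOf(Word)) then
    Exit;
  Stm.Position := HdrOffset;
  Stm.ReadBuffer(Result, SizeOf(Result));
end;

var
  Stm: TMemoryStream;
  Offset: LongInt;
  Magic: Word;
begin
  Stm := TMemoryStream.Create;
  try
    // Build a fabricated, zero filled 128 byte image
    Stm.Size := 128;
    FillChar(Stm.Memory^, 128, 0);
    // Store header offset $40 at position $3C and "PE" at position $40
    Offset := $40;
    Stm.Position := cWinHeaderOffset;
    Stm.WriteBuffer(Offset, SizeOf(Offset));
    Magic := $4550;
    Stm.Position := Offset;
    Stm.WriteBuffer(Magic, SizeOf(Magic));
    WriteLn('$', IntToHex(ReadWinMagic(Stm), 4));  // $4550
  finally
    Stm.Free;
  end;
end.
```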
We now use the magic number to determine what kind of file we have. PE, NE & LE format files are then processed separately. In the case of NE & PE format files we also need to check whether the file is a DLL or application. A case statement is used to do the checking. Its outline is as follows:
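      case WinMagic of
        cPEMagic:
          // handle 32 bit Windows (PE format) files
        cNEMagic:
          // handle 16 bit Windows (NE format) files
        cLEMagic:
          // handle virtual device drivers (LE format)
        else
          // handle DOS files
      end;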
Listing 11
We'll discuss each of the cases separately.
PE Format
For PE format files we attempt to read the Windows header record (of type IMAGE_FILE_HEADER) from the file. If the Characteristics field of the structure (a bit mask) contains the IMAGE_FILE_DLL flag then we have a DLL, otherwise we have an application.
Listing 12 has the code for the PE part of the above case statement:
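          // 32 bit Windows executable: read the image file header, which
          // starts four bytes after the beginning of the Windows header.
          // If the file is too small we assume a DOS executable
          if FS.Size < HdrOffset + SizeOf(LongWord) + SizeOf(ImgHdrPE) then
            Exit;
          FS.Position := HdrOffset + SizeOf(LongWord);
          FS.ReadBuffer(ImgHdrPE, SizeOf(ImgHdrPE));
          if (ImgHdrPE.Characteristics and IMAGE_FILE_DLL)
            = IMAGE_FILE_DLL then
            // characteristics indicate a 32 bit DLL
            Result := fkDLL32
          else
            // characteristics indicate a 32 bit application
            Result := fkExe32;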
Listing 12
Note that, once again, before attempting to read the header we check the file is large enough to contain it and bail out if not, assuming a DOS executable. If we successfully read the header we check the Characteristics field for the required flag.
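The bit mask test on the Characteristics field can be illustrated on its own. This sketch assumes the value $2000 for IMAGE_FILE_DLL, redeclared locally as cImageFileDLL so that the example compiles without the Windows unit. IsDLLCharacteristics is a hypothetical helper and the two sample values are made up for illustration.

```pascal
program PEFlagDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // Local copy of IMAGE_FILE_DLL (normally taken from the Windows unit)
  cImageFileDLL = $2000;

// Hypothetical helper: True if the DLL flag is set in the bit mask
function IsDLLCharacteristics(const Characteristics: Word): Boolean;
begin
  Result := (Characteristics and cImageFileDLL) = cImageFileDLL;
end;

begin
  WriteLn(IsDLLCharacteristics($2102));  // TRUE: DLL flag is set
  WriteLn(IsDLLCharacteristics($0102));  // FALSE: DLL flag is clear
end.
```

Other bits in the mask are ignored by the test, which is why both sample values can share the same low-order flags.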
NE Format
For NE format files we read the byte at offset $0D from the start of the header and check to see if it contains a bit flag $80. If so, we have a DLL and if not we have an application.
The code for the NE part of the above case statement, which follows similar logic to that for the PE header, is:
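          // 16 bit Windows executable: read the application type flags.
          // If the file is too small we assume a DOS executable
          if FS.Size <= HdrOffset + cNEAppTypeOffset
            + SizeOf(AppFlagsNE) then
            Exit;
          FS.Position := HdrOffset + cNEAppTypeOffset;
          FS.ReadBuffer(AppFlagsNE, SizeOf(AppFlagsNE));
          if (AppFlagsNE and cNEDLLFlag) = cNEDLLFlag then
            // flags indicate a 16 bit DLL
            Result := fkDLL16
          else
            // flags indicate a 16 bit application
            Result := fkExe16;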
Listing 13
LE Format
For LE format files there is no further checking to be done. We simply return that we have found a virtual device driver:
          Result := fkVXD;
Listing 14
DOS Format
This just leaves the trivial case where none of the magic numbers are present. In this case we assume that the file is a DOS application. Since we have already set the function result to the DOS file type there is nothing to do. We simply place a comment in the else branch of the case statement to document this fact.
Putting it all together
Our ExeType function is now complete. Listing 16 shows the complete function, along with the required type definitions.
// Requires the SysUtils, Classes and Windows units
type
  // DOS executable file header record
  IMAGE_DOS_HEADER = packed record
    e_magic   : Word;
    e_cblp    : Word;
    e_cp      : Word;
    e_crlc    : Word;
    e_cparhdr : Word;
    e_minalloc: Word;
    e_maxalloc: Word;
    e_ss      : Word;
    e_sp      : Word;
    e_csum    : Word;
    e_ip      : Word;
    e_cs      : Word;
    e_lfarlc  : Word;
    e_ovno    : Word;
    e_res     : packed array [0..3] of Word;
    e_oemid   : Word;
    e_oeminfo : Word;
    e_res2    : packed array [0..9] of Word;
    e_lfanew  : Longint;
  end;

  // The kinds of file recognised by ExeType
  TExeFileKind = (
    fkUnknown,
    fkError,
    fkDOS,
    fkExe32,
    fkExe16,
    fkDLL32,
    fkDLL16,
    fkVXD
  );

function ExeType(const FileName: string): TExeFileKind;
  // Examines the given file and returns a code that describes what type of
  // executable file it is, if any
const
  cDOSRelocOffset = $18;
  cWinHeaderOffset = $3C;
  cNEAppTypeOffset = $0D;
  cDOSMagic = $5A4D;
  cNEMagic = $454E;
  cPEMagic = $4550;
  cLEMagic = $454C;
  cNEDLLFlag = $80;
var
  FS: TFileStream;
  WinMagic: Word;
  HdrOffset: LongInt;
  ImgHdrPE: IMAGE_FILE_HEADER;
  DOSHeader: IMAGE_DOS_HEADER;
  AppFlagsNE: Byte;
  DOSFileSize: Integer;
begin
  try
    // Open the file for reading: an exception is raised if this fails
    FS := TFileStream.Create(FileName, fmOpenRead + fmShareDenyNone);
    try
      // Assume unknown file type until proven otherwise
      Result := fkUnknown;
      // Any executable file must be at least as large as the DOS header
      if FS.Size < SizeOf(DOSHeader) then
        Exit;
      FS.ReadBuffer(DOSHeader, SizeOf(DOSHeader));
      // DOS files begin with "MZ"
      if DOSHeader.e_magic <> cDOSMagic then
        Exit;
      // Calculate the expected file size from the header: e_cp is the
      // number of 512 byte pages and e_cblp is the number of bytes used in
      // the last page (0 means the last page is full)
      if (DOSHeader.e_cblp = 0) then
        DOSFileSize := DOSHeader.e_cp * 512
      else
        DOSFileSize := (DOSHeader.e_cp - 1) * 512 + DOSHeader.e_cblp;
      if FS.Size < DOSFileSize then
        Exit;
      // The DOS relocation table must lie within the DOS file
      if DOSHeader.e_lfarlc > DOSFileSize then
        Exit;
      // We have at least a DOS executable
      Result := fkDOS;
      // Try to find the offset of the Windows header. If the file is too
      // small to store the offset we assume a DOS executable
      if FS.Size <= cWinHeaderOffset + SizeOf(LongInt) then
        Exit;
      // Read the offset of the Windows header
      FS.Position := cWinHeaderOffset;
      FS.ReadBuffer(HdrOffset, SizeOf(LongInt));
      // The file must be large enough to hold a magic number at that offset
      if FS.Size <= HdrOffset + SizeOf(Word) then
        Exit;
      FS.Position := HdrOffset;
      // Read the magic number that identifies the type of Windows header
      FS.ReadBuffer(WinMagic, SizeOf(Word));
      case WinMagic of
        cPEMagic:
        begin
          // 32 bit Windows executable: read the image file header, which
          // starts four bytes after the beginning of the Windows header.
          // If the file is too small we assume a DOS executable
          if FS.Size < HdrOffset + SizeOf(LongWord) + SizeOf(ImgHdrPE) then
            Exit;
          FS.Position := HdrOffset + SizeOf(LongWord);
          FS.ReadBuffer(ImgHdrPE, SizeOf(ImgHdrPE));
          if (ImgHdrPE.Characteristics and IMAGE_FILE_DLL)
            = IMAGE_FILE_DLL then
            // characteristics indicate a 32 bit DLL
            Result := fkDLL32
          else
            // characteristics indicate a 32 bit application
            Result := fkExe32;
        end;
        cNEMagic:
        begin
          // 16 bit Windows executable: read the application type flags.
          // If the file is too small we assume a DOS executable
          if FS.Size <= HdrOffset + cNEAppTypeOffset
            + SizeOf(AppFlagsNE) then
            Exit;
          FS.Position := HdrOffset + cNEAppTypeOffset;
          FS.ReadBuffer(AppFlagsNE, SizeOf(AppFlagsNE));
          if (AppFlagsNE and cNEDLLFlag) = cNEDLLFlag then
            // flags indicate a 16 bit DLL
            Result := fkDLL16
          else
            // flags indicate a 16 bit application
            Result := fkExe16;
        end;
        cLEMagic:
          // virtual device driver: no further checks needed
          Result := fkVXD;
        else
          // DOS application: Result has already been set to fkDOS
          ;
      end;
    finally
      FS.Free;
    end;
  except
    // Any exception is converted into an error result
    Result := fkError;
  end;
end;
Listing 16
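One way the result of ExeType might be consumed is to map it to a description for display. The sketch below is only a suggestion: ExeKindDescription and IsLibrary are hypothetical helpers that are not part of the article's code, and the description strings are illustrative wording. TExeFileKind is copied from Listing 3 so that the example compiles on its own.

```pascal
program ExeKindNames;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Copied from Listing 3
  TExeFileKind = (
    fkUnknown, fkError, fkDOS, fkExe32, fkExe16, fkDLL32, fkDLL16, fkVXD
  );

const
  // Hypothetical descriptions: the wording is illustrative only
  cKindNames: array[TExeFileKind] of string = (
    'Not an executable file', 'Error reading file', 'DOS executable',
    '32 bit Windows application', '16 bit Windows application',
    '32 bit Windows DLL', '16 bit Windows DLL', 'Virtual device driver'
  );

// Hypothetical helper: describes a file kind returned by ExeType
function ExeKindDescription(const Kind: TExeFileKind): string;
begin
  Result := cKindNames[Kind];
end;

// Hypothetical helper: True if the file kind is a 16 or 32 bit DLL
function IsLibrary(const Kind: TExeFileKind): Boolean;
begin
  Result := Kind in [fkDLL32, fkDLL16];
end;

begin
  // With ExeType from Listing 16 in scope this could instead read:
  //   WriteLn(ExeKindDescription(ExeType(ParamStr(1))));
  WriteLn(ExeKindDescription(fkDLL32));  // 32 bit Windows DLL
  WriteLn(IsLibrary(fkDLL32));           // TRUE
  WriteLn(IsLibrary(fkExe32));           // FALSE
end.
```

Indexing the array by the enumerated type means the compiler rejects the table if a value is added to TExeFileKind without a matching description.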
Demo program
A demo program to accompany this article can be found in the delphidabbler/article-demos Git repository on GitHub.
You can view the code in the article-08 sub-directory. Alternatively, download a zip file containing all the demos by going to the repository's landing page, clicking the Clone or download button and selecting Download ZIP.
See the demo's README.md file for details.
This source code is merely a proof of concept and is intended only to illustrate this article. It is not designed for use in its current form in finished applications. The code is provided on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.
The demo is open source. See the demo's LICENSE.md file for licensing details.
Conclusion
In this article we reviewed the key attributes of various types of executable file then sketched out an algorithm for detecting them. We then developed a Delphi function that analysed files and returned a code describing the file type. A demo program to exercise the function was provided.
Credits
Thanks are due to Flurin Honegger for suggesting some of the "reasonableness" checks on the DOS header to verify a valid MS-DOS file.
Feedback
I hope you found this article useful.
If you have any observations, comments, or have found any errors there are two places you can report them.
- For anything to do with the article content, but not the downloadable demo code, please use this website's Issues page on GitHub. Make sure you mention that the issue relates to "article #8".
- For bugs in the demo code see the article-demo project's README.md file for details of how to report them.