Attacking The Return Address

Part of 22C:169, Computer Security Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Defending Against Buffer Overflow Attacks

In the last lecture, we showed how a vulnerable C or C++ program could be attacked using a buffer-overflow attack. Now, it is time to ask, how do we defend against such an attack?

Don't use vulnerable routines

The obvious answer to the question of how to defend yourself against such an attack is: don't write that kind of code. For example, never use gets, the get-string routine used in our victim code. Most versions of the Unix programmer's reference manual say something like this:

When using gets, if the length of an input line exceeds the size of s, indeterminate behavior may result. For this reason, it is strongly recommended that gets be avoided in favor of fgets.

That is, replace gets(s) with fgets(s,SIZEOFS,stdin). The gets routine reads only from standard input and is not given any hints about the buffer size. In contrast, the fgets routine can read from any file, so you have to tell it which file, and it requires explicit information about the buffer length.

Unfortunately, because of the fault-prone semantics of arrays in C and C++, it is up to the programmer to tell fgets the array size. One way to suppress warnings about use of gets is to simply substitute fgets(s,10000,stdin) using a big number for the buffer size, since it's easier to pick big numbers out of thin air than it is to look up the declaration of the array and put in the right number. This silences the warning but leaves the overflow in place whenever the real array is smaller than the number given.
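As a minimal sketch of the safer substitution, the helper below passes the buffer size through to fgets instead of guessing it. The name read_line and its return convention are assumptions made for illustration; they are not part of the C library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf, never writing more than
   size bytes. Returns 1 on success, 0 on end of file or error. */
int read_line(FILE *in, char *buf, size_t size) {
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* fgets keeps the newline; drop it */
    return 1;
}
```

A caller that declares char s[80] can write read_line(stdin, s, sizeof s), letting the compiler supply the number. This only works where the array declaration itself is in scope; if s is a pointer parameter, sizeof s is the size of the pointer, and the length has to be passed along by hand.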

The list of routines in the C library that create buffer overflow vulnerabilities is long. Among them are the strcat and strcpy routines in the string library -- they concatenate and copy strings, using a buffer provided by the caller without knowing how long that buffer is. Again, alternative routines are available that take a buffer limit as a parameter, but again, it is up to the programmer to pass the correct value.
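One way to get bounded copying and concatenation is to build both on snprintf, which takes the destination capacity as a parameter. This is only a sketch: the names bounded_copy and bounded_append are hypothetical, and the caller is still trusted to pass the true capacity of dst.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which holds cap bytes.
   Truncates if needed. Returns 1 if src fit, 0 if it was truncated. */
int bounded_copy(char *dst, size_t cap, const char *src) {
    int n = snprintf(dst, cap, "%s", src);
    return n >= 0 && (size_t)n < cap;
}

/* Hypothetical helper: append src to the string already in dst.
   Returns 1 if everything fit, 0 on truncation or if dst holds no
   terminator within its first cap bytes. */
int bounded_append(char *dst, size_t cap, const char *src) {
    char *end = memchr(dst, '\0', cap);   /* find the end without overrunning */
    if (end == NULL)
        return 0;
    size_t used = (size_t)(end - dst);
    int n = snprintf(end, cap - used, "%s", src);
    return n >= 0 && (size_t)n < cap - used;
}
```

Both helpers always leave dst terminated when cap is nonzero, and the return value lets the caller notice truncation instead of silently carrying on with a shortened string.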

Don't use vulnerable languages

The gets routine in C can be trivially replaced with this equally unsafe code:

#include <stdio.h>

void mygets( char * s ) {
        int ch;
        for (;;) {
                ch = getchar();
                if (ch == EOF) break;
                if (ch == '\n') break;
                *s = ch;
                s++;
        }
        *s = '\0';
}

This code does essentially what gets does. Searching programs for code like this is extremely difficult. The code is easy to read, well formatted and passes most guidelines for C and C++ programming style (excepting the complete lack of comments). Only when you read it and discover how it actually works, descending below the level of style into substance, do you see the danger it poses.
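For contrast, here is a sketch of the same loop with a limit added. The name mygets_bounded, the extra parameters, and the choice to discard the rest of an overlong line are all assumptions made for this illustration. Note how small the difference is on the page: one comparison separates the safe loop from the unsafe one, which is why such code is hard to find by inspection.

```c
#include <stdio.h>

/* Hypothetical bounded variant of mygets: stores at most size-1
   characters plus a terminator and discards the rest of the line.
   Returns the number of characters stored. */
size_t mygets_bounded(FILE *in, char *s, size_t size) {
    size_t n = 0;
    int ch;
    if (size == 0)
        return 0;
    while ((ch = getc(in)) != EOF && ch != '\n') {
        if (n + 1 < size)          /* leave room for the terminator */
            s[n++] = (char)ch;
        /* otherwise drop the character instead of overrunning s */
    }
    s[n] = '\0';
    return n;
}
```

Even here, the language does nothing to check that size matches the array behind s; the safety rests entirely on the caller.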

The solution, of course, is to abandon languages like C and C++ in favor of "type safe" languages such as Java and Python (or even Pascal, Ada and other type-safe languages of the 1970s). This poses two problems:

First, type safety prevents writing some system code. The most critical such piece of code is the memory manager that underlies the heap-based storage management used for dynamically created objects. Object instantiation in C++, Java and Python requires that there be such a memory manager. In the latter two languages, this memory manager is more complex than in C because it does garbage collection.

Second, type safety at the operating system interface is impossible with current operating systems. The problem is that the operating system interface itself is outside the control of the programming language designer. The system interface is designed by the operating system designer, frequently in terms of a programming environment quite different from that envisioned by the language designer.

As a result, calls to operating system services invariably involve calls outside the boundary of the programming language being used. So, programming language designers generally include tools in their languages for calling external routines, sometimes called native routines. Java certainly includes such hooks. Every such call is inherently dangerous, and the dangers cannot be dealt with in any simple way.

If you use a nominally type-safe language, remember that someone had to implement that language, and that the implementation almost certainly was done in an unsafe environment. This means that the implementation of a nominally type-safe language may itself contain vulnerabilities. Occasionally, as a result, code that looks free from vulnerabilities on the surface is, in fact, unsafe because of an error in the language implementation.

Use Middleware

Instead of directly calling dangerous operating system calls, call them through an intermediate software layer. We do this all the time. Very few application programs directly call read(0,&ch,1) (read one character from standard input into the variable ch). Instead, C programmers call getchar() which, for practical purposes, is equivalent to the call to read just given. In fact, getchar is better than the call to read for several reasons, but the most important for our purposes is that the call to read has exactly the same safety problems as fgets -- it requires the buffer length (one character, in this example) to be passed as a parameter separately from the pointer to the buffer.
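The idea can be sketched with a tiny wrapper of our own. The name read_char is hypothetical, and the real getchar does more than this (it buffers input, for one thing), but the sketch shows the point that matters here: the buffer and its length are fixed together inside the wrapper, so no caller can pass a mismatched pair.

```c
#include <stdio.h>    /* EOF */
#include <unistd.h>   /* read (POSIX) */

/* Hypothetical one-character wrapper around read. Returns the
   character read from file descriptor fd, or EOF on end of file
   or error. */
int read_char(int fd) {
    unsigned char ch;
    ssize_t n = read(fd, &ch, sizeof ch);   /* length tied to the buffer */
    return (n == 1) ? (int)ch : EOF;
}
```

Calling read_char(0) reads from standard input, like the call to read given above.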

The middleware layers supporting languages such as Java are so well developed that most programmers never see the actual operating system interface. Nonetheless, it is there. The disadvantages of these middleware layers are that they can add considerably to the size of programs, and that they can, if improperly used, considerably slow down applications. Typically, this occurs because programmers don't understand the computational cost of the middleware when they design the algorithms they're using.

A serious problem with large middleware packages is that the middleware itself grows so complex that nobody understands it. The middleware isolates users from the underlying system, so that they don't understand it either, and this leads programmers to develop new middleware layers that sit on top of the old middleware layers in order to present a reasonable and comprehensible application programming interface. This leads to a house-of-cards phenomenon, where tiny little applications sit atop tall towers of middleware, where the lower layers are poorly understood and have unknown vulnerabilities that have long since been forgotten by the user community.

Don't Help the Attacker

Avoid giving away information that the attacker might need. The debugging information that compilers, by default, include in their object files allows an attacker to easily find the subroutines that they might want to get into. Obviously, making the attacker do more work is a good way to slow down the attack. With the GNU C compiler, there are a huge number of debugging options. These can be turned off, typically, by asking the compiler to optimize its output and strip out all symbolic information from the object file (with GCC, for example, by compiling without -g and linking with -s, or by running the strip utility on the result).

Patch the Unsafe Language

The easy buffer overflow attacks all attack return addresses on the stack, so one solution is to use a compiler that does everything it can to separate addresses from other data. There are several sets of patches to the Gnu C compiler, for example, that replace use of a single stack for local variables and return addresses with two stacks, one for local variables, and the other used only for return addresses.

The price of this modification is an approximate doubling of the cost of subroutine entry. Where, formerly, there was just one stack-pointer register, there are now two, each of which gets incremented and decremented separately.

It's also possible to reorganize activation record structures in several ways. For example, pointer variables in the activation record can be carefully isolated from arrays. The language is still unsafe if arrays are not bounded, but by having the array storage separated from the stack used for other variables, buffer overflows will only damage other arrays.

Another option is to launch programs so that the code of the program runs at unpredictable addresses. If, each time a program is loaded from disk, it is relocated into a different memory address, selected at random, then a buffer overflow attack that relies on each routine being at a predictable address will fail. Buffer overflows can still create random damage, but it's much harder to create a predictable effect.
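To see whether a given system relocates programs this way (the technique is widely known as address space layout randomization), a small probe can report where its code, static data, heap, and stack landed. This is a sketch; the name show_layout is made up for the illustration, and whether the numbers change from run to run depends on the operating system and on how the program was compiled and linked.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

static int global_var;

/* Report the addresses of one function, one static variable, one
   heap block, and one local variable in the current run. */
void show_layout(FILE *out) {
    int local_var = 0;
    void *heap_ptr = malloc(16);

    fprintf(out, "code  0x%" PRIxPTR "\n", (uintptr_t)show_layout);
    fprintf(out, "data  0x%" PRIxPTR "\n", (uintptr_t)&global_var);
    fprintf(out, "heap  0x%" PRIxPTR "\n", (uintptr_t)heap_ptr);
    fprintf(out, "stack 0x%" PRIxPTR "\n", (uintptr_t)&local_var);

    free(heap_ptr);
}
```

Calling show_layout(stdout) from main and running the program several times shows which regions move. An attack that hard-codes any address that moves between runs will usually fail.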

Each of these defenses is imperfect, but given the large volume of C and C++ code that is out in the world today, they can improve the situation by making that code harder to attack.

Other Sources of Information

There is a short tutorial on the Gnu debugger gdb on the web. It has links to an on-line reference card that contains, on one page, a useful summary of the debugger's commands. No doubt, Google can find other sources.

The Princeton GDB Tutorial

Unix programmer's reference manual pages are available on most Unix and Linux systems using the man command. Type man man for the manual page on the man command, or type man gets to see what your system says about the gets routine. These man pages are also spread all over the web. See, for example,

gets, fgets - get a string from a stream

The compiler options for GCC are extensive. They are all listed on-line:

Invoking GCC - Using the GNU Compiler Collection (GCC)

Here is a link to one interesting web page on protecting GCC from these attacks:

GCC extension for protecting applications from stack-smashing attacks