Parameters Validity Checking

Part of 22C:169, Computer Security Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

The Problem

When a user program passes a pointer to an operating system service, there are several problems that can occur:

What if the user had no right to use the memory referenced by that pointer? Consider this situation: The user calls read, passing a pointer to a buffer that is entirely within the operating system's part of the address space. In this case, the protection mechanisms that prevented the user from directly modifying the operating system no longer apply because the system is the one that actually copies data from the input device into the memory pointed to by the user's pointer. The code for a carelessly written read system service could therefore be used by a careless or malicious user to overwrite any part of the operating system.
What if the user had legitimate rights to read some page of memory, but not to write it? For example, that page might be a page of the shared code segment of the user process. If the user does a read operation, passing an address in that page as the buffer address, what prevents the system from reading data from some file directly into that page, changing the shared read-only code segment? This could end up having an effect on other users of that code segment.
What if the user had legitimate rights to the first page referenced by that pointer, but not to later pages? Consider this situation: The user buffer is in the last read-write page of one memory segment. The next page in sequence is the code segment of a shared subroutine library. The program asks the operating system to read into a buffer that spans from one page to the next. As a result, the user causes the operating system to change a read-only shared page.

The first scenario above is specific to the single-address-space model, but the other two scenarios are equally problematic in either single or multiple address-space models. There is yet another problem that is specific to the single-address-space model:

What if the operating system pushes activation records for system calls on the same stack as was used by the user for user activation records? In this case, during the execution of a system call, some of the data on the stack in the user's stack segment is system data. The user should, of course, be free to allocate buffer space as local variables in a user activation record, so the system must be willing to read and write buffers in the user's stack segment. If, however, the user overwrites a system activation record, all kinds of bad things might happen: Overwriting a system return address can force execution of arbitrary code within the system. Overwriting a stored processor status word can cause the system to return from a system call leaving the CPU in system state.

Another problem occurs in either model:

What happens if there is a trap within a trap service routine? Some architectures push the saved processor status word and program counter on "the" stack when there is a trap. With these, traps may cause traps that cause traps. Other systems allow one kind of trap to cause another kind, for example, the trap service routine for an illegal instruction trap may cause an illegal address trap, but the system cannot handle an illegal instruction within the illegal instruction trap service routine. Yet other systems flatly forbid any trap within a trap service routine.

What happens if there is a trap that the system can't handle? The usual consequence is either a CPU hang (that is, the fetch-execute cycle of the CPU actually stops working) or an infinite loop (the trap service routine immediately causes another trap that immediately causes another trap that ...). Some well behaved systems have a special trap for such circumstances that takes the system into a controlled state of inactivity -- the famed "blue screen of death" of an old Microsoft operating system, or the very rare "I'm sorry, you will have to reboot your system" message under MacOS-X are examples of such controlled failures.

Pointer Parameter Validation

All of the above examples illustrate the need for any system call that takes a pointer as a parameter to validate that parameter -- that is, to check to see that the memory referred to by that pointer is indeed memory that the user is entitled to access. Where you might write this code in a user-written subroutine to zero out a block of memory:

void zero( char * buf, int len ) {
        int i;
        for (i = 0; i < len; i++) {
                *buf = 0;
                buf ++;
        }
}

The operating system will have to do something like this instead:

void zero( char * buf, int len ) {
        int i;
        for (i = 0; i < len; i++) {
                char * sysbuf;
                if (user_access_rights_allow( buf, write )) {
                        sysbuf = make_system_pointer( buf );
                        *sysbuf = 0;
                        buf ++;
                } else {
                        report_user_protection_violation();
                        return; /* stop at the first byte the user may not write */
                }
        }
}

Note that on a system with the single-address-space model, the make_system_pointer routine is trivial: it does exactly nothing. In either model, parameter validation is the major cost. In many systems, the system architects simplify this code by combining the two functions into one, allowing the above to be rewritten as:

void zero( char * buf, int len ) {
        int i;
        for (i = 0; i < len; i++) {
                char * sysbuf;
                sysbuf = make_and_verify_system_pointer( buf, write );
                *sysbuf = 0;
                buf ++;
        }
}

Here, the make_and_verify_system_pointer routine is expected to either make a system pointer in the case that the requested operation is permitted, or to raise an appropriate exception in the case that the user should not be permitted the requested operation.
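The per-byte check above is simple but costly. Since access rights are granted a page at a time, one plausible refinement is to check each page touched by the buffer once, which also catches the buffer that spans from a writable page into a read-only one. The following is only a sketch under stated assumptions: the page table, the page size and the names user_range_allows, TOY_PAGE_SIZE and TOY_NPAGES are all hypothetical stand-ins for whatever the memory management hardware really provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: a user address space of TOY_NPAGES pages,
   each with its own access rights.  A real system would consult the
   MMU's page tables; this array only stands in for them. */
#define TOY_PAGE_SIZE 4096u
#define TOY_NPAGES    4u

enum rights { R_NONE = 0, R_READ = 1, R_WRITE = 2 };

static const unsigned page_rights[TOY_NPAGES] = {
    R_NONE,             /* page 0: unmapped */
    R_READ | R_WRITE,   /* page 1: user data */
    R_READ,             /* page 2: shared read-only code */
    R_NONE              /* page 3: unmapped */
};

/* Return true only if every page touched by the len bytes starting at
   addr grants all of the rights in need. */
bool user_range_allows(uintptr_t addr, size_t len, unsigned need) {
    if (len == 0) return true;                 /* nothing is touched */
    if (addr > UINTPTR_MAX - (len - 1)) return false;  /* wraps around */

    uintptr_t first = addr / TOY_PAGE_SIZE;
    uintptr_t last  = (addr + (len - 1)) / TOY_PAGE_SIZE;
    if (last >= TOY_NPAGES) return false;      /* beyond the address space */

    for (uintptr_t p = first; p <= last; p++) {
        if ((page_rights[p] & need) != need) return false;
    }
    return true;
}
```

With this table, a 400-byte buffer at address 8000 starts in the writable page 1 but ends in the read-only page 2, so a request for write access to it is refused even though its first byte is writable.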

What If A System Call Fails?

It is easy to suggest that the code inside a system call should raise an exception, aborting that call in the event that the user passes a pointer that is invalid, but this poses its own problems. C++, Java, C# and Ada all have well defined exception models, but they are not the same. Other languages such as C and Cobol have no exception model. An operating system that is designed to provide a neutral base on which any of these languages can be implemented cannot rely on the user supporting any particular exception model.

There is another problem. The gate-crossing mechanisms we have discussed are based on two hardware mechanisms, the trap mechanism for user calls to the system, plus the return-from-trap mechanism for all returns of control from the system to the user. Given that we are limited to these two mechanisms, how does the system report exception conditions to user programs?

The designers of Unix (Dennis Ritchie and Ken Thompson) opted for a very simpleminded approach to reporting exception conditions raised in system calls: Many Unix system calls are functions that return an integer value. A return value of -1 (or more generally, a negative return value) is used to indicate that an exception occurred. Further details are returned in the global variable errno, which holds an integer error code (the codes are defined in the intro to section 2 of the Unix programmer's reference manual, accessible with man 2 intro). Windows has copied this usage from Unix. The perror standard library routine prints the textual interpretation of errno to the standard error stream.
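As a small illustration of this convention, the following sketch wraps the read system call. The wrapper name read_or_report is made up for this example; read, errno and perror are the standard interfaces described above.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical wrapper: attempt one read.  Returns 0 if the call
   succeeded, or the errno value it left behind if it returned -1. */
int read_or_report(int fd, char *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n == -1) {
        int saved = errno;   /* copy errno before any other call changes it */
        perror("read");      /* prints "read: " plus the text for errno */
        return saved;
    }
    return 0;
}
```

Calling read_or_report with a descriptor that is not open, such as -1, should yield EBADF. Note that errno is only meaningful after a call has reported failure; a successful call is not required to reset it.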

In the case of the Unix read and write system calls, the return value indicates the number of bytes actually read into or written from the user's buffer. There are many reasons that these calls might read or write fewer than the number of bytes specified as the buffer size in the call:

On read, an end of file was encountered before the entire buffer was filled.
On read, the user hit the enter key on a device that was configured for reading input one line at a time.
On read or write, a timer expired that cut the I/O transfer short.
On read or write, the buffer was not entirely readable or writable.
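Because a short count is not an error, a caller that really needs a full buffer has to call read again. A minimal sketch of such a loop follows; the name read_fully is an assumption of this example, and retrying after EINTR is one common policy rather than something the interface demands.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read until len bytes have arrived,
   end of file is reached, or an error occurs.  Returns the number of
   bytes read, or -1 on error with errno as set by read. */
ssize_t read_fully(int fd, char *buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n == -1) {
            if (errno == EINTR) continue;  /* interrupted before any data: retry */
            return -1;                     /* real error, errno says which */
        }
        if (n == 0) break;                 /* end of file */
        got += (size_t)n;                  /* short read: ask for the rest */
    }
    return (ssize_t)got;
}
```

Such a loop is appropriate for files and pipes. It is usually wrong for a terminal configured to deliver one line at a time, where the short read is exactly what the program wanted.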

There are two ways that the return from trap could set errno. In one, prior to returning from the trap, the system would reach into a pre-agreed location in the user's address space -- the address of the errno variable, and plant the error code. Some Unix variants have probably done this, but this scheme is not thread-safe. The alternative has kernel calls return two integers, for example, returning the normal return value in register 3, if that is the usual place for single word return values from functions, and returning the error code in register 4, in the case that using registers 3 and 4 is the normal way to return double word return values. The kernel call stub code in the standard library would then be responsible for moving the error code to errno in a way that was compatible with the thread model, if any, of the language being used.
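The second scheme can be imitated in ordinary C. Everything in this sketch is hypothetical: raw_kernel_call stands in for the trap itself, struct kresult for the pair of return registers, and my_errno for the library's errno. The C11 _Thread_local storage class is used here as one way to give each thread its own copy.

```c
/* Hypothetical pair of return registers: the normal return value and
   an error code, zero meaning no error. */
struct kresult { long value; int error; };

/* Stand-in for the trap into the kernel.  It fails with the given
   error code if that code is nonzero, and otherwise returns 42. */
static struct kresult raw_kernel_call(int fail_with) {
    struct kresult r;
    if (fail_with != 0) {
        r.value = -1;
        r.error = fail_with;
    } else {
        r.value = 42;
        r.error = 0;
    }
    return r;
}

/* One error variable per thread, so that one thread's failure cannot
   overwrite the code another thread is about to inspect. */
static _Thread_local int my_errno;

/* Library stub: makes the kernel call, moves the error code (if any)
   into the per-thread variable, and returns only the normal value. */
long stub_call(int fail_with) {
    struct kresult r = raw_kernel_call(fail_with);
    if (r.error != 0) my_errno = r.error;
    return r.value;
}

/* Accessor for the calling thread's most recent error code. */
int stub_errno(void) { return my_errno; }
```

The kernel never has to know where the error variable lives; only the stub, which is compiled for a particular language and thread model, needs that knowledge.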

There are alternatives to this crude model of exception handling. One alternative is actually present in Unix, although it was added to Unix after the basic error handling model for system calls had been finalized. In Unix, it is possible to register a function as a signal handler. The Unix kernel has a long list of signal conditions (use man signal to see the list). Many of these signals are used to report hardware trap conditions to the program (SIGSEGV reports a segmentation violation, SIGILL reports an illegal instruction).

Unix user programs may register signal handlers for any such condition using the signal system call (actually a standard library call in most modern Unix variants, but that is because the actual system call has become very much more complicated). If a user calls signal(SIGSEGV,p), then, when a segmentation violation occurs, the system will call the user function p(SIGSEGV).
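A minimal sketch of registering a handler follows. The segmentation violation signal is spelled SIGSEGV in signal.h. To keep the example harmless, the signal is produced with raise rather than by a real bad memory reference; the names on_signal, last_signal and catch_segv_once are made up for this example.

```c
#include <signal.h>

/* Set by the handler.  sig_atomic_t is the one integer type the C
   standard promises a handler may safely write. */
static volatile sig_atomic_t last_signal = 0;

/* The user function p from the text: called by the system as p(signo). */
static void on_signal(int signo) {
    last_signal = signo;
}

/* Hypothetical demonstration: install the handler, deliver SIGSEGV to
   ourselves, and return the signal number the handler recorded
   (or -1 if the handler could not be installed). */
int catch_segv_once(void) {
    last_signal = 0;
    if (signal(SIGSEGV, on_signal) == SIG_ERR) return -1;
    raise(SIGSEGV);              /* stands in for a real violation */
    signal(SIGSEGV, SIG_DFL);    /* put the default action back */
    return (int)last_signal;
}
```

Returning normally from the handler is acceptable here only because the signal came from raise. After a genuine hardware segmentation violation, a handler that simply returns would typically resume at the faulting instruction and fault again.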

In a language like C++, the most natural thing to do would be to install a handler for SIGSEGV that raised a C++ segmentation violation exception, that is, a translation from the Unix model of signals to the C++ model of what an exception ought to be.

Callbacks

The Unix signal mechanism requires that the kernel be able to call a user subroutine, the signal handler, and that, on return, the user subroutine returns to the operating system. While it is running, the user routine runs in user state; when it returns, it returns control to the system, which runs in system state.

Recall that, while the system has control, all of the user's registers are stored in a register save area available to the system, and recall that the system can peek into the user's address space and poke around, as needed, to do anything at all.

To do a callback, the system adjusts the contents of the user's registers and memory to simulate the effect of a normal procedure call to the user's routine that the system wants to call, and then does a return from trap instruction to begin execution of that routine. All this requires is that the user and operating system agree on the calling sequence to be used for callback subroutines. In the case of C, C++ and Unix, the agreement is that the operating system will adhere to the standard calling sequence conventions for C and C++.

Return from a callback requires that the return instruction at the end of the callback subroutine cause a trap -- any pre-arranged trap will do. So, if the system never maps real memory into page zero of the user's address space, the system can pass the user's callback routine a return address of zero, and then, on encountering a segmentation violation trap where the PC value was zero, interpret this as a return from callback.
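The bookkeeping involved can be sketched as a toy simulation. Nothing here matches a real architecture: struct user_regs is a hypothetical register save area with only three registers, the user stack is an ordinary array indexed by the saved stack pointer, and the calling sequence (argument in a register, return address pushed on the stack) is simply assumed.

```c
#include <stdint.h>

/* Hypothetical register save area for the interrupted user program. */
struct user_regs {
    uintptr_t pc;     /* program counter */
    uintptr_t sp;     /* stack pointer: an index into the user stack */
    uintptr_t arg0;   /* register carrying the first argument */
};

/* Page zero is never mapped, so a jump to address 0 must trap. */
#define CALLBACK_RETURN_PC ((uintptr_t)0)

/* Rewrite the saved context so that return-from-trap enters
   handler(signo) with a return address of zero.  Returns the previous
   context so that the system can restore it after the callback.
   A real kernel would first validate the user stack pointer, just as
   it validates any other user pointer; the caller is assumed to have
   done so. */
struct user_regs setup_callback(struct user_regs *regs,
                                uintptr_t *user_stack,
                                uintptr_t handler, int signo) {
    struct user_regs before = *regs;
    regs->sp -= 1;                               /* push: stack grows down */
    user_stack[regs->sp] = CALLBACK_RETURN_PC;   /* fake return address */
    regs->arg0 = (uintptr_t)signo;               /* argument to the handler */
    regs->pc = handler;                          /* resume in the handler */
    return before;
}

/* In the segmentation violation trap service routine: a fault at
   PC zero is read as a return from callback, not a user error. */
int is_callback_return(const struct user_regs *regs) {
    return regs->pc == CALLBACK_RETURN_PC;
}
```

When the handler executes its return instruction, it pops the zero into the program counter, the fetch from page zero traps, and is_callback_return tells the system to restore the saved context rather than report a violation.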

This model has one problem: The system called the user's callback by doing a return from trap, and the user's return from the callback to the system was done by a trap. This forces very awkward code structures into the operating system. The system call to the user's handler does not resemble, in any sensible way, a call to a normal system subroutine or user subroutine. This is a great place to make serious errors in the coding of an operating system, and looking at the history of real systems, this is a common source of security faults.

References

For some light reading on the history of Unix, consider reading this:

The Evolution of the Unix Time-sharing System by Dennis M. Ritchie. This is a first-hand account of the early years of Unix.

For a general introduction to many system call issues, see:

Unix System Call Links from Softpanorama.

There's an on-line book chapter with considerable discussion of parameter validation:

Introduction to Computer Security, Section 20.4, by Matt Bishop.