What is a System Call
22C:169, Computer Security Notes
If you look at the Unix programmer's reference manual, you will find that the documentation for system calls and the documentation for subroutines in the standard library look substantially the same. Consider the sbrk system call:
NAME
    sbrk - change data segment size
SYNOPSIS
    #include <unistd.h>
    void *sbrk(intptr_t increment);
Contrast this with the interface to malloc in the standard C/C++ library:
NAME
    malloc - a memory allocator
SYNOPSIS
    #include <stdlib.h>
    void *malloc(size_t size);
By design, C and C++ programmers can call either of these routines with essentially identical syntax. The following fragment illustrates this:
void * a = malloc( 64 );
void * b = sbrk( 64 );
Here, the variables a and b are declared identically, and the two calls return practically interchangeable 64-byte blocks of memory. The only difference is that the block allocated by malloc can later be freed, returning it to the heap, while the other block is outside the heap manager's control and is therefore difficult to reclaim for other purposes. Mixing the two can cause further problems, because the heap manager may be unable to expand the heap if the programmer has also allocated storage directly with sbrk.
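A minimal sketch of why the sbrk() block is hard to reclaim, assuming Linux with glibc (where sbrk() is declared in <unistd.h> under _DEFAULT_SOURCE). sbrk() returns the old end of the segment, and the only way to hand a block back is to move that end down again, which works only while the block is still the last thing allocated. The function name sbrk_block_is_lifo() is hypothetical.

```c
#define _DEFAULT_SOURCE 1   /* exposes sbrk() in glibc's <unistd.h> */
#include <unistd.h>

/* Hypothetical check: returns 1 if a 64-byte sbrk() block started at the
   old break and could be given back by moving the break down again. */
int sbrk_block_is_lifo(void)
{
    char *before = sbrk(0);          /* current break, nothing allocated */
    char *b = sbrk(64);              /* sbrk() returns the OLD break */
    if (b == (void *)-1)
        return 0;                    /* the allocation failed */
    char *after = sbrk(0);           /* the break has moved up by 64 */
    int ok = (b == before) && (after == before + 64);
    sbrk(-64);                       /* the only way to give b back */
    return ok && sbrk(0) == before;
}
```

Note that nothing may call malloc() between these sbrk() calls: if the heap manager extended the segment in the meantime, sbrk(-64) would cut into the heap manager's memory instead of releasing b, which is exactly the mixing problem described above.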
In point of fact, malloc() is an example of middleware, part of the C standard library, while sbrk() is a system call. In the classic Unix implementation, every byte of memory that malloc() returns to the application was originally obtained by a call to sbrk() made inside the library. The middleware adds functionality. Storage allocated by sbrk() is always added at the end of the static segment. It can be deallocated by shrinking the static segment, but this means that the net effect is LIFO allocation, akin to using a stack. In contrast, storage allocated by malloc() may be deallocated using free(), and allocated blocks may be deallocated in any order. A pool of storage that allows this is called a heap, and the software that manages the heap, malloc() and free() in this case, is called a heap manager. It is up to the heap manager to maintain a data structure that organizes the free space under its control (usually called the free list) and to reuse previously freed memory efficiently, so that the total size of the static segment remains small.
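To make the free list concrete, here is a toy heap manager layered on sbrk(). It is a sketch, not how any real C library does it: the names hm_alloc() and hm_free() and the block header layout are hypothetical, it uses first fit, never splits or merges blocks, and never shrinks the segment. It assumes Linux with glibc for sbrk().

```c
#define _DEFAULT_SOURCE 1   /* exposes sbrk() in glibc's <unistd.h> */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical header stored in front of every block. */
typedef struct block {
    size_t size;          /* usable bytes that follow the header */
    struct block *next;   /* link used only while the block is free */
} block;

#define ALIGN _Alignof(max_align_t)
/* Round n up to a multiple of ALIGN so user data stays aligned. */
#define ROUND_UP(n) ((((n) + ALIGN - 1) / ALIGN) * ALIGN)
#define HDR ROUND_UP(sizeof(block))

static block *free_list = NULL;   /* the heap manager's free list */

/* Allocate: first fit from the free list, otherwise grow with sbrk(). */
void *hm_alloc(size_t size)
{
    size = ROUND_UP(size == 0 ? 1 : size);

    /* First fit: unlink the first free block that is large enough. */
    for (block **link = &free_list; *link != NULL; link = &(*link)->next) {
        block *b = *link;
        if (b->size >= size) {
            *link = b->next;
            return (char *)b + HDR;
        }
    }

    /* Nothing reusable: extend the static segment, padding first in
       case something else left the break misaligned. */
    uintptr_t brk_now = (uintptr_t)sbrk(0);
    size_t pad = (ALIGN - brk_now % ALIGN) % ALIGN;
    char *p = sbrk((intptr_t)(pad + HDR + size));
    if (p == (char *)-1)
        return NULL;                      /* out of memory */
    block *b = (block *)(p + pad);
    b->size = size;
    b->next = NULL;
    return (char *)b + HDR;
}

/* Free in any order: push the block onto the free list. */
void hm_free(void *ptr)
{
    if (ptr == NULL)
        return;
    block *b = (block *)((char *)ptr - HDR);
    b->next = free_list;
    free_list = b;
}
```

For example, after allocating a, b and c and then freeing a and b (while c, the block at the end of the segment, is still in use), a request for b's size gets b back and a request for a's size gets a back, with no further call to sbrk() and no growth of the static segment.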
But the question we are interested in here is not "what functionality does malloc() add?" Rather, we are interested in how a call to a middleware routine in the system library differs from a system call. Put simply, a call to a routine in the system library is a call to code in the same protection domain as the caller, while a system call must transfer control into the operating system's protection domain.
User programs have very limited protection domains, while parts of the operating system need unlimited access to various system resources. The file system needs unlimited access to the disk in order to create files. The memory manager needs unlimited access to physical memory and to the memory management unit in order to create address spaces for user processes, and so on.
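The simplest and most widespread hardware support for protection mechanisms involves adding a single flipflop to the CPU, the protection mode flipflop. The state of this flipflop defines two operating modes for the CPU: user mode, in which privileged operations (such as changing the memory management unit) are illegal and cause a trap, and kernel mode, in which all operations are permitted.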
On systems implementing this model, when the hardware detects an interrupt or trap condition, it saves not only the program counter and other registers, but also the value of the protection mode flipflop, sometimes called the protection state. On return from trap or return from interrupt, the CPU provides a way to restore not only the program counter and other registers but also the protection mode. On such a machine, therefore, all traps that occur while a program is in user state cause a change of protection domain.
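The save-and-restore rule can be modeled in a few lines of C. This is a toy simulation, not any real CPU: the names toy_cpu, trap(), return_from_trap() and try_privileged() and the trap vector address are hypothetical, and only one level of trap is saved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU: a program counter plus the protection mode flipflop. */
typedef enum { USER_MODE, KERNEL_MODE } cpu_mode;

typedef struct {
    uint32_t pc;       /* program counter */
    cpu_mode mode;     /* state of the protection mode flipflop */
} cpu_state;

typedef struct {
    cpu_state current; /* registers as they are right now */
    cpu_state saved;   /* what the last trap saved (one level only) */
} toy_cpu;

#define TRAP_VECTOR 0x100u   /* hypothetical address of the trap handler */

/* Trap or interrupt: save the PC *and* the mode, then enter kernel mode. */
void trap(toy_cpu *cpu)
{
    cpu->saved = cpu->current;
    cpu->current.pc = TRAP_VECTOR;
    cpu->current.mode = KERNEL_MODE;
}

/* Return from trap: restore the PC and the saved protection mode. */
void return_from_trap(toy_cpu *cpu)
{
    cpu->current = cpu->saved;
}

/* A privileged operation: allowed in kernel mode, an illegal-instruction
   trap in user mode. Returns true if it was allowed. */
bool try_privileged(toy_cpu *cpu)
{
    if (cpu->current.mode != KERNEL_MODE) {
        trap(cpu);
        return false;
    }
    return true;
}
```

A user-mode program that attempts a privileged operation lands in the handler in kernel mode, where the same operation is legal; the return from trap then puts the CPU back in user mode at the saved address.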
Some of these traps, of course, signal program errors. Illegal instruction traps might be the result of programs that accidentally attempt to execute random data. Memory addressing traps are a typical result of attempting to follow undefined pointers.
Once we have a trap mechanism such as the one described above, we have a way to transfer control from a user program, running in user mode, to part of the operating system, running in kernel mode.
If you think of the protection mode bit in the CPU (and its associated semantics) as creating a fence between the current user's protection domain and the operating system's protection domain, then the trap mechanism can be thought of as creating a gate through this fence.
The most common implementation of system calls involves setting aside one or more particular traps and using those traps as system calls. Consider these examples:
System calls are conceptually subroutine calls, so once the trap handler has determined that a system call is taking place, it simply calls the code of the appropriate system call, and then, on return, does a return from trap.
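A sketch of that dispatch step, written as ordinary C for a hypothetical kernel. The trap causes, the call numbers (0 for getpid, 1 for sbrk), the fixed process ID 42 and the TRAP_ERROR value are all invented for illustration. The point is that once the handler decides the trap is a system call, the call itself is a plain indexed subroutine call.

```c
#include <stddef.h>

/* Hypothetical trap causes reported by the hardware. */
enum trap_cause { TRAP_SYSCALL, TRAP_ILLEGAL_INSTRUCTION, TRAP_BAD_ADDRESS };

#define TRAP_ERROR (-1L)   /* hypothetical "this was a program error" result */

typedef long (*syscall_fn)(long arg);

static long simulated_break = 0x10000;   /* toy end of the static segment */

/* Kernel-side implementations: ordinary C functions. */
static long sys_getpid(long unused)
{
    (void)unused;
    return 42;                           /* made-up process ID */
}

static long sys_sbrk(long increment)
{
    long old = simulated_break;
    simulated_break += increment;
    return old;                          /* like sbrk(): the old break */
}

/* Dispatch table indexed by (hypothetical) system call number. */
static const syscall_fn syscall_table[] = { sys_getpid, sys_sbrk };
#define NSYSCALLS (sizeof syscall_table / sizeof syscall_table[0])

/* Decide whether the trap is a system call; if so, make a plain call.
   The caller would then execute a return from trap. */
long trap_handler(enum trap_cause cause, long number, long arg)
{
    if (cause != TRAP_SYSCALL)
        return TRAP_ERROR;               /* a genuine program error */
    if (number < 0 || (size_t)number >= NSYSCALLS)
        return TRAP_ERROR;               /* no such system call */
    return syscall_table[number](arg);
}
```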
The system call mechanisms outlined above require a significant amount of code to determine whether a particular trap is a system call or just an error on the part of the application program. Some (but not all) hardware designers have been aware of this problem since the 1960s and have invented a variety of mechanisms to speed up system calls:
In the 1960s, the developers of Multics invented an alternative way to do system calls. In their model, each page of the address space was marked with its protection level. In a simplified two-level version of this, pages are marked as either user pages or system pages. In user state, programs may not access system pages. This allows the operating system and the user program to share the same address space. When the user attempts to use or modify data in a system page, there is a trap.
The developers of Multics added one more bit to each page of the address space (actually, to each page table entry). If this bit was set, it marked that page as a gateway. User programs could not read or write data in system pages, nor could they jump to arbitrary locations in system pages, but if a user program executed a call instruction to offset zero of a system page that was marked as a gateway, the call was permitted. This allows each system call to be implemented as a call to a different gateway into the operating system.
Calls to gateway pages must push not only the return address but also the protection state of the CPU onto the stack, so that a return from the gateway will return to the caller's protection state.
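The two-level version of this scheme, including the gateway bit, can be simulated as follows. The page size, the structure and function names, and the fixed-depth stack are hypothetical; the rules follow the text: user-state code may not touch system pages except to call offset zero of a gateway page, and such a call pushes both the return address and the caller's protection state.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORDS_PER_PAGE 1024u   /* hypothetical page size, in words */
#define STACK_DEPTH 16         /* hypothetical fixed-size call stack */

typedef enum { USER_STATE, SYSTEM_STATE } prot_state;
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_CALL } access_kind;

typedef struct {
    bool system;    /* page belongs to the operating system */
    bool gateway;   /* system page that user code may call at offset zero */
} page_entry;

typedef struct {
    uint32_t return_addr;
    prot_state saved_state;    /* the caller's protection state */
} frame;

typedef struct {
    uint32_t pc;
    prot_state state;
    frame stack[STACK_DEPTH];
    int sp;
} gate_cpu;

/* Is this access legal (true), or does it trap (false)? */
bool may_access(const page_entry *pt, prot_state state,
                access_kind kind, uint32_t addr)
{
    const page_entry *p = &pt[addr / WORDS_PER_PAGE];
    if (state == SYSTEM_STATE || !p->system)
        return true;           /* system state, or an ordinary user page */
    /* User state and a system page: only a call to offset zero of a gateway. */
    return kind == ACCESS_CALL && p->gateway && addr % WORDS_PER_PAGE == 0;
}

/* Call instruction: push the return address AND the protection state. */
bool gate_call(gate_cpu *c, const page_entry *pt, uint32_t target)
{
    if (!may_access(pt, c->state, ACCESS_CALL, target) || c->sp == STACK_DEPTH)
        return false;          /* trap */
    c->stack[c->sp++] = (frame){ c->pc + 1, c->state };
    c->pc = target;
    if (pt[target / WORDS_PER_PAGE].system)
        c->state = SYSTEM_STATE;   /* passing through the gate */
    return true;
}

/* Return: restore the caller's address and protection state.
   Assumes a matching gate_call() pushed a frame. */
void gate_return(gate_cpu *c)
{
    frame f = c->stack[--c->sp];
    c->pc = f.return_addr;
    c->state = f.saved_state;
}
```

In this sketch a call from system state into a user page leaves the state unchanged; only the pushed protection state decides what a return restores.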
The developers of Multics went overboard with their design, using a 4-bit protection state that they called a ring number. Memory references were legal if the current protection state was less than or equal to the ring number marked on the page. So, ring 0000 was the innermost and most secure level, while ring 1111 was the outermost or least secure level.
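The ring rule reduces to a single comparison. In this sketch the 4-bit ring numbers and the less-than-or-equal rule come from the text, while the function name is hypothetical. Under this rule, the two-level scheme described earlier corresponds to marking system pages with ring 0 and user pages with ring 15, a mapping chosen here only for illustration.

```c
#include <stdbool.h>

/* 4-bit ring numbers: 0 (0000) innermost, 15 (1111) outermost. */
enum { INNERMOST_RING = 0, OUTERMOST_RING = 15 };

/* A reference is legal if the current ring is less than or equal to the
   ring number marked on the page. Only the low 4 bits are used. */
bool ring_access_ok(unsigned current_ring, unsigned page_ring)
{
    return (current_ring & 0xFu) <= (page_ring & 0xFu);
}
```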
Back to the original question: How is it that calls to routines in the standard library and system calls look the same in C or C++ programs? Calls to user routines use simple subroutine call instructions, while calls to system routines must do strange things to cross the gate.
The answer is that the developers of Unix and all systems descended from Unix have created a special library of regular subroutines, one per system call. When you call, say, sbrk, this is a call to a library routine that gets the parameters, puts them in the right places to pass to the actual system call, then does whatever is required to make the system call occur. The trap handler then follows through, figuring out which system call is involved, getting the parameters for that call from the user, and then using a regular call instruction to call the real sbrk routine in the operating system.
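On Linux, sbrk() is itself a library routine built on the brk system call, which makes it a convenient illustration. The sketch below, assuming Linux and glibc, shows what a hand-written caller-side stub might look like. raw_brk() is a hypothetical name; the inline assembly assumes the x86-64 convention (call number in rax, first argument in rdi, the syscall instruction, which overwrites rcx and r11), and on other Linux architectures the stub falls back on the C library's generic syscall() function. Passing 0 asks the Linux kernel for the current break without changing it.

```c
#define _DEFAULT_SOURCE 1   /* exposes sbrk() and syscall() in glibc */
#include <sys/syscall.h>    /* SYS_brk */
#include <unistd.h>

/* Hypothetical caller-side stub for the raw Linux brk system call. */
void *raw_brk(void *addr)
{
#if defined(__x86_64__)
    long ret;
    /* Put the call number in rax and the argument in rdi, then cross
       into the kernel with `syscall`, which clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_brk), "D"(addr)
                      : "rcx", "r11", "memory");
    return (void *)ret;
#else
    /* Other Linux architectures: use the C library's generic stub. */
    return (void *)syscall(SYS_brk, addr);
#endif
}
```

On such a system, raw_brk(0) and sbrk(0) should report the same address, since the library's sbrk() ultimately asks the kernel the same question through its own stub.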
This ingenious approach to the problem of system calls hides the complexity from most users and from most system programs. Only the caller-side stub routines in the library and the code of the trap handler that calls the real system call need to be written in assembly language with an understanding of the eccentric mechanisms of the particular CPU being used. Everything else is just normal code.
General concepts covered here are covered in several other places. Google finds many copies of lecture notes online that cover this material, but the following is more likely to remain available:
IBM's Tutorial on the system call mechanism in their AIX flavor of Unix is very specific to one system but otherwise covers the right concepts.
General definitions of many of the terms used here can be found in or are linked from:
The Wikipedia entry for System call includes pointers to a number of relevant definitions